CN116309226A - Image processing method and related equipment thereof - Google Patents

Image processing method and related equipment thereof

Info

Publication number
CN116309226A
CN116309226A (application CN202310277464.6A)
Authority
CN
China
Prior art keywords
image
ldr
blocks
image blocks
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310277464.6A
Other languages
Chinese (zh)
Inventor
汪海铃
李卫
席瑗苑
胡杰
陈汉亭
王云鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202310277464.6A priority Critical patent/CN116309226A/en
Publication of CN116309226A publication Critical patent/CN116309226A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method and related equipment, which enable a plurality of low dynamic range (LDR) images to be fused with higher quality, so that the finally obtained high dynamic range (HDR) image is free of artifacts. The method comprises the following steps: when an HDR image of a target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object can be acquired first, and the first LDR image and the second LDR image are input into a target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object.

Description

Image processing method and related equipment thereof
Technical Field
The embodiments of the application relate to the technical field of artificial intelligence (AI), and in particular to an image processing method and related equipment thereof.
Background
High dynamic range (HDR) imaging, as a key problem in computer vision applications, has received increasing attention. With the rapid development of AI technology, more and more device manufacturers build the neural network models of AI technology into their devices, so as to acquire high-quality HDR images through these models.
In the related art, a target object in a scene may be photographed at a plurality of different exposure degrees, so that a plurality of low dynamic range (LDR) images of the target object are acquired. The plurality of LDR images may then be input into a neural network model, and the neural network model fuses the plurality of LDR images to obtain an HDR image of the target object.
In the above process, the target object may move in the scene while the plurality of LDR images are being captured, so that the contents presented by the captured LDR images differ from one another. As a result, an HDR image obtained by directly fusing the plurality of LDR images is prone to artifacts.
Disclosure of Invention
The embodiments of the application provide an image processing method and related equipment thereof, which enable a plurality of LDR images to be fused with higher quality, so that the finally obtained HDR image is free of artifacts.
A first aspect of an embodiment of the present application provides an image processing method, which may be implemented by a target model, including:
when an HDR image of a target object needs to be acquired, the target object can be photographed at different exposure degrees, so that a first LDR image of the target object and a second LDR image of the target object are acquired. The exposure degree used to capture the second LDR image may be larger or smaller than the exposure degree used to capture the first LDR image.
After obtaining the first LDR image of the target object and the second LDR image of the target object, the first LDR image of the target object and the second LDR image of the target object may be input to the target model. After receiving the first LDR image of the target object and the second LDR image of the target object, the target model may perform image block matching on a plurality of first image blocks of the first LDR image of the target object and a plurality of second image blocks of the second LDR image of the target object, thereby constructing a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks.
After the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained, the target model can fuse the first LDR image and the second LDR image by utilizing the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks, so that an HDR image of the target object is obtained and externally output.
From the above method, it can be seen that: when the HDR image of the target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the correspondence as a guide, so that a higher-quality fusion can be realized, and the finally obtained HDR image of the target object is free from artifacts.
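To make the flow above concrete, the following toy sketch splits two LDR captures into non-overlapping blocks, matches the blocks one-to-one by similarity, and folds the aligned blocks back into an image. It is illustrative only: the raw-pixel cosine similarity and the simple averaging used as a stand-in for fusion are assumptions for demonstration, whereas in the method above both the matching and the fusion are carried out by the learned target model. The block size of 8 and the 64x64 input are arbitrary choices for the example.

```python
# Toy, self-contained sketch of: block matching -> one-to-one correspondence -> fusion.
# The similarity measure and the averaging "fusion" are placeholders, not the patent's method.
import torch
import torch.nn.functional as F

def match_and_fuse(ldr1: torch.Tensor, ldr2: torch.Tensor, block: int = 8) -> torch.Tensor:
    # ldr1, ldr2: (1, C, H, W), with H and W divisible by `block`
    _, _, h, w = ldr1.shape
    # Split both images into non-overlapping blocks: (1, C*block*block, num_blocks)
    b1 = F.unfold(ldr1, kernel_size=block, stride=block)
    b2 = F.unfold(ldr2, kernel_size=block, stride=block)
    # Cosine similarity between every first block and every second block
    n1 = F.normalize(b1.squeeze(0), dim=0)   # (D, N)
    n2 = F.normalize(b2.squeeze(0), dim=0)   # (D, N)
    sim = n1.t() @ n2                        # (N, N) similarity matrix
    # One-to-one correspondence: for each first block, index of the most similar second block
    corr = sim.argmax(dim=1)                 # (N,)
    # Reorder the second image's blocks so they align with the first image's blocks
    b2_aligned = b2[:, :, corr]
    # Placeholder fusion (average of aligned blocks), then fold the blocks back into an image
    fused = 0.5 * (b1 + b2_aligned)
    return F.fold(fused, output_size=(h, w), kernel_size=block, stride=block)

hdr_like = match_and_fuse(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(hdr_like.shape)  # torch.Size([1, 3, 64, 64])
```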
In one possible implementation, based on the first LDR image and the second LDR image, acquiring the correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image includes: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained. In the foregoing implementation manner, after receiving the first LDR image of the target object and the second LDR image of the target object, the target model may perform a series of processing on the first LDR image and the second LDR image, so as to obtain similarities between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image. After the similarity between the plurality of first image blocks and the plurality of second image blocks is obtained, the target model can accurately construct a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks by using the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, based on the first LDR image and the second LDR image, obtaining the similarity between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image comprises: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks. In the foregoing implementation manner, after receiving the first LDR image of the target object and the second LDR image of the target object, the target model performs feature extraction on the first LDR image and the second LDR image respectively, so as to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image, respectively. After the first features of the plurality of first image blocks and the second features of the plurality of second image blocks are obtained, the target model can also calculate the first features of the plurality of first image blocks and the second features of the plurality of second image blocks so as to accurately obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
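As an illustration of this implementation, the sketch below uses a single 3x3 convolution as a stand-in feature extractor (the method does not specify the network), pools the feature map over non-overlapping blocks to obtain one feature vector per image block, and computes a cosine-similarity matrix between the first features and the second features. All layer sizes and the block size are assumptions.

```python
# Hedged sketch: per-block feature extraction followed by a block-to-block similarity matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSimilarity(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 32, block: int = 8):
        super().__init__()
        self.block = block
        self.extractor = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)  # stand-in extractor

    def forward(self, ldr1: torch.Tensor, ldr2: torch.Tensor) -> torch.Tensor:
        # First/second features: one vector per image block, shape (B, N, feat_ch)
        f1 = self._block_features(ldr1)
        f2 = self._block_features(ldr2)
        # Normalized dot product = cosine similarity between every block pair, shape (B, N, N)
        return torch.bmm(F.normalize(f1, dim=-1), F.normalize(f2, dim=-1).transpose(1, 2))

    def _block_features(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.extractor(x)                                    # (B, feat_ch, H, W)
        pooled = F.avg_pool2d(feat, self.block, stride=self.block)  # one cell per block
        return pooled.flatten(2).transpose(1, 2)                    # (B, N, feat_ch)

sim = BlockSimilarity()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(sim.shape)  # torch.Size([1, 64, 64]) for 8x8 blocks of a 64x64 image
```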
In one possible implementation manner, the plurality of first image blocks include a third image block, and based on the similarity, obtaining a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks includes: determining, among the plurality of second image blocks, the second image block with the largest similarity as a fourth image block based on the similarities between the third image block and the plurality of second image blocks; and constructing a correspondence between the third image block and the fourth image block. In the foregoing implementation manner, for any one first image block (i.e., the third image block described above), the target model may select, from among the second image blocks, the second image block having the greatest similarity (i.e., the fourth image block described above) as the second image block most similar to that first image block, based on the similarities between that first image block and the second image blocks. The target model may then construct a correspondence between that first image block and the second image block most similar to it. The target model can also perform the same operations on each of the remaining first image blocks, so that the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is finally obtained.
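The correspondence construction itself reduces to a row-wise argmax over the similarity matrix, as sketched below with a random similarity matrix purely for illustration. Note that an argmax alone may map two different first image blocks to the same second image block; the implementation above does not spell out how such collisions are resolved, so any stricter one-to-one assignment would be an added assumption.

```python
# Illustrative only: build block correspondences from a (random) similarity matrix.
import torch

num_blocks = 16
similarity = torch.rand(num_blocks, num_blocks)  # similarity[i, j]: first block i vs second block j

# For every first image block, the index of its most similar second image block
correspondence = similarity.argmax(dim=1)        # shape (num_blocks,)

# Correspondence expressed as (first block index, second block index) pairs
pairs = list(enumerate(correspondence.tolist()))
print(pairs[:3])
```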
In one possible implementation, fusing the first LDR image and the second LDR image based on the correspondence to obtain the high dynamic range HDR image of the target object includes: extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the correspondence, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering is adjusted to obtain the HDR image of the target object. In the foregoing implementation manner, after receiving the first LDR image and the second LDR image, the target model may further perform feature extraction on the first LDR image and the second LDR image, so as to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image. After the fourth features of the plurality of second image blocks and the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks are obtained, the target model can adjust the ordering of the fourth features of the plurality of second image blocks based on the correspondence, because the correspondence indicates a new ordering of the fourth features of the plurality of second image blocks, thereby obtaining the fourth features of the plurality of second image blocks after the ordering is adjusted. Then, after obtaining the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering is adjusted, the target model may perform a series of processing on the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering is adjusted, thereby finally obtaining and externally outputting the HDR image of the target object.
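A minimal sketch of the reordering step follows: the fourth features are re-indexed with the correspondence so that, for each first image block, the matched second image block's feature sits at the same position, after which the two feature sets can be processed jointly (here simply concatenated). The tensors, dimensions, and the concatenation are placeholders.

```python
# Illustrative only: adjust the ordering of the fourth features according to the correspondence.
import torch

num_blocks, feat_dim = 64, 32
third_feats = torch.rand(1, num_blocks, feat_dim)             # features of the first image blocks
fourth_feats = torch.rand(1, num_blocks, feat_dim)            # features of the second image blocks
correspondence = torch.randint(0, num_blocks, (num_blocks,))  # matched second-block index per first block

# Re-index the fourth features so they align with the third features block by block
fourth_feats_aligned = fourth_feats[:, correspondence, :]     # (1, num_blocks, feat_dim)

# The aligned features can then be processed together, e.g. concatenated along the channel axis
joint = torch.cat([third_feats, fourth_feats_aligned], dim=-1)
print(joint.shape)  # torch.Size([1, 64, 64])
```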
In one possible implementation, processing the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering is adjusted to obtain the HDR image of the target object includes: processing the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering is adjusted to obtain the HDR image of the target object. In the foregoing implementation manner, after obtaining the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering is adjusted, the target model may perform a series of processing on the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering is adjusted, so as to finally obtain and externally output the HDR image of the target object.
In one possible implementation, the foregoing processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process. In the foregoing implementation, the target model may implement the fusion of the first LDR image and the second LDR image based on the self-attention mechanism and the interactive attention mechanism. In the fusion process, the detail information of the first LDR image and the second LDR image can be effectively taken into account, so that the final HDR image retains good detail and is free of artifacts.
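The sketch below shows one way an interactive (cross-) attention step of the kind listed above could be wired with standard building blocks: the features of the first image blocks serve as queries while the aligned features of the second image blocks serve as keys and values, followed by a self-attention pass and a residual connection. The specific layer arrangement and sizes are assumptions rather than the patent's prescribed architecture.

```python
# Hedged sketch of a cross-attention ("interactive attention") fusion step.
import torch
import torch.nn as nn

class InteractiveAttentionFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a, feats_b: (B, num_blocks, dim), already aligned block by block
        cross, _ = self.cross_attn(query=feats_a, key=feats_b, value=feats_b)
        fused, _ = self.self_attn(cross, cross, cross)  # self-attention over the cross-attended features
        return self.proj(fused + feats_a)               # residual keeps detail of the first image

fused = InteractiveAttentionFusion()(torch.rand(1, 64, 64), torch.rand(1, 64, 64))
print(fused.shape)  # torch.Size([1, 64, 64])
```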
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
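For reference, a pre-norm transformer block built from exactly the listed ingredients (normalization, multi-head self-attention, addition, and a multi-layer perceptron) can be sketched as follows; the dimensions and the pre-norm ordering are illustrative choices, not values fixed by the method.

```python
# Hedged sketch of a transformer block: norm -> multi-head self-attention -> add -> norm -> MLP -> add.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-head self-attention with residual addition
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]
        # Multi-layer perceptron with residual addition
        return x + self.mlp(self.norm2(x))

out = TransformerBlock()(torch.rand(1, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64])
```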
A second aspect of the embodiments of the present application provides a model training method, which is characterized in that the method includes: acquiring a first LDR image of a target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees; processing the first LDR image and the second LDR image through a model to be trained to obtain a high dynamic range HDR image of the target object, wherein the model to be trained is used for: acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the corresponding relation, fusing the first LDR image and the second LDR image to obtain an HDR image of the target object; training the model to be trained based on the HDR image to obtain a target model.
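A hedged sketch of the training loop of the second aspect is given below. The toy stand-in model, the L1 loss, and the Adam optimizer are illustrative assumptions; the method only states that the model to be trained is updated based on the HDR image it produces.

```python
# Illustrative training step for a model that maps two LDR inputs to an HDR prediction.
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               ldr1: torch.Tensor, ldr2: torch.Tensor, hdr_gt: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    hdr_pred = model(ldr1, ldr2)                        # forward pass of the model to be trained
    loss = nn.functional.l1_loss(hdr_pred, hdr_gt)      # loss choice is an assumption
    loss.backward()
    optimizer.step()
    return loss.item()

class ToyFusionModel(nn.Module):
    """Stand-in for the model to be trained: takes two LDR images, returns an image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, ldr1: torch.Tensor, ldr2: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([ldr1, ldr2], dim=1))

model = ToyFusionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss = train_step(model, opt, torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(f"loss: {loss:.4f}")
```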
The target model trained by the above method has an image processing function (e.g., a function of fusing a plurality of LDR images into an HDR image, etc.). Specifically, when the HDR image of the target object needs to be acquired, the first LDR image of the target object and the second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the correspondence as a guide, so that a higher-quality fusion can be realized, and the finally obtained HDR image of the target object is free from artifacts.
In one possible implementation, the model to be trained is used for: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, a model to be trained, for: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation, the model to be trained is used for: and processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A third aspect of the embodiments of the present application provides an image processing method, including: acquiring a first noisy image of a target object and a second noisy image of the target object, wherein the first noisy image and the second noisy image are images obtained by shooting the target object based on different exposure degrees; acquiring a one-to-one correspondence between a plurality of first image blocks of the first noisy image and a plurality of second image blocks of the second noisy image based on the first noisy image and the second noisy image; and fusing the first noisy image and the second noisy image based on the corresponding relation to obtain a denoising image of the target object.
In one possible implementation, based on the first noisy image and the second noisy image, obtaining correspondence between a plurality of first image blocks of the first noisy image and a plurality of second image blocks of the second noisy image includes: acquiring similarity between a plurality of first image blocks of the first noisy image and a plurality of second image blocks of the second noisy image based on the first noisy image and the second noisy image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, based on the first noisy image and the second noisy image, obtaining the similarity between the plurality of first image blocks of the first noisy image and the plurality of second image blocks of the second noisy image comprises: extracting features of the first noisy image and the second noisy image to obtain first features of a plurality of first image blocks of the first noisy image and second features of a plurality of second image blocks of the second noisy image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation manner, the plurality of first image blocks include a third image block, and based on the similarity, obtaining a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks includes: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, fusing the first noisy image and the second noisy image based on the correspondence to obtain a denoising image of the target object includes: extracting features of the first noisy image and the second noisy image to obtain third features of a plurality of first image blocks of the first noisy image and fourth features of a plurality of second image blocks of the second noisy image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a denoising image of the target object.
In one possible implementation manner, processing the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the sorting is adjusted to obtain the denoising image of the target object includes: and processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a denoising image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A fourth aspect of the embodiments of the present application provides an image processing method, including: acquiring a first low-resolution image of a target object and a second low-resolution image of the target object, wherein the first low-resolution image and the second low-resolution image are images obtained by shooting the target object based on different exposure degrees; acquiring a one-to-one correspondence between a plurality of first image blocks of a first low resolution image and a plurality of second image blocks of a second low resolution image based on the first low resolution image and the second low resolution image; and fusing the first low-resolution image and the second low-resolution image based on the corresponding relation to obtain a high-resolution image of the target object.
In one possible implementation, based on the first low resolution image and the second low resolution image, acquiring correspondence between a plurality of first image blocks of the first low resolution image and a plurality of second image blocks of the second low resolution image includes: acquiring similarities between a plurality of first image blocks of the first low resolution image and a plurality of second image blocks of the second low resolution image based on the first low resolution image and the second low resolution image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, based on the first low resolution image and the second low resolution image, obtaining the similarity between the plurality of first image blocks of the first low resolution image and the plurality of second image blocks of the second low resolution image comprises: extracting features of the first low-resolution image and the second low-resolution image to obtain first features of a plurality of first image blocks of the first low-resolution image and second features of a plurality of second image blocks of the second low-resolution image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation manner, the plurality of first image blocks include a third image block, and based on the similarity, obtaining a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks includes: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, fusing the first low-resolution image and the second low-resolution image based on the correspondence to obtain a high-resolution image of the target object includes: extracting features of the first low-resolution image and the second low-resolution image to obtain third features of a plurality of first image blocks of the first low-resolution image and fourth features of a plurality of second image blocks of the second low-resolution image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a high-resolution image of the target object.
In one possible implementation, processing the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering is adjusted to obtain the high resolution image of the target object includes: and processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a high-resolution image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A fifth aspect of embodiments of the present application provides an image processing apparatus including a target model, the apparatus including: the first acquisition module is used for acquiring a first LDR image of the target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees; the second acquisition module is used for acquiring a one-to-one correspondence relationship between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; and the fusion module is used for fusing the first LDR image and the second LDR image based on the corresponding relation to obtain an HDR image of the target object.
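Structurally, the fifth-aspect apparatus can be read as three modules wired in sequence; the sketch below only illustrates that wiring, with placeholder callables standing in for the capture pipeline and the target model described above.

```python
# Structural sketch only: the three modules of the apparatus composed in sequence.
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class ImageProcessingApparatus:
    acquire_ldr_pair: Callable[[], Tuple[Any, Any]]  # first acquisition module (two LDR captures)
    match_blocks: Callable[[Any, Any], Any]          # second acquisition module (block correspondence)
    fuse: Callable[[Any, Any, Any], Any]             # fusion module (correspondence-guided fusion)

    def run(self) -> Any:
        ldr1, ldr2 = self.acquire_ldr_pair()
        correspondence = self.match_blocks(ldr1, ldr2)
        return self.fuse(ldr1, ldr2, correspondence)
```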
From the above device, it can be seen that: when the HDR image of the target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the correspondence as a guide, so that a higher-quality fusion can be realized, and the finally obtained HDR image of the target object is free from artifacts.
In one possible implementation manner, the second obtaining module is configured to: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation manner, the second obtaining module is configured to: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, and the second obtaining module is configured to: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation manner, the fusion module is configured to perform feature extraction on the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation manner, the fusion module is configured to process the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the adjustment and sequencing, to obtain an HDR image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A sixth aspect of embodiments of the present application provides a model training apparatus, the apparatus comprising: the acquisition module is used for acquiring a first LDR image of the target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees; the processing module is used for processing the first LDR image and the second LDR image through a model to be trained to obtain a high dynamic range HDR image of the target object, wherein the model to be trained is used for: acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the corresponding relation, fusing the first LDR image and the second LDR image to obtain an HDR image of the target object; the training module is used for training the model to be trained based on the HDR image to obtain a target model.
The target model trained by the above device has an image processing function (e.g., a function of fusing a plurality of LDR images into an HDR image, etc.). Specifically, when the HDR image of the target object needs to be acquired, the first LDR image of the target object and the second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the correspondence as a guide, so that a higher-quality fusion can be realized, and the finally obtained HDR image of the target object is free from artifacts.
In one possible implementation, the model to be trained is used for: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, a model to be trained, for: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation, the model to be trained is configured to process the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering is adjusted, to obtain an HDR image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A seventh aspect of the embodiments of the present application provides an image processing apparatus including a target model, the apparatus including: the first acquisition module is used for acquiring a first noisy image of the target object and a second noisy image of the target object, wherein the first noisy image and the second noisy image are images obtained by shooting the target object based on different exposure degrees; the second acquisition module is used for acquiring a one-to-one correspondence relationship between a plurality of first image blocks of the first noisy image and a plurality of second image blocks of the second noisy image based on the first noisy image and the second noisy image; and the fusion module is used for fusing the first noisy image and the second noisy image based on the corresponding relation to obtain a denoising image of the target object.
In one possible implementation manner, the second obtaining module is configured to: acquiring similarity between a plurality of first image blocks of the first noisy image and a plurality of second image blocks of the second noisy image based on the first noisy image and the second noisy image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation manner, the second obtaining module is configured to: extracting features of the first noisy image and the second noisy image to obtain first features of a plurality of first image blocks of the first noisy image and second features of a plurality of second image blocks of the second noisy image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, and the second obtaining module is configured to: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation manner, the fusion module is configured to perform feature extraction on the first noisy image and the second noisy image to obtain third features of a plurality of first image blocks of the first noisy image and fourth features of a plurality of second image blocks of the second noisy image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a denoising image of the target object.
In one possible implementation manner, the fusion module is configured to process the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the adjustment and sequencing, so as to obtain a denoising image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
An eighth aspect of the embodiments of the present application provides an image processing apparatus including a target model, the apparatus including: the first acquisition module is used for acquiring a first low-resolution image of the target object and a second low-resolution image of the target object, wherein the first low-resolution image and the second low-resolution image are images obtained by shooting the target object based on different exposure degrees; a second acquisition module configured to acquire a one-to-one correspondence relationship between a plurality of first image blocks of the first low resolution image and a plurality of second image blocks of the second low resolution image based on the first low resolution image and the second low resolution image; and the fusion module is used for fusing the first low-resolution image and the second low-resolution image based on the corresponding relation to obtain a high-resolution image of the target object.
In one possible implementation manner, the second obtaining module is configured to: acquiring similarities between a plurality of first image blocks of the first low resolution image and a plurality of second image blocks of the second low resolution image based on the first low resolution image and the second low resolution image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation manner, the second obtaining module is configured to: extracting features of the first low-resolution image and the second low-resolution image to obtain first features of a plurality of first image blocks of the first low-resolution image and second features of a plurality of second image blocks of the second low-resolution image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, and the second obtaining module is configured to: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation manner, the fusion module is configured to perform feature extraction on the first low-resolution image and the second low-resolution image to obtain third features of a plurality of first image blocks of the first low-resolution image and fourth features of a plurality of second image blocks of the second low-resolution image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain a high-resolution image of the target object.
In one possible implementation manner, the fusion module is configured to process the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the adjustment and sequencing, so as to obtain a high-resolution image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
A ninth aspect of the embodiments of the present application provides an image processing apparatus, the apparatus including a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the image processing apparatus performs the method as described in the first aspect, any one of the possible implementations of the first aspect, the third aspect, any one of the possible implementations of the third aspect, the fourth aspect, or any one of the possible implementations of the fourth aspect.
A tenth aspect of embodiments of the present application provides a model training apparatus, the apparatus comprising a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the model training apparatus performs the method as described in the second aspect or any one of the possible implementations of the second aspect.
An eleventh aspect of the embodiments of the present application provides a circuit system, the circuit system comprising a processing circuit configured to perform the method according to any one of the first aspect, any one of the possible implementations of the first aspect, the second aspect, any one of the possible implementations of the second aspect, the third aspect, any one of the possible implementations of the third aspect, the fourth aspect, or any one of the possible implementations of the fourth aspect.
A twelfth aspect of the embodiments of the present application provides a chip system, which includes a processor, configured to invoke a computer program or a computer instruction stored in a memory, to cause the processor to perform a method according to any one of the first aspect, any one of the possible implementations of the first aspect, the second aspect, any one of the possible implementations of the second aspect, the third aspect, any one of the possible implementations of the third aspect, the fourth aspect, or any one of the possible implementations of the fourth aspect.
In one possible implementation, the processor is coupled to the memory through an interface.
In one possible implementation, the chip system further includes a memory having a computer program or computer instructions stored therein.
A thirteenth aspect of the embodiments of the present application provides a computer storage medium storing a computer program, which when executed by a computer causes the computer to implement a method as in the first aspect, any one of the possible implementations of the first aspect, the second aspect, any one of the possible implementations of the second aspect, the third aspect, any one of the possible implementations of the third aspect, the fourth aspect, or any one of the possible implementations of the fourth aspect.
A fourteenth aspect of the embodiments of the present application provides a computer program product storing instructions that, when executed by a computer, cause the computer to implement a method as in the first aspect, any one of the possible implementations of the first aspect, the second aspect, any one of the possible implementations of the second aspect, the third aspect, any one of the possible implementations of the third aspect, the fourth aspect, or any one of the possible implementations of the fourth aspect.
In this embodiment of the present application, when an HDR image of a target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into a target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image is equivalent to aligning the plurality of first image blocks and the plurality of second image blocks one by one in content. The fusion between the first LDR image and the second LDR image is then performed with this correspondence as a guide, so that a higher-quality fusion can be achieved and the finally obtained HDR image of the target object is free from artifacts.
Drawings
FIG. 1 is a schematic diagram of a structure of an artificial intelligence main body frame;
FIG. 2a is a schematic diagram of an image processing system according to an embodiment of the present disclosure;
FIG. 2b is a schematic diagram of another architecture of an image processing system according to an embodiment of the present disclosure;
FIG. 2c is a schematic diagram of an apparatus for image processing according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a system 100 architecture according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a target model according to an embodiment of the present disclosure;
fig. 5 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a block search network according to an embodiment of the present application;
fig. 7 is another schematic structural diagram of a block search network according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a fused transformer network according to an embodiment of the present application;
fig. 9 is another schematic structural diagram of a fused transformer network according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a partially reconstructed transformer network according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a self-attention mechanism module or an interactive attention mechanism module according to an embodiment of the present disclosure;
Fig. 12 is a schematic structural diagram of a transformer module according to an embodiment of the present disclosure;
FIG. 13a is a schematic diagram of the comparison result provided in the embodiment of the present application;
FIG. 13b is another schematic diagram of the comparison result provided in the embodiment of the present application;
FIG. 14 is another schematic diagram of the comparison result provided in the embodiment of the present application;
FIG. 15 is another schematic diagram of the comparison result provided in the embodiment of the present application;
FIG. 16 is a schematic flow chart of a model training method according to an embodiment of the present disclosure;
fig. 17 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of a model training device according to an embodiment of the present disclosure;
fig. 19 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 20 is a schematic structural view of a training device according to an embodiment of the present disclosure;
fig. 21 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides an image processing method and related equipment thereof, which can enable a plurality of LDR images to realize higher-quality fusion, so that the finally obtained HDR image has no artifact.
The terms "first", "second" and the like in the description, the claims, and the above-described figures of the present application are used for distinguishing between similar objects and are not necessarily used for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances; they merely distinguish objects of the same nature in describing the embodiments of the application. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
HDR imaging has received increasing attention as a key issue in computer vision applications. With the rapid development of AI technology, more and more device manufacturers build neural network models in AI technology into devices to acquire high-quality HDR images through the models.
In the related art, a target object in a certain scene may be photographed by using a plurality of different exposure rates, so as to collect a plurality of LDR images of the target object. The plurality of LDR images may then be input into a neural network model to fuse the plurality of LDR images through the neural network model to obtain an HDR image of the target object. For example, three LDR images of the same scene may be acquired first, and the three LDR images are photographed with three exposure degrees, and after the three LDR images are processed by the neural network model, an HDR image with optimized image indexes such as color, brightness, contrast, and the like may be obtained.
In the above process, the target object may move in the scene while the plurality of LDR images are being captured, so that there are differences between the contents presented by the captured LDR images. As a result, the HDR image obtained by the neural network model directly fusing these LDR images is prone to artifacts.
Further, in order to suppress the generation of artifacts, other related technologies generally make the neural network model ignore some information in the images during the fusion of the multiple LDR images. The fused HDR image then does not contain artifacts, but its details are often not good enough, that is, the quality of the HDR image is not high.
To solve the above-described problems, embodiments of the present application provide an image processing method that can be implemented in combination with artificial intelligence (artificial intelligence, AI) technology. AI technology is a technical discipline that uses digital computers or digital-computer-controlled machines to simulate, extend and expand human intelligence; it obtains optimal results by sensing the environment, acquiring knowledge and using knowledge. In other words, artificial intelligence technology is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Data processing using artificial intelligence is a common application of artificial intelligence.
First, the overall workflow of the artificial intelligence system will be described. Referring to fig. 1, fig. 1 is a schematic structural diagram of an artificial intelligence main body framework, and the framework is described below in terms of two dimensions, namely the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing. For example, it may include the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data-information-knowledge-wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure of
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform. Communicating with the outside through the sensor; the computing power is provided by a smart chip (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform comprises a distributed computing framework, a network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection and interworking networks and the like. For example, the sensor and external communication obtains data that is provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
The intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical deployment. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.
Next, several application scenarios of the present application are described.
Fig. 2a is a schematic structural diagram of an image processing system according to an embodiment of the present application, where the image processing system includes a user device and a data processing device. The user equipment comprises intelligent terminals such as a mobile phone, a personal computer or an information processing center. The user device is an initiating terminal of image processing, and is used as an initiating terminal of an image processing request, and the user typically initiates the request through the user device.
The data processing device may be a device or a server having a data processing function, such as a cloud server, a web server, an application server, or a management server. The data processing device receives an image processing request from the intelligent terminal through an interactive interface, and then performs image processing by means of machine learning, deep learning, searching, reasoning, decision making and the like, using a memory for storing data and a processor for data processing. The memory in the data processing device may be a general term that includes a database storing historical data, and the database may be located on the data processing device or on another network server.
In the image processing system shown in fig. 2a, the user device may receive an instruction of a user, for example, the user device may acquire a plurality of images input/selected by the user, and then initiate a request to the data processing device, so that the data processing device executes an image fusion application for the plurality of images acquired by the user device, thereby acquiring corresponding fusion results for the plurality of images. For example, the user device may acquire a plurality of LDR images input by the user, and then initiate an image fusion request to the data processing device, so that the data processing device performs a series of processes on the plurality of LDR images based on the image fusion request, thereby obtaining processing results of the plurality of LDR images, that is, an HDR image obtained based on the fusion of the plurality of LDR images.
In fig. 2a, the data processing device may perform the image processing method of the embodiment of the present application.
Fig. 2b is another schematic structural diagram of an image processing system provided in the embodiment of the present application, in fig. 2b, a user device directly serves as a data processing device, and the user device can directly obtain an input from a user and directly process the input by hardware of the user device, and a specific process is similar to that of fig. 2a, and reference is made to the above description and will not be repeated here.
In the image processing system shown in fig. 2b, the user device may receive an instruction from a user, for example, the user device may acquire a plurality of LDR images input by the user, and then perform a series of processing on the plurality of LDR images, so as to obtain a processing result of the plurality of LDR images, that is, an HDR image obtained based on fusion of the plurality of LDR images.
In fig. 2b, the user equipment itself may perform the image processing method according to the embodiment of the present application.
Fig. 2c is a schematic diagram of an apparatus related to image processing according to an embodiment of the present application.
The user device in fig. 2a and 2b may be the local device 301 or the local device 302 in fig. 2c, and the data processing device in fig. 2a may be the executing device 210 in fig. 2c, where the data storage system 250 may store data to be processed of the executing device 210, and the data storage system 250 may be integrated on the executing device 210, or may be disposed on a cloud or other network server.
The processors in fig. 2a and 2b may perform data training/machine learning/deep learning through a neural network model or other models (e.g., a model based on a support vector machine), and perform image processing application on the image using the model obtained by the data final training or learning, thereby obtaining corresponding processing results.
Fig. 3 is a schematic diagram of a system 100 architecture provided in an embodiment of the present application, in fig. 3, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140, where the input data may include in an embodiment of the present application: each task to be scheduled, callable resources, and other parameters.
In the process of preprocessing input data by the execution device 110, or performing relevant processing (such as performing functional implementation of a neural network in the present application) such as calculation by the calculation module 111 of the execution device 110, the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing, or may store data, instructions, etc. obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule for different targets or different tasks, where the corresponding target model/rule may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result. Wherein the training data may be stored in database 130 and derived from training samples collected by data collection device 160.
In the case shown in FIG. 3, the user may manually give input data, which may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the positional relationship among the devices, apparatuses, modules, etc. shown in the figure does not constitute any limitation. For example, in fig. 3, the data storage system 150 is an external memory with respect to the execution device 110; in other cases, the data storage system 150 may also be disposed in the execution device 110. As shown in fig. 3, the neural network may be obtained by training with the training device 120.
The embodiment of the application also provides a chip, which comprises the NPU. The chip may be provided in an execution device 110 as shown in fig. 3 for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 3 to complete the training work of the training device 120 and output the target model/rule.
The neural network processor NPU is mounted as a coprocessor to a main central processing unit (central processing unit, CPU) (host CPU) which distributes tasks. The core part of the NPU is an operation circuit, and the controller controls the operation circuit to extract data in a memory (a weight memory or an input memory) and perform operation.
In some implementations, the arithmetic circuitry includes a plurality of processing units (PEs) internally. In some implementations, the operational circuit is a two-dimensional systolic array. The arithmetic circuitry may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operational circuitry is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit takes the data corresponding to the matrix B from the weight memory and caches the data on each PE in the arithmetic circuit. The operation circuit takes the matrix A data and the matrix B from the input memory to perform matrix operation, and the obtained partial result or the final result of the matrix is stored in an accumulator (accumulator).
The vector calculation unit may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, etc. For example, the vector computation unit may be used for network computation of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit can store the vector of processed outputs to a unified buffer. For example, the vector calculation unit may apply a nonlinear function to an output of the arithmetic circuit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to an arithmetic circuit, for example for use in subsequent layers in a neural network.
The unified memory is used for storing input data and output data.
A storage unit access controller (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory and/or the unified memory, stores the weight data in the external memory into the weight memory, and stores the data in the unified memory into the external memory.
And a bus interface unit (bus interface unit, BIU) for implementing interaction among the main CPU, the DMAC and the instruction fetch memory through a bus.
The instruction fetching memory (instruction fetch buffer) is connected with the controller and used for storing instructions used by the controller;
The controller is configured to invoke the instructions cached in the instruction fetch memory, so as to control the working process of the operation accelerator.
Typically, the unified memory, the input memory, the weight memory, and the instruction fetch memory are all On-Chip memories, and the external memory is a memory external to the NPU, which may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
Since the embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units, which may refer to an arithmetic unit having xs and intercept 1 as inputs, and the output of the arithmetic unit may be:
h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b)
where s = 1, 2, …, n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function (activation function) of the neural unit, which is used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together many of the above-described single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
The operation of each layer in a neural network can be described by the mathematical expression y = a(Wx + b). From a physical perspective, the operation of each layer can be understood as completing the transformation from the input space to the output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. dimension raising/reduction; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are completed by Wx, operation 4 is completed by +b, and operation 5 is completed by a(). The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the collection of all individuals of this class of things. W is a weight vector, and each value in the vector represents the weight value of one neuron in this layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network (a weight matrix formed by the vectors W of the plurality of layers). Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
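By way of a non-limiting illustration only (not part of the claimed embodiment), the per-layer operation y = a(Wx + b) described above can be sketched as follows; the array sizes and the choice of the sigmoid function as the activation a are assumptions made purely for the example.

```python
import numpy as np

def sigmoid(x):
    # Activation function a: introduces the nonlinear "bending" described above.
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(x, W, b):
    # One layer of a neural network: y = a(Wx + b).
    # Wx scales/rotates the input space, +b translates it, a() bends it.
    return sigmoid(W @ x + b)

# Toy example: a layer mapping a 3-dimensional input to a 2-dimensional output.
x = np.array([0.5, -1.0, 2.0])
W = np.random.randn(2, 3)   # weight matrix of this layer
b = np.random.randn(2)      # bias of this layer
y = layer_forward(x, W, b)
```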
Since it is desirable that the output of the neural network be as close as possible to the value that is actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and then adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the actually desired target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or the objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function is, the larger the difference is, so the training of the neural network becomes the process of reducing this loss as much as possible.
(2) Back propagation algorithm
The neural network can adopt the back propagation (back propagation, BP) algorithm to correct the parameters of the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters of the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a backward propagation process dominated by the error loss, and aims to obtain the parameters of the optimal neural network model, such as the weight matrices.
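As a purely illustrative sketch of the loss-driven update and back propagation described above (the network shape, the mean-squared-error loss and the learning rate are assumptions, not part of the embodiment):

```python
import torch
import torch.nn as nn

# A toy two-layer network and one gradient-descent step driven by a loss function.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(16, 4)              # a batch of inputs
target = torch.randn(16, 1)         # the actually desired target values

prediction = model(x)               # forward propagation of the input signal
loss = loss_fn(prediction, target)  # difference between prediction and target
loss.backward()                     # back propagation of the error loss
optimizer.step()                    # update the weight matrices to reduce the loss
optimizer.zero_grad()
```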
The method provided in the present application is described below from the training side of the neural network and the application side of the neural network.
The model training method provided by the embodiment of the application relates to processing of a data sequence, and can be particularly applied to methods such as data training, machine learning, deep learning and the like, and the training data (for example, a first LDR image of a target object and a second LDR image of the target object in the model training method provided by the embodiment of the application) are subjected to symbolizing and formalizing intelligent information modeling, extracting, preprocessing, training and the like, so that a trained neural network (for example, a target model in the model training method provided by the embodiment of the application) is finally obtained; in addition, the image processing method provided in the embodiment of the present application may use the trained neural network to input data (for example, the first LDR image of the target object and the second LDR image of the target object in the image processing method provided in the embodiment of the present application) into the trained neural network, so as to obtain output data (for example, the HDR image of the target object in the image processing method provided in the embodiment of the present application). It should be noted that, the model training method and the image processing method provided in the embodiments of the present application are inventions based on the same concept, and may be understood as two parts in a system or two stages of an overall process: such as a model training phase and a model application phase.
The image processing method provided by the embodiment of the application can be realized through a target model, and the structure of the target model is briefly described below. Fig. 4 is a schematic structural diagram of a target model provided in this embodiment of the present application. As shown in fig. 4, the target model includes a block search network based on semantic similarity, a fusion transformer network based on self-attention and interactive attention mechanisms, and a partial reconstruction transformer network, where the input end of the block search network and the first input end of the fusion transformer network serve as the input end of the whole target model, the output end of the block search network is connected with the second input end of the fusion transformer network, the output end of the fusion transformer network is connected with the input end of the partial reconstruction transformer network, and the output end of the partial reconstruction transformer network serves as the output end of the whole target model. To understand the workflow of the target model shown in fig. 4, the workflow will be described with reference to fig. 5. Fig. 5 is a schematic flow chart of an image processing method according to an embodiment of the present application. As shown in fig. 5, the method includes:
501. and acquiring a first LDR image of the target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees.
In this embodiment, when the HDR image of the target object needs to be acquired, the target object may be photographed with different exposure rates, so as to acquire a first LDR image of the target object (may also be referred to as a reference image of the target object) and a second LDR image of the target object (may also be referred to as a support image of the target object). It should be noted that, the target object may refer to a certain object in a certain scene, or may refer to an area containing a certain object in a certain scene, for example, in a scene of a park, the target object may refer to a boy in the park, or may refer to a lawn where the boy is located in the park, or the like.
It is noted that the number of second LDR images may be one or a plurality. If a plurality of second LDR images are acquired, the plurality of second LDR images are captured using a plurality of exposure rates (the plurality of exposure rates are different from each other). For any one of the plurality of second LDR images, the exposure rate used to capture the second LDR image may be greater than or less than the exposure rate used to capture the first LDR image.
After obtaining the first LDR image of the target object and the second LDR image of the target object, the first LDR image of the target object and the second LDR image of the target object may be input to the target model, so that the target model performs a series of processing on the first LDR image of the target object and the second LDR image of the target object, thereby obtaining the HDR image of the target object.
502. Based on the first LDR image and the second LDR image, a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image is obtained.
After receiving the first LDR image of the target object and the second LDR image of the target object, the target model may perform block matching on a plurality of first image blocks of the first LDR image of the target object and a plurality of second image blocks of the second LDR image of the target object, thereby constructing a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image.
Specifically, the object model may acquire one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks by:
(1) After receiving the first LDR image of the target object and the second LDR image of the target object, the block search network of the target model may perform a series of processing on the first LDR image and the second LDR image, thereby obtaining similarities between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image.
(2) After obtaining the similarities between the plurality of first image blocks and the plurality of second image blocks, the block search network may construct a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks using the similarities between the plurality of first image blocks and the plurality of second image blocks.
More specifically, as shown in fig. 6 (fig. 6 is a schematic structural diagram of a block search network provided in an embodiment of the present application), the block search network includes a feature extraction module and a block search module. Then, the block search network may obtain the similarity between the plurality of first image blocks and the plurality of second image blocks by:
(1.1) after receiving the first LDR image of the target object and the second LDR image of the target object, a feature extraction module of the block search network (the module may include a classification backbone network, for example, a GhostNet, VGG network, etc.) may cooperate with the block search module to perform feature extraction on the first LDR image and the second LDR image respectively, so as to obtain first features of a plurality of first image blocks (which may also be referred to as semantic features of a plurality of first image blocks, etc.) of the first LDR image and second features of a plurality of second image blocks of the second LDR image (which may also be referred to as semantic features of a plurality of second image blocks, etc.).
For example, as shown in fig. 7 (fig. 7 is another schematic diagram of a block search network provided in the embodiment of the present application), three LDR images are acquired, and the three LDR images are respectively a support image 1, a reference image, and a support image 2, which are respectively obtained by shooting with exposure rate 1, exposure rate 2 and exposure rate 3, where exposure rate 1 > exposure rate 2 > exposure rate 3. After the support image 1, the reference image, and the support image 2 are input to the block search network, the feature extraction module may first extract the overall semantic features of the support image 1, the overall semantic features of the support image 1 having a size of c (channel) × h (height) × w (width), and send the overall semantic features of the support image 1 to the block search module. Likewise, the feature extraction module may also extract the overall semantic features of the reference image, where the overall semantic features of the reference image have a size of c×h×w, and send the overall semantic features of the reference image to the block search module. Likewise, the feature extraction module may also extract the overall semantic features of the support image 2, where the size of the overall semantic features of the support image 2 is c×h×w, and send the overall semantic features of the support image 2 to the block search module.
It will be appreciated that the support image 1 is composed of N² support image blocks 1 (N may be 8 or 16, etc.), the reference image is composed of N² reference image blocks, and the support image 2 is composed of N² support image blocks 2. Accordingly, the overall semantic features of the support image 1 are composed of the semantic features of the N² support image blocks 1, the overall semantic features of the reference image are composed of the semantic features of the N² reference image blocks, and the overall semantic features of the support image 2 are composed of the semantic features of the N² support image blocks 2.
Then, the block search module may split the overall semantic features of the support image 1 into the semantic features of the N² support image blocks 1, where the size of the semantic features of each support image block 1 is c×(H/N)×(W/N). Likewise, the block search module may also split the overall semantic features of the reference image into the semantic features of the N² reference image blocks, where the size of the semantic features of each reference image block is c×(H/N)×(W/N). Likewise, the block search module may also split the overall semantic features of the support image 2 into the semantic features of the N² support image blocks 2, where the size of the semantic features of each support image block 2 is c×(H/N)×(W/N).
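For illustration only, the splitting of an overall feature of size c×H×W into N² block features of size c×(H/N)×(W/N) can be sketched as follows (the tensor library and the concrete sizes are assumptions):

```python
import torch

def split_into_blocks(feature, n):
    # feature: overall feature of one image, shape (C, H, W).
    # Returns a tensor of shape (n*n, C, H//n, W//n): one feature per image block.
    c, h, w = feature.shape
    blocks = feature.reshape(c, n, h // n, n, w // n)   # (C, n, H/n, n, W/n)
    blocks = blocks.permute(1, 3, 0, 2, 4)              # (n, n, C, H/n, W/n)
    return blocks.reshape(n * n, c, h // n, w // n)

# e.g. c=64, H=W=128, N=8 gives 64 block features of size 64 x 16 x 16.
feature = torch.randn(64, 128, 128)
block_features = split_into_blocks(feature, n=8)
```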
(1.2) after obtaining the first features of the plurality of first image blocks and the second features of the plurality of second image blocks, the block search module may further calculate the first features of the plurality of first image blocks and the second features of the plurality of second image blocks, so as to obtain similarities between the plurality of first image blocks and the plurality of second image blocks (which may also be referred to as cosine similarities between the plurality of first image blocks and the plurality of second image blocks, etc.).
Still as in the example above, the block search module may perform cosine similarity calculation between the semantic features of the N² support image blocks 1 and the semantic features of the N² reference image blocks, thereby obtaining a similarity matrix 1 with N² rows and N² columns, where the similarity matrix 1 contains the similarities between the N² support image blocks 1 and the N² reference image blocks. Likewise, the block search module may also perform cosine similarity calculation between the semantic features of the N² support image blocks 2 and the semantic features of the N² reference image blocks, thereby obtaining a similarity matrix 2 with N² rows and N² columns, where the similarity matrix 2 contains the similarities between the N² support image blocks 2 and the N² reference image blocks.
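A minimal sketch of the cosine similarity calculation between block features, under the assumption that each block feature is simply flattened into a vector before the comparison:

```python
import torch
import torch.nn.functional as F

def block_cosine_similarity(ref_block_feats, sup_block_feats):
    # ref_block_feats, sup_block_feats: (N*N, C, H/N, W/N) block features of the
    # reference image and of one support image.
    ref = F.normalize(ref_block_feats.flatten(1), dim=1)   # (N*N, C*(H/N)*(W/N))
    sup = F.normalize(sup_block_feats.flatten(1), dim=1)
    # Entry (i, j) is the cosine similarity between the i-th reference image block
    # and the j-th support image block: an (N*N) x (N*N) similarity matrix.
    return ref @ sup.t()
```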
More specifically, the block search network may obtain one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks by:
(2.1) for convenience of explanation, any one of the plurality of first image blocks (i.e., the aforementioned third image block) will be described below. The block search module may select, based on the similarity between the first image block and the plurality of second image blocks, a second image block with the greatest similarity among the plurality of second image blocks as a second image block (i.e., the aforementioned fourth image block) that is most similar to the first image block.
(2.2) The block search module may then construct a correspondence between this first image block and the second image block most similar to it. In addition, the block search module may perform the same operation on the remaining first image blocks other than this first image block, so that a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks can finally be obtained, and this correspondence may be sent to the fusion transformer network.
Still as in the example above, in the similarity matrix 1, row 1 contains the similarities between the 1st reference image block and the N² support image blocks 1, so the block search module may extract the maximum similarity in row 1 as the correspondence between the 1st reference image block and the support image block 1 most similar to the 1st reference image block. By analogy, the block search module may further extract, in row 2, the correspondence between the 2nd reference image block and the support image block 1 most similar to the 2nd reference image block, and so on, until, in the N²-th row, the correspondence between the N²-th reference image block and the support image block 1 most similar to the N²-th reference image block is extracted.
Similarly, in the similarity matrix 2, row 1 contains the similarities between the 1st reference image block and the N² support image blocks 2, so the block search module may extract the maximum similarity in row 1 as the correspondence between the 1st reference image block and the support image block 2 most similar to the 1st reference image block. By analogy, the block search module may also extract, in row 2, the correspondence between the 2nd reference image block and the support image block 2 most similar to the 2nd reference image block, and so on, until, in the N²-th row, the correspondence between the N²-th reference image block and the support image block 2 most similar to the N²-th reference image block is extracted.
Thus, the block search module can obtain the one-to-one correspondence between the N² reference image blocks and the N² support image blocks 1, and the one-to-one correspondence between the N² reference image blocks and the N² support image blocks 2. In addition, through a reshape operation, the block search module may present the one-to-one correspondence between the N² reference image blocks and the N² support image blocks 1 as a similarity matrix 3 with N rows and N columns, present the one-to-one correspondence between the N² reference image blocks and the N² support image blocks 2 as a similarity matrix 4 with N rows and N columns, and send them to the fusion transformer network.
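One possible way (shown only as an assumption-laden sketch) to turn a similarity matrix into the one-to-one correspondence and the N×N presentation described above is a row-wise argmax followed by a reshape:

```python
import torch

def block_correspondence(similarity_matrix, n):
    # similarity_matrix: (N*N, N*N); row i holds the similarities between the
    # i-th reference image block and all N*N support image blocks.
    best_match = similarity_matrix.argmax(dim=1)   # index of the most similar support block
    # The flat mapping can also be presented as an N x N grid (the reshape step above).
    return best_match, best_match.reshape(n, n)
```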
503. And fusing the first LDR image and the second LDR image based on the corresponding relation to obtain an HDR image of the target object.
After the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained, the target model can fuse the first LDR image and the second LDR image by utilizing the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks, so that an HDR image of the target object is obtained and externally output.
Specifically, as shown in fig. 8 (fig. 8 is a schematic structural diagram of a fusion transformer network provided in an embodiment of the present application), a fusion transformer network of a target model includes a feature extraction module, a block alignment module, a self-attention mechanism module, an interactive attention mechanism module, and a stitching module. Then, the target model may acquire an HDR image of the target object by:
(1) Upon receiving the first LDR image and the second LDR image, the feature extraction module of the fusion transformer network (e.g., a convolutional network, etc.) may perform feature extraction on the first LDR image and the second LDR image, thereby obtaining third features of the plurality of first image blocks of the first LDR image (which may also be referred to as depth features of the plurality of first image blocks, etc.) and fourth features of the plurality of second image blocks of the second LDR image (which may also be referred to as depth features of the plurality of second image blocks, etc.).
As still another example, as shown in fig. 9 (fig. 9 is another schematic structural diagram of the fused transformer network provided in the embodiment of the present application), after the supporting image 1, the reference image and the supporting image 2 are input into the fused transformer network, the feature extraction module may first extract the overall depth feature of the supporting image 1, where the overall depth feature of the supporting image 1 has a size of c×h×w, and send the overall depth feature of the supporting image 1 to the block alignment module. Likewise, the feature extraction module may also extract the overall depth feature of the reference image, where the overall depth feature of the reference image is c×h×w in size, and send the overall depth feature of the reference image to the block alignment module. Likewise, the feature extraction module may also extract the overall depth feature of the support image 2, where the overall depth feature of the support image 2 has a size of c×h×w, and send the overall depth feature of the support image 2 to the block alignment module.
It will be appreciated that the overall depth features of the support image 1 are composed of the depth features of the N² support image blocks 1, the overall depth features of the reference image are composed of the depth features of the N² reference image blocks, and the overall depth features of the support image 2 are composed of the depth features of the N² support image blocks 2.
(2) After the fourth features of the plurality of second image blocks and the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks are obtained, the block alignment module may adjust the ordering of the fourth features of the plurality of second image blocks based on the correspondence, because the correspondence indicates a new ordering of the fourth features of the plurality of second image blocks, thereby obtaining the fourth features of the plurality of second image blocks after the ordering adjustment. The block alignment module may then send the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering adjustment to the self-attention mechanism module and the interactive attention mechanism module.
Still as in the example above, after obtaining the overall depth features of the support image 1, the overall depth features of the support image 2, the similarity matrix 3, and the similarity matrix 4, the block alignment module proceeds as follows. In the overall depth features of the support image 1, the depth features of the N² support image blocks 1 are arranged in the original order (i.e., the original ordering of the N² support image blocks 1 in the support image 1), while the similarity matrix 3 indicates a new ordering of the depth features of the N² support image blocks 1, so the block alignment module can adjust the ordering of the depth features of the N² support image blocks 1 as indicated by the similarity matrix 3, thereby obtaining the ordering-adjusted overall depth features of the support image 1. Then, the block alignment module may split the ordering-adjusted overall depth features of the support image 1 into the depth features of the N² ordering-adjusted support image blocks 1, where the size of the depth features of each ordering-adjusted support image block 1 is c×(H/N)×(W/N).
Likewise, in the overall depth features of the support image 2, the depth features of the N² support image blocks 2 are arranged in the original order, while the similarity matrix 4 indicates a new ordering of the depth features of the N² support image blocks 2, so the block alignment module can adjust the ordering of the depth features of the N² support image blocks 2 as indicated by the similarity matrix 4, thereby obtaining the ordering-adjusted overall depth features of the support image 2. Then, the block alignment module may split the ordering-adjusted overall depth features of the support image 2 into the depth features of the N² ordering-adjusted support image blocks 2, where the size of the depth features of each ordering-adjusted support image block 2 is c×(H/N)×(W/N).
In addition, the block alignment module may also split the overall depth features of the support image 1 into the depth features of the N² support image blocks 1, where the size of the depth features of each support image block 1 is c×(H/N)×(W/N). Likewise, the block alignment module may also split the overall depth features of the reference image into the depth features of the N² reference image blocks, where the size of the depth features of each reference image block is c×(H/N)×(W/N). Likewise, the block alignment module may also split the overall depth features of the support image 2 into the depth features of the N² support image blocks 2, where the size of the depth features of each support image block 2 is c×(H/N)×(W/N).
Then, the block alignment module may send the depth features of the N² support image blocks 1, the depth features of the N² reference image blocks, the depth features of the N² support image blocks 2, the depth features of the N² ordering-adjusted support image blocks 1, and the depth features of the N² ordering-adjusted support image blocks 2 to the self-attention mechanism module and the interactive attention mechanism module.
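The ordering adjustment performed by the block alignment module can be sketched, under the assumption that the correspondence is available as the per-reference-block index produced above, as a simple gather over the support block depth features:

```python
import torch

def align_support_blocks(sup_depth_feats, best_match):
    # sup_depth_feats: (N*N, C, H/N, W/N) depth features of the support image blocks
    # in their original order.
    # best_match: (N*N,) index of the support image block corresponding to each
    # reference image block, as produced by the block search network.
    # Reordering by this index places the support block that matches the i-th
    # reference image block at position i, i.e. the "ordering-adjusted" features.
    return sup_depth_feats[best_match]
```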
(3) After the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering adjustment are obtained, the self-attention mechanism module and the interactive attention mechanism module can cooperate with the partial reconstruction transformer network to process these features, thereby obtaining an HDR image of the target object and outputting it externally.
More specifically, as shown in fig. 10 (fig. 10 is a schematic structural diagram of a partially reconstructed transformer network provided in an embodiment of the present application), the partially reconstructed transformer network of the target model includes a convolution module, a transformer module, an addition module, and an activation module. Then, the target model may acquire an HDR image of the target object by:
(3.1) After obtaining the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering adjustment, the self-attention mechanism module may perform a series of processing on the third features of the plurality of first image blocks and send the obtained processing result to the stitching module. The interactive attention mechanism module may perform a series of processing on the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks and send the obtained processing result to the stitching module; at the same time, the interactive attention mechanism module may also perform a series of processing on the third features of the plurality of first image blocks and the fourth features of the plurality of second image blocks after the ordering adjustment and send the obtained processing result to the stitching module. Then, the stitching module may stitch all the received processing results and send the obtained stitching result to the partial reconstruction transformer network.
Still as in the example above, the self-attention mechanism module may process the depth features of the N² reference image blocks to obtain a processing result 1. The interactive attention mechanism module 1 may process the depth features of the N² reference image blocks and the depth features of the N² support image blocks 1 to obtain a processing result 2. The interactive attention mechanism module 2 may process the depth features of the N² reference image blocks and the depth features of the N² support image blocks 2 to obtain a processing result 3. The interactive attention mechanism module 3 may process the depth features of the N² reference image blocks and the depth features of the N² ordering-adjusted support image blocks 1 to obtain a processing result 4. The interactive attention mechanism module 4 may process the depth features of the N² reference image blocks and the depth features of the N² ordering-adjusted support image blocks 2 to obtain a processing result 5. Then, the stitching module can stitch the processing results 1 to 5 to obtain a corresponding stitching result.
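For illustration, and assuming that each processing result is a feature map of the same spatial size, the stitching module could be realized as a concatenation along the channel dimension:

```python
import torch

# Illustrative shapes only: five processing results, each a (B, C, H, W) feature map.
results = [torch.randn(1, 64, 128, 128) for _ in range(5)]
# One plausible realization of the stitching module: channel-wise concatenation
# before the result is sent to the partial reconstruction transformer network.
stitching_result = torch.cat(results, dim=1)   # (1, 5*64, 128, 128)
```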
(3.2) In the local reconstruction transformer network, the splicing result is processed successively by the convolution module, the transformer module, the addition module, and the activation module, after which the HDR image of the target object can finally be obtained and output externally.
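The following sketch, again with assumed layer sizes and an arbitrarily chosen output activation, illustrates how a convolution module, a transformer module, an addition module and an activation module could be chained to turn the splicing result into an HDR output; it is a schematic rendering rather than the actual network of this application.

```python
import torch
import torch.nn as nn

class LocalReconstruction(nn.Module):
    def __init__(self, in_dim=320, dim=64, out_ch=3):
        super().__init__()
        self.conv_in = nn.Conv2d(in_dim, dim, 3, padding=1)              # convolution module
        self.transformer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                      batch_first=True)  # transformer module
        self.conv_out = nn.Conv2d(dim, out_ch, 3, padding=1)
        self.act = nn.Sigmoid()                                          # activation module

    def forward(self, fused):                    # fused: (B, in_dim, H, W) splicing result
        x = self.conv_in(fused)
        b, c, h, w = x.shape
        t = self.transformer(x.flatten(2).transpose(1, 2))               # tokens: (B, H*W, C)
        t = t.transpose(1, 2).reshape(b, c, h, w)
        return self.act(self.conv_out(x + t))                            # addition + activation

hdr = LocalReconstruction()(torch.randn(1, 320, 32, 32))                 # (1, 3, 32, 32)
```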
More specifically, as shown in fig. 11 (fig. 11 is a schematic structural diagram of a self-attention mechanism module or an interactive attention mechanism module provided in the embodiment of the present application), in the fusion transformer network, the structure of the self-attention mechanism module and the structure of the interactive attention mechanism module may be the same, and either of the two modules may include: a normalization unit, a multi-head attention mechanism unit, an addition unit, a multi-layer perceptron unit, and the like. It can be seen that both modules can implement normalization processing, processing based on a multi-head attention mechanism, addition processing, processing based on a multi-layer perceptron, and the like.
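A minimal sketch of such a module, with assumed dimensions and with a pre-normalization ordering chosen arbitrarily, could look as follows; it only illustrates how the normalization unit, multi-head attention mechanism unit, addition unit and multi-layer perceptron unit fit together, not the exact structure of fig. 11.

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)                                    # normalization unit
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # multi-head attention mechanism unit
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim),                 # multi-layer perceptron unit
                                 nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, q, kv):
        # Self-attention module: q and kv are the same features.
        # Interactive attention module: kv comes from a supporting image.
        x = q + self.attn(self.norm1(q), self.norm1(kv), self.norm1(kv))[0]  # addition unit
        return x + self.mlp(self.norm2(x))                                   # addition unit
```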
More specifically, as shown in fig. 12 (fig. 12 is a schematic structural diagram of a transformer module provided in an embodiment of the present application), in the local reconstruction transformer network, the transformer module may include: a multi-head self-attention mechanism unit and a multi-layer perceptron unit. It follows that the transformer module can implement processing based on a multi-head self-attention mechanism, processing based on a multi-layer perceptron, and so on.
It should be understood that this embodiment only schematically describes the target model by taking the fusion of a plurality of LDR images into an HDR image as an example. In practical applications, the target model may also fuse a plurality of low-resolution images into a high-resolution image (for example, fuse a first low-resolution image and a second low-resolution image into a high-resolution image), or fuse a plurality of noisy images into a denoised image (for example, fuse a first noisy image and a second noisy image into a denoised image), and so on. These fusion processes may refer to steps 501 to 503; it is only necessary to replace the LDR images with low-resolution images (for example, the first low-resolution image and the second low-resolution image) or noisy images (for example, the first noisy image and the second noisy image), and to replace the HDR image with a high-resolution image or a denoised image.
In addition, the target model provided in the embodiment of the present application (i.e. IFT in Table 1) may be compared with the models provided in the related art (i.e. the remaining models in Table 1 other than IFT, for example Sen, Hu, etc.) on a certain data set, and the comparison results are shown in Table 1:
TABLE 1
Method PSNR-μ SSIM-μ PSNR-L SSIM-L HDR-VDP-2
Sen 40.9453 0.9085 38.3147 0.9749 60.5425
Hu 32.1872 0.9716 30.8395 0.9511 57.8278
Kalantari 42.7423 0.9877 41.2518 0.9845 64.6519
Wu 41.6377 0.9869 40.9082 0.9847 58.3739
SCHDR 40.4700 0.9931 39.6800 0.9899 63.6192
AHDR 43.6172 0.9956 41.0309 0.9903 64.8465
NHDRNet 42.4769 0.9942 40.1978 0.9889 63.1585
DWT 43.6734 0.9956 41.2195 0.9905 64.9472
HDRGAN 43.9220 0.9865 41.5720 0.9905 65.4500
DAHDR 43.8400 0.9956 41.3100 0.9905 64.6765
SwinIR 43.4200 0.9882 41.6800 0.9861 64.5200
HDR-T 44.2093 0.9918 42.1687 0.9889 65.5969
Song 44.0981 0.9909 41.7021 0.9872 64.6765
IFT(Ours) 44.5532 0.9914 42.2714 0.9887 65.6296
On this data set, the embodiment of the present application achieves better PSNR-μ/PSNR-L/HDR-VDP-2 than the related art (larger is better). As shown in fig. 13a and fig. 13b (fig. 13a is a schematic diagram of the comparison results provided by the embodiment of the present application, and fig. 13b is another schematic diagram of the comparison results provided by the embodiment of the present application), the embodiment of the present application recovers details more accurately and produces no artifacts compared with the related art. In regions with relatively large foreground motion, other methods produce certain artifacts and cannot restore color details well, while the embodiment of the present application is visually the closest to the ground truth (GT).
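For reference, PSNR-μ in Table 1 typically denotes PSNR computed after μ-law tone mapping of the HDR images, while PSNR-L denotes PSNR in the linear domain. The sketch below shows this computation; the value μ = 5000 is the convention commonly used in HDR deghosting work and is an assumption here, not a value stated in this application.

```python
import numpy as np

def mu_law(x, mu=5000.0):
    # Maps linear HDR values in [0, 1] to the tone-mapped domain.
    return np.log1p(mu * x) / np.log1p(mu)

def psnr(pred, gt, peak=1.0):
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_mu(pred_hdr, gt_hdr):
    return psnr(mu_law(pred_hdr), mu_law(gt_hdr))   # PSNR in the tone-mapped domain

def psnr_l(pred_hdr, gt_hdr):
    return psnr(pred_hdr, gt_hdr)                   # PSNR in the linear domain
```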
Further, the target model provided in the embodiment of the present application (i.e. IFT in Table 1) may be compared with the models provided in the related art (i.e. the remaining models in Table 1 other than IFT, for example Sen, Hu, etc.) on another data set, and the comparison results are shown in Table 2:
TABLE 2
(Table 2 is reproduced as an image, BDA0004137614120000251, in the original publication; its values are not rendered in this text.)
On this data set, the embodiment of the present application also achieves better PSNR-μ/PSNR-L/HDR-VDP-2 than the related art (larger is better). The visual effect is shown in fig. 14 and fig. 15 (fig. 14 is another schematic diagram of the comparison results provided by the embodiment of the present application, and fig. 15 is another schematic diagram of the comparison results provided by the embodiment of the present application); compared with the related art, the embodiment of the present application recovers details more accurately and produces no artifacts.
In this embodiment of the present application, when an HDR image of a target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into a target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, in the process of acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image, it is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the corresponding relation as a guide, so that the fusion with higher quality can be realized, and the HDR image of the finally obtained target object is free from artifacts.
Further, in the embodiment of the present application, the target model includes a fusion transformer network based on a self-attention mechanism and an interactive attention mechanism, where the fusion transformer network can effectively consider the detail information of the first LDR image and the second LDR image when fusing the first LDR image and the second LDR image, so that the finally obtained HDR image maintains better details and is free of artifacts.
The foregoing is a detailed description of the image processing method provided in the embodiment of the present application, and the model training method provided in the embodiment of the present application will be described below. Fig. 16 is a schematic flow chart of a model training method according to an embodiment of the present application, as shown in fig. 16, where the method includes:
1601. and acquiring a first LDR image of the target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees.
In this embodiment, when the model to be trained is required to be trained, a batch of training data may be acquired first, where the batch of training data includes a first LDR image of the target object and a second LDR image of the target object, where the first LDR image and the second LDR image are images obtained by photographing the target object based on different exposure degrees. It should be noted that the true HDR image of the target object is known for the first LDR image as well as the second LDR image.
1602. Processing the first LDR image and the second LDR image through a model to be trained to obtain a high dynamic range HDR image of the target object, wherein the model to be trained is used for: acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; and fusing the first LDR image and the second LDR image based on the corresponding relation to obtain an HDR image of the target object.
After the first LDR image and the second LDR image are obtained, the first LDR image and the second LDR image can be input into the model to be trained. Then, the model to be trained may perform a series of processing on the first LDR image and the second LDR image, so as to obtain a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image, and fuse the first LDR image and the second LDR image by using the one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks, so as to obtain a (predicted) HDR image of the target object.
In one possible implementation, the model to be trained is used for: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, a model to be trained, for: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
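A minimal sketch of this matching step, assuming cosine similarity as the similarity measure (a particular measure is not fixed at this point in the text), could be:

```python
import torch
import torch.nn.functional as F

def match_blocks(first_feats, second_feats):
    # first_feats, second_feats: (M, C) features of the M image blocks of each image.
    sim = F.normalize(first_feats, dim=-1) @ F.normalize(second_feats, dim=-1).T
    # For each first image block, take the second image block with the largest similarity.
    return sim.argmax(dim=-1)            # (M,) indices forming the correspondence
```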
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
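The reordering of the fourth features by the correspondence can then be expressed as a simple gather, as in the illustrative snippet below (the shapes and helper names are assumptions for the sketch):

```python
import torch

def reorder_fourth_features(fourth_feats, correspondence):
    # fourth_feats: (M, C); correspondence: (M,) indices produced by block matching.
    # Row i of the result holds the feature of the second image block matched to
    # the i-th first image block (the "adjusted ordering" fourth features).
    return fourth_feats[correspondence]

# Example: aligned = reorder_fourth_features(torch.randn(256, 64), torch.randperm(256))
```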
In one possible implementation, the foregoing process includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
It should be noted that, for the description of step 1602, reference may be made to the relevant description of steps 502 to 503 in the embodiment shown in fig. 5, and the description thereof will not be repeated here.
1603. Training the model to be trained based on the HDR image to obtain a target model.
After obtaining the HDR image of the target object, since the true HDR image of the target object is known, the HDR image and the true HDR image may be computed with a preset loss function to obtain a target loss, where the target loss is used to indicate the difference between the HDR image and the true HDR image. Then, the parameters of the model to be trained are updated by using the target loss, thereby obtaining a model to be trained with updated parameters. The next batch of training data may then be used to continue training the updated model until the model training condition is satisfied (for example, the target loss converges), thereby obtaining the target model in the embodiment shown in fig. 5.
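A minimal training-step sketch is given below. The preset loss function is not specified at this point in the text; an L1 loss computed on μ-law tone-mapped outputs is a common choice for HDR reconstruction and is used here purely as an assumption.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, ldr1, ldr2, gt_hdr, mu=5000.0):
    pred_hdr = model(ldr1, ldr2)                               # predicted HDR image
    tonemap = lambda x: torch.log1p(mu * x) / torch.log1p(torch.tensor(mu))
    loss = F.l1_loss(tonemap(pred_hdr), tonemap(gt_hdr))       # target loss (assumed L1)
    optimizer.zero_grad()
    loss.backward()                                            # update model parameters
    optimizer.step()
    return loss.item()

# Training continues with the next batch of training data until the model
# training condition is met (e.g. the target loss converges).
```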
It should be understood that this embodiment only schematically describes the model to be trained by taking the fusion of a plurality of LDR images into an HDR image as an example. In practical applications, the model to be trained may also fuse a plurality of low-resolution images into a high-resolution image (for example, fuse a first low-resolution image and a second low-resolution image into a high-resolution image), or fuse a plurality of noisy images into a denoised image (for example, fuse a first noisy image and a second noisy image into a denoised image), and so on. These fusion processes and the corresponding model training processes may refer to steps 1601 to 1603; it is only necessary to replace the LDR images with low-resolution images (for example, the first low-resolution image and the second low-resolution image) or noisy images (for example, the first noisy image and the second noisy image), and to replace the HDR image with a high-resolution image or a denoised image, which will not be described here again.
The target model trained in the embodiments of the present application has an image processing function (e.g., a function of fusing a plurality of LDR images into an HDR image, etc.). Specifically, when the HDR image of the target object needs to be acquired, the first LDR image of the target object and the second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, in the process of acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image, it is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the corresponding relation as a guide, so that the fusion with higher quality can be realized, and the HDR image of the finally obtained target object is free from artifacts.
The foregoing is a detailed description of the model training method provided in the embodiment of the present application, and the image processing apparatus and the model training apparatus provided in the embodiment of the present application will be described below. Fig. 17 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, as shown in fig. 17, where the apparatus includes:
a first obtaining module 1701, configured to obtain a first LDR image of the target object and a second LDR image of the target object, where the first LDR image and the second LDR image are images obtained by photographing the target object based on different exposure degrees;
a second obtaining module 1702 configured to obtain a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image;
and a fusion module 1703, configured to fuse the first LDR image and the second LDR image based on the correspondence, to obtain an HDR image of the target object.
In this embodiment of the present application, when an HDR image of a target object needs to be acquired, a first LDR image of the target object and a second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into a target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, in the process of acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image, it is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the corresponding relation as a guide, so that the fusion with higher quality can be realized, and the HDR image of the finally obtained target object is free from artifacts.
In one possible implementation, the second obtaining module 1702 is configured to: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, the second obtaining module 1702 is configured to: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, and the second obtaining module 1702 is configured to: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, the fusion module 1703 is configured to perform feature extraction on the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation, the fusing module 1703 is configured to process the third features of the first image blocks, the fourth features of the second image blocks, and the fourth features of the second image blocks after the ordering is adjusted to obtain an HDR image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
It should be understood that this embodiment only schematically describes the target model by taking the fusion of a plurality of LDR images into an HDR image as an example. In practical applications, the target model may also fuse a plurality of low-resolution images into a high-resolution image (for example, fuse a first low-resolution image and a second low-resolution image into a high-resolution image), or fuse a plurality of noisy images into a denoised image (for example, fuse a first noisy image and a second noisy image into a denoised image), and so on; it is only necessary to replace the LDR images with low-resolution images (for example, the first low-resolution image and the second low-resolution image) or noisy images (for example, the first noisy image and the second noisy image), and to replace the HDR image with a high-resolution image or a denoised image.
Fig. 18 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application, as shown in fig. 18, where the apparatus includes:
the acquiring module 1801 is configured to acquire a first LDR image of the target object and a second LDR image of the target object, where the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees;
The processing module 1802 is configured to process the first LDR image and the second LDR image through a model to be trained, to obtain a high dynamic range HDR image of the target object, where the model to be trained is used for: acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the corresponding relation, fusing the first LDR image and the second LDR image to obtain an HDR image of the target object;
the training module 1803 is configured to train the model to be trained based on the HDR image, to obtain a target model.
The target model trained in the embodiments of the present application has an image processing function (e.g., a function of fusing a plurality of LDR images into an HDR image, etc.). Specifically, when the HDR image of the target object needs to be acquired, the first LDR image of the target object and the second LDR image of the target object may be acquired first, and the first LDR image and the second LDR image may be input into the target model. Then, the target model may perform image block matching on the first LDR image and the second LDR image, thereby obtaining a one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image. Then, the target model may fuse the first LDR image and the second LDR image using the correspondence, thereby obtaining and outputting an HDR image of the target object. In the foregoing process, in the process of acquiring the one-to-one correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image, it is equivalent to aligning the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image one by one in content. Then, the fusion between the first LDR image and the second LDR image is realized by taking the corresponding relation as a guide, so that the fusion with higher quality can be realized, and the HDR image of the finally obtained target object is free from artifacts.
In one possible implementation, the model to be trained is used for: acquiring similarity between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the similarity, a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks is obtained.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image; and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
In one possible implementation, the plurality of first image blocks includes a third image block, a model to be trained, for: determining a second image block with the largest similarity as a fourth image block in the plurality of second image blocks based on the similarity between the third image block and the plurality of second image blocks; and constructing a corresponding relation between the third image block and the fourth image block.
In one possible implementation, the model to be trained is used for: extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image; based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained; and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
In one possible implementation, the model to be trained is configured to process the third features of the plurality of first image blocks, the fourth features of the plurality of second image blocks, and the fourth features of the plurality of second image blocks after the ordering is adjusted, to obtain an HDR image of the target object.
In one possible implementation, the processing includes at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
In one possible implementation, the self-attention mechanism based processing or the interactive attention mechanism based processing includes at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
In one possible implementation, the transformer network-based process includes at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
It should be understood that this embodiment only schematically describes the model to be trained by taking the fusion of a plurality of LDR images into an HDR image as an example. In practical applications, the model to be trained may also fuse a plurality of low-resolution images into a high-resolution image (for example, fuse a first low-resolution image and a second low-resolution image into a high-resolution image), or fuse a plurality of noisy images into a denoised image (for example, fuse a first noisy image and a second noisy image into a denoised image), and so on; it is only necessary to replace the LDR images with low-resolution images (for example, the first low-resolution image and the second low-resolution image) or noisy images (for example, the first noisy image and the second noisy image), and to replace the HDR image with a high-resolution image or a denoised image.
It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned apparatus is based on the same concept as the method embodiment of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and specific content may refer to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
The embodiment of the application also relates to an execution device, and fig. 19 is a schematic structural diagram of the execution device provided in the embodiment of the application. As shown in fig. 19, the execution device 1900 may be embodied as a mobile phone, a tablet, a notebook, a smart wearable device, a server, or the like, which is not limited herein. The image processing apparatus described in the corresponding embodiment of fig. 17 may be disposed on the execution device 1900, so as to implement the function of image processing in the corresponding embodiment of fig. 5. Specifically, the execution device 1900 includes: receiver 1901, transmitter 1902, processor 1903, and memory 1904 (where the number of processors 1903 in executing device 1900 may be one or more, as exemplified by one processor in fig. 19), where processor 1903 may include application processor 19031 and communication processor 19032. In some embodiments of the present application, the receiver 1901, transmitter 1902, processor 1903, and memory 1904 may be connected by a bus or other means.
Memory 1904 may include read only memory and random access memory and provides instructions and data to processor 1903. A portion of the memory 1904 may also include non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1904 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
The processor 1903 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The methods disclosed in the embodiments of the present application may be applied to the processor 1903 or implemented by the processor 1903. The processor 1903 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or instructions in software in the processor 1903. The processor 1903 may be a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor, or a microcontroller, and may further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The processor 1903 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 1904, and the processor 1903 reads the information in the memory 1904 and, in combination with its hardware, performs the steps of the method described above.
The receiver 1901 may be used to receive input numeric or character information and to generate signal inputs related to performing device-related settings and function control. The transmitter 1902 may be configured to output numeric or character information via a first interface; the transmitter 1902 may be further configured to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 1902 may also include a display device such as a display screen.
In this embodiment, in one case, the processor 1903 is configured to obtain an HDR image of a target object through the target model in the corresponding embodiment of fig. 5.
The embodiment of the application also relates to training equipment, and fig. 20 is a schematic structural diagram of the training equipment provided by the embodiment of the application. As shown in fig. 20, the training device 2000 is implemented by one or more servers, the training device 2000 may vary considerably in configuration or performance, and may include one or more central processing units (central processing units, CPU) 2020 (e.g., one or more processors) and memory 2032, one or more storage media 2030 (e.g., one or more mass storage devices) storing applications 2042 or data 2044. Wherein the memory 2032 and the storage medium 2030 may be transitory or persistent. The program stored on the storage medium 2030 may include one or more modules (not shown), each of which may include a series of instruction operations on the training device. Still further, the central processor 2020 may be configured to communicate with the storage medium 2030 and execute a series of instruction operations in the storage medium 2030 on the training apparatus 2000.
The training device 2000 may also include one or more power supplies 2026, one or more wired or wireless network interfaces 2050, one or more input/output interfaces 2058, or one or more operating systems 2041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Specifically, the training apparatus may perform the model training method in the corresponding embodiment of fig. 16.
The embodiments of the present application also relate to a computer storage medium in which a program for performing signal processing is stored, which when run on a computer causes the computer to perform the steps as performed by the aforementioned performing device or causes the computer to perform the steps as performed by the aforementioned training device.
Embodiments of the present application also relate to a computer program product storing instructions that, when executed by a computer, cause the computer to perform steps as performed by the aforementioned performing device or cause the computer to perform steps as performed by the aforementioned training device.
The execution device, training device or terminal device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the execution device to perform the data processing method described in the above embodiment, or to cause the chip in the training device to perform the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), etc.
Specifically, referring to fig. 21, fig. 21 is a schematic structural diagram of a chip provided in an embodiment of the present application, where the chip may be represented as a neural network processor NPU 2100, where the NPU 2100 is mounted as a coprocessor to a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an arithmetic circuit 2103, and the controller 2104 controls the arithmetic circuit 2103 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 2103 includes a plurality of processing units (PEs) inside. In some implementations, the arithmetic circuit 2103 is a two-dimensional systolic array. The arithmetic circuit 2103 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2103 is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2102 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes matrix a data from the input memory 2101 and performs matrix operation with matrix B, and the obtained partial result or final result of the matrix is stored in an accumulator (accumulator) 2108.
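The accumulation of partial results described above can be pictured with the short NumPy sketch below; the tile size and matrix shapes are arbitrary illustrative values and do not correspond to the actual hardware dimensions.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    # A is read tile by tile (as from the input memory), B is cached (as in the
    # weight memory), and partial results are summed in an accumulator.
    M, K = A.shape
    _, N = B.shape
    acc = np.zeros((M, N))                                # accumulator
    for k in range(0, K, tile):
        acc += A[:, k:k + tile] @ B[k:k + tile, :]        # partial result of the matrix
    return acc

A, B = np.random.rand(8, 16), np.random.rand(16, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)             # matches the full matrix product
```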
The unified memory 2106 is used for storing input data and output data. The weight data is directly transferred to the weight memory 2102 via the memory cell access controller (Direct Memory Access Controller, DMAC) 2105. The input data is also carried into the unified memory 2106 by the DMAC.
The bus interface unit 2113 (Bus Interface Unit, BIU) is used for interaction between the AXI bus and the DMAC and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 2109.
The bus interface unit 2113 is used for the instruction fetch memory 2109 to fetch instructions from an external memory, and is also used for the memory unit access controller 2105 to fetch the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2106 or to transfer weight data to the weight memory 2102 or to transfer input data to the input memory 2101.
The vector calculation unit 2107 includes a plurality of operation processing units and, when necessary, performs further processing on the output of the arithmetic circuit 2103, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolution/fully-connected-layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of the predicted label plane.
In some implementations, the vector calculation unit 2107 can store the vector of processed outputs to the unified memory 2106. For example, the vector calculation unit 2107 may apply a linear function or a nonlinear function to the output of the arithmetic circuit 2103, such as performing linear interpolation on the predicted label plane extracted by a convolutional layer, or applying a nonlinear function to a vector of accumulated values to generate an activation value. In some implementations, the vector calculation unit 2107 generates a normalized value, a pixel-level summed value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 2103, for example for use in a subsequent layer of the neural network.
A fetch memory (instruction fetch buffer) 2109 connected to the controller 2104 for storing instructions used by the controller 2104;
The unified memory 2106, the input memory 2101, the weight memory 2102 and the instruction fetch memory 2109 are all on-chip memories. The external memory is a memory external to this NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection therebetween, and can be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a training device, or a network device, etc.) to perform the method described in the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device, a data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Claims (23)

1. An image processing method, wherein the method is implemented by a target model, the method comprising:
acquiring a first low dynamic range LDR image of a target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees;
acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image;
and fusing the first LDR image and the second LDR image based on the corresponding relation to obtain a high dynamic range HDR image of the target object.
2. The method of claim 1, wherein the acquiring correspondence between the plurality of first image blocks of the first LDR image and the plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image comprises:
acquiring similarities between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image;
And acquiring one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks based on the similarity.
3. The method of claim 2, wherein the obtaining similarities between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image comprises:
extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image;
and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
4. A method according to claim 2 or 3, wherein the plurality of first image blocks comprises a third image block, and wherein the obtaining a one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks based on the similarity comprises:
determining a second image block with the largest similarity as a fourth image block in the second image blocks based on the similarity between the third image block and the second image blocks;
And constructing a corresponding relation between the third image block and the fourth image block.
5. The method of any one of claims 1 to 4, wherein fusing the first LDR image and the second LDR image based on the correspondence, to obtain a high dynamic range HDR image of the target object comprises:
extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image;
based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained;
and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
6. The method of claim 5, wherein processing the third feature of the first plurality of image blocks and the fourth feature of the second plurality of image blocks after the adjustment ordering to obtain the HDR image of the target object comprises:
And processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after adjustment and sequencing to obtain the HDR image of the target object.
7. The method of claim 6, wherein the processing comprises at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
8. The method of claim 7, wherein the self-attention mechanism based process or the interactive attention mechanism based process comprises at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
9. The method according to claim 7 or 8, wherein the transformer network based processing comprises at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
10. A method of model training, the method comprising:
Acquiring a first LDR image of a target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees;
processing the first LDR image and the second LDR image through a model to be trained to obtain a high dynamic range HDR image of the target object, wherein the model to be trained is used for: acquiring a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; based on the corresponding relation, fusing the first LDR image and the second LDR image to obtain an HDR image of the target object;
and training the model to be trained based on the HDR image to obtain a target model.
11. The method of claim 10, wherein the model to be trained is for:
acquiring similarities between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image;
And acquiring one-to-one correspondence between the plurality of first image blocks and the plurality of second image blocks based on the similarity.
12. The method of claim 11, wherein the model to be trained is for:
extracting features of the first LDR image and the second LDR image to obtain first features of a plurality of first image blocks of the first LDR image and second features of a plurality of second image blocks of the second LDR image;
and calculating the first characteristics of the plurality of first image blocks and the second characteristics of the plurality of second image blocks to obtain the similarity between the plurality of first image blocks and the plurality of second image blocks.
13. The method according to claim 11 or 12, wherein the plurality of first image blocks comprises a third image block, the model to be trained for:
determining a second image block with the largest similarity as a fourth image block in the second image blocks based on the similarity between the third image block and the second image blocks;
and constructing a corresponding relation between the third image block and the fourth image block.
14. The method according to any one of claims 10 to 13, wherein the model to be trained is for:
Extracting features of the first LDR image and the second LDR image to obtain third features of a plurality of first image blocks of the first LDR image and fourth features of a plurality of second image blocks of the second LDR image;
based on the corresponding relation, the ordering of the fourth features of the plurality of second image blocks is adjusted, and the fourth features of the plurality of second image blocks after the ordering is adjusted are obtained;
and processing the third characteristics of the plurality of first image blocks and the fourth characteristics of the plurality of second image blocks after the adjustment and sequencing to obtain the HDR image of the target object.
15. The method of claim 14, wherein the model to be trained is for:
and processing the third characteristics of the plurality of first image blocks, the fourth characteristics of the plurality of second image blocks and the fourth characteristics of the plurality of second image blocks after adjustment and sequencing to obtain the HDR image of the target object.
16. The method of claim 15, wherein the processing comprises at least one of: a self-attention mechanism based process, an interactive attention mechanism based process, a stitching process, a convolution process, a transformer network based process, an addition process, and an activation process.
17. The method of claim 16, wherein the self-attention mechanism based process or the interactive attention mechanism based process comprises at least one of: normalization processing, processing based on a multi-head attention mechanism, addition processing, and processing based on a multi-layer perceptron.
18. The method according to claim 16 or 17, wherein the transformer network based processing comprises at least one of: processing based on multi-head self-attention mechanism and processing based on multi-layer perceptron.
19. An image processing apparatus, the apparatus comprising a target model, the apparatus comprising:
the first acquisition module is used for acquiring a first LDR image of a target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees;
a second obtaining module, configured to obtain a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image;
And the fusion module is used for fusing the first LDR image and the second LDR image based on the corresponding relation to obtain the HDR image of the target object.
20. A model training apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first LDR image of a target object and a second LDR image of the target object, wherein the first LDR image and the second LDR image are images obtained by shooting the target object based on different exposure degrees;
a processing module, configured to process the first LDR image and the second LDR image through a model to be trained to obtain a high dynamic range HDR image of the target object, wherein the model to be trained is configured to: acquire a one-to-one correspondence between a plurality of first image blocks of the first LDR image and a plurality of second image blocks of the second LDR image based on the first LDR image and the second LDR image; and fuse, based on the correspondence, the first LDR image and the second LDR image to obtain the HDR image of the target object;
and a training module, configured to train the model to be trained based on the HDR image to obtain a target model.
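A minimal sketch of how the modules of claim 20 might interact in a single training step is given below, assuming a differentiable model, an optimizer and a reference HDR image with an L1 loss; none of these specifics appear in the claim.

    import torch

    def train_step(model, optimizer, ldr_1, ldr_2, hdr_gt):
        # One hypothetical update of the model to be trained: predict an HDR image
        # from the two LDR exposures and fit it to a reference HDR image (the L1
        # loss is an assumption; the claim does not specify a loss function).
        pred_hdr = model(ldr_1, ldr_2)
        loss = torch.nn.functional.l1_loss(pred_hdr, hdr_gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()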
21. An image processing apparatus, characterized in that the apparatus comprises a memory and a processor; the memory stores code, and the processor is configured to execute the code; when the code is executed, the image processing apparatus performs the method of any one of claims 1 to 18.
22. A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 18.
23. A computer program product, characterized in that it stores instructions that, when executed by a computer, cause the computer to implement the method of any one of claims 1 to 18.
CN202310277464.6A 2023-03-15 2023-03-15 Image processing method and related equipment thereof Pending CN116309226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310277464.6A CN116309226A (en) 2023-03-15 2023-03-15 Image processing method and related equipment thereof

Publications (1)

Publication Number Publication Date
CN116309226A true CN116309226A (en) 2023-06-23

Family

ID=86786719

Country Status (1)

Country Link
CN (1) CN116309226A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination