CN114359289A - Image processing method and related device

Image processing method and related device

Info

Publication number
CN114359289A
Authority
CN
China
Prior art keywords
image
feature
processing
semantic segmentation
network
Prior art date
Legal status
Pending
Application number
CN202011043640.2A
Other languages
Chinese (zh)
Inventor
汪涛 (Wang Tao)
宋风龙 (Song Fenglong)
任文琦 (Ren Wenqi)
操晓春 (Cao Xiaochun)
Current Assignee
Huawei Technologies Co Ltd
Institute of Information Engineering of CAS
Original Assignee
Huawei Technologies Co Ltd
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd and Institute of Information Engineering of CAS
Priority to CN202011043640.2A
Publication of CN114359289A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application discloses an image processing method, which is applied to the field of artificial intelligence and includes the following steps: acquiring an image to be processed; processing the image to be processed through a first network to obtain a first feature, where the first network is configured to extract at least features for image enhancement; processing the image to be processed through a second network to obtain a second feature, where the second network is configured to extract at least semantic segmentation features; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the image to be processed; generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed; and performing image reconstruction on the fourth feature to obtain a target image. By introducing semantic features and the semantic segmentation result of the image into the image enhancement process, different image enhancement strengths can be applied to different semantic regions, texture details can be accurately preserved, and the realism of the texture details after image enhancement is improved.

Description

Image processing method and related device
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an image processing method and a related apparatus.
Background
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
The deep learning method is a key driving force for the development of the field of artificial intelligence in recent years, and has remarkable effects on various tasks of computer vision. In the field of image enhancement (also referred to as image quality enhancement), methods based on deep learning have surpassed conventional methods.
However, current deep-learning-based image enhancement networks often produce unnatural enhancement effects, and the image texture details obtained after processing by such networks are not realistic.
Disclosure of Invention
The embodiment of the application provides an image processing method and a related device, which are used for improving the image enhancement effect.
A first aspect of the present application provides an image processing method, including: acquiring an image to be processed, where the image to be processed may be, for example, an image that needs image enhancement; processing the image to be processed through a first network to obtain a first feature, where the first network is configured to extract at least features for image enhancement, and the features for image enhancement may be, for example, image low-level features; processing the image to be processed through a second network to obtain a second feature, where the second network is configured to extract at least semantic segmentation features, and the semantic segmentation features may be, for example, image high-level features; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the image to be processed; generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed; and performing image reconstruction on the fourth feature to obtain a target image.
According to this scheme, semantic features and the semantic segmentation result of the image are introduced into the image enhancement process and fused with the features used for image enhancement, so that different image enhancement strengths can be applied to different semantic regions, texture details are accurately preserved, and the realism of the texture details after image enhancement is improved.
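For readability, the following is a minimal sketch of the data flow described above, assuming a PyTorch-style implementation. The module names (enhance_net, seg_net, seg_head, reconstructor), the channel counts, and the use of concatenation followed by a 1x1 convolution as the fusion operation are illustrative assumptions, not the structure mandated by this application.

```python
import torch
import torch.nn as nn

class SemanticGuidedEnhancer(nn.Module):
    """Illustrative sketch of the first-aspect data flow (not the claimed network)."""

    def __init__(self, enhance_net, seg_net, seg_head, reconstructor,
                 channels=64, num_classes=21):
        super().__init__()
        self.enhance_net = enhance_net      # "first network": features for image enhancement
        self.seg_net = seg_net              # "second network": semantic segmentation features
        self.seg_head = seg_head            # "third network": produces the segmentation result
        self.reconstructor = reconstructor  # image reconstruction module
        # assumed fusion operation: concatenation followed by a 1x1 convolution
        self.fuse_12 = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse_3s = nn.Conv2d(channels + num_classes, channels, kernel_size=1)

    def forward(self, image):
        f1 = self.enhance_net(image)                      # first feature
        f2 = self.seg_net(image)                          # second feature
        f3 = self.fuse_12(torch.cat([f1, f2], dim=1))     # third feature
        seg = self.seg_head(f3)                           # semantic segmentation result
        f4 = self.fuse_3s(torch.cat([f3, seg], dim=1))    # fourth feature
        return self.reconstructor(f4), seg                # target image and segmentation


# Toy usage with single-convolution stand-ins for the sub-networks.
if __name__ == "__main__":
    c, k = 64, 21
    model = SemanticGuidedEnhancer(
        enhance_net=nn.Conv2d(3, c, 3, padding=1),
        seg_net=nn.Conv2d(3, c, 3, padding=1),
        seg_head=nn.Conv2d(c, k, 1),
        reconstructor=nn.Conv2d(c, 3, 3, padding=1),
        channels=c, num_classes=k)
    out, seg = model(torch.randn(1, 3, 64, 64))
    print(out.shape, seg.shape)   # (1, 3, 64, 64) and (1, 21, 64, 64)
```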
Optionally, in a possible implementation manner, the obtaining a semantic segmentation result of the image to be processed includes: and processing the third features through a third network to obtain a semantic segmentation result of the image to be processed.
The third feature is obtained by fusing the image low-level features related to image enhancement with the image high-level features related to semantic segmentation. Therefore, by processing the third feature, the low-level features of the image are introduced on top of the high-level features related to semantic segmentation; that is, the semantic segmentation result of the image to be processed is obtained on the basis of features at different levels, which improves the accuracy of the obtained semantic segmentation result.
Optionally, in a possible implementation manner, the generating a third feature according to the first feature and the second feature includes: performing feature fusion processing on the first feature and the second feature to obtain the third feature; and the generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed includes: performing feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion processing includes summation processing, multiplication processing, cascade (concatenation) processing, or cascade processing followed by convolution processing.
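As an illustration only, the fusion options listed above could be written as follows in PyTorch; the tensor layout (channels in dimension 1) and the 1x1 convolution applied after concatenation are assumptions.

```python
import torch
import torch.nn as nn

def fuse(a: torch.Tensor, b: torch.Tensor, mode: str, conv: nn.Module = None) -> torch.Tensor:
    """Illustrative feature fusion: summation, multiplication,
    cascade (channel concatenation), or cascade followed by convolution."""
    if mode == "sum":
        return a + b
    if mode == "mul":
        return a * b
    if mode == "cascade":
        return torch.cat([a, b], dim=1)
    if mode == "cascade_conv":
        return conv(torch.cat([a, b], dim=1))   # e.g. a 1x1 conv to restore the channel count
    raise ValueError(f"unknown fusion mode: {mode}")
```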
Optionally, in a possible implementation manner, before the fourth feature is generated according to the third feature and the semantic segmentation result of the image to be processed, the method further includes: processing the third feature to obtain a fifth feature; and the generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed includes: generating the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.
That is to say, after performing feature extraction on the third feature, the image processing apparatus performs feature fusion based on the fifth feature obtained by this further feature extraction and the semantic segmentation result of the image to be processed. By further extracting features from the third feature obtained after feature fusion, finer-grained features can be extracted on the basis of the third feature, which improves the precision of the fourth feature obtained by the subsequent feature fusion.
Optionally, in a possible implementation manner, the processing the image to be processed through the second network to obtain the second feature includes: preprocessing the image to be processed to obtain a preprocessed feature; performing down-sampling processing on the preprocessed feature to obtain a down-sampled feature; processing the down-sampled feature through the second network to obtain a sixth feature; and performing up-sampling processing on the sixth feature to obtain the second feature of the image to be processed. The down-sampling operation reduces the resolution of the preprocessed feature, which reduces the amount of computation required to extract the semantic segmentation features and lowers the computational requirement on the image processing apparatus.
Optionally, in a possible implementation manner, the method is used for implementing at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
A second aspect of the present application provides a model training method, including: acquiring a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image; and processing the first image through an image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: acquire an image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; generate a third feature according to the first feature and the second feature; obtain a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image. The method further includes: obtaining a first loss according to the second image in the training sample pair and the predicted image, where the first loss describes the difference between the second image and the predicted image; and updating the model parameters of the image processing model to be trained at least according to the first loss until a model training condition is met, so as to obtain the image processing model.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to process the third feature through a third network, so as to obtain a semantic segmentation prediction result of the first image.
Optionally, in a possible implementation manner, the method further includes: obtaining a ground-truth semantic segmentation result of the first image; obtaining a second loss according to the semantic segmentation prediction result and the ground-truth semantic segmentation result, where the second loss describes the difference between the semantic segmentation prediction result and the ground-truth result; and updating the model parameters of the image processing model to be trained at least according to the first loss and the second loss until a model training condition is met, so as to obtain the image processing model.
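A hedged sketch of how the first loss and the second loss described above might be combined in one training step follows; the L1 reconstruction loss, the cross-entropy segmentation loss and the weighting factor seg_weight are assumptions, and the model is assumed to return both the predicted image and the segmentation prediction (as in the earlier sketch).

```python
import torch.nn.functional as F

def training_step(model, optimizer, first_image, second_image, seg_ground_truth, seg_weight=0.1):
    """One illustrative parameter update for the model-training method sketched above."""
    predicted_image, seg_prediction = model(first_image)
    first_loss = F.l1_loss(predicted_image, second_image)            # difference to the high-quality image
    second_loss = F.cross_entropy(seg_prediction, seg_ground_truth)  # difference to the ground-truth segmentation
    loss = first_loss + seg_weight * second_loss                     # assumed weighting of the two losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return first_loss.item(), second_loss.item()
```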
Optionally, in a possible implementation manner, the image processing model to be trained is further configured to: perform feature fusion processing on the first feature and the second feature to obtain the third feature; and the generating of the fourth feature according to the third feature and the semantic segmentation result of the first image includes: performing feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion processing includes summation processing, cascade processing, or cascade processing and convolution processing.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to: process the third feature to obtain a fifth feature; and generate the fourth feature according to the third feature and the semantic segmentation result of the first image.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to: preprocess the first image to obtain a preprocessed feature; perform down-sampling processing on the preprocessed feature to obtain a down-sampled feature; process the down-sampled feature through the second network to obtain a sixth feature; and perform up-sampling processing on the sixth feature to obtain the second feature of the first image.
Optionally, in a possible implementation manner, the image processing model is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
A third aspect of the present application provides an image processing apparatus, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire an image to be processed. The processing unit is configured to: process the image to be processed through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the image to be processed through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; and generate a third feature according to the first feature and the second feature. The acquisition unit is further configured to obtain a semantic segmentation result of the image to be processed. The processing unit is further configured to generate a fourth feature according to the third feature and the semantic segmentation result of the image to be processed, and to perform image reconstruction on the fourth feature to obtain a target image.
Optionally, in a possible implementation manner, the processing unit is further configured to process the third feature through a third network to obtain a semantic segmentation result of the image to be processed.
Optionally, in a possible implementation manner, the processing unit is further configured to perform feature fusion processing on the first feature and the second feature to obtain the third feature, and to perform feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion process includes at least one of a summation process, a multiplication process, a cascade process, and a cascade convolution process.
Optionally, in a possible implementation manner, the processing unit is further configured to process the third feature to obtain a fifth feature, and to generate the fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.
Optionally, in a possible implementation manner, the processing unit is further configured to: preprocess the image to be processed to obtain a preprocessed feature; perform down-sampling processing on the preprocessed feature to obtain a down-sampled feature; process the down-sampled feature through the second network to obtain a sixth feature; and perform up-sampling processing on the sixth feature to obtain the second feature of the image to be processed.
Optionally, in a possible implementation manner, the image processing apparatus is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
A fourth aspect of the present application provides a model training apparatus, including an acquisition unit and a training unit. The acquisition unit is configured to acquire a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image. The training unit is configured to process the first image through an image processing model to be trained to obtain a predicted image, where the image processing model to be trained is configured to: acquire an image to be processed; process the first image through a first network to obtain a first feature, the first network being configured to extract at least features for image enhancement; process the first image through a second network to obtain a second feature, the second network being configured to extract at least semantic segmentation features; generate a third feature according to the first feature and the second feature; obtain a semantic segmentation result of the first image; generate a fourth feature according to the third feature and the semantic segmentation result of the first image; and perform image reconstruction on the fourth feature to obtain the predicted image. The training unit is further configured to obtain a first loss according to the second image in the training sample pair and the predicted image, where the first loss describes the difference between the second image and the predicted image, and to update the model parameters of the image processing model to be trained at least according to the first loss until a model training condition is met, so as to obtain the image processing model.
Optionally, in a possible implementation manner, the training unit is further configured to process the third feature through a third network to obtain a semantic segmentation prediction result of the first image.
Optionally, in a possible implementation manner, the training unit is further configured to: obtain a ground-truth semantic segmentation result of the first image; obtain a second loss according to the semantic segmentation prediction result and the ground-truth semantic segmentation result, where the second loss describes the difference between the semantic segmentation prediction result and the ground-truth result; and update the model parameters of the image processing model to be trained at least according to the first loss and the second loss until a model training condition is met, so as to obtain the image processing model.
Optionally, in a possible implementation manner, the training unit is further configured to perform feature fusion processing on the first feature and the second feature to obtain the third feature, and to perform feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion process includes at least one of a summation process, a multiplication process, a cascade process, and a cascade convolution process.
Optionally, in a possible implementation manner, the training unit is further configured to process the third feature to obtain a fifth feature, and to generate the fourth feature according to the third feature and the semantic segmentation result of the first image.
Optionally, in a possible implementation manner, the training unit is further configured to: preprocess the first image to obtain a preprocessed feature; perform down-sampling processing on the preprocessed feature to obtain a down-sampled feature; process the down-sampled feature through the second network to obtain a sixth feature; and perform up-sampling processing on the sixth feature to obtain the second feature of the first image.
Optionally, in a possible implementation manner, the image processing model is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
A fifth aspect of the present application provides an image processing apparatus, which may include a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the method of the first aspect. For the steps executed by the processor in each possible implementation manner of the first aspect, reference may be made to the first aspect; details are not described here again.
A sixth aspect of the present application provides a model training apparatus, which may include a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the method of the second aspect. For the steps executed by the processor in each possible implementation manner of the second aspect, reference may be made to the second aspect; details are not described here again.
A seventh aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the method of the first aspect described above.
An eighth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the method of the second aspect described above.
A ninth aspect of the present application provides circuitry comprising processing circuitry configured to perform the method of the first aspect described above.
A tenth aspect of the present application provides circuitry comprising processing circuitry configured to perform the method of the second aspect described above.
An eleventh aspect of the present application provides a computer program which, when run on a computer, causes the computer to perform the method of the first aspect described above.
A twelfth aspect of the present application provides a computer program which, when run on a computer, causes the computer to perform the method of the second aspect described above.
A thirteenth aspect of the present application provides a chip system, which includes a processor configured to enable a server or the apparatus described above to implement the functions involved in the above aspects, for example, to send or process the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory for storing program instructions and data necessary for the server or the communication device. The chip system may consist of a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence body framework provided by an embodiment of the present application;
fig. 2a is an image processing system according to an embodiment of the present application;
FIG. 2b is a schematic diagram of another exemplary image processing system according to an embodiment of the present disclosure;
FIG. 2c is a schematic diagram of an apparatus related to image processing provided in an embodiment of the present application;
fig. 3a is a schematic diagram of a system 100 architecture according to an embodiment of the present application;
FIG. 3b is a schematic diagram of semantic segmentation of an image according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a densely connected dilated convolution network according to an embodiment of the present disclosure;
fig. 6a is a schematic diagram of an architecture for image processing according to an embodiment of the present application;
fig. 6b is a schematic diagram of a network structure for image processing according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a comparison of objective indicators provided in the examples of the present application;
FIG. 8 is a schematic diagram illustrating another objective index comparison provided in the examples of the present application;
FIG. 9 is a schematic diagram of image comparison provided by an embodiment of the present application;
fig. 10 is a schematic flowchart of a model training method according to an embodiment of the present application
Fig. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a training apparatus according to an embodiment of the present disclosure;
fig. 15 is a schematic structural diagram of a chip according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system is described first. Referring to fig. 1, which shows a schematic structural diagram of an artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to processing, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technology for providing and processing information) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by a basic platform. The infrastructure communicates with the outside through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs and FPGAs); the basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system, in which the machine uses formalized information to think about and solve problems according to an inference control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking and prediction.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, turning intelligent information decision-making into products and realizing practical applications. The main application fields include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
Several application scenarios of the present application are presented next.
Fig. 2a shows an image processing system according to an embodiment of the present application. The image processing system includes a user device and a data processing device. The user device may be an intelligent terminal such as a mobile phone, a personal computer or an information processing center. The user device is the initiating end of the image processing: as the initiator of an image enhancement request, the user usually initiates the request through the user device.
The data processing device may be a device or server having a data processing function, such as a cloud server, a network server, an application server or a management server. The data processing device receives the image enhancement request from the intelligent terminal through an interactive interface, and then performs image processing by means of machine learning, deep learning, searching, reasoning, decision-making and the like, using a memory for storing data and a processor for data processing. The memory in the data processing device may be a general term that includes local storage and a database storing historical data; the database may be located on the data processing device or on another network server.
In the image processing system shown in fig. 2a, a user device may receive an instruction of a user, for example, the user device may acquire an image input/selected by the user device, and then initiate a request to the data processing device, so that the data processing device executes an image enhancement processing application (for example, image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, and the like) on the image acquired by the user device, thereby acquiring a corresponding processing result for the image. For example, the user equipment may obtain an image input by a user, and then initiate an image denoising request to the data processing equipment, so that the data processing equipment performs image denoising on the image, thereby obtaining a denoised image.
In fig. 2a, a data processing apparatus may perform the image processing method of the embodiment of the present application.
Fig. 2b is another image processing system according to an embodiment of the present application, in fig. 2b, a user device directly serves as a data processing device, and the user device can directly obtain an input from a user and directly perform processing by hardware of the user device itself, and a specific process is similar to that in fig. 2a, and reference may be made to the above description, which is not repeated herein.
In the image processing system shown in fig. 2b, the user device may receive an instruction from the user, for example, the user device may acquire an image selected by the user in the user device, and then perform an image processing application (for example, image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, and the like) on the image by the user device itself, so as to obtain a corresponding processing result for the image.
In fig. 2b, the user equipment itself can execute the image processing method according to the embodiment of the present application.
Fig. 2c is a schematic diagram of a related apparatus for image processing provided in an embodiment of the present application.
The user device in fig. 2a and fig. 2b may specifically be the local device 301 or the local device 302 in fig. 2c, and the data processing device in fig. 2a may specifically be the execution device 210 in fig. 2c, where the data storage system 250 may store data to be processed of the execution device 210, and the data storage system 250 may be integrated on the execution device 210, or may be disposed on a cloud or other network server.
The processor in fig. 2a and 2b may perform data training/machine learning/deep learning through a neural network model or other models (e.g., models based on a support vector machine), and perform image processing application on the image using the model finally trained or learned by the data, so as to obtain a corresponding processing result.
Fig. 3a is a schematic diagram of an architecture of a system 100 according to an embodiment of the present application, in fig. 3a, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140, where the input data may include: each task to be scheduled, the resources that can be invoked, and other parameters.
During the process that the execution device 110 preprocesses the input data or during the process that the calculation module 111 of the execution device 110 performs the calculation (for example, performs the function implementation of the neural network in the present application), the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processing, and may store the data, the instruction, and the like obtained by corresponding processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate corresponding target models/rules based on different training data for different targets or different tasks, and the corresponding target models/rules may be used to achieve the targets or complete the tasks, so as to provide the user with the required results. Wherein the training data may be stored in the database 130 and derived from training samples collected by the data collection device 160.
In the case shown in fig. 3a, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data inputted to the I/O interface 112 and the output result outputted from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 3a is only a schematic diagram of a system architecture provided in this embodiment of the present application, and the position relationship between the devices, modules, etc. shown in the diagram does not constitute any limitation, for example, in fig. 3a, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110. As shown in fig. 3a, a neural network may be trained from the training device 120.
The embodiment of the application also provides a chip, which comprises the NPU. The chip may be provided in an execution device 110 as shown in fig. 3a to perform the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 3a to complete the training work of the training apparatus 120 and output the target model/rule.
A neural-network processing unit (NPU) is mounted as a coprocessor on a host central processing unit (CPU), and the host CPU assigns tasks to it. The core part of the NPU is an arithmetic circuit, and a controller controls the arithmetic circuit to fetch data from a memory (a weight memory or an input memory) and perform operations.
In some implementations, the arithmetic circuitry includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit is a two-dimensional systolic array. The arithmetic circuit may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory and carries out matrix operation with the matrix B, and partial results or final results of the obtained matrix are stored in an accumulator (accumulator).
The vector calculation unit may further process the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector computation unit may be used for network computation of the non-convolution/non-FC layer in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit can store the processed output vector to a unified buffer. For example, the vector calculation unit may apply a non-linear function to the output of the arithmetic circuit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to arithmetic circuitry, e.g., for use in subsequent layers in a neural network.
The unified memory is used for storing input data and output data.
A direct memory access controller (DMAC) is used to carry the input data in the external memory to the input memory and/or the unified memory, to store the weight data from the external memory in the weight memory, and to store the data in the unified memory in the external memory.
And the Bus Interface Unit (BIU) is used for realizing interaction among the main CPU, the DMAC and the instruction fetch memory through a bus.
An instruction fetch buffer (instruction fetch buffer) connected to the controller for storing instructions used by the controller;
and the controller is used for calling the instructions cached in the instruction fetch memory to control the working process of the operation accelerator.
Generally, the unified memory, the input memory, the weight memory, and the instruction fetch memory are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

output = f\left(\sum_{s=1}^{n} W_s x_s + b\right)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
The operation of each layer in the neural network can be described mathematically by the expression y = a(W·x + b). From the physical point of view, the work of each layer in the neural network can be understood as completing a transformation from the input space to the output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are performed by W·x, operation 4 is completed by +b, and operation 5 is realized by a(). The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the set of all individuals of that class of things. W is a weight vector, and each value in the vector represents the weight of one neuron in that layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, i.e., the weight W of each layer controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
Because it is desirable that the output of the neural network be as close as possible to the value that is actually expected to be predicted, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually expected target value and then adjusting the weight vectors according to the difference between them (of course, there is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the neural network). Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so the training of the neural network becomes a process of reducing this loss as much as possible.
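As a concrete illustration of the point above (assumed, not part of this application): two common loss functions and the values they produce for a toy prediction/target pair.

```python
import torch
import torch.nn.functional as F

# Illustrative only: the loss measures the difference between the predicted value
# and the target value; training aims to make this value as small as possible.
prediction = torch.tensor([0.8, 0.1, 0.3])
target = torch.tensor([1.0, 0.0, 0.5])
print(F.mse_loss(prediction, target))  # mean squared error
print(F.l1_loss(prediction, target))   # mean absolute error
```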
(2) Back propagation algorithm
In the training process, a neural network can use a back propagation (BP) algorithm to adjust the parameters of the initial neural network model so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error loss information so that the error loss converges. The back propagation algorithm is a back propagation process dominated by the error loss and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
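The following toy snippet illustrates the forward/backward/update cycle described above, assuming PyTorch autograd; the model (a single linear unit), the learning rate and the data are placeholders.

```python
import torch

w = torch.randn(3, requires_grad=True)       # parameters of a toy one-unit model
x, y = torch.randn(3), torch.tensor(0.7)     # one input sample and its target value
loss = (w @ x - y) ** 2                      # forward pass: compute the error loss
loss.backward()                              # back propagation: propagate the error-loss information
with torch.no_grad():
    w -= 0.01 * w.grad                       # update the parameters so that the loss decreases
```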
(3) Image enhancement
Image enhancement refers to processing the brightness, color, contrast, saturation, dynamic range, etc. of an image to meet certain specific criteria. In brief, in the process of image processing, by purposefully emphasizing the overall or local characteristics of an image, an original unclear image is made clear or certain interesting characteristics are emphasized, the difference between different object characteristics in the image is enlarged, and the uninteresting characteristics are inhibited, so that the effects of improving the image quality and enriching the image information quantity are achieved, the image interpretation and identification effects can be enhanced, and the requirements of certain special analysis are met. Exemplary image enhancements may include, but are not limited to, image super-resolution reconstruction, image denoising, image defogging, image deblurring, and image contrast enhancement.
(4) Image semantic segmentation
Image semantic segmentation refers to subdividing an image into different categories according to some rule (such as illumination or category). In brief, the goal of image semantic segmentation is to assign a label to each pixel in the image, that is, to annotate the object class to which each pixel belongs; the labels may include person, animal, car, flower, furniture, and so on. Referring to fig. 3b, fig. 3b is a schematic diagram of semantic segmentation of an image according to an embodiment of the present disclosure. As shown in fig. 3b, through semantic segmentation the image can be divided at the pixel level into different sub-regions by category, such as building, sky and plant.
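As a small illustration (the shapes and class names are assumptions and are not tied to fig. 3b): a segmentation network typically outputs one score map per class, and the per-pixel label is the class with the highest score.

```python
import torch

num_classes, height, width = 4, 3, 5                       # e.g. building, sky, plant, other
class_scores = torch.randn(1, num_classes, height, width)  # (batch, classes, H, W) score maps
label_map = class_scores.argmax(dim=1)                     # (batch, H, W): one class index per pixel
print(label_map)
```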
The method provided by the present application is described below from the training side of the neural network and the application side of the neural network.
The training method of the neural network provided by the embodiment of the application relates to image processing, and particularly can be applied to data processing methods such as data training, machine learning and deep learning, and the training data (such as the image in the application) is subjected to symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like, and a trained image processing model is finally obtained; in addition, the image processing method provided in the embodiment of the present application may use the trained image processing model to input data (e.g., an image to be processed in the present application) into the trained image processing model, so as to obtain output data (e.g., a target image in the present application). It should be noted that the training method of the image processing model and the image processing method provided in the embodiment of the present application are inventions based on the same concept, and can also be understood as two parts in a system or two stages of an overall process: such as a model training phase and a model application phase.
Referring to fig. 4, fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 4, an image processing method provided in an embodiment of the present application includes the following steps:
step 401, acquiring an image to be processed.
In this embodiment, the image processing apparatus may acquire an image to be processed, which may be, for example, an image that needs to be subjected to image enhancement.
It can be understood that, when the image processing device is deployed in the unmanned vehicle, the image processing device can acquire the street view collected by the unmanned vehicle during driving through the camera. When the image processing device is deployed in the robot, the image processing device can acquire a live-action image of the environment where the robot is located in real time. When the image processing device is deployed in a security device (such as a monitoring camera), the image processing device can acquire a live-action image acquired by the monitoring camera in real time. When the image processing device is deployed on a handheld device such as a mobile phone or a tablet computer, the image processing device can acquire a picture taken by a user or a picture downloaded from a website, and the pictures can be used as images to be processed.
Step 402, processing the image to be processed through a first network to obtain a first feature, wherein the first network is configured to extract at least a feature for image enhancement.
In this embodiment, the first network may be a backbone network related to image enhancement, for example a convolutional neural network, and the first network is configured to extract at least the features used for image enhancement, for example image low-level features. Illustratively, image low-level features may refer to small detail information in an image, and may include, for example, high-frequency detail information such as edges, corners, colors, pixels, gradients and textures.
It will be appreciated that different networks may be employed for different image enhancement tasks to suit the needs of the image enhancement task. For example, when the image enhancement task is image super-resolution reconstruction, the first Network may adopt a Residual Network (ResNet); when the image enhancement task is image contrast enhancement, the first network may adopt a Unet network.
Specifically, the image processing method provided by this embodiment may be applied to different image enhancement tasks, which may include, but are not limited to, image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement, and image dynamic range enhancement.
In a possible embodiment, before the image to be processed is processed through the first network, the image to be processed may be further preprocessed through a preprocessing network, so as to obtain a preprocessing feature. And then processing the obtained preprocessing characteristic through the first network to obtain the first characteristic. The preprocessing network may be, for example, a convolutional neural network. By preprocessing the image to be processed, irrelevant information in the image to be processed can be eliminated, useful real information can be recovered, the detectability of relevant information is enhanced, data is simplified to the maximum extent, and therefore the reliability of feature extraction is improved.
It is understood that the preprocessing network may also be included in the first network, that is, the preprocessing network is included in the first network, and the first feature may be obtained by processing the image to be processed through the first network.
Step 403, processing the image to be processed through a second network to obtain a second feature, where the second network is configured to extract at least a semantic segmentation feature.
In this embodiment, the second network may be a backbone network related to image semantic segmentation, for example a convolutional neural network, and is configured to extract at least the features used for image semantic segmentation, for example image high-level features. Illustratively, image high-level features refer to features that reflect the semantic information of an image and are built on image low-level features. Generally, image high-level features can be used to identify and detect objects or object shapes in an image, and carry richer semantic information.
In one possible embodiment, the second network may be, for example, a densely connected dilated convolution network (dilated convolution is also called hole or atrous convolution). The dilated convolutions enlarge the receptive field, and the dense connections allow the network to aggregate multi-scale information. Through the combined action of the two, the image high-level feature information required for accurate semantic segmentation can be generated.
A hole convolution network introduces a dilation rate (also called an expansion rate or number of holes) into a standard convolution; this parameter defines the spacing between the values sampled by the convolution kernel, and thereby enlarges the receptive field. Generally, the receptive field indicates how large a region of the original image each neuron inside the network responds to, or in other words, the size of the area on the original image that is mapped to a pixel on the feature map (feature map) output by each layer of the convolutional network. By increasing the receptive field, a pixel on the feature map can be made to respond to a sufficiently large area of the image and thus capture information about large objects, so that accurate semantic information can be obtained.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a densely connected hole convolution network according to an embodiment of the present application. As shown in fig. 5, the densely connected hole convolution network includes multiple layers, and each layer includes a hole convolution (dilated conv) and a nonlinear activation function. For each layer in the densely connected hole convolution network, the output of that layer is used as an input of every later layer, so as to realize feature reuse. By passing the features of a lower layer directly to every subsequent higher layer for aggregation, the feature loss caused by transmission through intermediate layers is reduced, and the features of the lower layers are better utilized.
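For illustration only, a minimal PyTorch sketch of such a densely connected stack of hole (dilated) convolutions is given below; the number of layers, channel widths, dilation rates and the leaky-ReLU activation are assumptions chosen for the example, not a description of the patented network.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Densely connected dilated convolutions: each layer receives the
    concatenation of the block input and all previous layer outputs."""
    def __init__(self, channels=32, growth=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for d in dilations:
            self.layers.append(nn.Sequential(
                # padding = dilation keeps the spatial size for a 3x3 kernel;
                # the effective kernel extent of one layer is 2*d + 1 pixels
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=d, dilation=d),
                nn.LeakyReLU(0.2, inplace=True),
            ))
            in_ch += growth  # dense connection: later layers see all earlier features

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            out = layer(torch.cat(feats, dim=1))  # feature reuse across layers
            feats.append(out)
        return torch.cat(feats, dim=1)
```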
It is understood that, in the embodiment of the present application, the second network is exemplified as a hole convolution network with dense connections, and in practical cases, the second network may also be another neural network, which is not specifically limited herein.
In a possible embodiment, the processing the image to be processed through the second network to obtain the second feature specifically may include: the image processing apparatus pre-processes the image to be processed to obtain a pre-processing characteristic, for example, pre-processes the image to be processed through the pre-processing network in the step 402; performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic; processing the downsampled features through the second network to obtain sixth features; and performing upsampling processing on the sixth feature to obtain a second feature of the image to be processed.
In this embodiment, down-sampling the preprocessing feature yields a down-sampled feature with reduced resolution; the down-sampled feature is then processed through the second network to obtain the sixth feature; finally, the sixth feature is up-sampled to generate a second feature with the same resolution as the preprocessing feature, that is, the resolution of the feature is restored.
In practical applications, the down-sampling multiple may be determined according to the desired processing accuracy and the computing power of the target hardware platform, and is not specifically limited herein. Generally speaking, the larger the down-sampling multiple, the lower the processing accuracy but the smaller the amount of computation, i.e. the lower the computing-power requirement; the smaller the down-sampling multiple, the higher the processing accuracy but the larger the amount of computation, i.e. the higher the computing-power requirement. The up-sampling multiple needs to be consistent with the down-sampling multiple so that the resolution of the feature is restored. Methods that can be used for up-sampling include, but are not limited to, deconvolution, bilinear interpolation up-sampling and nearest-neighbor interpolation up-sampling, and the up-sampling method is not specifically limited herein.
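As an illustration of the down-sample/process/up-sample flow described above, the following sketch uses average pooling and bilinear interpolation, which are only two of the options listed; the function and parameter names are invented for the example.

```python
import torch.nn.functional as F

def extract_semantic_feature(pre_feat, second_network, k=4):
    """Down-sample the preprocessing feature, run the semantic branch at low
    resolution, then restore the resolution (sixth feature -> second feature)."""
    down_feat = F.avg_pool2d(pre_feat, kernel_size=k)             # k x k average pooling
    sixth_feat = second_network(down_feat)                        # e.g. a dense dilated block
    second_feat = F.interpolate(sixth_feat, scale_factor=k,       # same multiple as the
                                mode='bilinear', align_corners=False)  # down-sampling step
    return second_feat
```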
And 404, generating a third feature according to the first feature and the second feature.
In a possible embodiment, the image processing apparatus may perform a feature fusion process on the first feature and the second feature to obtain the third feature. Illustratively, the feature fusion process may include at least one of fusion process operations such as a summation process, a multiplication process, a cascade process, and a cascade convolution process, wherein the cascade convolution process represents performing the cascade process and the convolution process. In practical situations, a corresponding feature fusion processing manner may be adopted according to actual needs, and is not specifically limited herein.
In this embodiment, by performing fusion processing on the first feature and the second feature, the image low-level feature related to image enhancement and the image high-level feature related to semantic segmentation can be effectively fused, so that complementation of features of different levels is realized, and the robustness of the network is improved.
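The fusion operations mentioned above could, for example, be realized as follows; the channel counts and the use of a 1x1 convolution for the cascade-convolution variant are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConcatConvFusion(nn.Module):
    """Cascade-convolution fusion: concatenate along channels, then a 1x1
    convolution mixes the enhancement-oriented and semantic features."""
    def __init__(self, ch_a, ch_b, ch_out):
        super().__init__()
        self.mix = nn.Conv2d(ch_a + ch_b, ch_out, kernel_size=1)

    def forward(self, a, b):
        return self.mix(torch.cat([a, b], dim=1))

# Other fusion choices named in the text (shapes must match for the first two):
# fused = a + b                       # summation
# fused = a * b                       # element-wise multiplication
# fused = torch.cat([a, b], dim=1)    # plain cascade (concatenation)
```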
And 405, acquiring a semantic segmentation result of the image to be processed.
In a possible embodiment, the image processing device may process the third feature to obtain a semantic segmentation result of the image to be processed. The third feature is a feature after fusing the image low-level feature related to image enhancement and the image high-level feature related to semantic segmentation. Therefore, by processing the third feature, the lower-level features of the image can be introduced on the basis of the higher-level features of the image related to semantic segmentation, that is, the semantic segmentation result of the image to be processed is obtained on the basis of the features of different levels, so that the accuracy of the obtained semantic segmentation result is improved. Illustratively, the image processing apparatus may perform a convolution operation on the third feature through a convolution network to obtain a semantic segmentation result of the image to be processed.
In another possible embodiment, the image processing device may process the second feature output by the second network to obtain a semantic segmentation result of the image to be processed, that is, obtain the semantic segmentation result of the image directly based on the feature related to the semantic segmentation. Illustratively, the image processing apparatus may also perform a convolution operation on the second feature through a convolution network to obtain a semantic segmentation result of the image to be processed.
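For example, a semantic segmentation result can be predicted from the fused (or semantic) feature with a small convolutional head such as the sketch below; the number of classes is an assumption.

```python
import torch.nn as nn

def make_segmentation_head(in_channels, num_classes=21):
    """A 1x1 convolution producing per-pixel class scores; an argmax over the
    channel dimension yields the semantic label map."""
    return nn.Conv2d(in_channels, num_classes, kernel_size=1)

# head = make_segmentation_head(in_channels=128)
# logits = head(third_feature)        # (N, num_classes, H, W)
# seg_map = logits.argmax(dim=1)      # (N, H, W) semantic segmentation result
```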
And 406, generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed.
In a possible embodiment, the image processing apparatus may perform feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature. The manner of the feature fusion processing may specifically refer to that described in step 404, and is not described herein again.
In a possible embodiment, after obtaining the third feature, the image processing apparatus may process the third feature, for example, perform further feature extraction on the third feature to obtain a fifth feature. And the image processing device generates a fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed. That is to say, after feature extraction is performed on the third feature, the image processing device performs feature fusion based on a fifth feature obtained by further feature extraction and a semantic segmentation result of the image to be processed. By further extracting the third feature obtained after feature fusion, the feature with finer granularity can be extracted on the basis of the third feature, so that the precision of the fourth feature obtained by subsequent feature fusion is improved.
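A possible sketch of this refinement step is shown below; the 3x3 kernel, the channel widths and the use of concatenation for the second fusion are assumptions.

```python
import torch
import torch.nn as nn

refine = nn.Sequential(                             # further feature extraction:
    nn.Conv2d(128, 64, kernel_size=3, padding=1),   # third feature -> fifth feature
    nn.LeakyReLU(0.2, inplace=True),
)
# fifth = refine(third_feature)                     # finer-grained feature
# fourth = torch.cat([fifth, seg_logits], dim=1)    # fuse with the segmentation result
```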
And 407, carrying out image reconstruction on the fourth characteristic to obtain a target image.
In this embodiment, the image processing apparatus obtains the fourth feature after the two feature fusion processes, and the image processing apparatus may perform image reconstruction on the fourth feature, for example, perform convolution post-processing operation on the fourth feature, to obtain a target image, where the target image is an image obtained after image enhancement.
In the embodiment, in the image enhancement processing process, the semantic segmentation features and the semantic segmentation results are fused with the related image enhancement features through two times of feature fusion processing, so that feature information complementation is realized, different image enhancement strengths can be adopted for different semantic regions, texture details are accurately kept, and the reality of the texture details after image enhancement is improved.
It should be understood that the execution subjects (i.e., image processing apparatuses) of steps 401 to 407 may be terminal devices, or may be servers on the cloud side, and steps 401 to 407 may also be obtained by performing data processing and interaction between the terminal devices and the servers.
For ease of understanding, how the image processing method provided in the present embodiment achieves image defogging will be described in detail below with reference to specific examples.
Referring to fig. 6a and fig. 6b, fig. 6a is a schematic diagram of an architecture for image processing according to an embodiment of the present disclosure; fig. 6b is a schematic diagram of a network structure for image processing according to an embodiment of the present disclosure. As shown in fig. 6a and 6b, the architecture may include:
a pre-processing unit 100 for receiving the fogged low contrast image and pre-processing the image to generate pre-processed features F. The preprocessing unit 100 may be, for example, a convolution network, and generates the preprocessed features F by performing a convolution operation on the received image (e.g., 12 megapixels, 3000 × 4000 resolution image), where the resolution of the preprocessed features F is the same as the resolution of the image, i.e., 3000 × 4000 resolution.
A first feature extraction unit 101, configured to perform feature extraction on the preprocessed feature F, for example, to extract image low-level features from the preprocessed feature F to obtain a first feature F_L. The first feature extraction unit 101 may employ a backbone network related to the defogging task, for example a multi-stage cascade of convolution + Instance Normalization (IN) blocks. The IN layers learn a highly nonlinear contrast normalization effect, so that the final prediction is not affected by deviations of the image in brightness, color, style and other appearance attributes, and the compatibility between the extracted image low-level features and the subsequently extracted image high-level features is improved. The number of cascaded stages N may be determined according to the desired processing accuracy and the computing power of the target hardware platform; generally, the larger N is, the higher the accuracy of feature extraction and the larger the amount of computation.
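A cascaded convolution + instance normalization stage of the kind described here might be written as follows; the cascade depth N and channel width are assumptions.

```python
import torch.nn as nn

def make_first_feature_extractor(channels=64, num_stages=4):
    """N cascaded stages of 3x3 convolution + instance normalization (IN) +
    activation, used to extract the low-level, enhancement-oriented feature F_L."""
    stages = []
    for _ in range(num_stages):
        stages += [
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),   # IN normalizes appearance (brightness/contrast/style)
            nn.LeakyReLU(0.2, inplace=True),
        ]
    return nn.Sequential(*stages)
```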
A down-sampling unit 200, configured to perform down-sampling processing on the preprocessed feature F to obtain a down-sampled feature F_down with reduced resolution. Illustratively, the down-sampling unit 200 may be, for example, a pooling layer, and may perform k x k average pooling (average pooling) on the preprocessed feature F to obtain the down-sampled feature F_down. The down-sampling factor k may be determined according to the desired processing accuracy and the computing power of the target hardware platform; generally, the smaller k is, the higher the accuracy of feature extraction and the larger the amount of computation. Illustratively, k may take a value of 4, i.e. the width and height of the feature are simultaneously down-sampled by a factor of 4.
A second feature extraction unit 201, configured to perform feature extraction on the down-sampled feature F_down, for example, to extract image high-level features from F_down to obtain a sixth feature F_down-seg. Illustratively, the second feature extraction unit 201 may be a densely connected hole convolution network.
An up-sampling unit 202, configured to perform up-sampling processing on the sixth feature F_down-seg to obtain a second feature F_H-seg with the same resolution as the original input image. The sampling method adopted by the up-sampling unit 202 may be, for example, deconvolution, bilinear interpolation up-sampling or nearest-neighbor interpolation up-sampling, and its up-sampling multiple is the same as the down-sampling multiple.
A first feature fusion unit 102, configured to perform feature fusion processing on the first feature F_L and the second feature F_H-seg, for example, to perform a cascade (concatenation) operation on the first feature F_L and the second feature F_H-seg to obtain a fused third feature F_fusion1.
A third feature extraction unit 103, configured to perform further feature extraction on the third feature F_fusion1, for example, to perform convolution processing on the third feature F_fusion1 through a convolutional network so as to extract a finer-grained feature, namely a fifth feature F_fine.
A semantic result prediction unit 203, configured to predict the semantic segmentation result from the third feature F_fusion1, for example, to post-process the third feature F_fusion1 through a convolutional network to obtain the semantic segmentation result corresponding to the input image.
A second feature fusion unit 104, configured to perform feature fusion processing on the fifth feature F_fine and the semantic segmentation result corresponding to the input image, for example, to perform a cascade operation on the fifth feature F_fine and the semantic segmentation result corresponding to the input image to obtain a fused fourth feature F_fusion2.
An image reconstruction unit 105, configured to perform image reconstruction processing on the fused fourth feature F_fusion2, for example, to perform a post-processing operation on the fourth feature F_fusion2 through a convolutional network to obtain the defogged image.
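Putting the units 100-105 and 200-203 together, one possible way to compose the forward pass of fig. 6a/6b is sketched below; every sub-module is a simplified stand-in (single convolutions instead of the full backbones), and the channel widths, class count and pooling factor are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DehazeNet(nn.Module):
    """Illustrative composition of the units of fig. 6a/6b (not the exact patented networks)."""
    def __init__(self, ch=64, num_classes=21, k=4):
        super().__init__()
        self.k = k
        self.pre = nn.Conv2d(3, ch, 3, padding=1)                    # unit 100 -> F
        self.first = nn.Sequential(                                  # unit 101 -> F_L (conv + IN)
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch),
            nn.LeakyReLU(0.2, inplace=True))
        self.second = nn.Sequential(                                 # unit 201 (stand-in for the
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),             # dense dilated network)
            nn.LeakyReLU(0.2, inplace=True))
        self.refine = nn.Conv2d(2 * ch, ch, 3, padding=1)            # unit 103 -> F_fine
        self.seg_head = nn.Conv2d(2 * ch, num_classes, 1)            # unit 203
        self.recon = nn.Conv2d(ch + num_classes, 3, 3, padding=1)    # unit 105

    def forward(self, x):
        f = self.pre(x)
        f_l = self.first(f)                                          # low-level feature F_L
        f_down = F.avg_pool2d(f, self.k)                             # unit 200 -> F_down
        f_h = F.interpolate(self.second(f_down), scale_factor=self.k,
                            mode='bilinear', align_corners=False)    # unit 202 -> F_H-seg
        fusion1 = torch.cat([f_l, f_h], dim=1)                       # unit 102 -> F_fusion1
        seg = self.seg_head(fusion1)                                 # semantic segmentation result
        f_fine = self.refine(fusion1)
        fusion2 = torch.cat([f_fine, seg], dim=1)                    # unit 104 -> F_fusion2
        return self.recon(fusion2), seg                              # defogged image + segmentation
```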
Taking the method of the present embodiment as an example for image defogging, the present embodiment performs a test on the open-source simulation data set to compare the method with the existing defogging algorithm.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a comparison of objective indexes provided in the embodiments of the present application. As can be seen from fig. 7, compared to various existing defogging algorithms, the image processing method provided in the embodiment of the present application has a higher Peak Signal to Noise Ratio (PSNR) and Structural SIMilarity (SSIM).
PSNR is an engineering term representing the ratio of the maximum possible power of a signal and the power of destructive noise affecting its representation accuracy. PSNR is generally used as a method for measuring signal reconstruction quality in the field of image processing and the like, and is generally defined by mean square error. Generally, the higher the PSNR, the smaller the difference from the true value.
SSIM is an index for measuring the similarity between two images, and the similarity of images is evaluated mainly based on brightness (luminance), contrast (contrast), and structure (structure).
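For reference, PSNR can be computed from the mean square error as in the sketch below (an 8-bit peak value of 255 is assumed); SSIM is typically taken from an image-processing library.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / mse)

# SSIM compares luminance, contrast and structure; e.g. with scikit-image:
# from skimage.metrics import structural_similarity
# score = structural_similarity(reference, test, channel_axis=-1)
```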
In addition, under the condition that different levels of noise are added to the open source simulation data set, compared with the existing defogging method, the method has higher PSNR and SSIM, namely the method has stronger robustness and stability. Specifically, referring to fig. 8, fig. 8 is a schematic diagram illustrating another objective index comparison provided in the embodiments of the present application.
The method was also tested on a real data set containing fog, and it can obtain clearer and more transparent results without artifact distortion. Existing defogging algorithms either do not remove enough fog, so that the contrast of the defogged image remains low, or remove too much fog, losing texture details in some local areas. Specifically, referring to fig. 9, fig. 9 is a schematic image contrast diagram provided in the embodiment of the present application. As can be seen from fig. 9, the method of the present embodiment (lower right corner) preserves the details of the green plants, the floor, the ground and other areas well while removing the fog, and the picture is transparent and natural with the best visual effect.
Referring to fig. 10, fig. 10 is a schematic flowchart of a model training method according to an embodiment of the present disclosure. As shown in fig. 10, a model training method provided in the embodiment of the present application includes the following steps:
step 1001, a training sample pair is obtained, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image.
In this embodiment, before the image training apparatus performs model training, a pair of training samples may be obtained. The first image and the second image are two images in the same scene, and the image quality of the first image is lower than that of the second image. Image quality refers to one or more of color, brightness, saturation, contrast, dynamic range, resolution, texture detail, sharpness, etc. For example, the first image is an image with fog, the second image is an image without fog, and the first image has lower brightness, contrast, definition and the like than the second image.
Step 1002, processing the first image through an image processing model to be trained to obtain a predicted image, wherein the image processing model to be trained is used for obtaining an image to be processed; processing the first image through a first network resulting in first features, the first network configured to extract at least features for image enhancement; processing the first image through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the first image; generating a fourth feature according to the third feature and the semantic segmentation result of the first image; and carrying out image reconstruction on the fourth feature to obtain a predicted image.
Step 1003, obtaining a first loss according to the second image in the training sample pair and the predicted image, where the first loss is used to describe a difference between the second image and the predicted image.
In this embodiment, after obtaining the predicted image, a first loss corresponding to the second image and the predicted image may be obtained based on a preset loss function to determine a difference between the second image and the predicted image.
In a possible implementation manner, the first loss corresponding to the second image and the predicted image may be obtained based on a reconstruction loss function (reconstruction loss) and a gradient loss function (gradient loss), so as to ensure that the enhanced image can meet the objective index and subjective index requirements.
Illustratively, the reconstruction loss function may be a pixel-level loss between the second image and the predicted image computed with the L1 norm. The reconstruction loss function may be as shown in formula 1:

$L_{rec} = \frac{1}{P}\sum_{p=1}^{P}\left\| GT_{p} - output_{p} \right\|_{1}$   (formula 1)

where L_rec represents the reconstruction loss, ||·||_1 represents the L1 norm, GT represents the true value, i.e. the values of the pixels of the second image, output represents the values of the pixels of the predicted image, and P is the number of pixels. In other words, the difference between the value of each pixel of the second image and the value of the corresponding pixel of the predicted image is computed, and the absolute values of these differences are accumulated over all pixels.
Illustratively, the gradient loss function may represent the loss between the average gradients of the predicted image and the second image in the x/y directions. The gradient loss function may be as shown in formula 2:

$L_{grad} = \left\| grad(GT) - grad(output) \right\|_{1}$   (formula 2)

where L_grad represents the gradient loss, ||·||_1 represents the L1 norm, GT represents the true value, i.e. the values of the pixels of the second image, output represents the values of the pixels of the predicted image, and grad(·) represents the average gradient of the image in the x/y directions.
Based on the reconstruction loss function and the gradient loss function, a first loss corresponding to the second image and the predicted image can be obtained. Illustratively, the function for finding the first loss may be as shown in equation 3:
$L_{total} = L_{rec} + \alpha \cdot L_{grad}$   (formula 3)

where L_total represents the first loss, and α is a hyperparameter used to adjust the weight of the gradient loss.
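A direct PyTorch transcription of formulas 1 to 3 might look like the sketch below; the finite-difference form of grad(·) and the value of α are assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_xy(img):
    """Finite-difference gradients of an image batch in the x and y directions."""
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return gx, gy

def first_loss(output, gt, alpha=0.1):
    """Formula 3: reconstruction (L1) loss plus weighted gradient loss."""
    l_rec = F.l1_loss(output, gt)                              # formula 1 (mean over pixels)
    gx_o, gy_o = gradient_xy(output)
    gx_t, gy_t = gradient_xy(gt)
    l_grad = F.l1_loss(gx_o, gx_t) + F.l1_loss(gy_o, gy_t)     # formula 2
    return l_rec + alpha * l_grad
```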
And 1004, updating the model parameters of the image processing model to be trained at least according to the first loss until model training conditions are met to obtain the image processing model.
The image processing model obtained after the training in step 1004 may refer to the description in the embodiment corresponding to fig. 4, and is not described herein again.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to process the third feature through a third network, so as to obtain a semantic segmentation prediction result of the first image.
Optionally, in a possible implementation manner, the image processing model to be trained is further configured to: obtaining a semantic segmentation real result of the first image; obtaining a second loss according to the semantic segmentation prediction result and the semantic segmentation real result, wherein the second loss is used for describing the difference between the semantic segmentation prediction result and the semantic segmentation real result; and updating the model parameters of the image processing model to be trained at least according to the first loss and the second loss until model training conditions are met, so as to obtain the image processing model.
In other words, in the model training process, the semantic segmentation loss function can be used to perform constraint control on the semantic segmentation prediction result of the first image, so that the semantic segmentation result generated by the model can be more accurate.
For example, the semantic segmentation loss function for obtaining the second loss of the semantic segmentation predicted result and the semantic segmentation true result may be a cross entropy loss function. The semantic segmentation loss function may be, for example, as shown in equation 4:
$L_{seg} = -\frac{1}{P}\sum_{i=1}^{P}\sum_{z} \hat{s}_{i}^{\,z} \log\!\left( s_{i}^{\,z} \right)$   (formula 4)

where L_seg is the second loss, P is the number of pixels of the image, $s_{i}^{z}$ represents the probability, given by the semantic segmentation prediction result, that pixel i belongs to semantic class z, $\hat{s}_{i}^{z}$ represents the corresponding probability given by the semantic segmentation real result, and log(·) represents the logarithm.
The model parameters of the image processing model to be trained may then be updated according to a third loss, which combines the above losses, until the model training condition is met, so as to obtain the image processing model.
For example, the formula for finding the third loss may be as shown in formula 5:
$L_{total} = L_{rec} + \alpha \cdot L_{seg} + \beta \cdot L_{grad}$   (formula 5)

where L_total represents the third loss, α is a first hyperparameter, β is a second hyperparameter, and α and β are used to adjust the weights of the semantic segmentation loss and the gradient loss, respectively.
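Formula 4 is a standard per-pixel cross entropy, and formula 5 combines it with the reconstruction and gradient losses. A possible sketch is given below; the values of α and β and the logits/label tensor layout are assumptions.

```python
import torch.nn.functional as F

def third_loss(output, gt_image, seg_logits, seg_labels, alpha=0.1, beta=0.1):
    """Formula 5: reconstruction + weighted segmentation + weighted gradient loss."""
    l_rec = F.l1_loss(output, gt_image)                              # formula 1
    l_seg = F.cross_entropy(seg_logits, seg_labels)                  # formula 4 (per-pixel CE)
    gx_o = output[:, :, :, 1:] - output[:, :, :, :-1]
    gy_o = output[:, :, 1:, :] - output[:, :, :-1, :]
    gx_t = gt_image[:, :, :, 1:] - gt_image[:, :, :, :-1]
    gy_t = gt_image[:, :, 1:, :] - gt_image[:, :, :-1, :]
    l_grad = F.l1_loss(gx_o, gx_t) + F.l1_loss(gy_o, gy_t)           # formula 2
    return l_rec + alpha * l_seg + beta * l_grad
```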
Optionally, in a possible implementation manner, the image processing model to be trained is further configured to: performing feature fusion processing on the first feature and the second feature to obtain a third feature; generating a fourth feature according to the third feature and the semantic segmentation result of the first image, wherein the generating of the fourth feature comprises: and performing feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion process includes at least one of a summation process, a multiplication process, a cascade process, and a cascade convolution process.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to process the third feature to obtain a fifth feature; and generate the fourth feature according to the fifth feature and the semantic segmentation result of the first image.
Optionally, in a possible implementation manner, the to-be-trained image processing model is further configured to pre-process the first image to obtain a pre-processing feature; performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic; processing the downsampled features through the second network to obtain sixth features; and performing upsampling processing on the sixth feature to obtain a second feature of the first image.
Optionally, in a possible implementation manner, the image processing model is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in fig. 11, an image processing apparatus according to an embodiment of the present application includes: an acquisition unit 1101 and a processing unit 1102; the acquiring unit 1101 is configured to acquire an image to be processed; the processing unit 1102 is configured to process the image to be processed through a first network to obtain a first feature, where the first network is configured to extract at least a feature for image enhancement; processing the image to be processed through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature; generating a third feature according to the first feature and the second feature; the obtaining unit 1101 is further configured to obtain a semantic segmentation result of the image to be processed; the processing unit 1102 is further configured to generate a fourth feature according to the third feature and a semantic segmentation result of the image to be processed; and carrying out image reconstruction on the fourth characteristic to obtain a target image.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to process the third feature through a third network, so as to obtain a semantic segmentation result of the image to be processed.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to perform feature fusion processing on the first feature and the second feature to obtain the third feature; and performing feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion process includes at least one of a summation process, a multiplication process, a cascade process, and a cascade convolution process.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to process the third feature to obtain a fifth feature; and generating a fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to perform preprocessing on the image to be processed to obtain a preprocessing feature; performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic; processing the downsampled features through the second network to obtain sixth features; and performing upsampling processing on the sixth feature to obtain a second feature of the image to be processed.
Optionally, in a possible implementation manner, the image processing apparatus is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a model training device according to an embodiment of the present disclosure. As shown in fig. 12, an embodiment of the present application provides a model training apparatus, including: an acquisition unit 1201 and a training unit 1202; the acquisition unit 1201 is configured to acquire a training sample pair, where the training sample pair includes a first image and a second image, and the quality of the first image is lower than that of the second image; the training unit 1202 is configured to process the first image through an image processing model to be trained to obtain a predicted image, wherein the image processing model to be trained is used for acquiring an image to be processed; processing the first image through a first network to obtain a first feature, the first network configured to extract at least features for image enhancement; processing the first image through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the first image; generating a fourth feature according to the third feature and the semantic segmentation result of the first image; and carrying out image reconstruction on the fourth feature to obtain the predicted image; the training unit 1202 is further configured to obtain a first loss according to the second image in the training sample pair and the predicted image, wherein the first loss is used for describing the difference between the second image and the predicted image; and update the model parameters of the image processing model to be trained at least according to the first loss until a model training condition is met, so as to obtain the image processing model.
Optionally, in a possible implementation manner, the training unit 1202 is further configured to process the third feature through a third network to obtain a semantic segmentation prediction result of the first image.
Optionally, in a possible implementation manner, the training unit 1202 is further configured to obtain a semantic segmentation real result of the first image; obtaining a second loss according to the semantic segmentation prediction result and the semantic segmentation real result, wherein the second loss is used for describing the difference between the semantic segmentation prediction result and the semantic segmentation real result; and updating the model parameters of the image processing model to be trained at least according to the first loss and the second loss until model training conditions are met, so as to obtain the image processing model.
Optionally, in a possible implementation manner, the training unit 1202 is further configured to perform feature fusion processing on the first feature and the second feature to obtain the third feature; and perform feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.
Optionally, in a possible implementation manner, the feature fusion process includes at least one of a summation process, a multiplication process, a cascade process, and a cascade convolution process.
Optionally, in a possible implementation manner, the training unit 1202 is further configured to process the third feature to obtain a fifth feature; and generate the fourth feature according to the fifth feature and the semantic segmentation result of the first image.
Optionally, in a possible implementation manner, the training unit 1202 is further configured to perform preprocessing on the first image to obtain a preprocessing feature; performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic; processing the downsampled features through the second network to obtain sixth features; and performing upsampling processing on the sixth feature to obtain a second feature of the first image.
Optionally, in a possible implementation manner, the image processing model is configured to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an execution device provided in the embodiment of the present application, and the execution device 1300 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, and the like, which is not limited herein. The image processing apparatus described in the embodiment corresponding to fig. 11 may be deployed on the execution device 1300 to implement the image processing function. Specifically, the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (where the number of processors 1303 in the execution device 1300 may be one or more, and one processor is taken as an example in fig. 13), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected by a bus or other means.
The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include non-volatile random access memory (NVRAM). The memory 1304 stores operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1303 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiment of the present application may be applied to the processor 1303, or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method may be implemented by hardware integrated logic circuits in the processor 1303 or instructions in the form of software. The processor 1303 may be a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor or a microcontroller, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 1303 may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 1304, and the processor 1303 reads information in the memory 1304 and completes the steps of the method in combination with hardware thereof.
The receiver 1301 may be used to receive input numeric or character information and generate signal inputs related to performing device related settings and function control. The transmitter 1302 may be used to output numeric or character information through a first interface; the transmitter 1302 may also be used to send instructions to the disk groups through the first interface to modify data in the disk groups; the transmitter 1302 may also include a display device such as a display screen.
In this embodiment, in one case, the processor 1303 is configured to execute an image processing method executed by the execution device in the corresponding embodiment of fig. 4.
Referring to fig. 14, fig. 14 is a schematic structural diagram of the training device provided in the embodiment of the present application. Specifically, the training device 1400 is implemented by one or more servers, and may vary considerably depending on its configuration or performance; it may include one or more central processing units (CPUs) 1414 (e.g., one or more processors), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing an application 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transient or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the training device. Still further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations stored in the storage medium 1430.
Training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input-output interfaces 1458, or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In particular, the training apparatus may perform the steps in the embodiment corresponding to fig. 10.
Embodiments of the present application also provide a computer program product, which when executed on a computer causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
Also provided in an embodiment of the present application is a computer-readable storage medium, in which a program for signal processing is stored, and when the program is run on a computer, the program causes the computer to execute the steps executed by the aforementioned execution device, or causes the computer to execute the steps executed by the aforementioned training device.
The execution device, the training device, or the terminal device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer execution instructions stored by the storage unit to cause the chip in the execution device to execute the data processing method described in the above embodiment, or to cause the chip in the training device to execute the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, please refer to fig. 15, where fig. 15 is a schematic structural diagram of a chip provided in the embodiment of the present application, the chip may be represented as a neural network processor NPU 1500, and the NPU 1500 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 1503, and the controller 1504 controls the arithmetic circuit 1503 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1503 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 1501 and performs matrix operation with the matrix B, and partial or final results of the obtained matrix are stored in an accumulator (accumulator) 1508.
The unified memory 1506 is used to store input data and output data. The weight data is transferred directly into the weight memory 1502 through a Direct Memory Access Controller (DMAC) 1505. The input data is also carried into the unified memory 1506 by the DMAC.
The bus interface unit (BIU) 1510 is used for interaction among the AXI bus, the DMAC and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 is used by the instruction fetch buffer 1509 to fetch instructions from an external memory, and is also used by the memory access controller 1505 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1506 or to transfer weight data into the weight memory 1502 or to transfer input data into the input memory 1501.
The vector calculation unit 1507 includes a plurality of operation processing units, and performs further processing on the output of the operation circuit 1503 if necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolution/fully-connected layer network computation in the neural network, such as batch normalization (Batch Normalization), pixel-level summation, and up-sampling of feature planes.
In some implementations, the vector calculation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear function or a non-linear function to the output of the arithmetic circuit 1503, for example to linearly interpolate the feature planes extracted by the convolutional layers, or to accumulate vectors of values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example for use in subsequent layers of the neural network.
An instruction fetch buffer 1509, connected to the controller 1504, is used for storing instructions used by the controller 1504;
the unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch memory 1509 are all On-Chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above programs.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including special-purpose integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions may be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, the implementation of a software program is more preferable. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, an exercise device, or a network device) to execute the method according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a training device or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Claims (17)

1. An image processing method, comprising:
acquiring an image to be processed;
processing the image to be processed through a first network to obtain a first feature, wherein the first network is configured to extract at least a feature for image enhancement;
processing the image to be processed through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature;
generating a third feature according to the first feature and the second feature;
obtaining a semantic segmentation result of the image to be processed;
generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed;
and carrying out image reconstruction on the fourth characteristic to obtain a target image.
2. The image processing method according to claim 1, wherein the obtaining of the semantic segmentation result of the image to be processed comprises:
and processing the third features through a third network to obtain a semantic segmentation result of the image to be processed.
3. The image processing method according to claim 1 or 2, wherein the generating a third feature from the first feature and the second feature comprises:
performing feature fusion processing on the first feature and the second feature to obtain a third feature;
generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed, wherein the fourth feature comprises:
and performing feature fusion processing on the third feature and the semantic segmentation result of the image to be processed to obtain the fourth feature.
4. The image processing method according to claim 3, wherein the feature fusion process includes at least one of a summation process, a multiplication process, a concatenation process, and a concatenation convolution process.
5. The image processing method according to any one of claims 1 to 4, wherein before generating a fourth feature according to the third feature and a semantic segmentation result of the image to be processed, the method further comprises:
processing the third characteristic to obtain a fifth characteristic;
generating a fourth feature according to the third feature and the semantic segmentation result of the image to be processed, wherein the fourth feature comprises: and generating a fourth feature according to the fifth feature and the semantic segmentation result of the image to be processed.
6. The image processing method according to any one of claims 1 to 5, wherein the processing the image to be processed through the second network to obtain a second feature comprises:
preprocessing the image to be processed to obtain preprocessing characteristics;
performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic;
processing the downsampled features through the second network to obtain sixth features;
and performing upsampling processing on the sixth feature to obtain a second feature of the image to be processed.
7. The image processing method according to any of claims 1 to 6, wherein the method is used to implement at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
8. A method of model training, comprising:
acquiring a training sample pair, wherein the training sample pair comprises a first image and a second image, and the quality of the first image is lower than that of the second image;
processing the first image through an image processing model to be trained to obtain a predicted image, wherein the image processing model to be trained is used for obtaining an image to be processed; processing the first image through a first network resulting in first features, the first network configured to extract at least features for image enhancement; processing the first image through a second network to obtain a second feature, wherein the second network is configured to extract at least a semantic segmentation feature; generating a third feature according to the first feature and the second feature; obtaining a semantic segmentation result of the first image; generating a fourth feature according to the third feature and the semantic segmentation result of the first image; carrying out image reconstruction on the fourth feature to obtain a predicted image;
obtaining a first loss according to a second image in the training sample pair and the predicted image, wherein the first loss is used for describing the difference between the second image and the predicted image;
and updating the model parameters of the image processing model to be trained at least according to the first loss until model training conditions are met, so as to obtain the image processing model.
9. The model training method of claim 8, wherein the to-be-trained image processing model is further configured to process the third feature through a third network to obtain a semantic segmentation prediction result of the first image.
10. The model training method of claim 9, wherein the image processing model to be trained is further configured to:
obtaining a semantic segmentation real result of the first image;
obtaining a second loss according to the semantic segmentation prediction result and the semantic segmentation real result, wherein the second loss is used for describing the difference between the semantic segmentation prediction result and the semantic segmentation real result;
and updating the model parameters of the image processing model to be trained at least according to the first loss and the second loss until model training conditions are met, so as to obtain the image processing model.
11. The model training method according to any one of claims 8 to 10, wherein the image processing model to be trained is further configured to:
performing feature fusion processing on the first feature and the second feature to obtain a third feature;
and performing feature fusion processing on the third feature and the semantic segmentation result of the first image to obtain the fourth feature.
12. The model training method of claim 11, wherein the feature fusion process comprises at least one of a summation process, a multiplication process, a concatenation process, and a concatenation convolution process.
13. The model training method according to any one of claims 8 to 12, wherein the to-be-trained image processing model is further configured to process the third feature to obtain a fifth feature; and generate the fourth feature according to the fifth feature and the semantic segmentation result of the first image.
14. The model training method according to any one of claims 8 to 13, wherein the image processing model to be trained is further configured to preprocess the first image to obtain a preprocessing feature; performing down-sampling processing on the preprocessing characteristic to obtain a down-sampling characteristic; processing the downsampled features through the second network to obtain sixth features; and performing upsampling processing on the sixth feature to obtain a second feature of the first image.
15. The model training method of any one of claims 8 to 14, wherein the image processing model is configured to perform at least one of the following image enhancement tasks: image super-resolution reconstruction, image denoising, image defogging, image deblurring, image contrast enhancement, image demosaicing, image deraining, image color enhancement, image brightness enhancement, image detail enhancement and image dynamic range enhancement.
16. An image processing apparatus, characterized in that the apparatus comprises a memory and a processor; the memory stores code, and the processor is configured to execute the code, wherein the image processing apparatus performs the method of any one of claims 1 to 15 when the code is executed.
17. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 15.
CN202011043640.2A 2020-09-28 2020-09-28 Image processing method and related device Pending CN114359289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011043640.2A CN114359289A (en) 2020-09-28 2020-09-28 Image processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011043640.2A CN114359289A (en) 2020-09-28 2020-09-28 Image processing method and related device

Publications (1)

Publication Number Publication Date
CN114359289A true CN114359289A (en) 2022-04-15

Family

ID=81089778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011043640.2A Pending CN114359289A (en) 2020-09-28 2020-09-28 Image processing method and related device

Country Status (1)

Country Link
CN (1) CN114359289A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4287639A1 (en) * 2022-05-30 2023-12-06 Beijing Xiaomi Mobile Software Co., Ltd. Image processing method and apparatus, electronic device, storage medium and chip
WO2024061123A1 (en) * 2022-09-21 2024-03-28 华为技术有限公司 Image processing method and image processing related device
CN117422855A (en) * 2023-12-19 2024-01-19 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN117422855B (en) * 2023-12-19 2024-05-03 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN117575976A (en) * 2024-01-12 2024-02-20 腾讯科技(深圳)有限公司 Image shadow processing method, device, equipment and storage medium
CN117575976B (en) * 2024-01-12 2024-04-19 腾讯科技(深圳)有限公司 Image shadow processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111507378A (en) Method and apparatus for training image processing model
CN113011562B (en) Model training method and device
CN113066017B (en) Image enhancement method, model training method and equipment
CN112598597A (en) Training method of noise reduction model and related device
CN114359289A (en) Image processing method and related device
CN112183718A (en) Deep learning training method and device for computing equipment
WO2022179581A1 (en) Image processing method and related device
WO2022111617A1 (en) Model training method and apparatus
CN110222718B (en) Image processing method and device
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN111414915B (en) Character recognition method and related equipment
CN113240079A (en) Model training method and device
CN112257759A (en) Image processing method and device
CN112580720A (en) Model training method and device
CN111950700A (en) Neural network optimization method and related equipment
CN112529149B (en) Data processing method and related device
WO2023083030A1 (en) Posture recognition method and related device
CN113066018A (en) Image enhancement method and related device
CN113627422A (en) Image classification method and related equipment thereof
CN115081588A (en) Neural network parameter quantification method and device
CN114821096A (en) Image processing method, neural network training method and related equipment
CN113066125A (en) Augmented reality method and related equipment thereof
CN112241934B (en) Image processing method and related equipment
CN113627421B (en) Image processing method, training method of model and related equipment
CN113728355A (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination