WO2021244270A1 - Image processing method, apparatus, device, and computer-readable storage medium - Google Patents

Image processing method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2021244270A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
processed
model
training
Prior art date
Application number
PCT/CN2021/094049
Other languages
English (en)
French (fr)
Inventor
陈法圣
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to JP2022566432A (JP7464752B2)
Priority to EP21817967.9A (EP4044106A4)
Publication of WO2021244270A1
Priority to US17/735,942 (US20220270207A1)

Classifications

    • G06N 3/084: Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06N 3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06T 3/4046: Geometric image transformations in the plane of the image; scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 5/20: Image enhancement or restoration using local operators
    • H04N 9/67: Details of colour television systems; circuits for processing colour signals for matrixing
    • G06N 20/00: Machine learning
    • G06N 3/048: Neural networks; activation functions
    • G06N 5/01: Computing arrangements using knowledge-based models; dynamic search techniques; heuristics; dynamic trees; branch-and-bound
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/10024: Image acquisition modality; color image
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]

Definitions

  • The embodiments of the present application relate to the field of image processing technology, and in particular, but not exclusively, to an image processing method, apparatus, device, and computer-readable storage medium.
  • Image processing is a method and technology for removing noise, enhancing, restoring, and improving resolution of an image through a computer.
  • image processing is widely used in various fields such as work, life, military, and medicine.
  • Image processing can be implemented through machine learning to achieve better processing results.
  • the embodiments of the present application provide an image processing method, device, device, and computer-readable storage medium, which can not only ensure the pixel coherence of a target image, but also perform image processing in real time, thereby improving image processing efficiency.
  • An embodiment of the present application provides an image processing method, which is executed by an image processing device and includes:
  • acquiring an image to be processed; when the image to be processed is a grayscale image, extracting the feature vector of each pixel in the image to be processed and determining the neighborhood image block corresponding to each pixel;
  • processing the feature vector and neighborhood image block of each pixel by using a lightweight model to obtain a processed target image, and outputting the target image, where the lightweight model is obtained by performing lightweight processing on a trained neural network model.
  • An embodiment of the present application provides an image processing device, including:
  • the first acquisition module is configured to acquire the image to be processed
  • the first extraction module is configured to extract the feature vector of each pixel in the image to be processed when the image to be processed is a grayscale image, and determine the neighborhood image block corresponding to each pixel;
  • the first processing module is configured to use the lightweight model to process the feature vector and neighborhood image block of each pixel to obtain a processed target image, where the lightweight model is obtained by performing lightweight processing on a trained neural network model;
  • the output module is configured to output the target image.
  • An embodiment of the application provides an image processing device, including:
  • Memory configured to store executable instructions
  • the processor is configured to execute the executable instructions stored in the memory to implement the foregoing method.
  • the embodiment of the present application provides a computer-readable storage medium that stores executable instructions for causing a processor to execute to implement the above-mentioned method.
  • The lightweight model is obtained by applying lightweight processing to the trained neural network model. Because a neural network structure is used during training, a pixel-coherent target image can be guaranteed even when various special losses are used; and because the lightweight model obtained through model conversion (such as a subspace model or a decision tree) is used for image processing, the target image can be produced in real time, thereby improving image processing efficiency while ensuring the processing effect.
  • FIG. 1A is a schematic diagram of a network architecture of an image processing system provided by an embodiment of this application;
  • FIG. 1B is a schematic diagram of another network architecture of the image processing system provided by an embodiment of this application.
  • FIG. 2 is a schematic structural diagram of a first terminal 100 according to an embodiment of the application.
  • Figure 3 is a schematic diagram of an implementation process of the image processing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of the implementation process of obtaining a lightweight model provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of still another implementation process of the image processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of the application.
  • FIG. 7A is a schematic diagram of an implementation process of constructing a data set according to an embodiment of the application.
  • FIG. 7B is a schematic diagram of an implementation process of extracting low-resolution image features according to an embodiment of the application.
  • FIG. 8A is a schematic diagram of the implementation process of a deep learning model and its training according to an embodiment of the application;
  • FIG. 8B is a schematic diagram of the implementation process of the super-division network structure and the way the network is used, provided by an embodiment of this application;
  • FIG. 8C is a schematic diagram of a network structure of a discriminator provided by an embodiment of the application.
  • FIG. 8D is a schematic diagram of the implementation process of constructing and generating an objective function provided by an embodiment of the application.
  • FIG. 8E is a schematic diagram of the implementation process of constructing a discrimination objective function provided by an embodiment of the application.
  • FIG. 8F is a schematic diagram of a model training implementation process provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the implementation process of model conversion according to an embodiment of the application.
  • FIG. 10 is a schematic diagram of the implementation process of real-time reasoning in an embodiment of the application.
  • FIG. 11 is a schematic diagram of an implementation process of performing super-division processing on a color image according to an embodiment of the application.
  • FIG. 12 is a schematic diagram of the implementation process of super-division processing on a video provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of the composition structure of an image processing device provided by an embodiment of the application.
  • The terms "first/second/third" are used only to distinguish similar objects and do not denote a specific order of the objects. It is understood that, where permitted, the specific order or sequence may be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • Image processing: the processing of images, that is, the mapping of a pixel map to a pixel map, such as super-resolution, image denoising, and image enhancement.
  • Super-Resolution (SR) algorithm: an algorithm that can improve image resolution, referred to as a super-resolution algorithm for short; it is a type of image processing method.
  • Super-resolution algorithms can be divided into two types: multi-frame super-division and single-frame super-division.
  • Single-frame super-division processes a picture to obtain the super-resolution image corresponding to the picture; the multi-frame super-resolution algorithm processes multiple pictures to obtain the super-resolution image corresponding to multiple pictures.
  • The focus of this application is the single-frame super-resolution algorithm.
  • Among single-frame super-resolution methods, those based on deep learning achieve the best results, clearly better than traditional methods.
  • Central Processing Unit (CPU): the computing and control core of a computer system and the final execution unit for information processing and program execution; it can be used in various computing scenarios.
  • Graphics Processing Unit (GPU): also known as a display core, visual processor, or display chip, a microprocessor specialized in image and graphics operations on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smart phones). GPUs have strong computing power, often far exceeding CPUs, and are therefore widely used in deep learning model inference. However, since GPU resources are scarce, deploying on them may introduce delays.
  • Deep Learning (DL): machine learning using neural networks.
  • Model conversion algorithm: an algorithm that converts one model type into another, for example converting a deep learning network into a decision tree model or a subspace model.
  • The model conversion algorithm can convert a complex model into a simple model, greatly improving its calculation speed; the disadvantage is that it may lead to a decrease in accuracy.
  • Convolution kernel: in image processing, given an input image, each pixel in the output image is a weighted average of the pixels in a small area of the input image, where the weights are defined by a function; this function is called the convolution kernel.
  • Objective function: also known as the loss function or cost function, a function that maps the value of a random event or of its related random variables to a non-negative real number expressing the "risk" or "loss" of that random event.
  • the objective function is usually used as a learning criterion to be associated with optimization problems, that is, to solve and evaluate the model by minimizing the objective function.
  • In statistics and machine learning, the objective function is used for parameter estimation of a model and is the optimization target of the machine learning model.
  • Color gamut also known as color space, represents the range of colors that a color image can display.
  • Common color gamuts currently include the luminance-chrominance (YUV) color gamut, the red-green-blue (RGB) color gamut, and the cyan-magenta-yellow-black (CMYK) color gamut, among others.
  • image processing methods for improving resolution include at least the following two:
  • Step S001: First enlarge the image to the target size;
  • Step S002: Calculate the gradient features of each pixel on the enlarged image;
  • Step S003: For each pixel, index the filter (convolution kernel) to be used according to its gradient features;
  • Step S004: Convolve the neighborhood of each pixel with its indexed filter to obtain the super-divided pixel.
  • RAISR uses three features calculated from gradients, and divides the feature space into many small blocks by splitting each feature into different segments.
  • Within each block, the convolution kernel parameters can be obtained by directly fitting the target values with the least squares method.
  • That is, least squares is used to fit image blocks to their target (high-resolution) pixels, which constitutes the training of the model.
  • Compared with the deep learning method, the RAISR method has a slightly lower effect, but the calculation speed can be greatly improved (in the RAISR paper, the speed is more than 100 times that of deep learning); a minimal sketch of this style of pipeline is given below.
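  • The following sketch illustrates the general shape of such a RAISR-style pipeline (steps S001 to S004), assuming NumPy/SciPy and a table of filters indexed by quantized gradient features; the feature quantization shown here and the `filters` table are illustrative stand-ins, not the actual RAISR or patented implementation.

```python
import numpy as np
from scipy.ndimage import zoom

def raisr_like_upscale(img, filters, bins=8, scale=2, k=7):
    """Enlarge the image, compute per-pixel gradient features, index a filter
    by the quantized features, and convolve the local patch with that filter."""
    up = zoom(img.astype(np.float32), scale, order=1)      # S001: enlarge to the target size
    gy, gx = np.gradient(up)                               # S002: per-pixel gradients
    angle = (np.arctan2(gy, gx) % np.pi) / np.pi           # orientation feature in [0, 1)
    strength = np.hypot(gx, gy)
    strength = strength / (strength.max() + 1e-8)
    pad = k // 2
    padded = np.pad(up, pad, mode="reflect")
    out = np.empty_like(up)
    for y in range(up.shape[0]):
        for x in range(up.shape[1]):
            a = min(int(angle[y, x] * bins), bins - 1)     # S003: quantized features index
            s = min(int(strength[y, x] * bins), bins - 1)  #        a filter in the table
            patch = padded[y:y + k, x:x + k]
            out[y, x] = np.sum(patch * filters[a, s])      # S004: convolve patch with filter
    return out

# The filters table (shape (bins, bins, k, k)) would be learned offline by
# least-squares fitting of patches to high-resolution target pixels.
```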
  • SRGAN (Super-Resolution Generative Adversarial Network) is a super-resolution technique based on generative adversarial networks. In general, it exploits the properties of generative adversarial networks to train two networks at the same time: a generator network used to construct realistic high-resolution images, and a discriminator network used to judge whether an input high-resolution image was constructed by the algorithm. The two networks are trained with two objective functions; by continuously alternating their training, both networks become stronger and stronger. Finally, the generator network is taken out and used for inference.
  • The disadvantage of the SRGAN algorithm is that the network needs to be deep enough, so the network structure tends to be very complicated, and it is difficult to run in real time the way RAISR does.
  • In view of this, the embodiments of the present application propose a method that combines a deep learning solution for image processing with a matching model acceleration (model conversion) method.
  • A neural network structure is used during training, which ensures that the output result is pixel-coherent when various special losses are used and that no additional noise is introduced; then, through model conversion, the model is simplified into a lightweight model (such as a subspace model or a decision tree) so that it can run in real time.
  • The image processing device provided by the embodiments of the application can be implemented as any terminal with a screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), a smart TV, or a smart robot, and can also be implemented as a server.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the first terminal 100 may request to obtain a video or a picture from the server 200 (in this embodiment, the picture 101 is taken as an example for description).
  • The image processing method provided by the embodiments of this application can be integrated into the gallery App of the terminal as a functional plug-in. If the first terminal 100 enables the image processing function, the first terminal 100 can use the image processing method provided by the embodiments of this application to process the picture 101 obtained from the server 200 in real time, obtain the processed picture 102, and present it on the display interface of the first terminal 100.
  • In Fig. 1A, super-division processing of the image is taken as an example. Comparing 101 and 102 in Fig. 1A, it can be seen that the resolution of the processed picture 102 is higher, so the user's picture quality experience can be improved without changing the bit rate.
  • FIG. 1B is a schematic diagram of another network architecture of the image processing system provided by an embodiment of the application.
  • the image processing system includes a first terminal 400, a second terminal 700, a server 500, and a network 600.
  • the first terminal 400 is connected to the server 500 through the network 600.
  • the first terminal 400 may be a smart terminal.
  • Various application programs may be installed on the smart terminal, such as an application for watching videos.
  • the network 600 may be a wide area network, a local area network, or a combination of the two, and wireless links are used to implement data transmission.
  • The second terminal 700 can also be any terminal with a screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), a smart TV, or a smart robot.
  • the second terminal 700 may upload a picture or video file to the server 500.
  • The server 500 may process the picture or video according to the image processing method provided in the embodiments of the present application and obtain the processed picture or video.
  • the server 500 may return the processed picture or video to the first terminal 400, and the first terminal 400 displays it on its own display interface.
  • In FIG. 1B, the first terminal 400 displays the processed picture or video to improve the user's picture quality experience.
  • In FIG. 1B, image denoising is used as an example.
  • the image 201 in Fig. 1B is the original image
  • the image 202 in Fig. 1B is the processed image.
  • A comparison of image 201 and image 202 shows that the processed image has almost no noise points, thereby improving the user's picture quality experience.
  • FIG. 2 is a schematic structural diagram of a first terminal 100 according to an embodiment of the application.
  • the first terminal 100 shown in FIG. 2 includes: at least one processor 110, a memory 150, at least one network interface 120, and a user interface 130 .
  • the various components in the first terminal 100 are coupled together through the bus system 140.
  • the bus system 140 is used to implement connection and communication between these components.
  • the bus system 140 also includes a power bus, a control bus, and a status signal bus.
  • various buses are marked as the bus system 140 in FIG. 2.
  • The processor 110 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
  • DSP Digital Signal Processor
  • the user interface 130 includes one or more output devices 131 that enable the presentation of media content, including one or more speakers and/or one or more visual display screens.
  • the user interface 130 also includes one or more input devices 132, including user interface components that facilitate user input, such as a keyboard, a mouse, a microphone, a touch screen display, a camera, and other input buttons and controls.
  • the memory 150 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid-state storage, hard disk drives, optical disk drives, and the like.
  • the memory 150 may include one or more storage devices that are physically remote from the processor 110.
  • the memory 150 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be a read only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
  • ROM read only memory
  • RAM Random Access Memory
  • the memory 150 described in the embodiment of the present application is intended to include any suitable type of memory.
  • the operating system 151 includes system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • the network communication module 152 is used to reach other computing devices via one or more (wired or wireless) network interfaces 120.
  • Exemplary network interfaces 120 include: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
  • the input processing module 153 is configured to detect one or more user inputs or interactions from one or more of the one or more input devices 132 and translate the detected inputs or interactions.
  • FIG. 2 shows an image processing device 154 stored in the memory 150.
  • the image processing device 154 may be a device in the first terminal 100.
  • The image processing device 154 may be software in the form of programs and plug-ins, and includes the following software modules: a first acquisition module 1541, a first extraction module 1542, a first processing module 1543, and an output module 1544. These modules are logical and can therefore be combined or further split arbitrarily according to the functions to be realized. The function of each module is explained below.
  • the device provided in the embodiment of the application may be implemented in hardware.
  • In some embodiments, the device provided in the embodiments of the application may be a processor in the form of a hardware decoding processor programmed to execute the image processing method provided by the embodiments of the application. For example, the processor in the form of a hardware decoding processor may adopt one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
  • ASIC Application Specific Integrated Circuit
  • DSP Digital Signal Processing
  • PLD Programmable Logic Device
  • CPLD Complex Programmable Logic Device
  • FPGA Field-Programmable Gate Array
  • Artificial Intelligence (AI) refers to the use of digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science. It attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. Each direction will be explained separately below.
  • Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.
  • AIaaS (AI as a Service)
  • The AIaaS platform splits several common AI services and provides independent or packaged services in the cloud.
  • This service model is similar to opening an AI-themed mall: all developers can access one or more artificial intelligence services provided by the platform through API interfaces, and some senior developers can also use the AI framework and AI infrastructure provided by the platform to deploy and operate their own exclusive cloud artificial intelligence services.
  • FIG. 3 is a schematic diagram of an implementation flow of the image processing method provided by an embodiment of the application, and will be described with reference to the steps shown in FIG. 3.
  • Step S101 Obtain an image to be processed.
  • the image to be processed can be a grayscale image or a multi-channel color image.
  • the image to be processed may be a video frame image obtained by decoding a video file.
  • the image to be processed may be obtained from the server.
  • the image to be processed may also be an image collected by the first terminal.
  • the image to be processed may be uploaded to the server by the second terminal.
  • In some embodiments, after the image to be processed is obtained in step S101, the following may also be executed: determine whether the image to be processed is a grayscale image, where, when the image to be processed is a grayscale image, the process goes to step S102; when the image to be processed is a color image, the color gamut of the image to be processed needs to be converted before the image processing flow is performed.
  • Step S102 when the image to be processed is a grayscale image, extract the feature vector of each pixel in the image to be processed, and determine the neighborhood image block corresponding to each pixel.
  • When step S102 is implemented, the first direction gradient value and the second direction gradient value of each pixel can be determined according to the pixel values of the pixels in the image to be processed, and the feature vector of each pixel can then be determined according to its first direction gradient value and second direction gradient value.
  • the neighborhood image block may be a K*K image block centered on each pixel, where K is an odd number, for example, K may be 5, 7, 9, 13, and so on.
  • Step S103 using the lightweight model to process the feature vector of each pixel and the neighborhood image block to obtain a processed target image.
  • the lightweight model is obtained by performing lightweight processing on the trained neural network model.
  • In some embodiments, subspace division may be performed, or a decision tree may be generated, based on the trained neural network model to obtain the lightweight model.
  • The lightweight model is more simplified; therefore, when the lightweight model is used to process the feature vector and neighborhood image block of each pixel, the calculation efficiency can be improved and the image processing time shortened compared with the neural network model, so that real-time processing can be realized.
  • When step S103 is implemented, the subspace corresponding to each pixel, or the leaf node in the decision tree corresponding to each pixel, can be determined based on the feature vector of the pixel; the convolution kernel corresponding to that subspace or leaf node is then obtained, a convolution operation is performed between the convolution kernel and the neighborhood image block to obtain the processed pixel value corresponding to each pixel, and the target image is determined based on the processed pixel values of all pixels.
  • Step S104 output the target image.
  • When step S104 is implemented by the first terminal shown in FIG. 1A, the target image may be presented on the display device of the first terminal.
  • When step S104 is implemented by the server shown in FIG. 1B, the target image may be sent to the first terminal.
  • When step S104 is implemented by the server shown in FIG. 1B, after step S104 the server may also store the target image in its local storage space.
  • In the image processing method provided by the embodiments of the application, the neighborhood image block corresponding to each pixel in the image to be processed is determined; when the image to be processed is a grayscale image, the feature vector of each pixel in the image to be processed is extracted; and the lightweight model is used to process the feature vector and neighborhood image block of each pixel to obtain the processed target image, where the lightweight model is obtained by performing lightweight processing on a trained neural network model. Because a neural network structure is used for training, a pixel-coherent target image can be guaranteed even when various special losses are used; and because the lightweight model obtained through model conversion (such as a subspace model or a decision tree) is used for the image processing, the target image can be output in real time, thereby improving image processing efficiency while ensuring the processing effect.
  • In some embodiments, step S102 of "extracting the feature vector of each pixel in the image to be processed" can be implemented through the following steps:
  • Step S1021 Determine the first direction gradient map and the second direction gradient map corresponding to the image to be processed.
  • the first direction may be a horizontal direction
  • the second direction may be a vertical direction.
  • When step S1021 is implemented, for each pixel in the image to be processed, the pixel value of its right neighboring pixel minus the pixel value of its left neighboring pixel, divided by 2, gives the gradient value of the pixel in the first direction, and the first direction gradient map corresponding to the image to be processed is determined based on the first direction gradient values of all pixels; similarly, the pixel value of the upper neighboring pixel minus the pixel value of the lower neighboring pixel, divided by 2, gives the gradient value of the pixel in the second direction, and the second direction gradient map is determined based on the second direction gradient values of all pixels.
  • For edge pixels in the image to be processed, symmetric flipping at the edges can be used to calculate their gradient values; in this case the vertical gradient values of the pixels on the upper and lower edges of the image to be processed are all 0, and the horizontal gradient values of the pixels on the left and right edges are all 0 (a minimal sketch of this computation is given below).
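  • The following is a minimal NumPy sketch of step S1021, assuming reflect-style symmetric flipping at the edges; the function name is illustrative and not part of the application.

```python
import numpy as np

def gradient_maps(img):
    """Central-difference gradient maps: (right - left) / 2 in the first (horizontal)
    direction and (upper - lower) / 2 in the second (vertical) direction, using
    reflect padding (mirror without repeating the edge pixel) so that the
    across-edge gradient of edge pixels comes out as 0."""
    p = np.pad(img.astype(np.float32), 1, mode="reflect")
    gx = (p[1:-1, 2:] - p[1:-1, :-2]) / 2.0   # first-direction gradient map
    gy = (p[:-2, 1:-1] - p[2:, 1:-1]) / 2.0   # second-direction gradient map (upper - lower)
    return gx, gy
```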
  • Step S1022 Determine the first gradient neighborhood block in the first direction gradient map and the second gradient neighborhood block in the second direction gradient map of each pixel in the image to be processed.
  • the first gradient neighborhood block and the second gradient neighborhood block have the same size, and both have the same size as the neighborhood image block of each pixel in the image to be processed.
  • Step S1023 Determine the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of each pixel.
  • step S1023 can be implemented through the following steps:
  • Step S231 Determine the covariance matrix of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of each pixel.
  • The covariance matrix A of pixel i can be obtained by formula (1-1):
    A = [[a, b], [b, c]]  (1-1)
    where a = Σ x_i·x_i, b = Σ x_i·y_i, c = Σ y_i·y_i, with x_i and y_i taken from the first gradient neighborhood block and the second gradient neighborhood block of the pixel, respectively.
  • Step S232: Determine the first eigenvalue and the second eigenvalue corresponding to each covariance matrix.
  • The first eigenvalue λ1 and the second eigenvalue λ2 of the covariance matrix A can be calculated according to formula (1-2) and formula (1-3):
    λ1 = ((a + c) + sqrt((a - c)^2 + 4b^2)) / 2  (1-2)
    λ2 = ((a + c) - sqrt((a - c)^2 + 4b^2)) / 2  (1-3)
  • Step S233 Determine each variance value corresponding to the neighboring image block of each pixel.
  • Step S234 Determine the feature vector of each pixel based on each first feature value, each second feature value, and each variance value.
  • the feature vector of each pixel point may be 4-dimensional.
  • The fourth-dimensional feature is f_4 = v, where v is the variance value determined in step S233; the first three dimensions are determined from the first eigenvalue and the second eigenvalue of the pixel (one possible construction is sketched below).
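  • The sketch below illustrates one way such a 4-dimensional feature could be assembled from the two eigenvalues and the neighborhood variance; the use of a coherence measure as the third dimension is an assumption for illustration, not a statement of the application's exact features.

```python
import numpy as np

def pixel_features(gx_block, gy_block, img_block):
    """4-D feature sketch for one pixel: eigenvalues of the gradient covariance
    matrix, a coherence measure derived from them (assumed), and the variance
    of the pixel's neighborhood image block (f4 = v)."""
    x = gx_block.ravel()
    y = gy_block.ravel()
    a, b, c = x @ x, x @ y, y @ y                 # covariance matrix [[a, b], [b, c]]
    root = np.sqrt((a - c) ** 2 + 4.0 * b ** 2)
    lam1 = (a + c + root) / 2.0                   # first eigenvalue, formula (1-2)
    lam2 = (a + c - root) / 2.0                   # second eigenvalue, formula (1-3)
    s1, s2 = np.sqrt(lam1), np.sqrt(max(lam2, 0.0))
    coherence = (s1 - s2) / (s1 + s2 + 1e-8)      # assumed third feature dimension
    v = float(img_block.var())                    # variance of the neighborhood block
    return np.array([lam1, lam2, coherence, v], dtype=np.float32)
```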
  • In some embodiments, the first direction gradient value and the second direction gradient value of each pixel can be used directly as the feature vector of that pixel.
  • In some embodiments, other feature extraction algorithms can also be used to extract the feature vector of each pixel in the image to be processed.
  • However, the dimension of the obtained feature vector should not be too large, so as to avoid the lightweight model obtained after model conversion becoming too large, which would in turn make the computational complexity too high.
  • the preset neural network model needs to be trained through the following steps to obtain a trained neural network model:
  • Step S001 Obtain training data and a preset neural network model.
  • The training data includes at least a first training image and a second training image, where the second training image is obtained by down-sampling the first training image; that is to say, the resolution of the second training image is lower than the resolution of the first training image.
  • the first training image and the second training image are both grayscale images.
  • the training data may also include the feature vector of each pixel in the second training image.
  • the preset neural network model may be a deep learning neural network model, and the neural network model may include a generative model and a discriminant model.
  • Step S002 Use the neural network model to process the second training image to obtain a predicted image.
  • When the training data includes the feature vector of each pixel in the second training image, step S002 may be implemented by inputting the feature vectors of the pixels in the second training image into the neural network model to obtain the predicted image.
  • When the training data includes only the first training image and the second training image, step S002 may be implemented by inputting the second training image into the neural network model to obtain the predicted image.
  • Step S003 Perform back propagation training on the neural network model based on the predicted image, the first training image and the preset objective function to obtain a trained neural network model.
  • the preset objective function includes generating objective function and discriminating objective function.
  • this step S003 can be implemented through the following steps:
  • Step S31: Fix the discrimination parameters of the discriminant model, and perform back-propagation training on the generative model based on the predicted image, the first training image, and the generation objective function, so as to adjust the generation parameters of the generative model.
  • Step S32: Fix the generation parameters of the generative model, and perform back-propagation training on the discriminant model based on the predicted image, the first training image, and the discrimination objective function, so as to adjust the discrimination parameters of the discriminant model, until a preset training completion condition is reached and the trained neural network model is obtained (a sketch of this alternating training is given below).
  • The preset training completion condition may be that the number of training iterations reaches a preset threshold, or that the difference between the predicted image and the first training image falls below a preset difference threshold.
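  • A minimal PyTorch-style sketch of the alternating training in steps S31 and S32 is shown below; `generator`, `discriminator`, `gen_loss_fn`, `disc_loss_fn`, and `loader` are placeholders for the models, objective functions, and training data described in this application, and the optimizer choice and learning rate are assumptions.

```python
import torch

def train(generator, discriminator, gen_loss_fn, disc_loss_fn, loader, epochs=10, lr=1e-4):
    """Alternating training: update the generator with the discriminator fixed (S31),
    then update the discriminator with the generator's output detached (S32)."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for lr_img, hr_img in loader:      # (second training image, first training image)
            pred = generator(lr_img)       # predicted image
            g_loss = gen_loss_fn(pred, hr_img, discriminator)
            g_opt.zero_grad()
            g_loss.backward()
            g_opt.step()
            d_loss = disc_loss_fn(pred.detach(), hr_img, discriminator)
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()
    return generator
```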
  • In some embodiments, the generation objective function can be constructed through the following steps:
  • Step S41a Determine the pixel-level error value and the content error value between the predicted image and the first training image.
  • When determining the pixel-level error value between the predicted image and the first training image, the error value between each pair of corresponding pixels in the predicted image and the first training image can be determined first, and the pixel-level error value is then determined from these per-pixel error values.
  • The pixel-level error value can be an average error calculated from the per-pixel error values, or a Mean Square Error (MSE), an absolute error, or the like calculated from the per-pixel error values.
  • When determining the content error value, the predicted image and the first training image can be input into a content feature module respectively, to correspondingly obtain a predicted content feature vector and a training content feature vector, where the content feature module is a pre-trained module; in the embodiments of the application, the first several layers of VGG19 are used (the first 17 layers are recommended).
  • The content error value is then calculated based on the predicted content feature vector and the training content feature vector; it can be the average error between the two, or their mean square error or absolute error.
  • Step S42a Determine the first pixel discrimination error value and the first global discrimination error value of the prediction image based on the prediction image and the discrimination model.
  • When step S42a is implemented, the predicted image can first be input into the discriminant model to obtain a predicted pixel discrimination matrix and a predicted global discrimination value, where the size of the predicted pixel discrimination matrix is consistent with the size of the predicted image, and the predicted global discrimination value is a single value indicating the probability that the predicted image was constructed by the generator (a real number between 0 and 1).
  • The first pixel discrimination error value is then determined based on the predicted pixel discrimination matrix and the "no" value (that is, 0), and the first global discrimination error value is determined based on the predicted global discrimination value and the "no" value.
  • The first pixel discrimination error value can be obtained by calculating the average error between the predicted pixel discrimination matrix and the "no" value, or their mean square error; similarly, the first global discrimination error value can be obtained by calculating the average error between the predicted global discrimination value and the "no" value, or their mean square error.
  • Step S43a: Determine the generation objective function based on preset generation weight values, the pixel-level error value, the content error value, the first pixel discrimination error value, and the first global discrimination error value.
  • The preset generation weight values include a first weight value corresponding to the pixel-level error value, a second weight value corresponding to the content error value, a third weight value corresponding to the first pixel discrimination error value, and a fourth weight value corresponding to the first global discrimination error value.
  • When step S43a is implemented, the pixel-level error value, the content error value, the first pixel discrimination error value, and the first global discrimination error value are weighted and summed with their corresponding weight values to obtain the generation objective function (see the sketch below).
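  • The sketch below illustrates one way to assemble such a generation objective in PyTorch; `content_net` stands for the pre-trained content feature module (for example, the first 17 layers of VGG19 recommended above), the discriminator is assumed to return a (pixel discrimination matrix, global discrimination value) pair, and the weight values are placeholders rather than those of the application.

```python
import torch
import torch.nn.functional as F

def make_gen_loss_fn(content_net, w=(1.0, 1.0, 1e-3, 1e-3)):
    """Builds a generation objective of the shape described in steps S41a-S43a:
    a weighted sum of the pixel-level error, the content error, and the two
    adversarial terms; the weights are placeholders."""
    def gen_loss_fn(pred, hr_img, discriminator):
        pixel_err = F.mse_loss(pred, hr_img)                              # pixel-level error
        content_err = F.mse_loss(content_net(pred), content_net(hr_img))  # content error
        pix_map, glob = discriminator(pred)
        pixel_adv = F.mse_loss(pix_map, torch.zeros_like(pix_map))        # first pixel discrimination error ("no" = 0)
        global_adv = F.mse_loss(glob, torch.zeros_like(glob))             # first global discrimination error
        return w[0] * pixel_err + w[1] * content_err + w[2] * pixel_adv + w[3] * global_adv
    return gen_loss_fn
```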
  • the discriminant objective function can be constructed through the following steps:
  • Step S41b Determine the second pixel discrimination error value and the second global discrimination error value of the prediction image based on the prediction image and the discrimination model.
  • When step S41b is implemented, the predicted image is first input into the discriminant model to obtain the predicted pixel discrimination matrix and the predicted global discrimination value; the second pixel discrimination error value is then determined based on the predicted pixel discrimination matrix and the "yes" value (that is, 1), and the second global discrimination error value is determined based on the predicted global discrimination value and the "yes" value.
  • The second pixel discrimination error value can be obtained by calculating the average error between the predicted pixel discrimination matrix and the "yes" value, or their mean square error; similarly, the second global discrimination error value can be obtained by calculating the average error between the predicted global discrimination value and the "yes" value, or their mean square error.
  • Step S42b Determine a third pixel discrimination error value and a third global discrimination error value of the first training image based on the first training image and the discrimination model.
  • When step S42b is implemented, the first training image is first input into the discriminant model to obtain a training pixel discrimination matrix and a training global discrimination value; the third pixel discrimination error value is then determined based on the training pixel discrimination matrix and the "no" value (that is, 0), and the third global discrimination error value is determined based on the training global discrimination value and the "no" value.
  • The third pixel discrimination error value can be obtained by calculating the average error between the training pixel discrimination matrix and the "no" value, or their mean square error; similarly, the third global discrimination error value can be obtained by calculating the average error between the training global discrimination value and the "no" value, or their mean square error.
  • Step S43b Determine the discrimination objective function based on the preset discrimination weight value, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value and the third global discrimination error value.
  • The preset discrimination weight values include a fifth weight value corresponding to the second pixel discrimination error value, a sixth weight value corresponding to the second global discrimination error value, a seventh weight value corresponding to the third pixel discrimination error value, and an eighth weight value corresponding to the third global discrimination error value.
  • When step S43b is implemented, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value, and the third global discrimination error value are weighted and summed with their corresponding weight values to obtain the discrimination objective function (see the sketch below).
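  • A matching sketch of the discrimination objective, under the same assumptions about the discriminator's outputs and with placeholder weights:

```python
import torch
import torch.nn.functional as F

def disc_loss_fn(pred, hr_img, discriminator, w=(1.0, 1.0, 1.0, 1.0)):
    """Discrimination objective sketch (steps S41b-S43b): the predicted image should
    be judged as constructed ("yes" = 1) and the first training image as not
    constructed ("no" = 0), at both the pixel and the global level."""
    p_map, p_glob = discriminator(pred)        # predicted image
    t_map, t_glob = discriminator(hr_img)      # first training image
    second_pixel = F.mse_loss(p_map, torch.ones_like(p_map))
    second_global = F.mse_loss(p_glob, torch.ones_like(p_glob))
    third_pixel = F.mse_loss(t_map, torch.zeros_like(t_map))
    third_global = F.mse_loss(t_glob, torch.zeros_like(t_glob))
    return (w[0] * second_pixel + w[1] * second_global
            + w[2] * third_pixel + w[3] * third_global)
```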
  • the lightweight model can be obtained through step S51a to step S54a as shown in FIG. 4:
  • Step S51a Determine the feature space based on the feature vector corresponding to each pixel in the image to be processed.
  • The feature space may be determined based on the maximum value and the minimum value of each dimension of the feature vectors over all pixels.
  • step S52a the feature space is divided into N feature subspaces according to a preset division rule, and N central coordinates corresponding to the N feature subspaces are determined respectively.
  • When step S52a is implemented, each dimension of the feature vector can be divided into segments to obtain the feature subspaces, and the center coordinate corresponding to each feature subspace is determined based on the maximum and minimum values of each dimension within that subspace.
  • In some embodiments, the median of the maximum value and the minimum value of each dimension in each feature subspace may be determined as the center coordinate corresponding to the feature subspace.
  • step S53a the N center coordinates are respectively input to the trained neural network model, and N convolution kernels of N feature subspaces are obtained correspondingly.
  • Step S54a Determine N feature subspaces and N convolution kernels as a lightweight model.
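  • The subspace conversion of steps S51a to S54a can be sketched as follows: each feature dimension is split into a fixed number of bins, the center coordinate of every resulting subspace is computed, and the trained network is queried once per center to pre-compute its convolution kernel. `model_kernel_fn`, which maps a feature vector to a convolution kernel via the trained neural network model, and the bin count are assumptions of this sketch.

```python
import itertools
import numpy as np

def build_subspace_model(features, model_kernel_fn, bins=8):
    """features: (num_pixels, d) feature vectors; model_kernel_fn: queries the
    trained neural network model with a d-dim center coordinate and returns the
    corresponding convolution kernel."""
    lo, hi = features.min(axis=0), features.max(axis=0)        # S51a: feature space bounds
    edges = [np.linspace(lo[i], hi[i], bins + 1) for i in range(features.shape[1])]
    kernels = []
    for idx in itertools.product(range(bins), repeat=features.shape[1]):   # S52a: N subspaces
        center = np.array([(edges[i][j] + edges[i][j + 1]) / 2.0 for i, j in enumerate(idx)])
        kernels.append(model_kernel_fn(center))                # S53a: query the trained network
    return (lo, hi, bins), np.array(kernels)                   # S54a: the lightweight model
```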
  • the lightweight model can also be obtained through the following steps:
  • step S51b a decision tree is constructed based on the feature vector corresponding to each pixel in the image to be processed.
  • When step S51b is implemented, all the feature vectors can first be regarded as one node, and then a feature vector is selected from all the feature vectors and used to split them into several child nodes; each child node is then examined, and if the condition for stopping splitting is met, the node is set as a leaf node; otherwise, a feature vector is selected from the child node to continue splitting the feature vectors in that child node, until the stopping condition is reached, thereby obtaining the decision tree.
  • step S52b each leaf node in the decision tree is input to the trained neural network model, and the convolution kernel corresponding to each leaf node is correspondingly obtained.
  • each leaf node is input to the trained neural network model, that is, the feature vector as the leaf node is input to the trained neural network model, and the convolution kernel corresponding to each leaf node is obtained.
  • Step S53b Determine each leaf node and the corresponding convolution kernel as the lightweight model.
  • a decision tree is constructed based on the feature vector of each pixel, and the convolution kernel corresponding to each leaf node in the decision tree is determined, thus obtaining a lightweight model.
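  • The decision-tree conversion of steps S51b to S53b could be sketched as follows; the splitting rule (most-spread dimension, median threshold) and the use of the leaf's mean feature vector as the query to the trained network are assumptions for illustration.

```python
import numpy as np

class Leaf:
    def __init__(self, kernel):
        self.kernel = kernel

class Node:
    def __init__(self, dim, threshold, left, right):
        self.dim, self.threshold, self.left, self.right = dim, threshold, left, right

def build_tree(features, model_kernel_fn, min_leaf=64):
    """S51b: recursively split the feature vectors; S52b: each leaf stores the kernel
    obtained by querying the trained network with a representative feature vector."""
    if len(features) <= min_leaf:
        return Leaf(model_kernel_fn(features.mean(axis=0)))
    dim = int(np.argmax(features.var(axis=0)))           # split on the most spread dimension
    threshold = float(np.median(features[:, dim]))
    mask = features[:, dim] <= threshold
    if mask.all() or not mask.any():                     # cannot split further: make a leaf
        return Leaf(model_kernel_fn(features.mean(axis=0)))
    return Node(dim, threshold,
                build_tree(features[mask], model_kernel_fn, min_leaf),
                build_tree(features[~mask], model_kernel_fn, min_leaf))

def lookup(tree, feature):
    """Walk the tree with a pixel's feature vector to obtain its convolution kernel."""
    while isinstance(tree, Node):
        tree = tree.left if feature[tree.dim] <= tree.threshold else tree.right
    return tree.kernel
```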
  • the above step S103 "Using the lightweight model to process the feature vector of each pixel and the neighborhood image block , Get the processed target image" can be achieved through the following steps:
  • Step S1031 Determine the convolution kernel corresponding to each pixel based on the feature vector of each pixel and the lightweight model.
  • When the lightweight model is obtained by dividing the feature space into feature subspaces, step S1031 can be implemented by determining, based on the feature vector of a certain pixel i, which feature subspace of the lightweight model the feature vector falls into, and then obtaining the convolution kernel corresponding to that feature subspace. In the embodiments of the application, the number of channels of the obtained convolution kernel differs depending on the image processing performed.
  • For example, for super-division processing with a super-division multiple P, where P is an integer greater than 1 (for example, 2), if the size of the original image before processing is W*D (for example, 1280*720), then the size of the processed image is (W*P)*(D*P) (for example, (1280*2)*(720*2), that is, 2560*1440), and the number of channels of the convolution kernel obtained in this case is P*P (that is, 4). For denoising processing, because the original image before processing and the processed image have the same size, the number of channels of the convolution kernel obtained in this case is 1.
  • When the lightweight model is obtained by constructing a decision tree, step S1031 can be implemented by comparing the feature vector of each pixel with the nodes in the decision tree until the target leaf node corresponding to the pixel is reached, and then obtaining the convolution kernel corresponding to that target leaf node.
  • Step S1032 Perform convolution calculation on the neighborhood image block of each pixel and each corresponding convolution kernel to obtain the processed pixel value.
  • The number of processed pixel values obtained for one pixel by the convolution calculation is related to the number of channels of the convolution kernel: if the number of channels of the convolution kernel is 1, one processed pixel value is obtained; if the number of channels is P*P, then P*P processed pixel values are obtained.
  • Step S1033 Determine a processed target image based on the processed pixel value.
  • When one processed pixel value is obtained per pixel, the processed target image is obtained directly from the processed pixel values; when P*P processed pixel values are obtained per pixel, the processed pixel values are spliced and reordered to obtain the processed target image.
  • Through steps S1031 to S1033, the lightweight model is used to determine the convolution kernel corresponding to each pixel; compared with the neural network model before lightweight processing, the dimensionality is reduced, so the amount of calculation in the convolution step is smaller, thereby improving processing efficiency and enabling real-time processing (a sketch of this per-pixel inference is given below).
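  • The sketch below ties steps S1031 to S1033 together for a subspace lightweight model and a super-division multiple P, assuming the subspace model produced by the earlier conversion sketch and a `feature_fn` that returns the per-pixel feature vectors; the kernels are assumed to have P*P output channels that are rearranged into the enlarged target image.

```python
import numpy as np

def lightweight_infer(img, space, kernels, feature_fn, k=7, P=2):
    """space = (lo, hi, bins) and kernels come from the subspace conversion sketch;
    feature_fn(img) returns an (H, W, d) array of per-pixel feature vectors;
    each kernel is assumed to have shape (P*P, k, k)."""
    lo, hi, bins = space
    feats = feature_fn(img)
    H, W = img.shape
    d = feats.shape[2]
    pad = k // 2
    padded = np.pad(img.astype(np.float32), pad, mode="reflect")
    out = np.zeros((H * P, W * P), dtype=np.float32)
    for y in range(H):
        for x in range(W):
            # S1031: locate the feature subspace of this pixel and fetch its kernel
            idx = np.clip(((feats[y, x] - lo) / (hi - lo + 1e-8) * bins).astype(int), 0, bins - 1)
            kernel = kernels[np.ravel_multi_index(tuple(idx), (bins,) * d)]
            # S1032: convolve the neighborhood image block with the kernel (P*P channels)
            patch = padded[y:y + k, x:x + k]
            vals = (kernel * patch).sum(axis=(1, 2))
            # S1033: splice the P*P processed pixel values into the enlarged target image
            out[y * P:(y + 1) * P, x * P:(x + 1) * P] = vals.reshape(P, P)
    return out
```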
  • FIG. 5 is a schematic diagram of another implementation flow of the image processing method provided by the embodiments of the application, which is applied to the network architecture shown in FIG. 1A. As shown in FIG. 5, the method includes:
  • Step S201 The first terminal receives an operation instruction for watching a video.
  • The operation instruction may be triggered by the user's click or touch operation on the video-watching entry of a video App.
  • Step S202 The first terminal sends a request message for watching the video to the server based on the operation instruction.
  • the request message carries the target video identifier.
  • Step S203 The server obtains the target video file based on the request message.
  • the server parses the request message, obtains the target video identifier, and obtains the target video file based on the target video identifier.
  • Step S204 The server returns a video data stream to the first terminal based on the target video file.
  • Step S205 The first terminal decodes the received video data stream to obtain an image to be processed.
  • step S205 the first terminal decodes the received video data stream to obtain each video image frame, and determines each video image frame as an image to be processed.
  • Step S206 The first terminal determines whether the image to be processed is a grayscale image.
  • When the image to be processed is a grayscale image, the process goes to step S207; when the image to be processed is a color image, the process goes to step S209.
  • When the image to be processed is a color image, it may be an RGB color image, an sRGB color image, a CMYK color image, or the like.
  • Step S207 The first terminal extracts the feature vector of each pixel in the image to be processed, and determines the neighborhood image block corresponding to each pixel.
  • step S208 the first terminal uses the lightweight model to process the feature vector of each pixel and the neighborhood image block to obtain a processed target image.
  • the lightweight model is obtained by performing lightweight processing on the trained neural network model.
  • it can be based on the trained neural network model to perform subspace division or generate a decision tree to obtain the lightweight model .
  • The implementation of step S207 and step S208 in the embodiments of the present application is similar to the implementation of step S102 and step S103 in the foregoing embodiments, which may be referred to.
  • step S209 the first terminal converts the image to be processed into the luminance and chrominance (YUV) color gamut to obtain the luminance Y channel to be processed image and the chrominance UV channel to be processed image.
  • YUV luminance and chrominance
  • Step S209 may be implemented by converting the color image to be processed into the YUV color gamut according to a preset conversion function, so as to obtain the Y channel to-be-processed image and the UV channel to-be-processed image. Since the Y channel information of a YUV image is sufficient to represent the gray scale of the image, the Y channel to-be-processed image is a single-channel grayscale image at this point.
  • Step S210 The first terminal extracts the feature vector of each Y-channel pixel in the Y-channel to-be-processed image, and determines the neighborhood image block corresponding to each Y-channel pixel.
  • step S210 is similar to the implementation process of step S102 described above, and the implementation process of step S102 can be referred to in actual implementation.
  • step S211 the first terminal uses the lightweight model to process the feature vector of each Y channel pixel and the neighborhood image block to obtain a processed Y channel target image.
  • the lightweight model is used to perform image processing on only the Y channel to be processed image, so as to obtain the processed Y channel target image.
  • the implementation process of step S211 is similar to the implementation process of step S103 described above. In actual implementation, the implementation process of step S103 can be referred to.
  • step S212 the first terminal uses a preset image processing algorithm to process the UV channel to-be-processed image to obtain the UV channel target image.
  • Depending on the purpose of the image processing, the preset image processing algorithm differs.
  • When the purpose of image processing is to improve the resolution, the preset image processing algorithm may be an image interpolation algorithm, for example a bicubic interpolation algorithm; when the purpose of image processing is to remove image noise, the preset image processing algorithm may be a filtering algorithm, for example a spatial-domain filtering algorithm or a transform-domain filtering algorithm.
  • Step S213 The first terminal determines a target image based on the Y channel target image and the UV channel target image, where the target image has the same color gamut as the image to be processed.
  • After the preset image processing algorithm is used to process the UV channel to-be-processed image to obtain the UV channel target image, in step S213 the Y channel target image obtained in step S211 and the UV channel target image are subjected to color gamut conversion to obtain the target image with the same color gamut as the image to be processed.
  • Step S214 the first terminal outputs the target image.
  • step S214 the target image may be presented on the display interface of the first terminal.
  • In the image processing method provided by this embodiment of the application, after obtaining the video data stream from the server, the first terminal decodes the video data stream to obtain the image to be processed. When the image to be processed is a grayscale image, the lightweight model is used directly to process the image to be processed to obtain the target image; when the image to be processed is a color image, the image to be processed is converted to the YUV color gamut, the lightweight model is used to process the Y channel to-be-processed image to obtain the Y channel target image, a preset image processing algorithm is used to process the UV channel to-be-processed image to obtain the UV channel target image, and then the Y channel target image and the UV channel target image are converted to the same color gamut as the image to be processed to obtain the target image, which is output. This increases the image processing speed and enables real-time operation (the acceleration ratio differs after different models are converted, and can theoretically reach 100 times or more).
  • The image processing method provided by the embodiments of this application has a wide range of applications, such as super-resolution processing, denoising processing, and image enhancement processing. In the following, the application to image and video super-resolution is taken as an example for explanation.
  • FIG. 6 is a schematic diagram of an implementation flow of an image processing method provided by an embodiment of this application.
  • The method is applied to an image processing device, where the image processing device may be the first terminal shown in FIG. 1A, or it may be the server shown in FIG. 1B.
  • the method includes:
  • step S601 the image processing device constructs a training data set.
  • When step S601 is implemented, a high-resolution image is first down-sampled to construct a low-resolution image, then a feature extraction algorithm is used to extract the features of each pixel in the low-resolution image to obtain a feature map, and finally each group of <high-resolution image, low-resolution image, feature map> is used to construct the training data set.
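  • As a rough illustration of step S601, the sketch below (Python/NumPy with OpenCV) assembles one <high-resolution image, low-resolution image, feature map> group; the mean-filter downsampling and the helper names (`downsample`, `build_training_sample`, `extract_features`) are assumptions for illustration rather than the exact implementation of the embodiment.

```python
import cv2
import numpy as np

def downsample(hr, n):
    # Artificial downsampling: mean filtering followed by taking every n-th pixel.
    return cv2.blur(hr, (n, n))[::n, ::n]

def build_training_sample(hr, n, extract_features):
    # hr: single-channel grayscale image whose width and height are multiples of n.
    assert hr.ndim == 2 and hr.shape[0] % n == 0 and hr.shape[1] % n == 0
    lr = downsample(hr, n)
    feat = extract_features(lr)   # H x W x 4 feature map (see step S6013)
    return hr, lr, feat

# training_set = [build_training_sample(img, 2, extract_features) for img in hr_images]
```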
  • step S602 the image processing device trains the deep learning model.
  • step S602 the deep learning model is trained based on the training data set, the training algorithm, and the loss function.
  • step S603 the image processing device performs model conversion.
  • a model conversion algorithm is used to simplify the trained deep learning model into a lightweight model, such as a subspace model.
  • step S604 the image processing device performs real-time inference.
  • FIG. 7A is a schematic diagram of the implementation process of constructing a data set according to an embodiment of the application. As shown in Figure 7A, the implementation process includes:
  • step S6011 a high-resolution image is obtained.
  • The width and height of the high-resolution image must be integer multiples of the super-division multiple N, and the image must be a grayscale image.
  • step S6012 the artificial downsampling algorithm is used to reduce the resolution of the high-resolution image to obtain a low-resolution image.
  • step S6013 the feature extraction algorithm is used to extract features of the low-resolution image to obtain a feature map.
  • step S6014 the high-resolution image, the low-resolution image, and the feature map are combined into a training set.
  • When step S6013 is implemented, gradient features and variance can be used as the features of the low-resolution image to construct the feature map. In some embodiments, the corresponding 4-dimensional feature may be calculated for each pixel, and the features are then arranged in the order of the original pixels into a feature map with the same width and height as the low-resolution image and 4 channels.
  • FIG. 7B is a schematic diagram of the implementation process of extracting low-resolution image features according to an embodiment of the application. As shown in FIG. 7B, the process includes:
  • step S31 the image processing device calculates the first direction gradient map dx of the low-resolution image.
  • In actual implementation, for each pixel i on the low-resolution image, the value of the left adjacent pixel is subtracted from that of the right adjacent pixel and the difference is divided by 2, to obtain the gradient value of pixel i on dx.
  • step S32 the image processing device calculates the second direction gradient map dy of the low-resolution image.
  • Similarly, for each pixel i, the value of the upper adjacent pixel is subtracted from that of the lower adjacent pixel and the difference is divided by 2, to obtain the gradient value of pixel i on dy.
  • Step S33 For each pixel i on the low-resolution image, the image processing device performs the following processing to obtain its corresponding feature (the four-dimensional feature obtained in the embodiment of the present application):
  • Step 331 The image processing device calculates the neighboring image blocks of the corresponding positions of the pixel i on dx and dy, which are respectively denoted as x and y.
  • x and y correspond to the dx block and the dy block in FIG. 7B.
  • Step 332 Treat x and y as vectors of length M with elements x_i and y_i (i = 1, 2, ..., M), and calculate the covariance matrix A of x and y as defined in formula (1-1):
  • A = [ Σ x_i·x_i   Σ x_i·y_i ; Σ x_i·y_i   Σ y_i·y_i ]    (1-1)
  • Step 333 Calculate the eigenvalues λ1 and λ2 of the covariance matrix A according to formula (1-2) and formula (1-3), respectively:
  • λ1 = ( a + c + √((a − c)² + 4b²) ) / 2    (1-2)
  • λ2 = ( a + c − √((a − c)² + 4b²) ) / 2    (1-3)
  • where a = Σ x_i·x_i, b = Σ x_i·y_i, c = Σ y_i·y_i.
  • Step 334 On the low-resolution image, extract the neighborhood image block of pixel i, and calculate the variance v of the neighborhood image block.
  • Step 335 Calculate the 4-dimensional feature of pixel i: the 1st-dimension feature is f1 = atan2(λ1, λ2), the 2nd-dimension feature is f2 = λ1, the 3rd-dimension feature f3 is computed from λ1 and λ2, and the 4th-dimension feature is f4 = v.
  • the feature of each pixel on the low-resolution image is calculated, thereby constructing a feature map.
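  • A minimal sketch of steps S31 to S33 in Python/NumPy is given below, assuming a 5*5 neighborhood and reflective padding at the borders; the form used here for the 3rd-dimension feature is only an assumed placeholder, since its exact formula is not reproduced in this text.

```python
import numpy as np

def pixel_features(lr, k=5):
    # lr: single-channel low-resolution image; returns an H x W x 4 feature map.
    lr = np.asarray(lr, dtype=np.float64)
    h, w = lr.shape
    pad = k // 2
    # First/second direction gradients: (right - left) / 2 and (lower - upper) / 2.
    dx = np.zeros_like(lr)
    dy = np.zeros_like(lr)
    dx[:, 1:-1] = (lr[:, 2:] - lr[:, :-2]) / 2.0
    dy[1:-1, :] = (lr[2:, :] - lr[:-2, :]) / 2.0
    dxp = np.pad(dx, pad, mode="reflect")
    dyp = np.pad(dy, pad, mode="reflect")
    lrp = np.pad(lr, pad, mode="reflect")
    feat = np.zeros((h, w, 4))
    for i in range(h):
        for j in range(w):
            x = dxp[i:i + k, j:j + k].ravel()      # dx block of pixel (i, j)
            y = dyp[i:i + k, j:j + k].ravel()      # dy block of pixel (i, j)
            a, b, c = x @ x, x @ y, y @ y
            root = np.sqrt((a - c) ** 2 + 4.0 * b * b)
            lam1 = (a + c + root) / 2.0            # formula (1-2)
            lam2 = (a + c - root) / 2.0            # formula (1-3)
            v = lrp[i:i + k, j:j + k].var()        # variance of the neighborhood block
            f3 = lam2 / (lam1 + 1e-12)             # assumed placeholder for the 3rd feature
            feat[i, j] = (np.arctan2(lam1, lam2), lam1, f3, v)
    return feat
```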
  • Fig. 8A is a schematic diagram of the implementation process of the deep learning model and its training according to an embodiment of the application. As shown in Fig. 8A, the process includes:
  • step S6021 a generator (super-division model) is constructed.
  • Step S6022 construct a discriminator (discrimination model).
  • Step S6023 construct and generate the objective function.
  • Step S6024 construct a discriminant objective function.
  • step S6025 two objective functions are used to train the super-resolution model and the discriminant model.
  • the available superdivision network structure and network usage method are shown in FIG. 8B (the network structure is not limited to this), and the available superdivision network structure is shown as 811 in FIG. 8B.
  • The deep super-division network is a deep neural network which, as shown in FIG. 8B, includes a fully connected layer 0 8111, a reshape layer 1 8112, residual modules 1 to Z 8113, a fully connected layer 2Z+1 8114, and a reshape layer 2 8115.
  • the residual module i 8113 as shown in FIG. 8B, further includes a fully connected layer i_1 1131, a fully connected layer i_2 1132, and an addition layer 1133.
  • the feature map of the low-resolution image is input into the deep neural network, and the convolution kernel used for the current image block super-division is output.
  • the recommended value of Z is 10
  • the "-" in the table indicates the batch dimension.
  • Step S801 Take the low-resolution image block R_i and the 4-dimensional feature F_i corresponding to pixel i from the data set.
  • Step S802 Input the feature F_i into the deep super-division network to obtain the super-division convolution kernel i to be used for the image block R_i.
  • Step S803 Convolve the image block R_i with the convolution kernel i to obtain the N² pixels after super-division, denoted as the vector I_i.
  • Step S804 After the super-divided values I_i of all pixels have been calculated, they are spliced and reordered (that is, pixel reordering, PixelShuffle) to obtain the super-resolution image S.
  • Assuming the width and height of the low-resolution image are W and H respectively, the directly assembled super-divided pixels form an image S that is a three-dimensional matrix whose three dimensions are W, H, and N², with priority increasing in turn, where N is the super-resolution multiple.
  • For example, if W is 640, H is 360, and N is 2, the three dimensions of the image S obtained after super-division are 640, 360, and 4, respectively.
  • When step S804 is implemented, S is first reshaped into a 4-dimensional matrix with dimensions W, H, N, N (for example 640, 360, 2, 2), then the 2nd and 3rd dimensions of S are swapped, and S is reshaped into a 2-dimensional matrix with dimensions WN (640*2=1280) and HN (360*2=720); the reshaped S is the super-resolution image S.
  • the convolution kernel output by the superdivision network is a convolution kernel with N 2 channels.
  • the super-division network uses the above-mentioned input features to ensure that the subsequent model conversion steps can run effectively (because the feature dimensions used are not many, only 4 dimensions).
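  • The sketch below illustrates how steps S801 to S804 fit together, assuming a callable `kernel_net(feature)` that stands for the deep super-division network and returns an N²-channel K*K convolution kernel; the patch extraction and border handling are simplified assumptions.

```python
import numpy as np

def super_resolve_gray(lr, feat, kernel_net, n=2, k=5):
    # lr is indexed [w, h] to follow the W, H convention of the description above.
    lr = np.asarray(lr, dtype=np.float64)
    wdim, hdim = lr.shape
    pad = k // 2
    lrp = np.pad(lr, pad, mode="reflect")
    s = np.zeros((wdim, hdim, n * n))
    for w in range(wdim):
        for h in range(hdim):
            r_i = lrp[w:w + k, h:h + k]            # low-resolution image block R_i
            kern = kernel_net(feat[w, h])          # assumed shape: (n*n, k, k)
            s[w, h] = np.tensordot(kern, r_i, axes=([1, 2], [0, 1]))  # N^2 pixels I_i
    # Pixel reordering (PixelShuffle): (W, H, N^2) -> (W*N, H*N).
    return s.reshape(wdim, hdim, n, n).transpose(0, 2, 1, 3).reshape(wdim * n, hdim * n)
```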
  • Fig. 8C is a schematic diagram of a network structure of a discriminator provided by an embodiment of the application.
  • the network model includes convolutional layer 1 821, convolutional layer 2 822, convolutional layer 3 823, Fully connected layer 1 824 and convolutional layer 4 825.
  • the network structure parameters of the discriminant network model shown in Figure 8C are shown in Table 2 below:
  • the discrimination network will have two outputs: global discrimination output 827 and pixel discrimination output 828, among which:
  • The global discrimination output 827 is used to discriminate whether the input image is an image constructed by the super-division network; the output is a single value indicating the probability that the input image was constructed by the generator (between 0 and 1, where 0 means no and 1 means yes).
  • The pixel discrimination output 828 is used to discriminate whether the input image is an image constructed by the super-division network; the output is a matrix with the same width and height as the input image, in which each element represents the probability that the input image pixel at the corresponding position was constructed by the generator (between 0 and 1, where 0 means no and 1 means yes).
  • In the embodiment of the present application, the generation objective function can be constructed as shown in FIG. 8D:
  • Step S231 Calculate the pixel-level error.
  • When step S231 is implemented, the average error over the pixel points between the high-resolution image and the super-divided image is calculated; the error can take various forms such as mean squared error (MSE), absolute error, and so on.
  • step S232 the content error is calculated.
  • step S232 can be implemented through the following steps:
  • Step S2321 Input the high-resolution image into the content feature module to obtain the high-scoring content feature.
  • The content feature module is a pre-trained module, generally composed of the first several layers of VGG19 (the first 17 layers are recommended); other networks, or a different number of leading layers, can also be used.
  • Step S2322 input the super-divided image into the content feature module to obtain the super-divided content feature.
  • Step S2323 Calculate the average error between the high-resolution content feature and the super-divided content feature, that is, the content error; the error can take various forms such as mean squared error (MSE), absolute error, and so on.
  • Step S233 Calculate the pixel discrimination error and the global discrimination error.
  • step S233 can be implemented through the following steps:
  • Step S2331 input the super-divided image into the discrimination network to obtain super-divided pixel discrimination and super-divided global discrimination;
  • Step S2332 Calculate the average error between the super-divided pixel discrimination and the "no" value (0), that is, the pixel discrimination error (the hope is that the generator can fool the discrimination network into considering that the pixels of the input image are not super-divided).
  • the pixel discrimination error may be in various forms such as binary cross entropy.
  • Step S2333 Calculate the average error between the super-divided global discrimination and the "no" value (0), that is, the global discrimination error (the hope is that the generator can fool the discrimination network into considering that the input image as a whole is not super-divided).
  • the global discrimination error may be in various forms such as binary cross entropy.
  • In step S234, the weighted sum of the 4 errors is taken to obtain the generation objective function.
  • the suggested weights are: pixel discrimination error weight 7e-4, global discrimination error weight 3e-4, content error weight 2e-6, and pixel-level error weight 1.0.
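  • A sketch of the generation objective of steps S231 to S234 follows, using the suggested weights above; binary cross entropy is used for the two discrimination terms and mean squared error for the pixel-level and content terms, which is one possible choice among the forms mentioned.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Binary cross entropy against a constant target (0 = "no", 1 = "yes").
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def generation_objective(sr, hr, sr_content, hr_content, d_pixel_sr, d_global_sr):
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    pixel_err   = float(np.mean((sr - hr) ** 2))                  # pixel-level error
    content_err = float(np.mean((np.asarray(sr_content, np.float64)
                                 - np.asarray(hr_content, np.float64)) ** 2))  # content error
    pixel_disc  = bce(d_pixel_sr, 0.0)    # generator wants per-pixel judgments of "no" (0)
    global_disc = bce(d_global_sr, 0.0)   # and a global judgment of "no" (0)
    # Suggested weights: 1.0, 2e-6, 7e-4, 3e-4.
    return 1.0 * pixel_err + 2e-6 * content_err + 7e-4 * pixel_disc + 3e-4 * global_disc
```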
  • In the embodiment of the present application, the discriminant objective function is constructed as shown in FIG. 8E. Step S241 Calculate the super-division global error and the super-division pixel error of the super-resolution image.
  • step S241 can be implemented through the following steps:
  • step S2411 the super-resolution image is input into the discrimination network to obtain the super-division global judgment and super-division pixel judgment.
  • Step S2412 Calculate the average error between the super-division pixel judgment and the "yes" value (1), that is, the super-division pixel error (the hope is that the discrimination network can recognize that each pixel of the input super-divided image was constructed by the generator super-division module).
  • the super-resolution pixel error may be in various forms such as binary cross entropy.
  • Step S2413 Calculate the average error between the super-division global discrimination and the yes value (1), that is, the super-division global error (it is hoped that the discriminant network can recognize that the input super-division image is constructed by the generator super-division module as a whole).
  • the hyperdivision global error may be in various forms such as binary cross entropy.
  • Step S242 Calculate the high-resolution global error and high-resolution pixel error of the high-resolution image.
  • step S242 can be implemented through the following steps:
  • step S2421 the high-resolution image is input into the discrimination network to obtain high-score global judgment and high-score pixel judgment.
  • Step S2422 Calculate the average error between the high-resolution pixel judgment and the "no" value (0), that is, the high-resolution pixel error (the hope is that the discrimination network can recognize that each pixel of the input high-resolution image was not constructed by the generator super-division module).
  • the high-resolution pixel error may be in various forms such as binary cross entropy.
  • Step S2423 Calculate the average error between the high-resolution global discrimination and the "no" value (0), that is, the high-resolution global error (the hope is that the discrimination network can recognize that the input high-resolution image as a whole was not constructed by the generator super-division module).
  • the high score global error may be in various forms such as binary cross entropy.
  • step S243 the four errors are weighted and summed to obtain a discriminative loss function.
  • the suggested weights are respectively: the weight of the super-resolution global error is 0.25, the weight of the super-resolution pixel error is 0.25, the weight of the high-resolution global error is 0.25, and the weight of the high-resolution pixel error is 0.25.
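  • A corresponding sketch of the discriminant objective of steps S241 to S243, again with binary cross entropy as one possible error form and the suggested weights of 0.25 each:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def discriminant_objective(d_pixel_sr, d_global_sr, d_pixel_hr, d_global_hr):
    sr_pixel_err  = bce(d_pixel_sr, 1.0)   # super-divided pixels should be judged "yes" (1)
    sr_global_err = bce(d_global_sr, 1.0)  # super-divided image judged "yes" (1) as a whole
    hr_pixel_err  = bce(d_pixel_hr, 0.0)   # high-resolution pixels judged "no" (0)
    hr_global_err = bce(d_global_hr, 0.0)  # high-resolution image judged "no" (0) as a whole
    return 0.25 * (sr_pixel_err + sr_global_err + hr_pixel_err + hr_global_err)
```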
  • Fig. 8F is a schematic diagram of a model training implementation process provided by an embodiment of the application. As shown in Fig. 8F, the process includes:
  • Step S841 The image processing device initializes the training parameters and the model parameters: the number of iterations is initialized to 1, and the parameter structures of the discrimination network and the generation network are initialized.
  • step S842 the image processing device determines whether the number of iterations is less than T.
  • T is a preset threshold for the number of iterations, for example, it may be 10,000 times.
  • When the number of iterations is less than T, step S843 is entered; when the number of iterations is greater than or equal to T, the process ends.
  • step S843 the image processing device fixes the parameters of the discriminator, uses the optimization algorithm, uses the data in the training set and the generation loss function, and trains (iteratively) the generator parameters once.
  • step S844 the image processing device fixes the parameters of the generator, uses the optimization algorithm, uses the data in the training set and the discriminant loss function, and trains (iteratively) the discriminator parameters once.
  • step S845 the number of iterations is +1, and step S842 is entered again.
  • the trained generator parameters and discriminator parameters can be obtained, where the generator parameters are the parameters of the deep superdivision network.
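  • The alternating training of steps S841 to S845 can be sketched as follows; `train_generator_step` and `train_discriminator_step` are assumed helpers, each running one optimizer update on the generation loss or the discriminant loss while the other network's parameters stay fixed.

```python
def train_gan(train_set, generator, discriminator,
              train_generator_step, train_discriminator_step, T=10000):
    # Alternate one generator update and one discriminator update until T iterations.
    for _ in range(T):
        # Fix the discriminator's parameters; iterate the generator once on the generation loss.
        train_generator_step(generator, discriminator, train_set)
        # Fix the generator's parameters; iterate the discriminator once on the discriminant loss.
        train_discriminator_step(generator, discriminator, train_set)
    return generator, discriminator
```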
  • step S603 model conversion.
  • the core idea of model conversion is to approximate the deep learning model and transform it into a simple and lightweight model.
  • The following takes the method of converting the deep super-division network model into a subspace model as an example. Described in one sentence: the input feature space is divided to obtain subspaces, and all deep-learning output values within each subspace are approximated by the output value of the deep learning model corresponding to the center point of the current subspace.
  • Figure 9 is a schematic diagram of the implementation process of model conversion in an embodiment of this application. As shown in Figure 9, the process includes:
  • step S6031 the image processing device discretizes the feature space.
  • Each dimension of the feature space (the aforementioned 4-dimensional feature space) is segmented, where: feature 1 is recommended to be divided evenly into N1 segments over [0, 2π] (recommended value 16); feature 2 is recommended to be divided into N2 segments according to the maximum and minimum of the data (recommended value 8); feature 3 is recommended to be divided into N3 segments according to the maximum and minimum of the data (recommended value 8); feature 4 is recommended to be divided evenly into N4 segments from 0 to the maximum of the data (recommended value 8). According to the above segmentation, the feature space is divided into N1*N2*N3*N4 (with the recommended values, 8192) subspaces.
  • step S6032 for each subspace i, the image processing device calculates the center of the subspace, that is, the center coordinate i.
  • When step S6032 is implemented, the midpoint of the upper and lower bounds of each dimension may be calculated to obtain the center coordinates of the subspace.
  • step S6033 the image processing device inputs the center coordinate i into the deep superdivision network to obtain a convolution kernel i.
  • step S6034 the image processing device composes each subspace and its corresponding convolution kernel into a converted subspace model.
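  • A sketch of the conversion in steps S6031 to S6034: each dimension of the 4-dimensional feature space is cut into segments, the deep network (the assumed callable `kernel_net`) is evaluated once at the center of every subspace, and the resulting convolution kernels form a lookup table.

```python
import numpy as np

def convert_to_subspace_model(kernel_net, feat_min, feat_max, bins=(16, 8, 8, 8)):
    # feat_min / feat_max: per-dimension lower and upper bounds of the 4-D feature space.
    edges = [np.linspace(feat_min[d], feat_max[d], bins[d] + 1) for d in range(4)]
    centers = [(e[:-1] + e[1:]) / 2.0 for e in edges]   # midpoint of each segment
    table = np.empty(bins, dtype=object)
    for idx in np.ndindex(*bins):
        center = np.array([centers[d][idx[d]] for d in range(4)])
        table[idx] = kernel_net(center)   # convolution kernel stored for this subspace
    return edges, table
```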
  • In addition to being converted into a subspace model, in some embodiments the deep learning model may also be converted into other lightweight models, such as decision trees. For this type of model conversion, the deep learning model can be used to construct data with which to train the new target lightweight model.
  • step S604 real-time reasoning.
  • In the real-time inference step, the lightweight model (for example, the subspace model) obtained in step S603 is used to realize real-time inference for image super-division.
  • FIG. 10 is a schematic diagram of the implementation process of real-time reasoning in an embodiment of the application. As shown in FIG. 10, the process includes:
  • step S6041 the image processing device calculates a feature map of the image to be super-divided.
  • the calculation method is the same as that of S6013, using a feature extraction algorithm to extract the feature map of the image to be super-divided, where the image to be super-divided is a single-channel image.
  • Step S6042 For each pixel i on the image to be super-divided, on the image to be super-divided, the image processing device obtains the low-resolution image block Ri of the pixel i .
  • step S6043 the image processing device obtains the feature F i of the pixel i on the feature map.
  • step S6044 the image processing device inputs the feature F i into the subspace model, queries the subspace to which it belongs, and obtains the convolution kernel i corresponding to the subspace.
  • Step S6045 The image processing device performs a convolution operation on the low-resolution image block R_i with the convolution kernel i corresponding to the determined subspace, to obtain the super-divided result L_i of pixel i, that is, the N² super-divided pixels.
  • step S6046 the image processing device performs splicing and reordering on all the pixels L i (N 2 channels, where N is the multiple of the super-division) after the super-division, to obtain the super-division image.
  • step S6046 can refer to the implementation of step S804.
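  • For step S6044, looking up the subspace to which a feature belongs can be a simple binning operation, as sketched below; `edges` and `table` are assumed to be the outputs of the conversion sketch above, and the per-pixel convolution and pixel reordering are the same as shown earlier.

```python
import numpy as np

def lookup_kernel(feature, edges, table):
    # Map each of the 4 feature values to its segment index, clipping to the valid range.
    idx = tuple(
        int(np.clip(np.searchsorted(edges[d], feature[d]) - 1, 0, len(edges[d]) - 2))
        for d in range(4)
    )
    return table[idx]

# kernel_i = lookup_kernel(feat[w, h], edges, table)                # step S6044
# l_i = np.tensordot(kernel_i, r_i, axes=([1, 2], [0, 1]))          # step S6045: N^2 pixels
```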
  • step S1101 the image processing device transfers the color image from the original color gamut (for example, RGB color gamut) to the YUV color gamut to obtain the Y-channel to-be-super-divided image and the UV-channel to-be-super-divided image.
  • Step S1102 the image processing device inputs the Y-channel to-be-super-divided image to the real-time super-division module to perform real-time super-division, and obtain the Y-channel super-division image.
  • step S1103 the image processing device uses the traditional image interpolation method to perform the super-division processing on the UV channel to be super-division image, to obtain the UV channel super-division image.
  • bicubic interpolation can be used to perform super-division processing on the UV channel to be super-division image.
  • other image interpolation methods can also be used.
  • step S1104 the image processing device transfers the super-divided YUV image to the original color gamut, and the converted image is the super-divided image.
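  • A sketch of steps S1101 to S1104 using OpenCV; `super_resolve_y` stands for the real-time super-division module applied to the Y channel and is an assumed callable, and the bicubic interpolation of the UV channels follows the description above.

```python
import cv2

def super_resolve_color(rgb, super_resolve_y, n=2):
    yuv = cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV)
    y, u, v = cv2.split(yuv)
    y_sr = super_resolve_y(y)                          # real-time super-division on Y only
    h, w = y.shape
    # Traditional bicubic interpolation for the chrominance channels.
    u_sr = cv2.resize(u, (w * n, h * n), interpolation=cv2.INTER_CUBIC)
    v_sr = cv2.resize(v, (w * n, h * n), interpolation=cv2.INTER_CUBIC)
    yuv_sr = cv2.merge([y_sr.astype(u_sr.dtype), u_sr, v_sr])
    return cv2.cvtColor(yuv_sr, cv2.COLOR_YUV2RGB)     # back to the original color gamut
```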
  • step S1201 the image processing device obtains a video to be super-divided.
  • step S1202 the image processing device decodes the video to obtain each video frame to be superdivided.
  • step S1203 the image processing device inputs each video frame i to be super-divided into the real-time super-division module, performs super-division processing, and obtains a super-division image of the video frame i.
  • When the to-be-super-divided video frame i is a color image frame, step S1203 can be implemented with reference to step S1101 to step S1104.
  • step S1204 the image processing device performs video encoding on the super-divided image of each video frame i to obtain the super-divided video.
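  • Steps S1201 to S1204 can be sketched with OpenCV's video I/O as below; the codec choice and the `super_resolve_frame` callable (for example, the color pipeline sketched above) are assumptions.

```python
import cv2

def super_resolve_video(src_path, dst_path, super_resolve_frame, n=2):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w * n, h * n))
    while True:
        ok, frame = cap.read()                        # decode one to-be-super-divided frame
        if not ok:
            break
        writer.write(super_resolve_frame(frame))      # super-divide and re-encode
    cap.release()
    writer.release()
```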
  • In the image processing method provided by the embodiments of this application, various deep-learning objective functions can be used during training, which gives the trained model better picture quality, and the deep super-division model can be converted into a lightweight model, which greatly improves its inference speed and enables real-time operation (the acceleration ratio differs after different models are converted, and can theoretically reach 100 times or more). In addition to super-resolution, the method proposed in the embodiments of this application can also be used in other image processing applications, such as image denoising and enhancement, so its scope of application is wider.
  • FIG. 13 is a schematic diagram of the composition structure of the image processing device provided by the embodiment of the application. As shown in FIG. 13, the image processing device 154 includes:
  • the first obtaining module 1541 is configured to obtain the image to be processed
  • the first extraction module 1542 is configured to extract the feature vector of each pixel in the image to be processed when the image to be processed is a grayscale image, and determine the neighborhood image block corresponding to each pixel in the image to be processed;
  • the first processing module 1543 is configured to use a lightweight model to process the feature vector of each pixel and the neighborhood image block to obtain a processed target image, where the lightweight model is obtained by performing lightweight processing on the trained neural network model;
  • the output module 1544 is configured to output the target image.
  • the image processing device further includes:
  • the color gamut conversion module is configured to convert the to-be-processed image to the YUV color gamut when the to-be-processed image is a color image to obtain the Y-channel to-be-processed image and the UV-channel to-be-processed image;
  • the second extraction module is configured to extract the feature vector of each Y-channel pixel in the Y-channel to-be-processed image, and determine the neighborhood image block corresponding to each Y-channel pixel;
  • the second processing module is configured to use the lightweight model to process the feature vector of each Y channel pixel and the neighborhood image block to obtain a processed Y channel target image;
  • the third processing module is configured to use a preset image processing algorithm to process the UV channel to-be-processed image to obtain a UV channel target image;
  • the first determining module is configured to determine the target image based on the Y channel target image and the UV channel target image, wherein the target image has the same color gamut as the image to be processed.
  • the first obtaining module is further configured to:
  • decode the obtained video file to obtain respective video frame images, and determine the respective video frame images as the images to be processed.
  • the first extraction module is further configured to:
  • determine a first direction gradient map and a second direction gradient map corresponding to the image to be processed; determine, for each pixel, a first gradient neighborhood block in the first direction gradient map and a second gradient neighborhood block in the second direction gradient map; and determine the feature vector of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of each pixel.
  • the first extraction module is further configured to:
  • determine the covariance matrix of each pixel based on the first gradient neighborhood block and the second gradient neighborhood block of the pixel; determine the first feature value and the second feature value corresponding to each covariance matrix; determine the variance value corresponding to the neighborhood image block of each pixel; and determine the feature vector of each pixel based on each first feature value, each second feature value, and each variance value.
  • the image processing device further includes:
  • the second acquisition module is configured to acquire training data and a preset neural network model, where the training data includes a first training image and a second training image, and the second training image is to download the first training image Obtained by sampling, the neural network model includes a generative model and a discriminant model;
  • the fourth processing module is configured to use the neural network model to process the second training image to obtain a predicted image
  • the model training module is configured to perform back propagation training on the neural network model based on the predicted image, the first training image, and a preset objective function to obtain a trained neural network model.
  • the preset objective function includes generating objective function and discriminating objective function.
  • the model training module is further configured as:
  • fix the discriminant parameters of the discriminant model, and perform back propagation training on the generative model based on the predicted image, the first training image and the generation objective function, to adjust the generation parameters of the generative model; and fix the generation parameters of the generative model, and perform back propagation training on the discriminant model based on the predicted image, the first training image and the discriminant objective function, to adjust the discriminant parameters of the discriminant model, until a preset training completion condition is reached, to obtain the trained neural network model.
  • the image processing device further includes:
  • the second determining module is configured to determine the pixel-level error value and the content error value between the predicted image and the first training image
  • the third determining module is configured to determine the first pixel discrimination error value and the first global discrimination error value of the predicted image based on the predicted image and the discriminant model;
  • the fourth determination module is configured to determine the generation objective function based on the preset generation weight value, the pixel-level error value, the content error value, the first pixel discrimination error value and the first global discrimination error value.
  • the image processing device further includes:
  • a fifth determining module configured to determine a second pixel discrimination error value and a second global discrimination error value of the predicted image based on the predicted image and the discriminant model;
  • a sixth determining module configured to determine a third pixel discrimination error value and a third global discrimination error value of the first training image based on the first training image and the discrimination model;
  • the seventh determining module is configured to determine the discrimination objective function based on the preset discrimination weight value, the second pixel discrimination error value, the second global discrimination error value, the third pixel discrimination error value, and the third global discrimination error value .
  • the image processing device further includes:
  • the eighth determining module is configured to determine the feature space based on the feature vector corresponding to each pixel in the image to be processed;
  • the subspace division module is configured to divide the feature space into N feature subspaces according to preset division rules, and respectively determine the N center coordinates corresponding to the N feature subspaces;
  • the first input module is configured to input the N center coordinates to the trained neural network model respectively, and obtain N convolution kernels of N feature subspaces correspondingly;
  • the ninth determining module is configured to determine the N feature subspaces and the N convolution kernels as the lightweight model.
  • the image processing device further includes:
  • the decision tree building module is configured to build a decision tree based on the feature vector corresponding to each pixel in the image to be processed;
  • the second input module is configured to input each leaf node in the decision tree to the trained neural network model, and correspondingly obtain a convolution kernel corresponding to each leaf node;
  • the tenth determining module is configured to determine each leaf node and the corresponding convolution kernel as the lightweight model.
  • the first processing module is further configured to:
  • determine, based on the feature vector of each pixel and the lightweight model, the convolution kernel corresponding to each pixel; perform convolution calculation on the neighborhood image block of each pixel and the corresponding convolution kernel to obtain processed pixel values; and determine the processed target image based on the processed pixel values.
  • the embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the image processing method described in the embodiment of the present application.
  • The embodiment of the present application provides a storage medium storing executable instructions. When the executable instructions are executed by a processor, they will cause the processor to execute the method provided in the embodiments of the present application.
  • In some embodiments, the storage medium may be a computer-readable storage medium, for example, a Ferromagnetic Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); it may also be various devices including one or any combination of the foregoing memories.
  • executable instructions may be in the form of programs, software, software modules, scripts or codes, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and their It can be deployed in any form, including being deployed as an independent program or as a module, component, subroutine or other unit suitable for use in a computing environment.
  • As an example, executable instructions may, but do not necessarily, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files storing one or more modules, subroutines, or code parts).
  • As an example, executable instructions can be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices that are distributed across multiple sites and interconnected by a communication network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

一种图像处理方法、装置、设备及计算机可读存储介质,其中,方法包括:获取待处理图像(S101);当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量,并确定该各个像素点对应的邻域图像块(S102);利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像(S103),其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;输出该目标图像(S104)。

Description

图像处理方法、装置、设备及计算机可读存储介质
相关申请的交叉引用
本申请基于申请号为202010495781.1、申请日为2020年06月03日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请实施例涉及图像处理技术领域,涉及但不限于一种图像处理方法、装置、设备及计算机可读存储介质。
背景技术
图像处理是通过计算机对图像进行去除噪声、增强、复原、提高分辨率等处理的方法和技术。随着计算机科学技术以及数字化技术的不断发展,图像处理被广泛应用在工作、生活、军事、医学等各个领域。而伴随着人工智能技术的发展,图像处理在实现时,可以通过机器学习来达到更好的处理效果。
目前在通过机器学习进行图像处理时,往往需要保证使用的神经网络模型的层数足够深,因此网络结构会很复杂,计算量大,不能实现实时处理。
发明内容
本申请实施例提供一种图像处理方法、装置、设备及计算机可读存储介质,不仅能够保证目标图像的像素连贯并且能够实时进行图像处理,提高图像处理效率。
本申请实施例的技术方案是这样实现的:
本申请实施例提供一种图像处理方法,该方法由图像处理设备执行,包括:
获取待处理图像;
当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量,并确定该各个像素点对应的邻域图像块;
利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;
输出该目标图像。
本申请实施例提供一种图像处理装置,包括:
第一获取模块,配置为获取待处理图像;
第一提取模块,配置为当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量,并确定该各个像素点对应的邻域图像块;
第一处理模块,配置为利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;
输出模块,配置为输出该目标图像。
本申请实施例提供一种图像处理设备,包括:
存储器,配置为存储可执行指令;
处理器,配置为执行该存储器中存储的可执行指令时,实现上述的方法。
本申请实施例提供一种计算机可读存储介质,存储有可执行指令,用于引起处理器执行时,实现上述的方法。
本申请实施例具有以下有益效果:
在获取到待处理图像后,确定该待处理图像中各个像素点对应的邻域图像块;当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量;利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;由于训练时使用的是神经网络结构,因此能够保证使用各种特殊损失时输出像素连贯的目标图像,并且在进行图像处理时使用的是通过模型转换得到的轻量级模型(例如子空间模型,或者决策树),使得其能够实时运行输出目标图像,从而在保证处理效果的同时提高图像处理效率。
附图说明
图1A为本申请实施例提供的图像处理系统的一种网络架构示意图;
图1B为本申请实施例提供的图像处理系统的另一种网络架构示意图;
图2为本申请实施例提供的第一终端100的结构示意图;
图3为本申请实施例提供的图像处理方法的一种实现流程示意图
图4为本申请实施例提供的得到轻量化模型的实现流程示意图;
图5为本申请实施例提供的图像处理方法的再一种实现流程示意图;
图6为本申请实施例提供的图像处理方法的实现流程示意图;
图7A为本申请实施例构造数据集的实现流程示意图;
图7B为本申请实施例提取低分辨率图像特征的实现流程示意图;
图8A为本申请实施例深度学习模型及其训练的实现流程示意图;
图8B为本申请实施例提供的超分网络结构与网络的使用方法的实现流程示意图;
图8C为本申请实施例提供的一种判别器的网络结构示意图;
图8D为本申请实施例提供的构造生成目标函数的实现流程示意图;
图8E为本申请实施例提供的构造判别目标函数的实现流程示意图;
图8F为本申请实施例提供的模型训练实现流程示意图;
图9为本申请实施例模型转换的实现流程示意图;
图10为本申请实施例中实时推理的实现流程示意图;
图11为本申请实施例提供的对彩色图像进行超分处理的实现流程示意图;
图12为本申请实施例提供的对视频进行超分处理的实现流程示意图;
图13为本申请实施例提供的图像处理装置的组成结构示意图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,技术方案和优点更加清楚,下面将结合附图对本申请实施例进行描述,所描述的实施例不应视为对本申请的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请实施例。
对本申请实施例进行说明之前,对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释。
1)图像处理,对图像的处理,即像素图到像素图的处理,例如超分辨率,图像去噪增强等处理。
2)超分辨率(SR,Super Resolution)算法,即能够提高图像分辨率的算法,可简称为超分算法,属于一种图像处理方法。超分辨率算法可以分为多帧超分与单帧超分两类。单帧超分通过处理一张图,处理得到该张图对应的超分辨率图像;多帧超分辨率算法,通过处理多张图,得到多张图对应的超分辨率图像。本专利关注的重点为单帧超分辨率算法。单帧超分辨率算法中,尤其数基于深度学习的方法效果最好(明显优于传统方法)。
3)计算机中央处理器(CPU,Central Processing Unit),计算机系统的运算和控制核心,是信息处理、程序运行的最终执行单元,可以用于各种计算场景。
4)图形处理器(GPU,Graphics Processing Unit),又称显示核心、视觉处理器、显示芯片,是一种专门在个人电脑、工作站、游戏机和一些移动设备(如平板电脑、智能手机等)上做图像和图形相关运算工作的微处理器。GPU的计算能力强,往往能够远超CPU,因此广泛地用于深度学习的模型推理中。由于GPU资源属于稀缺资源,因此在部署的时候有延后性。
5)深度学习(DL,Deep Learning),即使用神经网络的机器学习。
6)模型转换算法,即转换模型类型的算法,例如将深度学习网络转化成决策树模型,或者子空间模型等。通过模型转换算法可以把复杂模型转换为简单模型,大大提高其计算速度(缺点是可能导致精度下降)。
7)卷积核,图像处理时,给定输入图像,输入图像中一个小区域中像素加权平均后成为输出图像中的每个对应像素,其中权值由一个函数定义,这个函数称为卷积核。
8)目标函数,又称为损失函数(Loss Function)或代价函数(cost function)是将随机事件或其有关随机变量的取值映射为非负实数以表示该随机事件的“风险”或“损失”的函数。在应用中,目标函数通常作为学习准则与优化问题相联系,即通过最小化目标函数求解和评估模型。例如在统计学和机器学习中被用于模型的参数估计,是机器学习模型的优化目标。
9)色域,又称之为色彩空间,代表的是一个色彩影像所能显示的颜色范围,目前常见的色域包括亮度色度(Luminance Chrominance,YUV)色域、红绿蓝(RGB,Red Green Blue)色域、青洋红黄黑(Cyan Magenta Yellow Black,CMYK色域等等。
为了更好地理解本申请实施例中提供的图像处理方法,首先对相关技术中用于提高分辨率的图像处理方法及存在的确定进行说明。
相关技术中,用于提高分辨率的图像处理方法至少包括以下两种:
第一、快速精准图像超分辨率方法(RAISR,Rapid and Accurate Super Image Resolution)。
RAISR是一种基于索引滤波器的超分辨率方法。简单的来说,RAISR在推理时, 就是按照如下步骤处理的:
步骤S001,先把图像放大到目标尺寸;
步骤S002,在放大的图像上计算每个像素的梯度特征;
步骤S003,每个像素通过梯度特征索引其要使用的滤波器(卷积核);
步骤S004,每个像素与其索引的滤波器卷积,得到超分后的像素。
在实现时,RAISR使用了基于梯度计算出的3个特征,通过将每个特征分割成不同的段落,把特征空间分割成许多小块。每个特征子空间(小块)里,可以直接用最小二乘的方法拟合目标值,得到卷积核参数。这样,就可以人为构造高分辨率-低分辨率上采样的图像对,再按照上采样图像中像素的梯度特征,将每个图像块分配到不同的子空间里。在每个子空间里用最小二乘做图像块到目标像素(高分辨率像素)的拟合,做到模型的训练。
在相关技术中,还有基于RAISR的优化版本,其不使用上采样的方法,而是使用深度学习的upscale,即在最小二乘的时候拟合M 2个卷积核(M为放大倍数),将一个图像转化为M 2个通道,然后通过像素重排序(pixelshuffle)的方法重新组合成一张大图,这样卷积核的感受也更大,效果更好。
RAISR类方法相较于深度学习方法,效果略微下降,但是计算速度能大大提升(RAISR论文中相较深度学习超分,速度是后者的100倍以上)。
第二、基于生成对抗网络的图像超分辨率方法(SRGAN,Super Resolution Generative Adversarial Network)。
SRGAN是基于生成对抗网络的超分辨率技术。总的来说,就是利用了生成对抗网络的特性,同时训练两个网络,一个用于构造更为真实的高分辨率图像生成网络,一个用于判断输入的高分辨率图像是否是算法构造的判别网络,两个网络使用两个目标函数训练。通过不断交替训练这两个网络,让这两个网络性能越来越强。最后,将生成网络取出来,在推理时使用。另外,在生成网络的目标函数中,还会加入内容损失:计算超分后图像与真实高分辨率图像,在特征层面的距离损失(特征是通过VGG等物体识别网络前N层组成的网络提取出来的)。通过内容损失,训练的生成网络就能让构造的超分辨率图像有更多随机细节,让画面更好。
RAISR这类算法的缺点是:需要在不同的子空间下分别训练滤波器(卷积核),因此如果加入深度学习中的各种特殊损失(例如内容损失),就很难保证相邻像素的连续性,导致噪点。
而SRGAN类算法的缺点是:需要保证网络足够深,因此网络结构尝尝会很复杂,很难像RAISR一样能够实时运行。
基于此,本申请实施例提出一种图像处理深度学习方案与配套的模型加速(模型转换)相结合的方法,在训练时使用神经网络结构,保证使用各种特殊损失时输出结果像素连贯,且不引入额外噪点;并且通过模型转换的方法,将模型简化成轻量级模型(例如子空间模型或者决策树)使得其能够实时运行。
下面说明本申请实施例提供的图像处理设备的示例性应用,本申请实施例提供的图像处理设备可以实施为笔记本电脑,平板电脑,台式计算机,移动设备(例如,移动电话,便携式音乐播放器,个人数字助理,专用消息设备,便携式游戏设备)、智能电视、智能机器人等任意具有屏幕显示功能的终端,也可以实施为服务器。服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。
参见图1A,图1A为本申请实施例提供的图像处理系统的一种网络架构示意图。如 图1A所示,图像处理系统中包括第一终端100、服务器200和网络300。为实现支撑一个示例性应用,第一终端100通过网络300连接到服务器200,第一终端100可以是智能终端,在智能终端上可以安装有各种各样的应用程序(Application,App),例如可以是观看视频App、即时通讯App、购物App、图像采集App等,网络300可以是广域网或者局域网,又或者是二者的组合,使用无线链路实现数据传输。
用户在通过第一终端100观看视频,或者在网页查看图片时,第一终端100可以从服务器200请求获取视频或图片(在本实施例中,以获取图片101为例进行说明)。本申请实施例提供的图像处理方法可以作为一个功能插件集成在终端的图库App中,如果第一终端100启动了该图像处理功能,那么第一终端100可以利用本申请实施例提供的图像处理方法,对从服务器200获取到的图片101进行实时处理,得到处理后的图片102,并呈现于第一终端100的显示界面中。在图1A中以对图像进行超分处理为例进行说明,对比图1A中的101和102可以看出,处理后的图片102的分辨率更高,从而能够在码率码流不变的情况下,提高用户的画质体验。
参见图1B,图1B为本申请实施例提供的图像处理系统的另一种网络架构示意图。如图1B所示,图像处理系统中包括第一终端400、第二终端700、服务器500和网络600。为实现支撑一个示例性应用,第一终端400通过网络600连接到服务器500,第一终端400可以是智能终端,在智能终端上可以安装有各种各样的应用程序App,例如可以是观看视频App、即时通讯App、购物App、图像采集App等,网络600可以是广域网或者局域网,又或者是二者的组合,使用无线链路实现数据传输。
第二终端700也可以是诸如笔记本电脑,平板电脑,台式计算机,移动设备(例如,移动电话,便携式音乐播放器,个人数字助理,专用消息设备,便携式游戏设备)、智能电视、智能机器人等任意具有屏幕显示功能的终端。第二终端700可以将图片或者视频文件上传至服务器500,服务器500在接收到第二终端700上传的图片或者视频后,可以将该图片或者视频依据本申请实施例提供的图像处理方法进行处理,并得到处理后的图片或视频,第一终端400在向服务器500请求该图片或视频时,服务器500可以向第一终端400返回处理后的图片或视频,第一终端400在自身的显示界面中显示处理后的图片或视频,以提高用户的画质体验。在图1B中,以对图像进行去噪进行示例性说明,图1B中的图像201为原始图像,图1B中的图像202为处理后的图像,对比图像201和图像202可以看出,处理后的图像几乎没有噪声点,从而提高用户的画质体验。
参见图2,图2为本申请实施例提供的第一终端100的结构示意图,图2所示的第一终端100包括:至少一个处理器110、存储器150、至少一个网络接口120和用户接口130。第一终端100中的各个组件通过总线系统140耦合在一起。可理解,总线系统140用于实现这些组件之间的连接通信。总线系统140除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图2中将各种总线都标为总线系统140。
处理器110可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
用户接口130包括使得能够呈现媒体内容的一个或多个输出装置131,包括一个或多个扬声器和/或一个或多个视觉显示屏。用户接口130还包括一个或多个输入装置132,包括有助于用户输入的用户接口部件,比如键盘、鼠标、麦克风、触屏显示屏、摄像头、其他输入按钮和控件。
存储器150可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存 储器,硬盘驱动器,光盘驱动器等。存储器150可以包括在物理位置上远离处理器110的一个或多个存储设备。
存储器150包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(ROM,Read Only Memory),易失性存储器可以是随机存取存储器(RAM,Random Access Memory)。本申请实施例描述的存储器150旨在包括任意适合类型的存储器。
在一些实施例中,存储器150能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。
操作系统151,包括用于处理各种基本系统服务和执行硬件相关任务的系统程序,例如框架层、核心库层、驱动层等,用于实现各种基础业务以及处理基于硬件的任务;
网络通信模块152,用于经由一个或多个(有线或无线)网络接口120到达其他计算设备,示例性的网络接口120包括:蓝牙、无线相容性认证(WiFi)、和通用串行总线(USB,Universal Serial Bus)等;
输入处理模块153,用于对一个或多个来自一个或多个输入装置132之一的一个或多个用户输入或互动进行检测以及翻译所检测的输入或互动。
在一些实施例中,本申请实施例提供的装置可以采用软件方式实现,图2示出了存储在存储器150中的一种图像处理装置154,该图像处理装置154可以是第一终端100中的图像处理装置,其可以是程序和插件等形式的软件,包括以下软件模块:第一获取模块1541、第一提取模块1542、第一处理模块1543和输出模块1544,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或拆分。将在下文中说明各个模块的功能。
在另一些实施例中,本申请实施例提供的装置可以采用硬件方式实现,作为示例,本申请实施例提供的装置可以是采用硬件译码处理器形式的处理器,其被编程以执行本申请实施例提供的图像处理方法,例如,硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件。
为了更好地理解本申请实施例提供的方法,首先对人工智能、人工智能的各个分支,以及本申请实施例提供的方法所涉及的应用领域、云技术和人工智能云服务进行说明。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。以下对各个方向分别进行说明。
计算机视觉技术(CV,Computer Vision)计算机视觉是一门研究如何使机器“看”的科学,或者说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为 一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
机器学习(ML,Machine Learning)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习等技术。
云技术(Cloud technology)是指在广域网或局域网内将硬件、软件、网络等系列资源统一起来,实现数据的计算、储存、处理和共享的一种托管技术。云技术基于云计算商业模式应用的网络技术、信息技术、整合技术、管理平台技术、应用技术等的总称,可以组成资源池,按需所用,灵活便利。云计算技术将变成重要支撑。技术网络系统的后台服务需要大量的计算、存储资源,如视频网站、图片类网站和更多的门户网站。伴随着互联网行业的高度发展和应用,将来每个物品都有可能存在自己的识别标志,都需要传输到后台系统进行逻辑处理,不同程度级别的数据将会分开处理,各类行业数据皆需要强大的系统后盾支撑,只能通过云计算来实现。
所谓人工智能云服务,一般也被称作是AI即服务(AIaaS,AI as a Service),是目前主流的一种人工智能平台的服务方式,AIaaS平台会把几类常见的AI服务进行拆分,并在云端提供独立或者打包的服务。这种服务模式类似于开了一个AI主题商城:所有的开发者都可以通过API接口的方式来接入使用平台提供的一种或者是多种人工智能服务,部分资深的开发者还可以使用平台提供的AI框架和AI基础设施来部署和运维自己专属的云人工智能服务。
本申请实施例提供的方案涉及人工智能的计算机视觉技术、机器学习、人工智能云服务等技术,通过如下实施例进行说明。
下面将结合本申请实施例提供的第一终端100的示例性应用和实施,说明本申请实施例提供的图像处理方法,该方法由图像处理设备执行,该图像处理设备可以是图1A中所示的第一终端,还可以是图1B所示的服务器。参见图3,图3为本申请实施例提供的图像处理方法的一种实现流程示意图,将结合图3示出的步骤进行说明。
步骤S101,获取待处理图像。
这里,待处理图像可以是灰度图像,也可以是多通道彩色图。在一些实施例中,该待处理图像可以是将视频文件进行解码得到的视频帧图像。
当步骤S101为图1A中的第一终端实现时,待处理图像可以是从服务器获取到的。在一些实施例中,待处理图像还可以是利用第一终端采集到的图像。当步骤S101为图1B中的服务器实现时,待处理图像可以是由第二终端上传至服务器的。
在一些实施例中,在步骤S101获取到待处理图像之后,还可以执行:判断待处理图像是否为灰度图像,其中当待处理图像为灰度图像时,进入步骤S102;当待处理图像为彩色图像时,需要将待处理图像进行色域转换,再进行图像处理过程。
步骤S102,当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量,并确定各个像素点对应的邻域图像块。
这里,步骤S102在实现时,可以根据待处理图像中各个像素点的像素值确定出各个像素点的第一方向梯度值和第二方向梯度值,进而再根据各个像素点的第一方向梯度 值和第二方向梯度值确定出各个像素点的特征向量。
该邻域图像块可以是以各个像素点为中心,K*K的图像块,其中K为奇数,例如K可以是5、7、9、13等。
步骤S103,利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像。
其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的,在实际实现时,可以是基于训练好的神经网络模型,进行子空间划分或者生成决策树,从而得到轻量化模型,轻量化模型相对于神经网络模型更加简化,因此利用轻量化模型对各个像素点的特征向量和邻域图像块进行图像处理时,相较于神经网络模型能够提高计算效率,缩短图像处理时长,从而实现实时处理。
步骤S103在实现时,可以是基于各个像素点的特征向量,确定各个像素点对应的子空间,或者确定各个像素点对应的决策树中的叶子节点,进而再确定出该子空间或叶子节点对应的卷积核,将该卷积核和该邻域图像块进行卷积运算,得到各个像素点对应的处理后的像素值,并基于各个像素点处理后的像素值确定出目标图像。
步骤S104,输出该目标图像。
这里,当步骤S104为图1A所示的第一终端实现时,可以是在第一终端的显示设备中呈现该目标图像,当步骤S104为图1B所示的服务器实现时,可以是将目标图像发送至第一终端。在一些实施例中,当步骤S104由图1B所示的服务器实现时,在步骤S104之后,还可以执行:服务器将目标图像存储至本地存储空间。
在本申请实施例提供的图像处理方法中,在获取到待处理图像后,确定该待处理图像中各个像素点对应的邻域图像块;当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量;利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;由于训练时使用的是神经网络结构,因此能够保证使用各种特殊损失时输出像素连贯的目标图像,并且在进行图像处理时使用的是通过模型转换得到的轻量级模型(例如子空间模型,或者决策树),使得其能够实时运行输出目标图像,从而在保证处理效果的同时提高图像处理效率。
在一些实施例中,上述步骤S102“提取该待处理图像中各个像素点的特征向量”可以通过以下步骤实现:
步骤S1021,确定该待处理图像对应的第一方向梯度图和第二方向梯度图。
这里,第一方向可以是水平方向,第二方向可以是竖直方向,对应地步骤S1021在实现时,可以是对于待处理图像中的各个像素点,将各个像素点的右相邻像素点的像素值减去左相邻像素点的像素值,并将差值除以2,得到该像素点在第一方向上的梯度值,基于各个像素点在第一方向上的梯度值确定该待处理图像对应的第一方向梯度图;将各个像素点的下相邻像素点的像素值减去上相邻像素点的像素值,并将差值除以2,得到该像素点在第二方向上的梯度值,基于各个像素点在第一方向上的梯度值确定该待处理图像对应的第二方向梯度图。对于待处理图像中的边缘像素点,可以利用边缘对称翻转的方式,计算其梯度值,这样,待处理图像中上下边缘的像素点的竖直方向梯度值均为0,左右边缘的像素点的水平方向梯度值均为0。在一些实施例中,还可以选择不计算边缘像素点的梯度值,在得到最终结果后向边缘扩展即可。
步骤S1022,确定该待处理图像中各个像素点在第一方向梯度图中的第一梯度邻域块和在第二方向梯度图中的第二梯度邻域块。
这里,第一梯度邻域块与第二梯度邻域块的大小相同,并且都与各个像素点在待处理图像中的邻域图像块的大小相同。
步骤S1023,基于该各个像素点的第一梯度邻域块和第二梯度邻域块确定该各个像素点的特征向量。
这里,该步骤S1023可以通过以下步骤实现:
步骤S231,基于该各个像素点的第一梯度邻域块和第二梯度邻域块确定该各个像素点的协方矩阵。
这里,假设像素i的第一梯度邻域块X与第二梯度邻域块Y均为5*5大小的图像块,也即第一梯度邻域块和第二梯度邻域块中均包括25个梯度值,其中,X={x_1, x_2, …, x_25},Y={y_1, y_2, …, y_25},那么像素点i的协方矩阵A可以利用公式(1-1)得出:
A = [ Σ x_i·x_i   Σ x_i·y_i ; Σ x_i·y_i   Σ y_i·y_i ]    (1-1)
由公式(1-1)可以看出,协方矩阵为2*2的对称矩阵。
步骤S232,确定各个协方矩阵对应的各个第一特征值和各个第二特征值。
这里,可以按照公式(1-2)和公式(1-3)计算协方矩阵A的第一特征值λ1和第二特征值λ2:
λ1 = ( a + c + √((a − c)² + 4b²) ) / 2    (1-2)
λ2 = ( a + c − √((a − c)² + 4b²) ) / 2    (1-3)
其中,a=Σ x_i·x_i、b=Σ x_i·y_i、c=Σ y_i·y_i。
步骤S233,确定该各个像素点的邻域图像块对应的各个方差值。
步骤S234,基于该各个第一特征值、各个第二特征值和各个方差值确定该各个像素点的特征向量。
这里,在本申请实施例中,各个像素点的特征向量可以是4维的,此时步骤S234在实现时,可以是第1维特征f1=atan2(λ1, λ2),第2维特征f2=λ1,第3维特征
Figure PCTCN2021094049-appb-000004
第4维特征f 4=v,其中,v为步骤S233确定出的方差值。
需要说明的是,在一些实施例中,还可以在确定出第一方向梯度图和第二方向梯度图后,直接将各个像素点的第一方向梯度值和第二方向梯度值作为各个像素点的特征向量。在一些实施例中,还可以利用其它的特征提取算法,提取待处理图像中各个像素点的特征向量。但是由于后续需要基于特征向量进行模型转换,因此不论使用哪种方式提取待处理图像的特征,所得到的特征向量的维度不能过大,以避免在进行模型转换后得到的轻量化模型的数量过多,进而造成计算复杂度过高。
在一些实施例中,在步骤S101之前,需要通过以下步骤来对预设的神经网络模型进行训练,以得到训练好的神经网络模型:
步骤S001,获取训练数据和预设的神经网络模型。
其中,该训练数据至少包括第一训练图像和第二训练图像,其中,该第二训练图像是对该第一训练图像进行下采样得到的,也就是说第二训练图像的分辨率是低于第一训练图像的分辨率的。在本申请实施例中,第一训练图像和第二训练图像均为灰度图像。在一些实施例中,训练数据还可以包括第二训练图像中各个像素点的特征向量。
该预设的神经网络模型可以是深度学习神经网络模型,该神经网络模型可以包括生成模型和判别模型。
步骤S002,利用该神经网络模型对该第二训练图像进行处理,得到预测图像。
这里,当训练数据中包括第二训练图像中各个像素点的特征向量时,步骤S002在实现时,可以是将第二训练图像中的各个像素点的特征向量输入该神经网络模型,得到预测图像;当训练数据中仅包括第一训练图像和第二训练图像时,步骤S002在实现时,可以是将第二训练图像输入该神经网络模型,得到预测图像。
步骤S003,基于该预测图像、该第一训练图像和预设的目标函数对该神经网络模型进行反向传播训练,得到训练好的神经网络模型。
这里,该预设的目标函数包括生成目标函数和判别目标函数,对应地,该步骤S003可以通过以下步骤实现:
步骤S31,固定该判别模型的判别参数,基于该预测图像、该第一训练图像和生成目标函数对该生成模型进行反向传播训练,以对该生成模型的生成参数进行调整。
步骤S32,固定生成判别模型的生成参数,基于该预测图像、该第一训练图像和判别目标函数对该判别模型进行反向传播训练,以对该判别模型的判别参数进行调整,直至达到预设的训练完成条件,得到训练好的神经网络模型。
这里,在本申请实施例中,预设的训练完成条件可以是训练次数达到预设的次数阈值,还可以是预测图像和第一训练图像之间的差异值低于预设的差异阈值。
在一些实施例中,可以通过以下步骤构造生成目标函数:
步骤S41a,确定该预测图像和该第一训练图像之间的像素级误差值和内容误差值。
这里,在确定预测图像和第一训练图像之间的像素级误差值时,可以首先确定预测图像和第一训练图像中对应的各个像素点之间的误差值,进而再利用各个像素点之间的误差值确定预测图像和第一训练图像之间的像素级误差值,其中,该像素级误差值可以是根据各个像素点之间的误差值计算的平均误差,还可以是根据各个像素点之间的误差值计算的均方误差(MSE,Mean Square Error)、绝对误差等。
在确定预测图像和第一训练图像之间的内容误差值时,可以分别将预测图像和第一训练图像输入至内容特征模块,并对应得到预测内容特征向量和训练内容特征向量,其中,内容特征模块为预先训练好的模块,一般使用VGG19的前多层构成(建议使用前17层),进而基于预测内容特征向量和训练内容特征向量计算内容误差值,其中,该内容误差值可以是预测内容特征向量和训练内容特征向量之间的平均误差,还可以是两者之间的均方误差、绝对误差等形式。
步骤S42a,基于该预测图像和该判别模型确定该预测图像的第一像素判别误差值和第一全局判别误差值。
这里,步骤S42a在实现时,可以首先将预测图像输入到判别模型,得到预测像素判别矩阵和预测全局判别值,其中,预测像素判别矩阵的大小与预测图像的大小一致,并且预测像素判别矩阵中的每个元素表示对应位置的预测图像的像素点是生成器构造的概率,预测全局判别值是一个数值,表示预测图像是生成器构造的概率(该数值为0-1之间的实数);进而再基于预测像素判别矩阵与否值(也即0)确定第一像素判别误差值,并基于预测全局判别值与否值确定第一全局判别误差值。其中,第一像素判别误差值可以是计算预测像素判别矩阵与否值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的;类似地,第一全局判别误差值可以是计算预测全局判别值和否值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的。
步骤S43a,基于预设的生成权重值、该像素级误差值、该内容误差值、该第一像素判别误差值和该第一全局判别误差值确定生成目标函数。
这里,预设的生成权重值中包括像素级误差值对应的第一权重值、该内容误差值对应的第二权重值、该第一像素判别误差值对应的第三权重值和该第一全局判别误差值对应的第四权重值,步骤S43a在实现时,将像素级误差值、该内容误差值、该第一像素判别误差值、该第一全局判别误差值与对应的权重值进行加权求和,得到生成目标函数。
在一些实施例中,可以通过以下步骤构造判别目标函数:
步骤S41b,基于该预测图像和该判别模型确定该预测图像的第二像素判别误差值和第二全局判别误差值。
这里,步骤S41b在实现时,首先将预测图像输入到判别模型,得到预测像素判别矩阵和预测全局判别值;进而再基于预测像素判别矩阵与是值(也即1)确定第二像素判别误差值,并基于预测全局判别值与是值确定第二全局判别误差值。其中,第二像素判别误差值可以是计算预测像素判别矩阵与是值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的;类似地,第二全局判别误差值可以是计算预测全局判别值和是值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的。
步骤S42b,基于该第一训练图像和该判别模型确定该第一训练图像的第三像素判别误差值和第三全局判别误差值。
这里,步骤S42b在实现时,首先将第一训练图像输入到判别模型,得到训练像素判别矩阵和训练全局判别值;进而再基于训练像素判别矩阵与否值(也即0)确定第三像素判别误差值,并基于训练全局判别值与否值确定第三全局判别误差值。其中,第三像素判别误差值可以是计算训练像素判别矩阵与否值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的;类似地,第三全局判别误差值可以是计算训练全局判别值和否值之间的平均误差得到的,还可以是计算两者之间的均方误差得到的。
步骤S43b,基于预设的判别权重值、该第二像素判别误差值、该第二全局判别误差值、该第三像素判别误差值和该第三全局判别误差值确定判别目标函数。
这里,预设的判别权重值中包括第二像素判别误差值对应的第五权重值、该第二全局判别误差值对应的第六权重值、该第三像素判别误差值对应的第七权重值和该第三全局判别误差值对应的第八权重值,步骤S43b在实现时,可以是将第二像素判别误差值、该第二全局判别误差值、该第三像素判别误差值、该第三全局判别误差和对应的权重值进行加权求和,得到判别目标函数。
在一些实施例中,可以通过如图4所示的步骤S51a至步骤S54a得到轻量化模型:
步骤S51a,基于该待处理图像中各个像素点对应的特征向量,确定特征空间。
这里,该特征空间可以是基于各个像素点中对应的特征向量中每个维度的最大值和最小值确定的。
步骤S52a,将该特征空间按照预设的划分规则,划分为N个特征子空间,并分别确定N个特征子空间对应的N个中心坐标。
这里,步骤S52a在实现时,可以是将特征向量的各个维度进行划分,例如特征向量有4个维度,每个维度平均划分为8份,那么得到8*8*8*8=4096个特征子空间,并基于每个特征子空间中各个维度的最大值和最小值确定对应的中心坐标。在实现时,可以是将每个特征子空间中各个维度的最大值与最小值的中值确定为特征子空间对应的中心坐标。
步骤S53a,将N个中心坐标分别输入至该训练好的神经网络模型,对应得到N个特征子空间的N个卷积核。
步骤S54a,将N个特征子空间和N个卷积核确定为轻量化模型。
在上述的步骤S51a至步骤S54a中,由于对特征空间进行了划分,得到了范围更小的特征子空间,并确定出各个特征子空间对应的卷积核,因此得到了轻量化模型。
在一些实施例中,还可以通过以下步骤得到轻量化模型:
步骤S51b,基于该待处理图像中各个像素点对应的特征向量,构建决策树。
这里,步骤S51b在实现时,可以是首先将所有的特征向量看成是一个节点,然后从所有的特征向量中挑选一个特征向量对所有的特征向量进行分割,生成若干孩子节点;对每一个孩子节点进行判断,如果满足停止分裂的条件,设置该节点是叶子节点;否则,在从孩子节点中挑选出一个特征向量对该孩子节点中的所有特征向量进行分割,直至达到停止分裂的条件,得到决策树。
步骤S52b,将决策树中各个叶子节点分别输入至该训练好的神经网络模型,对应得到各个叶子节点对应的卷积核。
这里,将各个叶子节点输入至训练好的神经网络模型,也即是将作为叶子节点的特征向量输入至训练好的神经网络模型,得到各个叶子节点对应的卷积核。
步骤S53b,将该各个叶子节点和对应的卷积核确定为该轻量化模型。
在上述的步骤S51b至步骤S53b中,基于各个像素点的特征向量构建了决策树,并确定出决策树中每个叶子节点对应的卷积核,因此得到了轻量化模型。
在基于上述步骤S51a至步骤S54a得到轻量化模型之后,或者基于上述步骤S51b至步骤S53b得到轻量化模型之后,上述步骤S103“利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像”可以通过以下步骤实现:
步骤S1031,基于该各个像素点的特征向量和该轻量化模型,确定各个像素点对应卷积核。
这里,当该轻量化模型是对特征空间进行划分得到特征子空间而得到的,那么步骤S1031在实现时,可以基于某一像素点i的特征向量确定该特征向量落入到了轻量化模型中的哪个特征子空间,进而再获取该特征子空间对应的卷积核。在本申请实施例中,进行不同的图像处理时,得到的卷积核的通道数是不同的,例如,如果是进行超分处理,并且超分倍数为P,P为大于1的整数(例如可以是2),处理前的原始图像大小为W*D(例如为1280*720),处理后的图像大小为W*P*D*P(例如处理后的图像大小为1280*2*720*2,也即为2560*1440),那么此时得到的卷积核的通道数是P*P(也即为4);如果是进行去噪处理,由于处理前的原始图像与处理后的图像的大小是一致的,那么此时得到的卷积核的通道数为1。
这里,当该轻量化模型是对通过构建决策树而得到的,那么步骤S1031在实现时,可以是将各个像素点的特征向量与决策树中的各个节点进行比较,最终得到各个像素点对应的目标叶子节点,并获取目标叶子节点对应的卷积核。
步骤S1032,将该各个像素点的邻域图像块和对应的各个卷积核进行卷积计算,得到处理后的像素值。
这里,一个像素值在经过卷积计算后得到的处理后的像素值的个数卷积核的通道数是相关的,例如,卷积核的通道数为1,那么得到的处理后的像素值个数也为1;而卷积核的通道数为P*P,那么得到的处理后的像素值个数为P*P。
步骤S1033,基于处理后的像素值,确定处理后的目标图像。
这里,当处理后的像素值个数为1时,那么直接基于处理后的像素值即得到处理后的目标图像;当处理后的像素值个数为P*P时,需要将处理后的像素值进行拼接重排序,从而得到处理后的目标图像。
由于在步骤S1031至步骤S1033所在的实施例中,是利用轻量化模型来确定各个像素点对应的卷积核,相比于轻量化处理之前的神经网络模型对应的卷积核来说维度被降低,因此在进行卷积计算时,能够降低计算量,从而提高处理效率,实现实时处理。
基于前述的实施例,本申请实施例再提供一种图像处理方法,图5为本申请实施例提供的图像处理方法的再一种实现流程示意图,应用于图1A所示的网络架构,如图5所示,该方法包括:
步骤S201,第一终端接收观看视频的操作指令。
这里,该操作指令可以是由用户观看视频App的视频观看入口做出的点击或触控操作而触发的。
步骤S202,第一终端基于该操作指令向服务器发送观看视频的请求消息。
这里,该请求消息中携带有目标视频标识。
步骤S203,服务器基于该请求消息获取目标视频文件。
这里,服务器在接收到该请求消息后,解析该请求消息,获取目标视频标识,并基于目标视频标识获取目标视频文件。
步骤S204,服务器基于该目标视频文件向第一终端返回视频数据流。
步骤S205,第一终端对接收到的视频数据流进行解码得到待处理图像。
这里,步骤S205在实现时,第一终端对接收到的视频数据流进行解码,得到各个视频图像帧,并将各个视频图像帧确定为待处理图像。
步骤S206,第一终端判断该待处理图像是否为灰度图像。
这里,当待处理图像为灰度图像时,进入步骤S207;当待处理图像为彩色图像时,进入步骤S209。在本申请实施例中,待处理图像为彩色图像时,可以是RGB彩色图像,还可以是sRGB彩色图像、CMYK彩色图像等等。
步骤S207,第一终端提取该待处理图像中各个像素点的特征向量,并确定各个像素点对应的邻域图像块。
步骤S208,第一终端利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像。
其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的,在实际实现时,可以是基于训练好的神经网络模型,进行子空间划分或者生成决策树,从而得到轻量化模型。
本申请实施例中的步骤S207和步骤S208的实现过程与其他实施例中的步骤S102和步骤S103的实现过程类似,可参考步骤S102和步骤S103的实现过程。
步骤S209,第一终端将待处理图像转换至亮度色度(YUV)色域,得到亮度Y通道待处理图像和色度UV通道待处理图像。
这里,步骤S209在实现时可以是根据预设的转换函数将待处理的彩色图像转换至YUV色域,以得到Y通道待处理图像和UV通道待处理图像。由于YUV图像中的Y通道信息即可足以显示图像的灰度,也即此时Y通道待处理图像为单通道灰度图像。
步骤S210,第一终端提取Y通道待处理图像中各个Y通道像素点的特征向量,并确定各个Y通道像素点对应的邻域图像块。
这里,步骤S210的实现过程与上述步骤S102的实现过程类似,在实际实现时可以参考步骤S102的实现过程。
步骤S211,第一终端利用该轻量化模型对各个Y通道像素点的特征向量和邻域图像块进行处理,得到处理后的Y通道目标图像。
在本申请实施例中,将待处理的彩色图像转换至YUV色域后,利用轻量化模型仅对Y通道待处理图像进行图像处理,从而得到处理后的Y通道目标图像。步骤S211的实现过程与上述步骤S103的实现过程类似,在实际实现时,可以参考步骤S103的实现过程。
步骤S212,第一终端利用预设的图像处理算法对UV通道待处理图像进行处理,得到UV通道目标图像。
这里,对于不同的图像处理目的,预设的图像处理算法是不同的,例如,当图像处理目的为提高图像分辨率时,预设的图像处理算法可以是图像插值算法,例如,可以是双立方插值算法;当图像处理的目的是去除图像噪声时,预设的图像处理算法可以是滤波算法,例如可以是空间域滤波算法、变换域滤波算法等。
步骤S213,第一终端基于Y通道目标图像和UV通道目标图像确定目标图像,其中,该目标图像与待处理图像的色域相同。
这里,在利用预设的图像处理算法对UV通道待处理图像进行处理得到UV通道目标图像之后,在步骤S213中,将步骤S211得到的Y通道目标图像和UV通道目标图像进行色域转换,得到与待处理图像色域相同的目标图像。
步骤S214,第一终端输出目标图像。
这里,步骤S214在实现时,可以是在第一终端的显示界面中呈现该目标图像。
在本申请实施例提供的图像处理方法中,第一终端在从服务器获取到视频数据流后,对视频数据流进行解码,得到待处理图像,在待处理图像为灰度图像时,直接利用轻量化模型对待处理图像进行处理,得到目标图像;在待处理图像是彩色图像时,将待处理图像转换至YUV色域,并利用轻量化模型对Y通道待处理图像进行处理,得到Y通道目标图像,利用预设的图像处理算法对UV通道待处理图像进行处理得到UV通道目标图像,进而将Y通道目标图像和UV通道目标图像再转换至与待处理图像相同的色域,得到目标图像,并输出目标图像,如此能够提高图像处理速度,实现实时运行(不同模型转换后加速比例不同,理论上可达100倍以上),本申请实施例提供的图像处理方法可以用于超分处理、去噪处理、图像增强处理等方面,适用范围广。
下面,将说明本申请实施例在一个实际的应用场景中的示例性应用。本申请实施例提供的图像处理方法可以用于多种图像处理的应用(比如图像超分辨率、去噪、增强等)中,在本申请实施例中以图像、视频超分辨率的应用为例进行说明。
参照图6,图6为本申请实施例提供的图像处理方法的实现流程示意图,该方法应用于图像处理设备,其中该图像处理设备可以是图1A所示的第一终端,还可以是图1B所示的服务器。如图6所示,该方法包括:
步骤S601,图像处理设备进行训练数据集构造。
这里,步骤S601在实现时,首先对高分辨率图像通过降采样,构造低分辨率图像,然后再使用特征提取算法提取低分辨率图像中每个像素的特征,得到特征图,最后使用每组<高分辨率图像、低分辨率图像、特征图>构造训练数据集。
步骤S602,图像处理设备进行深度学习模型的训练。
这里,步骤S602在实现时,基于训练数据集、训练算法与损失函数训练深度学习模型。
步骤S603,图像处理设备进行模型转换。
这里,在实现时,使用模型转换算法,将训练好的深度学习模型简化为轻量级模型,例如子空间模型。
步骤S604,图像处理设备进行实时推理。
这里,在实现时,使用轻量化模型进行实时推理。首先特征提取算法提取待超分图像的特征,然后使用提取到的特征与待超分图像,用轻量化模型(如子空间模型)进行快速处理,得到超分图像。
以下结合附图对步骤S601至步骤S604进行说明。
首先对步骤S601“训练数据集构造”进行说明。参见图7A,图7A为本申请实施 例构造数据集的实现流程示意图,如图7A所示,该实现流程包括:
步骤S6011,获取高分辨率图。
这里,高分辨率图像的宽高必须是超分倍数N的整数倍,且必须是灰度图像。
步骤S6012,使用人工降采样算法,把高分辨率图像降低分辨率,得到低分辨率图像。
这里,使用人工降采样方法,将高分辨率图像缩放N倍,在本申请实施例中,降采样方法可是均值滤波,线性差值等各种方法。
步骤S6013,使用特征提取算法,提取低分辨率图像的特征,得到特征图。
步骤S6014,将高分辨率图像、低分辨率图像与特征图组成训练集。
在本申请实施例中,步骤S6013在实现时,可以使用梯度特征与方差作为低分辨率图像的特征,进而来构造特征图。在一些实施例中,可以是对每个像素计算与其对应的4维特征。之后按照原先像素的顺序排列成宽高与低分辨率图像相同,通道数为4的特征图。
图7B为本申请实施例提取低分辨率图像特征的实现流程示意图,如图7B所示,该流程包括:
步骤S31,图像处理设备计算低分辨率图像的第一方向梯度图dx。
这里,图像处理设备可以是图1A所示的第一终端,还可以是图1B所示的服务器。在实现时,在低分辨率图像上,对每个像素i,使用右边一个像素减去左边一个像素的值,其差除以2,得到该像素i在dx上对应的梯度值。
步骤S32,图像处理设备计算低分辨率图像6012的第二方向梯度图dy。
这里,在实现时,在低分辨率图像上,对每个像素i,使用下边一个像素减去上边一个像素的值,其差除以2,得到该像素i在dy上对应的梯度值。
步骤S33,对于低分辨率图像上的每一个像素i,图像处理设备进行如下处理,得到其对应的特征(在本申请实施例中得到的是4维特征):
步骤331,图像处理设备计算像素i在dx,dy上的对应位置的邻域图像块,分别记为x,y。
这里,x和y对应图7B中的dx块和dy块。
步骤332,将x、y视为向量,记x、y的长度为M,其内元素分别为x_i(i=1,2,…,M),计算x,y的协方矩阵A,协方矩阵A的定义如公式(1-1)所示:
A = [ Σ x_i·x_i   Σ x_i·y_i ; Σ x_i·y_i   Σ y_i·y_i ]    (1-1)
其中,在公式(1-1)中,i=1,2…,M。
步骤333,计算协方矩阵A的特征值λ 1、λ 2
这里,分别按照公式(1-2)和公式(1-3)计算协方矩阵A的特征值λ1和λ2:
λ1 = ( a + c + √((a − c)² + 4b²) ) / 2    (1-2)
λ2 = ( a + c − √((a − c)² + 4b²) ) / 2    (1-3)
其中,a=Σ x_i·x_i、b=Σ x_i·y_i、c=Σ y_i·y_i。
步骤334,在低分辨率图像上,取出像素i的邻域图像块,并计算该邻域图像块的方差v。
步骤335,计算像素i的4维特征。
这里,第1维特征f 1=atan2(λ 1,λ 2),第2维特征f 2=λ 1,第3维特征
Figure PCTCN2021094049-appb-000008
第4维特征f4=v。
按照上述的步骤S31至步骤S33,计算出低分辨率图像上每个像素的特征,从而构造出特征图。
然后,对步骤S602“深度学习模型及其训练”进行说明。参见图8A,图8A为本申请实施例深度学习模型及其训练的实现流程示意图,如图8A所示,该流程包括:
步骤S6021,构造生成器(超分模型)。
步骤S6022,构造判别器(判别模型)。
步骤S6023,构造生成目标函数。
步骤S6024,构造判别目标函数。
步骤S6025,使用两个目标函数训练超分模型和判别模型。
在本申请实施例中,可用的超分网络结构与网络的使用方法如图8B所示(其中网络结构不限于此),其中,可用的超分网络结构如图8B中的811所示。
在本申请实施例中,深度超分网络是一个深度神经网络,如图8B所示,包括全连接层0 8111、重构(Reshape)层1 8112、残差模块1至Z 8113、全连接层2Z+1 8114,Reshape层2 8115。其中,残差模块i 8113如图8B所示,又包括全连接层i_1 1131、全连接层i_2 1132和加法层1133。
将低分辨率图像的特征图输入到深度神经网络中,输出当前图像块超分所使用的卷积核。
假设低分辨率图像的尺寸为P乘以P,超分辨率的倍数为N,则建议的网络参数如下表1所示:
表1、本申请实施例提供的神经网络的网络参数
Figure PCTCN2021094049-appb-000009
其中,Z的建议值为10,表中“-”表示批处理维度。
如图8B所示,该深度超分网络的使用流程如下:
步骤S801,从数据集中取出像素i对应的低分辨率图像块R i、4维特征F i
步骤S802,将特征F i输入到深度超分网络中,得到图像块R i使用的超分用卷积核i。
步骤S803,将图像块R i与卷积核i做卷积运算,得到超分后N 2个像素,记为向量I i
步骤S804,当计算出所有像素的超分后值I i之后,将其拼接重排序(既像素重排序,PixelShuffle),得到超分辨率图像S。
这里,假设低分辨率图像的宽高分别为W、H,那么直接组成的超分后像素得到图像S为三维矩阵,三个维度分别为W,H,N 2,优先级依次升高,其中N为超分辨率倍 数。
例如,W为640,H为360,N为2,超分后得到的图像S的三个维度分别为640、360和4。
步骤S804在实现时,首先将S重塑为4维矩阵,维度分别为W,H,N,N(例如分别为640,360,2,2),然后交换S第2维与第3维,再将S重塑为2维矩阵,维度分别为WN(640*2=1280),HN(360*2=720),重塑后的S既为超分辨率图像S。
在本申请实施例中,超分网络输出的卷积核是N 2个通道的卷积核。
需要说明的是,超分网络使用上述输入特征,能够保证后续的模型转换步骤可以有效运行(因为使用的特征维数不多,只有4维)。
参见图8C,图8C为本申请实施例提供的一种判别器的网络结构示意图,如图8C所示,该网络模型包括卷积层1 821、卷积层2 822、卷积层3 823、全连接层1 824和卷积层4 825。图8C所示的判别网络模型的网络结构参数见下表2:
表2、本申请实施例提供的判别网络模型的参考参数
Figure PCTCN2021094049-appb-000010
如图8C所示,将一个图像输826入到判别网络中后,判别网络会有两个输出:全局判别输出827和像素判别输出828,其中:
全局判别输出827,用于判别输入的图像是否为超分网络构造的图像,输出是一个数值,表示输入的图像是生成器构造的概率(0-1之间,0表示不是,1表示是)。
像素判别输出828,用于判别输入的图像是否为超分网络构造的图像,输出是与输入图像宽高一样的矩阵,每个元素表示对应位置的输入图像像素是生成器构造的概率(0-1之间,0表示不是,1表示是)。
在本申请实施例中,可以如图8D所示,构造生成目标函数:
步骤S231,计算像素级误差。
这里,步骤S231在实现时,计算高分辨率图像与超分后图像之间各个像素点的平均误差,误差可以是最小平方误差(MSE),绝对误差等各种形式。
步骤S232,计算内容误差。
在本申请实施例中,步骤S232可以通过以下步骤实现:
步骤S2321,将高分辨率图像输入到内容特征模块中,得到高分内容特征。
这里,内容特征模块为预先训练好的模块,一般使用VGG19的前多层构成(建议使用前17层);使用其他网络,或不同的前多层都可以。
步骤S2322,将超分后图像输入到内容特征模块中,得到超分内容特征。
步骤S2323,计算高分内容特征与超分内容特征的平均误差,也即内容误差,误差可以是最小平方误差(MSE),绝对误差等各种形式。
步骤S233,计算像素判别误差与全局判别误差。
这里,步骤S233可以通过以下步骤实现:
步骤S2331,将超分后图像输入到判别网络中,得到超分像素判别与超分全局判别;
步骤S2332,计算超分像素判别与否值(0)的平均误差,也即像素判别误差(希望生成器能够骗过判别网络,让判别网络认为输入图像的像素不是超分出来的)。
在本申请实施例中,像素判别误差可以是二元交叉熵等各种形式。
步骤S2333,计算超分全局判别与否值(0)的平均误差,也即全局判别误差(希望生成器能够骗过判别网络,让判别网络认为输入的图像从整体上看不是超分出来的)。
在本申请实施例中,全局判别误差可以是二元交叉熵等各种形式。
步骤S234,将4个误差加权求和得到生成目标函数。
在本申请实施例中,建议的权值为:像素判别误差权重7e-4,全局判别误差权重3e-4,内容误差权重2e-6,像素级误差权重1.0。
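按照上述四项误差及其建议权重,生成目标函数可以示意性地写成如下形式(其中content_net表示预训练的内容特征模块,discriminator返回全局判别与像素判别两个输出,均为示例性接口假设):

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, content_net, discriminator):
    """按步骤S231至S234组合生成目标函数, 权重取正文的建议值。"""
    # 步骤S231: 像素级误差(此处以MSE为例)
    pixel_loss = F.mse_loss(sr, hr)
    # 步骤S232: 内容误差
    content_loss = F.mse_loss(content_net(sr), content_net(hr))
    # 步骤S233: 像素判别误差与全局判别误差, 目标为"否值"(0), 即骗过判别网络
    global_pred, pixel_pred = discriminator(sr)
    pixel_adv = F.binary_cross_entropy(pixel_pred, torch.zeros_like(pixel_pred))
    global_adv = F.binary_cross_entropy(global_pred, torch.zeros_like(global_pred))
    # 步骤S234: 按建议权重加权求和
    return (1.0 * pixel_loss + 2e-6 * content_loss
            + 7e-4 * pixel_adv + 3e-4 * global_adv)
```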
在本申请实施例中,判别目标函数的构造方法如图8E所示:
步骤S241,计算超分辨率图像的超分全局误差和超分像素误差。
这里,步骤S241可以通过以下步骤实现:
步骤S2411,将超分辨率图像输入到判别网络中,得到超分全局判断与超分像素判断。
步骤S2412,计算超分像素判断与是值(1)的平均误差,也即超分像素误差(希望判别网络能识别出输入的超分后图像的每个像素是生成器超分模块构造的)。
在本申请实施例中,超分像素误差可以是二元交叉熵等各种形式。
步骤S2413,计算超分全局判别与是值(1)的平均误差,也即超分全局误差(希望判别网络能识别出输入的超分后图像整体上是生成器超分模块构造的)。
在本申请实施例中,超分全局误差可以是二元交叉熵等各种形式。
步骤S242,计算高分辨率图像的高分全局误差和高分像素误差。
这里,步骤S242可以通过以下步骤实现:
步骤S2421,将高分辨率图像输入到判别网络中,得到高分全局判断与高分像素判断。
步骤S2422,计算高分像素判断与否值(0)的平均误差,也即高分像素误差(希望判别网络能识别输入的高分辨率图像的每个像素不是生成器超分模块构造的)。
在本申请实施例中,高分像素误差可以是二元交叉熵等各种形式。
步骤S2423,计算高分全局判别与否值(0)的平均误差,也即高分全局误差(希望判别网络能识别输入的高分辨率图像整体上不是生成器超分模块构造的)。
在本申请实施例中,高分全局误差可以是二元交叉熵等各种形式。
步骤S243,将4个误差加权求和,得到判别损失函数。
在本申请实施例中,建议的权重分别为:超分全局误差的权重为0.25,超分像素误差的权重为0.25,高分全局误差的权重为0.25,高分像素误差的权重为0.25。
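同样地,判别目标函数可以示意性地写成如下形式(接口假设与前述生成目标函数的示例一致):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(sr, hr, discriminator):
    """按步骤S241至S243组合判别目标函数, 四项误差权重均取建议值0.25。"""
    # 步骤S241: 超分辨率图像希望被判成"是生成器构造的"(1)
    g_sr, p_sr = discriminator(sr)
    sr_global = F.binary_cross_entropy(g_sr, torch.ones_like(g_sr))
    sr_pixel = F.binary_cross_entropy(p_sr, torch.ones_like(p_sr))
    # 步骤S242: 高分辨率图像希望被判成"不是生成器构造的"(0)
    g_hr, p_hr = discriminator(hr)
    hr_global = F.binary_cross_entropy(g_hr, torch.zeros_like(g_hr))
    hr_pixel = F.binary_cross_entropy(p_hr, torch.zeros_like(p_hr))
    # 步骤S243: 加权求和
    return 0.25 * (sr_global + sr_pixel + hr_global + hr_pixel)
```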
在构造好生成模型、判别模型以及生成损失函数和判别损失函数之后,需要使用生成损失函数与判别损失函数对生成模型和判别模型进行训练。参见图8F,图8F为本申请实施例提供的模型训练实现流程示意图,如图8F所示,该流程包括:
步骤S841,图像处理设备初始化训练参数以及初始化模型参数。
这里,初始化迭代次数为1,初始化判别网络、生成网络的参数结构。
步骤S842,图像处理设备判断迭代次数是否小于T。
这里,T为预设的迭代次数阈值,例如可以是10000次。
这里,当迭代次数小于T时,进入步骤S843;当迭代次数大于或者等于T时,结束流程。
步骤S843,图像处理设备固定判别器的参数,使用最优化算法,使用训练集里的数据与生成损失函数,训练(迭代)一次生成器参数。
步骤S844,图像处理设备固定生成器的参数,使用最优化算法,使用训练集里的数据与判别损失函数,训练(迭代)一次判别器参数。
步骤S845,迭代次数+1,再次进入步骤S842。
经过上述的步骤S841至步骤S845,能够得到训练好的生成器参数与判别器参数,其中,生成器参数即为深度超分网络的参数。
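上述交替训练流程可以示意性地实现如下(其中generator表示由特征预测卷积核并完成卷积的完整生成过程,generator_loss、discriminator_loss为前述两个目标函数的封装——已绑定判别器与内容特征模块,优化器类型与学习率均为示例性假设):

```python
import torch

def train_gan(generator, discriminator, dataloader,
              generator_loss, discriminator_loss, T=10000, lr=1e-4):
    """按图8F的流程交替训练生成器与判别器(步骤S841至S845)。"""
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)    # 步骤S841: 初始化
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    data_iter = iter(dataloader)

    for _ in range(T):                                          # 步骤S842: 迭代次数 < T
        try:
            lr_block, feat, hr_block = next(data_iter)
        except StopIteration:
            data_iter = iter(dataloader)
            lr_block, feat, hr_block = next(data_iter)

        # 步骤S843: 固定判别器参数, 训练(迭代)一次生成器
        for p in discriminator.parameters():
            p.requires_grad_(False)
        sr_block = generator(lr_block, feat)
        loss_g = generator_loss(sr_block, hr_block)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
        for p in discriminator.parameters():
            p.requires_grad_(True)

        # 步骤S844: 固定生成器参数, 训练(迭代)一次判别器
        sr_block = generator(lr_block, feat).detach()
        loss_d = discriminator_loss(sr_block, hr_block)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
        # 步骤S845: 迭代次数+1(由for循环完成)
```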
以下对步骤S603“模型转换”进行说明。模型转换的核心思想,就是对深度学习模型进行近似采样,将其转化为简单轻量化的模型。下面以将深度超分网络模型转换为子空间模型为例进行说明。用一句话来描述,就是把输入的特征空间进行划分,得到各个子空间,并把每个子空间的所有深度学习输出值,近似成当前子空间中心点对应的深度学习模型的输出值。
参见图9,图9为本申请实施例模型转换的实现流程示意图,如图9所示,该流程包括:
步骤S6031,图像处理设备对特征空间进行离散化。
这里,步骤S6031在实现时将特征空间(前述4维的特征空间)的每个维度进行分段,其中:特征1建议在[0,2π]范围内均匀分为N_1段(建议值为16);特征2建议按照数据的最大最小值,均匀分为N_2段(建议值为8);特征3建议按照数据的最大最小值,均匀分为N_3段(建议值为8);特征4建议按照0到数据的最大值,均匀分为N_4段(建议值为8)。按照上述分段,将特征空间分割成N_1*N_2*N_3*N_4(按建议值即为8192)个子空间。
步骤S6032,对于每个子空间i,图像处理设备计算该子空间的中心,也即中心坐标i。
这里,步骤S6032在实现时,可以是分别对每个维度,计算其上下界的中值,得到该子空间的中心坐标。
步骤S6033,图像处理设备将中心坐标i输入到深度超分网络中,得到卷积核i。
步骤S6034,图像处理设备将每个子空间及其对应卷积核组成转换后的子空间模型。
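按照上述步骤,子空间模型的构造可以示意性地实现如下(其中kernel_net为前述由特征预测卷积核的深度超分网络,各维特征的取值上界以参数形式传入,属于示例性假设,实际应按训练数据统计):

```python
import itertools
import numpy as np
import torch

def build_subspace_model(kernel_net, bins=(16, 8, 8, 8),
                         f2_max=1.0, f3_max=1.0, f4_max=1.0):
    """按步骤S6031至S6034, 把深度超分网络近似采样为子空间查找表。"""
    # 步骤S6031: 每个维度的分段边界
    edges = [
        np.linspace(0.0, 2 * np.pi, bins[0] + 1),    # 特征1: [0, 2π]
        np.linspace(0.0, f2_max, bins[1] + 1),       # 特征2: 按数据范围
        np.linspace(0.0, f3_max, bins[2] + 1),       # 特征3: 按数据范围
        np.linspace(0.0, f4_max, bins[3] + 1),       # 特征4: 0到数据最大值
    ]
    # 步骤S6032: 每个子空间取各维上下界的中值作为中心坐标
    centers = [(e[:-1] + e[1:]) / 2.0 for e in edges]
    table = {}
    with torch.no_grad():
        for idx in itertools.product(*[range(b) for b in bins]):
            center = torch.tensor([[centers[d][idx[d]] for d in range(4)]],
                                  dtype=torch.float32)
            # 步骤S6033: 中心坐标输入深度超分网络, 得到该子空间的卷积核
            table[idx] = kernel_net(center)[0].cpu().numpy()
    # 步骤S6034: 子空间边界与对应卷积核组成转换后的子空间模型
    return edges, table
```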
需要说明的是,在模型转换的步骤中,除了可以转换为子空间模型,在一些实施例中也可以将深度学习模型转化为其他轻量化模型,例如决策树等。对于这类模型转换来说,可以通过用深度学习模型构造训练数据、训练一个新的目标轻量化模型的方式来实现。
以下对步骤S604“实时推理”进行说明。在实时推理的步骤中,我们会利用步骤S603中得到的轻量化模型(例如子空间模型)实现图像超分的实时推理。图10为本申请实施例中实时推理的实现流程示意图,如图10所示,该流程包括:
步骤S6041,图像处理设备计算待超分图像的特征图。
这里,计算方法与S6013相同,使用特征提取算法,提取待超分图像的特征图,其中,待超分图像为单通道图像。
步骤S6042,对于待超分图像上的每个像素i,在待超分图像上,图像处理设备获取像素i的低分辨率图像块R_i。
步骤S6043,图像处理设备获取像素i在特征图上的特征F_i。
步骤S6044,图像处理设备将特征F_i输入到子空间模型中,查询其所属的子空间,得到该子空间对应的卷积核i。
步骤S6045,图像处理设备对低分辨率图像块R_i与确定出的子空间对应的卷积核i进行卷积运算,得到像素i超分后的结果L_i,也即得到超分后的N^2个超分像素。
步骤S6046,图像处理设备对所有的超分后像素L_i(N^2通道,N为超分倍数)进行拼接重排序,得到超分后图像。
这里,步骤S6046的拼接重排序方法的实现方式可参考步骤S804的实现方式。
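结合前面给出的特征提取与子空间模型示例,实时推理部分可以示意性地实现如下(其中patch、edges、table等沿用前面示例中的约定,均为示例性假设):

```python
import numpy as np

def infer_subspace_sr(lr_image, feats, edges, table, n_scale=2, patch=5):
    """按步骤S6041至S6046, 用子空间模型对单通道待超分图像做实时推理。
    feats形状为(H, W, 4), table中的卷积核形状为(N*N, patch, patch)。"""
    h, w = lr_image.shape
    r = patch // 2
    img_p = np.pad(lr_image.astype(np.float64), r, mode='edge')
    out = np.zeros((h, w, n_scale * n_scale))

    for i in range(h):
        for j in range(w):
            # 步骤S6042/S6043: 取像素的低分辨率图像块与特征
            block = img_p[i:i + patch, j:j + patch]
            f = feats[i, j]
            # 步骤S6044: 查询特征所属的子空间, 得到对应卷积核
            idx = tuple(
                int(np.clip(np.searchsorted(edges[d], f[d], side='right') - 1,
                            0, len(edges[d]) - 2))
                for d in range(4))
            kernels = table[idx]
            # 步骤S6045: 逐核加权求和, 得到N*N个超分像素
            out[i, j] = (kernels * block).sum(axis=(1, 2))
    # 步骤S6046: 拼接重排序(同步骤S804)
    out4 = out.reshape(h, w, n_scale, n_scale).transpose(0, 2, 1, 3)
    return out4.reshape(h * n_scale, w * n_scale)
```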
在本申请实施例中,由于图像往往是彩色的,因此对于彩色图像,按照如图11所示的流程得到超分图像:
步骤S1101,图像处理设备将彩色图像从原先色域(例如,RGB色域)转到YUV色域,得到Y通道待超分图像和UV通道待超分图像。
步骤S1102,图像处理设备将Y通道待超分图像输入到实时超分模块,以进行实时超分,得到Y通道超分后图像。
步骤S1103,图像处理设备对UV通道待超分图像,使用传统图像插值方法进行超分处理,得到UV通道超分后图像。
在本申请实施例中,可以使用双立方插值对UV通道待超分图像进行超分处理,在一些实施例中,还可以使用其他的图像插值方法。
步骤S1104,图像处理设备把超分后的YUV图像转到原先色域,转换得到的图像即为超分后图像。
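上述彩色图像的处理流程可以用OpenCV示意性地实现如下(此处以BGR作为原先色域的示例,sr_y_channel表示前文的单通道实时超分接口,均为示例性假设):

```python
import cv2
import numpy as np

def super_resolve_color(bgr_image, sr_y_channel):
    """按图11的流程: Y通道用实时超分模块, UV通道用双立方插值。"""
    # 步骤S1101: 转到YUV色域并拆分通道
    yuv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YUV)
    y, u, v = cv2.split(yuv)
    # 步骤S1102: Y通道实时超分
    y_sr = sr_y_channel(y).astype(np.uint8)
    # 步骤S1103: UV通道用双立方插值超分到相同尺寸
    h, w = y_sr.shape
    u_sr = cv2.resize(u, (w, h), interpolation=cv2.INTER_CUBIC)
    v_sr = cv2.resize(v, (w, h), interpolation=cv2.INTER_CUBIC)
    # 步骤S1104: 合并并转回原先色域
    yuv_sr = cv2.merge([y_sr, u_sr, v_sr])
    return cv2.cvtColor(yuv_sr, cv2.COLOR_YUV2BGR)
```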
在一些实施例中,当需要对视频文件进行超分处理时,可以通过如图12所示的步骤实现:
步骤S1201,图像处理设备获取待超分视频。
步骤S1202,图像处理设备对视频进行解码,得到各个待超分视频帧。
步骤S1203,图像处理设备对于每个待超分视频帧i,将其输入到实时超分模块中,进行超分处理,得到视频帧i的超分后图像。
这里,当待超分视频帧i为彩色图像帧,步骤S1203可以参照步骤S1101至步骤S1104实现。
步骤S1204,图像处理设备将各个视频帧i的超分后图像进行视频编码,得到超分后视频。
在本申请实施例提供的图像处理方法中,训练的时候可以使用各种深度学习中的目标函数,能让训练出的模型有着更好的画面效果,并且能够将深度超分模型转换为轻量化模型,从而能够大大提高其推理速度,实现实时运行(不同模型转换后加速比例不同,理论上可达100倍以上)。并且,除了超分辨率,本申请实施例提出的方法还能用到其他图像处理应用中,例如图像去噪、增强等等,适用范围更广。
下面继续说明本申请实施例提供的图像处理装置154实施为软件模块的示例性结构,图13为本申请实施例提供的图像处理装置的组成结构示意图,如图13所示,图像处理装置154包括:
第一获取模块1541,配置为获取待处理图像;
第一提取模块1542,配置为当该待处理图像为灰度图像时,提取该待处理图像中各个像素点的特征向量,并确定该待处理图像中各个像素点对应的邻域图像块;
第一处理模块1543,配置为利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,该轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;
输出模块1544,配置为输出该目标图像。
在一些实施例中,该图像处理装置还包括:
色域转换模块,配置为当该待处理图像为彩色图像时,将该待处理图像转换至YUV色域,得到Y通道待处理图像和UV通道待处理图像;
第二提取模块,配置为提取该Y通道待处理图像中各个Y通道像素点的特征向量,并确定该各个Y通道像素点对应的邻域图像块;
第二处理模块,配置为利用该轻量化模型对该各个Y通道像素点的特征向量和邻域图像块进行处理,得到处理后的Y通道目标图像;
第三处理模块,配置为利用预设的图像处理算法对该UV通道待处理图像进行处理, 得到UV通道目标图像;
第一确定模块,配置为基于该Y通道目标图像和UV通道目标图像确定目标图像,其中,该目标图像与待处理图像的色域相同。
在一些实施例中,该第一获取模块,还配置为:
获取待处理视频文件;
对该视频文件进行解码,得到该视频文件中的各个视频帧图像;
将该各个视频帧图像确定为该待处理图像。
在一些实施例中,该第一提取模块,还配置为:
确定该待处理图像对应的第一方向梯度图和第二方向梯度图;
确定该待处理图像中各个像素点在第一方向梯度图中的第一梯度邻域块和在第二方向梯度图中的第二梯度邻域块;
基于该各个像素点的第一梯度邻域块和第二梯度邻域块确定该各个像素点的特征向量。
在一些实施例中,该第一提取模块,还配置为:
基于该各个像素点的第一梯度邻域块和第二梯度邻域块确定所述各个像素点的协方差矩阵;
确定各个协方差矩阵对应的各个第一特征值和各个第二特征值;
确定该各个像素点的邻域图像块对应的各个方差值;
基于该各个第一特征值、各个第二特征值和各个方差值确定该各个像素点的特征向量。
在一些实施例中,该图像处理装置还包括:
第二获取模块,配置为获取训练数据和预设的神经网络模型,其中,该训练数据包括第一训练图像和第二训练图像,其中,该第二训练图像是对该第一训练图像进行下采样得到的,该神经网络模型包括生成模型和判别模型;
第四处理模块,配置为利用该神经网络模型对该第二训练图像进行处理,得到预测图像;
模型训练模块,配置为基于该预测图像、该第一训练图像和预设的目标函数对该神经网络模型进行反向传播训练,得到训练好的神经网络模型。
在一些实施例中,该预设的目标函数包括生成目标函数和判别目标函数,对应地,该模型训练模块,还配置为:
固定该判别模型的判别参数,基于该预测图像、该第一训练图像和生成目标函数对该生成模型进行反向传播训练,以对该生成模型的生成参数进行调整;
固定该生成模型的生成参数,基于该预测图像、该第一训练图像和判别目标函数对该判别模型进行反向传播训练,以对该判别模型的判别参数进行调整,直至达到预设的训练完成条件,得到训练好的神经网络模型。
在一些实施例中,该图像处理装置还包括:
第二确定模块,配置为确定该预测图像和该第一训练图像之间的像素级误差值和内容误差值;
第三确定模块,配置为基于该预测图像和该判别模型确定该预测图像的第一像素判别误差值和第一全局判别误差值;
第四确定模块,配置为基于预设的生成权重值、该像素级误差值、该内容误差值、该第一像素判别误差值和该第一全局判别误差值确定生成目标函数。
在一些实施例中,该图像处理装置还包括:
第五确定模块,配置为基于该预测图像和该判别模型确定该预测图像的第二像素判 别误差值和第二全局判别误差值;
第六确定模块,配置为基于该第一训练图像和该判别模型确定该第一训练图像的第三像素判别误差值和第三全局判别误差值;
第七确定模块,配置为基于预设的判别权重值、该第二像素判别误差值、该第二全局判别误差值、该第三像素判别误差值和该第三全局判别误差值确定判别目标函数。
在一些实施例中,该图像处理装置还包括:
第八确定模块,配置为基于该待处理图像中各个像素点对应的特征向量,确定特征空间;
子空间划分模块,配置为将该特征空间按照预设的划分规则,划分为N个特征子空间,并分别确定该N个特征子空间对应的N个中心坐标;
第一输入模块,配置为将该N个中心坐标分别输入至该训练好的神经网络模型,对应得到N个特征子空间的N个卷积核;
第九确定模块,配置为将该N个特征子空间和该N个卷积核确定为该轻量化模型。
在一些实施例中,该图像处理装置还包括:
决策树构建模块,配置为基于该待处理图像中各个像素点对应的特征向量,构建决策树;
第二输入模块,配置为将该决策树中各个叶子节点分别输入至该训练好的神经网络模型,对应得到各个叶子节点对应的卷积核;
第十确定模块,配置为将该各个叶子节点和对应的卷积核确定为该轻量化模型。
在一些实施例中,该第一处理模块,还配置为:
基于该各个像素点的特征向量和该轻量化模型,确定各个像素点对应的卷积核;
将该各个像素点的邻域图像块和对应的各个卷积核进行卷积计算,得到该各个像素点处理后的像素值;
基于各个像素点处理后的像素值,确定处理后的目标图像。
需要说明的是,本申请实施例图像处理装置的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例上述的图像处理方法。
本申请实施例提供一种存储有可执行指令的存储介质,其中存储有可执行指令,当可执行指令被处理器执行时,将引起处理器执行本申请实施例提供的方法。
在一些实施例中,存储介质可以是计算机可读存储介质,例如,铁电存储器(FRAM,Ferromagnetic Random Access Memory)、只读存储器(ROM,Read Only Memory)、可编程只读存储器(PROM,Programmable Read Only Memory)、可擦除可编程只读存储器(EPROM,Erasable Programmable Read Only Memory)、带电可擦可编程只读存储器(EEPROM,Electrically Erasable Programmable Read Only Memory)、闪存、磁表面存储器、光盘、或光盘只读存储器(CD-ROM,Compact Disk-Read Only Memory)等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
作为示例,可执行指令可以但不一定对应于文件系统中的文件,可以可被存储在保存其它程序或数据的文件的一部分,例如,存储在超文本标记语言(HTML,Hyper Text Markup Language)文档中的一个或多个脚本中,存储在专用于所讨论的程序的单个文件中,或者,存储在多个协同文件(例如,存储一个或多个模块、子程序或代码部分的文件)中。作为示例,可执行指令可被部署为在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执行。
以上所述,仅为本申请的实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。

Claims (15)

  1. 一种图像处理方法,所述方法由图像处理设备执行,包括:
    获取待处理图像;
    当所述待处理图像为灰度图像时,提取所述待处理图像中各个像素点的特征向量,并确定所述各个像素点对应的邻域图像块;
    利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,所述轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;
    输出所述目标图像。
  2. 根据权利要求1中所述的方法,其中,所述方法还包括:
    当所述待处理图像为彩色图像时,将所述待处理图像转换至亮度色度YUV色域,得到亮度Y通道待处理图像和色度UV通道待处理图像;
    提取所述Y通道待处理图像中各个Y通道像素点的特征向量,并确定所述各个Y通道像素点对应的邻域图像块;
    利用所述轻量化模型对所述各个Y通道像素点的特征向量和邻域图像块进行处理,得到处理后的Y通道目标图像;
    利用预设的图像处理算法对所述UV通道待处理图像进行处理,得到UV通道目标图像;
    基于所述Y通道目标图像和UV通道目标图像确定目标图像,其中,所述目标图像与待处理图像的色域相同。
  3. 根据权利要求1中所述的方法,其中,所述获取待处理图像,包括:
    获取待处理视频文件;
    对所述待处理视频文件进行解码,得到所述待处理视频文件中的各个视频帧图像;
    将所述各个视频帧图像确定为所述待处理图像。
  4. 根据权利要求1中所述的方法,其中,所述提取所述待处理图像中各个像素点的特征向量,包括:
    确定所述待处理图像对应的第一方向梯度图和第二方向梯度图;
    确定所述待处理图像中各个像素点在第一方向梯度图中的第一梯度邻域块和在第二方向梯度图中的第二梯度邻域块;
    基于所述各个像素点的第一梯度邻域块和第二梯度邻域块确定所述各个像素点的特征向量。
  5. 根据权利要求4中所述的方法,其中,所述基于所述各个像素点的第一梯度邻域块和第二梯度邻域块确定所述各个像素点的特征向量,包括:
    基于所述各个像素点的第一梯度邻域块和第二梯度邻域块确定所述各个像素点的协方差矩阵;
    确定各个协方差矩阵对应的各个第一特征值和各个第二特征值;
    确定所述各个像素点的邻域图像块对应的各个方差值;
    基于所述各个第一特征值、各个第二特征值和各个方差值确定所述各个像素点的特征向量。
  6. 根据权利要求1至5中任一项所述的方法,其中,所述方法还包括:
    获取训练数据和预设的神经网络模型,其中,所述训练数据包括第一训练图像和第二训练图像,其中,所述第二训练图像是对所述第一训练图像进行下采样得到的,所述神经网络模型包括生成模型和判别模型;
    利用所述神经网络模型对所述第二训练图像进行处理,得到预测图像;
    基于所述预测图像、所述第一训练图像和预设的目标函数对所述神经网络模型进行反向传播训练,得到训练好的神经网络模型。
  7. 根据权利要求6中所述的方法,其中,所述预设的目标函数包括生成目标函数和判别目标函数,所述基于所述预测图像、所述第一训练图像和预设的目标函数对所述神经网络模型进行反向传播训练,得到训练好的神经网络模型,包括:
    固定所述判别模型的判别参数,基于所述预测图像、所述第一训练图像和生成目标函数对所述生成模型进行反向传播训练,对所述生成模型的生成参数进行调整;
    固定所述生成模型的生成参数,基于所述预测图像、所述第一训练图像和判别目标函数对所述判别模型进行反向传播训练,对所述判别模型的判别参数进行调整,直至达到预设的训练完成条件,得到训练好的神经网络模型。
  8. 根据权利要求7中所述的方法,其中,所述方法还包括:
    确定所述预测图像和所述第一训练图像之间的像素级误差值和内容误差值;
    基于所述预测图像和所述判别模型确定所述预测图像的第一像素判别误差值和第一全局判别误差值;
    基于预设的生成权重值、所述像素级误差值、所述内容误差值、所述第一像素判别误差值和所述第一全局判别误差值确定生成目标函数。
  9. 根据权利要求7中所述的方法,其中,所述方法还包括:
    基于所述预测图像和所述判别模型确定所述预测图像的第二像素判别误差值和第二全局判别误差值;
    基于所述第一训练图像和所述判别模型确定所述第一训练图像的第三像素判别误差值和第三全局判别误差值;
    基于预设的判别权重值、所述第二像素判别误差值、所述第二全局判别误差值、所述第三像素判别误差值和所述第三全局判别误差值确定判别目标函数。
  10. 根据权利要求1中所述的方法,其中,所述方法还包括:
    基于所述待处理图像中各个像素点对应的特征向量,确定特征空间;
    将所述特征空间按照预设的划分规则,划分为N个特征子空间,并分别确定所述N个特征子空间对应的N个中心坐标,其中N为大于2的整数;
    将所述N个中心坐标分别输入至所述训练好的神经网络模型,对应得到N个特征子空间的N个卷积核;
    将所述N个特征子空间和所述N个卷积核确定为所述轻量化模型。
  11. 根据权利要求1中所述的方法,其中,所述方法还包括:
    基于所述待处理图像中各个像素点对应的特征向量,构建决策树;
    将所述决策树中各个叶子节点分别输入至所述训练好的神经网络模型,对应得到各个叶子节点对应的卷积核;
    将所述各个叶子节点和对应的卷积核确定为所述轻量化模型。
  12. 根据权利要求10或11中所述的方法,其中,所述利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,包括:
    基于所述各个像素点的特征向量和所述轻量化模型,确定各个像素点对应的各个卷积核;
    将所述各个像素点的邻域图像块和对应的各个卷积核进行卷积计算,得到所述各个像素点处理后的像素值;
    基于各个像素点处理后的像素值,确定处理后的目标图像。
  13. 一种图像处理装置,包括:
    第一获取模块,配置为获取待处理图像;
    第一提取模块,配置为当所述待处理图像为灰度图像时,提取所述待处理图像中各个像素点的特征向量,并确定所述各个像素点对应的邻域图像块;
    第一处理模块,配置为利用轻量化模型对各个像素点的特征向量和邻域图像块进行处理,得到处理后的目标图像,其中,所述轻量化模型是对训练好的神经网络模型进行轻量化处理得到的;
    输出模块,配置为输出所述目标图像。
  14. 一种图像处理设备,包括:
    存储器,配置为存储可执行指令;
    处理器,配置为执行所述存储器中存储的可执行指令时,实现权利要求1至12任一项所述的方法。
  15. 一种计算机可读存储介质,存储有可执行指令,用于引起处理器执行时,实现权利要求1至12任一项所述的方法。
PCT/CN2021/094049 2020-06-03 2021-05-17 图像处理方法、装置、设备及计算机可读存储介质 WO2021244270A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022566432A JP7464752B2 (ja) 2020-06-03 2021-05-17 画像処理方法、装置、機器及びコンピュータプログラム
EP21817967.9A EP4044106A4 (en) 2020-06-03 2021-05-17 IMAGE PROCESSING METHOD AND APPARATUS, DEVICE, AND COMPUTER READABLE STORAGE MEDIA
US17/735,942 US20220270207A1 (en) 2020-06-03 2022-05-03 Image processing method, apparatus, device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010495781.1A CN111402143B (zh) 2020-06-03 2020-06-03 图像处理方法、装置、设备及计算机可读存储介质
CN202010495781.1 2020-06-03

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/735,942 Continuation US20220270207A1 (en) 2020-06-03 2022-05-03 Image processing method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021244270A1 true WO2021244270A1 (zh) 2021-12-09

Family

ID=71431873

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094049 WO2021244270A1 (zh) 2020-06-03 2021-05-17 图像处理方法、装置、设备及计算机可读存储介质

Country Status (5)

Country Link
US (1) US20220270207A1 (zh)
EP (1) EP4044106A4 (zh)
JP (1) JP7464752B2 (zh)
CN (1) CN111402143B (zh)
WO (1) WO2021244270A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066780A (zh) * 2022-01-17 2022-02-18 广东欧谱曼迪科技有限公司 4k内窥镜图像去雾方法、装置、电子设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402143B (zh) * 2020-06-03 2020-09-04 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及计算机可读存储介质
CN113936163A (zh) * 2020-07-14 2022-01-14 武汉Tcl集团工业研究院有限公司 一种图像处理方法、终端以及存储介质
CN111932463B (zh) * 2020-08-26 2023-05-30 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN114173137A (zh) * 2020-09-10 2022-03-11 北京金山云网络技术有限公司 视频编码方法、装置及电子设备
CN114266696A (zh) * 2020-09-16 2022-04-01 广州虎牙科技有限公司 图像处理方法、装置、电子设备和计算机可读存储介质
CN112333456B (zh) * 2020-10-21 2022-05-10 鹏城实验室 一种基于云边协议的直播视频传输方法
CN114612294A (zh) * 2020-12-08 2022-06-10 武汉Tcl集团工业研究院有限公司 一种图像超分辨率处理方法和计算机设备
CN112801879B (zh) * 2021-02-09 2023-12-08 咪咕视讯科技有限公司 图像超分辨率重建方法、装置、电子设备及存储介质
CN112837242B (zh) * 2021-02-19 2023-07-14 成都国科微电子有限公司 一种图像降噪处理方法、装置、设备及介质
CN112991203B (zh) * 2021-03-08 2024-05-07 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备及存储介质
CN113128116B (zh) 2021-04-20 2023-09-26 上海科技大学 可用于轻量级神经网络的纯整型量化方法
CN113242440A (zh) * 2021-04-30 2021-08-10 广州虎牙科技有限公司 直播方法、客户端、系统、计算机设备以及存储介质
CN113379629A (zh) * 2021-06-08 2021-09-10 深圳思谋信息科技有限公司 卫星图像去噪方法、装置、计算机设备和存储介质
US20230252603A1 (en) * 2022-02-08 2023-08-10 Kyocera Document Solutions, Inc. Mitigation of quantization-induced image artifacts
CN117475367B (zh) * 2023-06-12 2024-05-07 中国建筑第四工程局有限公司 基于多规则协调的污水图像处理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070086627A1 (en) * 2005-10-18 2007-04-19 Samsung Electronics Co., Ltd. Face identification apparatus, medium, and method
CN104598908A (zh) * 2014-09-26 2015-05-06 浙江理工大学 一种农作物叶部病害识别方法
CN109308679A (zh) * 2018-08-13 2019-02-05 深圳市商汤科技有限公司 一种图像风格转换方及装置、设备、存储介质
US20200082154A1 (en) * 2018-09-10 2020-03-12 Algomus, Inc. Computer vision neural network system
CN111402143A (zh) * 2020-06-03 2020-07-10 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及计算机可读存储介质

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010093650A (ja) 2008-10-09 2010-04-22 Nec Corp 端末、画像表示方法及びプログラム
GB2539845B (en) * 2015-02-19 2017-07-12 Magic Pony Tech Ltd Offline training of hierarchical algorithms
US10235608B2 (en) * 2015-12-22 2019-03-19 The Nielsen Company (Us), Llc Image quality assessment using adaptive non-overlapping mean estimation
CN108960514B (zh) * 2016-04-27 2022-09-06 第四范式(北京)技术有限公司 展示预测模型的方法、装置及调整预测模型的方法、装置
US10861143B2 (en) * 2017-09-27 2020-12-08 Korea Advanced Institute Of Science And Technology Method and apparatus for reconstructing hyperspectral image using artificial intelligence
CN108062744B (zh) * 2017-12-13 2021-05-04 中国科学院大连化学物理研究所 一种基于深度学习的质谱图像超分辨率重建方法
KR101882704B1 (ko) * 2017-12-18 2018-07-27 삼성전자주식회사 전자 장치 및 그 제어 방법
US20190325293A1 (en) * 2018-04-19 2019-10-24 National University Of Singapore Tree enhanced embedding model predictive analysis methods and systems
US11756160B2 (en) * 2018-07-27 2023-09-12 Washington University ML-based methods for pseudo-CT and HR MR image estimation
CN109034102B (zh) * 2018-08-14 2023-06-16 腾讯科技(深圳)有限公司 人脸活体检测方法、装置、设备及存储介质
CN109063666A (zh) * 2018-08-14 2018-12-21 电子科技大学 基于深度可分离卷积的轻量化人脸识别方法及系统
CN109598676A (zh) * 2018-11-15 2019-04-09 华南理工大学 一种基于哈达玛变换的单幅图像超分辨率方法
CN109409342A (zh) * 2018-12-11 2019-03-01 北京万里红科技股份有限公司 一种基于轻量卷积神经网络的虹膜活体检测方法
CN109902720B (zh) * 2019-01-25 2020-11-27 同济大学 基于子空间分解进行深度特征估计的图像分类识别方法
CN109949235A (zh) * 2019-02-26 2019-06-28 浙江工业大学 一种基于深度卷积神经网络的胸部x光片去噪方法
CN110084108A (zh) * 2019-03-19 2019-08-02 华东计算技术研究所(中国电子科技集团公司第三十二研究所) 基于gan神经网络的行人重识别系统及方法
CN110136063B (zh) * 2019-05-13 2023-06-23 南京信息工程大学 一种基于条件生成对抗网络的单幅图像超分辨率重建方法
CN110348350B (zh) * 2019-07-01 2022-03-25 电子科技大学 一种基于面部表情的驾驶员状态检测方法
CN110796101A (zh) * 2019-10-31 2020-02-14 广东光速智能设备有限公司 一种嵌入式平台的人脸识别方法及系统
CN110907732A (zh) * 2019-12-04 2020-03-24 江苏方天电力技术有限公司 基于pca-rbf神经网络的调相机故障诊断方法
CN111105352B (zh) * 2019-12-16 2023-04-25 佛山科学技术学院 超分辨率图像重构方法、系统、计算机设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070086627A1 (en) * 2005-10-18 2007-04-19 Samsung Electronics Co., Ltd. Face identification apparatus, medium, and method
CN104598908A (zh) * 2014-09-26 2015-05-06 浙江理工大学 一种农作物叶部病害识别方法
CN109308679A (zh) * 2018-08-13 2019-02-05 深圳市商汤科技有限公司 一种图像风格转换方及装置、设备、存储介质
US20200082154A1 (en) * 2018-09-10 2020-03-12 Algomus, Inc. Computer vision neural network system
CN111402143A (zh) * 2020-06-03 2020-07-10 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及计算机可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4044106A4

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066780A (zh) * 2022-01-17 2022-02-18 广东欧谱曼迪科技有限公司 4k内窥镜图像去雾方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
JP2023523833A (ja) 2023-06-07
EP4044106A1 (en) 2022-08-17
EP4044106A4 (en) 2023-02-01
CN111402143A (zh) 2020-07-10
JP7464752B2 (ja) 2024-04-09
CN111402143B (zh) 2020-09-04
US20220270207A1 (en) 2022-08-25

Similar Documents

Publication Publication Date Title
WO2021244270A1 (zh) 图像处理方法、装置、设备及计算机可读存储介质
US11954822B2 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
CN110599492B (zh) 图像分割模型的训练方法、装置、电子设备及存储介质
KR102663519B1 (ko) 교차 도메인 이미지 변환 기법
CN108921225B (zh) 一种图像处理方法及装置、计算机设备和存储介质
EP3716198A1 (en) Image reconstruction method and device
EP3678059A1 (en) Image processing method, image processing apparatus, and a neural network training method
CN109934792B (zh) 电子装置及其控制方法
CN114008663A (zh) 实时视频超分辨率
US20220108478A1 (en) Processing images using self-attention based neural networks
CN115457531A (zh) 用于识别文本的方法和装置
KR20200128378A (ko) 이미지 생성 네트워크의 훈련 및 이미지 처리 방법, 장치, 전자 기기, 매체
EP4018411B1 (en) Multi-scale-factor image super resolution with micro-structured masks
CN111091010A (zh) 相似度确定、网络训练、查找方法及装置和存储介质
CN113066018A (zh) 一种图像增强方法及相关装置
CN116109892A (zh) 虚拟试衣模型的训练方法及相关装置
US11948090B2 (en) Method and apparatus for video coding
WO2024041235A1 (zh) 图像处理方法、装置、设备、存储介质及程序产品
CN110442719A (zh) 一种文本处理方法、装置、设备及存储介质
Shaharabany et al. End-to-end segmentation of medical images via patch-wise polygons prediction
CN115861605A (zh) 一种图像数据处理方法、计算机设备以及可读存储介质
CN114299105A (zh) 图像处理方法、装置、计算机设备及存储介质
CN114586056A (zh) 图像处理方法及装置、设备、视频处理方法及存储介质
US20240135490A1 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
WO2023178801A1 (zh) 图像描述方法和装置、计算机设备、存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21817967

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021817967

Country of ref document: EP

Effective date: 20220419

ENP Entry into the national phase

Ref document number: 2022566432

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE