WO2017161710A1 - Resolving method and system based on deep learning - Google Patents

Resolving method and system based on deep learning

Info

Publication number
WO2017161710A1
WO2017161710A1 (PCT/CN2016/086494)
Authority
WO
WIPO (PCT)
Prior art keywords
resolution image
low
feature information
face feature
image
Prior art date
Application number
PCT/CN2016/086494
Other languages
English (en)
French (fr)
Inventor
张丽杰 (Lijie Zhang)
李正龙 (Zhenglong Li)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Priority to US15/537,677 (US10769758B2)
Publication of WO2017161710A1

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263: Processing of video elementary streams involving reformatting operations of video signals by altering the spatial resolution, e.g. for displaying on a connected PDA
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076: Scaling based on super-resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by matching or filtering
    • G06V10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451: Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/162: Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Definitions

  • The present disclosure relates to the field of image processing in television display, and more particularly to a deep learning-based method and system for high-resolution resolving of faces and images.
  • Super-resolution addresses the fact that the resolution of current video sources is lower than the resolution that an HDTV can display.
  • Super-resolution technology enhances visual clarity by stretching, comparing, and correcting the original image to output an image better suited for display on a Full High Definition (FHD) LCD TV. Compared with an ordinary LCD TV, which simply stretches the standard-definition signal onto a high-definition screen, super-resolution technology renders more prominent detail, dispelling the impression that an HDTV looks worse than a low-resolution TV.
  • The resolution of an image, also known as definition or resolving power, refers to how many pixels can be displayed on the display; the more pixels on the display, the finer the picture.
  • High-resolution images have a high pixel density, provide rich detail, and describe objective scenes more accurately and minutely. They are in great demand in the information age, for example in satellite remote sensing, video security surveillance, military reconnaissance aerial photography, medical digital imaging, and video standard conversion.
  • Face hallucination is a domain-specific super-resolution technique that produces a high-resolution output from a low-resolution input.
  • A low-resolution image is obtained from a high-resolution image by down-sampling and linear convolution.
  • Hallucination can be understood as reconstructing high-frequency details; among current super-resolution techniques, very little work focuses on face hallucination.
  • An image resolving method comprising the steps of: a. building a sample library from an original high-resolution image set; b. training a convolutional structure network with the sample library; c. processing a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.
  • An image resolving system comprising: a sample library construction device configured to build a sample library from an original high-resolution image set; a training device configured to train a convolutional structure network with the sample library; and an output device configured to process a low-resolution input signal with the adjusted convolutional structure network to obtain a high-resolution output signal.
  • While enlarging the image using the information of the original input image itself, the method of the present disclosure adds similarity information of facial feature parts, enriching the facial details in the resolved image so that sharpness is noticeably improved.
  • The resolving method and system according to the present disclosure can process expanded data by simply extending the hardware, without major changes to the algorithm; the complex algorithm is deployed as a parallelized design in which different servers work independently of one another; and the modular design allows the design of each functional module to be changed through later optimization.
  • FIG. 1 shows a general method of face resolving technology.
  • FIG. 2 illustrates another general method of face resolving technology.
  • FIG. 3 illustrates a flow chart of a resolving method in accordance with an embodiment of the present disclosure.
  • FIG. 4 illustrates a specific implementation flow of the resolving method of FIG. 3 in accordance with at least one embodiment of the present disclosure.
  • FIG. 5 shows a specific implementation of the training process S405 of FIG. 4.
  • FIG. 6 illustrates a specific implementation flow of the resolving method of FIG. 3 in accordance with at least one embodiment of the present disclosure.
  • FIG. 7 shows a specific implementation of the second training process S607 of FIG. 6.
  • FIG. 8 shows a block diagram of a resolving system in accordance with an embodiment of the present disclosure.
  • FIG. 9 illustrates a block diagram of a particular implementation of the resolving system of FIG. 8 in accordance with at least one embodiment of the present disclosure.
  • FIG. 10 illustrates a block diagram of a particular implementation of the resolving system of FIG. 8 in accordance with at least one embodiment of the present disclosure.
  • FIG. 1 shows a general method of face resolving technology.
  • face recognition is performed using a PCA (Principal Component Analysis) algorithm.
  • The low-resolution image is mapped to a high-resolution image, constraints are applied for face reconstruction, and finally the high-resolution image is output.
  • FIG. 2 illustrates another general method of face resolving technology.
  • Feature mapping is performed on the input face image, and high-frequency details with high similarity are added to complete the reconstruction of the face.
  • Then the high-resolution face image is output.
  • Problem 1: the coefficients in the mapping function of the high/low-resolution face feature image pairs are fixed and unique; once fixed, the corresponding feature image pairs cannot be changed, so portability and scalability are poor.
  • Problem 2: detail filling in the feature reconstruction of the face is built on the reconstructed image, so the output appears rather unnatural and unreal.
  • A deep neural network is used to build the training model of the high/low-resolution face database. Once the model is well fitted, the sample library and its size can subsequently be changed at will; it is only necessary to update the whole training model and obtain new feature filter parameters.
  • While enlarging the image using the information of the original input image itself, the method adds similarity information of facial feature parts, enriching the facial details in the resolved image so that sharpness is noticeably improved.
  • FIG. 3 illustrates a flow chart of a resolving method in accordance with an embodiment of the present disclosure.
  • In step S301, a sample library is built from the original high-resolution image set.
  • In step S302, the convolutional structure network is trained with the sample library.
  • In step S303, the low-resolution input signal is processed with the trained convolutional structure network to obtain a high-resolution output signal.
  • FIG. 4 illustrates a specific implementation flow of the resolving method of FIG. 3 in accordance with at least one embodiment of the present disclosure.
  • Step S301 in FIG. 3 further includes steps S401-S404.
  • In step S401, the original high-resolution image set is down-sampled to obtain a low-resolution image set.
  • The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.
  • In step S402, face feature information of the low-resolution images is extracted using a face feature extraction method.
  • The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.
  • In step S403, face feature point marking is performed on the high-resolution images to obtain face feature information of the high-resolution images.
  • The structure of a face image mainly consists of facial components, contours, and smooth regions. Marker detection is applied to local facial components and contours.
  • In step S404, the face feature information of the low-resolution images and of the high-resolution images is used to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  • Step S302 in FIG. 3 further includes step S405.
  • In step S405, the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library are trained on to obtain the first filter parameters.
  • The first filter parameters are, for example, classifier filter parameters for the convolutional structure network.
  • Step S303 in FIG. 3 further includes steps S406-S408.
  • In step S406, face feature information of a low-resolution image is input as the input signal.
  • In step S407, the input face feature information of the low-resolution image is processed with the convolutional structure network based on the adjusted first filter parameters obtained in step S405.
  • In step S408, the face feature information of the high-resolution image processed by the convolutional structure network is output as the output signal.
  • FIG. 5 shows a specific implementation of the training process S405 of FIG. 4.
  • In step S501, correlation analysis is performed on the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library to obtain the first filter parameters of the convolutional structure network.
  • In steps S502 and S503, high-pass filtering and low-pass filtering are respectively performed on the face feature information of the high-resolution image to obtain high-frequency face feature information as the high-pass filtered face result and low-frequency face feature information as the low-pass filtered face result.
  • High-pass filtering the face feature information yields high-frequency features, for example the structural contours of the face, while low-pass filtering yields refined detail, for example the texture and roughness of the facial skin.
  • In step S504, the high-pass filtered face result and the low-pass filtered face result are superimposed to obtain the superimposed result, that is, the superposition of the extracted high-frequency and low-frequency information (feature contours and detail texture).
  • In step S505, feature classification is performed on the superimposed result, and the detail templates of the face feature information of the high-resolution image are obtained as the feedback signal of the convolutional structure network. For example, different features such as a, b, and c are each marked as one category to obtain detail templates of different categories.
  • The face feature information of the low-resolution image is used as the input signal of the convolutional structure network, and the first filter parameters are adjusted so that the prediction result signal obtained after processing is substantially the same as the feedback signal; that is, the difference between the prediction result signal and the feedback signal is smaller than a first threshold, which can be set according to actual conditions, for example less than or equal to 0.01.
  • Thereafter, based on the adjusted first filter parameters, the face feature information of the low-resolution image is processed by the convolutional structure network to obtain the face feature information of the high-resolution image.
  • The convolutional structure network is formed by alternately connecting a plurality of convolutional layers and excitation layers.
  • The numbers of convolutional layers and excitation layers can be set according to actual conditions, for example two or more.
  • Apart from the input and output, each layer of the convolutional structure network takes the output of the previous layer as its input and passes its own output to the next layer.
  • Each convolutional layer may include a plurality of filtering units with adjustable filter parameters.
  • The number of filtering units in each convolutional layer may be the same or different.
  • The convolutional layer extracts features from the input signal or from the feature map of the previous layer by a convolution operation to obtain a convolved face feature map. In general, each filtering unit performs the convolution operation according to the formula F(x) = Wx + b, where W and b are the filter parameters, x is the input, and F(x) is the output.
  • The excitation layer is used to remove features to which the human eye has low sensitivity. It can be implemented, for example, with the excitation function F(x) = max(0, x); that is, features with F(x) ≤ 0 are removed so as to find the feature maps with the highest sensitivity.
  • When the prediction result signal ŷ is obtained through the convolutional structure network, where ŷ^(i) denotes the prediction result signal (that is, the feature values) given by the feature map FM_i output by the last excitation layer, and m denotes the number of image sets in the face feature sample library, it is compared with the feedback signal, and the error rate J(W,b) is calculated with the variance function J(W,b) = (1/m) Σ_{i=1}^{m} (h_{W,b}/2) (ŷ^(i) - y^(i))^2; the partial derivative of the error rate with respect to each filter parameter is then calculated, and the filter parameters are adjusted according to these partial derivatives (gradients).
  • Here J(W,b) denotes the error rate, m denotes the number of image sets in the face feature sample library, y^(i) denotes the feedback signal, ŷ^(i) denotes the prediction result signal, and h_{W,b} denotes the weight coefficient.
  • h_{W,b} is an empirical value, with a default of 1, adjusted empirically according to the complexity of the network.
  • FIG. 6 illustrates a specific implementation flow of the resolving method of FIG. 3 in accordance with at least one embodiment of the present disclosure.
  • Step S301 in FIG. 3 further includes steps S601-S605.
  • In step S601, the original high-resolution image set is down-sampled to obtain a low-resolution image set.
  • The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.
  • In step S602, face feature information of the low-resolution images is extracted using the face feature extraction method.
  • The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.
  • In step S603, face feature point marking is performed on the high-resolution images to obtain face feature information of the high-resolution images.
  • In step S604, the face feature information of the low-resolution images and of the high-resolution images is used to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  • In step S605, the low-resolution images and the high-resolution images are used to establish an image sample library containing pairs of the low-resolution images and the associated high-resolution images.
  • Step S302 in FIG. 3 further includes steps S606 and S607.
  • In step S606, the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library are trained on to obtain the first filter parameters.
  • In step S607, the low-resolution and high-resolution image pairs in the image sample library are trained on to obtain the second filter parameters.
  • Step S303 in FIG. 3 further includes steps S608-S610.
  • In step S608, low-resolution information is input as the input signal.
  • In step S609, the input signal is processed with the convolutional structure network based on the adjusted first filter parameters obtained in step S606 and the adjusted second filter parameters obtained in step S607.
  • In step S610, the high-resolution information processed by the convolutional structure network is output as the output signal.
  • The first training process S606 in FIG. 6 is the same as the training process S405 in FIG. 4 and is not repeated here.
  • FIG. 7 shows a specific implementation process of the second training process S607 of FIG. 6.
  • In step S701, correlation analysis is performed on the pairs of low-resolution images and associated high-resolution images in the image sample library to obtain the second filter parameters of the convolutional structure network.
  • In steps S702 and S703, high-pass filtering and low-pass filtering are respectively performed on the high-resolution image to obtain image high-frequency information as the high-pass filtered image result and image low-frequency information as the low-pass filtered image result.
  • High-pass filtering an image yields its high-frequency information, that is, the relatively prominent features in the image, while low-pass filtering yields its low-frequency information, that is, the detailed texture features in the image.
  • In step S704, the high-pass filtered image result and the low-pass filtered image result are superimposed to obtain the superimposed result, that is, the superposition of the extracted high-frequency and low-frequency information (feature contours and detail texture).
  • In step S705, feature classification is performed on the superimposed result, and the detail templates of the high-resolution image are obtained as the feedback signal of the convolutional structure network. For example, different features such as a, b, and c are each marked as one category to obtain detail templates of different categories.
  • The low-resolution image is used as the input signal of the convolutional structure network, and the second filter parameters in the network are adjusted so that the prediction result signal obtained by processing the input signal with the adjusted second filter parameters is substantially the same as the feedback signal; that is, the difference between the prediction result signal and the feedback signal is smaller than the first threshold, which can be set according to actual conditions, for example less than or equal to 0.01.
  • Thereafter, based on the adjusted second filter parameters, the low-resolution image is processed with the convolutional structure network to obtain a high-resolution image.
  • The specific training process of the second training process S607 is similar to that of the training process S405 in FIG. 4; the difference is that the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library are replaced with the low-resolution and high-resolution image pairs in the image sample library established in step S605 of FIG. 6. It is therefore not described again here.
  • FIG. 8 shows a block diagram of a resolving system in accordance with an embodiment of the present disclosure.
  • The resolving system includes a sample library construction device 801, a training device 802, and an output device 803.
  • The sample library construction device 801 is configured to build a sample library from an original high-resolution image set.
  • The training device 802 is configured to train a convolutional structure network with the sample library.
  • The output device 803 is configured to process a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.
  • FIG. 9 illustrates a block diagram of a particular implementation of the resolving system of FIG. 8 in accordance with at least one embodiment of the present disclosure.
  • The sample library construction device 801 in FIG. 8 further includes a downsampling unit 901, a face parsing unit 902, a feature point marking unit 903, and a face feature sample library establishing unit 904.
  • The training device 802 in FIG. 8 further includes a training unit 905.
  • The output device 803 in FIG. 8 further includes an input unit 906, a convolutional structure network 907, and an output unit 908.
  • The downsampling unit 901 is configured to down-sample the original high-resolution image set to obtain a low-resolution image set.
  • The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.
  • The face parsing unit 902 is configured to extract face feature information of the low-resolution images using a face feature extraction method.
  • The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.
  • The feature point marking unit 903 is configured to perform face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images.
  • The structure of a face image mainly consists of facial components, contours, and smooth regions. Marker detection is applied to local facial components and contours.
  • The face feature sample library establishing unit 904 is configured to use the face feature information of the low-resolution images and of the high-resolution images to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  • The training unit 905 is configured to train on the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library to obtain the first filter parameters.
  • The first filter parameters are, for example, classifier filter parameters for the convolutional structure network.
  • The input unit 906 is configured to input face feature information of a low-resolution image as the input signal.
  • The convolutional structure network 907 is configured to process the input face feature information of the low-resolution image based on the adjusted first filter parameters.
  • The output unit 908 is configured to output the face feature information of the high-resolution image processed by the convolutional structure network as the output signal.
  • FIG. 10 illustrates a block diagram of a particular implementation of the resolving system of FIG. 8 in accordance with at least one embodiment of the present disclosure.
  • The sample library construction device 801 in FIG. 8 further includes a downsampling unit 1001, a face parsing unit 1002, a feature point marking unit 1003, a face feature sample library establishing unit 1004, and an image sample library establishing unit 1005.
  • The training device 802 in FIG. 8 further includes a first training unit 1006 and a second training unit 1007.
  • The output device 803 in FIG. 8 further includes an input unit 1008, a convolutional structure network 1009, and an output unit 1010.
  • The downsampling unit 1001 is configured to down-sample the original high-resolution image set to obtain a low-resolution image set.
  • The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.
  • The face parsing unit 1002 is configured to extract face feature information of the low-resolution images using a face feature extraction method.
  • The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.
  • The feature point marking unit 1003 is configured to perform face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images.
  • The face feature sample library establishing unit 1004 is configured to use the face feature information of the low-resolution images and of the high-resolution images to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  • The image sample library establishing unit 1005 is configured to use the low-resolution images and the high-resolution images to establish an image sample library containing pairs of the low-resolution images and the associated high-resolution images.
  • The first training unit 1006 is configured to train on the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library to obtain the first filter parameters.
  • The second training unit 1007 is configured to train on the low-resolution and high-resolution image pairs in the image sample library to obtain the second filter parameters.
  • The input unit 1008 is configured to input face feature information and/or images of low resolution as the input signal.
  • The convolutional structure network 1009 is configured to process the input low-resolution face feature information and/or images based on the adjusted first and/or second filter parameters.
  • The output unit 1010 is configured to output the high-resolution face feature information and/or images processed by the convolutional structure network as the output signal.
  • A deep learning-based resolving system includes a parallelized and hierarchically designed training model and resolving model.
  • Although the terms first, second, third, etc. may be used herein to describe various elements, components and/or portions, these elements, components and/or portions are not limited by these terms; the terms are only used to distinguish one element, component, or portion from another. Thus, a first element, component, or portion discussed below could be termed a second element, component, or portion without departing from the teachings of the present disclosure.
  • The computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions/acts specified in the flowchart and/or block diagram block.
  • The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block.
  • Each block may represent a code module, segment or portion that includes one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may not occur in the order noted. For example, two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A resolving method and system. The resolving method comprises: building a sample library from an original high-resolution image set; training a convolutional structure network with the sample library; and processing a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal. The resolving method and system according to the present disclosure can process expanded data by simply extending the hardware, without major changes to the algorithm; the complex algorithm is deployed as a parallelized design in which different servers work independently of one another; and the modular design allows the design of each functional module to be changed through later optimization.

Description

Resolving method and system based on deep learning

Technical Field

The present disclosure relates to the field of image processing in television display, and more particularly to a deep learning-based method and system for high-resolution resolving of faces and images.

Background Art

Super-resolution was proposed because the resolution of current video signal sources is lower than the resolution that high-definition televisions can display. Super-resolution technology stretches, compares, and corrects the original image to output an image better suited for display on a Full High Definition (FHD) LCD TV, thereby enhancing visual clarity. Compared with an ordinary LCD TV, which simply stretches and enlarges a standard-definition signal onto a high-definition screen, super-resolution technology shows more prominent detail, dispelling the impression that watching cable or DVD content on an HDTV looks worse than on a low-resolution TV.

The resolution of an image, also known as definition or resolving power, refers to how many pixels can be displayed on a display; the more pixels on the display, the finer the picture. An image of high resolution has a high pixel density, provides rich detail, and describes the objective scene more accurately and minutely. High-resolution images are in wide demand in the information age, with important applications in fields such as satellite remote sensing, video security surveillance, military reconnaissance aerial photography, medical digital imaging, and video standard conversion.

Face hallucination is a domain-specific super-resolution technique that generates a high-resolution output from a low-resolution input. A low-resolution image is obtained from a high-resolution image by down-sampling and linear convolution. Hallucination can be understood as reconstructing high-frequency details; among current super-resolution techniques, very little work focuses on face hallucination.
Summary

Embodiments of the present disclosure provide a deep learning-based method and system for high-resolution resolving of faces and images. According to one aspect of the present disclosure, a resolving method is provided, comprising the following steps: a. building a sample library from an original high-resolution image set; b. training a convolutional structure network with the sample library; c. processing a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.

According to another aspect of the present disclosure, a resolving system is provided, comprising: a sample library construction device configured to build a sample library from an original high-resolution image set; a training device configured to train a convolutional structure network with the sample library; and an output device configured to process a low-resolution input signal with the adjusted convolutional structure network to obtain a high-resolution output signal.

While enlarging the image using the information of the original input image itself, the method of the present disclosure adds similarity information of facial feature parts, enriching the facial details in the resolved image so that sharpness is noticeably improved.

The resolving method and system according to the present disclosure can process expanded data by simply extending the hardware, without major changes to the algorithm; the complex algorithm is deployed as a parallelized design in which different servers work independently of one another; and the modular design allows the design of each functional module to be changed through later optimization.
Brief Description of the Drawings

FIG. 1 shows a general method of face resolving technology.

FIG. 2 shows another general method of face resolving technology.

FIG. 3 shows a flow chart of a resolving method according to an embodiment of the present disclosure.

FIG. 4 shows a specific implementation flow of the resolving method of FIG. 3 according to at least one embodiment of the present disclosure.

FIG. 5 shows a specific implementation of the training process S405 of FIG. 4.

FIG. 6 shows a specific implementation flow of the resolving method of FIG. 3 according to at least one embodiment of the present disclosure.

FIG. 7 shows a specific implementation of the second training process S607 of FIG. 6.

FIG. 8 shows a block diagram of a resolving system according to an embodiment of the present disclosure.

FIG. 9 shows a block diagram of a specific implementation of the resolving system of FIG. 8 according to at least one embodiment of the present disclosure.

FIG. 10 shows a block diagram of a specific implementation of the resolving system of FIG. 8 according to at least one embodiment of the present disclosure.
Detailed Description

The above features and advantages of the present disclosure will become clearer through the following detailed description of its exemplary embodiments in conjunction with the accompanying drawings, in which like reference numerals designate elements of like structure.

Embodiments of the present disclosure will now be described with reference to the accompanying drawings. The present disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In the drawings, components are exaggerated for clarity.
FIG. 1 shows a general method of face resolving technology.

As shown in FIG. 1, face recognition is performed using a PCA (Principal Component Analysis) algorithm. The low-resolution image is mapped to a high-resolution image, constraints are applied for face reconstruction, and finally the high-resolution image is output.

FIG. 2 shows another general method of face resolving technology.

As shown in FIG. 2, first a low-resolution face image is input. The individual facial features are detected and the detected features (eyes/nose/mouth) are marked; different facial features are then extracted to obtain a number of different feature component templates. A high-pass filter function is applied to these templates to extract the high-frequency components, that is, the high-resolution facial components of the image. Since the human face is highly structured, a structural similarity function can be used to perform feature mapping on the input face image and to supplement it with high-frequency details of high similarity, completing the face reconstruction. Finally, the high-resolution face image is output.

The above methods have the following problems:

Problem 1: the coefficients in the mapping function of the high/low-resolution face feature image pairs are fixed and unique; once fixed, the corresponding feature image pairs cannot be changed, so portability and scalability are poor.

Problem 2: detail filling in the feature reconstruction of the face is built on the reconstructed image, so the output appears rather unnatural and unreal.

To address the above problems, the present disclosure proposes the following solutions.

Solution to Problem 1: a deep neural network is used to build the training model of the high/low-resolution face database. Once the model is well fitted, the sample library and its size can subsequently be changed at will; it is only necessary to update the whole training model and obtain new feature filter parameters.

Solution to Problem 2: while training the whole high/low-resolution face model, the main features of the high-resolution faces are marked, and detail model training is performed on the marked image blocks through high-pass and low-pass filters to obtain the corresponding detail-filling filter parameters.

While enlarging the image using the information of the original input image itself, this method adds similarity information of facial feature parts, enriching the facial details in the resolved image so that sharpness is noticeably improved.
FIG. 3 shows a flow chart of a resolving method according to an embodiment of the present disclosure.

As shown in FIG. 3, in step S301, a sample library is built from an original high-resolution image set.

In step S302, a convolutional structure network is trained with the sample library.

In step S303, a low-resolution input signal is processed with the trained convolutional structure network to obtain a high-resolution output signal.
FIG. 4 shows a specific implementation flow of the resolving method of FIG. 3 according to at least one embodiment of the present disclosure.

As shown in FIG. 4, step S301 in FIG. 3 further includes steps S401-S404.

In step S401, the original high-resolution image set is down-sampled to obtain a low-resolution image set. The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.
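For illustration only, such a down-sampling step can be sketched in Python as a uniform linear convolution followed by decimation; the block-average kernel and the factor of 4 are assumptions, since the patent only calls for a linear-convolution-style process:

```python
import numpy as np

def downsample(hr, factor=4):
    """Down-sample via a uniform linear convolution (block average) plus decimation."""
    h = hr.shape[0] - hr.shape[0] % factor
    w = hr.shape[1] - hr.shape[1] % factor
    hr = hr[:h, :w].astype(np.float64)
    # Averaging factor x factor blocks equals convolving with a uniform kernel
    # and then sampling with a stride equal to the factor.
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

low_res = downsample(np.random.rand(256, 256))  # yields a 64 x 64 image
```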
In step S402, face feature information of the low-resolution images is extracted using a face feature extraction method. The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.
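As one concrete instance of such an edge-detection-based extractor, the sketch below uses the Sobel gradient magnitude; the choice of operator is an assumption, since the patent leaves the extraction method open:

```python
import numpy as np

def edge_features(img):
    """Sobel gradient magnitude as a simple edge-based face feature map."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    pad = np.pad(img.astype(float), 1, mode="edge")
    gx = sum(kx[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return np.hypot(gx, gy)  # edge strength, approximating contour information
```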
In step S403, face feature point marking is performed on the high-resolution images to obtain face feature information of the high-resolution images. The structure of a face image mainly consists of facial components, contours, and smooth regions. Marker detection is applied to local facial components and contours.
In step S404, the face feature information of the low-resolution images and of the high-resolution images is used to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
As shown in FIG. 4, step S302 in FIG. 3 further includes step S405. In step S405, the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library are trained on to obtain the first filter parameters. The first filter parameters are, for example, classifier filter parameters for the convolutional structure network.
As shown in FIG. 4, step S303 in FIG. 3 further includes steps S406-S408.

In step S406, face feature information of a low-resolution image is input as the input signal.

In step S407, based on the adjusted first filter parameters obtained in step S405, the input face feature information of the low-resolution image is processed with the convolutional structure network.

In step S408, the face feature information of the high-resolution image processed by the convolutional structure network is output as the output signal.
FIG. 5 shows a specific implementation of the training process S405 of FIG. 4.

As shown in FIG. 5, in step S501, correlation analysis is performed on the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library to obtain the first filter parameters of the convolutional structure network.

In steps S502 and S503, high-pass filtering and low-pass filtering are respectively performed on the face feature information of the high-resolution image to obtain high-frequency face feature information as the high-pass filtered face result and low-frequency face feature information as the low-pass filtered face result. High-pass filtering the face feature information yields high-frequency features, for example the structural contours of the face, while low-pass filtering yields refined detail, for example the texture and roughness of the facial skin.

In step S504, the high-pass filtered face result and the low-pass filtered face result are superimposed to obtain the superimposed result, that is, the superposition of the extracted high-frequency and low-frequency information (feature contours and detail texture).

In step S505, feature classification is performed on the superimposed result, and the detail templates of the face feature information of the high-resolution image are obtained as the feedback signal of the convolutional structure network. For example, different features such as a, b, and c are each marked as one category to obtain detail templates of different categories.
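A minimal sketch of steps S502-S505 under stated assumptions: a Gaussian kernel stands in for the low-pass filter, the residual for the high-pass filter, and quantile binning for the feature classifier, none of which are fixed by the patent:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def filter2d(img, kernel):
    """Plain 2-D convolution with edge padding."""
    h, w = img.shape
    r = kernel.shape[0] // 2
    pad = np.pad(img.astype(float), r, mode="edge")
    out = np.zeros((h, w))
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * pad[i:i + h, j:j + w]
    return out

def detail_templates(feat, n_classes=3):
    low = filter2d(feat, gaussian_kernel())   # S503: low-pass result (detail texture)
    high = feat - low                         # S502: high-pass result (structural contours)
    combined = high + low                     # S504: superposition of both bands
    # S505: bin the superimposed result into categories (a, b, c, ...),
    # producing one detail template per category.
    edges = np.quantile(combined, np.linspace(0, 1, n_classes + 1))[1:-1]
    labels = np.digitize(combined, edges)
    return [np.where(labels == c, combined, 0.0) for c in range(n_classes)]
```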
The face feature information of the low-resolution image is used as the input signal of the convolutional structure network, and the first filter parameters in the network are adjusted so that the prediction result signal obtained by processing the input signal with the adjusted first filter parameters is substantially the same as the feedback signal; that is, the difference between the prediction result signal and the feedback signal is smaller than a first threshold, which can be set according to actual conditions, for example less than or equal to 0.01.
Thereafter, based on the adjusted first filter parameters, the face feature information of the low-resolution image is processed with the convolutional structure network to obtain the face feature information of the high-resolution image.
As described with FIG. 5, the convolutional structure network is formed by alternately connecting a plurality of convolutional layers and excitation layers. The numbers of convolutional layers and excitation layers can be set according to actual conditions, for example two or more. Apart from the input and output, each layer of the convolutional structure network takes the output of the previous layer as its input and passes its own output to the next layer.

Each convolutional layer may include a plurality of filtering units with adjustable filter parameters. The number of filtering units in each convolutional layer may be the same or different.

The convolutional layer extracts features from the input signal or from the feature map of the previous layer by a convolution operation to obtain a convolved face feature map. In general, each filtering unit performs the convolution operation according to the formula F(x) = Wx + b, where W and b are the filter parameters, x is the input, and F(x) is the output.

The excitation layer is used to remove features to which the human eye has low sensitivity. It can be implemented, for example, with the excitation function F(x) = max(0, x); that is, features with F(x) ≤ 0 are removed so as to find the feature maps with the highest sensitivity.
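Such an alternating structure can be sketched with PyTorch as follows; the channel counts and 3x3 kernels are illustrative assumptions, the patent fixing only the alternation of convolution (F(x) = Wx + b) and excitation (F(x) = max(0, x)):

```python
import torch.nn as nn

def conv_structure_network(channels=(1, 32, 32, 1)):
    """Alternate convolutional layers and ReLU excitation layers."""
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1))  # F(x) = Wx + b
        layers.append(nn.ReLU())  # excitation: discards features with F(x) <= 0
    return nn.Sequential(*layers)

net = conv_structure_network()
```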
When the prediction result signal ŷ is obtained through the convolutional structure network, where ŷ^(i) denotes the prediction result signal (that is, the feature values) given by the feature map FM_i output by the last excitation layer, and m denotes the number of image sets in the face feature sample library, it is compared with the feedback signal, and the error rate J(W,b) is calculated with the following variance function; the partial derivative of the error rate with respect to each filter parameter is then calculated, and the filter parameters are adjusted according to these partial derivatives (gradients):

J(W,b) = (1/m) Σ_{i=1}^{m} (h_{W,b}/2) (ŷ^(i) - y^(i))^2

where J(W,b) denotes the error rate, m denotes the number of image sets in the face feature sample library, y^(i) denotes the feedback signal, ŷ^(i) denotes the prediction result signal, and h_{W,b} denotes the weight coefficient. h_{W,b} is an empirical value with a default of 1, adjusted empirically according to the complexity of the network.
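One parameter adjustment along these lines can be sketched as follows; automatic differentiation stands in for the hand-computed partial derivatives, and the learning rate is an assumption:

```python
import torch

def training_step(net, low_feats, feedback, lr=1e-3, h=1.0):
    """One adjustment: J(W,b) = (1/m) * sum_i (h/2) * (yhat_i - y_i)^2,
    with the batch mean playing the role of the 1/m sum and h defaulting to 1."""
    pred = net(low_feats)                       # prediction result signal
    J = (h / 2.0) * ((pred - feedback) ** 2).mean()
    net.zero_grad()
    J.backward()                                # partial derivatives w.r.t. each filter parameter
    with torch.no_grad():
        for p in net.parameters():
            p -= lr * p.grad                    # adjust along the gradient
    return J.item()
```

Training would repeat such steps until J(W,b), the discrepancy between the prediction result signal and the feedback signal, falls below the first threshold, for example 0.01.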
FIG. 6 shows a specific implementation flow of the resolving method of FIG. 3 according to at least one embodiment of the present disclosure.

As shown in FIG. 6, step S301 in FIG. 3 further includes steps S601-S605.

In step S601, the original high-resolution image set is down-sampled to obtain a low-resolution image set. The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.

In step S602, face feature information of the low-resolution images is extracted using the face feature extraction method. The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.

In step S603, face feature point marking is performed on the high-resolution images to obtain face feature information of the high-resolution images.

In step S604, the face feature information of the low-resolution images and of the high-resolution images is used to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.

In step S605, the low-resolution images and the high-resolution images are used to establish an image sample library containing pairs of the low-resolution images and the associated high-resolution images.

As shown in FIG. 6, step S302 in FIG. 3 further includes steps S606 and S607.

In step S606, the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library are trained on to obtain the first filter parameters.

In step S607, the low-resolution and high-resolution image pairs in the image sample library are trained on to obtain the second filter parameters.

As shown in FIG. 6, step S303 in FIG. 3 further includes steps S608-S610.

In step S608, low-resolution information is input as the input signal.

In step S609, based on the adjusted first filter parameters obtained in step S606 and the adjusted second filter parameters obtained in step S607, the input signal is processed with the convolutional structure network.

In step S610, the high-resolution information processed by the convolutional structure network is output as the output signal.
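The S608-S610 routing might be sketched as below; the dispatch logic is an assumption, since the patent states only that the input is processed with the first and/or second filter parameters:

```python
def resolve(low_res_input, feature_net=None, image_net=None):
    """Run whichever trained network applies to the low-resolution input."""
    outputs = {}
    if feature_net is not None:   # branch trained with the first filter parameters
        outputs["face_features"] = feature_net(low_res_input)
    if image_net is not None:     # branch trained with the second filter parameters
        outputs["image"] = image_net(low_res_input)
    return outputs
```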
The first training process S606 in FIG. 6 is the same as the training process S405 in FIG. 4 and is not repeated here.
FIG. 7 shows a specific implementation of the second training process S607 of FIG. 6.

As shown in FIG. 7, in step S701, correlation analysis is performed on the pairs of low-resolution images and associated high-resolution images in the image sample library to obtain the second filter parameters of the convolutional structure network.

In steps S702 and S703, high-pass filtering and low-pass filtering are respectively performed on the high-resolution image to obtain image high-frequency information as the high-pass filtered image result and image low-frequency information as the low-pass filtered image result. High-pass filtering an image yields its high-frequency information, that is, the relatively prominent features in the image, while low-pass filtering yields its low-frequency information, that is, the detailed texture features in the image.

In step S704, the high-pass filtered image result and the low-pass filtered image result are superimposed to obtain the superimposed result, that is, the superposition of the extracted high-frequency and low-frequency information (feature contours and detail texture).

In step S705, feature classification is performed on the superimposed result, and the detail templates of the high-resolution image are obtained as the feedback signal of the convolutional structure network. For example, different features such as a, b, and c are each marked as one category to obtain detail templates of different categories.

The low-resolution image is used as the input signal of the convolutional structure network, and the second filter parameters in the network are adjusted so that the prediction result signal obtained by processing the input signal with the adjusted second filter parameters is substantially the same as the feedback signal; that is, the difference between the prediction result signal and the feedback signal is smaller than the first threshold, which can be set according to actual conditions, for example less than or equal to 0.01.

Thereafter, based on the adjusted filter parameters, the low-resolution image is processed with the convolutional structure network to obtain a high-resolution image.
The specific training process of the second training process S607 is similar to that of the training process S405 in FIG. 4. The difference is that the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library are replaced with the low-resolution and high-resolution image pairs in the image sample library established in step S605 of FIG. 6. It is therefore not described again here.
FIG. 8 shows a block diagram of a resolving system according to an embodiment of the present disclosure.

As shown in FIG. 8, the resolving system includes a sample library construction device 801, a training device 802, and an output device 803.

The sample library construction device 801 is configured to build a sample library from an original high-resolution image set.

The training device 802 is configured to train a convolutional structure network with the sample library.

The output device 803 is configured to process a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.
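For illustration, this device decomposition maps naturally onto plain Python objects; the method names below are assumptions chosen only to mirror devices 801-803:

```python
class ResolvingSystem:
    def __init__(self, builder, trainer, output_device):
        self.builder = builder              # sample library construction device 801
        self.trainer = trainer              # training device 802
        self.output_device = output_device  # output device 803

    def run(self, high_res_set, low_res_input):
        library = self.builder.build(high_res_set)                  # S301
        network = self.trainer.train(library)                       # S302
        return self.output_device.process(network, low_res_input)   # S303
```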
FIG. 9 shows a block diagram of a specific implementation of the resolving system of FIG. 8 according to at least one embodiment of the present disclosure.

As shown in FIG. 9, the sample library construction device 801 in FIG. 8 further includes a downsampling unit 901, a face parsing unit 902, a feature point marking unit 903, and a face feature sample library establishing unit 904. The training device 802 in FIG. 8 further includes a training unit 905. The output device 803 in FIG. 8 further includes an input unit 906, a convolutional structure network 907, and an output unit 908.

The downsampling unit 901 is configured to down-sample the original high-resolution image set to obtain a low-resolution image set. The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.

The face parsing unit 902 is configured to extract face feature information of the low-resolution images using a face feature extraction method. The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.

The feature point marking unit 903 is configured to perform face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images. The structure of a face image mainly consists of facial components, contours, and smooth regions. Marker detection is applied to local facial components and contours.

The face feature sample library establishing unit 904 is configured to use the face feature information of the low-resolution images and of the high-resolution images to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.

The training unit 905 is configured to train on the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library to obtain the first filter parameters. The first filter parameters are, for example, classifier filter parameters for the convolutional structure network.

The input unit 906 is configured to input face feature information of a low-resolution image as the input signal.

The convolutional structure network 907 is configured to process the input face feature information of the low-resolution image based on the adjusted first filter parameters.

The output unit 908 is configured to output the face feature information of the high-resolution image processed by the convolutional structure network as the output signal.
FIG. 10 shows a block diagram of a specific implementation of the resolving system of FIG. 8 according to at least one embodiment of the present disclosure.

As shown in FIG. 10, the sample library construction device 801 in FIG. 8 further includes a downsampling unit 1001, a face parsing unit 1002, a feature point marking unit 1003, a face feature sample library establishing unit 1004, and an image sample library establishing unit 1005. The training device 802 in FIG. 8 further includes a first training unit 1006 and a second training unit 1007. The output device 803 in FIG. 8 further includes an input unit 1008, a convolutional structure network 1009, and an output unit 1010.

The downsampling unit 1001 is configured to down-sample the original high-resolution image set to obtain a low-resolution image set. The down-sampling may use, for example, linear convolution or any other existing or future process capable of achieving the same function.

The face parsing unit 1002 is configured to extract face feature information of the low-resolution images using a face feature extraction method. The face feature extraction method may be an existing or future method capable of achieving the same function, such as an edge detection algorithm.

The feature point marking unit 1003 is configured to perform face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images.

The face feature sample library establishing unit 1004 is configured to use the face feature information of the low-resolution images and of the high-resolution images to establish a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.

The image sample library establishing unit 1005 is configured to use the low-resolution images and the high-resolution images to establish an image sample library containing pairs of the low-resolution images and the associated high-resolution images.

The first training unit 1006 is configured to train on the pairs of face feature information of the low-resolution and high-resolution images in the face feature sample library to obtain the first filter parameters.

The second training unit 1007 is configured to train on the low-resolution and high-resolution image pairs in the image sample library to obtain the second filter parameters.

The input unit 1008 is configured to input face feature information and/or images of low resolution as the input signal.

The convolutional structure network 1009 is configured to process the input low-resolution face feature information and/or images based on the adjusted first and/or second filter parameters.

The output unit 1010 is configured to output the high-resolution face feature information and/or images processed by the convolutional structure network as the output signal.
For the specific training processes of the training units in FIG. 9 and FIG. 10, reference may be made to the training processes of FIG. 5 and FIG. 7; they are therefore not described again here.
The deep learning-based resolving system comprises a parallelized and hierarchically designed training model and resolving model, and has the following advantages:

1. Scalability: expanded data can be processed by simply extending the hardware, without major changes to the algorithm.

2. Efficiency: the complex algorithm is deployed as a parallelized design in which different servers work independently of one another.

3. Adaptability: the modular design allows the design of each functional module to be changed through later optimization.
It will be understood that when an element is referred to as being "connected to" or "coupled to" another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected to" or "directly coupled to" another element, there are no intervening elements. Like reference numerals designate like elements. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements, components and/or portions, these elements, components and/or portions are not limited by these terms. The terms are only used to distinguish elements, components, or portions from one another. Thus, a first element, component, or portion discussed below could be termed a second element, component, or portion without departing from the teachings of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "an", and "the" as used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be understood that the term "comprising", when used in this specification, specifies the presence of stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Exemplary embodiments of the present disclosure are described herein with reference to block diagrams and flowcharts of methods and apparatuses (systems) according to embodiments of the disclosure. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions/acts specified in the flowchart and/or block diagram block.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block. Each block may represent a module, segment, or portion of code comprising one or more executable instructions for implementing the specified logical functions. It should also be noted that in other implementations the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

The above is illustrative of the present disclosure and is not to be construed as limiting it. Although several exemplary embodiments of the disclosure have been described, those skilled in the art will readily appreciate that many modifications can be made to the exemplary embodiments without materially departing from the novel teachings and advantages of the disclosure. Accordingly, all such modifications are intended to be included within the scope of the disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The disclosure is defined by the claims and their equivalents.

This application claims priority to Chinese Patent Application No. 201610161589.2, filed on March 21, 2016, the entire disclosure of which is incorporated herein by reference as part of this application.

Claims (20)

  1. A resolving method, comprising:
    building a sample library from an original high-resolution image set;
    training a convolutional structure network with the sample library;
    processing a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.
  2. The resolving method according to claim 1, wherein the sample library comprises a face feature sample library, and
    building the sample library from the original high-resolution image set further comprises:
    down-sampling the original high-resolution image set to obtain a low-resolution image set;
    extracting face feature information of the low-resolution images using a face feature extraction method;
    performing face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images;
    establishing, from the face feature information of the low-resolution images and of the high-resolution images, a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  3. The resolving method according to claim 2, wherein training the convolutional structure network with the sample library further comprises:
    performing correlation analysis on the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library to obtain first filter parameters of the convolutional structure network;
    performing high-pass filtering and low-pass filtering respectively on the face feature information of the high-resolution images to obtain a high-pass filtered face result and a low-pass filtered face result; superimposing the high-pass filtered face result and the low-pass filtered face result and performing feature classification to obtain detail templates of the face feature information of the high-resolution images as a feedback signal of the convolutional structure network;
    using the face feature information of the low-resolution images as an input signal of the convolutional structure network, adjusting the first filter parameters in the convolutional structure network, and processing the input signal with the convolutional structure network using the adjusted first filter parameters to obtain a prediction result signal identical to the feedback signal.
  4. The resolving method according to claim 3, wherein processing the low-resolution input signal with the trained convolutional structure network to obtain the high-resolution output signal further comprises:
    inputting face feature information of a low-resolution image;
    processing the input face feature information of the low-resolution image with the convolutional structure network based on the adjusted first filter parameters;
    outputting the face feature information of the high-resolution image processed by the convolutional structure network.
  5. The resolving method according to any one of claims 2-4, wherein the sample library comprises an image sample library, and
    building the sample library from the original high-resolution image set further comprises:
    establishing, from the low-resolution image set and the high-resolution image set, an image sample library containing pairs of the low-resolution images and the associated high-resolution images.
  6. The resolving method according to claim 5, wherein training the convolutional structure network with the sample library further comprises:
    performing correlation analysis on the pairs of low-resolution images and associated high-resolution images to obtain second filter parameters of the convolutional structure network;
    performing high-pass filtering and low-pass filtering respectively on the high-resolution images to obtain a high-pass filtered result and a low-pass filtered result; superimposing the high-pass filtered result and the low-pass filtered result and performing feature classification to obtain detail templates of the high-resolution images as a feedback signal of the convolutional structure network;
    using the low-resolution images as an input signal of the convolutional structure network, adjusting the second filter parameters in the convolutional structure network, and processing the input signal with the convolutional structure network using the adjusted second filter parameters to obtain a prediction result signal identical to the feedback signal.
  7. The resolving method according to claim 6, wherein processing the low-resolution input signal with the trained convolutional structure network to obtain the high-resolution output signal further comprises:
    inputting a low-resolution image;
    processing the input low-resolution image with the convolutional structure network based on the adjusted second filter parameters;
    outputting the high-resolution image processed by the convolutional structure network.
  8. The resolving method according to claim 7, wherein the convolutional structure network is formed by alternately connecting a plurality of convolutional layers and excitation layers, and
    each convolutional layer may include a plurality of filtering units with adjustable filter parameters, wherein each filtering unit performs a convolution operation according to the formula F(x) = Wx + b, where W and b are the filter parameters, x is the input, and F(x) is the output.
  9. The resolving method according to claim 8, wherein the prediction result signal is determined to be identical to the feedback signal when J(W,b), obtained according to the following formula, is smaller than a first threshold:
    J(W,b) = (1/m) Σ_{i=1}^{m} (h_{W,b}/2) (ŷ^(i) - y^(i))^2
    where J(W,b) denotes the mean square error, m denotes the number of image sets in the face feature sample library, y^(i) denotes the feedback signal, ŷ^(i) denotes the prediction result signal, and h_{W,b} denotes a weight coefficient.
  10. The resolving method according to claim 9, wherein, when the prediction result signal is not identical to the feedback signal, a partial derivative of J(W,b) is calculated for each filter parameter, and the first filter parameters or the second filter parameters are adjusted according to the partial derivatives.
  11. The resolving method according to claim 10, wherein the first filter parameters are classifier filter parameters for the convolutional structure network.
  12. A resolving system, comprising:
    a sample library construction device configured to build a sample library from an original high-resolution image set;
    a training device configured to train a convolutional structure network with the sample library;
    an output device configured to process a low-resolution input signal with the trained convolutional structure network to obtain a high-resolution output signal.
  13. The resolving system according to claim 12, wherein the sample library comprises a face feature sample library, and
    the sample library construction device further comprises:
    a downsampling unit configured to down-sample the original high-resolution image set to obtain a low-resolution image set;
    a face parsing unit configured to extract face feature information of the low-resolution images using a face feature extraction method;
    a feature point marking unit configured to perform face feature point marking on the high-resolution images to obtain face feature information of the high-resolution images;
    a face feature sample library establishing unit configured to establish, from the face feature information of the low-resolution images and of the high-resolution images, a face feature sample library containing pairs of the face feature information of the low-resolution images and the associated high-resolution images.
  14. The resolving system according to claim 13, wherein the training device further comprises:
    a first training unit configured to perform correlation analysis on the pairs of face feature information of the low-resolution images and the associated high-resolution images in the face feature sample library to obtain first filter parameters of the convolutional structure network; to perform high-pass filtering and low-pass filtering respectively on the face feature information of the high-resolution images to obtain a high-pass filtered face result and a low-pass filtered face result; to superimpose the high-pass filtered face result and the low-pass filtered face result and perform feature classification to obtain detail templates of the face feature information of the high-resolution images as a feedback signal of the convolutional structure network; and, using the face feature information of the low-resolution images as an input signal of the convolutional structure network, to adjust the first filter parameters in the convolutional structure network and process the input signal with the convolutional structure network using the adjusted first filter parameters to obtain a prediction result signal identical to the feedback signal.
  15. The resolving system according to any one of claims 13-14, wherein the sample library comprises an image sample library, and
    the sample library construction device further comprises:
    an image sample library establishing unit configured to establish, from the low-resolution images and the high-resolution images, an image sample library containing pairs of the low-resolution images and the associated high-resolution images.
  16. The resolving system according to claim 15, wherein the training device further comprises:
    a second training unit configured to perform correlation analysis on the pairs of low-resolution images and associated high-resolution images to obtain second filter parameters of the convolutional structure network; to perform high-pass filtering and low-pass filtering respectively on the high-resolution images to obtain a high-pass filtered result and a low-pass filtered result; to superimpose the high-pass filtered result and the low-pass filtered result and perform feature classification to obtain detail templates of the high-resolution images as a feedback signal of the convolutional structure network; and, using the low-resolution images as an input signal of the convolutional structure network, to adjust the second filter parameters in the convolutional structure network and process the input signal with the convolutional structure network using the adjusted second filter parameters to obtain a prediction result signal identical to the feedback signal.
  17. The resolving system according to claim 16, wherein the output device further comprises:
    an input unit further configured to input low-resolution face feature information and/or images;
    a convolutional structure network further configured to process the input low-resolution face feature information and/or images based on the adjusted first and/or second filter parameters;
    an output unit further configured to output the high-resolution face feature information and/or images processed by the convolutional structure network.
  18. The resolving system according to claim 17, wherein the convolutional structure network is formed by alternately connecting a plurality of convolutional layers and excitation layers, and
    each convolutional layer may include a plurality of filtering units with adjustable filter parameters, wherein each filtering unit performs a convolution operation according to the formula F(x) = Wx + b, where W and b are the filter parameters, x is the input, and F(x) is the output.
  19. The resolving system according to claim 18, wherein the prediction result signal is determined to be identical to the feedback signal when J(W,b), obtained according to the following formula, is smaller than a first threshold:
    J(W,b) = (1/m) Σ_{i=1}^{m} (h_{W,b}/2) (ŷ^(i) - y^(i))^2
    where J(W,b) denotes the mean square error, m denotes the number of image sets in the face feature sample library, y^(i) denotes the feedback signal, ŷ^(i) denotes the prediction result signal, and h_{W,b} denotes a weight coefficient.
  20. The resolving system according to claim 19, wherein, when the prediction result signal is not identical to the feedback signal, a partial derivative of J(W,b) is calculated for each filter parameter, and the first filter parameters or the second filter parameters are adjusted according to the partial derivatives.
PCT/CN2016/086494 2016-03-21 2016-06-21 Resolving method and system based on deep learning WO2017161710A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/537,677 US10769758B2 (en) 2016-03-21 2016-06-21 Resolving method and system based on deep learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610161589.2 2016-03-21
CN201610161589.2A CN105847968B (zh) Resolving method and system based on deep learning

Publications (1)

Publication Number Publication Date
WO2017161710A1 true WO2017161710A1 (zh) 2017-09-28

Family

ID=56588161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086494 WO2017161710A1 (zh) 2016-03-21 2016-06-21 基于深度学习的解像方法和系统

Country Status (3)

Country Link
US (1) US10769758B2 (zh)
CN (1) CN105847968B (zh)
WO (1) WO2017161710A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018138603A1 (en) * 2017-01-26 2018-08-02 Semiconductor Energy Laboratory Co., Ltd. Semiconductor device and electronic device including the semiconductor device
US10489887B2 (en) * 2017-04-10 2019-11-26 Samsung Electronics Co., Ltd. System and method for deep learning image super resolution
CN107633218B (zh) 2017-09-08 2021-06-08 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating an image
US11768979B2 (en) * 2018-03-23 2023-09-26 Sony Corporation Information processing device and information processing method
CN108875904A (zh) * 2018-04-04 2018-11-23 Beijing Megvii Technology Co., Ltd. Image processing method, image processing apparatus, and computer-readable storage medium
CN109977963B (zh) * 2019-04-10 2021-10-15 BOE Technology Group Co., Ltd. Image processing method, device, apparatus, and computer-readable medium
CN112215761A (zh) * 2019-07-12 2021-01-12 Huawei Technologies Co., Ltd. Image processing method, apparatus, and device
CN110543815B (zh) * 2019-07-22 2024-03-08 Ping An Technology (Shenzhen) Co., Ltd. Training method for a face recognition model, face recognition method, apparatus, device, and storage medium
CN111899252B (zh) * 2020-08-06 2023-10-27 Tencent Technology (Shenzhen) Co., Ltd. Artificial intelligence-based pathological image processing method and apparatus
CN112580617B (zh) * 2021-03-01 2021-06-18 Institute of Automation, Chinese Academy of Sciences Method and apparatus for expression recognition in natural scenes
CN113658040A (zh) * 2021-07-14 2021-11-16 Xi'an University of Technology Face super-resolution method based on prior information and an attention fusion mechanism

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216889A (zh) * 2008-01-14 2008-07-09 Zhejiang University Face image super-resolution method fusing global features and local detail information
CN101299235A (zh) * 2008-06-18 2008-11-05 Sun Yat-sen University Face super-resolution reconstruction method based on kernel principal component analysis
CN101719270A (zh) * 2009-12-25 2010-06-02 Wuhan University Face super-resolution processing method based on non-negative matrix factorization
CN103020940A (zh) * 2012-12-26 2013-04-03 Wuhan University Face super-resolution reconstruction method based on local feature transformation
US20130241633A1 (en) * 2011-09-13 2013-09-19 Jeol Ltd. Method and Apparatus for Signal Processing
CN105120130A (zh) * 2015-09-17 2015-12-02 BOE Technology Group Co., Ltd. Image up-scaling system, training method therefor, and image up-scaling method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477684B (zh) * 2008-12-11 2010-11-10 Xi'an Jiaotong University Face image super-resolution method using position-patch reconstruction
KR20110065997A (ko) * 2009-12-10 2011-06-16 Samsung Electronics Co., Ltd. Image processing apparatus and image processing method
JP5706177B2 (ja) 2010-02-09 2015-04-22 Panasonic Intellectual Property Corporation of America Super-resolution processing device and super-resolution processing method
CN101950415B (zh) * 2010-09-14 2011-11-16 Wuhan University Face super-resolution processing method based on shape semantic model constraints
US8743119B2 (en) * 2011-05-24 2014-06-03 Seiko Epson Corporation Model-based face image super-resolution
US9734558B2 (en) * 2014-03-20 2017-08-15 Mitsubishi Electric Research Laboratories, Inc. Method for generating high-resolution images using regression patterns
CN106462549B (zh) * 2014-04-09 2020-02-21 尹度普有限公司 Authenticating physical objects using machine learning from microscopic variations
CN105960657B (zh) * 2014-06-17 2019-08-30 Beijing Kuangshi Technology Co., Ltd. Facial super-resolution using convolutional neural networks
CN104899830B (zh) * 2015-05-29 2017-09-29 Graduate School at Shenzhen, Tsinghua University Image super-resolution method
CN204948182U (zh) * 2015-09-17 2016-01-06 BOE Technology Group Co., Ltd. Image up-scaling system and display device

Also Published As

Publication number Publication date
US20180089803A1 (en) 2018-03-29
CN105847968A (zh) 2016-08-10
US10769758B2 (en) 2020-09-08
CN105847968B (zh) 2018-12-21

Similar Documents

Publication Publication Date Title
WO2017161710A1 (zh) Resolving method and system based on deep learning
Liu et al. Video super-resolution based on deep learning: a comprehensive survey
WO2017036092A1 (zh) Super-resolution method and system, server, user equipment, and method thereof
US20220222776A1 Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution
DE102020214863A1 Self-supervised method and system for depth estimation
CN107204010 Monocular image depth estimation method and system
US8538200B2 Systems and methods for resolution-invariant image representation
CN106327422A (zh) Image stylized reconstruction method and apparatus
CN105488759B (zh) Image super-resolution reconstruction method based on a local regression model
JP2017527011A (ja) Method and apparatus for upscaling an image
CN106530231B (zh) Super-resolution image reconstruction method and system based on deep collaborative representation
Huang et al. Fast blind image super resolution using matrix-variable optimization
Luvizon et al. Adaptive multiplane image generation from a single internet picture
CN104376544B (zh) Non-local super-resolution reconstruction method based on multi-region scaling compensation
Liu et al. Asflow: Unsupervised optical flow learning with adaptive pyramid sampling
Zhang et al. Remote-sensing image superresolution based on visual saliency analysis and unequal reconstruction networks
CN117593702B (zh) Remote monitoring method, apparatus, device, and storage medium
CN117541629B (zh) Infrared and visible-light image registration and fusion method based on a wearable helmet
CN109766938A (zh) Multi-class object detection method for remote sensing images based on a scene-label-constrained deep network
CN114494786A (zh) Fine-grained image classification method based on a multi-layer coordinated convolutional neural network
WO2021213188A1 (zh) Training method for an image processing model, image processing method, and corresponding apparatuses
Niu et al. Learning from multi-perception features for real-word image super-resolution
CN112184555A (zh) Stereo image super-resolution reconstruction method based on deep interactive learning
CN116152710A (zh) Video instance segmentation method based on cross-frame instance association
US20230005104A1 (en) Method and electronic device for performing ai based zoom of image

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 15537677

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16895062

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16895062

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.05.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16895062

Country of ref document: EP

Kind code of ref document: A1