CN111242911A - Method and system for determining image definition based on deep learning algorithm - Google Patents


Info

Publication number
CN111242911A
CN111242911A
Authority
CN
China
Prior art keywords: image, face image, definition, processing, network
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202010017473.8A
Other languages
Chinese (zh)
Inventor
柴胜
杨强
刘华根
何韦澄
王玉鑫
Current Assignee
Laikang Technology Co Ltd
Original Assignee
Laikang Technology Co Ltd
Application filed by Laikang Technology Co Ltd filed Critical Laikang Technology Co Ltd
Priority claimed from application CN202010017473.8A
Publication of CN111242911A

Classifications

    • G06T 7/0012 Biomedical image inspection (G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06N 3/045 Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08 Learning methods (G06N 3/02 Neural networks)
    • G06T 5/00 Image enhancement or restoration
    • G06V 40/161 Detection; Localisation; Normalisation (G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V 40/00 Recognition of biometric, human-related or animal-related patterns; G06V 40/10 Human or animal bodies; G06V 40/16 Human faces, e.g. facial parts, sketches or expressions)
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships (G06V 40/168 Feature extraction; Face representation)
    • G06T 2207/10028 Range image; Depth image; 3D point clouds (G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/10 Image acquisition modality)
    • G06T 2207/20081 Training; Learning (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20172 Image enhancement details (G06T 2207/20 Special algorithmic details)
    • G06T 2207/30004 Biomedical image processing (G06T 2207/30 Subject of image; Context of image processing)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for determining image definition based on a deep learning algorithm. The method comprises: acquiring an original face image data set containing a tongue region and labeling each image with its degree of definition; performing data enhancement on the labeled data set to obtain a data-enhanced face image data set; building a deep network on the framework of a residual network and establishing an association between an optimizer and the residual network to form an image definition judgment model, then training and testing the model with the data-enhanced face image data set to determine the trained image definition judgment model; and preprocessing the to-be-detected face image containing a tongue region and analyzing the preprocessed image with the trained model to determine the image definition of the to-be-detected face image.

Description

Method and system for determining image definition based on deep learning algorithm
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method and a system for determining image sharpness based on a deep learning algorithm.
Background
With the rapid development of computer vision and deep learning, progress in image analysis for the field of traditional Chinese medicine has also been promoted. Intelligent facial and tongue diagnosis places high demands on the definition of the face and tongue images captured by the practitioner, so images must be checked for blur before diagnosis in order to improve diagnostic accuracy. Traditional image-processing schemes, and schemes combining feature engineering with machine-learning classification, have the following problems: a. image features are difficult to extract; b. models trained by traditional machine learning on manually screened features generalize poorly and lack robustness; c. the accuracy of results obtained with machine-learning or traditional image-processing methods is low; d. thresholds are difficult to select during image processing.
The concept of deep learning stems from the study of artificial neural networks: a multi-layer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, so as to discover distributed feature representations of the data. The concept was proposed by Hinton et al. in 2006, who presented an unsupervised greedy layer-by-layer training algorithm based on the deep belief network (DBN) and later the multilayer autoencoder deep structure, in the hope of solving the optimization problems associated with deep architectures. In addition, the convolutional neural network proposed by LeCun et al. was the first true multi-layer structure learning algorithm, using spatial relative relationships to reduce the number of parameters and improve training performance.
Therefore, a method for determining image sharpness based on a deep learning algorithm is needed.
Disclosure of Invention
The invention provides a method and a system for determining image definition based on a deep learning algorithm, which are used for solving the problem of how to automatically determine the image definition.
In order to solve the above problem, according to an aspect of the present invention, there is provided a method of determining sharpness of an image based on a deep learning algorithm, the method including:
acquiring an original face image data set containing a tongue region, and labeling each original face image according to its degree of definition;
performing data enhancement processing on the marked original face image data set to obtain a data-enhanced face image data set;
building a deep network on the framework of a residual network, establishing an association between an optimizer and the residual network to form an image definition judgment model, and training and testing the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model;
and preprocessing the to-be-detected face image containing the tongue region, and analyzing the preprocessed face image by utilizing the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region.
Preferably, the data enhancement processing on the labeled original facial image data set to obtain the data enhanced facial image data set includes:
processing the face image data in the labeled original face image data set by using at least one processing mode of color space conversion processing, brightness adjustment processing, saturation adjustment processing, channel conversion processing, random clipping processing, horizontal mirroring processing and normalization processing to obtain a data-enhanced face image data set.
Preferably, the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework to select weights during the training process.
Preferably, the training and testing the image sharpness judging model by using the data-enhanced facial image data set to determine a trained image sharpness judging model includes:
initializing the weights of the convolutional layer group of the residual network ResNet with the network weights of ResNet-18, and randomly initializing the weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01 and using cross-entropy as the loss function; during iterative training, keeping the learning rate unchanged for a first preset number of training iterations, and thereafter reducing it to 0.001 and then 0.0001, once per second preset number of iterations;
randomly selecting a face image with a preset percentage threshold value from the data-enhanced face image data set as a training set, and using the rest face images as a test set for training and testing;
and performing iterative training on the image definition judgment model by stochastic gradient descent, and selecting the network model with the smallest loss as the trained image definition judgment model.
Preferably, the preprocessing the to-be-detected face image including the tongue region, and analyzing the preprocessed face image by using the trained image sharpness determination model to determine the image sharpness of the to-be-detected face image including the tongue region includes:
carrying out scaling processing on the face image to be detected containing the tongue region according to a preset size, and subtracting the mean value of each channel to obtain the preprocessed face image to be detected containing the tongue region;
analyzing the preprocessed face image by using the trained image definition judging model to obtain probability values corresponding to different definition degrees, and selecting the definition degree corresponding to the maximum probability value as the image definition of the face image containing the tongue region to be detected;
wherein the degree of clarity comprises: clear, relatively clear, and unclear.
According to another aspect of the present invention, there is provided a system for determining sharpness of an image based on a deep learning algorithm, the system comprising:
the definition degree labeling unit is used for acquiring an original face image data set containing a tongue region and labeling the definition degree of each original face image according to the definition degree of the original face image;
the data enhancement processing unit is used for carrying out data enhancement processing on the marked original face image data set so as to obtain a data enhanced face image data set;
the image definition judgment model determining unit is used for building a deep network on the framework of a residual network, establishing an association between an optimizer and the residual network to form an image definition judgment model, and training and testing the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model;
and the image definition determining unit is used for preprocessing the to-be-detected face image containing the tongue region and analyzing the preprocessed face image by utilizing the trained image definition judging model so as to determine the image definition of the to-be-detected face image containing the tongue region.
Preferably, the data enhancement processing unit performs data enhancement processing on the labeled original face image data set to obtain a data-enhanced face image data set, and includes:
processing the face image data in the labeled original face image data set by using at least one processing mode of color space conversion processing, brightness adjustment processing, saturation adjustment processing, channel conversion processing, random clipping processing, horizontal mirroring processing and normalization processing to obtain a data-enhanced face image data set.
Preferably, the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework to select weights during the training process.
Preferably, the image sharpness determination model determining unit, which trains and tests the image sharpness determination model by using the data-enhanced face image data set to determine a trained image sharpness determination model, includes:
initializing the weights of the convolutional layer group of the residual network ResNet with the network weights of ResNet-18, and randomly initializing the weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01 and using cross-entropy as the loss function; during iterative training, keeping the learning rate unchanged for a first preset number of training iterations, and thereafter reducing it to 0.001 and then 0.0001, once per second preset number of iterations;
randomly selecting a face image with a preset percentage threshold value from the data-enhanced face image data set as a training set, and using the rest face images as a test set for training and testing;
and performing iterative training on the image definition judgment model by stochastic gradient descent, and selecting the network model with the smallest loss as the trained image definition judgment model.
Preferably, the image sharpness determining unit preprocesses the to-be-detected face image including the tongue region, and analyzes the preprocessed face image by using the trained image sharpness determining model to determine the image sharpness of the to-be-detected face image including the tongue region, and includes:
carrying out scaling processing on the face image to be detected containing the tongue region according to a preset size, and subtracting the mean value of each channel to obtain the preprocessed face image to be detected containing the tongue region;
analyzing the preprocessed face image by using the trained image definition judging model to obtain probability values corresponding to different definition degrees, and selecting the definition degree corresponding to the maximum probability value as the image definition of the face image containing the tongue region to be detected;
wherein the degree of clarity comprises: clear, relatively clear, and unclear.
The invention provides a method and a system for determining image definition based on a deep learning algorithm, wherein the method comprises: labeling the definition of each original face image; performing data enhancement on the labeled original face image data set; building a deep network on the framework of a residual network, establishing an association between an optimizer and the residual network to form an image definition judgment model, and training and testing it to determine the trained model; and analyzing the face image to be detected with the trained image definition judgment model to determine its image definition. The method requires no manual feature screening and avoids the feature-engineering step of selecting classification features; through data enhancement, the original data derive a richer new data set as training data, so that the model adapts to different scenes; and judging the definition of the image lays a foundation for accurate facial and tongue diagnosis.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow diagram of a method 100 for determining sharpness of an image based on a deep learning algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a residual unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the deep network SENet according to an embodiment of the present invention; and
fig. 4 is a schematic structural diagram of a system 400 for determining image sharpness based on a deep learning algorithm according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings; however, the present invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flow chart of a method 100 for determining image sharpness based on a deep learning algorithm according to an embodiment of the present invention. As shown in fig. 1, the method requires no manual feature screening, avoids the feature-engineering step of selecting classification features, and, through data enhancement, derives from the original data a richer new data set as training data, so that the model adapts to different scenes and the sharpness judgment lays a foundation for accurate facial and tongue diagnosis. The method 100 starts from step 101: an original face image data set containing a tongue region is obtained, and each original face image is labeled according to its degree of definition.
In the embodiment of the invention, images containing the face and tongue are collected from hospitals, communities, and similar sources to ensure the diversity and uniformity of the data. The data are then classified and labeled according to the degree of clarity: clear, relatively clear, and unclear.
At step 102, data enhancement processing is performed on the annotated raw facial image dataset to obtain a data enhanced facial image dataset.
Preferably, the data enhancement processing on the labeled original facial image data set to obtain the data enhanced facial image data set includes:
processing the face image data in the labeled original face image data set by using at least one processing mode of color space conversion processing, brightness adjustment processing, saturation adjustment processing, channel conversion processing, random clipping processing, horizontal mirroring processing and normalization processing to obtain a data-enhanced face image data set.
In the embodiment of the invention, to improve the robustness of the trained model, a richer new data set is derived from the original data as training data by data enhancement, so that the model can adapt to different scenes. The processing modes for enhancing the original face image data containing the tongue region include: color space conversion, transforming the picture from RGB space to HSV (hue-based) space; brightness adjustment, randomly adjusting the brightness of the image; saturation adjustment, randomly adjusting the saturation of the image; channel conversion, exchanging the positions of the three image channels to generate a new image; random cropping, cropping the original image at random positions to enrich the background; horizontal mirroring, to increase diversity; and normalization, to reduce the interference of the image's DC component. For any original face image, one or more of these processing modes can be selected and applied together.
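As an illustration, the following is a minimal NumPy sketch of four of the augmentation operations above (channel conversion, horizontal mirroring, random cropping, and mean normalization). The 64x64 test image, the 48x48 crop size, and the function names are assumptions made for the example, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_swap(img: np.ndarray) -> np.ndarray:
    """Permute the three colour channels of an (H, W, 3) image to create a new image."""
    perm = rng.permutation(3)
    return img[:, :, perm]

def horizontal_mirror(img: np.ndarray) -> np.ndarray:
    """Flip the image left-to-right to increase diversity."""
    return img[:, ::-1, :]

def random_crop(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Crop a random out_h x out_w window from the original image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w, :]

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale to [0, 1] and subtract the per-channel mean (removes the DC component)."""
    x = img.astype(np.float32) / 255.0
    return x - x.mean(axis=(0, 1), keepdims=True)

# A synthetic test image; any subset of the operations can be chained.
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = normalize(random_crop(horizontal_mirror(img), 48, 48))
```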
In step 103, a deep network is built on the framework of a residual network, an association between an optimizer and the residual network is established to construct an image definition judgment model, and the image definition judgment model is trained and tested with the data-enhanced face image data set to determine the trained image definition judgment model.
Preferably, the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework to select weights during the training process.
Preferably, the training and testing the image sharpness judging model by using the data-enhanced facial image data set to determine a trained image sharpness judging model includes:
initializing the weights of the convolutional layer group of the residual network ResNet with the network weights of ResNet-18, and randomly initializing the weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01 and using cross-entropy as the loss function; during iterative training, keeping the learning rate unchanged for a first preset number of training iterations, and thereafter reducing it to 0.001 and then 0.0001, once per second preset number of iterations;
randomly selecting a face image with a preset percentage threshold value from the data-enhanced face image data set as a training set, and using the rest face images as a test set for training and testing;
and performing iterative training on the image definition judgment model by stochastic gradient descent, and selecting the network model with the smallest loss as the trained image definition judgment model.
In the embodiment of the invention, the residual network is ResNet-18, composed of 17 convolutional layers and 1 fully-connected layer. Fig. 2 is a schematic diagram of a residual unit according to an embodiment of the present invention. As shown in fig. 2, the early layers of the network mainly extract low-level features of the image data, the middle layers extract progressively higher-level features, and the final layers combine features to form the feature representation of the whole model.
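The residual identity of the unit in fig. 2 can be sketched in a toy NumPy form. Here the two 3x3 convolutions of a ResNet-18 basic block are replaced by two small matrix multiplications purely to keep the example short; the identity shortcut works the same way in either case.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """Toy residual unit: y = relu(F(x) + x).
    F is modelled as two linear maps with a ReLU between them; in ResNet-18
    F would be two 3x3 convolutions, but the skip connection is identical."""
    f = w2 @ relu(w1 @ x)   # the residual branch F(x)
    return relu(f + x)      # identity shortcut added before the final ReLU

x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
# With F == 0 the unit reduces to the identity (for non-negative inputs),
# which is what makes very deep residual networks easy to optimize.
y = residual_unit(x, w_zero, w_zero)
```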
Fig. 3 is a schematic diagram of the deep network SENet according to an embodiment of the present invention. As shown in fig. 3, C is 16 for the SENet classification network constructed in the embodiment of the present invention. Analysis of the acquired images shows that the blur distribution may be global or local, and that blur falls, by cause, into motion blur and out-of-focus blur. To handle these practical situations, an attention mechanism is introduced to improve classification accuracy.
For the SENet network, a channel attention mechanism is added on top of the ResNet-18 framework to address the difficulty of detecting local blur in images; its function is to select weights during the training process. When classifying blurred images, blurred pixel blocks vary in size, so screening the weights for those pixels during training and concentrating learning on the blurred regions can effectively improve classification accuracy.
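A minimal NumPy sketch of the squeeze-and-excitation computation described above: squeeze by global average pooling, excite through two fully-connected layers with a sigmoid, then reweight each channel. The feature-map size, the reduction ratio r = 4, and the random weights are illustrative assumptions; the patent specifies only C = 16.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    squeeze:   global average pool per channel        -> (C,)
    excite:    FC reduction, ReLU, FC expansion, sigmoid -> C weights in (0, 1)
    scale:     channel-wise reweighting of the input."""
    s = feat.mean(axis=(1, 2))            # squeeze
    z = np.maximum(w1 @ s, 0.0)           # reduce, e.g. C -> C/r
    w = sigmoid(w2 @ z)                   # expand back to C attention weights
    return feat * w[:, None, None]        # scale each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))    # C = 16 as in the embodiment
w1 = rng.standard_normal((4, 16)) * 0.1   # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((16, 4)) * 0.1
out = se_block(feat, w1, w2)
```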
During training, an SGD optimizer is adopted with cross-entropy as the loss function. The initial learning rate is set to 0.01 and held constant for the first 30k iterations; in the later stage it is reduced to 0.001 and then 0.0001, once every 10k iterations. Then 80% of the face images in the data-enhanced face image data set are randomly selected as the training set and the remaining 20% as the test set, for training and testing respectively. The image definition judgment model is trained iteratively by stochastic gradient descent, the parameters are adjusted and training repeated until a usable accuracy is reached, and the network model with the smallest loss is selected as the trained image definition judgment model.
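The learning-rate schedule and the 80/20 split described above can be written down directly. This pure-Python sketch assumes, as the text suggests, that the schedule bottoms out at 0.0001; the function names are hypothetical.

```python
import random

def learning_rate(iteration: int) -> float:
    """Step schedule from the description: 0.01 for the first 30k iterations,
    then divided by 10 every 10k iterations, floored at 0.0001."""
    if iteration < 30_000:
        return 0.01
    drops = (iteration - 30_000) // 10_000 + 1
    return max(0.01 * 10 ** -drops, 0.0001)

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Random train/test split of the augmented data set (80%/20% by default)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
```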
In step 104, preprocessing the to-be-detected face image including the tongue region, and analyzing the preprocessed face image by using the trained image sharpness judgment model to determine the image sharpness of the to-be-detected face image including the tongue region.
Preferably, the preprocessing the to-be-detected face image including the tongue region, and analyzing the preprocessed face image by using the trained image sharpness determination model to determine the image sharpness of the to-be-detected face image including the tongue region includes:
carrying out scaling processing on the face image to be detected containing the tongue region according to a preset size, and subtracting the mean value of each channel to obtain the preprocessed face image to be detected containing the tongue region;
analyzing the preprocessed face image by using the trained image definition judging model to obtain probability values corresponding to different definition degrees, and selecting the definition degree corresponding to the maximum probability value as the image definition of the face image containing the tongue region to be detected;
wherein the degree of clarity comprises: clear, relatively clear, and unclear.
In an embodiment of the invention, when a to-be-detected face image containing a tongue region is obtained, the image is first scaled to a preset size and the mean of each channel is subtracted; the preprocessed image is then input into the trained model, which outputs three probability values, and the class with the largest probability is selected as the classification result. When the classification result indicates that the image is not sufficiently clear, the user can be promptly reminded to retake the photograph.
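As a hedged NumPy sketch of this inference step: the 224x224 target size and the per-channel means are placeholders (the patent states only "a preset size" and "the mean value of each channel"), and the logits stand in for the output of the trained model.

```python
import numpy as np

LABELS = ("clear", "relatively clear", "unclear")

def preprocess(img: np.ndarray, size=(224, 224),
               channel_means=(0.485, 0.456, 0.406)) -> np.ndarray:
    """Nearest-neighbour resize to the preset size, then subtract per-channel means.
    Both the size and the means here are illustrative placeholders."""
    h, w, _ = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = img[rows][:, cols].astype(np.float32) / 255.0
    return resized - np.asarray(channel_means, dtype=np.float32)

def classify(logits: np.ndarray) -> str:
    """Softmax over the three sharpness classes; return the most probable label."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return LABELS[int(p.argmax())]

# Made-up model output for one image; the largest logit decides the class.
label = classify(np.array([0.2, 2.5, -1.0]))
```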
Fig. 4 is a schematic structural diagram of a system 400 for determining image sharpness based on a deep learning algorithm according to an embodiment of the present invention. As shown in fig. 4, a system 400 for determining image sharpness based on a deep learning algorithm according to an embodiment of the present invention includes: a definition labeling unit 401, a data enhancement processing unit 402, an image definition judgment model determining unit 403, and an image definition determining unit 404.
Preferably, the definition labeling unit 401 is configured to obtain an original face image data set including a tongue region, and label the definition of each original face image according to the definition of the original face image.
Preferably, the data enhancement processing unit 402 is configured to perform data enhancement processing on the labeled original facial image data set to obtain a data enhanced facial image data set.
Preferably, the data enhancement processing unit 402 performs data enhancement processing on the labeled original facial image data set to obtain a data enhanced facial image data set, including:
processing the face image data in the labeled original face image data set with at least one of color space conversion, brightness adjustment, saturation adjustment, channel conversion, random cropping, horizontal mirroring, and normalization to obtain a data-enhanced face image data set.
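A minimal sketch of such a data enhancement step is given below. The application probabilities, brightness range, and crop size are illustrative assumptions; an implementation would tune these values and add the remaining operations (color space conversion, saturation adjustment, channel conversion).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment(image, crop=200):
    """Randomly apply a subset of the enhancement operations to one face image."""
    out = np.asarray(image, dtype=np.float32)
    if rng.random() < 0.5:  # brightness adjustment
        out = np.clip(out + rng.uniform(-25.0, 25.0), 0.0, 255.0)
    if rng.random() < 0.5:  # horizontal mirroring
        out = out[:, ::-1, :]
    if rng.random() < 0.5:  # random cropping to a fixed patch
        top = int(rng.integers(0, out.shape[0] - crop + 1))
        left = int(rng.integers(0, out.shape[1] - crop + 1))
        out = out[top:top + crop, left:left + crop, :]
    return out / 255.0      # normalization to [0, 1]
```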
Preferably, the image definition judgment model determining unit 403 is configured to build a deep network on the framework of a residual network, establish the association between an optimizer and the residual network to form an image definition judgment model, and train and test the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model.
Preferably, the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework and uses it to select channel weights during training.
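The channel attention mechanism referred to here is the Squeeze-and-Excitation (SE) block of Hu et al. (cited below). A NumPy sketch of its forward pass, assuming a given reduction ratio and weight shapes for illustration; a real SENet inserts one such block into each residual block:

```python
import numpy as np

def se_block(feature_map, w_reduce, w_expand):
    """Squeeze-and-Excitation forward pass.

    feature_map: (C, H, W) activations from a convolutional layer.
    w_reduce:    (C, C // r) weights of the dimensionality-reduction layer.
    w_expand:    (C // r, C) weights of the expansion layer.
    """
    squeezed = feature_map.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w_reduce, 0.0)       # excitation, step 1: ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))  # excitation, step 2: sigmoid in (0, 1)
    return feature_map * gates[:, None, None]           # reweight each channel
```

Each channel of the feature map is thus rescaled by a learned weight in (0, 1), which is the "selection of weights during training" described above.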
Preferably, the image definition judgment model determining unit 403 trains and tests the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model, including:
initializing the network weights of the convolutional layer group of the residual network ResNet with pretrained ResNet-18 weights, and randomly initializing the network weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01, using cross entropy as the loss function, keeping the learning rate unchanged for a first preset number of training samples in the early stage of iterative training, and reducing it to 0.001 and then to 0.0001 after every second preset number of training samples in the later stage of iterative training;
randomly selecting a preset percentage of face images from the data-enhanced face image data set as the training set and using the remaining face images as the test set for training and testing;
and performing iterative training on the image definition judgment model with the stochastic gradient descent (SGD) algorithm, and selecting the network model with the minimum loss function as the trained image definition judgment model.
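The learning-rate schedule, loss, and update rule described above can be sketched as follows. This is a sketch only: the concrete iteration counts standing in for the "first preset number" and "second preset number" of training samples are hypothetical.

```python
import numpy as np

def learning_rate(step, hold_steps=1000, decay_every=2000):
    """Hold the initial rate of 0.01 early on, then drop by a factor of 10
    (to 0.001, then 0.0001) after each later block of iterations."""
    if step < hold_steps:
        return 0.01
    drops = min((step - hold_steps) // decay_every + 1, 2)
    return 0.01 * (0.1 ** drops)

def cross_entropy(probabilities, label):
    """Cross-entropy loss for one sample; `probabilities` are the softmax outputs."""
    return -np.log(probabilities[label] + 1e-12)

def sgd_step(weights, gradient, step):
    """One stochastic-gradient-descent update at the scheduled learning rate."""
    return weights - learning_rate(step) * gradient
```

The model snapshot with the minimum loss over the test set would then be kept as the trained image definition judgment model.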
Preferably, the image definition determining unit 404 is configured to preprocess the to-be-detected face image containing the tongue region, and analyze the preprocessed face image with the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region.
Preferably, the image definition determining unit preprocesses the to-be-detected face image containing the tongue region and analyzes the preprocessed face image with the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region, including:
scaling the to-be-detected face image containing the tongue region to a preset size, and subtracting the mean value of each channel to obtain the preprocessed to-be-detected face image containing the tongue region;
analyzing the preprocessed face image with the trained image definition judgment model to obtain the probability values corresponding to the different definition degrees, and selecting the definition degree with the maximum probability value as the image definition of the to-be-detected face image containing the tongue region;
wherein the definition degrees comprise: clear, relatively clear, and unclear.
The system 400 for determining image definition based on a deep learning algorithm according to this embodiment of the present invention corresponds to the method 100 for determining image definition based on a deep learning algorithm according to another embodiment of the present invention, and is not described again here.
The invention has been described with reference to a few embodiments. However, as is apparent to a person skilled in the art, other embodiments than those disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A method for determining image definition based on a deep learning algorithm, the method comprising:
acquiring an original face image data set containing a tongue region, and labeling each original face image according to its degree of definition;
performing data enhancement processing on the marked original face image data set to obtain a data-enhanced face image data set;
building a deep network on the framework of a residual network, establishing the association between an optimizer and the residual network to form an image definition judgment model, and training and testing the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model;
and preprocessing the to-be-detected face image containing the tongue region, and analyzing the preprocessed face image by utilizing the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region.
2. The method of claim 1, wherein performing data enhancement processing on the labeled original face image data set to obtain a data-enhanced face image data set comprises:
processing the face image data in the labeled original face image data set with at least one of color space conversion, brightness adjustment, saturation adjustment, channel conversion, random cropping, horizontal mirroring, and normalization to obtain a data-enhanced face image data set.
3. The method of claim 1, wherein the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework and uses it to select channel weights during training.
4. The method of claim 3, wherein training and testing the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model comprises:
initializing the network weights of the convolutional layer group of the residual network ResNet with pretrained ResNet-18 weights, and randomly initializing the network weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01, using cross entropy as the loss function, keeping the learning rate unchanged for a first preset number of training samples in the early stage of iterative training, and reducing it to 0.001 and then to 0.0001 after every second preset number of training samples in the later stage of iterative training;
randomly selecting a preset percentage of face images from the data-enhanced face image data set as the training set and using the remaining face images as the test set for training and testing;
and performing iterative training on the image definition judgment model with the stochastic gradient descent (SGD) algorithm, and selecting the network model with the minimum loss function as the trained image definition judgment model.
5. The method of claim 1, wherein preprocessing the to-be-detected face image containing the tongue region and analyzing the preprocessed face image with the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region comprises:
scaling the to-be-detected face image containing the tongue region to a preset size, and subtracting the mean value of each channel to obtain the preprocessed to-be-detected face image containing the tongue region;
analyzing the preprocessed face image with the trained image definition judgment model to obtain the probability values corresponding to the different definition degrees, and selecting the definition degree with the maximum probability value as the image definition of the to-be-detected face image containing the tongue region;
wherein the definition degrees comprise: clear, relatively clear, and unclear.
6. A system for determining image definition based on a deep learning algorithm, the system comprising:
the definition labeling unit is used for acquiring an original face image data set containing a tongue region and labeling each original face image according to its degree of definition;
the data enhancement processing unit is used for carrying out data enhancement processing on the marked original face image data set so as to obtain a data enhanced face image data set;
the image definition judgment model determining unit is used for building a deep network on the framework of a residual network, establishing the association between an optimizer and the residual network to form an image definition judgment model, and training and testing the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model;
and the image definition determining unit is used for preprocessing the to-be-detected face image containing the tongue region and analyzing the preprocessed face image by utilizing the trained image definition judging model so as to determine the image definition of the to-be-detected face image containing the tongue region.
7. The system of claim 6, wherein the data enhancement processing unit performs data enhancement processing on the labeled original face image data set to obtain a data-enhanced face image data set, comprising:
processing the face image data in the labeled original face image data set with at least one of color space conversion, brightness adjustment, saturation adjustment, channel conversion, random cropping, horizontal mirroring, and normalization to obtain a data-enhanced face image data set.
8. The system of claim 6, wherein the residual network is ResNet-18, consisting of 17 convolutional layers and 1 fully-connected layer; the deep network is SENet, which adds a channel attention mechanism on top of the ResNet-18 framework and uses it to select channel weights during training.
9. The system of claim 8, wherein the image definition judgment model determining unit trains and tests the image definition judgment model with the data-enhanced face image data set to determine the trained image definition judgment model, comprising:
initializing the network weights of the convolutional layer group of the residual network ResNet with pretrained ResNet-18 weights, and randomly initializing the network weights of the fully-connected layer of the ResNet structure;
setting the initial learning rate of the convolutional layer group and the fully-connected layer of the residual network ResNet to 0.01, using cross entropy as the loss function, keeping the learning rate unchanged for a first preset number of training samples in the early stage of iterative training, and reducing it to 0.001 and then to 0.0001 after every second preset number of training samples in the later stage of iterative training;
randomly selecting a preset percentage of face images from the data-enhanced face image data set as the training set and using the remaining face images as the test set for training and testing;
and performing iterative training on the image definition judgment model with the stochastic gradient descent (SGD) algorithm, and selecting the network model with the minimum loss function as the trained image definition judgment model.
10. The system of claim 6, wherein the image definition determining unit preprocesses the to-be-detected face image containing the tongue region and analyzes the preprocessed face image with the trained image definition judgment model to determine the image definition of the to-be-detected face image containing the tongue region, comprising:
scaling the to-be-detected face image containing the tongue region to a preset size, and subtracting the mean value of each channel to obtain the preprocessed to-be-detected face image containing the tongue region;
analyzing the preprocessed face image with the trained image definition judgment model to obtain the probability values corresponding to the different definition degrees, and selecting the definition degree with the maximum probability value as the image definition of the to-be-detected face image containing the tongue region;
wherein the definition degrees comprise: clear, relatively clear, and unclear.
CN202010017473.8A (filed 2020-01-08): Method and system for determining image definition based on deep learning algorithm; status Pending; published as CN111242911A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010017473.8A CN111242911A (en) 2020-01-08 2020-01-08 Method and system for determining image definition based on deep learning algorithm


Publications (1)

Publication Number Publication Date
CN111242911A (en) 2020-06-05

Family

ID=70874355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010017473.8A Pending CN111242911A (en) 2020-01-08 2020-01-08 Method and system for determining image definition based on deep learning algorithm

Country Status (1)

Country Link
CN (1) CN111242911A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018035794A1 (en) * 2016-08-22 2018-03-01 中国科学院深圳先进技术研究院 System and method for measuring image resolution value
CN108898579A (en) * 2018-05-30 2018-11-27 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device and storage medium
CN110309789A (en) * 2019-07-04 2019-10-08 北京维联众诚科技有限公司 Video monitoring human face clarity evaluation method and device based on deep learning
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jie Hu et al., "Squeeze-and-Excitation Networks", arXiv:1709.01507v4 [cs.CV] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111885297A (en) * 2020-06-16 2020-11-03 北京迈格威科技有限公司 Image definition determining method, image focusing method and device
CN113435470A (en) * 2021-05-10 2021-09-24 北京化工大学 Three-dimensional object feature region identification method based on semantic segmentation
CN113435470B (en) * 2021-05-10 2024-04-26 北京化工大学 Three-dimensional object feature region identification method based on semantic segmentation
CN117019883A (en) * 2023-08-25 2023-11-10 华北电力大学(保定) Strip rolling process plate shape prediction method based on deep learning
CN117019883B (en) * 2023-08-25 2024-02-13 华北电力大学(保定) Strip rolling process plate shape prediction method based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200605