CN117422855A - Machine vision-oriented image preprocessing method, device, equipment and storage medium


Info

Publication number
CN117422855A
Authority
CN
China
Prior art keywords
image
network
feature
neural network
processing
Prior art date
Legal status
Granted
Application number
CN202311750184.9A
Other languages
Chinese (zh)
Other versions
CN117422855B (en)
Inventor
马思伟
蒋云
滕波
黄志勐
高文
Current Assignee
Peking University
Advanced Institute of Information Technology AIIT of Peking University
Original Assignee
Peking University
Advanced Institute of Information Technology AIIT of Peking University
Priority date
Filing date
Publication date
Application filed by Peking University and Advanced Institute of Information Technology AIIT of Peking University
Priority to CN202311750184.9A
Priority claimed from CN202311750184.9A
Publication of CN117422855A
Application granted
Publication of CN117422855B


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/40 Extraction of image or video features
    • G06V 10/806 Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 Recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/96 Management of image or video recognition tasks

Abstract

The application provides a machine vision-oriented image preprocessing method, device, equipment and storage medium. The method includes: blurring an original image to generate an image to be enhanced, where the sharpness of the image to be enhanced is lower than that of the original image; enhancing the semantic features of the image to be enhanced to generate a target image; and inputting the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on the semantic features of the target image. The image preprocessing technique provided by the embodiments of the application can keep the analysis performance of the image processing neural network at a good level while reducing the code rate.

Description

Machine vision-oriented image preprocessing method, device, equipment and storage medium
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a machine vision-oriented image preprocessing method, a machine vision-oriented image preprocessing device, machine vision-oriented image preprocessing equipment and a storage medium.
Background
Machine vision (also known as computer vision) oriented image analysis tasks may include neural-network-based image segmentation, image matching, and the like. An original image generally contains a large amount of data, including image data unrelated to the image analysis task. In some embodiments, to reduce the data throughput and save resources, the original image is therefore preprocessed, and the preprocessed image is input into the neural network so that the neural network performs the corresponding image analysis task. The quality of the preprocessed image thus has a decisive influence on the analysis performance of the neural network.
However, most existing image preprocessing techniques operate on the color, brightness and similar attributes of image pixels and are not suited to machine vision-oriented image analysis scenarios; when such techniques are applied to machine vision-oriented image analysis tasks, problems such as reduced analysis performance arise. A preprocessing method that is suited to machine vision-oriented image analysis tasks, and whose preprocessed images improve analysis performance, is therefore urgently needed in the field.
Disclosure of Invention
The application provides a machine vision-oriented image preprocessing method, device, equipment and storage medium that are suitable for preprocessing images for machine vision-oriented image analysis tasks, such that the preprocessed images help improve analysis performance.
An embodiment of a first aspect of the present application provides a machine vision-oriented image preprocessing method, including:
performing blurring processing on an original image to generate an image to be enhanced, wherein the sharpness of the image to be enhanced is lower than that of the original image;
performing enhancement processing on the semantic features of the image to be enhanced to generate a target image;
inputting the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on semantic features of the target image.
An embodiment of a second aspect of the present application provides an image preprocessing apparatus, including:
a blurring module, configured to blur the original image to generate an image to be enhanced, where the sharpness of the image to be enhanced is lower than that of the original image;
an enhancement module, configured to enhance the semantic features of the image to be enhanced to generate a target image;
and an input module, configured to input the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on the semantic features of the target image.
Embodiments of the third aspect of the present application provide a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor running the computer program to implement the method of the first aspect.
An embodiment of the fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program for execution by a processor to implement the method of the first aspect.
The technical scheme provided in the embodiment of the application has at least the following technical effects or advantages:
In the embodiments of the application, the original image may be blurred to generate the image to be enhanced, where the sharpness of the image to be enhanced is lower than that of the original image. Reducing the sharpness of the original image reduces its code rate. The semantic features of the image to be enhanced are then enhanced to generate a target image, and the target image is input into the image processing neural network to trigger the image processing neural network to execute an image analysis task based on the semantic features of the target image. The semantic features may be the feature data on which the image processing neural network performs the image analysis task. Compared with conventional image preprocessing, which operates on the color and brightness of individual pixels, the technical solution of the embodiments first reduces the sharpness of the whole original image and then selectively enhances the semantic-feature part of the image. This reduces the code rate of the original image without weakening the strength of its semantic features, which helps reduce the amount of computation and save cost. Because the preprocessing acts on the semantic features of the image, the target image can serve as the input image of the image processing neural network, and the analysis performance of the image processing neural network can be kept at a good level.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a schematic block diagram of a CV-oriented analysis system provided by an embodiment of the present application;
FIG. 2 is a schematic view of a machine vision-oriented image preprocessing method according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating a machine vision-oriented image preprocessing method according to an embodiment of the present application;
FIG. 4A is a schematic diagram illustrating a data flow of a machine vision-oriented image preprocessing method according to an embodiment of the present application;
FIG. 4B is a schematic diagram illustrating a data flow of another machine vision-oriented image preprocessing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an architecture of an image preprocessing network training system according to an embodiment of the present application;
FIG. 6a and FIG. 6b are schematic diagrams showing a comparison of image preprocessing effects according to an embodiment of the present application;
FIG. 7 shows a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The terminology used in the following examples of the application is for the purpose of describing particular embodiments and is not intended to limit the technical solutions of the application. As used in the specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that, although the terms first, second, etc. may be used in the following embodiments to describe certain types of objects, the objects should not be limited by these terms; the terms are only used to distinguish specific instances of such objects. For example, the terms first, second, etc. are used in the following embodiments to describe semantic features, but the semantic features are not limited by these terms; the terms only distinguish the semantic features of different images. Other types of objects that may be described with the terms first, second, etc. in the following embodiments are treated in the same way and are not repeated here.
The following describes the related art related to the embodiments of the present application.
The embodiment of the application relates to the technical field of image processing and discloses a method for preprocessing an image based on artificial intelligence (Artificial Intelligence, AI) so that the preprocessed image supports an image analysis task oriented to Computer Vision (CV).
AI is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. AI software technologies mainly include computer vision (CV) technology, speech technology, natural language processing (NLP) technology, and machine learning (ML)/deep learning.
The technical solution mainly relates to CV technology. CV is the science of studying how to make a machine "see": cameras and computers are used in place of human eyes to recognize, track and measure targets, and further graphic processing is performed so that the image becomes more suitable for human observation or for transmission to an instrument for detection. CV technology typically includes image processing (including image encryption and the like), image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition. The CV-based image analysis tasks to which the embodiments of the application relate include, for example, at least one of: image matching, image recognition, image segmentation, image content extraction, face recognition and the like.
Taking image content extraction as an example of a CV-based image analysis task, the CV network may extract a target object (e.g., a target face, a target lesion site, a target vehicle, etc.) contained in the input image by performing a series of processes such as encoding and feature extraction on the image, and other applications may then be performed on the extracted target object. Taking target vehicle extraction as an example, the proportion of all target vehicles contained in the input image that the CV network extracts, and the accuracy with which the extracted vehicles really are target vehicles, characterize the analysis performance of the CV network. The higher the extraction ratio over all target vehicles and the higher the accuracy of the extracted vehicles, the better the analysis performance of the CV network; conversely, the analysis performance of the CV network is relatively poor.
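As a concrete illustration of these two indicators (not part of the patent itself), the following Python sketch computes an extraction ratio and an extraction accuracy for a hypothetical vehicle-extraction result; the function name and the set-based counting are assumptions made purely for illustration.

```python
# Hypothetical helper, not from the patent: quantifies the two indicators described above.
def extraction_metrics(extracted_ids, target_ids):
    """Return (extraction ratio over all target vehicles, accuracy of the extracted set)."""
    extracted, targets = set(extracted_ids), set(target_ids)
    hits = extracted & targets                                        # correctly extracted target vehicles
    extraction_ratio = len(hits) / len(targets) if targets else 0.0   # share of all targets that were found
    accuracy = len(hits) / len(extracted) if extracted else 0.0       # share of extractions that are real targets
    return extraction_ratio, accuracy

# Example: 10 target vehicles in the image, 8 found plus 2 false positives -> (0.8, 0.8)
ratio, accuracy = extraction_metrics(list(range(8)) + [100, 101], range(10))
```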
It should be understood that, when the CV-based image analysis task is a different image analysis task, the analysis performance of the CV network can similarly be characterized by how well its analysis result represents the object to be analyzed, which is not expanded on here.
In existing CV processing, the original image is encoded so that the image data is convenient to transmit and analyze. The original image is generally a lossless image with a high code rate, while the computing power of most CV processing devices may be limited, so the sharpness of the encoded image is reduced or even distorted, which lowers the analysis performance of the CV analysis network. To reduce the data processing load of the CV network, the original image can be preprocessed and the preprocessed image input into the CV network. However, conventional image preprocessing is based on the red-green-blue (RGB) three-component information of image pixels, applying gray-scale processing and geometric transformation to obtain the preprocessed image. That is, conventional image preprocessing operates on the gray scale, brightness and similar attributes of pixels and does not involve processing or understanding the semantic features contained in the image, whereas CV technology, as a deep learning technology, processes images from the perspective of the semantic features represented by the pixel distribution. Conventional image preprocessing is therefore not suited to CV-oriented image analysis tasks.
In view of this, the embodiments of the present application provide a CV-oriented image preprocessing technique in which the original image is first blurred, so that the code rate of the original image is reduced by reducing its sharpness. The semantic features of the image to be enhanced obtained after blurring are then enhanced, so that the strength of the semantic features of the enhanced image is greater than or equal to that of the semantic features of the original image. In this way, the code rate is reduced by lowering the code rate of the content in the original image that is only weakly related to the image analysis task, without weakening the semantic features of the original image. The processing is not confined to the narrow perspective of pixel gray levels but is performed on the semantic features of the image, so the enhanced image can be used as the input image of the image processing neural network, the image processing neural network can execute image analysis tasks based on the semantic features, and its analysis performance can be kept at a good level.
Technical scenarios and system architectures related to the embodiments of the present application are described below.
Referring to FIG. 1, FIG. 1 shows a schematic block diagram of a CV-oriented analysis system provided by an embodiment of the present application. As shown in fig. 1, the CV-oriented analysis system may include an image source 11, an image preprocessing network 12, an encoding network 13, and an analysis network 14. In a specific implementation manner, the image source 11, the image preprocessing network 12, the encoding network 13, and the analysis network 14 may be implemented as hardware components, software components, or a combination of hardware and software components. The image preprocessing network 12, the encoding network 13 and the analysis network 14 may be algorithm networks based on deep learning. The encoding network 13 and the analysis network 14 may be used to compose the CV network described above. The descriptions are as follows:
the image source 11 may be used to provide raw images for a CV-oriented analysis system, may include or may be any type of image capture device for capturing, for example, real world images, real object images, and/or any type of images. The image source 11 may be a camera for capturing images or a memory for storing images. When the image source 11 is a camera, the image source 11 may be, for example, an integrated camera, either local or integrated in the source device; when the image source 11 is a memory, the image source 11 may be local or an integrated memory integrated in the source device, for example.
The image preprocessing network 12 may be used to preprocess the original image from the image source 11, so as to reduce the code rate of the image as much as possible while effectively maintaining the analysis performance on the content to be analyzed. In the embodiments of the application, the image preprocessing network 12 may reduce the amount of data in the original image that has little relevance to the content to be analyzed, and maintain or enhance the portion of the original image that characterizes the content to be analyzed. For example, the preprocessing performed by the image preprocessing network 12 may include blurring processing and targeted enhancement of the image after blurring.
In one embodiment, the image preprocessing network 12 may include algorithm modules, deep-learning-based image processing networks, or models, which can implement the machine vision-oriented image preprocessing method of the embodiments of the application in different combinations, for example degradation algorithms, skip-connection (U-net) networks, and the like.
In this way, the manner and principles of image preprocessing by the image preprocessing network 12 are matched to CV techniques so that the image preprocessing network 12 can be applied to CV analysis systems. The preprocessed image output from the image preprocessing network 12 can be used directly for the image analysis task of the CV network.
The encoding network 13 may be configured to receive the image preprocessed by the image preprocessing network 12, and process the preprocessed image by using an operation module preset by the encoding network 13, so as to provide image data including semantic features of an original image. In some embodiments, the encoding network 13 may be implemented as an all-neural network.
The analysis network 14 may be used to perform image analysis tasks on the encoded image data and to output analysis results. The analysis network 14 may be a neural network that performs at least one of image matching, image recognition, image segmentation, image content extraction and face recognition. The analysis results output by the analysis network 14 may be implemented as predictions such as probability distributions or confidence parameters, which characterize how well the analysis network 14 understands the semantic features in the original image.
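For illustration only, a minimal sketch of such an analysis output is given below; it assumes PyTorch, and the layer sizes and class count are arbitrary placeholders rather than the structure of the analysis network 14.

```python
# Minimal sketch, assuming a classification-style head: the output is a probability
# distribution whose maximum value can serve as a confidence parameter.
import torch
import torch.nn as nn

class AnalysisHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # collapse the spatial dimensions
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        logits = self.fc(self.pool(features).flatten(1))
        return logits.softmax(dim=-1)                  # probability distribution over classes

probs = AnalysisHead(64, 5)(torch.randn(1, 64, 56, 56))
confidence, predicted_class = probs.max(dim=-1)        # confidence of the prediction
```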
It should be appreciated that while the image pre-processing network 12 is integrated in the CV-oriented analysis system in FIG. 1, in an apparatus embodiment, the image pre-processing network 12 may be deployed within an exemplary CV-oriented analysis system and coupled with other functional devices within the CV-oriented analysis system; the functions of the image preprocessing network 12 may also be integrated into a separate computer device, so that the computer device has the image preprocessing functions of the embodiments of the present application, and the computer device is used in different CV analysis application scenarios.
Therefore, the machine vision-oriented image preprocessing method can be flexibly applied to CV image analysis tasks in different scenarios, and the basic CV network structure in any application scenario need not be changed, so the scalability is good.
The method, the device, the equipment and the storage medium for preprocessing the image for machine vision in the embodiment of the application are described below in conjunction with the foregoing embodiments.
First, an embodiment of the present application provides a machine vision-oriented image preprocessing method, which may be used in the CV-oriented analysis system illustrated in FIG. 1; the execution subject of the method may be the image preprocessing network 12 in FIG. 1. The method may include:
performing blurring processing on an original image to generate an image to be enhanced, wherein the sharpness of the image to be enhanced is lower than that of the original image;
performing enhancement processing on the semantic features of the image to be enhanced to generate a target image;
inputting the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on semantic features of the target image.
The intensity of the semantic features of the target image may be greater than or equal to the intensity of the semantic features of the original image. The image processing neural network may be implemented as the analysis network 14 in fig. 1.
In some embodiments, as shown in fig. 2, in order to make the strength of the semantic features of the target image not smaller than the strength of the semantic features of the original image, after obtaining the image to be enhanced, the image preprocessing network 12 may generate enhancement parameters according to the original image and the image to be enhanced, and further perform enhancement processing on the image to be enhanced according to the enhancement parameters to generate the target image.
The following describes, with reference to examples, specific processing procedures related to the machine vision-oriented image preprocessing method.
As shown in fig. 3, fig. 3 illustrates an exemplary machine vision oriented image preprocessing method according to an embodiment of the present application. The machine vision-oriented image preprocessing method specifically comprises the following steps:
step S101, downsampling the original image according to a preset sampling rate.
In some embodiments, the original image may be an image from the image source 11 in FIG. 1. The original image may contain a target object to be analyzed by the analysis network 14 in FIG. 1; the target object to be analyzed may include, for example, a target item, a target face, a target building and the like, which is not limited here.
For example, the preset sampling rate may be related to the resolution of the original image: if the resolution in at least one of the width and height directions of the original image is greater than or equal to a preset value, the preset sampling rate is determined to be a first preset sampling rate; if the resolutions of the original image in both the width and height directions are smaller than the preset value, the preset sampling rate is determined to be a second preset sampling rate. Here, the first preset sampling rate is larger than the second preset sampling rate.
For example, if the resolution in at least one of the width and height directions of the original image is greater than or equal to 1080 pixels (P), the preset sampling rate may be determined to be 8; if the resolution of the original image is less than 1080P, the preset sampling rate may be determined to be 4.
Step S102, up-sampling the down-sampled image according to the preset sampling rate to generate an image to be enhanced.
Wherein the resolution of the image to be enhanced may be the same as the resolution of the original image.
It is to be noted that step S101 and step S102 are an exemplary implementation of degradation processing of the original image. The upsampled image obtained in step S102 may also be referred to as the degraded image, i.e. the blurred image to be enhanced.
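A minimal sketch of this degradation-style blurring (steps S101 and S102) is shown below, assuming PyTorch tensors of shape (N, C, H, W); the bilinear interpolation mode and the helper names are assumptions, while the 1080-pixel threshold and the sampling rates of 8 and 4 follow the example values above.

```python
import torch
import torch.nn.functional as F

def choose_sampling_rate(image: torch.Tensor, threshold: int = 1080) -> int:
    _, _, h, w = image.shape
    return 8 if max(h, w) >= threshold else 4          # first / second preset sampling rate

def degrade(image: torch.Tensor) -> torch.Tensor:
    rate = choose_sampling_rate(image)
    _, _, h, w = image.shape
    # Step S101: downsample by the preset sampling rate
    low = F.interpolate(image, scale_factor=1.0 / rate, mode="bilinear", align_corners=False)
    # Step S102: upsample back to the original resolution; the detail lost in downsampling
    # stays lost, so the result is the blurred image to be enhanced.
    return F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)

original = torch.rand(1, 3, 1080, 1920)
image_to_be_enhanced = degrade(original)               # same resolution as the original, lower sharpness
```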
It should be understood that degradation processing is only one exemplary manner of blurring the original image and does not limit the blurring processing of the embodiments of the application. In the embodiments of the application, the blurring processing may be performed on the original image in any one of the following manners: adding noise to the original image, or blurring the original image with an image blurring algorithm.
In some embodiments, if the original image is blurred with an image blurring algorithm, any one of the following blurring algorithms may be used: Gaussian Blur, Box Blur, Kawase Blur, Dual Blur, Bokeh Blur, Tilt Shift Blur, Iris Blur, Grainy Blur, Radial Blur, Directional Blur, and the like.
In this way, the blurring of the original image is not confined to the gray levels of individual pixels; instead, an indiscriminate blurring is applied to the whole distribution of the image. Thus, while the code rate of the original image is reduced, the integrity of the semantic features in the original image is maintained, which helps improve the analysis performance of the CV network.
Step S103, generating a residual image between the original image and the image to be enhanced.
Step S104, extracting semantic features of the original image to obtain first semantic features, and extracting features of the residual image.
Step S105, performing feature fusion on the first semantic feature and the feature of the residual image to obtain the enhancement parameter.
Wherein the enhancement parameters are used for representing the enhancement degree of the image to be enhanced. In particular for characterizing the degree of enhancement of a semantic feature part in the image to be enhanced.
Step S106, performing enhancement processing on the semantic features of the image to be enhanced according to the enhancement parameters to generate the target image.
It should be noted that the semantic features of the target image may refer to features of the target object to be analyzed by the image processing neural network. The strength of the semantic features may include the dimensions of the features of the target object, the number of feature values contained by the features of each dimension, the size of the receptive field corresponding to the features of each dimension, and so on.
The dimensions of the semantic features may include, among other things, visual dimensions, object dimensions, and concept dimensions. The features of the visual dimension may include features of color, texture, and shape of the target object; the object dimensions may contain target object attribute features (e.g., animals, plants, scenery), etc.; the concept dimension may characterize the meaning expressed by the target object. For example, the target object includes beach, blue sky, sea water, etc., and the visual dimension features may include contours, colors, textures, and shapes of each of the beach, blue sky, and sea water, etc., and the object dimension features may include attribute features of each of the sand, blue sky, and sea water; the characteristics of the conceptual dimension may characterize a beach.
In some embodiments, the more feature dimensions the semantic features contain, the more the number of feature values the features of each dimension contain, and the smaller the receptive field corresponding to the features of each dimension, the greater the strength of the semantic features can be considered; conversely, the smaller the intensity of the semantic feature can be considered.
In combination with the above blurring of the original image, the image to be enhanced is obtained by applying an overall, indiscriminate blurring to the original image; that is, the sharpness of both the target object and the other content in the image to be enhanced is reduced. In order to improve the analysis performance of the image processing neural network while reducing its data processing load, in an embodiment of the application the image preprocessing stage can perform targeted enhancement on the target object in the image to be enhanced so as to increase the strength of the semantic features.
Step S107, inputting the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on semantic features of the target image.
Further, the image processing neural network may perform an image analysis task on the target image based on semantic features of the target image. For example, the image analysis task is image segmentation, and the image processing neural network may determine a target object to be segmented in the target image by processing semantic features, and then separate the target object from the target image.
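As a toy illustration of this segmentation example (the threshold and tensor shapes are assumptions, not the patent's method), a semantic feature map can be turned into a mask that separates the target object from the target image:

```python
import torch

def separate_target(target_image: torch.Tensor, semantic_map: torch.Tensor) -> torch.Tensor:
    """target_image: (C, H, W); semantic_map: (1, H, W) with values in [0, 1]."""
    mask = (semantic_map > 0.5).float()   # pixels judged to belong to the target object (assumed threshold)
    return target_image * mask            # the target object is kept, everything else is zeroed out

segmented = separate_target(torch.rand(3, 224, 224), torch.rand(1, 224, 224))
```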
In summary, in the technical solution of the embodiments of the application, the code rate of the original image is reduced by reducing the sharpness of the whole original image, and the semantic-feature part of the image is then enhanced in a targeted manner. The code rate of the original image is thereby reduced without weakening the strength of its semantic features, which reduces the amount of computation and saves cost. Because the preprocessing acts on the semantic features of the image, the analysis performance of the image processing neural network is kept at a good level.
It should be understood that steps S103 to S105 are only one exemplary processing manner of calculating the enhancement parameters, and the enhancement processing of the embodiment of the present application is not limited. In other embodiments of the present application, the image preprocessing network may also calculate enhancement parameters according to other methods. For example, the enhancement parameters may be determined based on semantic features of the original image and semantic features of the image to be enhanced.
For example, the image preprocessing network may extract semantic features of the original image to obtain first semantic features, and extract semantic features of the image to be enhanced to obtain second semantic features. As can be seen from the foregoing description of semantic features, the first semantic feature and the second semantic feature each include at least two dimensional features, and any semantic feature may include, for example, a feature of a visual dimension, a feature of an object dimension, and a feature of a conceptual dimension. Further, for the feature of any dimension, the image preprocessing network may calculate a difference between the feature of the dimension in the first semantic feature and the feature of the dimension in the second semantic feature to obtain at least two differences, and then calculate the enhancement parameter according to the at least two differences (as shown in the embodiment illustrated in fig. 4B).
It should be noted that a feature of any dimension may comprise a plurality of feature values, and the number of feature values that the feature of that dimension comprises may be determined according to the settings of the enhancement algorithm.
Further, in some embodiments, the difference value corresponding to the feature of the dimension may be a difference value between an average value corresponding to the first semantic feature and an average value corresponding to the second semantic feature. The average value corresponding to the first semantic feature is the average value of the dimension feature value in the first semantic feature, and the average value corresponding to the second semantic feature is the average value of the dimension feature value in the second semantic feature.
In other embodiments, the difference value corresponding to the feature of the dimension may be a variance between the feature value of the dimension in the first semantic feature and the feature value of the dimension in the second semantic feature.
After obtaining at least two differences, for any difference, the image preprocessing network may multiply the difference with the weight of the difference, where the multiplication result is an enhancement factor corresponding to the difference, so as to obtain at least two enhancement factors, and then calculate a summation result of the at least two enhancement factors, where the summation result is the enhancement parameter.
In some embodiments, the weights may be preset. The weights are used to characterize the degree of enhancement to the features of the corresponding dimension of the respective difference. Taking the example that the semantic features include the features of the visual dimension, the features of the object dimension and the features of the concept dimension, the weight corresponding to the features of the visual dimension is, for example, 0.15, the weight corresponding to the features of the object dimension is, for example, 0.35, and the weight corresponding to the features of the concept dimension is, for example, 0.5. It can be characterized that the degree of enhancement of the features of the visual dimension is relatively weakest, the degree of enhancement of the features of the object dimension is relatively medium, and the degree of enhancement of the features of the conceptual dimension is relatively strongest.
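The following sketch illustrates this enhancement-parameter calculation, assuming the visual, object and concept dimensions of the two semantic features are available as separate tensors; it uses the example weights 0.15, 0.35 and 0.5 and the mean-value difference described above (a variance between the two sets of feature values could be substituted), and all names are placeholders.

```python
import torch

WEIGHTS = {"visual": 0.15, "object": 0.35, "concept": 0.5}   # example weights from the text

def enhancement_parameter(first_semantic: dict, second_semantic: dict) -> torch.Tensor:
    factors = []
    for dim, weight in WEIGHTS.items():
        # difference between the mean feature value of this dimension in the first semantic
        # feature (original image) and in the second semantic feature (image to be enhanced)
        diff = first_semantic[dim].mean() - second_semantic[dim].mean()
        factors.append(weight * diff)                         # enhancement factor of this dimension
    return torch.stack(factors).sum()                         # summation gives the enhancement parameter

first = {k: torch.rand(16) for k in WEIGHTS}                  # placeholder feature values per dimension
second = {k: 0.5 * torch.rand(16) for k in WEIGHTS}
param = enhancement_parameter(first, second)
```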
It should be noted that the image preprocessing network may employ a U-net model to extract the first semantic features of the original image and enhance the second semantic features of the image to be enhanced according to the enhancement parameters.
In some embodiments, the image preprocessing network may extract global semantic features of the original image. The global features include the image features of target objects and of non-target objects in the original image. The image preprocessing network can then perform at least one dimension-reduction process on the global semantic features to obtain the initial image features of the target object, and perform a corresponding number of dimension-raising processes on the initial image features to obtain the first semantic features.
Any dimension-reduction process may downsample the features; for example, downsampling may be performed by convolution operations and pooling.
In some embodiments, the global semantic features include image features of the target object and image features of non-target objects in the original image, and the image features of the target object in the global semantic features are relatively unobtrusive. Through gradual dimension reduction, more compact semantic information in the original image can be obtained, so that the importance degree of the image characteristics of the target object can be enhanced.
For example, the number of times of the dimension increasing process and the number of times of the dimension decreasing process may be the same. Any one dimension up process may be up-sampling the features.
Specifically, the image preprocessing network may upsample the initial image feature and splice the upsampled feature with the dimension-reduced feature of the same size, i.e. the feature obtained during the dimension-reduction processing whose size matches that of the upsampled feature. If the size of the spliced feature equals the size of the global semantic features, the spliced feature is taken as the first semantic feature; if it is smaller, the spliced feature is taken as a new initial image feature and upsampled again.
For example, the global semantic feature of the original image has a size of 224×224. The U-net model can execute three convolution operations and pooling operations on the global semantic features: after the first convolution operation and pooling operation, the dimension-reduced feature has a size of 112×112; after the second, 56×56; after the third, 28×28. The 28×28 feature may be regarded as the initial image feature of the target object. The 28×28 feature is then upsampled into a dimension-raised feature of size 56×56, which is spliced with the 56×56 feature obtained after dimension reduction. A convolution operation is performed on the spliced feature, which is then upsampled again into a dimension-raised feature of size 112×112 and spliced with the 112×112 feature obtained after dimension reduction. After another convolution operation, a third upsampling yields a dimension-raised feature of size 224×224, which is spliced with the 224×224 feature obtained after dimension reduction; a convolution operation on this spliced feature yields a 224×224 feature, which is the first semantic feature of the original image.
In this way, the feature that highlights the importance of the image features of the target object is gradually raised in dimension, which helps gradually refine the detail features of the target object, so that the obtained first semantic feature can accurately and completely represent the semantics of the target object. The enhancement parameters obtained on the basis of the first semantic feature are therefore more reliable.
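A minimal U-net-style sketch matching the 224×224 example above is given below (PyTorch, with assumed channel counts and kernel sizes): three convolution-plus-pooling reductions produce the 112, 56 and 28 feature maps, and three upsampling steps each concatenate with the same-sized reduced feature before a further convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # one convolution operation, as in the example above
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class SemanticExtractor(nn.Module):
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.enc0 = conv_block(in_ch, ch)      # 224 x 224, global semantic features
        self.enc1 = conv_block(ch, ch)         # 112 x 112 after pooling
        self.enc2 = conv_block(ch, ch)         # 56 x 56
        self.enc3 = conv_block(ch, ch)         # 28 x 28, initial image feature of the target
        self.dec2 = conv_block(2 * ch, ch)     # after splicing with the 56 x 56 feature
        self.dec1 = conv_block(2 * ch, ch)     # after splicing with the 112 x 112 feature
        self.dec0 = conv_block(2 * ch, ch)     # after splicing with the 224 x 224 feature
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):                      # x: (N, 3, 224, 224)
        e0 = self.enc0(x)
        e1 = self.enc1(self.pool(e0))
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([F.interpolate(e3, scale_factor=2), e2], dim=1))
        d1 = self.dec1(torch.cat([F.interpolate(d2, scale_factor=2), e1], dim=1))
        d0 = self.dec0(torch.cat([F.interpolate(d1, scale_factor=2), e0], dim=1))
        return d0                              # first semantic feature, same size as the input

first_semantic = SemanticExtractor()(torch.rand(1, 3, 224, 224))
```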
Taking an image preprocessing network that includes a degradation module and a U-net model as an example, the machine vision-oriented image preprocessing method of the embodiments of the application is introduced below.
As illustrated in FIG. 4A and FIG. 4B, the original image includes, for example, an image of a circular area, which may represent, for example, a lawn, and an image of a house. The image of the circular area is the background of the image of the house; the image of the house covers part of the circular area and is displayed in the foreground of the original image. The image of the house is, for example, the object to be analyzed by the image processing neural network.
Referring to fig. 4A, fig. 4A illustrates an exemplary dataflow diagram of a machine vision oriented image preprocessing method. In this example, after receiving the original image, the image preprocessing network may call a pre-deployed degradation module, and perform a blurring process by degrading the original image, so as to obtain a degraded image (i.e., the image to be enhanced).
For example, if the width of the original image is greater than 1080 pixels, the degradation module may select a sampling rate of 8, first downsampling the original image by a factor of 8 and then upsampling the downsampled image by a factor of 8. The resolution of the upsampled image is then, for example, the same as that of the original image.
Referring again to FIG. 4A, the image of the circular area and the image of the house in the original image have high sharpness; specifically, their outline lines are fine. The image of the circular area and the image of the house in the degraded image have relatively low sharpness; specifically, the outlines of both show the graininess of pixels.
Further, the degradation module may generate a residual image between the original image and the degraded image, and input the original image, the residual image, and the degraded image to the U-net model. The image preprocessing network may invoke the U-net model to enhance the degraded image to obtain a preprocessed image (i.e., the aforementioned target image).
For example, the U-net model may extract a first semantic feature from the original image. In this example, the first semantic feature may refer to a feature of an image that characterizes a house in the original image.
For example, the U-net model may first extract global semantic features of the original image, which may include features of circular areas and features of houses. Then, the U-net model can successively perform three dimension reduction processes on the global semantic features, and the dimension reduction process achieves the effect of downsampling through convolution operation and pooling each time. Reference may be made in particular to the foregoing exemplary description of dimension reduction, which is not repeated here. In the features obtained after the three-time dimension reduction treatment, the strength of the house features is larger than that of the features of the round area, the weight of the house features can be larger than that of the features of the round area, and the weight can be used for representing the importance degree of the features. Then, the feature obtained after the three dimension reduction processes can be used as the initial image feature of the house. Further, three dimension lifting processes are sequentially performed on the initial image features of the house, and features obtained after the three dimension lifting processes are the first semantic features of the original image.
As mentioned above, in the initial image features of the house the strength and importance of the house features are large, so most of the detail enriched during the dimension-raising process belongs to the house, and the first semantic feature can be regarded as the final image feature of the house.
The residual image may characterize the degree of difference between the degraded image and the original image. Then, the U-net model may extract features of the residual image, perform feature fusion on the first semantic feature and the features of the residual image, and the fused result may be an enhancement parameter, where the enhancement parameter characterizes the enhancement degree of the degraded image.
The U-net model may extract a second semantic feature of the degraded image, which may refer to the features of the house in the degraded image. The U-net model can then enhance the second semantic feature according to the enhancement parameter to obtain the target image. By way of example, the U-net model may output the enhanced semantic features by performing a series of convolution operations on the enhancement parameter and the second semantic feature; the output semantic features may have feature values between 0 and 1, and the larger the feature value, the greater the semantic strength it characterizes.
Referring again to fig. 4A, the image sharpness of the house in the target image is higher, while the outline of the circular area image still presents a graininess of pixels. Compared with the original image, the code rate of the target image obtained through preprocessing is reduced in the area with smaller relevance to the image analysis task, and the intensity of the semantic features is unchanged in the area with larger relevance to the image analysis task. Thus, after the target image is input into the CV network, the calculation amount is reduced on the basis of effectively maintaining the analysis performance of the CV network.
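Putting the FIG. 4A data flow together, the following hedged sketch degrades the original image, forms the residual image, fuses the original-image features with the residual features into an enhancement parameter, and enhances the degraded image; the tiny convolutional layers stand in for the U-net model and are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder stand-ins for the U-net model's sub-steps (assumed structures, not the patent's network).
extract = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())   # semantic feature extraction
fuse = nn.Conv2d(32, 16, 1)                                          # feature fusion -> enhancement parameter
enhance = nn.Conv2d(32, 3, 3, padding=1)                             # enhancement of the degraded image

def preprocess(original: torch.Tensor) -> torch.Tensor:
    # degradation: downsample by 8, then upsample back to the original resolution
    degraded = F.interpolate(F.interpolate(original, scale_factor=0.125), size=original.shape[-2:])
    residual = original - degraded                                    # residual image
    first_semantic = extract(original)                                # features of the house in the original image
    enhancement = fuse(torch.cat([first_semantic, extract(residual)], dim=1))
    second_semantic = extract(degraded)                               # features of the house in the degraded image
    target = enhance(torch.cat([second_semantic, enhancement], dim=1))
    return torch.sigmoid(target)                                      # feature values between 0 and 1

target_image = preprocess(torch.rand(1, 3, 224, 224))
```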
It should be understood that fig. 4A is an exemplary implementation of image preprocessing of the present application, and the method of machine vision oriented image preprocessing of the embodiments of the present application is not limited. In other implementations, the image preprocessing network may also enhance semantic features of the house in other ways. For example, see another exemplary machine vision oriented image preprocessing method shown in fig. 4B.
As shown in fig. 4B, in this example, after receiving the original image, the image preprocessing network also invokes the degradation module to perform blurring processing by degrading the original image, resulting in a degraded image. This implementation may be described with reference to the embodiment illustrated in fig. 4A, and will not be described here.
Unlike in fig. 4A, in this example, the degradation module does not regenerate the residual image after obtaining the degraded image, but inputs both the original image and the degraded image into the U-net model.
Referring to fig. 4B, in this example, the U-net model may extract semantic features of the original image and the degraded image, respectively, resulting in a first semantic feature of the original image and a second semantic feature of the degraded image. In combination with the processing procedure of the U-net model, the first semantic features can represent final image features of the house in the original image, and the second semantic features can represent final image features of the house in the degraded image. Then, the U-net model may obtain the enhancement parameters by computing the difference between the first semantic feature and the second semantic feature.
For example, in this example the U-net model may separate the first semantic feature into features of the visual dimension, the object dimension and the concept dimension, and likewise separate the second semantic feature into features of the visual dimension, the object dimension and the concept dimension. For the feature of each dimension, the variance between the feature values of that dimension in the first semantic feature and in the corresponding second semantic feature is calculated, yielding, for example, a visual-dimension feature variance, an object-dimension feature variance and a concept-dimension feature variance respectively. Each dimension's feature variance is then multiplied by the weight corresponding to that dimension, yielding the enhancement factor of the visual dimension, the enhancement factor of the object dimension and the enhancement factor of the concept dimension respectively. In this example, the sum of the enhancement factors of the visual, object and concept dimensions is the enhancement parameter.
Then, the U-net model may enhance the second semantic feature according to the enhancement parameter to obtain the target image, which is not described herein.
It should be understood that the embodiments illustrated in fig. 4A and fig. 4B are presented by way of example with respect to a degradation algorithm and a U-net model, and are not to be construed as limiting the image preprocessing network of the present application. In other embodiments of the present application, the image preprocessing network may include other algorithm models with the same or similar functions, or a combination network, etc., and the image preprocessing network may also include more algorithm models than illustrated, etc. Many modifications and variations will be apparent to those of ordinary skill in the art in light of the above teachings.
In summary, in the technical solution of the embodiments of the application, the sharpness of the whole original image is first reduced and the semantic-feature part of the image is then enhanced in a targeted manner, so that the code rate of the original image is reduced without weakening the strength of its semantic features. As shown in FIG. 6a and FIG. 6b: FIG. 6a is an original image in which both the grass and the racket are clear, meaning that the code rates of the grass and the racket are high; FIG. 6b is the image of FIG. 6a processed by this technical solution, in which the racket is still clear while the grass is relatively blurred, i.e. the code rate of the grass part is reduced. This helps reduce the amount of computation and save cost. Because the preprocessing acts on the semantic features of the image, the target image can be used as the input image of the image processing neural network, and the analysis performance of the image processing neural network is kept at a good level.
In combination with the foregoing description, the image preprocessing network for executing the machine vision oriented image preprocessing method in the embodiment of the present application may be obtained by training the network to be trained using the proxy neural network, and further, may establish a connection with the image processing neural network to obtain the CV oriented analysis system shown in fig. 1. The proxy neural network is a neural network for performing the image analysis tasks of the CV network described above.
In some embodiments, the blurring part of the image preprocessing network is typically a preset algorithm, while the enhancement function is typically implemented by a deep learning network. In this example, the network to be trained may be the network model used to perform the enhancement function, for example the U-net model to be trained in FIG. 4A and FIG. 4B.
Furthermore, it should be noted that since the image preprocessing network may be independent of the image processing neural network, then in some embodiments, the network to be trained may be a pre-built initial network; in other embodiments, the network to be trained may be a preprocessing network to be trained, where the preprocessing network to be trained is adapted for another image analysis task, and the other image analysis task is different from the image analysis task. If the network to be trained is the preprocessing network to be trained, the connection between the network to be trained and the original image processing neural network can be disconnected before the network to be trained is trained by adopting the proxy neural network to obtain the image preprocessing network. The original image processing neural network here is an image processing neural network that performs the other image analysis task.
Training the network to be trained with the proxy neural network to obtain the image preprocessing network may include the following steps: calling the network to be trained to preprocess a sample image, where the preprocessed image is the image to be analyzed; then calling the proxy neural network to execute the image analysis task on the image to be analyzed to obtain a predicted analysis result; calculating a loss value, which includes a preprocessing loss, an analysis loss and a proxy loss, and determining whether the loss value converges. If the loss value converges, the network to be trained is used as the image preprocessing network; if it does not converge, the parameters of the network to be trained are adjusted, the model with the adjusted parameters is taken as the new network to be trained, and the operation of calling the network to be trained to preprocess the sample image is executed again.
Illustratively, the parameters of the network to be trained may be parameters of an enhancement model to be trained, such as parameters of a U-net model to be trained.
In some embodiments, the preprocessing loss is used to characterize the difference between the image to be analyzed and the sample image; for example, it may be implemented as the mean square error (Mean Square Error, MSE) between the image to be analyzed and the sample image, so as to constrain the enhancement bias. The analysis loss is used to characterize the difference between the prediction analysis result and the labeling result of the sample image. The proxy loss is used to characterize the loss incurred by the proxy neural network in processing the sample image.
It should be noted that the terms contained in the proxy loss may depend on the processing performed by the proxy neural network. For example, if the proxy neural network includes a proxy coding network and an execution network for the image analysis task, the proxy loss may include a coding loss and a discrete cosine transform (Discrete Cosine Transform, DCT) loss. The coding loss characterizes the code-rate loss between the coded data and the data before coding, and the DCT loss characterizes the coding complexity loss.
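The precise formulas of the coding loss and the DCT loss are determined by the proxy coding network and are not fixed here; the following is only an assumed surrogate in which coding complexity is approximated by the energy of the high-frequency DCT coefficients of a single-channel image, the 8-by-8 low-frequency corner and the function name being arbitrary choices for this sketch.

```python
import numpy as np
from scipy.fft import dctn


def dct_complexity(image: np.ndarray, keep: int = 8) -> float:
    """Rough coding-complexity proxy: mean magnitude of the DCT coefficients
    outside the low-frequency keep-by-keep corner of a 2-D (grayscale) image."""
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    mask = np.ones_like(coeffs, dtype=bool)
    mask[:keep, :keep] = False                              # drop the low-frequency corner
    return float(np.mean(np.abs(coeffs[mask])))


if __name__ == "__main__":
    smooth = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))    # smooth gradient image
    noisy = np.random.rand(64, 64)                          # high-frequency image
    print(dct_complexity(smooth) < dct_complexity(noisy))   # True: smoother content costs less
```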
Taking as an example a proxy neural network comprising a proxy coding network and an execution network for the image analysis task, an image preprocessing network training system is shown in fig. 5. Referring to fig. 5, the image preprocessing network training system includes the network to be trained, the proxy coding network and the execution network for the image analysis task.
In the training process, a sample image is input into the network to be trained, and the image to be analyzed is obtained after the sample image is processed by the network to be trained. The MSE between the image to be analyzed and the sample image may then be calculated and used as the preprocessing loss of the network to be trained, and the image to be analyzed is input into the proxy coding network. The proxy coding network is invoked to estimate the DCT loss and the coding loss of the image to be analyzed, and to encode the image to be analyzed so as to obtain coded image data. The image data is then input into the execution network of the image analysis task, which performs the image analysis task on the image data; after the analysis result is obtained, the analysis loss is calculated.
Further, the sum of the MSE, the DCT loss, the coding loss and the analysis loss is taken as the loss value of this round of training. If the loss value converges, the network to be trained can be used as the image preprocessing network corresponding to the image analysis task; otherwise, the parameters of the network to be trained are adjusted and the training process continues until the loss value converges.
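For concreteness, one round of the training described above can be written as the following schematic step. The interfaces of the proxy coding network and of the execution network (returning the coded data together with a coding loss and a DCT loss, and a prediction, respectively), the use of cross-entropy as the analysis loss, and the helper names are all assumptions of this sketch; the optimizer is assumed to hold only the parameters of the network to be trained.

```python
import torch
import torch.nn.functional as F


def training_step(net_to_train, proxy_codec, task_net, sample_image, label, optimizer):
    optimizer.zero_grad()

    image_to_analyze = net_to_train(sample_image)              # preprocessing
    mse_loss = F.mse_loss(image_to_analyze, sample_image)      # preprocessing loss

    # Assumed interface: the proxy coding network returns the coded data together
    # with its estimates of the coding (rate) loss and the DCT (complexity) loss.
    coded, coding_loss, dct_loss = proxy_codec(image_to_analyze)

    prediction = task_net(coded)                               # execution network of the task
    analysis_loss = F.cross_entropy(prediction, label)         # analysis loss (task-dependent)

    loss = mse_loss + dct_loss + coding_loss + analysis_loss   # loss value of this round
    loss.backward()
    optimizer.step()                                           # adjust the network to be trained
    return loss.item()
```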
Therefore, the image preprocessing network in the embodiments of the present application is a deep-learning-based neural network: it can be adapted to the image analysis task of a CV network and can be trained with a deep-learning-based proxy neural network, so the image preprocessing network of the present technical solution can be applied flexibly and widely to image analysis tasks in various scenarios.
Corresponding to the machine vision-oriented image preprocessing method described above, the embodiments of the present application further provide a machine vision-oriented image preprocessing device, which may be deployed in the image preprocessing network of the CV-oriented analysis system illustrated in fig. 1. The CV-oriented image preprocessing device implements the functions of the blurring algorithm and the enhancement model in the image preprocessing network as modules, in software, in hardware, or in a combination of the two, and can be used to execute the machine vision-oriented image preprocessing method provided by any of the foregoing embodiments.
The device may include a blurring processing module and an enhancement module. The blurring processing module is used for performing blurring processing on the original image, the processed image being the image to be enhanced, whose definition is lower than that of the original image. The enhancement module is used for performing enhancement processing on the image to be enhanced, the enhanced image being the target image, whose semantic-feature strength is greater than or equal to that of the original image; the target image serves as the input image on which a neural network performs an image analysis task based on semantic features.
The image preprocessing device provided by the embodiments of the present application is based on the same inventive concept as the machine vision-oriented image preprocessing method provided by the embodiments of the present application, and therefore has the same beneficial effects as the method it adopts, runs or implements.
The embodiments of the present application also provide a computer device, which is applied to the CV-oriented analysis system to execute the machine vision-oriented image preprocessing method. Reference is made to fig. 7, which is a schematic diagram of a computer device according to some embodiments of the present application. As shown in fig. 7, the computer device 7 includes: a processor 700, a memory 701, a bus 702 and a communication interface 703, the processor 700, the communication interface 703 and the memory 701 being connected by the bus 702; the memory 701 stores a computer program that can be executed on the processor 700, and when the processor 700 executes the computer program, the machine vision-oriented image preprocessing method provided in any one of the foregoing embodiments of the present application is executed.
The memory 701 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the device network element and at least one other network element is achieved through at least one communication interface 703 (which may be wired or wireless); the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
Bus 702 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 701 is configured to store a program, and the processor 700 executes the program after receiving an execution instruction, where the machine vision oriented image preprocessing method disclosed in any embodiment of the present application may be applied to the processor 700 or implemented by the processor 700.
The processor 700 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuits in hardware or by instructions in software in the processor 700. The processor 700 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 701, and the processor 700 reads the information in the memory 701 and, in combination with its hardware, performs the steps of the above method.
The computer device provided by the embodiments of the present application is based on the same inventive concept as the machine vision-oriented image preprocessing method provided by the embodiments of the present application, and therefore has the same beneficial effects as the method it adopts, runs or implements.
The present application further provides a computer readable storage medium corresponding to the machine vision-oriented image preprocessing method provided in the foregoing embodiment, on which a computer program (i.e. a program product) is stored, where the computer program, when executed by a processor, performs the machine vision-oriented image preprocessing method provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
The computer readable storage medium provided by the above embodiments of the present application is based on the same inventive concept as the machine vision-oriented image preprocessing method provided by the embodiments of the present application, and has the same beneficial effects as the method adopted, run or implemented by the application program stored on it.
It should be noted that:
in the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the present application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A machine vision-oriented image preprocessing method, characterized by comprising the following steps:
performing blurring processing on an original image to generate an image to be enhanced, wherein the definition of the image to be enhanced is lower than that of the original image;
performing enhancement processing on the semantic features of the image to be enhanced to generate a target image;
inputting the target image into an image processing neural network to trigger the image processing neural network to execute an image analysis task based on semantic features of the target image.
2. The method of claim 1, wherein the enhancing the semantic features of the image to be enhanced to generate a target image comprises:
generating enhancement parameters according to the original image and the image to be enhanced, wherein the enhancement parameters are used for representing the enhancement degree of the image to be enhanced;
and carrying out enhancement processing on the semantic features of the image to be enhanced according to the enhancement parameters to generate the target image.
3. The method of claim 2, wherein the generating enhancement parameters from the original image and the image to be enhanced comprises:
generating a residual image between the original image and the image to be enhanced;
extracting semantic features of the original image to obtain first semantic features, and extracting features of the residual image;
and carrying out feature fusion on the first semantic features and the features of the residual image to obtain the enhancement parameters.
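Purely as an illustration of the operations recited in claim 3, an assumed implementation of the residual-plus-fusion path is sketched below; the single convolution layers standing in for the two feature extractors, the concatenation used for feature fusion, and the channel counts are placeholders rather than the claimed structure.

```python
import torch
import torch.nn as nn


class EnhancementParamNet(nn.Module):
    def __init__(self, channels: int = 3, feat: int = 32):
        super().__init__()
        self.semantic = nn.Conv2d(channels, feat, 3, padding=1)   # first semantic features
        self.residual = nn.Conv2d(channels, feat, 3, padding=1)   # residual-image features
        self.fuse = nn.Conv2d(2 * feat, channels, 1)               # feature fusion

    def forward(self, original: torch.Tensor, to_enhance: torch.Tensor) -> torch.Tensor:
        residual_img = original - to_enhance                       # residual image
        fused = torch.cat([self.semantic(original), self.residual(residual_img)], dim=1)
        return self.fuse(fused)                                    # enhancement parameters
```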
4. The method of claim 2, wherein the generating enhancement parameters from the original image and the image to be enhanced comprises:
extracting semantic features of the original image to obtain first semantic features, and extracting the semantic features of the image to be enhanced as second semantic features, wherein the first semantic features and the second semantic features both comprise features of at least two dimensions;
for the feature of any dimension, calculating the difference value between the feature of the dimension in the first semantic feature and the feature of the dimension in the second semantic feature to obtain at least two difference values;
and calculating the enhancement parameters according to the at least two difference values.
5. The method of claim 4, wherein said calculating said enhancement parameter from said at least two differences comprises:
multiplying the difference value by the weight of the difference value aiming at any difference value, wherein the multiplication result is an enhancement factor corresponding to the difference value so as to obtain at least two enhancement factors; the weight is used for representing the enhancement degree of the characteristics of the dimension corresponding to the difference value;
and calculating the addition result of the at least two enhancement factors, wherein the addition result is the enhancement parameter.
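The weighted-difference arithmetic of claims 4 and 5 can be illustrated with made-up numbers; the feature values and weights below are arbitrary and serve only to show the computation.

```python
first_semantic = [0.8, 0.5, 0.9]    # per-dimension features of the original image (made up)
second_semantic = [0.6, 0.4, 0.5]   # per-dimension features of the image to be enhanced (made up)
weights = [1.0, 0.5, 2.0]           # enhancement degree of each dimension (made up)

differences = [a - b for a, b in zip(first_semantic, second_semantic)]   # claim 4
enhancement_factors = [d * w for d, w in zip(differences, weights)]      # claim 5
enhancement_parameter = sum(enhancement_factors)

print(enhancement_parameter)        # 0.2*1.0 + 0.1*0.5 + 0.4*2.0 ≈ 1.05
```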
6. The method according to claim 3 or 4, wherein said extracting semantic features of said original image to obtain first semantic features comprises:
extracting global semantic features of the original image;
performing at least one dimension reduction treatment on the global semantic features to obtain initial image features of a target object, wherein the target object refers to an object related to an image analysis task of the image processing neural network;
and carrying out at least one dimension lifting process on the initial image feature to obtain the first semantic feature.
7. The method of claim 6, wherein the performing at least one dimension lifting process on the initial image feature comprises:
upsampling the initial image feature;
splicing the up-sampled feature with a dimension-reduction feature of the same size, wherein the dimension-reduction feature of the same size is a feature obtained in the at least one dimension reduction process that has the same size as the up-sampled feature;
if the size of the spliced feature is the same as the size of the global semantic feature, taking the spliced feature as the first semantic feature;
if the size of the spliced feature is smaller than the size of the global semantic feature, taking the spliced feature as a new initial image feature, and upsampling the new initial image feature.
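An assumed, minimal realization of the dimension reduction and dimension lifting of claims 6 and 7 is sketched below; the channel counts, the number of levels, the nearest-neighbour upsampling and the use of channel concatenation for the splicing are all choices made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FirstSemanticExtractor(nn.Module):
    def __init__(self, channels: int = 3, feat: int = 16, levels: int = 2):
        super().__init__()
        self.stem = nn.Conv2d(channels, feat, 3, padding=1)      # global semantic features
        self.down = nn.ModuleList(
            nn.Conv2d(feat, feat, 3, stride=2, padding=1) for _ in range(levels)
        )
        self.up = nn.ModuleList(
            nn.Conv2d(2 * feat, feat, 3, padding=1) for _ in range(levels)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.stem(x)
        skips, h = [g], g
        for down in self.down:                        # at least one dimension reduction
            h = F.relu(down(h))
            skips.append(h)
        skips.pop()                                   # drop the deepest entry: it is h, the initial image feature
        for up in self.up:                            # at least one dimension lifting
            h = F.interpolate(h, scale_factor=2, mode="nearest")       # upsampling
            h = F.relu(up(torch.cat([h, skips.pop()], dim=1)))         # splice with the same-sized feature
        return h                                      # first semantic feature, same size as g


if __name__ == "__main__":
    net = FirstSemanticExtractor()
    print(net(torch.rand(1, 3, 64, 64)).shape)        # torch.Size([1, 16, 64, 64])
```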
8. The method of claim 1, wherein blurring the original image comprises:
the original image is blurred by any one of the following means:
adding noise points into the original image; carrying out degradation processing on the original image; or carrying out blurring processing on the original image by adopting an image blurring algorithm.
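Of the three options listed in claim 8, adding noise points is the simplest to illustrate; the Gaussian noise and its standard deviation are assumptions made for this sketch.

```python
import numpy as np


def add_noise(original: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Blur by adding noise points: Gaussian noise with an assumed standard deviation."""
    noise = np.random.normal(0.0, sigma, size=original.shape)
    noisy = original.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in image
    print(add_noise(img).shape, add_noise(img).dtype)                    # (64, 64, 3) uint8
```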
9. The method of claim 8, wherein the degrading the original image comprises:
downsampling the original image according to a preset sampling ratio;
and up-sampling the downsampled image according to the preset sampling ratio, wherein the up-sampled image is the image obtained by the degradation processing.
10. The method of claim 9, wherein:
if the resolution of the original image in at least one of the width and height directions is greater than or equal to a preset value, determining that the preset sampling ratio is a first preset sampling ratio;
if the resolutions of the original image in both the width and height directions are smaller than the preset value, determining that the preset sampling ratio is a second preset sampling ratio;
wherein the first preset sampling ratio is greater than the second preset sampling ratio.
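One possible reading of claims 9 and 10 in code is sketched below; the resolution threshold and the two sampling ratios are assumed values, and bilinear resampling is only one choice.

```python
import torch
import torch.nn.functional as F


def choose_ratio(width: int, height: int, threshold: int = 1920,
                 first_ratio: int = 4, second_ratio: int = 2) -> int:
    # Claim 10: a larger ratio when at least one side reaches the preset value.
    return first_ratio if max(width, height) >= threshold else second_ratio


def degrade(image: torch.Tensor) -> torch.Tensor:
    # Claim 9: downsample then upsample at the preset sampling ratio.
    _, _, h, w = image.shape
    r = choose_ratio(w, h)
    small = F.interpolate(image, scale_factor=1.0 / r, mode="bilinear", align_corners=False)
    return F.interpolate(small, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    img = torch.rand(1, 3, 1080, 1920)
    print(degrade(img).shape)   # torch.Size([1, 3, 1080, 1920])
```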
11. The method as recited in claim 1, further comprising:
training a network to be trained by adopting a proxy neural network to obtain an image preprocessing network, wherein the proxy neural network is used for executing the image analysis task; the image preprocessing network is used for executing the image preprocessing method facing machine vision, and the image preprocessing network is a deep learning network;
and establishing connection between the image preprocessing network and the image processing neural network.
12. The method of claim 11, wherein training the network to be trained using the proxy neural network results in an image preprocessing network, comprising:
invoking the network to be trained to preprocess a sample image, wherein the preprocessed image is an image to be analyzed;
invoking the agent neural network to execute the image analysis task on the image to be analyzed to obtain a prediction analysis result;
calculating a penalty value, the penalty value comprising a preprocessing penalty, an analysis penalty, and a proxy penalty; the preprocessing loss is used for representing the loss of the image to be analyzed and the sample image, the analysis loss is used for representing the loss of the prediction analysis result and the sample image labeling result, and the proxy loss is used for representing the loss of the proxy neural network to the sample image processing process;
judging whether the loss value converges or not;
if the loss value converges, the network to be trained is used as the image preprocessing network;
and if the loss value does not converge, adjusting parameters of the network to be trained, taking the model with the adjusted parameters as a new network to be trained, and performing again the operation of invoking the network to be trained to preprocess the sample image.
13. The method of claim 12, wherein the proxy neural network comprises a proxy encoding network and an execution network of image analysis tasks, wherein the invoking the proxy neural network to execute the image analysis tasks on the image to be analyzed results in predictive analysis results comprises:
invoking the proxy coding network to code the image to be analyzed, and obtaining coding loss and Discrete Cosine Transform (DCT) loss, wherein the proxy loss comprises the coding loss and the DCT loss, the coding loss is used for representing the code rate loss of the coded data and the pre-coded data, and the DCT loss is used for representing the coding complexity loss;
and calling the execution network to execute the image analysis task on the encoded image data, and obtaining the analysis loss.
14. The method according to claim 12 or 13, wherein the network to be trained comprises a blurring algorithm module and an enhancement model to be trained, and the adjusting parameters of the network to be trained comprises:
and adjusting parameters of the enhancement model to be trained.
15. The method of claim 1, further comprising, after said inputting said target image into said image processing neural network:
invoking the image processing neural network to encode the target image to obtain encoded data;
and calling an analysis module of the image processing neural network to execute the image analysis task on the encoded data so as to output an analysis result by analyzing the semantic features of the target image.
16. The method according to claim 11, wherein the network to be trained is a pre-built initial network or a pre-processing network to be trained, the pre-processing network to be trained being adapted for another image analysis task, the other image analysis task being different from the image analysis task;
if the network to be trained is the preprocessing network to be trained, before the training of the network to be trained by using the proxy neural network to obtain the image preprocessing network, the method further comprises:
disconnecting the network to be trained from an original image processing neural network, wherein the original image processing neural network is an image processing neural network for executing the other image analysis task.
17. The method of claim 16, wherein the image analysis task comprises at least one of: image matching, image recognition, image separation, image content extraction and face recognition.
18. An image preprocessing device facing machine vision, characterized by comprising:
the blurring processing module is used for performing blurring processing on an original image to generate an image to be enhanced, wherein the definition of the image to be enhanced is lower than that of the original image;
the enhancement module is used for enhancing the semantic features of the image to be enhanced to generate a target image;
and the input module is used for inputting the target image into an image processing neural network so as to trigger the image processing neural network to execute an image analysis task based on the semantic features of the target image.
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor runs the computer program to implement the method of any one of claims 1-17.
20. A computer readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to implement the method of any of claims 1-17.
CN202311750184.9A 2023-12-19 Machine vision-oriented image preprocessing method, device, equipment and storage medium CN117422855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311750184.9A CN117422855B (en) 2023-12-19 Machine vision-oriented image preprocessing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311750184.9A CN117422855B (en) 2023-12-19 Machine vision-oriented image preprocessing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117422855A true CN117422855A (en) 2024-01-19
CN117422855B CN117422855B (en) 2024-05-03



Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220108546A1 (en) * 2019-06-17 2022-04-07 Huawei Technologies Co., Ltd. Object detection method and apparatus, and computer storage medium
WO2021000906A1 (en) * 2019-07-02 2021-01-07 五邑大学 Sar image-oriented small-sample semantic feature enhancement method and apparatus
CN112446834A (en) * 2019-09-04 2021-03-05 华为技术有限公司 Image enhancement method and device
WO2021043273A1 (en) * 2019-09-04 2021-03-11 华为技术有限公司 Image enhancement method and apparatus
US20220188999A1 (en) * 2019-09-04 2022-06-16 Huawei Technologies Co., Ltd. Image enhancement method and apparatus
WO2021208247A1 (en) * 2020-04-17 2021-10-21 北京大学 Mimic compression method and apparatus for video image, and storage medium and terminal
CN111583161A (en) * 2020-06-17 2020-08-25 上海眼控科技股份有限公司 Blurred image enhancement method, computer device and storage medium
CN114359289A (en) * 2020-09-28 2022-04-15 华为技术有限公司 Image processing method and related device
CN112419219A (en) * 2020-11-25 2021-02-26 广州虎牙科技有限公司 Image enhancement model training method, image enhancement method and related device
CN114915783A (en) * 2021-02-07 2022-08-16 华为技术有限公司 Encoding method and apparatus
CN117151987A (en) * 2022-05-23 2023-12-01 海信集团控股股份有限公司 Image enhancement method and device and electronic equipment
CN115205150A (en) * 2022-07-19 2022-10-18 腾讯科技(北京)有限公司 Image deblurring method, device, equipment, medium and computer program product
CN116939226A (en) * 2023-06-14 2023-10-24 南京大学 Low-code-rate image compression-oriented generated residual error repairing method and device
CN116894801A (en) * 2023-07-19 2023-10-17 广州虎牙科技有限公司 Image quality enhancement method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHENLI ZHANG ET AL.: "ExFuse: Enhancing Feature Fusion for Semantic Segmentation", Proceedings of the European Conference on Computer Vision (ECCV), 30 September 2018 (2018-09-30), pages 269-284 *
LI CHENGSHAN; JIANG PING; CUI XIONGWEN; MA ZHENHUAN; LEI TAO: "Optoelectronic image segmentation algorithm based on encoding-decoding and local enhancement", Semiconductor Optoelectronics, no. 06, 15 December 2018 (2018-12-15), pages 133-138 *

Similar Documents

Publication Publication Date Title
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
CN111798400B (en) Non-reference low-illumination image enhancement method and system based on generation countermeasure network
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
CN113284054A (en) Image enhancement method and image enhancement device
CN112598579A (en) Image super-resolution method and device for monitoring scene and storage medium
CN112184585B (en) Image completion method and system based on semantic edge fusion
CN114764868A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN112150400B (en) Image enhancement method and device and electronic equipment
CN113674159A (en) Image processing method and device, electronic equipment and readable storage medium
KR102628115B1 (en) Image processing method, device, storage medium, and electronic device
CN114037640A (en) Image generation method and device
CN114549369A (en) Data restoration method and device, computer and readable storage medium
CN113066018A (en) Image enhancement method and related device
CN111833360A (en) Image processing method, device, equipment and computer readable storage medium
Cui et al. Exploring resolution and degradation clues as self-supervised signal for low quality object detection
CN117422855B (en) Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN116823908A (en) Monocular image depth estimation method based on multi-scale feature correlation enhancement
CN117422855A (en) Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN114119428A (en) Image deblurring method and device
CN113538425B (en) Passable water area segmentation equipment, image segmentation model training and image segmentation method
CN117036895B (en) Multi-task environment sensing method based on point cloud fusion of camera and laser radar
Amendola et al. Image Translation and Reconstruction using a Single Dual Mode Lightweight Encoder
CN117218033B (en) Underwater image restoration method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination