CN112801132B - Image processing method and device

Info

Publication number
CN112801132B
CN112801132B
Authority
CN
China
Prior art keywords
image
text block
processed
fuzzy
layer
Prior art date
Legal status
Active
Application number
CN202011580763.XA
Other languages
Chinese (zh)
Other versions
CN112801132A (en)
Inventor
韩森尧
喻庐军
李驰
刘岩
Current Assignee
Taikang Tongji Wuhan Hospital
Original Assignee
Taikang Tongji Wuhan Hospital
Priority date
Filing date
Publication date
Application filed by Taikang Tongji Wuhan Hospital
Priority to CN202011580763.XA
Publication of CN112801132A
Application granted
Publication of CN112801132B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method and apparatus. In one embodiment, the method comprises: acquiring an image to be processed, and adjusting it into a target image to be processed based on a preset preprocessing model; inputting the target image to be processed into a trained image quality assessment neural network that extracts multi-scale network features and fuses low- and mid-level network features, and outputting text block region data and a blur/clarity classification value for each text block region; and parsing the text block region data to obtain the confidence of each region, invoking a preset evaluation engine to compute a blur/clarity classification value for the whole image from the per-region values, and generating and outputting evaluation data for the image according to blur/clarity interval levels. Embodiments of the invention can thus be applied to an image retrieval platform to automatically screen out and filter the influence of blurred-text images, so that subsequent image retrieval results meet the clarity requirement.

Description

Image processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an image processing method and apparatus.
Background
Currently, many scenarios, including medical and health imaging, involve processing images that contain text, and unusable blurred images need to be filtered out in advance. Image quality assessment is difficult, mainly because image quality varies continuously: the boundary between a clear image and a blurred one is hard to define, quality judgments are often subjective, different people judge the same image differently, and there is no unified standard. In addition, few existing methods evaluate document images; they generally judge the whole image, or cut it into blocks, and the final results are unsatisfactory, with accuracy and recall still some distance from a practically usable level.
That is, in existing business practice, blurred document images are screened manually. As image volumes grow larger and larger, manual screening becomes far too slow, which greatly hinders automation of the downstream image retrieval process and degrades both the speed and the precision of subsequent results.
Disclosure of Invention
In view of the above, embodiments of the invention provide an image processing method and apparatus that can be applied to an image retrieval platform to automatically screen out and filter the influence of blurred-text images, so that subsequent image retrieval results meet the clarity requirement.
To achieve this object, according to one aspect of an embodiment of the invention, there is provided an image processing method comprising: acquiring an image to be processed, and adjusting it into a target image to be processed based on a preset preprocessing model; inputting the target image to be processed into a trained image quality assessment neural network that extracts multi-scale network features and fuses low- and mid-level network features, and outputting text block region data and per-region blur/clarity classification values; and parsing the text block region data to obtain the confidence of each text block region, invoking a preset evaluation engine to compute a blur/clarity classification value for the image from the per-region values, and generating and outputting evaluation data for the image according to blur/clarity interval levels.
Optionally, adjusting the image to be processed into a target image to be processed based on a preset preprocessing model comprises:
padding the image to be processed with white edges to adjust it to a target shape, and then scaling the adjusted image to a preset size to obtain the target image to be processed.
Optionally, before inputting the target image to be processed into the trained image quality assessment neural network, the method comprises:
constructing the image quality assessment neural network with a multi-task loss function comprising a classification loss and a regression loss, wherein the classification loss is given the greater weight so that training is biased toward reducing the classification loss.
Optionally, inputting the target image to be processed into the trained image quality assessment neural network comprises:
extracting features from the target image to be processed through five successive convolution-pooling stages;
up-sampling the features produced by the fifth convolution-pooling stage, and adding the result of a 1x1 convolution of the fourth stage's features to the up-sampled result to obtain a first feature map;
up-sampling the first feature map, and adding the result of a 1x1 convolution of the third stage's features to the up-sampled result to obtain a second feature map;
and up-sampling the second feature map to obtain a target feature map, from which two preset fully connected branches generate the text block region data and the per-region blur/clarity classification values.
Optionally, the five convolution-pooling stages comprise:
a first stage using 64 3×3 convolution kernels and one max-pooling layer; a second stage using 128 3×3 convolution kernels and one max-pooling layer; a third stage using two layers of 256 3×3 convolution kernels followed by one layer of 256 1×1 convolutions and one max-pooling layer; a fourth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer; and a fifth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer.
Optionally, the method further comprises:
outputting the text block region data after passing the target feature map through a preset two-layer fully connected branch, the data comprising, for each text block region, its upper-left corner coordinates, lower-right corner coordinates, and confidence;
and outputting a blur value and a clarity value for each text block region after passing the target feature map through another preset two-layer fully connected branch, then applying softmax to these two values to obtain a number between 0 and 1, which serves as the region's blur/clarity classification value.
Optionally, invoking a preset evaluation engine to compute a blur/clarity classification value for the image to be processed from the per-region values, so as to generate its evaluation data according to blur/clarity interval levels, comprises:
sorting all text block regions by confidence score from largest to smallest, then, starting from the region with the largest confidence score, computing the overlap between text block regions with an intersection-over-union (IoU) algorithm, comparing against a preset overlap threshold, and filtering out regions whose overlap is below the threshold;
and applying a weighted vote to the remaining text block regions to obtain the image's blur/clarity classification value, then matching that value against the blur/clarity interval levels to obtain a blur/clarity grade, which serves as the image's evaluation data.
In addition, the invention further provides an image processing apparatus comprising: an acquisition module for acquiring an image to be processed and adjusting it into a target image to be processed based on a preset preprocessing model; a processing module for inputting the target image to be processed into a trained image quality assessment neural network that extracts multi-scale network features and fuses low- and mid-level network features, and outputting text block region data and per-region blur/clarity classification values; and a generation module for parsing the text block region data to obtain per-region confidences, invoking a preset evaluation engine to compute the image's blur/clarity classification value from the per-region values, and generating and outputting the image's evaluation data according to blur/clarity interval levels.
One embodiment of the above invention has the following advantages or benefits: blur/clarity judgment based on coarsely localized, deliberately imprecise large text block regions can replace manual screening in the image retrieval process; a blur assessment for a document image is obtained within 0.1 second, with recall and accuracy above 95%, greatly improving retrieval efficiency and reducing the associated labor cost. By optimizing three links (labeling, network structure, and loss function), some detection precision is sacrificed and the detection task simplified, but the blur classification precision for text block regions and the overall speed of the method are improved. Meanwhile, a final grading strategy screens out images that are partially blurred and partially clear. In addition, the invention can be applied to an image retrieval platform to obtain image evaluation data, so that subsequent image retrieval results meet the most basic clarity requirement.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is a schematic diagram of a main flow of an image processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an image quality assessment neural network according to an embodiment of the present invention;
FIG. 3 is an example of image recognition according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a main flow of an image processing method according to a second embodiment of the present invention;
fig. 5 is a schematic diagram of main modules of an image processing apparatus according to an embodiment of the present invention;
FIG. 6 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
fig. 7 is a schematic diagram of a computer apparatus suitable for use in implementing an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main flow of an image processing method according to a first embodiment of the present invention, the image processing method including:
step S101, obtaining an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model.
In some embodiments, adjusting the image to be processed into the target image to be processed based on a preset preprocessing model specifically comprises: padding the image to be processed with white edges to adjust it to a target shape, and then scaling the adjusted image to a preset size to obtain the target image to be processed.
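As a concrete illustration of this preprocessing, the following is a minimal OpenCV-based sketch, not code from the patent; the square target shape and the 768×768 size are taken from the training example given later in the description, and the function name is illustrative.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, size: int = 768) -> np.ndarray:
    """Pad an image to a square with white borders, then scale it to size x size.

    Sketch of the described preprocessing model: white-edge filling to a
    target shape, then scaling to a preset size (768 per the training example).
    """
    h, w = image.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    bottom = side - h - top
    left = (side - w) // 2
    right = side - w - left
    # Fill the added borders with white (255) so text contrast is preserved.
    padded = cv2.copyMakeBorder(image, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=(255, 255, 255))
    return cv2.resize(padded, (size, size), interpolation=cv2.INTER_AREA)
```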
Step S102, inputting the target image to be processed into a trained image quality assessment neural network, extracting multi-scale network features and fusing low- and mid-level network features, and outputting text block region data and per-region blur/clarity classification values.
In some embodiments, before the target image to be processed is input into the trained image quality assessment neural network, the network is constructed with a multi-task loss function comprising a classification loss and a regression loss, wherein the classification loss is given the greater weight so that training is biased toward reducing it.
In one embodiment, the target image to be processed is input into the trained image quality assessment neural network as shown in fig. 2: features are extracted from the target image through five successive convolution-pooling stages; the features from the fifth stage are up-sampled, and the result of a 1x1 convolution of the fourth stage's features is added to the up-sampled result to obtain a first feature map; the first feature map is up-sampled, and the result of a 1x1 convolution of the third stage's features is added to obtain a second feature map; and the second feature map is up-sampled to obtain a target feature map, from which two preset fully connected branches generate the text block region data and the per-region blur/clarity classification values.
Preferably, the five convolution-pooling stages comprise: a first stage using 64 3×3 convolution kernels and one max-pooling layer; a second stage using 128 3×3 convolution kernels and one max-pooling layer; a third stage using two layers of 256 3×3 convolution kernels followed by one layer of 256 1×1 convolutions and one max-pooling layer; a fourth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer; and a fifth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer.
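The following PyTorch sketch assembles these five stages. It is an illustration rather than the patent's reference implementation: the per-stage convolution counts follow the layer-by-layer training description later in this section, and the ReLU activations are an assumption, since the patent does not name activation functions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, kernel: int) -> nn.Sequential:
    # 'same' padding so 3x3 convs preserve spatial size; ReLU is assumed.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, padding=kernel // 2),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """Five convolution-pooling stages as described: stages 1-2 use plain
    3x3 convs, stages 3-5 use two 3x3 convs followed by one 1x1 conv,
    each stage ending in a max-pooling layer."""
    def __init__(self) -> None:
        super().__init__()
        pool = lambda: nn.MaxPool2d(2, 2)
        self.stage1 = nn.Sequential(conv_block(3, 64, 3), pool())
        self.stage2 = nn.Sequential(conv_block(64, 128, 3),
                                    conv_block(128, 128, 3), pool())
        self.stage3 = nn.Sequential(conv_block(128, 256, 3),
                                    conv_block(256, 256, 3),
                                    conv_block(256, 256, 1), pool())
        self.stage4 = nn.Sequential(conv_block(256, 512, 3),
                                    conv_block(512, 512, 3),
                                    conv_block(512, 512, 1), pool())
        self.stage5 = nn.Sequential(conv_block(512, 512, 3),
                                    conv_block(512, 512, 3),
                                    conv_block(512, 512, 1), pool())

    def forward(self, x: torch.Tensor):
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)
        # Stage-3/4/5 features feed the fusion step sketched below.
        return c3, c4, c5
```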
It is also worth noting that the text block region data is output after the target feature map passes through a preset two-layer fully connected branch, and comprises, for each region, its upper-left corner coordinates, lower-right corner coordinates, and confidence. A blur value and a clarity value for each region are output after the target feature map passes through another preset two-layer fully connected branch; softmax is then applied to these two values to obtain a number between 0 and 1, which serves as the region's blur/clarity classification value.
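A matching sketch of the fusion path and the two fully connected branches just described follows. The channel widths of the lateral 1x1 convolutions, the hidden width of the fully connected layers, and the use of one predicted box per feature-map location are simplifying assumptions (the training description later generates 8 candidate boxes per pixel); only the topology is fixed by the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Feature fusion plus the two two-layer fully connected branches."""
    def __init__(self) -> None:
        super().__init__()
        self.lat4 = nn.Conv2d(512, 512, 1)  # 1x1 conv on stage-4 features
        self.lat3 = nn.Conv2d(256, 512, 1)  # 1x1 conv on stage-3 features
        # Branch 1: box corners (x1, y1, x2, y2) plus a confidence score.
        self.box_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                      nn.Linear(256, 5))
        # Branch 2: a blur value and a clarity value, softmaxed to [0, 1].
        self.cls_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                      nn.Linear(256, 2))

    def forward(self, c3, c4, c5):
        p4 = self.lat4(c4) + F.interpolate(c5, scale_factor=2)  # first feature map
        p3 = self.lat3(c3) + F.interpolate(p4, scale_factor=2)  # second feature map
        target = F.interpolate(p3, scale_factor=2)              # target feature map
        feat = target.permute(0, 2, 3, 1)        # per-location feature vectors
        boxes = self.box_head(feat)              # text block region data
        blur_clear = self.cls_head(feat).softmax(dim=-1)
        return boxes, blur_clear
```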
In other embodiments, the image quality assessment neural network must be trained before the target image to be processed is input into it. The specific implementation comprises:
acquiring m text images, padding each with white edges to a target shape (e.g., square), and scaling it to a fixed size (e.g., 768x768). Text blocks in the m images are then annotated: a text block only needs to be framed approximately, i.e., large text blocks are labeled without precision requirements, and small outlying text regions are discarded; as long as a small text region is not too much of an outlier, the blur judgment can still handle it. Target detection accuracy is slightly sacrificed, but blur judgment accuracy improves, and because large text blocks (rather than text lines or small text blocks) are detected, the parameter count is much smaller and the speed much higher. Each text block is labeled as clear or blurred.
A convolutional neural network is constructed that convolves and fuses the images to obtain features at different scales, regresses an imprecise text block region from the fused features, then classifies each text block region, outputting class 0 or 1 (0 for clear text, 1 for blurred text). The loss function is a multi-task loss comprising a classification loss and a regression loss, specifically: loss function = classification loss function + regression loss function, i.e.
L = λ_s·L_s + λ_g·L_g
Preferably, the classification loss L_s uses the classical cross-entropy function and the regression loss L_g uses the classical IoU loss. The classification loss weight λ_s is made suitably larger so that training is biased toward reducing the classification loss, where λ_s and λ_g are the preset weights of the classification and regression loss functions, respectively.
The constructed convolutional neural network takes as input an RGB three-channel image together with its blur/clarity class. Features at different scales are extracted through the convolutional network (ResNet50), then fused and output. The specific structure comprises five feature extraction layers:
First layer: 1 convolutional layer and 1 pooling layer, using 64 3×3 convolution kernels and 1 max-pooling layer.
Second layer: 2 convolutional layers and 1 pooling layer, using 128 3×3 convolution kernels and 1 max-pooling layer.
Third layer: 3 convolutional layers and 1 pooling layer, using 2 layers of 256 3×3 convolution kernels, then 1 layer of 256 1×1 convolutions and 1 max-pooling layer.
Fourth layer: 3 convolutional layers and 1 pooling layer, using 2 layers of 512 3×3 convolution kernels, then 1 layer of 512 1×1 convolutions and 1 max-pooling layer.
Fifth layer: 3 convolutional layers and 1 pooling layer, using 2 layers of 512 3×3 convolution kernels, then 1 layer of 512 1×1 convolutions and 1 max-pooling layer.
The specific structure of the feature fusion comprises: first, the output of the fifth layer is up-sampled to restore it to the size it had before the last convolution-pooling step, and the output of the fourth layer, after a 1x1 convolution, is added to the up-sampled result to obtain a first feature map. The first feature map is then up-sampled, and the output of the third layer, after a 1x1 convolution, is added to obtain a second feature map. The second feature map is directly up-sampled without being combined with any earlier convolution-pooling result: only the lower layers of the feature extraction network (which focus more on target position information) are used, and higher layers (which focus more on semantic information) are not combined. That is, because the selected text block regions are large targets and only the approximate text area needs to be located, precision is not required; this reduces the network's parameter count and increases its speed, matching the goal of locating text regions approximately rather than precisely.
The specific structure of the network output comprises the regressed blur/clarity classification values and the regressed text block region data. The text block region data is obtained as follows: a two-layer fully connected operation is applied to the up-sampled second feature map, generating 8 candidate boxes of different sizes at each pixel. During model training, all text block regions of a training sample are labeled 1 and non-text regions 0; candidate boxes over regions labeled 1 are then regressed, and a non-maximum suppression (NMS) algorithm merges the candidates into final text block regions, yielding the text block region data: the upper-left and lower-right corner coordinates of each region. Text block regions carry blur/clarity labels during training, so the corresponding class is regressed for each region by the same regression approach.
It is worth noting that a confidence score is output together with the text block region data. Preferably, all text block regions are first sorted by their confidence scores from largest to smallest; then, starting from the region with the largest confidence score, the overlap between regions is computed with an intersection-over-union (IoU) algorithm and screened against a preset overlap threshold (e.g., 0.75). A weighted vote over the remaining regions then yields the final blur/clarity judgment for the image, i.e., the weighted value is matched against the blur/clarity interval levels.
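A sketch of this screening step follows, under one reading of the description: the text leaves the pairing ambiguous, so here every block is compared against the single highest-confidence block, and blocks whose overlap falls below the threshold are dropped, as stated; 0.75 is the example threshold given above.

```python
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU between one box a and a batch of boxes b, all (x1, y1, x2, y2)."""
    x1 = torch.max(a[0], b[:, 0]); y1 = torch.max(a[1], b[:, 1])
    x2 = torch.min(a[2], b[:, 2]); y2 = torch.min(a[3], b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-6)

def filter_blocks(boxes, scores, conf, iou_thresh: float = 0.75):
    """Sort blocks by confidence, then keep blocks whose overlap with the
    highest-confidence block meets the threshold; scores are the per-block
    blur/clarity values carried along for the later weighted vote."""
    order = conf.argsort(descending=True)
    boxes, scores, conf = boxes[order], scores[order], conf[order]
    keep = box_iou(boxes[0], boxes) >= iou_thresh
    keep[0] = True  # the top-confidence block is always kept
    return boxes[keep], scores[keep], conf[keep]
```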
In addition, another two-layer fully connected operation is applied to the up-sampled second feature map to obtain the regressed blur/clarity classification value of each text block region; after softmax (the normalized exponential function), this becomes a number between 0 and 1. The weighted, voted value is divided into 3 grades (see the table below): 0 to 0.3 is a blurred image, 0.3 to 0.7 a partially blurred, partially clear image, and 0.7 to 1 a clear image. When judging the degree of blur of real medical text images, many images are partially clear and partially blurred for various reasons such as shooting angle or compression during transmission. Grading by the predicted value therefore makes it possible to single out partially blurred, partially clear images, a class that accounts for a large share in practical applications.
Table
Classification value interval    Quality conclusion
0~0.3                            Blurred
0.3~0.7                          Partially blurred, partially clear
0.7~1                            Clear
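For illustration, the three-level mapping in the table can be written as the small sketch below; the assignment of the exact boundary values 0.3 and 0.7 is not specified in the description and is an assumption here.

```python
def quality_grade(score: float) -> str:
    """Map a weighted blur/clarity score in [0, 1] to the three-level
    conclusion of the table above (boundary handling assumed)."""
    if score < 0.3:
        return "blurred"
    if score < 0.7:
        return "partially blurred, partially clear"
    return "clear"
```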
Step S103, parsing the text block region data to obtain the confidence of each text block region, then invoking a preset evaluation engine to compute the image's blur/clarity classification value from the per-region values, and generating and outputting the image's evaluation data according to the blur/clarity interval levels.
In some embodiments, invoking a preset evaluation engine to compute the image's blur/clarity classification value from the per-region values, so as to generate its evaluation data according to the blur/clarity interval levels, comprises: sorting all text block regions by confidence score from largest to smallest, then, starting from the region with the largest confidence score, computing the overlap between regions with an IoU algorithm, comparing against a preset overlap threshold, and filtering out regions whose overlap is below the threshold; and applying a weighted vote to the remaining regions to obtain the image's blur/clarity classification value, then matching it against the blur/clarity interval levels to obtain a blur/clarity grade as the image's evaluation data.
The weighted voting algorithm operates on the filtered remaining text block regions: each region has a confidence score and a blur/clarity classification output value. Each region's confidence score is divided by the sum of the confidence scores of all remaining regions to obtain its weight; each weight is multiplied by the region's blur/clarity output value, and the products are summed to obtain the final value, which is then judged against the classification value intervals to reach the final quality assessment conclusion.
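The vote itself is a one-liner; this sketch follows the description exactly (confidence-normalized weights times per-block values).

```python
import torch

def weighted_vote(scores: torch.Tensor, conf: torch.Tensor) -> float:
    """Confidence-weighted average of per-block blur/clarity values:
    each block's weight is its confidence divided by the sum of the
    confidences of all remaining blocks."""
    weights = conf / conf.sum()
    return float((weights * scores).sum())
```

Combined with the quality_grade sketch after the table above, the final conclusion would be obtained as quality_grade(weighted_vote(scores, conf)).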
It is worth noting that the evaluation data may include both the blur/clarity grade of each text block region and the grade of the whole image, so the invention can assess blur/clarity for individual regions as well as for the entire image, achieving comprehensive dynamic assessment: for different images, blur/clarity is evaluated at both region and image level based on the regions' blur/clarity grades. In summary, blur/clarity classification is judged on imprecisely detected text block regions, and the judgments of regions above a confidence threshold are selected for voting, yielding the final clear-or-blurred conclusion for the image. Meanwhile, the invention abandons the existing text-line detection approach and adopts detection of large text block regions, screening out only approximately suitable regions (as shown in fig. 3), thereby achieving better blur/clarity judgment precision and higher speed. Moreover, the invention can evaluate large volumes of medical and health images in an image retrieval platform, screening out blurred, unusable images, while half-blurred images can be routed to manual review, greatly improving the usability of the images. That is, the invention not only distinguishes blurred from clear images but also identifies partially blurred, partially clear images, a special class that appears frequently in practice and must be screened out separately rather than mixed with purely blurred or purely clear images, which would bias image recognition.
Fig. 4 is a schematic diagram of the main flow of an image processing method according to a second embodiment of the present invention. The image processing method comprises the following steps:
step S401, acquiring an image to be processed.
Step S402, based on a preset preprocessing model, padding the image to be processed with white edges to adjust it to a target shape, and then scaling the adjusted image to a preset size to obtain the target image to be processed.
Step S403, extracting features from the target image to be processed through five successive convolution-pooling stages.
Step S404, up-sampling the features produced by the fifth convolution-pooling stage, and adding the result of a 1x1 convolution of the fourth stage's features to the up-sampled result to obtain a first feature map.
Step S405, up-sampling the first feature map, and adding the result of a 1x1 convolution of the third stage's features to the up-sampled result to obtain a second feature map.
Step S406, up-sampling the second feature map to obtain a target feature map, from which two preset fully connected branches respectively generate the text block region data and the per-region blur/clarity classification values.
In an embodiment, after the target feature map passes through a preset two-layer fully connected branch, text block region data comprising five nodes is output: four nodes are the upper-left and lower-right corner coordinates, and one node is the region's confidence. After the target feature map passes through another preset two-layer fully connected branch, a blur value and a clarity value for the region are output; softmax then yields a number between 0 and 1, which serves as the region's blur/clarity classification value.
Step S407, invoking a preset evaluation engine, sorting all text block regions by the confidence scores in the region data from largest to smallest, then, starting from the region with the largest confidence score, computing the overlap between regions with an IoU algorithm, comparing against a preset overlap threshold, and filtering out regions whose overlap is below the threshold.
Step S408, applying a weighted vote to the remaining text block regions to obtain the image's blur/clarity classification value, and matching it against the blur/clarity interval levels to obtain a blur/clarity grade, which is output as the image's evaluation data.
In summary, the invention judges whether text is blurred based on imprecisely detected text block regions, thereby determining whether the whole document image is blurred and focusing the blur judgment on the text content. Three links are optimized: labeling (coarse-grained annotation and detection of large text block regions), network structure (discarding high-level semantic features during feature extraction, focusing on low- and mid-level network features, and adding 1x1 convolutions to improve the features' abstract expressive capacity), and the loss function (increasing the weight of the classification loss). Although detection precision is sacrificed, the blur classification precision for text block regions improves. The invention thus greatly increases the speed of the whole network while simplifying the overall detection task, meets online real-time requirements, and lowers the annotation burden. In addition, the invention can conclude that an image is clear or blurred, and its final grading strategy can also screen out partially blurred, partially clear images.
It is worth noting that the invention can be applied to an image retrieval platform, either as one of its functional modules or as an independent external service supporting interface calls.
As an application embodiment, in an image retrieval platform that processes large volumes of medical and health images, all uploaded images require some preprocessing, and blurred, unusable images, such as some health record images or physical examination images, are screened out. If the business requires it, the blurred images are fed back to the business side for re-upload; if not, they are deleted directly from the image library.
Fig. 5 is a schematic diagram of the main modules of an image processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the image processing apparatus 500 comprises an acquisition module 501, a processing module 502, and a generation module 503. The acquisition module 501 acquires an image to be processed and adjusts it into a target image to be processed based on a preset preprocessing model; the processing module 502 inputs the target image into a trained image quality assessment neural network that extracts multi-scale network features and fuses low- and mid-level network features, and outputs text block region data and per-region blur/clarity classification values; and the generation module 503 parses the text block region data to obtain per-region confidences, invokes a preset evaluation engine to compute the image's blur/clarity classification value from the per-region values, and generates and outputs the image's evaluation data according to the blur/clarity interval levels.
In some embodiments, the acquisition module 501 adjusts the image to be processed into the target image to be processed based on a preset preprocessing model by padding the image with white edges to a target shape and then scaling the adjusted image to a preset size.
In some embodiments, before the processing module 502 inputs the target image to be processed into the trained image quality assessment neural network, it:
constructs the image quality assessment neural network with a multi-task loss function comprising a classification loss and a regression loss, wherein the classification loss is given the greater weight so that training is biased toward reducing it.
In some embodiments, the processing module 502 inputs the target image to be processed into the trained image quality assessment neural network by:
extracting features from the target image through five successive convolution-pooling stages;
up-sampling the features produced by the fifth stage, and adding the result of a 1x1 convolution of the fourth stage's features to the up-sampled result to obtain a first feature map;
up-sampling the first feature map, and adding the result of a 1x1 convolution of the third stage's features to the up-sampled result to obtain a second feature map;
and up-sampling the second feature map to obtain a target feature map, from which two preset fully connected branches generate the text block region data and the per-region blur/clarity classification values.
In some embodiments, the five convolution-pooling stages comprise:
a first stage using 64 3×3 convolution kernels and one max-pooling layer; a second stage using 128 3×3 convolution kernels and one max-pooling layer; a third stage using two layers of 256 3×3 convolution kernels followed by one layer of 256 1×1 convolutions and one max-pooling layer; a fourth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer; and a fifth stage using two layers of 512 3×3 convolution kernels followed by one layer of 512 1×1 convolutions and one max-pooling layer.
In some embodiments, the processing module 502 further:
outputs the text block region data after passing the target feature map through a preset two-layer fully connected branch, the data comprising, for each region, its upper-left corner coordinates, lower-right corner coordinates, and confidence;
and outputs a blur value and a clarity value for each region after passing the target feature map through another preset two-layer fully connected branch, then applies softmax to these two values to obtain a number between 0 and 1 as the region's blur/clarity classification value.
In some embodiments, the generation module 503 invokes a preset evaluation engine to compute the image's blur/clarity classification value from the per-region values, so as to generate the image's evaluation data according to the blur/clarity interval levels, by:
sorting all text block regions by confidence score from largest to smallest, then, starting from the region with the largest confidence score, computing the overlap between regions with an IoU algorithm, comparing against a preset overlap threshold, and filtering out regions whose overlap is below the threshold; and applying a weighted vote to the remaining regions to obtain the image's blur/clarity classification value, then matching it against the blur/clarity interval levels to obtain a blur/clarity grade as the image's evaluation data.
Since the specific implementation details of the image processing method and the image processing apparatus of the present invention correspond to each other, the repeated content is not described again.
Fig. 6 shows an exemplary device architecture 600 in which an image processing method or image processing device of an embodiment of the present invention may be applied.
As shown in fig. 6, the apparatus architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users with the terminal devices 601, 602, 603. The background management server may analyze and process received data such as a product information query request and feed back the processing result (e.g., target push information, product information, by way of example only) to the terminal device.
It should be noted that the image processing method provided by the embodiments of the present invention is generally executed by the server 605; accordingly, the image processing apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to fig. 7, there is shown a schematic diagram of a computer arrangement 700 suitable for use in implementing a terminal device of an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer apparatus 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the computer device 700. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the apparatus of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor apparatus, device, or means, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor comprising an acquisition module, a processing module, and a generation module. The names of these modules do not, in some cases, limit the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire an image to be processed, and adjust it into a target image to be processed based on a preset preprocessing model; input the target image to be processed into a trained image quality assessment neural network, extract multi-scale network features and fuse low- and mid-level network features, and output text block region data and per-region blur/clarity classification values; and parse the text block region data, obtain per-region confidences, invoke a preset evaluation engine, compute the image's blur/clarity classification value from the per-region values, and generate and output the image's evaluation data according to the blur/clarity interval levels.
According to the technical solution provided by the embodiments of the present invention, the embodiments can be applied to an image retrieval platform to automatically screen out and filter the influence of blurred-text images, so that subsequent image retrieval results meet the clarity requirement.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. An image processing method, comprising:
acquiring an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset pretreatment model;
inputting the target to-be-processed image into a trained image quality evaluation neural network, extracting multi-scale network characteristics to fuse middle-low layer network characteristics, and outputting text block area data and text block area fuzzy clear classification values;
analyzing the text block region data, acquiring the confidence coefficient of the text block region, and calling a preset evaluation engine, and calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region to generate and output evaluation data of the image to be processed according to the fuzzy clear interval level, wherein the text block region data comprises an upper left corner coordinate, a lower right corner coordinate and the confidence coefficient corresponding to the text block region;
Invoking a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block area, so as to generate evaluation data of the image to be processed according to the fuzzy clear interval level, wherein the evaluation data comprises the following steps:
sequencing all text block areas from large to small according to the confidence score, further calculating the overlapping degree of the text block areas according to an intersection comparison algorithm from the text block area with the largest confidence score, comparing through a preset overlapping degree value threshold, and filtering out the text block areas smaller than the overlapping degree value threshold;
obtaining a fuzzy definition classification value of the image to be processed by using a weighted voting method on the rest text block area, and matching the fuzzy definition classification value of the image to be processed with a fuzzy definition interval level to obtain a fuzzy definition classification level which is used as evaluation data of the image to be processed;
inputting the target to-be-processed image into a trained image quality evaluation neural network, extracting multi-scale network characteristics to fuse middle-low layer network characteristics, and outputting text block area data and text block area fuzzy definition classification values comprises the following steps:
Sequentially carrying out feature extraction on the target image to be processed through five convolution pooling layers;
performing up-sampling operation on the features obtained after the fifth layer of convolution pooling layer, and further superposing the results obtained after the fourth layer of convolution pooling layer and the results obtained after the features are subjected to 1x1 convolution with the results of the up-sampling operation to obtain a first feature map;
performing up-sampling operation on the first feature map, and further superposing a result obtained by carrying out 1x1 convolution on the feature obtained by the third layer convolution pooling layer with the result of the up-sampling operation to obtain a second feature map;
and performing an up-sampling operation on the second feature map to obtain a target feature map, so that the target feature map generates the text block region data and the text block region blur-sharpness classification values through two preset fully connected heads, respectively.
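By way of illustration, the following is a minimal Python sketch of the confidence sorting, IoU filtering and weighted voting recited in claim 1. The suppression rule (discarding a region whose overlap with an already-kept, higher-confidence region exceeds the threshold, as in conventional non-maximum suppression) and the confidence-weighted averaging are assumptions of this sketch; the claim fixes neither detail, and all function names are ours.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def image_blur_sharpness(boxes, confidences, region_scores, iou_threshold=0.5):
    """Sort regions by confidence, greedily filter overlapping regions by IoU,
    then combine the survivors' blur-sharpness values by confidence-weighted
    voting."""
    order = np.argsort(confidences)[::-1]  # largest confidence first
    keep = []
    for i in order:
        # keep a region only if it does not overlap an already-kept,
        # higher-confidence region beyond the threshold (conventional NMS)
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    weights = np.array([confidences[i] for i in keep], dtype=float)
    scores = np.array([region_scores[i] for i in keep], dtype=float)
    return float((weights * scores).sum() / weights.sum())

# e.g. three detected text blocks, the first two heavily overlapping:
boxes = [(0, 0, 100, 40), (5, 2, 102, 41), (0, 60, 100, 100)]
print(image_blur_sharpness(boxes, [0.9, 0.6, 0.8], [0.95, 0.3, 0.7]))
```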
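Similarly, a sketch of the multi-scale fusion steps of claim 1: the channel widths follow claim 4 (256 for the third stage, 512 for the fourth and fifth), while the use of PyTorch, nearest-neighbour up-sampling and the class name TopDownFusion are assumptions of this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Top-down fusion of middle- and low-level backbone features: upsample
    the fifth-stage features, add 1x1-projected fourth-stage features (first
    feature map), repeat with third-stage features (second feature map), and
    upsample once more to obtain the target feature map."""

    def __init__(self):
        super().__init__()
        self.proj4 = nn.Conv2d(512, 512, kernel_size=1)  # 1x1 conv on stage-4 features
        self.proj3 = nn.Conv2d(256, 512, kernel_size=1)  # 1x1 conv on stage-3 features

    def forward(self, c3, c4, c5):
        p5_up = F.interpolate(c5, scale_factor=2, mode="nearest")  # upsample stage 5
        p4 = self.proj4(c4) + p5_up                                # first feature map
        p4_up = F.interpolate(p4, scale_factor=2, mode="nearest")
        p3 = self.proj3(c3) + p4_up                                # second feature map
        return F.interpolate(p3, scale_factor=2, mode="nearest")   # target feature map

# e.g. with feature maps from a 512x512 input:
fusion = TopDownFusion()
target = fusion(torch.randn(1, 256, 64, 64),   # stage 3, stride 8
                torch.randn(1, 512, 32, 32),   # stage 4, stride 16
                torch.randn(1, 512, 16, 16))   # stage 5, stride 32
print(target.shape)  # torch.Size([1, 512, 128, 128])
```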
2. The method according to claim 1, wherein adjusting the image to be processed into the target image to be processed based on the preset preprocessing model comprises:
performing white-edge filling on the image to be processed to adjust it to a target shape, and then scaling the adjusted image to a preset size to obtain the target image to be processed.
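A minimal sketch of this preprocessing step, assuming a 3-channel OpenCV image, a square target shape and a 512-pixel preset size (the claim specifies only a "target shape" and a "preset size"):

```python
import cv2
import numpy as np

def preprocess(image, target_size=512):
    """Pad the image with white borders to a square, then scale it to a
    preset size."""
    h, w = image.shape[:2]
    side = max(h, w)
    # white-edge filling: center the image on a white canvas
    canvas = np.full((side, side, 3), 255, dtype=np.uint8)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = image
    # scale the padded image to the preset size
    return cv2.resize(canvas, (target_size, target_size))
```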
3. The method of claim 1, further comprising, before inputting the target image to be processed into the trained image quality evaluation neural network:
constructing the image quality evaluation neural network with a multi-task loss function comprising a classification loss and a regression loss, wherein the classification loss in the multi-task loss function is given the greater weight so that training of the image quality evaluation neural network is biased more toward reducing the classification loss.
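A sketch of such a weighted multi-task loss; the 2:1 weighting, the cross-entropy classification term and the smooth-L1 regression term are assumptions, since the claim states only that the classification loss receives the greater weight:

```python
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Weighted multi-task loss: blur/sharp classification plus text block
    region regression, with the classification term weighted more heavily."""

    def __init__(self, cls_weight=2.0, reg_weight=1.0):
        super().__init__()
        self.cls_loss = nn.CrossEntropyLoss()  # assumed classification term
        self.reg_loss = nn.SmoothL1Loss()      # assumed regression term
        self.cls_weight = cls_weight
        self.reg_weight = reg_weight

    def forward(self, cls_logits, cls_targets, box_preds, box_targets):
        return (self.cls_weight * self.cls_loss(cls_logits, cls_targets)
                + self.reg_weight * self.reg_loss(box_preds, box_targets))
```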
4. The method of claim 1, wherein the five convolution-pooling layers comprise:
the first convolution-pooling layer, which adopts 64 convolution kernels of 3×3 and 1 max-pooling layer; the second convolution-pooling layer, which adopts 128 convolution kernels of 3×3 and 1 max-pooling layer; the third convolution-pooling layer, which adopts 2 layers of 256 convolution kernels of 3×3 followed by 1 layer of 256 convolution kernels of 1×1 and 1 max-pooling layer; the fourth convolution-pooling layer, which adopts 2 layers of 512 convolution kernels of 3×3 followed by 1 layer of 512 convolution kernels of 1×1 and 1 max-pooling layer; and the fifth convolution-pooling layer, which adopts 2 layers of 512 convolution kernels of 3×3 followed by 1 layer of 512 convolution kernels of 1×1 and 1 max-pooling layer.
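The five stages map naturally onto a VGG-style stack. A sketch in PyTorch, assuming ReLU activations and 2×2 max pooling (neither is specified in the claim):

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_conv3, add_1x1=False):
    """One convolution-pooling stage: 3x3 convolutions, an optional 1x1
    convolution, then max pooling."""
    layers = []
    for i in range(n_conv3):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    if add_1x1:
        layers += [nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

backbone = nn.ModuleList([
    conv_block(3,   64,  1),                # stage 1: 64 3x3 kernels
    conv_block(64,  128, 1),                # stage 2: 128 3x3 kernels
    conv_block(128, 256, 2, add_1x1=True),  # stage 3: 2x 256 3x3, then 256 1x1
    conv_block(256, 512, 2, add_1x1=True),  # stage 4: 2x 512 3x3, then 512 1x1
    conv_block(512, 512, 2, add_1x1=True),  # stage 5: 2x 512 3x3, then 512 1x1
])
```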
5. The method as recited in claim 1, further comprising:
passing the target feature map through one preset two-layer fully connected head to output the text block region data;
and passing the target feature map through the other preset two-layer fully connected head to output a blur value and a sharp value corresponding to the text block region, the blur value and the sharp value then being processed through softmax to obtain a value between 0 and 1, which serves as the blur-sharpness classification value of the text block region.
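A sketch of the two fully connected heads; the flattened feature dimension (e.g., after pooling each text block region to a fixed 7×7 grid) and the hidden width are assumptions of this illustration, while the five-value region output follows claim 1 (corner coordinates plus confidence):

```python
import torch
import torch.nn as nn

class PredictionHeads(nn.Module):
    """Two two-layer fully connected heads over the flattened target feature
    map: one outputs the text block region data, the other a (blur, sharp)
    pair that softmax squashes into a value between 0 and 1."""

    def __init__(self, feat_dim=512 * 7 * 7, hidden=1024):
        super().__init__()
        self.region_head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 5))   # x1, y1, x2, y2, confidence
        self.class_head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2))   # blur value, sharp value

    def forward(self, features):
        flat = features.flatten(1)
        region_data = self.region_head(flat)
        probs = torch.softmax(self.class_head(flat), dim=1)
        return region_data, probs[:, 1]  # sharpness probability in [0, 1]
```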
6. An image processing apparatus, comprising:
an acquisition module, configured to acquire an image to be processed and adjust the image to be processed into a target image to be processed based on a preset preprocessing model;
a processing module, configured to input the target image to be processed into a trained image quality evaluation neural network, extract multi-scale network features to fuse middle- and low-level network features, and output text block region data and text block region blur-sharpness classification values;
a generation module, configured to analyze the text block region data, obtain the confidence of each text block region, invoke a preset evaluation engine, calculate a blur-sharpness classification value of the image to be processed based on the blur-sharpness classification values of the text block regions, and generate and output evaluation data of the image to be processed according to the blur-sharpness interval level, wherein the text block region data comprises the upper-left corner coordinates, the lower-right corner coordinates and the confidence corresponding to each text block region;
wherein the generation module invoking the preset evaluation engine and calculating the blur-sharpness classification value of the image to be processed based on the blur-sharpness classification values of the text block regions, so as to generate the evaluation data of the image to be processed according to the blur-sharpness interval level, comprises:
sorting all text block regions by confidence score from largest to smallest, then, starting from the text block region with the largest confidence score, calculating the overlap between text block regions with an intersection-over-union (IoU) algorithm, comparing the overlap against a preset overlap threshold, and filtering out text block regions whose overlap value is smaller than the threshold; and applying a weighted voting method to the remaining text block regions to obtain the blur-sharpness classification value of the image to be processed, and matching the blur-sharpness classification value of the image to be processed against the blur-sharpness interval levels to obtain a blur-sharpness classification level, which serves as the evaluation data of the image to be processed;
and wherein the processing module inputting the target image to be processed into the trained image quality evaluation neural network comprises:
sequentially performing feature extraction on the target image to be processed through the five convolution-pooling layers;
performing an up-sampling operation on the features output by the fifth convolution-pooling layer, and superposing the result of a 1×1 convolution applied to the features output by the fourth convolution-pooling layer onto the result of the up-sampling operation to obtain a first feature map;
performing an up-sampling operation on the first feature map, and superposing the result of a 1×1 convolution applied to the features output by the third convolution-pooling layer onto the result of the up-sampling operation to obtain a second feature map;
and performing an up-sampling operation on the second feature map to obtain a target feature map, so that the target feature map generates the text block region data and the text block region blur-sharpness classification values through two preset fully connected heads, respectively.
7. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
8. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN202011580763.XA 2020-12-28 2020-12-28 Image processing method and device Active CN112801132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011580763.XA CN112801132B (en) 2020-12-28 2020-12-28 Image processing method and device

Publications (2)

Publication Number Publication Date
CN112801132A CN112801132A (en) 2021-05-14
CN112801132B (en) 2024-01-02

Family

ID=75805156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011580763.XA Active CN112801132B (en) 2020-12-28 2020-12-28 Image processing method and device

Country Status (1)

Country Link
CN (1) CN112801132B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114368795B (en) * 2021-12-31 2023-01-17 天健创新(北京)监测仪表股份有限公司 Online black and odorous water body multi-mode identification method and system
CN114219803B (en) * 2022-02-21 2022-07-15 浙江大学 Detection method and system for three-stage image quality evaluation
CN117218452B (en) * 2023-11-02 2024-02-06 临沂市兰山区自然资源开发服务中心 Automatic classification management system for land images
CN117788461B (en) * 2024-02-23 2024-05-07 华中科技大学同济医学院附属同济医院 Magnetic resonance image quality evaluation system based on image analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978578A (en) * 2015-04-21 2015-10-14 深圳市前海点通数据有限公司 Mobile phone photo taking text image quality evaluation method
CN107481238A (en) * 2017-09-20 2017-12-15 众安信息技术服务有限公司 Image quality measure method and device
CN108510485A (en) * 2018-03-27 2018-09-07 福州大学 It is a kind of based on convolutional neural networks without reference image method for evaluating quality
CN110942072A (en) * 2019-12-31 2020-03-31 北京迈格威科技有限公司 Quality evaluation-based quality scoring and detecting model training and detecting method and device
CN111127452A (en) * 2019-12-27 2020-05-08 上海箱云物流科技有限公司 Container intelligent OCR recognition method based on cloud processing
CN111402231A (en) * 2020-03-16 2020-07-10 杭州健培科技有限公司 Automatic evaluation system and method for lung CT image quality
CN111583259A (en) * 2020-06-04 2020-08-25 南昌航空大学 Document image quality evaluation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9036936B2 (en) * 2011-06-20 2015-05-19 Fujifilm Corporation Image processing device, image processing method, and image processing program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text detection method based on fuzzy homogeneity mapping; Huang Jianhua et al.; Journal of Electronics & Information Technology; full text *

Also Published As

Publication number Publication date
CN112801132A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112801132B (en) Image processing method and device
CN108734159B (en) Method and system for detecting sensitive information in image
US12002085B2 (en) Digital image ordering using object position and aesthetics
CN110633594A (en) Target detection method and device
CN110633717A (en) Training method and device for target detection model
CN114663952A (en) Object classification method, deep learning model training method, device and equipment
CN115240203A (en) Service data processing method, device, equipment and storage medium
CN114282524A (en) Method, system and device for processing structured data of questionnaire information
CN113837965A (en) Image definition recognition method and device, electronic equipment and storage medium
US10963690B2 (en) Method for identifying main picture in web page
CN113688826A (en) Pollen image detection method and system based on feature fusion
CN116310356B (en) Training method, target detection method, device and equipment of deep learning model
CN116894005A (en) File processing method, device, electronic equipment and storage medium
CN112784189A (en) Method and device for identifying page image
CN114399497A (en) Text image quality detection method and device, computer equipment and storage medium
CN109886865A (en) Method, apparatus, computer equipment and the storage medium of automatic shield flame
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
CN111783572B (en) Text detection method and device
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN110633595B (en) Target detection method and device by utilizing bilinear interpolation
CN113344064A (en) Event processing method and device
CN111178352A (en) Method and device for identifying verification code characters
CN110704153A (en) Interface logic analysis method, device and equipment and readable storage medium
CN116151392B (en) Training sample generation method, training method, recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231201

Address after: No. 322 Sixin North Road, Hanyang District, Wuhan City, Hubei Province, 430050

Applicant after: Taikang Tongji (Wuhan) Hospital

Address before: Taikang Life Building, 156 fuxingmennei street, Xicheng District, Beijing 100031

Applicant before: TAIKANG INSURANCE GROUP Co.,Ltd.

GR01 Patent grant