CN111898577B - Image detection method, device, equipment and computer readable storage medium


Info

Publication number
CN111898577B
CN111898577B (application CN202010795779.6A)
Authority
CN
China
Prior art keywords
image
detection
preset
detected
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010795779.6A
Other languages
Chinese (zh)
Other versions
CN111898577A (en)
Inventor
陈星宇
张睿欣
李绍欣
黄飞跃
李季檩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010795779.6A priority Critical patent/CN111898577B/en
Publication of CN111898577A publication Critical patent/CN111898577A/en
Application granted granted Critical
Publication of CN111898577B publication Critical patent/CN111898577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an image detection method, an image detection apparatus, an image detection device, and a computer-readable storage medium. The method includes: performing image detection on a sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to at least one preset category; obtaining, based on the annotation category, the difference between the at least one predicted component proportion and at least one annotated component proportion to obtain a Gaussian mixture loss; obtaining, based on at least one preset category score, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a mean square error loss; iteratively training the initial detection model with the combined loss of the Gaussian mixture loss and the mean square error loss until a training cut-off condition is met, and determining the iteratively trained initial detection model as a preset detection model, where the preset detection model is used to obtain a detection score for an image to be detected. The embodiments of the present application can improve the granularity of image detection.

Description

Image detection method, device, equipment and computer readable storage medium
Technical Field
The present application relates to image processing technologies in the field of artificial intelligence, and in particular, to an image detection method, apparatus, device, and computer readable storage medium.
Background
With the development of artificial intelligence, images have become a fundamental source of information in everyday applications. In practical applications, when processing is performed according to an image, the image sometimes needs to be detected first; for example, the quality of the image is detected, or the occlusion of a human face in the image is detected, so that the subsequent processing of the image can be determined according to the detection result.
Generally, to determine the detection result of an image, two or more classes are defined in advance and a network model is used to decide which of those classes the image belongs to. However, because the detection result determined in this way is merely the category to which the image belongs, the granularity of the detection result is low, and therefore the granularity of image detection is low.
Disclosure of Invention
The embodiments of the present application provide an image detection method, apparatus, device, and computer-readable storage medium, which can improve the granularity of image detection.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an image detection method, which comprises the following steps:
obtaining a sample to be detected, where the sample to be detected includes a sample image, an annotation category, and at least one annotated component proportion corresponding to at least one preset category;
performing image detection on the sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to the at least one preset category;
obtaining, based on the annotation category, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a Gaussian mixture loss;
obtaining, based on at least one preset category score, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a mean square error loss;
iteratively training the initial detection model with the combined loss of the Gaussian mixture loss and the mean square error loss until a training cut-off condition is met, and determining the iteratively trained initial detection model as a preset detection model, where the preset detection model is used to obtain a detection score for an image to be detected.
The embodiment of the present application further provides an image detection method, including:
acquiring an image to be detected;
extracting, using the preset detection model, the feature to be processed of the image to be detected, and determining at least one component proportion corresponding to at least one preset category based on the feature to be processed, where the at least one component proportion is at least one confidence of the image to be detected under at least one Gaussian distribution corresponding to the at least one preset category;
performing a weighted summation of at least one preset category score and the at least one component proportion to obtain a detection score;
and determining a target detection result of the image to be detected based on the detection score.
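As an illustration of how a detection score could be derived from the component proportions, the following is a minimal sketch. The category scores, the thresholds, and the category meanings are hypothetical placeholders introduced for illustration only and are not taken from the embodiments of the present application.

```python
import numpy as np

# Hypothetical preset category scores and result thresholds (illustrative only),
# e.g. four preset categories ordered from "not worn" to "worn correctly".
PRESET_CATEGORY_SCORES = np.array([0.0, 0.3, 0.6, 1.0])
RESULT_THRESHOLDS = [(0.8, "worn correctly"), (0.4, "partially occluded"), (0.0, "not worn")]

def detect(component_proportions: np.ndarray) -> tuple[float, str]:
    """component_proportions: confidences of the image under the K Gaussian
    distributions corresponding to the K preset categories (summing to 1)."""
    # Weighted summation of the preset category scores and the component proportions.
    detection_score = float(np.dot(PRESET_CATEGORY_SCORES, component_proportions))
    # Compare the detection score with the preset result thresholds.
    for threshold, result in RESULT_THRESHOLDS:
        if detection_score >= threshold:
            return detection_score, result
    return detection_score, RESULT_THRESHOLDS[-1][1]

score, result = detect(np.array([0.05, 0.10, 0.15, 0.70]))
print(score, result)  # e.g. 0.82 "worn correctly"
```

Because the detection score is a continuous weighted combination rather than a single class label, images within the same coarse class can still be ranked against each other.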
An embodiment of the present application provides a first image detection apparatus, including:
a sample acquisition module, configured to acquire a sample to be detected, where the sample to be detected includes a sample image, an annotation category, and at least one annotated component proportion corresponding to at least one preset category;
a prediction module, configured to perform image detection on the sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to the at least one preset category;
a loss acquisition module, configured to obtain, based on the annotation category, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a Gaussian mixture loss;
the loss acquisition module being further configured to obtain, based on at least one preset category score, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a mean square error loss;
a model training module, configured to iteratively train the initial detection model with the combined loss of the Gaussian mixture loss and the mean square error loss until a training cut-off condition is met, and to determine the iteratively trained initial detection model as a preset detection model, where the preset detection model is used to obtain a detection score for an image to be detected.
In an embodiment of the present application, the sample acquisition module is further configured to acquire the sample image and the annotation category; fit, by using the sample image and the annotation category, the at least one annotated component proportion corresponding to the at least one preset category; and combine the sample image, the annotation category, and the at least one annotated component proportion to obtain the sample to be detected.
In an embodiment of the present application, the loss acquisition module is further configured to acquire, from the at least one predicted component proportion, a target predicted component proportion corresponding to the annotation category; acquire, from the at least one annotated component proportion, a target annotated component proportion corresponding to the annotation category; calculate the product of the target predicted component proportion and the target annotated component proportion; calculate a weighted summation of the at least one predicted component proportion and the at least one annotated component proportion; and calculate the ratio of the product to the weighted summation result, thereby obtaining the difference between the at least one predicted component proportion and the at least one annotated component proportion and hence the Gaussian mixture loss.
In an embodiment of the present application, the loss acquisition module is further configured to perform a weighted summation of the at least one preset category score and the at least one predicted component proportion to obtain a prediction score; perform a weighted summation of the at least one preset category score and the at least one annotated component proportion to obtain an annotation score; obtain the difference between the annotation score and the prediction score to obtain an initial difference; obtain a current difference and calculate a target difference between the initial difference and the current difference; and determine the minimum between the target difference and a preset difference as the mean square error loss.
In this embodiment of the present application, the first image detection apparatus further includes a model optimization module, configured to obtain a new sample to be detected; and optimizing the preset detection model by using the new sample to be detected to obtain the optimized preset detection model.
An embodiment of the present application provides a second image detection apparatus, including:
the image acquisition module is used for acquiring an image to be detected;
the image detection module, configured to extract, using the preset detection model, the feature to be processed of the image to be detected, and to determine at least one component proportion corresponding to at least one preset category based on the feature to be processed, where the at least one component proportion is at least one confidence of the image to be detected under at least one Gaussian distribution corresponding to the at least one preset category;
and the result determination module, configured to perform a weighted summation of at least one preset category score and the at least one component proportion to obtain a detection score, and to determine a target detection result of the image to be detected based on the detection score.
In an embodiment of the application, the result determining module is further configured to compare the detection score with at least one preset result threshold to obtain a comparison result; and determining the target detection result of the image to be detected according to the comparison result.
In the embodiment of the application, the image acquisition module is further configured to acquire a detection instruction through an image detection interface; responding to the detection instruction, and acquiring the image to be detected;
in this embodiment, the image detection apparatus further includes a result processing module, configured to display the target detection result on a detection result interface.
In this embodiment of the application, the image obtaining module is further configured to receive a detection request sent by a client device; responding to the detection request, and acquiring the image to be detected;
in an embodiment of the application, the result processing module is further configured to send the target detection result to the client device, so as to display the target detection result on a display interface of the client device.
In an embodiment of the present application, the image acquisition module is further configured to acquire an initial image; detect a face region in the initial image by using preset face key point information; and determine the face region cropped from the initial image as the image to be detected.
The embodiment of the present application provides a first image detection apparatus, including:
a first memory for storing executable instructions;
and the first processor is used for realizing the image detection method applied to the first image detection device provided by the embodiment of the application when executing the executable instructions stored in the first memory.
An embodiment of the present application provides a second image detection apparatus, including:
a second memory for storing executable instructions;
and the second processor is used for implementing the image detection method applied to the second image detection device provided by the embodiment of the application when the executable instructions stored in the second memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for realizing an image detection method applied to first image detection equipment when being executed by a first processor; or, when executed by the second processor, implement an image detection method applied to the second image detection apparatus.
The embodiments of the present application have at least the following beneficial effects: the preset detection model is trained with the combined loss of the Gaussian mixture loss and the mean square error loss, so that a detection score of the image to be detected can be obtained with the preset detection model, and the detection score can represent the degree to which the image to be detected corresponds to the detection item; the preset detection model can therefore produce an image detection result of high granularity, and the granularity of image detection can be improved.
Drawings
FIG. 1 is a schematic diagram of an exemplary mask-wearing manner;
FIG. 2 is a schematic diagram of other exemplary mask-wearing manners;
FIG. 3 is a schematic diagram of an alternative architecture of an image inspection system according to an embodiment of the present application;
fig. 4a is a schematic structural diagram of a server in fig. 3 according to an embodiment of the present disclosure;
fig. 4b is a schematic structural diagram of another server in fig. 3 according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram of an alternative image detection method provided in an embodiment of the present application;
FIG. 6 is a schematic flow chart of another alternative image detection method provided in the embodiments of the present application;
FIG. 7 is a diagram illustrating at least one predetermined result threshold provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an exemplary method for acquiring an image to be detected according to an embodiment of the present disclosure;
FIG. 9 is a diagram illustrating an exemplary target detection result provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of another exemplary method for acquiring an image to be detected according to an embodiment of the present disclosure;
FIG. 11 is a diagram illustrating another exemplary target detection result provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of exemplary detection results for a sample to be detected;
FIG. 13 is a schematic diagram of other exemplary detection results for a sample to be detected;
FIG. 14 is a schematic diagram of exemplary detection results for a sample to be detected according to an embodiment of the present disclosure;
FIG. 15 is a schematic diagram illustrating an exemplary process of training and predicting a pre-set detection model according to an embodiment of the present disclosure;
fig. 16 is a schematic diagram of obtaining at least one predicted component proportion according to an embodiment of the present disclosure;
FIG. 17 is a schematic diagram of an exemplary mask occlusion portion acquisition device provided in an embodiment of the present disclosure;
FIG. 18 is a schematic diagram illustrating an exemplary application of a target detection result provided by an embodiment of the present application;
fig. 19 is a schematic diagram of another exemplary training and prediction process of a preset detection model according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions referred to in the embodiments of the present application are explained as follows.
1) Artificial Intelligence (AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
2) Machine Learning (ML): a multi-disciplinary subject spanning probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other fields. It specializes in studying how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
3) Artificial neural network: a mathematical model that imitates the structure and function of biological neural networks, such as a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Gaussian Mixture Model (GMM), a Recurrent Neural Network (RNN), and so on.
4) Loss function: also called a cost function, a function that maps the value of a random event, or of a random variable related to it, to a non-negative real number representing the risk or loss of that event; accordingly, a loss function value is the value calculated by the loss function.
Generally, to determine the detection result of an image, two or more classes are defined in advance and a network model is used to decide which of those classes the image belongs to. However, because the detection result determined in this way is merely the category to which the image belongs, the granularity of the detection result is low, and therefore the granularity of image detection is low.
Illustratively, with the development of artificial intelligence, the human face has become one of the most promising means of biometric authentication, which is performed by recognizing human faces. In practical applications, however, a face is often occluded, for example by a mask, a scarf, or the brim of a hat, and the occlusion result needs to be determined so that subsequent processing can be performed according to it. Generally, to determine whether a face is occluded, the result is divided into two classes, occluded and non-occluded (for example, wearing a mask or not wearing a mask), and a network model decides whether the face in the image belongs to the occluded class or the non-occluded class; image detection is thus treated as a binary classification problem. In practice, however, it is often unclear whether a face should be directly judged as occluded. For example, in an application that judges whether a mask is worn based on the occlusion of a face in an image, see the mask wearing manner 1-1 shown in FIG. 1: the mask is pulled down to the chin, and it is difficult to judge, under the binary classification formulation, whether wearing manner 1-1 counts as wearing a mask; whether such a scene belongs to the mask-wearing case is therefore strongly bound to the task requirement. Thus, an image detection method that treats image detection as a binary classification problem has a narrow application range and low accuracy.
Similarly, when images are divided into multiple classes in advance (for example, mask pulled down, mask worn correctly, no mask, and mask partially occluding), a network model determines which of the multiple classes the face belongs to; in this case, image detection is treated as a multi-class classification problem. However, just as when image detection is treated as a binary classification problem, there are cases in which the category cannot be determined accurately. For example, in an application that judges whether a mask is worn based on the occlusion of a face in an image, see the five mask wearing manners shown in FIG. 2, wearing manners 2-1 to 2-5; when the multiple classes are mouth occluded by the mask, mask pulled down, mask worn correctly, and mask not worn, it cannot be determined whether wearing manner 2-2 belongs to the mouth-occluded class or the pulled-down class. In addition, every additional mask wearing manner requires adding another class, which easily turns the task into an incremental one and increases its overall difficulty and data volume. Thus, an image detection method that treats image detection as a multi-class classification problem has low accuracy and high complexity.
In addition, when image quality detection is performed on an image, if at least two quality categories are defined in advance and a network model determines which of the categories the face in the image belongs to, the same problems arise: low accuracy, high complexity of image detection, and a narrow application range.
Based on this, embodiments of the present application provide an image detection method, an image detection apparatus, an image detection device, and a computer-readable storage medium, which can improve the granularity and accuracy of image detection, reduce the complexity of image detection, and widen the application range of image detection. An exemplary application of the image detection apparatus provided in the embodiments of the present application is described below; the image detection apparatus provided in the embodiments of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), and may also be implemented as a server. Next, an exemplary application in which the image detection apparatus is implemented as a server is described.
Referring to fig. 3, fig. 3 is a schematic diagram of an alternative architecture of the image detection system provided in the embodiment of the present application. As shown in fig. 3, to support an image detection application, in the image detection system 100 a terminal 400 (client device; a terminal 400-1 and a terminal 400-2 are shown as examples) is connected to a server 200 (the first image detection device) through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two. In addition, the image detection system 100 further includes a database 500, which provides data services to the server 200 to support the server 200 in performing image detection. Further, the image detection system 100 includes a server 600 (the second image detection device), which establishes a connection with the server 200 to provide the preset detection model to the server 200.
The server 600 is further configured to acquire a sample to be detected, where the sample to be detected includes a sample image, an annotation category, and at least one annotated component proportion corresponding to at least one preset category; perform image detection on the sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to the at least one preset category; obtain, based on the annotation category, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a Gaussian mixture loss; obtain, based on at least one preset category score, the difference between the at least one predicted component proportion and the at least one annotated component proportion to obtain a mean square error loss; and iteratively train the initial detection model with the combined loss of the Gaussian mixture loss and the mean square error loss until a training cut-off condition is met, determining the iteratively trained initial detection model as a preset detection model, where the preset detection model is used to obtain a detection score for the image to be detected.
A terminal 400-1 for performing image detection on the acquisition region through the image acquisition device 400-11, receiving a detection operation when an image including a face region is acquired, and transmitting the image including the face region to the server 200 through the network 300 in response to the detection operation; and receives the target detection result sent by the server 200 through the network 300 to display the target detection result (not shown in the figure) on the graphical interface 400-12, and performs subsequent processing according to the target detection result.
The terminal 400-2 is used for displaying the detection control 400-22 on the graphical interface 400-21, and responding to the detection operation to acquire the uploaded image when receiving the detection operation acted on the detection control 400-22; transmitting the uploaded image to the server 200 through the network 300; and receives the target detection result transmitted by the server 200 through the network 300 to display the target detection result (not shown in the figure) on the graphical interface 400-21, and performs subsequent processing according to the target detection result.
The server 200 is configured to receive, through the network 300, the image sent by the terminal 400 and obtain the image to be detected from it; extract, using a preset detection model, the feature to be processed of the image to be detected, and determine at least one component proportion corresponding to at least one preset category based on the feature to be processed, where the preset detection model is trained with the combined loss of a Gaussian mixture loss and a mean square error loss, and the at least one component proportion is at least one confidence of the image to be detected under at least one Gaussian distribution corresponding to the at least one preset category; perform a weighted summation of at least one preset category score and the at least one component proportion to obtain a detection score; determine a target detection result of the image to be detected based on the detection score; and transmit the target detection result to the terminal 400 through the network 300.
It should be noted that the function corresponding to the terminal 400 may be integrated in the server 200, or the function corresponding to the server 200 may also be integrated in the terminal 400, and at this time, the image detection system provided in the embodiment of the present application is implemented on one device. Further, the function corresponding to the server 200 and the function corresponding to the server 600 may be implemented by one server.
In some embodiments, the server 200 and the server 600 may be independent physical servers, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, Network services, cloud communications, middleware services, domain name services, security services, a CDN (Content Delivery Network), big data, and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present invention.
Referring to fig. 4a, fig. 4a is a schematic structural diagram of a server in fig. 3 according to an embodiment of the present disclosure, where the server 200 shown in fig. 4a includes: at least one first processor 210, a first memory 250, at least one first network interface 220, and a first user interface 230. The various components in the first server 200 are coupled together by a first bus system 240. It is understood that the first bus system 240 is used to enable communications among the connections of these components. The first bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as first bus system 240 in fig. 4 a.
The first Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc., wherein the general purpose Processor may be a microprocessor or any conventional Processor, etc.
The first user interface 230 includes one or more first output devices 231, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The first user interface 230 also includes one or more first input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The first memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. The first memory 250 optionally includes one or more storage devices physically located remotely from the first processor 210.
The first memory 250 includes volatile memory or nonvolatile memory and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The first memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, the first memory 250 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
A first operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a first network communication module 252 for communicating to other computing devices via one or more (wired or wireless) first network interfaces 220, an exemplary first network interface 220 comprising: bluetooth, wireless-compatibility authentication (Wi-Fi), and Universal Serial Bus (USB), etc.;
a first presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more first output devices 231 (e.g., a display screen, speakers, etc.) associated with the first user interface 230;
a first input processing module 254 for detecting one or more user inputs or interactions from one of the one or more first input devices 232 and translating the detected inputs or interactions.
In some embodiments, the first image detection apparatus provided in the embodiments of the present application may be implemented in software, and fig. 4a illustrates the first image detection apparatus 255 stored in the memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: sample acquisition module 2551, prediction module 2552, loss acquisition module 2553, model training module 2554 and model optimization module 2555, which are logical and therefore can be arbitrarily combined or further split depending on the functionality implemented.
Referring to fig. 4b, fig. 4b is a schematic diagram of a component structure of another server in fig. 3 according to an embodiment of the present disclosure, where the server 600 shown in fig. 4b includes: at least one second processor 610, a second memory 650, at least one second network interface 620, and a second user interface 630. The various components in the second server 600 are coupled together by a second bus system 640. It is understood that the second bus system 640 is used to enable connection communications between these components. The second bus system 640 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled as the second bus system 640 in figure 4 b.
The second processor 610 may be an integrated circuit chip having signal processing capabilities, such as a general purpose processor, a digital signal processor, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., wherein the general purpose processor may be a microprocessor or any conventional processor, etc.
The second user interface 630 includes one or more second output devices 631, including one or more speakers and/or one or more visual displays, that enable presentation of media content. The second user interface 630 also includes one or more second input devices 632 including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The second memory 650 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. The second memory 650 optionally includes one or more storage devices physically located remote from the second processor 610.
The second memory 650 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory and the volatile memory may be a random access memory. The second memory 650 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, the second memory 650 can store data to support various operations, examples of which include programs, modules, and data structures, or a subset or superset thereof, as exemplified below.
A second operating system 651 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a second network communication module 652 for reaching other computing devices via the one or more second network interfaces 620, the example second network interface 620 including: bluetooth, wireless compatibility authentication, universal serial bus, and the like;
a second presentation module 653 for enabling presentation of information via one or more second output devices 631 associated with the second user interface 630;
a second input processing module 654 for detecting one or more user inputs or interactions from one of the one or more second input devices 632 and translating the detected inputs or interactions.
In some embodiments, the second image detection apparatus provided in the embodiments of the present application may be implemented in software, and fig. 4b illustrates the second image detection apparatus 655 stored in the memory 650, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: image acquisition module 6551, image detection module 6552, result determination module 6553 and result processing module 6554, which are logical and thus may be arbitrarily combined or further separated depending on the functionality implemented.
The functions of the respective modules will be explained below.
In other embodiments, the first image detection Device and the second image detection Device provided in this embodiment may be implemented in hardware, and as an example, the first image detection Device and the second image detection Device provided in this embodiment may be a processor in the form of a hardware decoding processor, which is programmed to execute the image detection method provided in this embodiment, for example, the processor in the form of the hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The image detection method provided by the embodiment of the present application will be described below with reference to an exemplary application and implementation of the first image detection device and the second image detection device provided by the embodiment of the present application, both of which are servers.
Referring to fig. 5, fig. 5 is an alternative flow chart of the image detection method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 5. The first image detection device is simply referred to as a first device, and the second image detection device is simply referred to as a second device.
S501, the first device obtains a sample to be detected, wherein the sample to be detected comprises a sample image, an annotation category and at least one annotation component ratio corresponding to at least one preset category.
In the embodiment of the application, when the first device obtains the preset detection model through model training, a sample for model training, that is, a sample to be detected, needs to be obtained first.
It should be noted that the sample to be detected includes a sample image, an annotation category, and at least one annotated component proportion corresponding to at least one preset category. The sample image is a collected image sample; the annotation category is the category annotated for the sample image according to the at least one preset category, and it is readily understood that the annotation category belongs to the at least one preset category; the at least one annotated component proportion corresponds one-to-one to the at least one preset category, and is the real component proportion of the sample image for each of the at least one preset category, obtained by fitting from the sample image and the annotation category of the sample image.
S502, the first device carries out image detection on the sample image based on the initial detection model to obtain at least one prediction component ratio corresponding to at least one preset category.
In this embodiment of the application, after the first device obtains the sample image, the initial detection model is used to predict the sample image, and at least one predicted component ratio corresponding to at least one preset category is obtained.
It should be noted that the initial detection model is the network model to be trained, for example an MR-Net (Mask Regression Net); the at least one predicted component proportion corresponds one-to-one to the at least one preset category, and is the component proportion of the sample image for each of the at least one preset category as predicted by the initial detection model.
S503, the first device obtains the difference between the at least one predicted component proportion and the at least one labeled component proportion based on the labeling category to obtain the mixed Gaussian distribution loss.
In the embodiment of the application, after the first device obtains the at least one predicted component proportion, in order to determine the predicted effect of the initial detection model, the difference between the at least one predicted component proportion and the at least one annotated component proportion is obtained from two aspects; on one hand, a loss function value corresponding to the labeling type, the at least one prediction component proportion and the at least one labeling component proportion is calculated by using a Gaussian mixture loss function, and then the Gaussian mixture distribution loss is obtained.
S504, the first device obtains the difference between at least one predicted component proportion and at least one labeled component proportion based on at least one preset category score to obtain the mean square error loss.
It should be noted that, when the first device obtains the difference between the at least one predicted component proportion and the at least one labeled component proportion from another aspect, the mean square error loss function is used to calculate the loss function value corresponding to the at least one preset category score, the at least one predicted component proportion, and the at least one labeled component proportion, so as to obtain the mean square error loss.
And S505, the first equipment performs iterative training on the initial detection model by using combined loss of Gaussian mixture distribution loss and mean square error loss until a training cut-off condition is met, and the initial detection model after the iterative training is determined to be a preset detection model.
In the embodiment of the present application, after obtaining the Gaussian mixture loss and the mean square error loss, the first device combines them to obtain a combined loss. It then judges, according to the combined loss, whether the training cut-off condition is met; when the condition is met, training stops and the initial detection model at that point is the preset detection model. When the training cut-off condition is not met, the initial detection model is iteratively trained with the combined loss; during iterative training, after each round of training it is judged whether the training cut-off condition is met, and if not, iterative training continues, whereas if so, the initial detection model trained at that point is determined as the preset detection model. That is, the preset detection model is the trained network model and is used to obtain the detection score of the image to be detected.
It should be noted that the combined loss characterizes, to a certain extent, the prediction performance of the initial detection model and is inversely proportional to it. The preset training cut-off condition may be that the obtained combined loss is smaller than a preset loss threshold, or it may be another judgment condition, which is not specifically limited in the embodiments of the present application. In addition, one round of training with the combined loss proceeds as follows: back propagation is performed with the combined loss, and the parameters of the initial detection model are adjusted step by step.
Here, the combination mode may be superposition, weighted summation, and the like. Illustratively, the combined loss L can be calculated by equation (1):
L = L1 + L2    (1)
where L1 is the Gaussian mixture loss and L2 is the mean square error loss.
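A minimal training-step sketch of this combined loss is given below. It assumes a generic PyTorch model; gaussian_mixture_loss and mean_square_error_loss are hypothetical helper names standing in for the two loss terms described above and are not names taken from the embodiments of the present application.

```python
import torch

def train_step(model, optimizer, sample_image, annotation_category,
               annotated_proportions, preset_category_scores,
               gaussian_mixture_loss, mean_square_error_loss):
    # S502: image detection on the sample image with the initial detection model.
    predicted_proportions = model(sample_image)
    # S503: difference based on the annotation category -> Gaussian mixture loss L1.
    l1 = gaussian_mixture_loss(predicted_proportions, annotated_proportions, annotation_category)
    # S504: difference based on the preset category scores -> mean square error loss L2.
    l2 = mean_square_error_loss(predicted_proportions, annotated_proportions, preset_category_scores)
    # Equation (1): combined loss L = L1 + L2.
    loss = l1 + l2
    # Back propagation with the combined loss, gradually adjusting the model parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This step is repeated until the training cut-off condition (for example, the combined loss falling below a preset loss threshold) is met.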
Referring to fig. 6, fig. 6 is a schematic flow chart of another alternative image detection method provided in the embodiment of the present application; as shown in fig. 6, in the embodiment of the present application, S501 may be implemented by S5011 to S5013; that is, the first apparatus acquires a sample to be tested, including S5011 to S5013, and each step is described separately below.
S5011, the first device obtains the sample image and the annotation category.
It should be noted that, in the field of machine learning, learning tasks fall into two categories: supervised learning and unsupervised learning. Typically, a prediction model is learned from a training data set containing a large number of training samples, each of which has a corresponding ground-truth output. However, because obtaining the ground-truth outputs of training samples, i.e. data annotation, is costly or difficult, many tasks struggle to obtain such strong supervision for all training samples. It is therefore desirable to use weakly supervised machine learning techniques, where weak supervision includes inexact supervision, i.e. relying on coarse-grained labels of the samples to obtain fine-grained model predictions. Here, the annotation category is the result of coarse-grained annotation of the sample image.
S5012, the first device fits, by using the sample image and the annotation category, the at least one annotated component proportion corresponding to the at least one preset category.
In the embodiment of the present application, after obtaining the sample image and the annotation category, the first device uses them to fit the component proportion of the sample image for each of the at least one preset category, thereby obtaining the at least one annotated component proportion.
For example, the sample image and its annotation category are input into a Convolutional Neural Network (CNN), and the component proportion corresponding to each of the at least one preset category is fitted through a Gaussian mixture model, thereby obtaining the at least one annotated component proportion; a possible sketch of this fitting step is given after S5013 below.
S5013, the first device combines the sample image, the annotation category, and the at least one annotated component proportion to obtain the sample to be detected.
In the embodiment of the present application, after obtaining the sample image, the annotation category, and the at least one annotated component proportion, the first device combines them into the sample to be detected.
It can be understood that the first device fits the component proportion of the sample image with respect to each preset category through a plurality of Gaussian distributions, so that fine-grained annotation of the sample image is avoided and the feasibility of acquiring annotation information for the sample image is improved. In addition, the difficulty of annotating the sample is reduced, and the robustness to annotation noise is improved.
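The patent does not spell out the fitting procedure in code. The following is one possible sketch, under the assumption that a Gaussian is fitted to the CNN features of the coarsely annotated samples of each preset category and that the annotated component proportions are then taken as the posterior responsibilities of each sample under those Gaussians; all function and variable names here are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_annotated_proportions(features, categories, num_categories):
    """features: (N, D) CNN features of the sample images.
    categories: (N,) coarse annotation category index of each sample image.
    Returns an (N, K) array of annotated component proportions, one per preset category."""
    gaussians, priors = [], []
    for k in range(num_categories):
        feats_k = features[categories == k]
        mean = feats_k.mean(axis=0)
        # Regularize the covariance so it stays invertible for small categories.
        cov = np.cov(feats_k, rowvar=False) + 1e-6 * np.eye(features.shape[1])
        gaussians.append(multivariate_normal(mean, cov))
        priors.append(len(feats_k) / len(features))
    # Posterior responsibility of each sample under each category's Gaussian.
    densities = np.stack([p * g.pdf(features) for g, p in zip(gaussians, priors)], axis=1)
    return densities / densities.sum(axis=1, keepdims=True)
```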
With continued reference to fig. 6, in the present embodiment, S503 may be implemented by S5031-S5035; that is, the first device obtains the difference between at least one predicted component proportion and at least one labeled component proportion based on the label category to obtain the mixed gaussian distribution loss, including S5031-S5035, and the following steps are respectively explained.
S5031, the first device obtains a target predicted component proportion corresponding to the label category from the at least one predicted component proportion.
In the embodiment of the present application, since the annotation category is one of the at least one preset category and the at least one predicted component proportion corresponds one-to-one to the at least one preset category, when acquiring the Gaussian mixture loss the first device can acquire, from the at least one predicted component proportion, the predicted component proportion corresponding to the annotation category, thereby obtaining the target predicted component proportion.
S5032, the first device obtains a target annotated component proportion corresponding to the annotated category from the at least one annotated component proportion.
It should be noted that, because the annotation category is a category in the at least one preset category, and the at least one annotation component proportion corresponds to the at least one preset category one to one, the first device can obtain, from the at least one annotation component proportion, the annotation component proportion corresponding to the annotation category, thereby obtaining the target annotation component proportion.
S5033, the first device calculates the product result of the target prediction component proportion and the target labeling component proportion.
In the embodiment of the application, after obtaining the target prediction component proportion and the target annotation component proportion, the first device fuses the two; the fusion may be performed by multiplying the target prediction component proportion by the target annotation component proportion, and the product result here refers to the result of that multiplication. In addition, the target prediction component proportion and the target annotation component proportion may instead be superposed, and the like.
S5034, the first device calculates a weighted sum of the at least one predicted component proportion and the at least one annotated component proportion.
In this embodiment, after the first device obtains at least one ratio of the predicted components and at least one ratio of the labeled components, since the at least one ratio of the predicted components and the at least one ratio of the labeled components are in one-to-one correspondence, the at least one ratio of the predicted components and the at least one ratio of the labeled components are multiplied and then superimposed, so as to obtain a weighted sum result.
S5035, the first device calculates a ratio of the multiplication result to the weighted summation result, and obtains a difference between the at least one predicted component proportion and the at least one annotated component proportion, thereby obtaining a mixed gaussian distribution loss.
It should be noted that, after the first device obtains the multiplication result and the weighted summation result, the first device calculates the ratio by taking the multiplication result as a numerator and the weighted summation result as a denominator, and thus obtains the gaussian mixture loss.
Illustratively, the Gaussian mixture loss is calculated as shown in equation (2):

L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{p(z_i)\,\mathcal{N}(x_i;\mu_{z_i},\Sigma_{z_i})}{\sum_{j=1}^{K}p(j)\,\mathcal{N}(x_i;\mu_j,\Sigma_j)}    (2)

wherein L_1 is the Gaussian mixture distribution loss, N is the number of sample images, i indexes the i-th sample image, K is the number of the at least one preset category, j indexes the j-th preset category, x_i is the i-th sample image, z_i is the annotation category of the i-th sample image, \mu_{z_i} is the mean of the Gaussian distribution corresponding to the annotation category z_i, \Sigma_{z_i} is the variance of the Gaussian distribution corresponding to z_i, p(z_i) (equivalently written with the indicator 1(z_i = j)) is the real component proportion corresponding to the annotation category z_i, \mathcal{N}(x_i;\mu_{z_i},\Sigma_{z_i}) (also written p(z_i|x_i)) is the predicted component proportion corresponding to z_i, \mathcal{N}(x_i;\mu_j,\Sigma_j) is the predicted component proportion corresponding to each preset category, and p(j) is the real (labeled) component proportion of each preset category.
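As a concrete illustration of S5031-S5035 and equation (2), the following NumPy sketch computes the product of the target annotation component proportion and the target predicted component proportion, the weighted sum over all preset categories, and their ratio in negative-log form; the diagonal-covariance Gaussian density and the array shapes are assumptions made for the sketch, not details fixed by the patent.

```python
import numpy as np

def gaussian_density(x: np.ndarray, mu: np.ndarray, var: np.ndarray) -> float:
    """Diagonal-covariance Gaussian density N(x; mu, var) for a feature vector x."""
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return float(norm * np.exp(-0.5 * np.sum((x - mu) ** 2 / var)))

def gaussian_mixture_loss(features, labels, label_proportions, means, variances):
    """Equation (2): for each sample, the product of the target annotation
    component proportion and the target predicted component proportion (S5033)
    is divided by the weighted sum over all preset categories (S5034), and the
    negative log of that ratio is averaged over the N sample images (S5035)."""
    n, k = label_proportions.shape
    total = 0.0
    for i in range(n):
        z = labels[i]  # annotation category of the i-th sample image
        densities = np.array([gaussian_density(features[i], means[j], variances[j])
                              for j in range(k)])
        numerator = label_proportions[i, z] * densities[z]        # product result (S5033)
        denominator = np.sum(label_proportions[i] * densities)    # weighted sum (S5034)
        total += -np.log(numerator / denominator + 1e-12)         # ratio as the loss term (S5035)
    return total / n

# Example: 2 samples, 3 preset categories, 2-dimensional features (illustrative values).
feats = np.array([[0.1, 0.2], [0.8, 0.7]])
labs = np.array([0, 2])
lab_p = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
mus = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
vars_ = np.ones((3, 2))
print(gaussian_mixture_loss(feats, labs, lab_p, mus, vars_))
```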
It can be understood that a general classification task uses cross entropy as the loss function, and the characteristic of cross entropy is that the smaller the intra-class distance and the larger the inter-class distance, the better; therefore, no distinction is made between images within a class. In the embodiment of the application, by acquiring the Gaussian mixture distribution loss, the detection results of images within the same class can be distinguished.
With continued reference to fig. 6, in the present embodiment, S504 may be implemented by S5041-S5045; that is, the first device obtains the difference between the at least one predicted component proportion and the at least one annotated component proportion based on the at least one preset category score to obtain the mean square error loss, including S5041-S5045, and each step is explained below.
S5041, the first device performs weighted summation of the at least one preset category score and the at least one prediction component proportion to obtain a prediction score.
In the embodiment of the application, at least one preset category score is preset in the first device, so the at least one preset category score and the at least one prediction component proportion are subjected to weighted summation, and the obtained weighted summation result is the prediction score.
Illustratively, the prediction score is obtained by calculation of equation (3):

U_i = \sum_{j=1}^{K} s(j)\, p(j \mid x_i)    (3)

wherein U_i is the prediction score of the i-th sample image, s(j) is the preset category score corresponding to the j-th preset category in the at least one preset category score, and p(j|x_i) is the predicted component proportion of the i-th sample image for the j-th preset category.
S5042, the first device performs weighted summation of the at least one preset category score and the at least one annotation component proportion to obtain an annotation score.
In the embodiment of the application, at least one preset category score is preset in the first device, so the at least one preset category score and the at least one annotation component proportion are subjected to weighted summation, and the obtained weighted summation result is the annotation score.
Illustratively, the annotation score can be obtained by equation (4):

V_i = \sum_{j=1}^{K} s(j)\, p(j)    (4)

wherein V_i is the annotation score of the i-th sample image, s(j) is the preset category score corresponding to the j-th preset category in the at least one preset category score, and p(j) is the annotation component proportion of the i-th sample image for the j-th preset category.
S5043, the first device obtains the difference between the annotation score and the prediction score to obtain an initial difference.
In the embodiment of the application, after the first device obtains the annotation score and the prediction score, the difference between the annotation score and the prediction score is obtained, and then the initial difference is obtained; for example, the initial difference can be obtained by calculating the mean square error between the annotated score and the predicted score.
S5044, the first device obtains the current difference, and calculates a target difference between the initial difference and the current difference.
In the embodiment of the present application, the first device is further provided with a loss parameter, that is, a current difference, which is a parameter that can be learned in a training process, and is used to ensure that a mean square error loss obtained each time is within a certain range, so as to improve the stability of the training process.
Here, the first device may obtain the target difference by calculating a difference between the initial difference and the current difference.
S5045, the first device determines a minimum difference between the target difference and a preset difference as a mean square error loss.
In the embodiment of the application, after obtaining the target difference and the preset difference, the first device selects a minimum value from the target difference and the preset difference as a final mean square error loss; namely, the minimum difference between the target difference and the preset difference is determined as the mean square error loss.
Illustratively, the mean square error loss L_2 can be obtained by equation (5):

L_2 = \min\!\left(\frac{1}{N}\sum_{i=1}^{N}\bigl(V_i - U_i\bigr)^2 - \mathrm{margin},\; 0\right)    (5)

wherein margin is the current difference, and 0 is the preset difference.
It can be understood that the combined loss is constrained by adding the mean square error loss, so that the prediction score can float within a certain range, and the stability of the training effect can be ensured during training.
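To make S5041-S5045 concrete, the following NumPy sketch computes the prediction and annotation scores of equations (3)/(4) and the margin-constrained mean square error loss of equation (5); the example category scores and proportions are invented for illustration, and the clamp follows the text's literal rule of keeping the minimum of the target difference and the preset difference 0.

```python
import numpy as np

def detection_scores(proportions: np.ndarray, category_scores: np.ndarray) -> np.ndarray:
    """Equations (3)/(4): weighted sum of per-category proportions and the preset
    category scores; `proportions` has shape (N, K) and may hold either the
    predicted or the annotated component proportions."""
    return proportions @ category_scores

def mse_loss_with_margin(pred_scores: np.ndarray, label_scores: np.ndarray,
                         margin: float, preset_difference: float = 0.0) -> float:
    """Equation (5): the initial difference is the mean square error between the
    annotation scores V_i and the prediction scores U_i, the target difference
    subtracts the learnable margin, and S5045 keeps the minimum of the target
    difference and the preset difference (0 here)."""
    initial_difference = float(np.mean((label_scores - pred_scores) ** 2))
    target_difference = initial_difference - margin
    return min(target_difference, preset_difference)

# Example with 5 preset category scores (illustrative values, not from the patent).
s = np.array([1.0, 0.8, 0.5, 0.2, 0.0])
pred_p = np.array([[0.1, 0.6, 0.2, 0.1, 0.0]])    # predicted component proportions
label_p = np.array([[0.0, 0.7, 0.2, 0.1, 0.0]])   # annotation component proportions
U = detection_scores(pred_p, s)
V = detection_scores(label_p, s)
print(mse_loss_with_margin(U, V, margin=0.05))
```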
In the embodiment of the present application, S505 is followed by S506 and S507; that is, after the first device determines that the initial detection model after the iterative training is the preset detection model, the image detection method further includes S506 and S507, which are described below.
S506, the first equipment obtains a new sample to be detected.
In the embodiment of the application, when the preset detection model needs to be optimized, the first device obtains a new sample to be detected, so that the preset detection model is optimized according to the new sample to be detected.
And S507, optimizing the preset detection model by the first equipment by using the new sample to be detected to obtain the optimized preset detection model.
In the embodiment of the application, after the first device obtains a new sample to be detected, the preset detection model can be optimized based on the new sample to be detected, so that regression detection of the image is performed by using the optimized preset detection model. Here, the optimization process is similar to the process of training to obtain the preset detection model, and the embodiment of the present application is not described herein again. Correspondingly, at the moment, the detection score of the image to be detected is obtained by using the optimized preset detection model.
It can be understood that, by continuously optimizing the preset detection model, the first device can improve the generalization ability of the optimized preset detection model; accordingly, when regression detection of an image is performed based on the optimized preset detection model, the accuracy of image detection can be further improved.
In the embodiment of the present application, S508 to S511 are further included after S505, and the following steps are respectively described.
And S508, acquiring the image to be detected by the second equipment.
In the embodiment of the application, when the second device performs image detection, the target image is detected, namely, an image to be detected; thus, at this time, the second device also acquires the image to be detected.
It should be noted that image detection refers to processing for detecting an image, for example, detecting the quality of the image, or detecting the degree to which the face is occluded in the area where the face is located in the image; in addition, the image to be detected is the image finally detected by the second device. For example, when the detection item is the degree to which a mask occludes the face, the image to be detected refers to the area where the face is located; when the detection item is image quality, the image to be detected is the image to be subjected to image quality detection.
S509, the second device extracts the to-be-processed features of the to-be-detected image by using a preset detection model, and determines at least one component proportion corresponding to at least one preset category based on the to-be-processed features.
In the embodiment of the application, a preset detection model is deployed in the second device, and the second device further includes at least one preset category corresponding to the preset detection model, where the preset detection model is used to predict at least one component proportion of the image to be detected corresponding to the at least one preset category. Therefore, after the second device obtains the image to be detected, the image to be detected is input into the preset detection model; the feature of the image to be detected is extracted through the preset detection model to obtain the feature to be processed, and the feature to be processed is then predicted by the preset detection model, at which point the at least one component proportion corresponding to the at least one preset category is output by the preset detection model for the image to be detected. Here, the at least one preset category corresponds to the at least one component proportion one to one.
It should be noted that the at least one preset category is a category associated with a detection item of image detection. For example, when the detection item is the degree to which a mask occludes the face, the at least one preset category is: the mask occludes the nose, the mouth and the chin, not occluded, and not occluded by a mask. When the detection item is image quality, the at least one preset category is: very good (no image quality degradation at all), good (visible image quality degradation but viewing not impaired), general (clear image quality degradation that slightly impairs viewing), poor (impaired viewing) and very poor (severely impaired viewing).
The preset detection model is obtained through training that combines the Gaussian mixture distribution loss and the mean square error loss, and the at least one component proportion is at least one confidence of the image to be detected in the at least one Gaussian distribution corresponding to the at least one preset category. Here, the Gaussian mixture distribution loss refers to the distance between the true value and the predicted value obtained through the at least one Gaussian distribution corresponding to the at least one preset category, that is, the value of a loss function obtained by calculating distances between Gaussian distributions, such as a loss value obtained using a max-margin Gaussian mixture loss function. The mean square error loss refers to the difference information between the true value and the predicted value obtained through the at least one Gaussian distribution corresponding to the at least one preset category, that is, the value of a loss function obtained by calculating a difference value, such as a loss value obtained using the mean square error. In addition, each preset category corresponds to one Gaussian distribution, the obtained at least one component proportion is the position or confidence of the image in the Gaussian distribution corresponding to each preset category, and the at least one preset category corresponds to the at least one Gaussian distribution one to one.
S510, the second device performs weighted summation of at least one preset category score and the at least one component proportion to obtain a detection score.
In the embodiment of the application, at least one preset category score corresponding to the at least one preset category is preset in the second device; therefore, when combining the at least one component proportion, the second device can perform weighted summation of the at least one preset category score and the at least one component proportion, and the obtained weighted summation result is the detection score.
And S511, the second equipment determines a target detection result of the image to be detected based on the detection score.
In the embodiment of the application, after the second device obtains the detection score, the obtained detection score is compared with a threshold preset in the second device to determine a target detection result of the image to be detected.
It should be noted that a plurality of thresholds may be preset in the second device for comparison with the detection score, or one threshold may be preset, and the target detection result is determined according to the preset threshold and the magnitude of the detection score. That is to say, the detection score is the score of the image to be detected for the detection item, and the target detection result is a detection category determined based on that score, where the detection category may be related or unrelated to the at least one preset category; alternatively, the target detection result is processing information determined based on the score of the image to be detected for the detection item, such as allowing passage, face-brushing failure, or prompting a re-upload because the image quality is too low.
Since the preset detection model is obtained by training with the combined loss of the Gaussian mixture distribution loss and the mean square error loss, the feature to be processed of the image to be detected can be extracted by the preset detection model, the component proportions corresponding to the preset categories can be predicted based on the feature to be processed, and the target detection result can then be determined based on the detection score of the component proportions. In this way, the target detection result of the image to be detected is determined by a single numerical value that can represent the degree to which the image to be detected corresponds to the detection item; therefore, the fineness of the image detection result is improved, and the fineness of image detection can be improved. In addition, because the target detection result is determined by one numerical value covering all categories, when a category is newly added, no training for the newly added category is needed, and the applicability of image detection is high.
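As an illustration of the S509-S510 scoring path, the following NumPy sketch turns a feature vector into per-category confidences under each preset category's Gaussian distribution and then weights them by the preset category scores; the assumption that the preset detection model exposes per-category means and variances, and all array shapes and values, are illustrative.

```python
import numpy as np

def component_proportions(feature: np.ndarray, means: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """S509: confidence of the image to be detected in each preset category's
    Gaussian distribution, normalized so the component proportions sum to 1."""
    densities = np.array([
        np.prod(2.0 * np.pi * variances[j]) ** -0.5
        * np.exp(-0.5 * np.sum((feature - means[j]) ** 2 / variances[j]))
        for j in range(means.shape[0])
    ])
    return densities / densities.sum()

def detection_score(feature: np.ndarray, means: np.ndarray,
                    variances: np.ndarray, category_scores: np.ndarray) -> float:
    """S510: weighted sum of the preset category scores and the component proportions."""
    return float(component_proportions(feature, means, variances) @ category_scores)

# Example with 5 preset categories and a 4-dimensional feature (all values illustrative).
rng = np.random.default_rng(0)
means = rng.normal(size=(5, 4))
variances = np.ones((5, 4))
scores = np.array([1.0, 0.8, 0.5, 0.2, 0.0])
print(detection_score(rng.normal(size=4), means, variances, scores))
```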
In the embodiment of the present application, S511 may be implemented by S5111 and S5112; that is, the second device determines the target detection result of the image to be detected, including S5111 and S5112, based on the detection score, and the steps are explained separately below.
S5111, the second equipment compares the detection score with at least one preset result threshold value to obtain a comparison result.
In the embodiment of the application, at least one preset result threshold is preset in the second device and used for determining an image detection result; therefore, after the second device obtains the detection score, the detection score is compared with at least one preset result threshold value, and a comparison result is obtained, so that the detection result of the image to be detected for the detection item is determined according to the comparison result.
And S5112, the second equipment determines a target detection result of the image to be detected according to the comparison result.
It should be noted that the second device is preset with a detection result corresponding to the comparison result, so that when the second device obtains the comparison result, the target detection result of the image to be detected can be determined. For example, if the comparison result is that the detection score 0.12 is greater than one threshold 0 of at least one preset result threshold and is less than the other threshold 0.2, it can be determined that the image to be detected meets the target detection result of the mask wearing standard.
Exemplarily, referring to fig. 7, fig. 7 is a schematic diagram of at least one preset result threshold provided by an embodiment of the present application. As shown in fig. 7, when detecting the degree to which a mask occludes the face in an image, the at least one preset result threshold includes 5 thresholds, namely threshold 7-11:1, threshold 7-12:0.8, threshold 7-13:0.5, threshold 7-14:0.2 and threshold 7-15:0, which correspond in turn to the categories of images 7-21 to 7-25. When, due to business needs, users whose mouths are occluded are not allowed to enter the building, it is only necessary to add a threshold 7-16:0.4 corresponding to the category of image 7-26; the model does not need to be retrained for the newly added category, and the task requirements of different mask judgments can be met only by adjusting the thresholds, so the preset detection model has strong generalization ability and a wide application range.
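The threshold comparison of S5111-S5112 and fig. 7 can be expressed as a small sketch; the threshold values below appear in the text, while the bucketing helper and its return convention are assumptions added for illustration.

```python
def meets_mask_wearing_standard(detection_score: float) -> bool:
    """Example from the text: a detection score greater than the threshold 0 and
    smaller than the threshold 0.2 meets the mask wearing standard."""
    return 0.0 < detection_score < 0.2

def threshold_bucket(detection_score: float,
                     thresholds=(0.0, 0.2, 0.4, 0.5, 0.8, 1.0)) -> int:
    """Fig. 7 style comparison: return the index of the threshold interval that
    contains the score; the newly added business threshold 0.4 is just one more
    entry in the tuple and requires no retraining of the preset detection model."""
    for k in range(len(thresholds) - 1):
        if thresholds[k] <= detection_score < thresholds[k + 1]:
            return k
    return len(thresholds) - 1

print(meets_mask_wearing_standard(0.12), threshold_bucket(0.12))
```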
In the present embodiment, S508 may be implemented by S5081 and S5082; that is, the second apparatus acquires an image to be detected, including S5081 and S5082, and the respective steps are explained below.
And S5081, the second equipment acquires a detection instruction through an image detection interface.
In the embodiment of the application, when the second device acquires an image corresponding to a service requirement through the image detection interface (for example, an image including an area where a human face is located), or receives an operation of a detection control acting on the image detection interface (for example, when a picture is uploaded through the picture uploading button); the second device also acquires the detection instruction through the image detection interface.
The detection control refers to a component which is displayed on the interface and can trigger image detection through operation, such as a button, a link, an input box, a tab, an icon or a selection box; the detection operation is an operation for triggering image detection, for example, an operation of detecting a preset object (e.g., a human face), clicking, double-clicking, long-pressing, or sliding.
And S5082, the second equipment responds to the detection instruction and acquires an image to be detected.
In the embodiment of the application, after obtaining the detection operation, the second device responds to the detection operation and acquires the image to be detected including the face region, for example by capturing an image or obtaining an uploaded image.
Referring to fig. 8, fig. 8 is a schematic diagram of an exemplary acquiring of an image to be detected according to an embodiment of the present disclosure; as shown in fig. 8, when an image is acquired by a camera through an image acquisition interface 8-1 (image detection interface) in real time, if the acquired image 8-2 includes a region where a human face is located, the region where the human face is located is extracted from the acquired image, and an image to be detected 8-3 is obtained.
Correspondingly, in the embodiment of the present application, S512 is further included after S511; that is, after the second device determines the target detection result of the image to be detected based on the detection score of the ratio of at least one component, the image detection method further includes S512, which will be described below.
And S512, displaying the target detection result on the detection result interface by the second equipment.
It should be noted that, after the second device obtains the target detection result of the image to be detected, in order to complete the response to the detection instruction, the target detection result is displayed on the detection result interface; as shown in fig. 9, the image 9-11 and the target detection result 9-12 are displayed on the detection result interface 9-1: allowing passage.
In addition, the target detection result may also be other corresponding processing prompt information, such as permission to pass, access control opening failure, face-brushing failure, and the like. It is to be understood that S5081, S5082 and S512 describe a scenario in which the acquisition of the image to be detected and the image detection are performed by one device.
Similarly, when the main body of execution of the embodiment of the present application is the terminal, S5081, S5082, and S512 are executed by the terminal.
In the present embodiment, S508 may be implemented by S5083 and S5084; that is, the second apparatus acquires an image to be detected, including S5083 and S5084, and the respective steps are explained below.
S5083, the second device receives the detection request sent by the client device.
In the embodiment of the application, after the client device acquires the instruction through the interface, the client device responds to the acquired instruction, and a detection request is generated; when the client device sends a detection request to the second device, the second device also receives the detection request.
And S5084, the second equipment responds to the detection request and acquires an image to be detected.
It should be noted that, after receiving the detection request, if the detection request carries an image, the second device responds to the detection request, acquires the image from the detection request, and further acquires the image to be detected according to the acquired image; and if the detection request carries the identifier of the image to be detected, acquiring the image to be detected according to the identifier of the image to be detected.
Referring to fig. 10, fig. 10 is a schematic diagram of another exemplary method for acquiring an image to be detected according to an embodiment of the present disclosure; as shown in fig. 10, a "face-brushing payment" button 10-12 is displayed on a payment interface 10-11 of a terminal 10-1 (client device), when the "face-brushing payment" button 10-12 is clicked, an image 10-13 including a face region is collected on the payment interface 10-11, the collected image is carried in a detection request and is sent to a server 10-2 (second device), and the server 10-2 receives the detection request, acquires the image 10-13 from the detection request, and performs mask processing on the face region of the image 10-13, so as to obtain an image to be detected 10-21.
Correspondingly, in the embodiment of the present application, S511 is followed by S513; that is, after the second device determines the target detection result of the image to be detected based on the detection score of the at least one component ratio, the image detection method further includes S513, which is explained below.
And S513, the second device sends the target detection result to the client device so as to display the target detection result on a display interface of the client device.
It should be noted that, after the second device obtains the target detection result of the image to be detected, in order to complete the response to the detection request, the target detection result is sent to the client device, so that the target detection result is displayed on the display interface of the client device; as shown in fig. 11, the image to be detected 11-11 and the target detection result 11-12 are displayed on the payment interface 11-1: too much occlusion of the face area results in payment failure.
It is to be understood that S5083, S5084, and S513 describe a scenario in which the acquisition of an image to be detected is performed by one apparatus and regression detection is performed by another apparatus.
In the embodiment of the present application, the acquisition of the image to be detected by the second device in S508, S5082 and S5084 may be implemented through S5085 to S5087, and each step is described below.
And S5085, the second equipment acquires an initial image.
In the embodiment of the application, when the second device triggers image detection, the obtained image is an initial image, for example, an image including a region where a face is located is acquired in real time through a camera, and for example, an uploaded image including the region where the face is located is obtained.
And S5086, the second equipment detects the face area of the initial image by using the preset face key point information.
In the embodiment of the application, after the second device obtains the initial image including the face region, the face region of the initial image is detected by using the preset face key point information.
And S5087, determining the human face region intercepted from the initial image as the image to be detected by the second equipment.
In the embodiment of the application, after the second device obtains the face region of the initial image, the detected face region is intercepted from the initial image, and then the image to be detected is obtained.
It should be noted that S5085-S5087 describe an application scenario for detecting a face occlusion degree of an image to be detected, where at least one corresponding preset category is at least one face occlusion category.
It can be understood that when the face shielding degree of the image is detected, the image corresponding to the face region is used as the image to be detected, on one hand, the influence of other information irrelevant to the face region on the detection result is avoided, and the accuracy of image detection is improved; on the other hand, the image detection is concentrated on the human face area, so that the calculated amount of the image detection is reduced, and the image detection effect is improved.
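A minimal sketch of S5085-S5087 follows, assuming a face keypoint detector is available upstream; the `keypoints` array, the margin and the cropping rule are illustrative assumptions, since the patent only states that the face region located by the preset face key point information is intercepted from the initial image.

```python
import numpy as np

def crop_face_region(initial_image: np.ndarray, keypoints: np.ndarray, margin: int = 10) -> np.ndarray:
    """S5086/S5087: bound the preset face keypoints with a box (plus a small
    margin) and intercept that region from the initial image as the image to
    be detected; `keypoints` is an (n, 2) array of (x, y) positions produced
    by a hypothetical face keypoint detector."""
    h, w = initial_image.shape[:2]
    x_min, y_min = np.maximum(keypoints.min(axis=0).astype(int) - margin, 0)
    x_max, y_max = np.minimum(keypoints.max(axis=0).astype(int) + margin, [w - 1, h - 1])
    return initial_image[y_min:y_max + 1, x_min:x_max + 1]

# Example: crop a synthetic 480x640 image around five synthetic keypoints.
image = np.zeros((480, 640, 3), dtype=np.uint8)
keypoints = np.array([[300, 200], [340, 200], [320, 230], [305, 260], [335, 260]])
print(crop_face_region(image, keypoints).shape)
```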
Illustratively, in an exemplary process of training an initial detection model, a test result of testing a preset detection model by using a fine-grained test set is shown in table 1:
TABLE 1
Model | Pass rate Q3/Q4/Q5 | False pass Q1/Q2 | Pass rate Q4/Q5 | False pass Q1/Q2 | Pass rate Q3 | False pass Q1/Q2 | Pass rate Q2 | False pass Q1
Regression detection model | 6.29% | 1.07% | 0.31% | 0% | 10.89% | 5.83% | 5.20% | 0.83%
Classification network model | 5.06% | 1.53% | 0.15% | 0.15% | 11.96% | 4.75% | 1.87% | 0.62%
Preset detection model | 3.37% | 1.07% | 0% | 0% | 8.59% | 3.37% | 1.66% | 0.21%
As is apparent from table 1, the regression detection model is a network model for mask occlusion detection trained using the root mean square loss function; it falsely passes 1.07% of Q1 and Q2 while passing 6.29% of Q3, Q4 and Q5, falsely passes 0% of Q1 and Q2 while passing 0.31% of Q4 and Q5, falsely passes 5.83% of Q1 and Q2 while passing 10.89% of Q3, and falsely passes 0.83% of Q1 while passing 5.20% of Q2.
The classification network model is a network model for mask occlusion detection trained using the cross entropy loss function; as is apparent from table 1, it falsely passes 1.53% of Q1 and Q2 while passing 5.06% of Q3, Q4 and Q5, falsely passes 0.15% of Q1 and Q2 while passing 0.15% of Q4 and Q5, falsely passes 4.75% of Q1 and Q2 while passing 11.96% of Q3, and falsely passes 0.62% of Q1 while passing 1.87% of Q2.
The preset detection model is a network model for mask occlusion detection trained by combining the Gaussian mixture loss function and the mean square error loss function; as is apparent from table 1, it falsely passes 1.07% of Q1 and Q2 while passing 3.37% of Q3, Q4 and Q5, falsely passes 0% of Q1 and Q2 while passing 0% of Q4 and Q5, falsely passes 3.37% of Q1 and Q2 while passing 8.59% of Q3, and falsely passes 0.21% of Q1 while passing 1.66% of Q2.
In summary, it is easy to know that the false passing rate of the preset detection model in the embodiment of the present application is the lowest compared with the regression detection model and the classification network model; therefore, the improvement of the detection effect of the preset detection model in the embodiment of the application is verified.
The following description is continued to describe the test result of the preset detection model provided in the embodiment of the present application; referring to fig. 12, when the regression detection model is used to detect a test sample, the horizontal axis represents the result score; it is easy to know that in the test result 12-1, although the result scores of the obtained test samples are continuous, the discrimination effect between the positive sample 12-2 and the negative sample 12-3 is poor; moreover, the regression detection model cannot learn an accurate score distribution only under the weak supervision label of the label type.
Referring to fig. 13, when the test sample is tested using the above-described classification test model, the horizontal axis represents the result score; it is easy to know that, in the test result 13-1, the result scores of the positive sample 13-2 are more concentrated around 1, and the result scores of the negative sample 13-3 are more concentrated around 0, i.e., the result scores are too extreme by using the classification detection model, and the occlusion degree cannot be determined precisely in a continuous process such as mask occlusion.
Referring to fig. 14, when a test sample is tested by using the preset test model provided in the embodiment of the present application, the horizontal axis represents a result score; as the Gaussian mixture loss and the mean square error loss adopted by the embodiment of the application can adaptively learn data distribution, in the test result 14-1, the output result scores of the positive sample 14-2 and the negative sample 14-3 have continuity and are robust to noise; because the continuity of the output result score is better, the requirement on the shielding degree can be flexibly configured according to the specific task.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
Referring to fig. 15, fig. 15 is a schematic diagram illustrating a training and prediction process of an exemplary pre-set detection model provided in an embodiment of the present application; as shown in fig. 15, the method comprises the following steps:
s1501, starting; namely, the server (the first device and the second device) starts the training of the initial detection model and the prediction process of the preset detection model.
S1502, marking coarse granularity; namely, the server carries out coarse-grained marking on the shielding gears of the image according to five shielding categories (at least one preset category) to obtain a marking category.
S1503, picking up a face; namely, the server uses the face key points (preset face key point information) to scratch the face in the image, so as to obtain a sample image.
S1504, training samples; namely, the server trains the sample according to the annotation category and the sample image to obtain five annotation probabilities (at least one annotation component ratio) corresponding to the five occlusion categories respectively.
S1505, model prediction; that is, the server inputs the sample image into an initial detection model (e.g., MRNet) to obtain five prediction probabilities (at least one prediction component proportion) corresponding to the five occlusion categories respectively. As shown in fig. 16, the five occlusion categories include category 16-11 to category 16-15; the sample image 16-2 is input into the initial detection model 16-3, and the probability values 16-41 to 16-45 corresponding to the categories are predicted through the convolutional layer 16-31 and the fully connected layer 16-32.
S1506, calculating a loss value corresponding to the maximum interval Gaussian mixture loss function; namely, the server obtains a loss value (Gaussian mixture distribution loss) corresponding to the maximum interval Gaussian mixture loss function according to the labeling category, the five labeling probabilities, the five prediction probabilities and the five fixed scores (at least one preset category score).
S1507, calculating a loss value corresponding to the mean square error loss function; namely, the server obtains the loss value (mean square error loss) corresponding to the mean square error loss function according to the five labeling probabilities, the five prediction probabilities and the five fixed scores.
S1508, obtaining a sum of loss values; that is, the server calculates the sum (combined loss) of the loss value corresponding to the maximum interval mixture gaussian loss function and the loss value corresponding to the mean square error loss function.
S1509, iterative training; namely, the server performs iterative training on the initial detection model by using the loss value to obtain a preset detection model.
S1510, detecting an image; after the server finishes iterative training, regression detection is carried out on the face image (image to be detected) by using a preset detection model.
And S1511, outputting the mask occlusion score (detection score). Here, as shown in fig. 17, the five occlusion categories include category 17-11 to category 17-15; the face image 17-2 is input into the preset detection model 17-3, and the probability values 17-41 to 17-45 (at least one component proportion) corresponding to the categories are predicted through the convolutional layer 17-31 and the fully connected layer 17-32. The probability value corresponding to each category is multiplied by the fixed score corresponding to that category (scores 17-51 to 17-55) and the products are summed, and the mask occlusion score 17-6 is then output; namely, mask occlusion score 17-6 = probability value 17-41 × score 17-51 + probability value 17-42 × score 17-52 + probability value 17-43 × score 17-53 + probability value 17-44 × score 17-54 + probability value 17-45 × score 17-55.
S1512, ending; namely, the server finishes the training of the initial detection model and the prediction process of the preset detection model.
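The S1504-S1509 loop can be summarized in a short framework-agnostic sketch; the `forward`, `gm_loss`, `mse_loss`, `backward` and `step` hooks on the model wrapper are hypothetical names introduced only for illustration, and the epoch budget stands in for whatever training cut-off condition is actually used.

```python
def combined_loss(gm_loss_value: float, mse_loss_value: float) -> float:
    """S1508: the combined loss is the sum of the loss value of the maximum
    interval Gaussian mixture loss function and the loss value of the mean
    square error loss function."""
    return gm_loss_value + mse_loss_value

def train(initial_model, samples, epochs: int = 10):
    """S1509 outline: iterate over the training samples until the training
    cut-off condition (here, a fixed epoch budget) is met; the hooks used on
    `initial_model` are hypothetical, not names taken from the patent."""
    for _ in range(epochs):
        for image, label, label_proportions in samples:
            pred_proportions = initial_model.forward(image)                         # S1505 model prediction
            l1 = initial_model.gm_loss(pred_proportions, label, label_proportions)  # S1506
            l2 = initial_model.mse_loss(pred_proportions, label_proportions)        # S1507
            initial_model.backward(combined_loss(l1, l2))                           # S1508 combined loss
            initial_model.step()                                                    # parameter update
    return initial_model  # the trained model serves as the preset detection model
```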
It should be noted that S1501 to S1512 describe an application scenario of mask occlusion detection. For example, in a face-brushing payment scene, if the output mask occlusion score indicates that the degree to which the face is occluded is high, the user is prompted to use another payment mode such as code-scanning payment; if the output mask occlusion score indicates that the degree to which the face is occluded is low, the user is allowed to perform face-brushing payment after pulling the mask down to a proper position. In this way, face-brushing payment convenience and payment safety are balanced.
For another example, in a scenario in which face-brushing access control is used to detect whether a mask is worn in a standard manner, if the output mask occlusion score indicates that the degree to which the mask occludes the face is high, passage is allowed, as shown in fig. 9; and if the output mask occlusion score indicates that the degree to which the mask occludes the face is low, passage is not allowed; as shown in fig. 18, an image 18-11 and a target detection result 18-12 are displayed on a detection result interface 18-1: passage is not allowed.
For another example, in an intelligent community scenario, if the output mask occlusion score indicates that the degree to which the mask occludes the face is high, a mask-wearing indication is displayed; and if the output mask occlusion score indicates that the degree to which the mask occludes the face is low, face identification is performed and the result is filed and displayed.
Referring to fig. 19, fig. 19 is a schematic diagram illustrating a training and prediction process of another exemplary preset detection model provided in the embodiment of the present application; as shown in fig. 19, the method comprises the following steps:
s1901, starting; that is, the server (the first device and the second device) starts the training of the initial detection model and the prediction process of the preset detection model.
S1902, marking coarse granularity; namely, the server performs coarse-grained marking on the quality gear of the image according to five quality categories (at least one preset category) to obtain a marking category.
S1903, training samples; namely, the server trains the sample according to the labeling categories and the sample image to obtain five labeling probabilities (at least one labeling component ratio) respectively corresponding to the five quality categories.
S1904, model prediction; that is, the server inputs the sample image into an initial detection model (e.g., MRNet) to obtain five prediction probabilities (at least one prediction component ratio) corresponding to the five quality classes, respectively.
S1905, calculating a loss value corresponding to the maximum interval Gaussian mixture loss function; namely, the server obtains a loss value (Gaussian mixture distribution loss) corresponding to the maximum interval Gaussian mixture loss function according to the labeling category, the five labeling probabilities, the five prediction probabilities and the five fixed scores (at least one preset category score).
S1906, calculating a loss value corresponding to the mean square error loss function; namely, the server obtains the loss value (mean square error loss) corresponding to the mean square error loss function according to the five labeling probabilities, the five prediction probabilities and the five fixed scores.
S1907, obtaining a sum of loss values; that is, the server calculates the sum (combined loss) of the loss value corresponding to the maximum interval mixture gaussian loss function and the loss value corresponding to the mean square error loss function.
S1908, iterative training; namely, the server performs iterative training on the initial detection model by using the loss value to obtain a preset detection model.
S1909, image detection; after the server finishes iterative training, regression detection is carried out on the image (to-be-detected image) by using a preset detection model.
S1910, the image quality score (detection score) is output.
S1911, ending; namely, the server finishes the training of the initial detection model and the prediction process of the preset detection model. It is readily appreciated that S1901 to S1911 describe an application scenario of image quality detection.
Continuing with the exemplary structure of the first image detection device 255 implemented as software modules provided in the embodiments of the present application, in some embodiments, as shown in fig. 4a, the software modules stored in the first image detection device 255 of the first memory 250 may include:
a sample obtaining module 2551, configured to obtain a sample to be detected, where the sample to be detected includes a sample image, an annotation category, and at least one annotation component proportion corresponding to at least one preset category;
a prediction module 2552, configured to perform image detection on the sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to the at least one preset category;
a loss obtaining module 2553, configured to obtain, based on the labeled category, a difference between the at least one predicted component proportion and the at least one labeled component proportion, so as to obtain a mixed gaussian distribution loss;
the loss obtaining module 2553 is further configured to obtain a difference between the at least one predicted component proportion and the at least one labeled component proportion based on at least one preset category score, so as to obtain a mean square error loss;
a model training module 2554, configured to perform iterative training on the initial detection model by using a combined loss of the gaussian distribution loss and the mean square error loss, until a training cutoff condition is met, determine that the initial detection model after the iterative training is a preset detection model; the preset detection model is used for obtaining the detection value of the image to be detected.
In this embodiment of the application, the sample obtaining module 2551 is further configured to obtain the sample image and the annotation category; fit, by using the sample image and the annotation category, the at least one annotation component proportion of the sample image corresponding to the at least one preset category; and combine the sample image, the annotation category and the at least one annotation component proportion to obtain the sample to be detected.
In this embodiment of the application, the loss obtaining module 2553 is further configured to obtain a target predicted component proportion corresponding to the labeling category from the at least one predicted component proportion; acquiring a target labeling component proportion corresponding to the labeling category from the at least one labeling component proportion; calculating a product result of the target prediction component proportion and the target labeling component proportion; calculating a weighted summation of the at least one predicted component proportion and the at least one annotated component proportion; and calculating the ratio of the multiplication result to the weighted summation result, and finishing the acquisition of the difference between the at least one prediction component ratio and the at least one labeling component ratio, thereby obtaining the mixed Gaussian distribution loss.
In this embodiment of the application, the loss obtaining module 2553 is further configured to perform weighted summation on the ratio between the at least one preset category score and the at least one prediction component, so as to obtain a prediction score; weighting and summing the ratio of the at least one preset category score to the at least one labeled component to obtain a labeled score; acquiring the difference between the annotation score and the prediction score to obtain an initial difference; obtaining a current difference, and calculating a target difference between the initial difference and the current difference; and determining the minimum difference between the target difference and a preset difference as the mean square error loss.
In this embodiment of the application, the first image detection apparatus 255 further includes a model optimization module 2555, configured to obtain a new sample to be detected; and optimizing the preset detection model by using the new sample to be detected to obtain the optimized preset detection model.
Continuing with the exemplary structure of second image detection apparatus 655 as implemented as a software module provided by the embodiments of the present application, in some embodiments, as shown in fig. 4b, the software module stored in second image detection apparatus 655 of second memory 650 may include:
an image obtaining module 6551, configured to obtain an image to be detected;
an image detection module 6552, configured to extract a feature to be processed of the image to be detected by using the preset detection model, and determine at least one component proportion corresponding to at least one preset category based on the feature to be processed; wherein the at least one component proportion is at least one confidence coefficient of the image to be detected in at least one Gaussian distribution corresponding to the at least one preset category;
and the result determining module 6553 is configured to perform weighted summation on at least one preset category score and the at least one component ratio to obtain a detection score, and determine a target detection result of the image to be detected based on the detection score.
In this embodiment of the application, the result determining module 6553 is further configured to compare the detection score with at least one preset result threshold to obtain a comparison result; and determining the target detection result of the image to be detected according to the comparison result.
In this embodiment of the application, the image obtaining module 6551 is further configured to obtain a detection instruction through an image detection interface; responding to the detection instruction, and acquiring the image to be detected;
in this embodiment, the second image detection apparatus 655 further includes a result processing module 6554, configured to display the target detection result on a detection result interface.
In this embodiment of the application, the image obtaining module 6551 is further configured to receive a detection request sent by a client device; responding to the detection request, and acquiring the image to be detected;
in this embodiment, the result processing module 6554 is further configured to send the target detection result to the client device, so as to display the target detection result on a display interface of the client device.
In this embodiment of the present application, the image obtaining module 6551 is further configured to obtain an initial image; detecting a face region of the initial image by using preset face key point information; and determining the face region intercepted from the initial image as the image to be detected.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A first processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the image detection method applied to the first device in the embodiment of the present application; for example, an image detection method as shown in fig. 5. The second processor of the computer device reads the computer instructions from the computer-readable storage medium, and the second processor executes the computer instructions, so that the computer device executes the image detection method applied to the second device in the embodiment of the present application.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiment of the present application, the preset detection model is obtained by combining two loss training parts, namely, gaussian mixture loss and mean square error loss, so that after the to-be-processed features of the to-be-detected image are extracted by using the preset detection model and the component ratios corresponding to the preset categories are predicted based on the to-be-processed features, the target detection result can be determined based on the detection scores of the component ratios; therefore, the target detection result of the determined image to be detected is determined by a numerical value, and the degree of the image to be detected corresponding to the detection item can be represented; therefore, the fineness of the image detection result is high, and thus, the fineness of the image detection can be improved.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. An image detection method, comprising:
obtaining a sample to be detected, wherein the sample to be detected comprises a sample image, an annotation category and at least one annotation component proportion corresponding to at least one preset category, the at least one annotation component proportion is obtained by training according to the sample image and the annotation category, and the at least one annotation component proportion is a real component proportion corresponding to each preset category in the at least one preset category;
performing image detection on the sample image based on an initial detection model to obtain at least one predicted component ratio corresponding to the at least one preset category, wherein the at least one predicted component ratio is a component ratio corresponding to each preset category in the at least one preset category predicted for the sample image;
obtaining the difference between the at least one predicted component proportion and the at least one labeled component proportion based on the labeling category to obtain mixed Gaussian distribution loss;
obtaining the difference between the at least one predicted component proportion and the at least one labeled component proportion based on at least one preset category score to obtain mean square error loss;
performing iterative training on the initial detection model by using the combined loss of the Gaussian mixture distribution loss and the mean square error loss until a training cut-off condition is met, and determining the initial detection model after the iterative training as a preset detection model; the preset detection model is used for obtaining the detection score of the image to be detected.
2. The method according to claim 1, wherein the obtaining a sample to be tested comprises:
acquiring the sample image and the annotation category;
fitting, by using the sample image and the labeled category, the at least one labeled component proportion of the sample image corresponding to the at least one preset category;
and combining the sample image, the labeled category and the at least one labeled component proportion to obtain the sample to be detected.
3. The method according to claim 1 or 2, wherein the obtaining a difference between the at least one predicted component proportion and the at least one labeled component proportion based on the labeling category to obtain a mixed gaussian distribution loss comprises:
obtaining a target prediction component proportion corresponding to the labeling category from the at least one prediction component proportion;
acquiring a target labeling component proportion corresponding to the labeling category from the at least one labeling component proportion;
calculating the product result of the target prediction component proportion and the target labeling component proportion;
calculating a weighted summation of the at least one predicted component proportion and the at least one annotated component proportion;
and calculating the ratio of the product result to the weighted sum result, and obtaining the difference between the at least one prediction component proportion and the at least one labeling component proportion, so as to obtain the Gaussian mixture distribution loss.
4. The method according to claim 1 or 2, wherein said deriving a difference between said at least one predicted component fraction and said at least one annotated component fraction based on at least one preset category score, resulting in a mean square error loss, comprises:
weighting and summing the ratio of the at least one preset category score to the at least one prediction component to obtain a prediction score;
weighting and summing the ratio of the at least one preset category score to the at least one labeled component to obtain a labeled score;
acquiring the difference between the annotation score and the prediction score to obtain an initial difference;
obtaining a current difference, and calculating a target difference between the initial difference and the current difference;
and determining the minimum difference between the target difference and a preset difference as the mean square error loss.
5. The method according to claim 1 or 2, wherein after determining that the iteratively trained initial detection model is a preset detection model, the method further comprises:
acquiring a new sample to be detected;
and optimizing the preset detection model by using the new sample to be detected to obtain the optimized preset detection model.
6. An image detection method, comprising:
acquiring an image to be detected;
extracting the to-be-processed features of the to-be-detected image by using the preset detection model of any one of claims 1 to 5; determining at least one component proportion corresponding to at least one preset category based on the features to be processed; wherein the at least one component proportion is at least one confidence coefficient of the image to be detected in at least one Gaussian distribution corresponding to the at least one preset category;
weighting and summing at least one preset category score and the at least one component ratio to obtain a detection score;
and determining a target detection result of the image to be detected based on the detection score.
7. The method according to claim 6, wherein the determining a target detection result of the image to be detected based on the detection score comprises:
comparing the detection score with at least one preset result threshold value to obtain a comparison result;
and determining the target detection result of the image to be detected according to the comparison result.
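A minimal sketch of the threshold comparison in claim 7; the number of thresholds and the result labels are purely illustrative.

```python
def classify_by_thresholds(score, thresholds, results):
    """Claim-7 sketch: compare the detection score against ordered preset
    result thresholds and return the matching target result.  `results`
    holds one more entry than `thresholds`."""
    for threshold, result in zip(sorted(thresholds), results):
        if score <= threshold:
            return result
    return results[-1]
```

For example, with illustrative thresholds [0.4, 0.8] and results ["low", "medium", "high"], a detection score of 0.72 would map to "medium".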
8. The method of claim 6, wherein the acquiring the image to be detected comprises:
acquiring a detection instruction through an image detection interface;
responding to the detection instruction, and acquiring the image to be detected;
after determining the target detection result of the image to be detected based on the detection score, the method further comprises:
and displaying the target detection result on a detection result interface.
9. The method of claim 6, wherein the acquiring the image to be detected comprises:
receiving a detection request sent by a client device;
responding to the detection request, and acquiring the image to be detected;
after determining the target detection result of the image to be detected based on the detection score, the method further comprises:
and sending the target detection result to the client device so as to display the target detection result on a display interface of the client device.
10. The method according to any one of claims 6 to 9, wherein the acquiring of the image to be detected comprises:
acquiring an initial image;
detecting a face region of the initial image by using preset face key point information;
and determining the face region cropped from the initial image as the image to be detected.
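A minimal sketch of the face-region cropping in claim 10, assuming the preset face key point information is already available as (x, y) coordinates; the bounding-box derivation and the margin are assumptions.

```python
import numpy as np

def crop_face_region(initial_image, keypoints, margin=0.2):
    """Claim-10 sketch: derive a face bounding box from preset key points
    and crop it from the initial image."""
    pts = np.asarray(keypoints, dtype=float)   # (N, 2) array of (x, y) key points
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    h, w = initial_image.shape[:2]
    x0, y0 = max(int(x0 - dx), 0), max(int(y0 - dy), 0)
    x1, y1 = min(int(x1 + dx), w), min(int(y1 + dy), h)
    return initial_image[y0:y1, x0:x1]         # face region as the image to be detected
```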
11. A first image detection apparatus, comprising:
a sample acquisition module, configured to acquire a sample to be detected, wherein the sample to be detected comprises a sample image, a labeling category and at least one labeled component proportion corresponding to at least one preset category, the at least one labeled component proportion is obtained by training according to the sample image and the labeling category, and the at least one labeled component proportion is a real component proportion corresponding to each preset category in the at least one preset category;
a prediction module, configured to perform image detection on the sample image based on an initial detection model to obtain at least one predicted component proportion corresponding to the at least one preset category, wherein the at least one predicted component proportion is a component proportion, predicted for the sample image, corresponding to each preset category in the at least one preset category;
a loss obtaining module, configured to obtain a difference between the at least one predicted component proportion and the at least one labeled component proportion based on the labeling category, so as to obtain a Gaussian mixture distribution loss;
the loss obtaining module being further configured to obtain a difference between the at least one predicted component proportion and the at least one labeled component proportion based on at least one preset category score, so as to obtain a mean square error loss;
and a model training module, configured to iteratively train the initial detection model by using a combined loss of the Gaussian mixture distribution loss and the mean square error loss until a training cut-off condition is met, and determine the iteratively trained initial detection model as a preset detection model, wherein the preset detection model is used for obtaining a detection score of an image to be detected.
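A minimal sketch of the joint objective the model training module iterates on, reusing the loss sketches above; the weighting coefficients between the two losses are assumptions.

```python
def combined_loss(category_scores, pred_props, label_props, label_idx,
                  mixture_weight=1.0, mse_weight=1.0):
    """Joint objective sketch: weighted combination of the Gaussian mixture
    distribution loss and the mean square error loss defined earlier."""
    l_mix = mixture_distribution_loss(pred_props, label_props, label_idx)
    l_mse = mean_square_error_loss(category_scores, pred_props, label_props)
    return mixture_weight * l_mix + mse_weight * l_mse
```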
12. A second image detection apparatus, comprising:
an image acquisition module, configured to acquire an image to be detected;
an image detection module, configured to extract features to be processed from the image to be detected by using the preset detection model according to any one of claims 1 to 5, and determine at least one component proportion corresponding to at least one preset category based on the features to be processed, wherein the at least one component proportion is at least one confidence of the image to be detected in at least one Gaussian distribution corresponding to the at least one preset category;
and a result determining module, configured to perform a weighted summation of at least one preset category score and the at least one component proportion to obtain a detection score, and determine a target detection result of the image to be detected based on the detection score.
13. A first image detection apparatus characterized by comprising:
a first memory for storing executable instructions;
a first processor for implementing the image detection method of any one of claims 1 to 5 when executing executable instructions stored in the first memory.
14. A second image detection apparatus characterized by comprising:
a second memory for storing executable instructions;
a second processor, configured to implement the image detection method of any one of claims 6 to 10 when executing the executable instructions stored in the second memory.
15. A computer-readable storage medium storing executable instructions for implementing the image detection method of any one of claims 1 to 5 when executed by a first processor, or for implementing the image detection method of any one of claims 6 to 10 when executed by a second processor.
CN202010795779.6A 2020-08-10 2020-08-10 Image detection method, device, equipment and computer readable storage medium Active CN111898577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010795779.6A CN111898577B (en) 2020-08-10 2020-08-10 Image detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010795779.6A CN111898577B (en) 2020-08-10 2020-08-10 Image detection method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111898577A CN111898577A (en) 2020-11-06
CN111898577B (en) 2022-08-26

Family

ID=73246808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010795779.6A Active CN111898577B (en) 2020-08-10 2020-08-10 Image detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111898577B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464827A (en) * 2020-11-30 2021-03-09 深圳集智数字科技有限公司 Mask wearing identification method, device, equipment and storage medium
CN115705688A (en) * 2021-08-10 2023-02-17 万维数码智能有限公司 Ancient and modern artwork identification method and system based on artificial intelligence
CN113989558B (en) * 2021-10-28 2024-04-30 哈尔滨工业大学 Weak supervision target detection method based on transfer learning and bounding box adjustment
CN116012656B (en) * 2023-01-20 2024-02-13 北京百度网讯科技有限公司 Sample image generation method and image processing model training method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389004A (en) * 2007-09-13 2009-03-18 中国科学院自动化研究所 Moving target classification method based on on-line study
CN105260738A (en) * 2015-09-15 2016-01-20 武汉大学 Method and system for detecting change of high-resolution remote sensing image based on active learning
EP3588381A1 (en) * 2018-06-25 2020-01-01 Fujitsu Limited Method and apparatus for training classification model, method and apparatus for classifying
CN110458233A (en) * 2019-08-13 2019-11-15 腾讯云计算(北京)有限责任公司 Combination grain object identification model training and recognition methods, device and storage medium
CN110765942A (en) * 2019-10-23 2020-02-07 睿魔智能科技(深圳)有限公司 Image data labeling method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Rethinking Feature Distribution for Loss Functions in Image Classification; Weitao Wan et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018; full text *
A Survey of Deep Neural Network Compression and Acceleration (深度神经网络压缩与加速综述); Ji Rongrong et al.; Journal of Computer Research and Development (计算机研究与发展); September 2018; full text *

Also Published As

Publication number Publication date
CN111898577A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN111898577B (en) Image detection method, device, equipment and computer readable storage medium
Brown et al. Finding waldo: Learning about users from their interactions
CN109241709B (en) User behavior identification method and device based on slider verification code verification
CN110929807B (en) Training method of image classification model, and image classification method and device
CN112884092B (en) AI model generation method, electronic device, and storage medium
EP3989104A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
US10572778B1 (en) Machine-learning-based systems and methods for quality detection of digital input
US20190311114A1 (en) Man-machine identification method and device for captcha
CN110781919A (en) Classification model training method, classification device and classification equipment
KR102284862B1 (en) Method for providing video content for programming education
CN111582342B (en) Image identification method, device, equipment and readable storage medium
CN110826453A (en) Behavior identification method by extracting coordinates of human body joint points
CN107766316B (en) Evaluation data analysis method, device and system
CN110139067A (en) A kind of wild animal monitoring data management information system
CN111708913B (en) Label generation method and device and computer readable storage medium
CN109784015A (en) A kind of authentication identifying method and device
CN115049057B (en) Model deployment method and device, electronic equipment and storage medium
CN112509690A (en) Method, apparatus, device and storage medium for controlling quality
CN113297393A (en) Situation awareness and big data based information generation method and information security system
CN111782068A (en) Method, device and system for generating mouse track and data processing method
US20220366244A1 (en) Modeling Human Behavior in Work Environment Using Neural Networks
CN110245207B (en) Question bank construction method, question bank construction device and electronic equipment
CN113190444B (en) Test method, test device and storage medium
CN103227810B (en) A kind of methods, devices and systems identifying remote desktop semanteme in network monitoring
CN115690544B (en) Multi-task learning method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant