CN114140613A

CN114140613A - Image detection method, image detection device, electronic equipment and storage medium

Info

Publication number: CN114140613A
Application number: CN202111491712.4A
Authority: CN
Inventors: 吉梁; 周杰; 黄凯
Original assignee: Beijing Youzhuju Network Technology Co Ltd
Current assignee: Beijing Youzhuju Network Technology Co Ltd
Priority date: 2021-12-08
Filing date: 2021-12-08
Publication date: 2022-03-04

Abstract

The method comprises: inputting an image to be detected into a first model and a second model respectively, and acquiring global information output by the first model and a plurality of pieces of local information output by the second model, wherein the global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area; and determining a detection result according to the plurality of pieces of local information and the global information, wherein the detection result indicates whether the image to be detected includes the target object. The method and the device can effectively improve the accuracy of detecting the target object in the image to be detected.

Description

Image detection method, image detection device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image detection method and apparatus, an electronic device, and a storage medium.

Background

With the development of computer technology, detecting and identifying all target objects of interest in an image, and determining object information such as their positions and categories in the image, has become one of the core problems of computer vision.

However, the target object often occupies only a small area of the image, while the background outside the target object occupies a large area. This large background region can interfere with detecting the target object, which reduces the accuracy of detecting and identifying the target object and makes the detection accuracy difficult to guarantee.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides an image detection method, including:

respectively inputting an image to be detected into a first model and a second model, and acquiring global information output by the first model and a plurality of pieces of local information output by the second model, wherein the global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area;

and determining a detection result according to the plurality of pieces of local information and the global information, wherein the detection result indicates whether the image to be detected includes the target object.

In a second aspect, the present disclosure provides an image detection apparatus, comprising:

an information acquisition module, configured to respectively input an image to be detected into a first model and a second model, and acquire global information output by the first model and a plurality of pieces of local information output by the second model, wherein the global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area;

and a detection module, configured to determine a detection result according to the plurality of pieces of local information and the global information, wherein the detection result indicates whether the image to be detected includes the target object.

In a third aspect, the present disclosure provides a computer-readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.

In a fourth aspect, the present disclosure provides an electronic device comprising:

a storage device having a computer program stored thereon;

a processing device, configured to execute the computer program in the storage device to carry out the steps of the method of the first aspect.

According to the image detection method, apparatus, electronic device and storage medium provided by the disclosure, an image to be detected is input into a first model and a second model respectively to obtain global information output by the first model and a plurality of pieces of local information output by the second model. The global information comprises a confidence that a target object exists in the image to be detected; each piece of local information comprises a confidence that the target object exists in one local area of the image to be detected, and at least one detection object exists in each local area. A detection result indicating whether the image to be detected includes the target object is then determined according to the local information and the global information. Because the local information focuses on the target object, the influence of the background outside the target object on the detection result is avoided, while the global information keeps the overall content of the image in view, so the detection accuracy is effectively improved.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flow chart illustrating an image detection method according to an exemplary embodiment.

Fig. 2 is a schematic diagram illustrating an image to be detected according to an exemplary embodiment.

Fig. 3 is a flowchart illustrating an image detection method according to another exemplary embodiment.

Fig. 4 is a flowchart illustrating step 220 of the image detection method according to the embodiment of fig. 3.

Fig. 5 is a flowchart illustrating an image detection method according to yet another exemplary embodiment.

Fig. 6 is a schematic flow chart of an image detection method in practical application according to the embodiment shown in fig. 5.

Fig. 7 is a block diagram illustrating an image detection apparatus according to an exemplary embodiment.

Fig. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an" and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

In recent years, deep learning has developed rapidly and attracted wide attention worldwide. With the continuous progress of deep learning technology and the continuous improvement of data processing capability, more and more deep learning algorithms are used in image processing and computer vision. Object detection is an important direction of computer vision.

In the related art, general object detection determines whether a natural image contains a target entity based on a large number of predefined categories. Specifically, all entities in the natural image are first detected, then it is determined whether the attributes of each entity meet the target requirement, and if so, the image is determined to contain the target entity.

However, this approach depends heavily on the performance of the detector, and missed or false detections easily occur when the image contains many entities. This degrades the final detection result, so its accuracy cannot be guaranteed.

In order to solve the above problems, the present disclosure provides an image detection method, an image detection device, an electronic device, and a storage medium, which can effectively improve the detection accuracy of a target object in an image to be detected.

Fig. 1 is a flow chart illustrating an image detection method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps:

110. Input the image to be detected into the first model and the second model respectively, and acquire the global information output by the first model and the plurality of pieces of local information output by the second model. The global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area.

For example, the method may be executed by an image detection device, including but not limited to a computer, a smartphone, a vehicle-mounted terminal or a wearable smart device; specifically, the image detection method may be applied to a processor in the image detection device. The image detection device may be configured in advance with a trained first model and a trained second model. The trained first model outputs, for an input image to be detected, the confidence that the image contains the target object; the trained second model outputs, for the same input, the confidence that each of a plurality of local areas of the image contains the target object.

Optionally, the target object may be defined by target attributes configured by the user in the image detection device. For example, if the detection object is a shoe in the image, the configurable target attributes may include color, size, shape and the like.

For example, as shown in Fig. 2, the image A to be detected may be divided into local area 1, local area 2, local area 3 and local area 4, each of which is a rectangular box containing a detection object (a shoe).

In some embodiments, when the image detection device receives or acquires the image to be detected, it may input the image into the first model and the second model respectively, take the output of the first model as the global information, and take the plurality of outputs of the second model as the plurality of pieces of local information. Each piece of local information corresponds to a different local area in the image to be detected.

120. Determine a detection result according to the plurality of pieces of local information and the global information, wherein the detection result indicates whether the image to be detected includes the target object.

In some embodiments, the image detection device may compute statistics over the plurality of pieces of local information (hereinafter referred to as second confidences), for example their maximum, minimum and average values. It then splices (concatenates) these statistics with the global information (hereinafter referred to as the first confidence), inputs the resulting splicing information into a pre-trained fusion model, and takes the output of the fusion model as the detection result. The fusion model is trained to output, from the statistics of the second confidences and the first confidence, a classification result indicating whether the image to be detected contains the target object with the target attribute. For example, if the maximum of the second confidences is a, the minimum is b, the average is c and the first confidence is k, the splicing information is (a, b, c, k), and inputting (a, b, c, k) into the fusion model yields the detection result.
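The construction of the splicing information (a, b, c, k) can be sketched as below. This is a minimal illustration, assuming the second confidences arrive as a plain list of floats and the first confidence as a single float; the function name `splice_information` is hypothetical, and the fusion model that consumes the tuple is not shown.

```python
from statistics import mean


def splice_information(local_confidences, global_confidence):
    """Build the splicing information (a, b, c, k) described in step 120.

    a, b, c: maximum, minimum and average of the second confidences
             (one per local area); k: the first (global) confidence.
    """
    if not local_confidences:
        # Hypothetical choice: the source does not say what happens when
        # the second model returns no local areas.
        raise ValueError("at least one local confidence is required")
    a = max(local_confidences)
    b = min(local_confidences)
    c = mean(local_confidences)
    return (a, b, c, global_confidence)


# Example: four local areas (as in Fig. 2) and a first confidence of 0.7.
print(splice_information([0.9, 0.2, 0.6, 0.5], 0.7))
```

Whatever the number of local areas, the result always has four entries, which is what lets a fixed-input fusion model consume it.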

In other embodiments, the image detection device may determine whether each value in the splicing information lies within a corresponding specified range, for example whether the maximum a lies within a first specified range, the minimum b within a second specified range, the average c within a third specified range and the first confidence k within a fourth specified range. When every value in the splicing information lies within its corresponding range, it is determined that the target object exists in the image to be detected; otherwise, it is determined that the target object does not exist. Optionally, it may instead be determined that the target object exists when the number of values lying within their corresponding ranges exceeds a specified number; for example, it is determined that the image A to be detected includes black shoes.
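The range-based rule can be sketched as follows. The concrete ranges are not given in the source, so the `(low, high)` pairs used in the example are assumptions chosen only to show both the "all values in range" and the "more than a specified number in range" variants; `rule_based_decision` and `min_hits` are hypothetical names.

```python
def in_range(value, bounds):
    """True if value lies in the closed interval [low, high]."""
    low, high = bounds
    return low <= value <= high


def rule_based_decision(splice, ranges, min_hits=None):
    """Decide whether the target object is present from splicing information.

    splice:   tuple of values, e.g. (a, b, c, k).
    ranges:   one (low, high) pair per value (assumed, not from the source).
    min_hits: None -> every value must lie in its range;
              int  -> the count of in-range values must exceed min_hits.
    """
    if len(splice) != len(ranges):
        raise ValueError("one range is needed per spliced value")
    hits = sum(in_range(v, r) for v, r in zip(splice, ranges))
    if min_hits is None:
        return hits == len(splice)
    return hits > min_hits


# Hypothetical first to fourth specified ranges for (a, b, c, k).
RANGES = [(0.8, 1.0), (0.0, 1.0), (0.5, 1.0), (0.6, 1.0)]
present = rule_based_decision((0.9, 0.2, 0.55, 0.7), RANGES)
```

Returning a plain boolean keeps the rule interchangeable with the learned fusion model described in the previous paragraph.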

It can be seen that, in this embodiment, the image to be detected is input into the first model and the second model respectively to obtain the global information and the plurality of pieces of local information, and the detection result is determined by combining the two. The global information reflects the overall situation of the image to be detected, while the local information reflects the situation of each local area. Using the local information effectively reduces the influence of the background outside the target object on the detection result and prevents missed detections; taking the global information into account avoids the false detections that may result from analyzing only the local information. The detection accuracy is therefore improved.

Fig. 3 is a flowchart illustrating an image detection method according to another exemplary embodiment. As shown in Fig. 3, the method may include the following steps:

210. Input the image to be detected into the first model and the second model respectively, and acquire the global information output by the first model and the plurality of pieces of local information output by the second model. The global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area.

For the detailed implementation of step 210, refer to step 110; it is not repeated here.

220. Perform statistical processing on the plurality of pieces of local information according to a preset rule to obtain statistical information.

In some embodiments, as shown in fig. 4, a specific implementation of step 220 may include:

221. Calculate at least one statistical value among the maximum, minimum, median, mean, variance and quartiles of the plurality of pieces of local information.

Illustratively, the second model outputs, for example, n second confidences as the local information, denoted $p_i^l$, where $i = 1, \dots, n$. The image detection device may compute the maximum, minimum, median, mean, variance and quartiles of $p_1^l, \dots, p_n^l$ to obtain the various statistical values.
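The statistics listed in step 221 can be computed with Python's standard `statistics` module, as in the sketch below. It assumes population variance and the module's default ("exclusive") quartile method; the source fixes neither convention, and the function name `local_statistics` is hypothetical.

```python
import statistics


def local_statistics(confidences):
    """Candidate statistics of the n second confidences (step 221).

    Uses population variance and the default 'exclusive' method of
    statistics.quantiles; both conventions are assumptions.
    """
    if len(confidences) < 2:
        raise ValueError("quartiles need at least two confidences")
    q1, q2, q3 = statistics.quantiles(confidences, n=4)
    return {
        "max": max(confidences),
        "min": min(confidences),
        "median": statistics.median(confidences),
        "mean": statistics.fmean(confidences),
        "variance": statistics.pvariance(confidences),
        "quartiles": (q1, q2, q3),
    }


stats = local_statistics([0.9, 0.2, 0.6, 0.5])
```

Any subset of these values can then be chosen as the statistical information in step 222.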

222. Determine the at least one statistical value as the statistical information.

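Illustratively, denoting the n second confidences by $p_i^l$ ($i = 1, \dots, n$), the image detection device may determine their maximum, minimum and mean values as the statistical information, namely the maximum value $\max_{i} p_i^l$, the minimum value $\min_{i} p_i^l$ and the mean value $\frac{1}{n}\sum_{i=1}^{n} p_i^l$. The statistical information may then be determined as $\left(\max_{i} p_i^l,\ \min_{i} p_i^l,\ \frac{1}{n}\sum_{i=1}^{n} p_i^l\right)$.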

230. Splice the statistical information with the global information to obtain splicing information.

Continuing the above example, denote the global information output by the first model by $p^g$. Splicing the statistical information computed from the n second confidences $p_i^l$ with $p^g$ gives the splicing information $\left(\max_{i} p_i^l,\ \min_{i} p_i^l,\ \frac{1}{n}\sum_{i=1}^{n} p_i^l,\ p^g\right)$.

240. Determine the detection result according to the splicing information.

In some embodiments, step 240 includes: inputting the splicing information into a third model and taking the output of the third model as the detection result, wherein the third model is a gradient boosting decision tree model.

For example, the third model is a pre-trained Gradient Boosting Decision Tree (GBDT) fusion model. After the splicing information $\left(\max_{i} p_i^l,\ \min_{i} p_i^l,\ \frac{1}{n}\sum_{i=1}^{n} p_i^l,\ p^g\right)$ is input into the third model, the fusion decision output by the third model is obtained, namely a binary classification result indicating whether the image to be detected contains the target object with the target attribute. GBDT is an iterative decision tree algorithm composed of multiple decision trees whose outputs are accumulated to form the final answer, so the model can effectively determine the detection result from the multiple values in the splicing information.
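To make the "accumulate the conclusions of all trees" idea concrete, here is a toy gradient boosting sketch over splicing vectors. It is a deliberate simplification, not the patent's model: it uses depth-1 trees (stumps) and squared loss on 0/1 labels, whereas production GBDT classifiers (for example scikit-learn's `GradientBoostingClassifier`) typically use log-loss and deeper trees. The training data and class name `ToyGBDT` are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # all rows identical: fall back to a constant leaf
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]  # (feature, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


class ToyGBDT:
    """Gradient boosting with stumps and squared loss (a simplification)."""

    def __init__(self, n_trees=20, learning_rate=0.5):
        self.n_trees = n_trees
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_trees):
            # Residuals are the negative gradient of the squared loss.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump_predict(stump, row)
                    for p, row in zip(pred, X)]
        return self

    def decision(self, row):
        # Each tree's output is accumulated into the final score.
        return self.base + sum(self.learning_rate * stump_predict(s, row)
                               for s in self.stumps)

    def predict(self, row):
        return int(self.decision(row) >= 0.5)  # 1: target object present


# Hypothetical splicing vectors (max, min, mean, p^g) with 0/1 labels.
X = [(0.95, 0.30, 0.60, 0.90), (0.90, 0.40, 0.65, 0.80),
     (0.40, 0.05, 0.20, 0.30), (0.30, 0.10, 0.15, 0.20)]
y = [1, 1, 0, 0]
model = ToyGBDT().fit(X, y)
```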

In some embodiments, before the splicing information is input into the third model, it may further be determined whether the length of the splicing information meets a preset length requirement. If it does, the splicing information is input into the third model; if not, the splicing information is re-acquired and its length is checked again. For example, the preset length requirement may be that the splicing information contains no fewer than 4 elements. If the splicing information is $\left(\max_{i} p_i^l,\ \min_{i} p_i^l,\ \frac{1}{n}\sum_{i=1}^{n} p_i^l,\ p^g\right)$, its length is 4, so it meets the preset length requirement. Optionally, the preset length requirement may also be that the length of the splicing information equals a specified length.
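The check-and-re-acquire loop above might look like the following sketch. The `acquire` callable stands in for whatever re-computes the splicing information and is hypothetical, as is the `max_attempts` bound: the source does not say how many times re-acquisition may be retried.

```python
def acquire_valid_splice(acquire, min_length=4, exact_length=None, max_attempts=3):
    """Re-acquire splicing information until it meets the length requirement.

    acquire:      hypothetical zero-argument callable returning a splice tuple.
    min_length:   "no fewer than N elements" requirement.
    exact_length: if set, the "equal to a specified length" requirement is used.
    max_attempts: assumed retry bound (not specified in the source).
    """
    for _ in range(max_attempts):
        splice = acquire()
        if exact_length is not None:
            ok = len(splice) == exact_length
        else:
            ok = len(splice) >= min_length
        if ok:
            return splice
    raise RuntimeError("splicing information never met the length requirement")
```

Bounding the retries avoids an infinite loop if the upstream models keep producing malformed output.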

Because the number of pieces of local information output by the second model may vary, directly splicing the local information with the global information would produce splicing information of variable length, whereas the model requires an input of fixed length. Therefore, in this embodiment, at least one statistical value among the maximum, minimum, median, mean, variance and quartiles of the local information is calculated and determined as the statistical information, the statistical information is spliced with the global information, and the detection result is determined according to the resulting splicing information. In this way, several statistical features are extracted from the local information and spliced with the global information into splicing information of fixed length, which ensures that the splicing information can be used normally by the model and improves detection efficiency.

Fig. 5 is a flowchart illustrating an image detection method according to yet another exemplary embodiment. As shown in Fig. 5, the method may include the following steps:

310. Acquire image samples containing the target object, and label the specific attribute of the target object in the image samples.

In some embodiments, a batch of ordinary image data may be collected in advance, image data containing the target object with the specific attribute may be screened out as image samples based on the results of a detection network, and the specific attribute of the target object in the image samples may be labeled a second time. For example, the color attribute "black" of shoes is labeled a second time, so that black shoes among multiple shoes become the target object.

320. Train a preset binary classification network based on the labeled specific attribute to obtain the second model.

Optionally, the preset binary classification network may be a lightweight binary classification network, which may be trained with the secondarily labeled specific attribute to obtain the second model. The prediction of the second model includes the probability of a target object having the specific attribute.

Optionally, a cross-entropy loss function may be used to train the second model. Specifically, the cross-entropy loss function may be $CE(P, Q) = -\sum_{x} P(x) \log Q(x)$, where, for the same discrete event $x$, $Q(x)$ is the distribution predicted by the model and $P(x)$ is the true distribution.
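The standard discrete cross-entropy $CE(P, Q) = -\sum_x P(x)\log Q(x)$ can be written directly, as in this minimal sketch. The clipping constant `eps`, which keeps `log` away from zero, is an assumption added for numerical safety and is not part of the formula.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """CE(P, Q) = -sum_x P(x) * log Q(x) for discrete distributions.

    p: true distribution P; q: predicted distribution Q (same event order).
    eps clips Q away from 0 to avoid log(0) (an assumed safeguard).
    """
    if len(p) != len(q):
        raise ValueError("P and Q must cover the same events")
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


# Binary case: the true label is "has the attribute", i.e. P = [1, 0],
# and the model predicts Q = [0.8, 0.2]; the loss reduces to -log(0.8).
loss = cross_entropy([1.0, 0.0], [0.8, 0.2])
```

For a one-hot true distribution, only the predicted probability of the correct class contributes, which is why the binary case collapses to a single log term.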

Optionally, the second model may use a detection model such as YOLO or the Single Shot MultiBox Detector (SSD) as its base network.

The YOLO detection model treats object detection as a regression problem: each picture is decomposed into a number of independent bounding boxes, and the probability of the class each box belongs to is predicted. Thus, a single neural network can directly predict the bounding boxes and their class probabilities from the whole picture in a single evaluation. Since the whole detection process is carried out within one network, detection performance can be optimized directly end to end. SSD is a general-purpose object detection algorithm.

In this embodiment, whether YOLO or SSD is selected as the base network may be determined according to actual requirements and is not limited herein.

Optionally, the detection structure of the second model may be trained using an open source data set.

330. Generate at least one initial image according to the labeled specific attribute.

In some embodiments, at least one initial image may be generated automatically based on the secondarily labeled specific attribute used in training the second model. Specifically, the labeled specific attribute may be input into a pre-trained image generation model to obtain the initial images it outputs, where the image generation model generates a corresponding image according to attribute information.

For example, the image generation model may generate images containing shoes, and the secondary attribute information may include the color, size and the like of the shoes. If the specific attribute is black, a label whose color information is "black" may be input into the image generation model, which outputs at least one initial image, such as initial image 1 and initial image 2.

340. Determine each initial image, among the at least one initial image, whose annotation contains the specific attribute as a positive sample.

Continuing the above example, after the at least one initial image is obtained, it may be detected whether initial image 1 and initial image 2 each include a label whose color information is "black". For example, when an annotation containing "black" is detected in initial image 1, initial image 1 may be determined to be a positive sample; specifically, the image label of initial image 1 may be set to positive.

350. Determine each initial image, among the at least one initial image, whose annotation does not contain the specific attribute as a negative sample.

Continuing the above example, when no label whose color information is "black" is detected in initial image 2, initial image 2 may be determined to be a negative sample; specifically, the image label of initial image 2 may be set to negative.
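Steps 340 and 350 amount to a simple labeling rule over the generated images. The sketch below assumes each initial image carries its annotation as a dictionary of attribute names to values; that representation and the function name `label_samples` are hypothetical.

```python
def label_samples(initial_images, attribute_key, attribute_value):
    """Split generated images into positive (1) and negative (0) samples.

    initial_images: mapping of image name -> annotation dict, a hypothetical
    representation of the labels attached to each generated image.
    An image is positive if its annotation has attribute_key == attribute_value.
    """
    labels = {}
    for name, annotation in initial_images.items():
        labels[name] = 1 if annotation.get(attribute_key) == attribute_value else 0
    return labels


images = {
    "initial_image_1": {"color": "black", "object": "shoe"},
    "initial_image_2": {"color": "white", "object": "shoe"},
}
print(label_samples(images, "color", "black"))
# {'initial_image_1': 1, 'initial_image_2': 0}
```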

360. Train the first model based on the positive samples and the negative samples.

A cross-entropy loss function may be used in training the first model. Specifically, the cross-entropy loss function may be $CE(P, Q) = -\sum_{x} P(x) \log Q(x)$.

Optionally, ResNet, DenseNet or the like may be selected as the backbone network when training the first model.

ResNet mitigates the vanishing-gradient problem through skip connections (one skip every two layers), so accuracy can be improved by increasing the number of network layers. In DenseNet, each layer receives additional input from all preceding layers and passes its feature maps to all subsequent layers.

370. Input the image to be detected into the first model and the second model respectively, and acquire the global information output by the first model and the plurality of pieces of local information output by the second model.

The global information comprises a confidence that a target object exists in the image to be detected, the local information comprises a confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area.

In some embodiments, prior to step 370, the method may further comprise the steps of:

disabling the dropout layers in the first model and the second model.

For example, after the first model and the second model have been trained and the testing stage begins, the trained networks may be loaded, the dropout layers removed, and three-channel image data input to the networks to predict whether the image contains a target object with the specific attribute. Since adding dropout layers slows network convergence, removing them in this embodiment can improve testing efficiency.
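What "closing" a dropout layer means can be seen from a minimal inverted-dropout sketch: during training, activations are randomly zeroed and the survivors rescaled, while at test time the layer is an identity. This is an illustration of the general technique on a plain list of floats, not the networks used in the embodiment.

```python
import random


def dropout(values, p, training, rng=random):
    """Inverted dropout on a list of activations.

    Training: each value is zeroed with probability p and survivors are
    scaled by 1 / (1 - p) so the expected activation is unchanged.
    Testing (training=False): the layer is an identity, which is what
    disabling dropout amounts to.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

In PyTorch, for example, calling `model.eval()` switches `nn.Dropout` layers to this identity behaviour without physically removing them.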

In some embodiments, prior to step 370, the method may further comprise:

preprocessing the image to be detected, for example by mean subtraction and variance normalization, so that the image features input to the network share a uniform distribution range.
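Mean and variance preprocessing is usually done per channel as $(x - \mu_c)/\sigma_c$. The sketch below assumes a three-channel image stored as rows of `(c0, c1, c2)` tuples; the per-channel statistics would normally come from the training set, which the source does not specify, so the values in the example are placeholders.

```python
def normalize_image(pixels, mean, std):
    """Per-channel (x - mean) / std normalization.

    pixels: list of rows, each row a list of (c0, c1, c2) tuples.
    mean, std: per-channel statistics (assumed to be precomputed).
    """
    return [
        [tuple((c - m) / s for c, m, s in zip(px, mean, std)) for px in row]
        for row in pixels
    ]


# Placeholder statistics for a 1x2 image with channel values in [0, 1].
normalized = normalize_image([[(0.5, 0.5, 0.5), (1.0, 0.0, 0.5)]],
                             mean=(0.5, 0.5, 0.5), std=(0.25, 0.25, 0.25))
```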

380. Determine the detection result according to the plurality of pieces of local information and the global information, wherein the detection result indicates whether the image to be detected includes the target object.

In some embodiments, step 380 may include: acquiring, as target local information, the pieces of local information whose detection object in the corresponding local area is the target object, and determining the detection result according to the target local information and the global information.

For example, if the target object is a shoe, it may be identified for each piece of local information whether the detection object in the corresponding local area is a shoe. If so, that local information is kept as target local information; if not (for example, the detection object is a sock), that local information may be deleted. The detection result is then determined according to the target local information and the global information.
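The filtering step can be sketched as a class-label filter. The `LocalInfo` record, with a class label and a confidence per local area, is a hypothetical layout for the second model's output; the source does not define the data structure.

```python
from dataclasses import dataclass


@dataclass
class LocalInfo:
    # Hypothetical record for one local area returned by the second model.
    label: str          # class of the detection object in the local area
    confidence: float   # confidence that the target object exists there


def select_target_local_info(local_infos, target_label):
    """Keep only local information whose detection object is the target."""
    return [info for info in local_infos if info.label == target_label]


infos = [LocalInfo("shoe", 0.9), LocalInfo("sock", 0.4), LocalInfo("shoe", 0.6)]
targets = select_target_local_info(infos, "shoe")
```

The surviving confidences are what the statistics and splicing steps would then operate on.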

In this embodiment, objects other than the target object are filtered out, which facilitates detection by the subsequent model and improves detection efficiency.

In some embodiments, the local information further includes position information, and before step 380, the method may further include the following step:

filtering local information located in the same local area according to the position information of each piece of local information.

For example, the position information of a piece of local information may be the coordinate information of its corresponding local area, specifically the center coordinates of that local area. When filtering local information located in the same local area, it may be determined whether the center coordinates corresponding to two pieces of local information are the same, and if so, one of them may be deleted; alternatively, it may be determined whether the distance between the two center coordinates exceeds a preset threshold, and if it does not, one of them may be deleted. For example, if the center coordinates of local area A are (x1, y1), the center coordinates of local area B are (x2, y2), and (x1, y1) is the same as (x2, y2), the local information corresponding to either local area A or local area B may be deleted.
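Both rules can be sketched with a single distance threshold: a threshold of 0 reproduces the identical-centers rule, and a positive value gives the preset-threshold rule. This sketch assumes Euclidean distance and keeps the first entry encountered; the disclosure only says that one of the two pieces may be deleted, so keeping the higher-confidence entry would be an equally valid choice. `dedup_by_center` is a hypothetical name.

```python
import math
from typing import List, Tuple

Center = Tuple[float, float]
Entry = Tuple[Center, float]  # (center coordinates, second confidence)


def dedup_by_center(items: List[Entry], threshold: float = 0.0) -> List[Entry]:
    """Keep one piece of local information per local area.

    Two entries count as the same local area when the Euclidean distance between
    their centers does not exceed `threshold`; threshold=0.0 reduces to the
    identical-centers rule. The first entry seen is kept, later ones are dropped.
    """
    kept: List[Entry] = []
    for center, confidence in items:
        if all(math.dist(center, other) > threshold for other, _ in kept):
            kept.append((center, confidence))
    return kept


entries = [((10.0, 10.0), 0.9), ((10.0, 10.0), 0.8),
           ((12.0, 11.0), 0.7), ((50.0, 50.0), 0.6)]
exact = dedup_by_center(entries)               # drops only the identical center
near = dedup_by_center(entries, threshold=5.0)  # also drops the center about 2.24 away
```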

In this embodiment, deleting the local information of repeated local areas prevents duplicate local information from entering the model and improves detection efficiency.

In practical applications, the flow for detecting an image to be detected may be as shown in fig. 6, and proceeds as follows:

Image input: the image to be detected is input into the first model (also referred to as the global model) and the second model (also referred to as the local model), respectively.

In the first model, which may be a ResNet50 model, the image to be detected yields a confidence that the image as a whole contains the target object, i.e., the first confidence.

In the second model, a target detection backbone step may be performed first, in which all objects present in the image to be detected (i.e., the detection objects) are detected, yielding each detection object and the local region of the image corresponding to it. Then a target detection head step is performed, in which the objects of interest, i.e., target objects, are screened out from the detected objects, and the position information of each target object's local region and the confidence that the target object exists in each local region (i.e., the second confidence) are output. Specifically, a lightweight classification network may be appended after the detection results: according to the queried entity tag, the detection results belonging to the same category as the specified tag are obtained, and the confidence output by this network is taken as the second confidence. Next, an entity filtering step is performed, which further filters out second confidences of repeated local regions and second confidences of local regions whose object is not the target object. Finally, a statistical feature extraction step is performed on the second confidences of the local regions that remain after filtering, to obtain statistical values of the second confidences.
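The statistical feature extraction step can be sketched with Python's standard `statistics` module. The feature names, the use of population variance, the module's default `exclusive` quartile method, and the all-zero fallback for an empty list are assumptions; the disclosure lists which statistics may be used but not how they are computed. `confidence_statistics` is a hypothetical helper.

```python
import statistics
from typing import Dict, List

FEATURES = ("max", "min", "median", "mean", "variance", "q1", "q3")


def confidence_statistics(confidences: List[float]) -> Dict[str, float]:
    """Statistical features of the second confidences of the filtered local regions."""
    if not confidences:
        # Assumption: no target region survived filtering, so every feature is 0.0.
        return dict.fromkeys(FEATURES, 0.0)
    if len(confidences) == 1:
        q1 = q3 = confidences[0]  # quartiles of a single value collapse to that value
    else:
        q1, _, q3 = statistics.quantiles(confidences, n=4)
    return {
        "max": max(confidences),
        "min": min(confidences),
        "median": statistics.median(confidences),
        "mean": statistics.fmean(confidences),
        "variance": statistics.pvariance(confidences),
        "q1": q1,
        "q3": q3,
    }


stats = confidence_statistics([0.2, 0.4, 0.6, 0.8])
```

Summarizing a variable number of regions into a fixed set of statistics is what lets the local evidence be combined with the single global confidence in a fixed-length input.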

After the first confidence and the statistical values of the second confidences are obtained, they may be input into a GBDT (gradient boosting decision tree) fusion model to obtain the fusion decision result output by that model, that is, whether the image to be detected includes the target object.
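The splicing of the first confidence with the statistical values can be sketched as building a fixed-order feature vector. The feature order `STAT_ORDER` and the helper `splice_features` are hypothetical; any order works as long as training and inference agree. The commented lines show one possible GBDT implementation, scikit-learn's `GradientBoostingClassifier`, which the disclosure does not name; `X_train` and `y_train` are hypothetical training arrays.

```python
from typing import Dict, List

# Hypothetical fixed feature order shared by training and inference.
STAT_ORDER = ("max", "min", "median", "mean", "variance", "q1", "q3")


def splice_features(first_confidence: float, stats: Dict[str, float]) -> List[float]:
    """Concatenate the global (first) confidence with the local statistics."""
    return [first_confidence] + [stats[name] for name in STAT_ORDER]


stats = {"max": 0.8, "min": 0.2, "median": 0.5, "mean": 0.5,
         "variance": 0.05, "q1": 0.25, "q3": 0.75}
feature_vector = splice_features(0.93, stats)

# The vector would then be scored by a trained GBDT, for example:
#   from sklearn.ensemble import GradientBoostingClassifier
#   clf = GradientBoostingClassifier().fit(X_train, y_train)
#   p = clf.predict_proba([feature_vector])[0, 1]
```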

Fig. 7 is a block diagram of an image detection apparatus according to an exemplary embodiment. As shown in fig. 7, the apparatus may include an information acquisition module and a detection module, where:

the information acquisition module is used for inputting the image to be detected into the first model and the second model respectively, and acquiring global information output by the first model and a plurality of local information output by the second model, wherein the global information comprises confidence that a target object exists in the image to be detected, the local information comprises confidence that the target object exists in a local area of the image to be detected, each piece of local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area.

the detection module is used for determining a detection result according to the plurality of local information and the global information, where the detection result indicates whether the image to be detected includes the target object.

In some embodiments, the detection module includes:

a statistical information acquisition submodule, configured to perform statistical processing on the plurality of local information according to a preset rule to obtain statistical information;

a splicing submodule, configured to splice the statistical information and the global information to obtain spliced information; and

a detection submodule, configured to determine the detection result according to the spliced information.

In some embodiments, the statistical information acquisition submodule is specifically configured to calculate at least one statistical value among the maximum value, minimum value, median value, mean value, variance, and quartiles of the plurality of local information, and to determine the at least one statistical value as the statistical information.

In some embodiments, the detection submodule is specifically configured to input the spliced information into a third model and obtain the output result of the third model as the detection result, where the third model is a gradient boosting decision tree model.

In some embodiments, the detection module is further specifically configured to acquire, as target local information, the local information among the plurality of local information whose detection object in the corresponding local area is the target object, and to determine the detection result according to the target local information and the global information.

In some embodiments, the local information further includes position information, and the image detection apparatus further includes:

a filtering module, configured to filter local information located in the same local area according to the position information of each piece of local information.

In some embodiments, the image detection apparatus further includes:

a second model training module, configured to acquire an image sample including the target object, label specific attributes of the target object in the image sample, and train a preset binary classification network based on the labeled specific attributes to obtain the second model.

In some embodiments, the image detection apparatus further includes:

a first model training module, configured to generate at least one initial image according to the labeled specific attributes, determine the initial images among them that carry a label of the specific attribute as positive samples, determine the initial images that do not carry such a label as negative samples, and train the first model based on the positive samples and the negative samples.
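The positive/negative split used to train the first model can be sketched as follows. The `InitialImage` record, the `split_samples` helper, the file names, and the attribute name `red_shoe` are all hypothetical illustration values; the model training itself is outside the scope of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class InitialImage:
    path: str                                      # hypothetical file location
    labels: Set[str] = field(default_factory=set)  # labeled specific attributes


def split_samples(images: List[InitialImage],
                  attribute: str) -> Tuple[List[InitialImage], List[InitialImage]]:
    """Positive samples carry the attribute label; negative samples do not."""
    positives = [img for img in images if attribute in img.labels]
    negatives = [img for img in images if attribute not in img.labels]
    return positives, negatives


images = [
    InitialImage("img_001.jpg", {"red_shoe"}),
    InitialImage("img_002.jpg", set()),
    InitialImage("img_003.jpg", {"red_shoe", "sock"}),
]
pos, neg = split_samples(images, "red_shoe")
```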

In some embodiments, the image detection apparatus further comprises:

a closing module, configured to close the dropout layers in the first model and the second model.

Referring now to fig. 8, shown is a schematic diagram of an electronic device (e.g., a terminal device or server) 600 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 8, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 8 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: respectively inputting an image to be detected into a first model and a second model, and acquiring global information output by the first model and a plurality of local information output by the second model, wherein the global information comprises confidence that a target object exists in the image to be detected, the local information comprises confidence that the target object exists in a local area of the image to be detected, each local information corresponds to one local area in the image to be detected, and at least one detection object exists in each local area; and determining a detection result according to the plurality of local information and the global information, wherein the detection result represents whether the image to be detected comprises the target object.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims

1. An image detection method, comprising:

2. The method of claim 1, wherein determining a detection result according to the plurality of local information and the global information comprises:

performing statistical processing on the local information according to a preset rule to obtain statistical information;

splicing the statistical information and the global information to obtain spliced information;

and determining the detection result according to the spliced information.

3. The method according to claim 2, wherein the performing statistical processing on the plurality of local information according to a preset rule to obtain statistical information comprises:

calculating at least one statistical value of a maximum value, a minimum value, a median value, a mean value, a variance and a quartile of the plurality of local information;

determining the at least one statistical value as the statistical information.

4. The method of claim 2, wherein the determining the detection result according to the spliced information comprises:

inputting the spliced information into a third model, and obtaining an output result of the third model as the detection result, wherein the third model is a gradient boosting decision tree model.

5. The method of claim 1, wherein determining a detection result according to the plurality of local information and the global information comprises:

acquiring, as target local information, local information among the plurality of local information in which the detection object existing in the corresponding local area is the target object;

and determining the detection result according to the target local information and the global information.

6. The method of claim 1, wherein the local information further comprises position information, and the method further comprises, prior to determining the detection result according to the plurality of local information and the global information:

7. The method of claim 1, further comprising, before the inputting the image to be detected into the first model and the second model, respectively:

acquiring an image sample comprising a target object, and labeling specific attributes of the target object in the image sample;

and training a preset binary classification network based on the labeled specific attributes to obtain the second model.

8. The method of claim 7, further comprising:

generating at least one initial image according to the labeled specific attributes;

determining an initial image containing the label of the specific attribute in the at least one initial image as a positive sample;

determining the initial image which does not contain the mark of the specific attribute in the at least one initial image as a negative sample;

and training to obtain the first model based on the positive sample and the negative sample.

9. The method according to any one of claims 1-8, further comprising, before the inputting the image to be detected into the first model and the second model, respectively:

and closing dropout layers in the first model and the second model.

10. An image detection apparatus, characterized by comprising:

11. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1-9.

12. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 9.