CN114140852B - Image detection method and device - Google Patents


Info

Publication number
CN114140852B
Authority
CN
China
Prior art keywords
image
detection
frame
classification
bounding box
Legal status
Active
Application number
CN202111456542.6A
Other languages
Chinese (zh)
Other versions
CN114140852A (en)
Inventor
王珂尧
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111456542.6A
Publication of CN114140852A
Application granted
Publication of CN114140852B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques

Abstract

The disclosure provides an image detection method and device, relates to the technical field of artificial intelligence, in particular to deep learning and computer vision, and can be applied to scenes such as face recognition and face image processing. The implementation scheme is as follows: obtaining a detection frame surrounding a target object in a target image; obtaining a detection area image based on the detection frame, wherein the detection area image at least comprises the area surrounded by the detection frame in the target image; dividing the detection area image into a plurality of area image blocks; and detecting the plurality of area image blocks to obtain a first classification corresponding to the target image.

Description

Image detection method and device
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to deep learning and computer vision, is applicable to scenes such as face recognition and face image processing, and specifically relates to an image detection method, an image detection apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it spans both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Image processing techniques based on artificial intelligence have penetrated into various fields. Among them, face living body detection based on artificial intelligence judges, from the image data input by a user, whether that image data comes from a living human face.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides an image detection method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided an image detection method including: obtaining a detection frame surrounding a target object in a target image; acquiring a detection area image based on the detection frame, wherein the detection area image at least comprises an area surrounded by the detection frame in the target image; dividing the detection area image into a plurality of area image blocks; and detecting the plurality of regional image blocks to obtain a first classification corresponding to the target image.
According to another aspect of the present disclosure, there is provided an image detection apparatus including: a first detection unit configured to obtain a detection frame surrounding a target object in a target image; a first acquisition unit configured to acquire a detection area image based on the detection frame, wherein the detection area image includes at least an area surrounded by the detection frame in the target image; a second acquisition unit configured to divide the detection area image into a plurality of area image blocks; and a second detection unit configured to detect the plurality of area image blocks to obtain a first classification corresponding to the target image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to implement a method according to the above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to implement a method according to the above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method according to the above.
According to one or more embodiments of the present disclosure, a detection region in the target image is obtained based on a detection frame surrounding a target object in the target image, the detection region including at least the region surrounded by the detection frame, and the classification corresponding to the target image is obtained based on a plurality of region blocks obtained by dividing the detection region. For target images corresponding to multiple classifications whose distinguishing features are not obvious, the classification corresponding to the target image is obtained in a single detection pass, so that the detection result is accurate while computing power is saved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of an image detection method according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a detection frame surrounding a target object in a target image in an image detection method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of a process of obtaining a detection area image based on a detection frame in an image detection method according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a first bounding box obtained based on a detection box of a target image in an image detection method according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of another first bounding box obtained based on a detection box of a target image in an image detection method according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a process of obtaining a detection area image based on a first bounding box in an image detection method according to an embodiment of the present disclosure;
FIG. 8 shows a flowchart of a process of determining a detection area image based on an area surrounded by a first bounding box in an image detection method according to an embodiment of the present disclosure;
FIG. 9 shows a flowchart of another process of determining a detection area image based on an area surrounded by a first bounding box in an image detection method according to an embodiment of the present disclosure;
FIG. 10 illustrates an architectural diagram of a detection model in an image detection method according to an embodiment of the present disclosure;
FIG. 11 shows a block diagram of the structure of an image detection apparatus according to an embodiment of the present disclosure; and
FIG. 12 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable execution of the image detection method.
In some embodiments, server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may view the searched objects using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. The cloud server is a host product in a cloud computing service system, intended to overcome the defects of difficult management and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and object files. The data store 130 may reside in a variety of locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In some embodiments, the data store used by server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to commands.
In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Referring to fig. 2, an image detection method 200 according to some embodiments of the present disclosure includes:
step S210: obtaining a detection frame surrounding a target object in a target image;
step S220: acquiring a detection area image based on the detection frame, wherein the detection area image at least comprises an area surrounded by the detection frame in the target image;
step S230: dividing the detection area image into a plurality of area image blocks; and
step S240: Detecting the plurality of regional image blocks to obtain a first classification corresponding to the target image.
According to one or more embodiments of the present disclosure, a detection region in a target image is obtained by being based on a detection frame surrounding a target object in the target image, the detection region including at least a region surrounded by the detection frame, and a classification corresponding to the target image is obtained based on a plurality of region blocks obtained by dividing the detection region. For various target images corresponding to a plurality of classifications with unobvious distinguishing features, the classification corresponding to the target images is obtained through one-time detection, so that the obtained detection result is accurate, and meanwhile, the calculation force is saved.
In the related art, face living body detection is performed on image data input by a user to determine whether the input image data comes from a living human face. After the image data is processed into a target image, synthetic image detection and face living body detection are performed on the target image in sequence: first, synthetic image detection determines whether the image data corresponding to the target image comes from a synthetic image; only after it is determined not to come from a synthetic image is face living body detection performed to determine whether the image data comes from a living human face. The whole process requires two rounds of detection and judgment on the target image and therefore consumes considerable computing power. Moreover, if the synthetic image detection stage misjudges, the final detection result is wrong, so accuracy is difficult to guarantee. For example, if a target image that actually corresponds to a synthetic image is judged not to, face living body detection is then performed on it and the target image may be judged to be a living face; in some application scenarios, the image data corresponding to the target image would then pass verification, which may cause huge property loss and even personal injury.
According to the embodiments of the present disclosure, the target image of the image data input by the user is processed directly to obtain a detection frame surrounding the human face, a detection area image is obtained based on the detection frame, and finally the plurality of area image blocks obtained by dividing the detection area image are detected to obtain the classification corresponding to the target image. That is, the detection result is obtained with a single round of detection and judgment on the target image, which reduces the required computing power. Meanwhile, the classification of any target image corresponding to the attack classification, the synthetic image classification, or the face living body classification can be obtained, achieving multi-class detection of the target image.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information comply with the relevant laws and regulations and do not violate public order and good morals.
In some embodiments, the target object is a human face, and the first classification includes: attack classification, face living body classification, or synthetic image classification.
When classifying a target image among the attack classification, the face living body classification, and the synthetic image classification, the task of distinguishing the face living body classification from the synthetic image classification usually focuses on features of the face region, while the task of distinguishing the face living body classification from the attack classification usually focuses on features of the background region around the face. For example, in screen attacks and paper attacks, the boundary of the screen or paper (in the background region around the face) serves as an important feature for deciding whether the image belongs to the attack classification.
According to the embodiments of the present disclosure, the plurality of area image blocks obtained from the target image are detected, and the detection process attends not only to the features within each area image block but also to global features of the target image extracted across the plurality of area image blocks (such as the screen or paper boundary in screen attacks and paper attacks). The detection can therefore carry out, at the same time, the task of distinguishing the face living body classification from the synthetic image classification and the task of distinguishing the face living body classification from the attack classification; that is, a single forward prediction yields both detection judgment outputs, such as the living body detection classification and the synthetic image classification, which reduces computing power.
It should be understood that the embodiments are described with the target object being a human face as an example, and are merely exemplary, and those skilled in the art should understand that any object (e.g., an animal, a vehicle, a fingerprint, etc.) may be used as the target object for the technical solution of the present disclosure.
In some embodiments, the method 200 further includes acquiring a target image, including: image data input by a user is acquired, and the target image is acquired based on the image data.
In some embodiments, the image data input by the user may be video, photo, etc., without limitation.
In some embodiments, in step S210, a detection frame surrounding the target object is obtained by a target detection method.
In some embodiments, the detection frame is a rectangular frame. In one example, a detection frame surrounding a face is obtained by a face detection method: face keypoints are detected in the target image, and the detection frame is obtained based on those keypoints. For example, a trained face keypoint detection model outputs a plurality of face keypoint coordinates (e.g., 72), namely (x_1, y_1), ..., (x_i, y_i), ..., (x_n, y_n). The maximum and minimum values x_min, x_max, y_min, y_max of the x and y coordinates are obtained from these keypoint coordinates, and the detection frame is determined from these maximum and minimum values.
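As a rough illustration of this step, the sketch below computes such a detection frame from already-detected keypoints. It is a minimal sketch assuming the keypoints are given as an (n, 2) NumPy array; the keypoint detection model itself is not shown, and the function name is a hypothetical placeholder rather than an API from the disclosure.

```python
import numpy as np

def box_from_keypoints(keypoints: np.ndarray) -> tuple:
    """Tightest axis-aligned rectangle enclosing n face keypoints of shape (n, 2)."""
    x_min, y_min = keypoints.min(axis=0)  # per-column minima over all keypoints
    x_max, y_max = keypoints.max(axis=0)  # per-column maxima over all keypoints
    return float(x_min), float(y_min), float(x_max), float(y_max)

# Usage with 72 hypothetical keypoints:
# pts = np.random.rand(72, 2) * 255
# x_min, y_min, x_max, y_max = box_from_keypoints(pts)
```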
Referring to fig. 3, a schematic diagram of a target image is shown, wherein a target object 310 in the target image 300 is surrounded by a detection box 320, according to some embodiments.
In some embodiments, the detection frame is a rectangular frame, and wherein, as shown in fig. 4, obtaining the detection region image based on the detection frame includes:
step S410: Enlarging the rectangular frame to obtain a first bounding box, wherein the ratio of at least a first side length of two adjacent side lengths of the first bounding box to a second side length of the rectangular frame corresponding to the first side length is a preset value; and
step S420: Obtaining the detection area image based on the first bounding box.
The detection frame is directly enlarged, and the detection area image is obtained from the first bounding box that results from the enlargement, so the method of obtaining the detection area image is simple and the amount of data processing is small.
In some embodiments, the preset size is 256pix by 256pix.
In some embodiments, the first bounding box is obtained by expanding the side length of one of two adjacent sides of the rectangular box, while the side length of the other side remains unchanged. As shown in fig. 5, the first bounding box 530 is obtained based on the detection box 520 bounding the target object 510 in the target image 500, wherein the width of the first bounding box 530 coincides with the width of the detection box 520, and the length of the first bounding box 530 is three times the length of the detection box 520.
In some embodiments, the side length of each of two adjacent sides of the rectangular frame is enlarged, thereby obtaining the first bounding box. As shown in fig. 6, the first bounding box 630 is obtained based on the detection box 620 bounding the target object 610 in the target image 600, wherein the length of the first bounding box 630 is three times the length of the detection box 620, and the width of the first bounding box 630 is also three times the width of the detection box 620.
In some implementations, a ratio of at least a first side length of two adjacent sides of the first bounding box to a second side length of the rectangular box corresponding to the first side length ranges from 2 to 5.
Setting the ratio of at least the first of two adjacent side lengths of the first bounding box to the corresponding second side length of the rectangular frame in the range of 2 to 5 avoids a value that is too small (for example, below 2), in which case the boundary of a paper or screen attack cannot be captured and detection becomes inaccurate, and also avoids a value that is too large (for example, above 5), which would increase the amount of data processing.
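A minimal sketch of this enlargement step follows, assuming the box is given as (x_min, y_min, x_max, y_max). The disclosure does not fix the anchor point of the enlargement, so expanding about the box center is an assumption, and the ratio of 3 merely picks a value inside the 2-to-5 range above.

```python
def expand_box(box, ratio_w=3.0, ratio_h=3.0):
    """Enlarge a rectangular detection box into the first bounding box.

    Each side length is multiplied by its ratio; setting one ratio to 1.0
    reproduces the variant of fig. 5, where one side stays unchanged.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0  # box center (assumed anchor)
    half_w = (x_max - x_min) * ratio_w / 2.0
    half_h = (y_max - y_min) * ratio_h / 2.0
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h
```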
In some embodiments, as shown in fig. 7, obtaining the detection region image based on the first bounding box includes:
step S710: Determining the region surrounded by the first bounding box; and
step S720: Determining the detection area image based on the region surrounded by the first bounding box, wherein the detection area image has a preset size.
For target images of different sizes, detection area images of the same size are obtained, so that after further division the resulting plurality of area image blocks are consistent in size across target images of different sizes, making the detection result more accurate.
In some embodiments, as shown in fig. 8, determining the detection region image based on the region surrounded by the first bounding box includes:
step S810: In response to determining that the region surrounded by the first bounding box is located within the target image, determining the region surrounded by the first bounding box as the image to be determined; and
step S820: Adjusting the size of the image to be determined to the preset size to obtain the detection area image.
For a target object (e.g., a human face) occupying a small area in a target image, a first bounding box obtained based on a detection frame bounding the target object is within the target image, and a detection area image is obtained directly based on an area bounded by the first bounding box.
With continued reference to fig. 5, the first bounding box 530 is located within the target image 500, and the region bounded by the first bounding box 530 on the target image 500 is determined as the image 540 to be determined. The image to be determined is processed, for example, scaled, to obtain a detection area image of a preset size.
In some embodiments, as shown in fig. 9, determining the detection region image based on the region surrounded by the first bounding box includes:
step S910: In response to determining that the region surrounded by the first bounding box exceeds the target image, acquiring a first portion of the region surrounded by the first bounding box that exceeds the target image;
step S920: Acquiring a first image based on the first portion, wherein each pixel position in the first image corresponds to a preset pixel value;
step S930: Splicing the first image with the second portion of the region surrounded by the first bounding box that is located within the target image, to obtain an image to be determined; and
step S940: Adjusting the size of the image to be determined to the preset size to obtain the detection area image.
For a target object (e.g., a human face) occupying a larger area in the target image, the first bounding box obtained based on the detection frame bounding the target object exceeds the target image, so an image cannot be obtained directly from the area bounded by the first bounding box.
Meanwhile, according to the embodiments of the present disclosure, since every pixel of the first image has the preset pixel value, its influence on the subsequent detection of the plurality of area image blocks is small, which further reduces the amount of data processing while improving detection accuracy.
As shown in fig. 6, the region surrounded by the first bounding box 630 exceeds the target image 600. A first image is obtained based on the first portion 630A of the region surrounded by the first bounding box 630 that exceeds the target image 600; it is stitched with the second portion 630B of that region located within the target image 600 (in this embodiment, the target image 600 itself) to constitute the image to be determined, and the detection area image is obtained based on the image to be determined.
Through the procedures of determining the detection area image based on the region surrounded by the first bounding box described with reference to figs. 8 and 9, a detection area image can be obtained for any target image, whether the target object occupies a larger or smaller area in the target image.
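The sketch below covers both cases of figs. 8 and 9 in one function: the overlap with the target image is copied in, any part of the bounding box outside the image is filled with the preset pixel value, and the result is resized to the preset size. OpenCV is used here only as an assumed resizing backend; any resize routine would do.

```python
import numpy as np
import cv2  # assumption: OpenCV is available for resizing

def detection_region(image, bbox, out_size=256, pad_value=0):
    """Crop the area enclosed by the first bounding box into the detection area image.

    Pixels of the box outside the target image are filled with pad_value
    (the preset pixel value, forming the "first image"); the part inside
    the image (the "second portion") is stitched in; the stitched image
    to be determined is then resized to out_size x out_size.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    canvas = np.full((y1 - y0, x1 - x0) + image.shape[2:], pad_value, dtype=image.dtype)
    ox0, oy0 = max(x0, 0), max(y0, 0)   # overlap between box and target image
    ox1, oy1 = min(x1, w), min(y1, h)
    if ox1 > ox0 and oy1 > oy0:
        canvas[oy0 - y0:oy1 - y0, ox0 - x0:ox1 - x0] = image[oy0:oy1, ox0:ox1]
    return cv2.resize(canvas, (out_size, out_size))
```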
In some embodiments, the preset pixel value is, for example, 0 or 255, which is not limited herein.
In some embodiments, after the detection region image is obtained, it is further subjected to normalization and data enhancement before the plurality of region image blocks are obtained, so that it is the normalized and enhanced detection region image that is subsequently divided into the plurality of region image blocks. In some examples, the detection area image is normalized by mapping the pixel values at each location to values distributed between -0.5 and 0.5. In some examples, random data enhancement is applied to the normalized detection region image.
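A minimal sketch of this preprocessing, assuming 8-bit input; the horizontal flip stands in for the unspecified random data enhancement, since the disclosure does not name concrete augmentations.

```python
import numpy as np

def preprocess(region, rng=None):
    """Normalize pixel values into [-0.5, 0.5], then apply a random enhancement."""
    rng = rng or np.random.default_rng()
    x = region.astype(np.float32) / 255.0 - 0.5  # 0..255 -> -0.5..0.5
    if rng.random() < 0.5:
        x = x[:, ::-1].copy()  # random horizontal flip as a stand-in augmentation
    return x
```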
In some embodiments, after the detection region image is obtained, a plurality of region image blocks having the same size are obtained by equally dividing the detection region image.
In some embodiments, the detection area image of the preset size 256pix by 256pix is equally divided to obtain 16 area image blocks of 64pix by 64pix.
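Equal division then amounts to a pair of strided slices, as in this sketch:

```python
def split_into_blocks(region, block=64):
    """Equally divide an H x W detection area image into non-overlapping blocks.

    A 256 x 256 input yields the 16 blocks of 64 x 64 described above.
    """
    h, w = region.shape[:2]
    return [region[i:i + block, j:j + block]
            for i in range(0, h, block)
            for j in range(0, w, block)]
```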
In some embodiments, detecting the plurality of region image blocks includes:
the plurality of regional image blocks are detected using a detection model that includes a self-attention network, wherein the self-attention network is configured to extract global features of the plurality of regional image blocks that correspond to the target image.
Referring to fig. 10, which shows a schematic architecture of a detection model according to one embodiment of the present disclosure, the detection model 1000 includes a feature extraction network 1010, a self-attention network 1020, and a classification module 1030. In the detection process, the plurality of area image blocks are input into the detection model 1000 as input A; the feature extraction network 1010 extracts features; the extracted features are input into the self-attention network 1020 to further extract global features; and the features output by the self-attention network 1020 are input into the classification module 1030 to obtain the classification of the target image, which is produced as output B of the detection model. Because the detection model uses the feature extraction network to extract features from the plurality of area image blocks and further uses the self-attention network to extract global features based on those features, and the global features reflect the overall characteristics of the target image, the detection model fully considers both the features of each area image block and the global features of the target image during classification, and therefore classifies the target image more accurately.
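A minimal PyTorch sketch of this three-part structure follows. The layer widths, head count, and token pooling are illustrative assumptions; the disclosure fixes only the order: feature extraction, then self-attention across the blocks, then classification.

```python
import torch
import torch.nn as nn

class DetectionModel(nn.Module):
    def __init__(self, num_classes=3, dim=128):
        super().__init__()
        # Feature extraction network (1010), shared across all area image blocks.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        # Self-attention network (1020): attends across the block tokens to
        # extract global features of the whole target image.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Classification module (1030).
        self.head = nn.Linear(dim, num_classes)

    def forward(self, blocks):                      # blocks: (B, 16, 3, 64, 64)
        b, n = blocks.shape[:2]
        tokens = self.features(blocks.flatten(0, 1)).view(b, n, -1)
        global_feats, _ = self.attn(tokens, tokens, tokens)
        return self.head(global_feats.mean(dim=1))  # logits over the classes

# logits = DetectionModel()(torch.randn(2, 16, 3, 64, 64))  # shape (2, 3)
```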
In some embodiments, the target object is a human face, and the first classification includes: attack classification, face living body classification, or synthetic image classification.
When classifying a target image among the attack classification, the face living body classification, and the synthetic image classification, the task of distinguishing the face living body classification from the synthetic image classification usually focuses on features of the face region, while the task of distinguishing the face living body classification from the attack classification usually focuses on features of the background region around the face. For example, in screen attacks and paper attacks, the boundary of the screen or paper (in the background region around the face) serves as an important feature for deciding whether the image belongs to the attack classification.
According to the embodiments of the present disclosure, the detection model detects the plurality of area image blocks obtained from the target image, attending not only to the features within each area image block but also to global features of the target image extracted across the plurality of area image blocks (such as the screen or paper boundary in screen attacks and paper attacks). The detection model can therefore carry out, at the same time, the task of distinguishing the face living body classification from the synthetic image classification and the task of distinguishing the face living body classification from the attack classification; that is, a single forward prediction yields both detection judgment outputs, such as the living body detection classification and the synthetic image classification, which reduces computing power and lowers the training difficulty and the complexity of subsequent model deployment.
In some embodiments, the feature extraction network includes, but is not limited to, MobileNet V2, VGG11, VGG15, and the like.
In some embodiments, the feature extraction network includes a plurality of convolution layers, and the self-attention network may be located after any convolution layer. For example, in a VGG11 feature extraction network comprising 5 convolutional layers, the self-attention network is located after the second convolutional layer and before the third convolutional layer.
In some embodiments, the target object is a human face, and the first classification includes the attack classification, the face living body classification, or the synthetic image classification. This is achieved by training the detection model with a training image set comprising a plurality of images corresponding to the attack classification, a plurality of images corresponding to the face living body classification, and a plurality of images corresponding to the synthetic image classification, so that an input image is classified into one of the attack classification, the face living body classification, and the synthetic image classification.
In the training of the detection model, each image in the training image set undergoes the same processing as the target image in the foregoing embodiments (step S210 to step S230).
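Training can then be sketched as standard three-class supervised learning; the label encoding, and the use of Adam with cross-entropy, are assumptions, since the disclosure does not specify a loss or optimizer.

```python
import torch

def train(model, loader, epochs=10, lr=1e-4):
    """Three-class training loop over (blocks, label) batches.

    `loader` is assumed to yield area-image-block tensors of shape
    (B, 16, 3, 64, 64), preprocessed exactly as in steps S210-S230,
    with integer labels 0=attack, 1=face living body, 2=synthetic image
    (a hypothetical encoding).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for blocks, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(blocks), labels)
            loss.backward()
            opt.step()
```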
According to another aspect of the present disclosure, there is also provided an image detection apparatus, as shown in fig. 11, an apparatus 1100 including: a first detection unit 1110 configured to obtain a detection frame surrounding a target object in a target image; a first obtaining unit 1120 configured to obtain a detection area image based on the detection frame, wherein the detection area image includes at least an area surrounded by the detection frame in the target image; a second acquisition unit 1130 configured to divide the detection area image into a plurality of area image blocks; and a second detection unit 1140 configured to detect the plurality of area image blocks to obtain a first classification corresponding to the target image.
In some embodiments, the detection frame is a rectangular frame, and wherein the first acquisition unit 1120 includes: a first obtaining subunit configured to enlarge the rectangular frame to obtain a first bounding box, wherein a ratio of at least a first side length of two adjacent side lengths of the first bounding box to a second side length of the rectangular frame corresponding to the first side length is a preset value; and a second acquisition subunit configured to acquire the detection region image based on the first bounding box.
In some embodiments, the preset value ranges from 2 to 5.
In some embodiments, the second acquisition subunit comprises: a first determination unit configured to determine the region surrounded by the first bounding box; and a second determination unit configured to determine the detection area image based on the region surrounded by the first bounding box, wherein the detection area image has a preset size.
In some embodiments, the second determining unit includes: a third acquisition unit configured to determine an area surrounded by the first bounding box as an image to be determined in response to determining that the area surrounded by the first bounding box is located within the target image; and a first adjustment unit configured to adjust the size of the image to be determined to the preset size to obtain the detection area image.
In some embodiments, the second determining unit includes: a fourth acquisition unit configured to acquire, in response to determining that the first bounding box exceeds the target image, a first portion of the region surrounded by the first bounding box that exceeds the target image; a fifth acquisition unit configured to acquire a first image based on the first portion, wherein each pixel position in the first image corresponds to a preset pixel value; a sixth acquisition unit configured to splice the first image with the second portion of the region surrounded by the first bounding box that is located within the target image, to obtain an image to be determined; and a second adjusting unit configured to adjust the size of the image to be determined to the preset size to obtain the detection area image.
In some embodiments, the second detection unit comprises: and a model unit configured to detect the plurality of regional image blocks using a detection model including a self-attention network for extracting global features corresponding to the target image in the plurality of regional image blocks.
In some embodiments, the target object is a human face, and the first classification includes the following: attack classification, face living body classification, or synthetic image classification.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program which, when executed by the at least one processor, implements a method according to the above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a method according to the above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements a method according to the above.
According to embodiments of the present disclosure, there is also provided an electronic device, a readable storage medium and a computer program product.
Referring to fig. 12, a block diagram of an electronic device 1200, which may be a server or a client of the present disclosure and which is an example of a hardware device that can be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 includes a computing unit 1201 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the electronic device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206, an output unit 1207, a storage unit 1208, and a communication unit 1209. The input unit 1206 may be any type of device capable of inputting information to the electronic device 1200; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 1207 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1208 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices over computer networks, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the various methods and processes described above, such as method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into RAM 1203 and executed by computing unit 1201, one or more steps of method 200 described above may be performed. Alternatively, in other embodiments, computing unit 1201 may be configured to perform method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (14)

1. An image detection method, comprising:
obtaining a detection frame surrounding a target object in a target image;
acquiring a detection area image based on the detection frame, wherein the detection area image at least comprises an area surrounded by the detection frame in the target image;
dividing the detection area image into a plurality of area image blocks; and
detecting the plurality of area image blocks to obtain a first classification corresponding to the target image, wherein the obtaining the detection area image based on the detection frame includes:
enlarging the detection frame to obtain a first bounding box;
responsive to determining that the region surrounded by the first bounding box exceeds the target image, acquiring a first portion of the region surrounded by the first bounding box that exceeds the target image;
acquiring a first image based on the first part, wherein each pixel position in the first image corresponds to a preset pixel value;
splicing the first image with a second portion, located within the target image, of the area surrounded by the first bounding box, into an image to be determined; and
adjusting the size of the image to be determined to a preset size to obtain the detection area image.
2. The method of claim 1, wherein the detection frame is a rectangular frame, and wherein,
The obtaining the detection area image based on the detection frame includes:
enlarging the rectangular frame to obtain a first bounding box, wherein the ratio of at least a first side length of two adjacent side lengths of the first bounding box to a second side length of the rectangular frame is a preset value, and the second side length corresponds to the first side length; and
obtaining the detection area image based on the first bounding box.
3. The method of claim 2, wherein the preset value has a value in the range of 2 to 5.
4. The method of claim 2, wherein the obtaining the detection region image based on the first bounding box comprises:
determining a region surrounded by the first bounding box; and
determining the detection area image based on the region surrounded by the first bounding box, wherein the detection area image has a preset size.
5. The method of claim 1, wherein the detecting the plurality of region image blocks comprises:
the plurality of regional image blocks are detected using a detection model that includes a self-attention network, wherein the self-attention network is configured to extract global features of the plurality of regional image blocks that correspond to the target image.
6. The method of claim 1, wherein the target object is a human face, and the first classification comprises: attack classification, face living body classification, or synthetic image classification.
7. An image detection apparatus comprising:
a first detection unit configured to obtain a detection frame surrounding a target object in a target image;
a first acquisition unit configured to obtain a detection area image based on the detection frame, wherein the detection area image comprises at least the area surrounded by the detection frame in the target image;
a second acquisition unit configured to divide the detection area image into a plurality of regional image blocks; and
a second detection unit configured to detect the plurality of regional image blocks to obtain a first classification corresponding to the target image,
wherein the first acquisition unit comprises:
a first acquisition subunit configured to enlarge the detection frame to obtain a first bounding box;
a fourth acquisition unit configured to obtain, in response to determining that the area surrounded by the first bounding box extends beyond the target image, a first portion, of the area surrounded by the first bounding box, that lies outside the target image;
a fifth acquisition unit configured to obtain a first image based on the first portion, wherein each pixel position in the first image corresponds to a preset pixel value;
a sixth acquisition unit configured to stitch the first image and a second portion, of the area surrounded by the first bounding box, that lies within the target image, into an image to be determined; and
a second adjustment unit configured to adjust the image to be determined to a preset size to obtain the detection area image.
8. The apparatus of claim 7, wherein the detection frame is a rectangular frame, and wherein the first acquisition unit comprises:
the first acquisition subunit, configured to enlarge the rectangular frame to obtain a first bounding box, wherein a ratio of at least a first side length, of two adjacent side lengths of the first bounding box, to a second side length of the rectangular frame is a preset value, the second side length corresponding to the first side length; and
a second acquisition subunit configured to obtain the detection area image based on the first bounding box.
9. The apparatus of claim 8, wherein the preset value is in the range of 2 to 5.
10. The apparatus of claim 8, wherein the second acquisition subunit comprises:
a first determination unit configured to determine the area surrounded by the first bounding box; and
a second determination unit configured to determine the detection area image based on the area surrounded by the first bounding box, wherein the detection area image has a preset size.
11. The apparatus of claim 7, wherein the second detection unit comprises:
a model unit configured to detect the plurality of regional image blocks using a detection model comprising a self-attention network, wherein the self-attention network is configured to extract, from the plurality of regional image blocks, global features corresponding to the target image.
12. The apparatus of claim 7, wherein the target object is a human face, and the first classification comprises: an attack classification, a live face classification, or a composite image classification.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111456542.6A 2021-12-01 2021-12-01 Image detection method and device

Publications (2)

Publication Number Publication Date
CN114140852A 2022-03-04
CN114140852B 2023-09-29

Family

ID=80386883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111456542.6A Image detection method and device 2021-12-01 2021-12-01

Country Status (1)

Country Link
CN (1) CN114140852B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222201B2 (en) * 2020-04-14 2022-01-11 International Business Machines Corporation Vision-based cell structure recognition using hierarchical neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977978A * 2017-12-28 2019-07-05 ZTE Corporation Multi-target detection method, device and storage medium
CN112052831A * 2020-09-25 2020-12-08 Beijing Baidu Netcom Science and Technology Co., Ltd. Face detection method, device and computer storage medium
CN112907598A * 2021-02-08 2021-06-04 Southeast Digital Economy Development Research Institute Method for detecting falsification of document and certificate images based on attention CNN

Similar Documents

Publication Publication Date Title
CN112749758B (en) Image processing method, neural network training method, device, equipment and medium
CN113313650B (en) Image quality enhancement method, device, equipment and medium
CN112749685B (en) Video classification method, apparatus and medium
CN114494935B (en) Video information processing method and device, electronic equipment and medium
CN113256583A (en) Image quality detection method and apparatus, computer device, and medium
CN112712498A (en) Vehicle damage assessment method and device executed by mobile terminal, mobile terminal and medium
CN114445667A (en) Image detection method and method for training image detection model
CN114140852B (en) Image detection method and device
CN114219046B (en) Model training method, matching method, device, system, electronic equipment and medium
CN114842476A (en) Watermark detection method and device and model training method and device
CN113868453B (en) Object recommendation method and device
CN113139542B (en) Object detection method, device, equipment and computer readable storage medium
CN114494797A (en) Method and apparatus for training image detection model
CN114998963A (en) Image detection method and method for training image detection model
CN113486853A (en) Video detection method and device, electronic equipment and medium
CN114140851B (en) Image detection method and method for training image detection model
CN115170536B (en) Image detection method, training method and device of model
CN114120420B (en) Image detection method and device
CN115100431B (en) Target detection method, neural network, training method, training device and training medium thereof
CN114677691B (en) Text recognition method, device, electronic equipment and storage medium
CN114842474B (en) Character recognition method, device, electronic equipment and medium
CN115512131B (en) Image detection method and training method of image detection model
CN115511779B (en) Image detection method, device, electronic equipment and storage medium
CN113793290B (en) Parallax determining method, device, equipment and medium
CN115131562B (en) Three-dimensional scene segmentation method, model training method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant