WO2021139324A1 - Image recognition method and apparatus, computer-readable storage medium, and electronic device


Info

Publication number
WO2021139324A1
WO2021139324A1 (PCT/CN2020/123903; CN2020123903W)
Authority
WO
WIPO (PCT)
Prior art keywords
target object
category
information
image
target
Application number
PCT/CN2020/123903
Other languages
English (en)
French (fr)
Inventor
唐梦云
裴歌
刘水生
涂思嘉
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2021139324A1
Priority to US17/676,111 (published as US20220172518A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/40: Extraction of image or video features
    • G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422: Global feature extraction for representing the structure of the pattern or shape of an object
    • G06V10/426: Graphical representations
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by matching or filtering
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Context analysis; Selection of dictionaries
    • G06V10/764: Recognition or understanding using classification, e.g. of video objects
    • G06V10/766: Recognition or understanding using regression, e.g. by projecting features on hyperplanes
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/168: Feature extraction; Face representation
    • G06V40/172: Classification, e.g. identification
    • G06V40/40: Spoof detection, e.g. liveness detection
    • G06V40/45: Detection of the body part being alive
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an image recognition method, image recognition device, computer-readable storage medium, and electronic equipment.
  • the embodiments of the present application provide an image recognition method, device, computer-readable storage medium, and electronic equipment, which can improve the efficiency and accuracy of image recognition at least to a certain extent.
  • the embodiment of the present application provides an image recognition method, including:
  • the feature information includes any one or more of blur degree information, local feature information, and global feature information;
  • An embodiment of the application provides an image recognition device, including:
  • the feature information obtaining module is configured to obtain feature information corresponding to the target object in the image to be recognized, where the feature information includes any one or more of blur degree information, local feature information, and global feature information;
  • a confidence level acquiring module configured to determine the category of the target object based on the characteristic information, and determine the confidence level corresponding to the target object;
  • the target information obtaining module is configured to obtain target information corresponding to the image to be recognized according to the category of the target object and the confidence level.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the image recognition method described in the embodiment of the present application is implemented.
  • An embodiment of the present application provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the image recognition method described in the foregoing embodiments.
  • the image recognition method provided by the embodiments of the present application determines the category corresponding to the target object and the confidence level corresponding to the target object based on feature information of at least one dimension, and finally obtains the target information corresponding to the image to be recognized according to that category and confidence level. This can improve the efficiency and accuracy of image recognition, and facilitates distinguishing true and false target objects, such as faces and human bodies, in images or videos.
  • FIG. 1 is a schematic diagram of an application scenario of an image recognition system provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of an image recognition method provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of an image recognition method provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of an image recognition method provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of an interface for presenting target information in an image recognition method according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a process of detecting and recognizing a human face in an image recognition method provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a process of detecting and recognizing a face in a sensitive video in an image recognition method provided by an embodiment of the present application;
  • FIG. 8 is a block diagram of an image recognition device provided by an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of the present application.
  • Fig. 1 is a schematic diagram of an application scenario of an image recognition system provided by an embodiment of the present application.
  • the system architecture 100 may include a terminal device 101, a network 102, and a server 103.
  • the network 102 is used to provide a medium of a communication link between the terminal device 101 and the server 103.
  • the network 102 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to actual needs, there can be any number of terminal devices, networks and servers.
  • the server 103 may be a server cluster composed of multiple servers.
  • the terminal device 101 may be a device such as a tablet computer, a desktop computer, or a smart phone.
  • the terminal device 101 may obtain the captured video uploaded by the user as the video to be recognized, and use a video frame including a human face in the video to be recognized as the image to be recognized; the terminal device 101 may then send the image to be recognized to the server 103 via the network 102. After the server 103 obtains the image to be recognized, it can distinguish the authenticity of the face (target object) in the image to be recognized, that is, determine whether the target object was added later through image processing technology rather than being an object originally contained in the image to be recognized.
  • In a recommendation scenario, the server 103 can detect the image to be recognized obtained in the recall phase and screen the authenticity of the face (target object) in it, that is, determine whether the target object was added later through image processing technology rather than being an object originally contained in the image to be recognized. If the face is determined to be fake, the server 103 will remove the image to be recognized from the recall-phase database to prevent such images from being recommended to users.
  • the server 103 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the terminal device 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited to this.
  • the terminal device and the server can be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
  • the terminal device 101 may obtain the video to be recognized, obtain an image containing the target object by parsing the video to be recognized and use it as the image to be recognized, or directly obtain an image to be recognized that contains the target object; the terminal device 101 can then send the image to be recognized to the server 103 via the network 102.
  • After the server 103 obtains the image to be recognized, it can perform feature extraction on the target object in the image to be recognized to obtain the feature information corresponding to the target object, where the feature information includes any one or more of blur degree information, local feature information, and global feature information. The category of the target object is then determined based on the feature information, and the confidence level corresponding to the target object is determined. In this way, the authenticity of the target object in the image to be recognized can be screened, that is, it can be determined whether the target object was added later through image processing technology rather than being an object originally contained in the image to be recognized.
  • the risk value of the image to be identified can be calculated according to the obtained confidence level, and the target information can be obtained based on the risk value.
  • the target information can specifically include whether the target object is fake, whether the portrait of a public figure is used, the risk value, etc.; the information contained in the target information differs according to the task. If it is determined that the image or video to be identified is risky, operations such as removing it from the shelves need to be performed.
  • the technical solutions of the embodiments of the present application can perform multi-directional and multi-level analysis of the image to be recognized or the video to be recognized, which greatly improves the efficiency and accuracy of image recognition.
  • the image recognition method provided in the embodiment of the present application can be executed by a server, and accordingly, the image recognition device can be set in the server. However, in some embodiments, the image recognition method provided in the embodiments of the present application may also be executed by the terminal device.
  • In the related art, the face in the image can first be detected, and the detected face image can then be discriminated by a fake face model to determine whether it contains a fake face.
  • However, the fake face videos generated by artificial intelligence technology have many variations, large differences in features, and very unstable quality. Therefore, using only a single method to authenticate images and videos brings a high false detection rate and a high missed detection rate. The methods of identifying true and false faces in the related art are simplistic and cannot cover the fake face images or videos generated by the many different face-changing algorithms, so it is difficult to accurately detect fake faces generated by multiple artificial face-changing technologies.
  • Machine learning is a type of artificial intelligence.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers or digital-computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer Vision is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to further perform graphics processing so that the processed image is more suitable for human observation or for transmission to instruments for detection.
  • Computer vision studies related theories and technologies trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
  • With its research and progress, artificial intelligence technology has been researched and applied in many fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robotics, intelligent medical care, and intelligent customer service. It is believed that with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • the embodiment of the present application proposes an image recognition method.
  • the following takes the recognition of the face in the image to be recognized as an example to describe in detail the implementation details of the technical solution of the embodiment of the present application.
  • the image recognition method provided in the embodiments of the present application can be used to recognize not only the human face in the image to be recognized but also an animal face or another object in the image to be recognized.
  • FIG. 2 is a schematic flowchart of an image recognition method provided by an embodiment of the present application.
  • the image recognition method may be executed by a server, and the server may be the server 103 shown in FIG. 1.
  • the image recognition method includes at least step S210 to step S230, which are described in detail as follows:
  • step S210 feature information corresponding to the target object in the image to be recognized is acquired, where the feature information includes any one or more of blur degree information, local feature information, and global feature information.
  • the image to be recognized is an image that needs to be discriminated against human faces.
  • the human face is the target object.
  • the image to be recognized can be an image containing a human face stored locally on the terminal device, or an image obtained in other ways; the image to be recognized can also be obtained by processing the video to be recognized.
  • the video to be recognized can be decoded through video coding and decoding technology; for example, Moving Picture Experts Group (MPEG) standards such as MPEG-2 and MPEG-4, H.264, and other methods can be used to decode the video to be recognized. The video to be recognized is parsed into image frames, and an image frame containing the face is used as the image to be recognized; alternatively, one frame may be selected from all the image frames containing the face as the image to be recognized.
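  • A minimal sketch of this frame-extraction step (assuming OpenCV's bundled codecs can decode the MPEG/H.264 stream; the has_face callback is a hypothetical stand-in for a face detector):

```python
import cv2

def extract_face_frames(video_path, has_face):
    """Decode a video into frames and collect those containing a face.

    has_face is a hypothetical callable (e.g. wrapping a face detector)
    that returns True when a frame contains a face.
    """
    frames = []
    cap = cv2.VideoCapture(video_path)  # decodes MPEG-2/MPEG-4/H.264 via the bundled codecs
    while True:
        ok, frame = cap.read()          # ok becomes False once the stream is exhausted
        if not ok:
            break
        if has_face(frame):
            frames.append(frame)        # any of these frames can serve as the image to be recognized
    cap.release()
    return frames
```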
  • the corresponding low-level features and high-level features can be extracted from the image to be recognized, where the low-level features can be pixel and texture level information of the image, and the high-level features can be local or global feature information of the image.
  • From low level to high level, the feature information corresponding to the image to be recognized is, respectively, blur degree information, local feature information, and global feature information.
  • the category of the image to be recognized can be determined according to any one of the three feature information, and the category of the image to be recognized can also be determined according to multiple of the three feature information.
  • the category of the image to be recognized is whether the face in the image to be recognized is true or false.
  • The face-changing technology in the related art first generates a face area similar in expression and action to the target person in the image or video, and then uses image fusion and face fusion technology to embed the generated fake face into the target video. The limitation of these methods is that the resolution of the generated face area is not consistent with that of the original video, which makes the face area in the composite video differ greatly from the original real image or video at the level of low-level image features, such as color, edge contour, texture, light intensity, and sharpness. The synthesized video usually has the characteristics of blurred edges, little variation in texture across regions, and low resolution.
  • Therefore, the blur degree of the face area and the human body area in the video can be analyzed first, including the differences between the two in texture, edge contour, and sharpness features, and a classification threshold can be used to judge the authenticity of the face in the image or video. Since blur degree information is a low-level image feature, it is easy to obtain; if the authenticity of the face can already be detected from the blur degree information, there is no need to perform face recognition based on high-level image features, which improves recognition efficiency and saves time and resources.
  • FIG. 3 is a schematic flow chart of an image recognition method provided by an embodiment of the present application. As shown in FIG. 3, the flow at least includes steps S301-S303, specifically:
  • step S301 the target object in the image to be recognized is detected to obtain a first object map corresponding to the target object
  • the face in the image to be recognized can be detected based on a face detection algorithm to obtain face coordinate frame information; the face area can then be determined according to the face coordinate frame information, and a first object map containing only the face area information can be obtained.
  • For example, a model based on the feature pyramid network architecture can be used to detect the face in the image to be recognized and obtain the face coordinate frame information.
  • the information specifically includes the coordinates (x, y) of the starting point of the upper left corner of the face area, and also includes the width and height of the bounding box of the face area.
  • the face coordinate frame information is (x, y, width, height)
  • the first object image only contains face information, and does not contain background information in the image to be recognized.
  • Other face detection algorithms can also be used to detect the face in the image to be recognized to obtain the face coordinate frame information, such as Cascade Convolutional Neural Networks (Cascade CNN), DenseBox, Faceness-Net, Face Region-Convolutional Neural Networks (Face R-CNN), and Multi-task Convolutional Neural Networks (MTCNN), which are not specifically limited in the embodiments of the present application.
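  • A minimal sketch of this detection step, using an OpenCV Haar cascade purely as a stand-in detector (the embodiment itself uses a feature-pyramid-network model or one of the networks listed above), returning the (x, y, width, height) coordinate frame:

```python
import cv2

def detect_face_box(image_bgr):
    """Return face coordinate frame information (x, y, width, height), or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = boxes[0]  # (x, y) is the starting point of the upper-left corner of the face area
    return int(x), int(y), int(w), int(h)
```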
  • step S302 the coordinates of the first object map are adjusted to obtain the second object map.
  • the coordinates of the first object map can be expanded to obtain a second object map that includes part of the background; specifically, both the width and the height of the first object map can each be enlarged by 1/4, and of course other multiples may also be used, which is not specifically limited in the embodiments of the present application.
  • step S303 blur degree calculations are performed on the first object image and the second object image respectively to obtain the blur degree information corresponding to the target object.
  • The Laplacian operator can be used to calculate the first blur degree S_face corresponding to the first object map and the second blur degree S_bg corresponding to the second object map. When calculating the blur degree, Gaussian filtering can first be performed on the first object map and the second object map to remove noise; the filtered first object map and second object map are then converted into grayscale maps; the Laplacian operator is then used to convolve the grayscale maps corresponding to the first object map and the second object map respectively; finally, the variance of the feature information obtained after convolution is calculated to obtain the first blur degree S_face and the second blur degree S_bg. The first blur degree S_face and the second blur degree S_bg constitute the blur degree information corresponding to the target object (the face).
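  • A sketch of steps S302 and S303 under the description above (Gaussian filtering, grayscale conversion, Laplacian convolution, variance); how the 1/4 enlargement is distributed around the box is an assumption:

```python
import cv2

def laplacian_blur(image_bgr):
    """Blur degree as the variance of the Laplacian response over the grayscale map."""
    denoised = cv2.GaussianBlur(image_bgr, (3, 3), 0)  # Gaussian filtering to remove noise
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)  # convert to a grayscale map
    return cv2.Laplacian(gray, cv2.CV_64F).var()       # convolve, then take the variance

def blur_info(image_bgr, box):
    """Compute (S_face, S_bg) for the first object map and the 1/4-enlarged second object map."""
    x, y, w, h = box
    height, width = image_bgr.shape[:2]
    first = image_bgr[y:y + h, x:x + w]                # first object map: face area only
    dx, dy = w // 4, h // 4                            # width and height each enlarged by 1/4
    second = image_bgr[max(0, y - dy // 2):min(height, y + h + dy - dy // 2),
                       max(0, x - dx // 2):min(width, x + w + dx - dx // 2)]
    return laplacian_blur(first), laplacian_blur(second)
```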
  • image recognition can also be performed based on the local feature information corresponding to the target object.
  • the local feature information corresponding to the target object can be acquired, and then the category of the target object can be determined based on the local feature information.
  • Specifically, the first object map can be input into the first image recognition model, and feature extraction can be performed on the first object map through the first image recognition model to obtain the local feature information; the target object is then classified according to the local feature information to obtain the category of the target object.
  • the first image recognition model may be various network models used for image feature extraction, for example, it may be a convolutional neural network such as ResNet, Inception, SqueezeNet, and DenseNet.
  • For example, before the first object map is input into the SqueezeNet network, the first object map can be scaled to 224×224 pixels to match the input of the SqueezeNet network; the scaled first object map is then input into the SqueezeNet network, which performs feature extraction on it to obtain the local feature information, that is, the facial feature information; finally, the authenticity of the face is screened according to the facial feature information.
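  • A sketch of this local-feature classification stage with a stock torchvision SqueezeNet; the two-class head and the output index order (fake, real) are assumptions, and in practice the model would be trained on real and fake face crops:

```python
import torch
from torchvision import models, transforms

# Two-class (fake/real) head on a stock SqueezeNet backbone; the exact head
# used by the embodiment is not specified here.
model = models.squeezenet1_1(num_classes=2)
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),  # scale the first object map to 224x224 pixels
])

def classify_face_crop(face_crop_rgb):
    """Return (conf_fake, conf_real) for a face crop (HWC uint8, RGB)."""
    x = preprocess(face_crop_rgb).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    return float(probs[0]), float(probs[1])  # class index order is an assumption
```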
  • image recognition can also be performed based on the global feature information corresponding to the target object.
  • Specifically, feature extraction can be performed on the image to be recognized to obtain the global feature information corresponding to the target object, and then the category of the target object and the confidence level corresponding to the target object can be determined based on the global feature information.
  • the image to be recognized can be input to the second image recognition model, and feature extraction is performed on the image to be recognized through the second image recognition model to obtain global feature information.
  • The second image recognition model can likewise be any of various network models used for image feature extraction, such as Faster Region-Convolutional Neural Networks (Faster R-CNN), Fast Region-Convolutional Neural Networks (Fast R-CNN), Mask Region-Convolutional Neural Networks (Mask R-CNN), YOLO, YOLOv2, YOLOv3, etc.
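  • A sketch of such a second image recognition model using a stock torchvision Faster R-CNN; the label space (background/real/fake) is an assumption, and a trained checkpoint would be needed in practice:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Illustrative second image recognition model; the assumed label space is
# 0 = background, 1 = real face, 2 = fake face.
detector = fasterrcnn_resnet50_fpn(num_classes=3)
detector.eval()

def detect_global(image_tensor):
    """image_tensor: float CHW in [0, 1]; returns boxes, labels, scores."""
    with torch.no_grad():
        out = detector([image_tensor])[0]
    return out["boxes"], out["labels"], out["scores"]  # scores act as the confidence levels
```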
  • step S220 the category of the target object is determined based on the characteristic information, and the confidence level corresponding to the target object is determined.
  • the classification threshold is an adaptive classification threshold obtained according to the face recognition results in all image frames in multiple videos.
  • If the face image in the image is a face image directly collected from a real person, the face in the face image is a real face; if the face image in the image is generated through artificial intelligence technology, for example, if the face image of Li in the image is replaced with the face image of Zhang through artificial-intelligence-based face-changing technology, then the face in Li's original face image is a real face, while the face in Zhang's face image is a fake face.
  • When the category of the target object is determined to be false, the first confidence level that the target object is false can be determined according to the blur degree ratio and the classification threshold; the calculation is as shown in formula (1), where conf-level1-fake is the first confidence level, p is the blur degree ratio, and p_border is the classification threshold.
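  • Formula (1) itself is not reproduced in the text above; the sketch below implements only the threshold decision described earlier, with a purely hypothetical monotone mapping standing in for formula (1):

```python
def judge_by_blur(p, p_border):
    """Decide real/fake from the blur degree ratio p and the classification threshold p_border."""
    if p <= p_border:
        return "real", None
    # Hypothetical stand-in for formula (1): any monotone mapping of how far the
    # ratio exceeds the threshold would play the same role as conf-level1-fake.
    conf_level1_fake = min(1.0, (p - p_border) / max(1e-6, 1.0 - p_border))
    return "fake", conf_level1_fake
```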
  • the category of the target object may be determined according to the local feature information corresponding to the target object.
  • Specifically, the local feature information of the image to be recognized can be obtained; for a face, the local feature information is the facial feature information.
  • Based on the local feature information, the second confidence level conf-level2-fake and the third confidence level conf-level2-real corresponding to the face can be obtained, where the second confidence level conf-level2-fake is the probability value, output by the first image recognition model, that the category of the face is false, and the third confidence level conf-level2-real is the probability value, output by the first image recognition model, that the category of the face is true. When the second confidence level conf-level2-fake is greater than the third confidence level conf-level2-real, the face in the image to be recognized is determined to be a fake face; when the second confidence level conf-level2-fake is less than or equal to the third confidence level conf-level2-real, the face in the image to be recognized is determined to be a real face.
  • The local feature information is more precise and belongs to the high-level features of the image information, so image recognition based on the local feature information can further improve the accuracy of image recognition, and in particular the accuracy of face recognition.
  • image recognition can also be performed based on global feature information corresponding to the target object, and specifically, the category of the target object and the confidence level corresponding to the target object can be determined based on the global feature information.
  • regression calculations can be performed on the region corresponding to the target object to obtain the target object category and the confidence level corresponding to the target object.
  • the fourth confidence level conf-level3-fake and the fifth confidence level conf-level3-real corresponding to the target object can be obtained, where the fourth confidence level conf-level3 -fake is the probability value that the category of the target object output by the second image recognition model is false, and the fifth confidence level conf-level3-real is the probability value that the category of the target object output by the second image recognition model is true.
  • When the fourth confidence level conf-level3-fake is greater than the fifth confidence level conf-level3-real, the face in the image to be recognized is a fake face; when the fourth confidence level conf-level3-fake is less than or equal to the fifth confidence level conf-level3-real, the face in the image to be recognized is a real face.
  • When performing image recognition, only one image recognition model may be used, or a collection of multiple image recognition models may be used: feature extraction is performed on the image to be recognized through each image recognition model in the collection, and each model outputs confidence levels for the different categories of the target object. The confidence levels include a confidence level that the target object is true and a confidence level that the target object is false; after the output of each image recognition model is obtained, the confidence levels that the target object is true are summed to obtain the final confidence level that the target object is true, and similarly the confidence levels that the target object is false are summed to obtain the final confidence level that the target object is false.
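  • A minimal sketch of this per-category summation over a collection of models:

```python
def ensemble_confidences(model_outputs):
    """Sum per-model confidences into final (real, fake) confidence levels.

    model_outputs is a list of (conf_real, conf_fake) pairs, one pair per
    image recognition model in the collection.
    """
    conf_real = sum(real for real, _ in model_outputs)
    conf_fake = sum(fake for _, fake in model_outputs)
    return conf_real, conf_fake  # the larger sum decides the final category
```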
  • In some embodiments, the blur degree information corresponding to the target object is obtained first, and the category of the target object is determined according to the blur degree information and the classification threshold. When the category of the target object determined according to the blur degree information and the classification threshold is true, the local feature information corresponding to the target object is obtained, and the category of the target object is determined based on the local feature information. When the category of the target object determined based on the local feature information is true, the global feature information corresponding to the target object is obtained, and the category of the target object and the confidence level corresponding to the target object are determined based on the global feature information, so that the target information corresponding to the image to be recognized is obtained according to the confidence level. Whenever the category of the target object is determined to be false at any of these stages, the judgment stops, which saves time and resources.
  • The specific processes of determining the category of the target object according to the blur degree information and the classification threshold, determining the category of the target object based on the local feature information, and determining the category of the target object based on the global feature information are the same as in the foregoing embodiments and are not repeated here.
  • That is, the target object in the image to be recognized is discriminated according to the blur degree information, the local feature information, and the global feature information in sequence, until the category of the target object is determined to be false at some stage, or the category of the target object is determined to be true according to all of the feature information; the judgment then stops and the judgment information is output. If the category of the target object is determined to be false according to the blur degree information, the judgment stops; if the category of the target object is determined to be true according to the blur degree information but false according to the local feature information, the judgment stops; if the category of the target object is determined to be true according to the local feature information but false according to the global feature information, the judgment stops. When the target object in the image to be recognized is determined to be true according to the feature information at every level from low to high, it is determined that the target object in the image to be recognized is true, and the target information "real face, safe" is output.
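  • A sketch of this low-to-high cascade with early exit; the stage callables are hypothetical wrappers over the blur degree, local-feature, and global-feature checks described above:

```python
def cascade_judge(image, stages):
    """Discriminate low-to-high: blur degree, then local, then global features.

    stages is an ordered list of callables, each returning (category, confidence);
    hypothetical wrappers around the three checks sketched earlier would be passed here.
    """
    confidence = None
    for stage in stages:
        category, confidence = stage(image)
        if category == "fake":        # early exit saves the costlier later stages
            return "fake", confidence
    return "real", confidence         # true at every level: "real face, safe"
```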
  • step S230 the target information corresponding to the image to be recognized is acquired according to the category of the target object and the confidence level.
  • the target information is prompt information about whether there is a risk in the image to be recognized.
  • the target information may include whether the face is fake and the risk value.
  • the target information can also contain other information, such as whether to use the face of a public figure, and so on.
  • Specifically, when the category of the target object determined according to the blur degree information is false, the risk value corresponding to the image to be recognized is determined according to the first confidence level, and the target information is obtained based on the category and risk value of the target object; or, when the category of the target object determined according to the local feature information is false, the risk value corresponding to the image to be recognized is determined according to the second confidence level, and the target information is obtained based on the category and risk value of the target object; or, when the category of the target object determined according to the global feature information is false, the risk value corresponding to the image to be recognized is determined according to the fourth confidence level, and the target information is obtained based on the category and risk value of the target object.
  • For example, if the corresponding confidence level is 0.7, the risk value of the image to be recognized can be determined to be 0.7 according to that confidence level, and the final target information is "fake face, risk value: 0.7".
  • In addition to judging whether the target object in the image to be recognized is true or false, it can also be judged whether the target object in the image to be recognized is the same as a public figure, so as to protect the interests of public figures and prevent criminals from using fake videos made with the images of public figures to deceive society or commit other acts that endanger society.
  • When the category of the target object is determined to be false according to the blur degree information, the local feature information, or the global feature information, the target object in the image to be recognized can be matched against the objects to be matched in a material library to obtain a matching result, so as to determine whether the face in the image to be recognized uses the face information of a public figure.
  • The target information can also be obtained according to the category of the target object, the target confidence level, and the result of matching against the faces of public figures. Specifically, when the category of the target object determined according to the blur degree information and the classification threshold is false, the target object is matched against the objects to be matched in the material library to obtain the matching result, so that the target information is obtained according to the category of the target object, the target confidence level, and the matching result; or, when the category of the target object determined according to the local feature information is false, the target object is matched against the objects to be matched in the material library to obtain the matching result, so that the target information is obtained according to the category of the target object, the target confidence level, and the matching result; or, when the category of the target object determined according to the global feature information is false, the target object is matched against the objects to be matched in the material library to obtain the matching result, so that the target information is obtained according to the category of the target object, the target confidence level, and the matching result.
  • the material library contains the facial feature vectors of multiple public figures.
  • FIG. 4 is a schematic flowchart of the image recognition method provided by an embodiment of the present application.
  • In step S401, the distance between the feature vector corresponding to the target object and the feature vector corresponding to the object to be matched in the material library is calculated; in step S402, the distance is compared with a preset distance threshold; in step S403, when the distance is less than the preset distance threshold, it is determined that there is a matching relationship between the target object and the object to be matched; in step S404, when the distance is greater than or equal to the preset distance threshold, it is determined that there is no matching relationship between the target object and the object to be matched.
  • the feature vector corresponding to the target object may be obtained by feature extraction on the first object map determined according to the coordinate frame information of the target object through a feature extraction network.
  • The feature extraction network may be FaceNet, DeepFace, DeepID, SphereFace, ArcFace, or another network, which is not specifically limited in the embodiments of the present application; the distance may be a Euclidean distance, a Mahalanobis distance, a Manhattan distance, etc., which is likewise not specifically limited in the embodiments of the present application. Taking the Euclidean distance as an example, the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the face of a public figure in the material library can be calculated according to formula (2), as follows:

    dist(X, Y_i) = sqrt( Σ_{k=1}^{N} (X_k - Y_ik)² )    (2)

    where dist(X, Y_i) is the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the face of the public figure in the material library, X_k is the k-th component of the feature vector corresponding to the face in the image to be recognized, Y_ik is the k-th component of the feature vector corresponding to the face of the public figure, k denotes the k-th component of the feature vector, and N is the total number of components in the feature vector.
  • The feature vector corresponding to the face in the image to be recognized has the same dimension as the feature vector corresponding to the face of a public figure; for example, the feature vector of the face in the image to be recognized may be a 512-dimensional feature vector X = [x_1, x_2, x_3, ..., x_512].
  • The preset distance threshold may be a distance value set according to actual conditions; in engineering practice, the preset distance threshold may be set to 0.3 or 0.35. When the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the face of a public figure in the material library is less than the preset distance threshold, it means that the face in the image to be recognized has appropriated the portrait of that public figure.
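  • A sketch of the matching step against the material library, using the Euclidean distance of formula (2) and the 0.3 engineering threshold mentioned above; the library layout (name to embedding) is an assumption:

```python
import numpy as np

def match_public_figure(face_vec, library_vecs, dist_threshold=0.3):
    """Match a face embedding against the material library.

    face_vec: embedding of the face in the image to be recognized (e.g. 512-D);
    library_vecs: mapping of public-figure name -> embedding of the same dimension.
    """
    face_vec = np.asarray(face_vec, dtype=float)
    for name, lib_vec in library_vecs.items():
        # formula (2): Euclidean distance between the two feature vectors
        dist = float(np.sqrt(np.sum((face_vec - np.asarray(lib_vec, dtype=float)) ** 2)))
        if dist < dist_threshold:    # matching relationship exists
            return name, dist
    return None, None                # no public figure's portrait is used
```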
  • When there is a matching relationship between the target object and the object to be matched, a sixth confidence level can be determined according to the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the face of the public figure in the material library; the sixth confidence level can be obtained according to formula (3), where dist(X, Y_i) is the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the face of the public figure in the material library.
  • Furthermore, the risk value of the image to be recognized can be calculated according to the sixth confidence level and the target confidence level to obtain the target information, where the target confidence level is the confidence level corresponding to the target object acquired when the category of the target object is determined according to the blur degree information, the local feature information, or the global feature information; it may specifically be the first confidence level, the second confidence level, or the fourth confidence level in the foregoing embodiments. That is to say, the risk value is determined jointly by the confidence level conf-fake determined based on the blur degree information, the local feature information, or the global feature information (i.e., conf-level1-fake, conf-level2-fake, or conf-level3-fake) and the confidence level conf-celebrity determined based on the comparison with the faces of public figures.
  • the target information can be determined according to the risk value.
  • the target information can include three aspects of information whether it is a fake face, whether to use a public figure's face, and the risk value.
  • FIG. 5 is a schematic diagram of an interface for presenting the target information in an image recognition method according to an embodiment of the present application. As shown in FIG. 5, the target information "fake face; public figure's face not used; risk value: 0.65" is displayed above the face bounding box in the image to be recognized, indicating that the face in the image to be recognized is a fake face.
  • According to the risk value, it is possible to determine which level the risk belongs to. Specifically, multiple levels can be set, with each level corresponding to a different risk value interval. For example, three levels can be set: high risk, medium risk, and low risk, where a risk value in (0.7, 1] is determined as high risk, a risk value in (0.3, 0.7] as medium risk, and a risk value in (0, 0.3] as low risk; of course, other numerical ranges may also be used. Finally, the target information can be determined and output according to the risk level; the target information varies with the risk level.
  • For example, the target information may be "fake face; public figure's face used; high risk", or "fake face; public figure's face not used; low risk"; it may also take other forms, which are not specifically limited in the embodiments of the present application.
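  • A sketch of turning a risk value and a matching result into such target information, using the example intervals and wordings above:

```python
def target_info(risk_value, used_public_face):
    """Render the target information string from the risk value and matching result."""
    if risk_value > 0.7:
        level = "high risk"          # risk value in (0.7, 1]
    elif risk_value > 0.3:
        level = "medium risk"        # risk value in (0.3, 0.7]
    else:
        level = "low risk"           # risk value in (0, 0.3]
    figure = ("public figure's face used" if used_public_face
              else "public figure's face not used")
    return f"fake face; {figure}; {level}"
```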
  • the image recognition method in the embodiments of this application can be applied to various scenarios such as live body detection and sensitive video detection.
  • Liveness detection is mainly used in fields such as transportation, finance, and insurance, where faces are detected and recognized to confirm that the person's information is correct and poses no danger. Sensitive video detection is mainly applied to online videos; for example, when a video of a public figure making inappropriate remarks appears on the Internet, in order to determine whether the video is a synthesized video and to protect the interests of the public figure, it is necessary to detect and recognize the faces in it.
  • FIG. 6 is a schematic diagram of the flow of detecting and recognizing a face in an image recognition method provided by an embodiment of the present application. As shown in FIG. 6: in step S601, the face in the image is detected to obtain face coordinate frame information; in step S602, blur degree analysis is performed on the face image determined according to the face coordinate frame information to obtain the blur degree ratio; in step S603, whether the face in the image is a fake face is judged according to the relationship between the blur degree ratio and the classification threshold; in step S604, when the face is determined to be a fake face, the confidence level of the fake face is obtained, and the risk value is calculated according to the confidence level of the fake face; in step S605, when the face is determined to be a real face, local feature information is extracted from the face image determined according to the face coordinate frame information and classified according to the local feature information to obtain the confidence level that the face is a real face and the confidence level that the face is a fake face; in step S606, whether the face is a fake face is determined according to the confidence level of the real face and the confidence level of the fake face; in step S607, when the face is determined to be a fake face, the risk value is calculated according to the confidence level of the fake face; in step S608, when the face is determined to be a real face, feature extraction is performed on the image to be recognized to obtain global feature information, regression is performed on the face area according to the global feature information, and the authenticity of the face and the corresponding confidence level are calculated; in step S609, the risk value is calculated according to the confidence level of the real face and the confidence level of the fake face.
  • the user sends the sensitive video to the back-end server through the terminal device.
  • The back-end server detects and recognizes the face in the video, judges whether the face in the video is the face of a public figure, and finally returns to the terminal device the target information determined according to the detection and recognition of the face and the public-figure judgment result; different operations can then be performed on the sensitive video according to the target information. When the target information indicates that the risk level of the sensitive video is high and the face of a public figure is used, the sensitive video can be removed from the shelves and the related path information completely deleted; when the target information indicates that the sensitive video has no or low risk and the face of a public figure is not used, it can be left alone, or the user who uploaded the sensitive video can be monitored, warned, and so on.
  • FIG. 7 is a schematic diagram of the process of detecting and recognizing human faces in sensitive videos in the image recognition method provided by an embodiment of the present application.
  • In step S701, the sensitive video is parsed to obtain video frames, and a frame containing a face among the video frames is used as the image to be recognized; in step S702, the face in the image to be recognized is detected to obtain the face coordinate frame information; in step S703, blur degree analysis is performed on the face image determined according to the face coordinate frame information to obtain the blur degree ratio; in step S704, whether the face in the image is a fake face is judged according to the relationship between the blur degree ratio and the classification threshold; in step S705, when the face is determined to be a fake face, the confidence level of the fake face is obtained, and the risk value is calculated according to the confidence level of the fake face; in step S706, when the face is determined to be a real face, the local feature information of the face image determined according to the face coordinate frame information is extracted and classified according to the local feature information to obtain the confidence level that the face is a real face and the confidence level that the face is a fake face; the subsequent steps proceed as in the flow of FIG. 6.
  • the authenticity of the human body in the image and video can also be distinguished.
  • The method of distinguishing the authenticity of the human body is similar to the method of distinguishing the authenticity of the human face; only the type of data processed differs. For example, the face coordinate frame information needs to be changed to human body coordinate frame information.
  • The image recognition method in the embodiments of the present application can detect and recognize the target object in the image to be recognized or the video to be recognized, determine whether it is true or false, and output the corresponding target information according to the determination result, so that the service user can perform corresponding operations according to the target information.
  • the image recognition method of the embodiment of the present application uses a machine learning model to detect the target object in the image during the image recognition process, and extracts local and global features, which improves the efficiency and accuracy of image recognition;
  • In identifying the authenticity of the target object, it uses features ranging from low-level features, including pixel- and texture-level information, to high-level features, including global semantic information, across three different detection stages, which further improves the accuracy of image recognition; in addition, it can judge whether the target object is a fake image that uses the portrait of a public figure, which protects the portrait rights of public figures and avoids the spread of false information.
  • FIG. 8 is a block diagram of an image recognition device provided by an embodiment of the present application.
  • the image recognition device 800 includes: a feature information acquisition module 801, a confidence degree acquisition module 802, and a target information acquisition module 803.
  • The feature information obtaining module 801 is configured to obtain the feature information corresponding to the target object in the image to be recognized, where the feature information includes any one or more of blur degree information, local feature information, and global feature information; the confidence level acquisition module 802 is configured to determine the category of the target object based on the feature information and determine the confidence level corresponding to the target object; the target information acquisition module 803 is configured to acquire the target information corresponding to the image to be recognized according to the category of the target object and the confidence level.
  • In some embodiments, the confidence level acquisition module 802 includes: a first category determining unit, configured to obtain the blur degree information corresponding to the target object and determine the category of the target object according to the blur degree information and a classification threshold; a second category determining unit, configured to, when the category of the target object determined according to the blur degree information and the classification threshold is true, obtain the local feature information corresponding to the target object and determine the category of the target object based on the local feature information; and a third category determining unit, configured to, when the category of the target object determined according to the local feature information is true, obtain the global feature information corresponding to the target object and determine the category of the target object and the confidence level corresponding to the target object based on the global feature information, so as to obtain the target information corresponding to the image to be recognized according to the category of the target object and the confidence level.
  • The first category determining unit includes: a first object map acquisition unit, configured to detect the target object in the image to be recognized to acquire a first object map corresponding to the target object; a second object map acquisition unit, configured to adjust the coordinates of the first object map to acquire a second object map; and a blur calculation unit, configured to perform blur calculation on the first object map and the second object map separately to acquire blur information corresponding to the target object.
  • The blur information includes a first blur degree corresponding to the first object map and a second blur degree corresponding to the second object map. The first category determining unit is configured to: divide the second blur degree by the first blur degree to obtain a blur ratio; when the blur ratio is less than or equal to the classification threshold, determine that the category of the target object is real; and when the blur ratio is greater than the classification threshold, determine that the category of the target object is fake.
  • The confidence acquisition module 802 is configured to: when the category of the target object is determined to be fake according to the blur information and the classification threshold, determine a first confidence corresponding to the target object according to the blur ratio and the classification threshold.
  • The second category determining unit includes: a first feature extraction unit, configured to input the first object map into a first image recognition model and perform feature extraction on the first object map through the first image recognition model to acquire the local feature information; and a classification unit, configured to classify the target object according to the local feature information to obtain the category of the target object.
  • The classification unit is configured to: acquire a second confidence and a third confidence corresponding to the target object based on the local feature information; when the second confidence is greater than the third confidence, determine that the category of the target object is fake; and when the second confidence is less than or equal to the third confidence, determine that the category of the target object is real.
  • The confidence acquisition module 802 includes: a second feature extraction unit, configured to input the image to be recognized into a second image recognition model and perform feature extraction on the image to be recognized through the second image recognition model to acquire the global feature information; and a regression unit, configured to perform regression calculation on the region corresponding to the target object according to the global feature information to acquire the category of the target object and the confidence corresponding to the target object.
  • The regression unit is configured to: perform regression calculation on the region corresponding to the target object according to the global feature information to acquire a fourth confidence and a fifth confidence corresponding to the target object; when the fourth confidence is greater than the fifth confidence, determine that the category of the target object is fake; and when the fourth confidence is less than or equal to the fifth confidence, determine that the category of the target object is real.
  • The target information acquisition module 803 is configured to: when the category of the target object determined according to the blur information and the classification threshold is fake, determine the risk value corresponding to the image to be recognized according to the first confidence determined based on the blur information, and acquire the target information based on the category of the target object and the risk value; or, when the category of the target object determined according to the local feature information is fake, determine the risk value corresponding to the image to be recognized according to the second confidence determined based on the local feature information, and acquire the target information based on the category of the target object and the risk value; or, when the category of the target object determined according to the global feature information is fake, determine the risk value corresponding to the image to be recognized according to the fourth confidence determined based on the global feature information, and acquire the target information based on the category of the target object and the risk value.
  • The target information acquisition module 803 is further configured to: when the category of the target object determined according to the blur information and the classification threshold is fake, match the target object against objects to be matched in a material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result; or, when the category of the target object determined according to the local feature information is fake, match the target object against objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result; or, when the category of the target object determined according to the global feature information is fake, match the target object against objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result.
  • Matching the target object against an object to be matched in the material library specifically includes: calculating the distance between the feature vector corresponding to the target object and the feature vector corresponding to the object to be matched; and when the distance is less than a preset distance threshold, determining that there is a matching relationship between the target object and the object to be matched.
  • Acquiring the target information according to the category of the target object, the target confidence, and the matching result is specifically: when there is a matching relationship between the target object and the object to be matched, determining a sixth confidence according to the distance; when the category of the target object is determined to be fake according to the blur information, the local feature information, or the global feature information, acquiring the confidence corresponding to the target object as the target confidence; and determining the risk value corresponding to the image to be recognized according to the sixth confidence and the target confidence, and acquiring the target information based on the type of the target object, the matching result, and the risk value.
  • FIG. 9 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of this application.
  • The computer system 900 includes a central processing unit (Central Processing Unit, CPU) 901, which can execute various appropriate actions and processing according to a program stored in a read-only memory (Read-Only Memory, ROM) 902 or a program loaded from a storage part 908 into a random access memory (Random Access Memory, RAM) 903, to implement the image recognition method described in the foregoing embodiments.
  • The RAM 903 also stores various programs and data required for system operation.
  • The CPU 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (Input/Output, I/O) interface 905 is also connected to the bus 904.
  • The following components are connected to the I/O interface 905: an input part 906 including a keyboard, a mouse, and the like; an output part 907 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (LCD), a speaker, and the like; a storage part 908 including a hard disk and the like; and a communication part 909 including a network interface card such as a LAN (Local Area Network) card or a modem.
  • The communication part 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 910 as needed, so that the computer program read from it is installed into the storage part 908 as needed.
  • The process described below with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of this application include a computer program product, which includes a computer program carried on a computer-readable medium; the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 909, and/or installed from the removable medium 911.
  • When the computer program is executed by the central processing unit (CPU) 901, the various functions defined in the system of this application are executed.
  • The computer-readable medium shown in the embodiments of this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • The computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried.
  • Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • The computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, and the like, or any suitable combination of the above.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical function.
  • In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of this application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
  • The embodiments of this application also provide a computer-readable medium. The computer-readable medium may be included in the image recognition apparatus described in the above embodiments, or it may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs; when the one or more programs are executed by an electronic device, the electronic device implements the method described in the above embodiments.
  • Although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. The features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
  • The example embodiments described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) execute the method according to the embodiments of this application.

Abstract

An image recognition method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of artificial intelligence. The method includes: acquiring feature information corresponding to a target object in an image to be recognized, where the feature information includes any one or more of blur information, local feature information, and global feature information (S210); determining the category of the target object based on the feature information, and determining a confidence corresponding to the target object (S220); and acquiring target information corresponding to the image to be recognized according to the category of the target object and the confidence (S230).

Description

Image recognition method and apparatus, computer-readable storage medium, and electronic device
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202010017583.4, filed on January 8, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular to an image recognition method, an image recognition apparatus, a computer-readable storage medium, and an electronic device.
Background
With the transformation and rapid development of computer technology, people's lifestyles have changed greatly. Various intelligent applications have emerged, such as photo-beautification cameras, outfit-changing software, and face-swapping software, through which people can entertain themselves.
Taking face swapping as an example, criminals use such intelligent applications to produce fake videos to deceive the public, and even use fake videos for blackmail and extortion, for spreading pornographic, violent, or terrorist content, and for political disruption. People are therefore deeply worried about the security risks and privacy crises brought by face-swapping technology, so real and fake faces in videos need to be accurately screened. However, the discrimination methods in the related art cannot achieve accurate detection and screening.
It should be noted that the information disclosed in the background section above is only intended to enhance the understanding of the background of this application, and therefore may include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary
The embodiments of this application provide an image recognition method and apparatus, a computer-readable storage medium, and an electronic device, which can improve the efficiency and accuracy of image recognition at least to a certain extent.
An embodiment of this application provides an image recognition method, including:
acquiring feature information corresponding to a target object in an image to be recognized;
where the feature information includes any one or more of blur information, local feature information, and global feature information;
determining the category of the target object based on the feature information, and determining a confidence corresponding to the target object; and
acquiring target information corresponding to the image to be recognized according to the category of the target object and the confidence.
An embodiment of this application provides an image recognition apparatus, including:
a feature information acquisition module, configured to acquire feature information corresponding to a target object in an image to be recognized, where the feature information includes any one or more of blur information, local feature information, and global feature information;
a confidence acquisition module, configured to determine the category of the target object based on the feature information, and determine a confidence corresponding to the target object; and
a target information acquisition module, configured to acquire target information corresponding to the image to be recognized according to the category of the target object and the confidence.
An embodiment of this application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the image recognition method described in the embodiments of this application is implemented.
An embodiment of this application provides an electronic device, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the image recognition method described in the above embodiments.
The embodiments of this application have the following beneficial effects:
The image recognition method provided by the embodiments of this application determines the category of the target object and the confidence corresponding to the target object based on feature information of at least one dimension, and finally acquires target information corresponding to the image to be recognized according to the category of the target object and the confidence. This improves the efficiency and accuracy of image recognition and facilitates real-versus-fake screening of target objects such as faces and human bodies in images or videos.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification, show embodiments consistent with this application, and are used together with the specification to explain the principles of this application. Obviously, the drawings described below are only some embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of an image recognition system provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of an image recognition method provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of an image recognition method provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of an image recognition method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an interface for target information in an image recognition method provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of detecting and recognizing a face in an image recognition method provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of detecting and recognizing a face in a sensitive video in an image recognition method provided by an embodiment of this application;
FIG. 8 is a block diagram of an image recognition apparatus provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of this application.
Detailed Description
Example implementations will now be described more fully with reference to the accompanying drawings. However, the example implementations can be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these implementations are provided so that this application will be more thorough and complete and will fully convey the concept of the example implementations to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of this application. However, those skilled in the art will realize that the technical solutions of this application can be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and the like. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor be executed in the described order. For example, some operations/steps may be decomposed, and some may be combined or partially combined, so the actual execution order may change according to the actual situation.
FIG. 1 is a schematic diagram of an application scenario of an image recognition system provided by an embodiment of this application.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is a medium for providing a communication link between the terminal device 101 and the server 103, and may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to actual needs. For example, the server 103 may be a server cluster composed of multiple servers. The terminal device 101 may be a device such as a tablet computer, a desktop computer, or a smartphone.
In some embodiments, the terminal device 101 may take a captured video uploaded by a user as the video to be recognized, take a video frame containing a face in that video as the image to be recognized, and then send the image to be recognized to the server 103 through the network 102. After acquiring the image to be recognized, the server 103 may screen the authenticity of the face (target object) in it, that is, determine whether the target object was added later through image processing technology rather than being an object originally contained in the image. When it is determined that the face in the image to be recognized was obtained through face-swapping processing by image processing technology, the server continues to determine the risk value of the image; if the risk value falls within a specific interval, it returns to the terminal device 101 prompt information indicating that the upload is rejected.
In some embodiments, in a recommendation system scenario, the server 103 may detect the images to be recognized obtained in the recall stage and screen the authenticity of the faces (target objects) in them, that is, determine whether the target object was added later through image processing technology rather than being an object originally contained in the image. When it is determined that a face in an image to be recognized was obtained through face-swapping processing by image processing technology, the server continues to determine the risk value of the image; if the risk value falls within a specific interval, the server 103 removes the image from the database used in the recall stage, so as to prevent such images from being recommended to users.
In some embodiments, the server 103 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal device 101 may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, or a smart watch. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present invention.
In an embodiment of this application, the terminal device 101 may acquire a video to be recognized and parse it to obtain an image containing the target object as the image to be recognized, or may directly acquire an image to be recognized containing the target object; the terminal device 101 then sends the image to be recognized to the server 103 through the network 102. After acquiring the image to be recognized, the server 103 may perform feature extraction on the target object in it to acquire feature information corresponding to the target object, the feature information including any one or more of blur information, local feature information, and global feature information. It then determines the category of the target object based on the feature information and determines the confidence corresponding to the target object. According to the category and the confidence, the authenticity of the target object in the image can be screened, that is, whether the target object was added later through image processing technology rather than being an object originally contained in the image. Finally, the risk value of the image to be recognized can be calculated according to the acquired confidence, and target information can be acquired based on the risk value. The target information may specifically include whether the target object is fake, whether the portrait of a public figure is used, the risk value, and similar items; the information contained in the target information varies with the task. If it is determined that the image or video to be recognized is risky, it needs to be taken down or handled by similar operations. The technical solution of the embodiments of this application can perform multi-faceted, multi-level analysis on images or videos to be recognized, which greatly improves the efficiency and accuracy of image recognition.
It should be noted that the image recognition method provided by the embodiments of this application may be executed by a server, and accordingly the image recognition apparatus may be provided in the server. However, in some embodiments, the image recognition method provided by the embodiments of this application may also be executed by a terminal device.
In the related art in this field, taking the detection and recognition of real and fake faces in an image as an example, face detection may first be performed on the image, and the face image is then checked against a fake-face model, which is a feature-vector-based classifier. This approach mainly targets the fake faces in early face-verification scenarios, such as static portraits in photos, which differ greatly from faces generated by artificial intelligence technology. Fake-face videos generated by artificial intelligence vary widely, differ greatly in features, and are of very unstable quality. Therefore, applying only a single method to authenticate images and videos leads to high false-detection and missed-detection rates. The discrimination methods for real and fake faces in the related art are simple and cannot cover fake-face images or videos generated by a variety of face-swapping algorithms, so it is difficult to accurately detect fake faces generated by various artificial-intelligence face-swapping technologies.
In view of the problems in the related art, the embodiments of this application provide an image recognition method implemented based on machine learning, which is a branch of artificial intelligence. Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science; it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include several major directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to machine vision that uses cameras and computers instead of human eyes to recognize, track, and measure targets, and further performs graphics processing so that the computer-processed images become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications span all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
The solutions provided by the embodiments of this application relate to the image recognition technology of artificial intelligence, which is specifically described by the following embodiments.
The embodiments of this application propose an image recognition method. Taking the recognition of a face in an image to be recognized as an example, the implementation details of the technical solutions of the embodiments of this application are elaborated below.
In some embodiments, the image recognition method provided by the embodiments of this application can recognize faces in an image to be recognized, recognize animal faces in the image, and also recognize object items in the image.
FIG. 2 is a schematic flowchart of an image recognition method provided by an embodiment of this application. The method may be executed by a server, which may be the server 103 shown in FIG. 1. Referring to FIG. 2, the image recognition method includes at least steps S210 to S230, described in detail as follows.
In step S210, feature information corresponding to a target object in an image to be recognized is acquired, where the feature information includes any one or more of blur information, local feature information, and global feature information.
In some embodiments, the image to be recognized is an image in which a face needs to be screened, and that face is the target object. The image to be recognized may be an image containing a face stored locally on the terminal device, or an image containing a face obtained from the network. It may also be obtained by processing a video to be recognized: specifically, the video may be decoded through video codec technology, for example using the lossy video and audio compression standards formulated by the Moving Picture Experts Group (MPEG), such as MPEG-2, MPEG-4, and H.264, to parse the video into image frames; an image frame containing a face is then taken as the image to be recognized, or one frame may be selected from all the frames containing faces as the image to be recognized.
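As one possible way to obtain such frames, the sketch below uses OpenCV, which is an assumption of this illustration; the disclosure only requires a decoder capable of handling MPEG-2/MPEG-4/H.264 streams.

```python
import cv2  # OpenCV decodes common MPEG-2/MPEG-4/H.264 streams via FFmpeg


def video_to_frames(video_path, step=1):
    """Decode a video into a list of BGR image frames, keeping every `step`-th frame."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream or decode failure
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```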
In some embodiments, low-level features and high-level features corresponding to the image to be recognized may be extracted, where the low-level features may be pixel- and texture-level information of the image, and the high-level features may be local or global feature information of the image. In the embodiments of this application, the feature information corresponding to the image to be recognized is, from low level to high level, the blur information, the local feature information, and the global feature information. The category of the image to be recognized may be determined according to any one of the three kinds of feature information, or according to more than one of them; the category of the image to be recognized is precisely whether the face in it is real or fake.
Face-swapping technologies in the related art first generate a face region similar in expression and action to the target person in the image or video, and then use image fusion and face fusion technologies to embed the generated fake face into the target face. However, a limitation of these methods is that the resolution of the generated face region is not consistent with that of the original video, so that in terms of low-level image features the face region in the synthesized video differs considerably from the original real image or video, for example in color, edge contour, texture, illumination intensity, and sharpness. Synthesized videos usually have blurred edges, little texture variation across regions, and low resolution. Therefore, in the embodiments of this application, the blur of the face region and of the body region in the video can first be analyzed, including their feature differences in texture, edge contour, and sharpness, and a classification threshold can be used to judge whether the face in the image or video is real or fake. Since blur information is a low-level feature of the image and easy to acquire, if the authenticity of the face can already be detected from the blur information, face recognition based on the high-level features of the image is unnecessary, which improves recognition efficiency and saves time and resources.
In some embodiments, FIG. 3 is a schematic flowchart of an image recognition method provided by an embodiment of this application. As shown in FIG. 3, the process includes at least steps S301 to S303, specifically:
In step S301, the target object in the image to be recognized is detected to acquire a first object map corresponding to the target object.
In some embodiments, before the blur information of the image to be recognized is acquired, the face region in the image needs to be detected to acquire the first object map corresponding to the face region. Specifically, the face in the image to be recognized may be detected based on a face detection algorithm to acquire face bounding-box information, from which the face region can be further determined, and a first object map containing only the face region information is acquired. When detecting the face in the image to be recognized, a model based on a feature pyramid network architecture may be used; the model detects the face and outputs the face bounding-box information, which specifically includes the coordinates (x, y) of the upper-left starting point of the face region, as well as the width and the height of the bounding box of the face region. That is, the face bounding-box information is (x, y, width, height). The first object map contains only face information, without the background information of the image to be recognized.
In some embodiments, other face detection algorithms may also be used to detect the face in the image to be recognized to acquire the face bounding-box information, for example Cascade Convolutional Neural Networks (Cascade CNN), DenseBox, the face detection network Faceness-Net, Face Region-Convolutional Neural Networks (Face R-CNN), and Multi-task Convolutional Neural Networks (MTCNN), which is not specifically limited in the embodiments of this application.
In step S302, the coordinates of the first object map are adjusted to acquire a second object map.
In an embodiment of this application, since the blur of both the face region and the body region needs to be analyzed, a second object map corresponding to the body region also needs to be acquired. Considering that the background in the image to be recognized is abundant and cluttered, directly using the whole image as the second object map would introduce too much noise and affect the accuracy of the final judgment. Therefore, in the embodiments of this application, the coordinates of the first object map may be enlarged to acquire a second object map containing part of the background. Specifically, both the width and the height of the first object map may be enlarged by 1/4; of course, other multiples may also be used, which is not specifically limited in the embodiments of this application.
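A minimal sketch of one way to read this coordinate adjustment, assuming the (x, y, width, height) box format described above, splitting the 1/4 enlargement evenly across both sides, and clamping the result to the image borders:

```python
def enlarge_box(x, y, width, height, image_w, image_h, factor=0.25):
    """Enlarge a face box by `factor` overall to include some surrounding background.

    Returns the (x, y, width, height) of the second (enlarged) crop, clamped to
    the image. The even split across both sides is an assumption of this sketch.
    """
    dx, dy = int(width * factor / 2), int(height * factor / 2)
    x1 = max(0, x - dx)
    y1 = max(0, y - dy)
    x2 = min(image_w, x + width + dx)
    y2 = min(image_h, y + height + dy)
    return x1, y1, x2 - x1, y2 - y1
```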
In step S303, blur calculation is performed on the first object map and the second object map separately to acquire blur information corresponding to the target object.
In some embodiments, after the first object map and the second object map are acquired, the Laplacian operator may be used to calculate the first blur degree S_face corresponding to the first object map and the second blur degree S_bg corresponding to the second object map. When calculating the blur, Gaussian filtering may first be applied to the first and second object maps to remove noise; the filtered object maps are then converted into grayscale images; the Laplacian operator is then convolved with the grayscale images corresponding to the first and second object maps respectively; finally, the variance is calculated from the feature information obtained after convolution, yielding the first blur degree S_face and the second blur degree S_bg. The first blur degree S_face and the second blur degree S_bg are the blur information corresponding to the target object, i.e., the face.
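The blur computation just described (Gaussian smoothing, grayscale conversion, Laplacian convolution, then the variance of the response) can be sketched with OpenCV as follows; the kernel size is an illustrative assumption:

```python
import cv2


def blur_degree(image_bgr):
    """Variance of the Laplacian response: lower values indicate a blurrier patch."""
    smoothed = cv2.GaussianBlur(image_bgr, (3, 3), 0)   # remove noise first
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)   # convert to grayscale
    laplacian = cv2.Laplacian(gray, cv2.CV_64F)         # second-derivative filter
    return laplacian.var()                              # variance of the response

# S_face comes from the face crop (first object map),
# S_bg from the enlarged crop (second object map):
# s_face = blur_degree(face_crop); s_bg = blur_degree(enlarged_crop)
```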
In some embodiments, image recognition may also be performed according to the local feature information corresponding to the target object: the local feature information corresponding to the target object is first acquired, and the category of the target object is then determined based on the local feature information. When extracting the local feature information corresponding to the target object, the first object map may be input into a first image recognition model, which performs feature extraction on the first object map to acquire the local feature information; the target object is then classified according to the local feature information to obtain its category. The first image recognition model may be any of various network models for image feature extraction, for example convolutional neural networks such as ResNet, Inception, SqueezeNet, and DenseNet. Taking the SqueezeNet network as an example, before the first object map is input into SqueezeNet, it may be scaled to 224×224 pixels to match the input of the SqueezeNet network; the scaled first object map is then input into SqueezeNet, which performs feature extraction on it to acquire the local feature information, i.e., the face feature information; finally, the authenticity of the face is screened according to the face feature information.
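A sketch of this local-feature stage using torchvision's SqueezeNet is given below. Treating the network as a two-class (real/fake) classifier and the class-index mapping are assumptions of the illustration; usable weights would come from training on real and swapped faces, which the disclosure does not detail.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# SqueezeNet expects 224x224 inputs; build it with a 2-class head (real vs. fake).
model = models.squeezenet1_1(num_classes=2)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])


def local_confidences(face_crop_pil):
    """Return (conf_fake, conf_real) for a PIL face crop (untrained weights here)."""
    batch = preprocess(face_crop_pil).unsqueeze(0)   # shape (1, 3, 224, 224)
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=1)[0]
    return float(probs[0]), float(probs[1])          # class 0 = fake, 1 = real (assumed)
```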
In some embodiments, image recognition may also be performed according to the global feature information corresponding to the target object: feature extraction is first performed on the image to be recognized to acquire the global feature information corresponding to the target object, and the category of the target object and the confidence corresponding to the target object are then determined based on the global feature information. When extracting the global feature information, the image to be recognized may be input into a second image recognition model, which performs feature extraction on the image to acquire the global feature information. The second image recognition model may likewise be any of various network models for image feature extraction, such as Faster Region-Convolutional Neural Networks (Faster R-CNN), Fast Region-Convolutional Neural Networks (Fast R-CNN), Mask Region-Convolutional Neural Networks (Mask R-CNN), YOLO, YOLOv2, YOLOv3, and so on.
In step S220, the category of the target object is determined based on the feature information, and the confidence corresponding to the target object is determined.
In some embodiments, after the first blur degree S_face and the second blur degree S_bg are acquired, the second blur degree may be divided by the first blur degree to obtain the blur ratio p, p = S_bg / S_face; the blur ratio is then compared with the classification threshold. When the blur ratio is less than or equal to the classification threshold, the category of the target object is determined to be real, that is, the face in the image to be recognized is a real face; when the blur ratio is greater than the classification threshold, the category of the target object is determined to be fake, that is, the face in the image to be recognized is a fake face. The classification threshold is an adaptive classification threshold obtained from the face recognition results of all image frames in multiple videos.
In some embodiments, if the face image in an image was directly captured from a face, the face in it is a real face; if the face image was obtained by transforming a real face through artificial intelligence technology, the face in it is a fake face. For example, when the face image of Li in an image is replaced with the face image of Zhang through AI-based face-swapping technology, the face in Li's face image is a real face, and the face in Zhang's face image is a fake face.
In some embodiments, when the category of the target object is determined to be fake according to the blur information, the first confidence that it is fake may be determined according to the blur ratio and the classification threshold, calculated by formula (1):
conf-level1-fake = (p − p_border) / p × 0.5 + 0.5    (1)
where conf-level1-fake is the first confidence, p is the blur ratio, and p_border is the classification threshold.
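The ratio test and formula (1) combine into a short sketch; the values in the example call are illustrative assumptions:

```python
def blur_stage(s_face, s_bg, p_border):
    """Classify by blur ratio; on 'fake', also return the first confidence (formula 1)."""
    p = s_bg / s_face                       # blur ratio
    if p <= p_border:
        return "real", None                 # proceed to the local-feature stage
    conf_level1_fake = (p - p_border) / p * 0.5 + 0.5
    return "fake", conf_level1_fake


print(blur_stage(s_face=120.0, s_bg=180.0, p_border=1.2))  # -> ('fake', 0.6)
```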
In some embodiments, the category of the target object may be determined according to the local feature information corresponding to the target object. After the first object map is input into the first image recognition model and feature extraction is performed on it through the model, the local feature information of the image to be recognized, which is the face feature information, can be acquired. Based on the face feature information, the second confidence conf-level2-fake and the third confidence conf-level2-real corresponding to the face can be acquired, where the second confidence conf-level2-fake is the probability output by the first image recognition model that the category of the face is fake, and the third confidence conf-level2-real is the probability output by the model that the category of the face is real. When conf-level2-fake is greater than conf-level2-real, the face in the image to be recognized is determined to be a fake face; when conf-level2-fake is less than or equal to conf-level2-real, the face is determined to be a real face. Compared with the blur information, the local feature information is more precise and belongs to the high-level features of image information; performing image recognition according to the local feature information can further improve the accuracy of image recognition and yields more accurate face screening.
In some embodiments, image recognition may also be performed according to the global feature information corresponding to the target object; specifically, the category of the target object and the confidence corresponding to it may be determined based on the global feature information. After feature extraction is performed on the image to be recognized through the second image recognition model to acquire the global feature information, regression calculation may be performed on the region corresponding to the target object to acquire the category of the target object and the confidence corresponding to the target object. By performing regression calculation on the region corresponding to the target object according to the global feature information, the fourth confidence conf-level3-fake and the fifth confidence conf-level3-real corresponding to the target object can be acquired, where conf-level3-fake is the probability output by the second image recognition model that the category of the target object is fake, and conf-level3-real is the probability that it is real. Likewise, when conf-level3-fake is greater than conf-level3-real, the face in the image to be recognized is a fake face; when conf-level3-fake is less than or equal to conf-level3-real, the face is a real face. By extracting the global feature information, the contextual relationship between the face and the background can be fully taken into account, and the difference between the face region and the background can be used to screen real and fake faces, improving the accuracy of face screening.
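For the global stage, a detection model such as torchvision's Faster R-CNN can regress face regions together with per-region class scores in one pass. The sketch below is one possible reading, not the disclosed implementation: the two-foreground-class mapping (real face, fake face) plus background is an assumption, and the network shown is untrained.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Background + two foreground classes (real face, fake face) -- an assumption.
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=3)
detector.eval()


def global_stage(image_tensor):
    """Run region regression on a (3, H, W) float tensor with values in [0, 1].

    Returns boxes, labels, and scores; with trained weights, the score of a
    region labeled 'fake' would play the role of conf-level3-fake.
    """
    with torch.no_grad():
        output = detector([image_tensor])[0]
    return output["boxes"], output["labels"], output["scores"]
```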
In some embodiments, when extracting the local feature information or global feature information of the image to be recognized to acquire the confidences corresponding to the different categories of the target object, a single image recognition model may be used, or an ensemble of multiple image recognition models. When an ensemble of models is used, each image recognition model in the ensemble performs feature extraction on the image to be recognized and outputs, based on the acquired feature information, the confidences corresponding to the different categories of the target object, including the confidence that the target object is real and the confidence that it is fake. After the confidences output by all the models are acquired, the sum of all the confidences for the target object being real gives the final confidence that the target object is real; likewise, the sum of all the confidences for the target object being fake gives the final confidence that it is fake.
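The ensemble rule described here is a plain sum of per-model confidences; a minimal sketch:

```python
def ensemble_confidences(model_outputs):
    """Sum per-model (conf_fake, conf_real) pairs into the final pair."""
    total_fake = sum(fake for fake, _ in model_outputs)
    total_real = sum(real for _, real in model_outputs)
    return total_fake, total_real


print(ensemble_confidences([(0.7, 0.3), (0.6, 0.4)]))  # (1.3, 0.7) -> judged fake
```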
In some embodiments, to improve the efficiency and accuracy of image recognition, real and fake faces may be screened according to the feature information of the image to be recognized from low level to high level. During the screening, whether to acquire the next piece of feature information is decided according to the recognition result from the previous piece, and the screening then proceeds with the next piece. Specifically, the blur information corresponding to the target object is first acquired, and the category of the target object is determined according to the blur information and the classification threshold; when the category determined according to the blur information and the classification threshold is real, the local feature information corresponding to the target object is acquired, and the category of the target object is determined based on it; when the category determined based on the local feature information is real, the global feature information corresponding to the target object is acquired, and the category of the target object and the confidence corresponding to the target object are determined based on the global feature information, so as to acquire the target information corresponding to the image to be recognized according to the confidence. In other words, as soon as the category of the target object in the image is determined to be fake according to the feature information of some level, the judgment stops, which saves time and resources. The specific procedures of determining the category according to the blur information and the classification threshold, based on the local feature information, and based on the global feature information are the same as in the above embodiments and are not repeated here.
In some embodiments, when the target object in the image to be recognized is screened successively according to the blur information, the local feature information, and the global feature information, the judgment stops and the determination information is output as soon as the category of the target object is determined to be fake, or once the category is determined to be real according to all the feature information. Specifically, if the category is determined to be fake according to the blur information, the judgment stops; if it is determined to be real according to the blur information but fake according to the local feature information, the judgment stops; if it is determined to be real according to the local feature information but fake according to the global feature information, the judgment stops. And when the target object is determined to be real according to the feature information from low level to high level, the target object in the image to be recognized can be determined to be real, and the target information "real face, safe" is output.
In step S230, target information corresponding to the image to be recognized is acquired according to the category of the target object and the confidence.
In some embodiments, when the authenticity of the target object is screened according to the feature information of the image to be recognized from low level to high level, and the category of the target object is determined to be fake according to the feature information of some level, the corresponding confidence is acquired, and the risk value of the image to be recognized is calculated according to that confidence to acquire the target information. The target information is prompt information about whether a risk exists in the image to be recognized; for example, it may include whether the face is fake and the risk value. Of course, depending on the task, the target information may also contain other information, such as whether the face of a public figure is used. Specifically, when the category of the target object determined according to the blur information and the classification threshold is fake, the risk value corresponding to the image to be recognized is determined according to the first confidence, and the target information is acquired based on the category of the target object and the risk value; or, when the category determined according to the local feature information is fake, the risk value corresponding to the image to be recognized is determined according to the second confidence, and the target information is acquired based on the category and the risk value; or, when the category determined according to the global feature information is fake, the risk value corresponding to the image to be recognized is determined according to the fourth confidence, and the target information is acquired based on the category and the risk value. For example, if the face in the image to be recognized is determined to be real according to the blur information and the local feature information, but fake according to the global feature information, the corresponding confidence acquired may be 0.7; the risk value of the image to be recognized can then be determined to be 0.7, and the final target information is "fake face, risk value: 0.7".
In some embodiments, besides judging the authenticity of the target object in the image to be recognized, it is also possible to judge whether the target object is the same as a public figure, so as to protect the interests of public figures and prevent criminals from deceiving society or committing other socially harmful acts with fake videos made in the image of public figures. When the category of the target object is determined to be fake according to the blur information, the local feature information, or the global feature information, the target object in the image to be recognized may be matched against objects to be matched in a material library to obtain a matching result, so as to judge whether the face in the image to be recognized uses the face information of a public figure.
In an embodiment of this application, the target information may also be acquired according to the category of the target object, the target confidence, and the result of matching against public figures' faces. Specifically, when the category of the target object determined according to the blur information and the classification threshold is fake, the target object is matched against the objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result; or, when the category determined according to the local feature information is fake, the target object is matched against the objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category, the target confidence, and the matching result; or, when the category determined according to the global feature information is fake, the target object is matched against the objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category, the target confidence, and the matching result.
In some embodiments, the material library contains the face feature vectors of multiple public figures. FIG. 4 is a schematic flowchart of an image recognition method provided by an embodiment of this application. As shown in FIG. 4, in step S401, the distance between the feature vector corresponding to the target object and the feature vector corresponding to the object to be matched is calculated; in step S402, the distance is compared with a preset distance threshold; in step S403, when the distance is less than the preset distance threshold, it is determined that there is a matching relationship between the target object and the object to be matched; in step S404, when the distance is greater than or equal to the preset distance threshold, it is determined that there is no matching relationship between the target object and the object to be matched.
In step S401, the feature vector corresponding to the target object may be acquired by performing feature extraction, through a feature extraction network, on the first object map determined according to the bounding-box information of the target object. Specifically, the feature extraction network may be FaceNet, DeepFace, DeepID, SphereFace, ARCFace, or a similar network, which is not specifically limited in the embodiments of this application; the distance may be the Euclidean distance, the Mahalanobis distance, the Manhattan distance, and so on, which is likewise not specifically limited. Taking the Euclidean distance as an example, the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to a public figure's face in the material library may be calculated according to formula (2):
dist(X, Y_i) = √( Σ_{k=1…N} (X_k − Y_ik)² )    (2)
where dist(X, Y_i) is the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the public figure's face in the material library, X_k is a component of the feature vector corresponding to the face in the image to be recognized, Y_ik is a component of the feature vector corresponding to the public figure's face, k denotes the k-th component of the feature vector, and N is the total number of components in the feature vector.
In some embodiments, the feature vector corresponding to the face in the image to be recognized has the same dimensionality as the feature vector corresponding to the public figure's face. Taking the output of FaceNet as an example, the face feature vector of the image to be recognized is a 512-dimensional vector X = [x_1, x_2, x_3, ..., x_512]; correspondingly, the public figure's face feature vector is also a 512-dimensional vector Y_i = [y_i1, y_i2, y_i3, ..., y_i512].
In some embodiments, in step S402, the preset distance threshold may be a distance value set according to the actual situation; in engineering practice it may be set to 0.3 or 0.35. When the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to a public figure's face in the material library is less than the preset distance threshold, the face in the image to be recognized misappropriates the portrait of that public figure.
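Formula (2) and the threshold test map directly onto a few lines of NumPy; the 0.3 threshold follows the engineering value mentioned above, and the 512-dimensional vectors match the FaceNet example:

```python
import numpy as np


def match_public_figure(query_vec, library_vecs, threshold=0.3):
    """Return (index, distance) of the closest library face if it matches, else None.

    query_vec:    (512,) embedding of the face in the image to be recognized
    library_vecs: (M, 512) embeddings of public figures in the material library
    """
    distances = np.linalg.norm(library_vecs - query_vec, axis=1)  # formula (2)
    best = int(np.argmin(distances))
    if distances[best] < threshold:        # a matching relationship exists
        return best, float(distances[best])
    return None
```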
In some embodiments, when there is a matching relationship between the target object and the object to be matched, a sixth confidence may be determined according to the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the public figure's face in the material library. The sixth confidence may be acquired according to formula (3):
[formula (3), expressing conf-celebrity as a function of dist(X, Y_i), is reproduced only as an image (PCTCN2020123903-appb-000002) in the original publication]
where conf-celebrity is the sixth confidence, and dist(X, Y_i) is the distance between the feature vector corresponding to the face in the image to be recognized and the feature vector corresponding to the public figure's face in the material library.
Further, the risk value of the image to be recognized may be calculated according to the sixth confidence and the target confidence to acquire the target information, where the target confidence is the confidence corresponding to the target object acquired when the category of the target object is determined to be fake according to the blur information, the local feature information, or the global feature information; specifically, it may be the first confidence, the second confidence, or the fourth confidence in the above embodiments. That is, the risk value is jointly determined by the fake-face confidence conf-fake, determined according to the blur information, the local feature information, or the global feature information, and the confidence conf-celebrity, determined by comparison against public figures' faces. When the face in the image to be recognized is determined to be fake according to the blur information, conf-fake = conf-level1-fake; when it is determined to be fake according to the local feature information, conf-fake = conf-level2-fake; when it is determined to be fake according to the global feature information, conf-fake = conf-level3-fake. When calculating the risk value, the target confidence and the sixth confidence may be summed, specifically: risk value conf-danger = conf-fake + conf-celebrity; the larger the conf-danger value, the greater the risk. Finally, the target information can be determined according to the risk value; it may include three pieces of information: whether the face is fake, whether a public figure's face is used, and the risk value. FIG. 5 is a schematic diagram of an interface for target information in an image recognition method provided by an embodiment of this application. As shown in FIG. 5, the target information "fake face; no public figure's face used; risk value: 0.65" is displayed above the face annotation box of the image to be recognized, indicating that the face in the image to be recognized is a fake face.
Further, the risk level can be determined from the risk value. Specifically, multiple levels may be set, each corresponding to a different risk-value interval; for example, three levels of high risk, medium risk, and low risk may be set: a risk value in (0.7, 1] is determined as high risk, a risk value in (0.3, 0.7] as medium risk, and a risk value in (0, 0.3] as low risk; of course, other numerical intervals are also possible. Finally, the target information can be determined and output according to the risk level, and it varies with the risk level. For example, when the risk level is high, the target information is "fake face; public figure's face used; high risk"; when the risk level is low, the target information is "fake face; no public figure's face used; low risk". Other forms are also possible, which is not specifically limited in the embodiments of this application.
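Combining the risk sum conf-danger = conf-fake + conf-celebrity with the example intervals above gives the following sketch; the interval boundaries are the example values from this paragraph, and conf-celebrity is taken as an input because formula (3) is only available as an image in the source:

```python
def risk_level(conf_fake, conf_celebrity=0.0):
    """Compute conf-danger and map it to the example risk levels."""
    conf_danger = conf_fake + conf_celebrity   # risk value
    if conf_danger > 0.7:
        return conf_danger, "high risk"
    if conf_danger > 0.3:
        return conf_danger, "medium risk"
    return conf_danger, "low risk"


print(risk_level(0.6, 0.05))  # -> (0.65, 'medium risk')
```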
The image recognition method in the embodiments of this application can be applied to various scenarios such as liveness detection and sensitive-video detection. Liveness detection is mainly applied in fields such as transportation, finance, and insurance; for example, when passengers take trains, airplanes, and other means of transport, their faces need to be detected and recognized to confirm that the passenger information is correct and that no danger exists. Sensitive-video detection is mainly applied to online videos and the like; for example, when a video appears on the network in which a public figure makes inappropriate remarks, the face in it needs to be detected and recognized in order to judge whether the video is a synthesized video and to protect the interests of the public figure.
Taking liveness detection as an example, the terminal device photographs the user's head or upper body and uploads the captured image or video to a backend server, which detects and recognizes the face in the image or video and returns the resulting target information to the terminal device. When the target information indicates that the face is fake, a warning is issued; when it indicates that the face is real, the subsequent liveness verification steps continue to further judge the legitimacy of the living body. FIG. 6 is a schematic flowchart of detecting and recognizing a face in an image recognition method provided by an embodiment of this application. As shown in FIG. 6, in step S601, the face in the image is detected to acquire face bounding-box information; in step S602, blur analysis is performed on the face image determined according to the face bounding-box information to acquire the blur ratio; in step S603, whether the face in the image is fake is judged according to the magnitude relationship between the blur ratio and the classification threshold; in step S604, when the face is determined to be fake, the confidence of the fake face is acquired, and the risk value is calculated according to it; in step S605, when the face is determined to be real, local feature information is extracted from the face image determined according to the face bounding-box information and classified according to the local feature information, to acquire the confidence that the face is real and the confidence that the face is fake; in step S606, whether the face is fake is judged according to the real-face confidence and the fake-face confidence; in step S607, when the face is determined to be fake, the risk value is calculated according to the fake-face confidence; in step S608, when the face is determined to be real, feature extraction is performed on the image to be recognized to acquire global feature information, regression is performed on the face region according to the global feature information, and the authenticity of the face and the corresponding confidence are calculated at the same time; in step S609, whether the face is fake is judged according to the real-face confidence and the fake-face confidence; in step S610, when the face is determined to be fake, the risk value is calculated according to the fake-face confidence; in step S611, when the face is determined to be real, the target information is determined; in step S612, the target information is determined according to the risk values in steps S604, S607, and S610. The specific execution flow of each step is the same as that of the corresponding operations in the embodiments of this application and is not repeated here.
Taking sensitive-video detection as an example, a user sends a sensitive video to the backend server through the terminal device; the backend server detects and recognizes the face in the video, judges whether it is the face of a public figure, and finally returns to the terminal device the target information determined from the face detection and recognition and the public-figure judgment. Different operations can be performed on the sensitive video according to the target information. Specifically, when the target information indicates that the risk level of the sensitive video is high and a public figure's face is used, the sensitive video can be taken down and the related path information completely deleted; when the target information indicates that the risk level is none or low and no public figure's face is used, no action may be taken, or the user who uploaded the sensitive video may be monitored, warned, and so on.
FIG. 7 is a schematic flowchart of detecting and recognizing a face in a sensitive video in an image recognition method provided by an embodiment of this application. As shown in FIG. 7, in step S701, the sensitive video is parsed to acquire video frames, and a frame containing a face is taken as the image to be recognized; in step S702, the face in the image to be recognized is detected to acquire face bounding-box information; in step S703, blur analysis is performed on the face image determined according to the face bounding-box information to acquire the blur ratio; in step S704, whether the face in the image is fake is judged according to the magnitude relationship between the blur ratio and the classification threshold; in step S705, when the face is determined to be fake, the confidence of the fake face is acquired, and the risk value is calculated according to it; in step S706, when the face is determined to be real, local feature information is extracted from the face image determined according to the face bounding-box information and classified according to the local feature information, to acquire the confidence that the face is real and the confidence that the face is fake; in step S707, whether the face is fake is judged according to the real-face confidence and the fake-face confidence; in step S708, when the face is determined to be fake, the risk value is calculated according to the fake-face confidence; in step S709, when the face is determined to be real, feature extraction is performed on the image to be recognized to acquire global feature information, regression is performed on the face region according to the global feature information, and the authenticity of the face and the corresponding confidence are calculated at the same time; in step S710, whether the face is fake is judged according to the real-face confidence and the fake-face confidence; in step S711, when the face is determined to be fake, the risk value is calculated according to the fake-face confidence; in step S712, when the face is determined to be real, the target information is determined; in step S713, the feature vector corresponding to the fake face is matched against the feature vectors corresponding to public figures' faces; in step S714, the target information is determined according to the risk values in steps S705, S708, and S711 and the matching result of step S713. The specific execution flow of each step is the same as that of the corresponding operations in the embodiments of this application and is not repeated here.
In some embodiments, besides screening the authenticity of faces in images and videos, the authenticity of human bodies in images and videos can also be screened. The screening method for human bodies is similar to that for faces; only the type of data processed differs. For example, the face bounding-box information needs to be changed to human-body bounding-box information, and when acquiring the local feature information and the global feature information, the local feature information corresponding to the human body and the global feature information corresponding to the image containing the human body and the background need to be acquired, and so on. The specific processing flow is similar to that of the method for recognizing real and fake faces and is not repeated here.
The image recognition method in the embodiments of this application can detect and recognize the target object in an image or video to be recognized, determine whether it is real or fake, and output corresponding target information according to the determination result, so that the service user can take corresponding action based on the target information. In the image recognition method of the embodiments of this application, on one hand, machine learning models are used to detect the target object in the image during image recognition and to extract local and global features, which improves the efficiency and accuracy of image recognition; on the other hand, when identifying the authenticity of the target object, features ranging from low-level features at the pixel and texture level to high-level features containing global semantic information are used, spanning three distinct detection stages, which further improves the accuracy of image recognition; furthermore, whether an image whose target object is fake uses the portrait of a public figure can be judged, which protects the portrait rights of public figures and prevents the spread of false information.
The following describes apparatus embodiments of this application, which can be configured to execute the image recognition method in the above embodiments of this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the image recognition method described above.
FIG. 8 is a block diagram of an image recognition apparatus provided by an embodiment of this application.
Referring to FIG. 8, an image recognition apparatus 800 according to an embodiment of this application includes: a feature information acquisition module 801, a confidence acquisition module 802, and a target information acquisition module 803.
The feature information acquisition module 801 is configured to acquire feature information corresponding to a target object in an image to be recognized, where the feature information includes any one or more of blur information, local feature information, and global feature information; the confidence acquisition module 802 is configured to determine the category of the target object based on the feature information and determine the confidence corresponding to the target object; the target information acquisition module 803 is configured to acquire target information corresponding to the image to be recognized according to the category of the target object and the confidence.
In an embodiment of this application, the confidence acquisition module 802 includes: a first category determining unit, configured to acquire blur information corresponding to the target object and determine the category of the target object according to the blur information and a classification threshold; a second category determining unit, configured to, when the category of the target object is determined to be real according to the blur information and the classification threshold, acquire local feature information corresponding to the target object and determine the category of the target object based on the local feature information; and a third category determining unit, configured to, when the category of the target object is determined to be real according to the local feature information, acquire global feature information corresponding to the target object, and determine, based on the global feature information, the category of the target object and the confidence corresponding to the target object, so as to acquire the target information corresponding to the image to be recognized according to the category of the target object and the confidence.
In some embodiments, the first category determining unit includes: a first object map acquisition unit, configured to detect the target object in the image to be recognized to acquire a first object map corresponding to the target object; a second object map acquisition unit, configured to adjust the coordinates of the first object map to acquire a second object map; and a blur calculation unit, configured to perform blur calculation on the first object map and the second object map separately to acquire the blur information corresponding to the target object.
In some embodiments, the blur information includes a first blur degree corresponding to the first object map and a second blur degree corresponding to the second object map; the first category determining unit is configured to: divide the second blur degree by the first blur degree to obtain a blur ratio; when the blur ratio is less than or equal to the classification threshold, determine that the category of the target object is real; and when the blur ratio is greater than the classification threshold, determine that the category of the target object is fake.
In some embodiments, based on the foregoing solutions, the confidence acquisition module 802 is configured to: when the category of the target object is determined to be fake according to the blur information and the classification threshold, determine a first confidence corresponding to the target object according to the blur ratio and the classification threshold.
In some embodiments, the second category determining unit includes: a first feature extraction unit, configured to input the first object map into a first image recognition model and perform feature extraction on the first object map through the first image recognition model to acquire the local feature information; and a classification unit, configured to classify the target object according to the local feature information to obtain the category of the target object.
In some embodiments, based on the foregoing solutions, the classification unit is configured to: acquire a second confidence and a third confidence corresponding to the target object based on the local feature information; when the second confidence is greater than the third confidence, determine that the category of the target object is fake; and when the second confidence is less than or equal to the third confidence, determine that the category of the target object is real.
In some embodiments, the confidence acquisition module 802 includes: a second feature extraction unit, configured to input the image to be recognized into a second image recognition model and perform feature extraction on the image to be recognized through the second image recognition model to acquire the global feature information; and a regression unit, configured to perform regression calculation on the region corresponding to the target object according to the global feature information to acquire the category of the target object and the confidence corresponding to the target object.
In some embodiments, the regression unit is configured to: perform regression calculation on the region corresponding to the target object according to the global feature information to acquire a fourth confidence and a fifth confidence corresponding to the target object; when the fourth confidence is greater than the fifth confidence, determine that the category of the target object is fake; and when the fourth confidence is less than or equal to the fifth confidence, determine that the category of the target object is real.
In some embodiments, the target information acquisition module 803 is configured to: when the category of the target object determined according to the blur information and the classification threshold is fake, determine the risk value corresponding to the image to be recognized according to the first confidence determined based on the blur information, and acquire the target information based on the category of the target object and the risk value; or, when the category of the target object determined according to the local feature information is fake, determine the risk value corresponding to the image to be recognized according to the second confidence determined based on the local feature information, and acquire the target information based on the category of the target object and the risk value; or, when the category of the target object determined according to the global feature information is fake, determine the risk value corresponding to the image to be recognized according to the fourth confidence determined based on the global feature information, and acquire the target information based on the category of the target object and the risk value.
In some embodiments, the target information acquisition module 803 is further configured to: when the category of the target object determined according to the blur information and the classification threshold is fake, match the target object against objects to be matched in a material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result; or, when the category of the target object determined according to the local feature information is fake, match the target object against objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result; or, when the category of the target object determined according to the global feature information is fake, match the target object against objects to be matched in the material library to obtain a matching result, so as to acquire the target information according to the category of the target object, the target confidence, and the matching result.
In some embodiments, matching the target object against an object to be matched in the material library is specifically: calculating the distance between the feature vector corresponding to the target object and the feature vector corresponding to the object to be matched; and when the distance is less than the preset distance threshold, determining that there is a matching relationship between the target object and the object to be matched.
In some embodiments, acquiring the target information according to the category of the target object, the target confidence, and the matching result is specifically: when there is a matching relationship between the target object and the object to be matched, determining a sixth confidence according to the distance; when the category of the target object is determined to be fake according to the blur information, the local feature information, or the global feature information, acquiring the confidence corresponding to the target object as the target confidence; and determining the risk value corresponding to the image to be recognized according to the sixth confidence and the target confidence, and acquiring the target information based on the type of the target object, the matching result, and the risk value.
FIG. 9 is a schematic structural diagram of a computer system of an electronic device provided by an embodiment of this application.
It should be noted that the computer system 900 of the electronic device shown in FIG. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in FIG. 9, the computer system 900 includes a central processing unit (CPU) 901, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage part 908 into a random access memory (RAM) 903, to implement the image recognition method described in the above embodiments. The RAM 903 also stores various programs and data required for system operation. The CPU 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input part 906 including a keyboard, a mouse, and the like; an output part 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage part 908 including a hard disk and the like; and a communication part 909 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication part 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 910 as needed, so that the computer program read from it is installed into the storage part 908 as needed.
In some embodiments, the process described below with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of this application include a computer program product, which includes a computer program carried on a computer-readable medium; the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 909, and/or installed from the removable medium 911. When the computer program is executed by the central processing unit (CPU) 901, the various functions defined in the system of this application are executed.
It should be noted that the computer-readable medium shown in the embodiments of this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
As another aspect, the embodiments of this application also provide a computer-readable medium. The computer-readable medium may be included in the image recognition apparatus described in the above embodiments, or may exist alone without being assembled into the electronic device. The computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device implements the method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the implementations of this application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
Through the description of the above implementations, those skilled in the art will easily understand that the example implementations described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solutions according to the implementations of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) execute the method according to the implementations of this application.
Those skilled in the art will easily think of other implementations of this application after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in this application.
It should be understood that this application is not limited to the precise structures that have been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.

Claims (16)

  1. An image recognition method, the method being executed by an electronic device, the method comprising:
    acquiring feature information corresponding to a target object in an image to be recognized;
    wherein the feature information comprises any one or more of blur information, local feature information, and global feature information;
    determining a category of the target object based on the feature information, and determining a confidence corresponding to the target object; and
    acquiring target information corresponding to the image to be recognized according to the category of the target object and the confidence.
  2. The image recognition method according to claim 1, wherein the determining the category of the target object based on the feature information, and determining the confidence corresponding to the target object, comprises:
    acquiring blur information corresponding to the target object, and determining the category of the target object according to the blur information and a classification threshold;
    when the category of the target object is determined to be real according to the blur information and the classification threshold, acquiring local feature information corresponding to the target object, and determining the category of the target object based on the local feature information; and
    when the category of the target object is determined to be real according to the local feature information, acquiring global feature information corresponding to the target object, and determining, based on the global feature information, the category of the target object and the confidence corresponding to the target object.
  3. The image recognition method according to claim 2, wherein the acquiring the blur information corresponding to the target object comprises:
    detecting the target object in the image to be recognized to acquire a first object map corresponding to the target object;
    adjusting coordinates of the first object map to acquire a second object map; and
    performing blur calculation on the first object map and the second object map separately to acquire the blur information corresponding to the target object.
  4. The image recognition method according to claim 3, wherein the blur information comprises a first blur degree corresponding to the first object map and a second blur degree corresponding to the second object map; and
    the determining the category of the target object according to the blur information and the classification threshold comprises:
    dividing the second blur degree by the first blur degree to obtain a blur ratio;
    when the blur ratio is less than or equal to the classification threshold, determining that the category of the target object is real; and
    when the blur ratio is greater than the classification threshold, determining that the category of the target object is fake.
  5. The image recognition method according to claim 4, wherein the determining the confidence corresponding to the target object based on the feature information comprises:
    when the category of the target object is determined to be fake according to the blur information and the classification threshold, determining a first confidence corresponding to the target object according to the blur ratio and the classification threshold.
  6. The image recognition method according to claim 3, wherein the acquiring the local feature information corresponding to the target object, and determining the category of the target object based on the local feature information, comprises:
    inputting the first object map into a first image recognition model, and performing feature extraction on the first object map through the first image recognition model to acquire the local feature information; and
    classifying the target object according to the local feature information to obtain the category of the target object.
  7. The image recognition method according to claim 6, wherein the classifying the target object according to the local feature information to obtain the category of the target object comprises:
    acquiring a second confidence and a third confidence corresponding to the target object based on the local feature information;
    when the second confidence is greater than the third confidence, determining that the category of the target object is fake; and
    when the second confidence is less than or equal to the third confidence, determining that the category of the target object is real.
  8. The image recognition method according to claim 2, wherein the acquiring the global feature information corresponding to the target object, and determining, based on the global feature information, the category of the target object and the confidence corresponding to the target object, comprises:
    inputting the image to be recognized into a second image recognition model, and performing feature extraction on the image to be recognized through the second image recognition model to acquire the global feature information; and
    performing regression calculation on a region corresponding to the target object according to the global feature information to acquire the category of the target object and the confidence corresponding to the target object.
  9. The image recognition method according to claim 8, wherein the performing regression calculation on the region corresponding to the target object according to the global feature information to acquire the category of the target object and the confidence corresponding to the target object comprises:
    performing regression calculation on the region corresponding to the target object according to the global feature information to acquire a fourth confidence and a fifth confidence corresponding to the target object;
    when the fourth confidence is greater than the fifth confidence, determining that the category of the target object is fake; and
    when the fourth confidence is less than or equal to the fifth confidence, determining that the category of the target object is real.
  10. The image recognition method according to claim 2, wherein the acquiring the target information corresponding to the image to be recognized according to the category of the target object and the confidence comprises:
    when the category of the target object determined according to the blur information and the classification threshold is fake, determining a risk value corresponding to the image to be recognized according to the first confidence determined based on the blur information, and acquiring the target information based on the category of the target object and the risk value; or
    when the category of the target object determined according to the local feature information is fake, determining a risk value corresponding to the image to be recognized according to the second confidence determined based on the local feature information, and acquiring the target information based on the category of the target object and the risk value; or
    when the category of the target object determined according to the global feature information is fake, determining a risk value corresponding to the image to be recognized according to the fourth confidence determined based on the global feature information, and acquiring the target information based on the category of the target object and the risk value.
  11. The image recognition method according to claim 2, wherein the acquiring the target information corresponding to the image to be recognized according to the category of the target object and the confidence comprises:
    when the category of the target object determined according to the blur information and the classification threshold is fake, matching the target object against an object to be matched in a material library to obtain a matching result, and acquiring the target information according to the category of the target object, a target confidence, and the matching result; or
    when the category of the target object determined according to the local feature information is fake, matching the target object against an object to be matched in the material library to obtain a matching result, and acquiring the target information according to the category of the target object, a target confidence, and the matching result; or
    when the category of the target object determined according to the global feature information is fake, matching the target object against an object to be matched in the material library to obtain a matching result, and acquiring the target information according to the category of the target object, a target confidence, and the matching result.
  12. The image recognition method according to claim 11, wherein the matching the target object against the object to be matched in the material library to obtain the matching result comprises:
    calculating a distance between a feature vector corresponding to the target object and a feature vector corresponding to the object to be matched; and
    when the distance is less than a preset distance threshold, determining that there is a matching relationship between the target object and the object to be matched.
  13. The image recognition method according to claim 12, wherein the acquiring the target information according to the category of the target object, the target confidence, and the matching result comprises:
    when there is a matching relationship between the target object and the object to be matched, determining a sixth confidence according to the distance;
    when the category of the target object is determined to be fake according to the blur information, the local feature information, or the global feature information, acquiring the confidence corresponding to the target object as the target confidence; and
    determining a risk value corresponding to the image to be recognized according to the sixth confidence and the target confidence, and acquiring the target information based on the type of the target object, the matching result, and the risk value.
  14. An image recognition apparatus, comprising:
    a feature information acquisition module, configured to acquire feature information corresponding to a target object in an image to be recognized, wherein the feature information comprises any one or more of blur information, local feature information, and global feature information;
    a confidence acquisition module, configured to determine a category of the target object based on the feature information, and determine a confidence corresponding to the target object; and
    a target information acquisition module, configured to acquire target information corresponding to the image to be recognized according to the category of the target object and the confidence.
  15. An electronic device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the image recognition method according to any one of claims 1 to 13.
  16. A computer-readable storage medium storing a computer program, the computer program being configured to, when run, execute the image recognition method according to any one of claims 1 to 13.
PCT/CN2020/123903 2020-01-08 2020-10-27 Image recognition method and apparatus, computer-readable storage medium, and electronic device WO2021139324A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/676,111 US20220172518A1 (en) 2020-01-08 2022-02-18 Image recognition method and apparatus, computer-readable storage medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010017583.4 2020-01-08
CN202010017583.4A CN111241989B (zh) Image recognition method and apparatus, and electronic device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/676,111 Continuation US20220172518A1 (en) 2020-01-08 2022-02-18 Image recognition method and apparatus, computer-readable storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2021139324A1 true WO2021139324A1 (zh) 2021-07-15

Family

ID=70876061

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123903 WO2021139324A1 (zh) 2020-01-08 2020-10-27 图像识别方法、装置、计算机可读存储介质及电子设备

Country Status (3)

Country Link
US (1) US20220172518A1 (zh)
CN (1) CN111241989B (zh)
WO (1) WO2021139324A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822901A (zh) * 2021-07-21 2021-12-21 南京旭锐软件科技有限公司 Image segmentation method and apparatus, storage medium, and electronic device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241989B (zh) 2020-01-08 2023-06-13 腾讯科技(深圳)有限公司 Image recognition method and apparatus, and electronic device
CN111507262B (zh) 2020-04-17 2023-12-08 北京百度网讯科技有限公司 Method and apparatus for detecting living body
CN111967576B (zh) 2020-07-22 2022-09-02 长春工程学院 Geochemical data processing method and system based on deep learning
US20230259549A1 (en) * 2020-07-24 2023-08-17 Seung Mo Kim Extraction of feature point of object from image and image search system and method using same
CN111738244B (zh) 2020-08-26 2020-11-24 腾讯科技(深圳)有限公司 Image detection method and apparatus, computer device, and storage medium
CN112183353B (zh) 2020-09-28 2022-09-20 腾讯科技(深圳)有限公司 Image data processing method and apparatus, and related device
CN112329719B (zh) 2020-11-25 2021-10-15 江苏云从曦和人工智能有限公司 Behavior recognition method and apparatus, and computer-readable storage medium
CN112183501B (zh) 2020-11-27 2021-02-19 北京智源人工智能研究院 Deepfake image detection method and apparatus
CN112784760B (zh) 2021-01-25 2024-04-12 北京百度网讯科技有限公司 Human behavior recognition method and apparatus, device, and storage medium
CN114723966B (zh) 2022-03-30 2023-04-07 北京百度网讯科技有限公司 Multi-task recognition method, training method, apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678984A (zh) * 2013-12-20 2014-03-26 湖北微模式科技发展有限公司 Method for implementing user identity verification by using a camera
CN106557726A (zh) * 2015-09-25 2017-04-05 北京市商汤科技开发有限公司 Face identity authentication system with silent liveness detection and method thereof
CN106650669A (zh) * 2016-12-27 2017-05-10 重庆邮电大学 Face recognition method for identifying counterfeit photo spoofing
CN107358157A (zh) * 2017-06-07 2017-11-17 阿里巴巴集团控股有限公司 Face liveness detection method and apparatus, and electronic device
CN109948439A (zh) * 2019-02-13 2019-06-28 平安科技(深圳)有限公司 Liveness detection method and system, and terminal device
CN111241989A (zh) * 2020-01-08 2020-06-05 腾讯科技(深圳)有限公司 Image recognition method and apparatus, and electronic device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116763B (zh) * 2013-01-30 2016-01-20 宁波大学 Live face detection method based on HSV color space statistical features
CN106548145A (zh) * 2016-10-31 2017-03-29 北京小米移动软件有限公司 Image recognition method and apparatus
CN108664843B (zh) * 2017-03-27 2023-04-07 北京三星通信技术研究有限公司 Living object recognition method, device, and computer-readable storage medium
CN107133948B (zh) * 2017-05-09 2020-05-08 电子科技大学 Image blur and noise evaluation method based on multi-task convolutional neural network
CN107886070A (zh) * 2017-11-10 2018-04-06 北京小米移动软件有限公司 Face image verification method, apparatus, and device
CN107992845A (zh) * 2017-12-14 2018-05-04 广东工业大学 Facial recognition anti-counterfeiting method and apparatus, and computer device
US10922626B2 (en) * 2018-03-09 2021-02-16 Qualcomm Incorporated Conditional branch in machine learning object detection
CN108446690B (zh) * 2018-05-31 2021-09-14 北京工业大学 Face liveness detection method based on multi-view dynamic features
CN109086675B (zh) * 2018-07-06 2021-08-24 四川奇迹云科技有限公司 Face recognition and attack detection method and apparatus based on light field imaging technology
CN109345253A (zh) * 2018-09-04 2019-02-15 阿里巴巴集团控股有限公司 Resource transfer method, apparatus, and system
CN109711254B (zh) * 2018-11-23 2020-12-15 北京交通大学 Image processing method and apparatus based on generative adversarial network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678984A (zh) * 2013-12-20 2014-03-26 湖北微模式科技发展有限公司 Method for implementing user identity verification by using a camera
CN106557726A (zh) * 2015-09-25 2017-04-05 北京市商汤科技开发有限公司 Face identity authentication system with silent liveness detection and method thereof
CN106650669A (zh) * 2016-12-27 2017-05-10 重庆邮电大学 Face recognition method for identifying counterfeit photo spoofing
CN107358157A (zh) * 2017-06-07 2017-11-17 阿里巴巴集团控股有限公司 Face liveness detection method and apparatus, and electronic device
CN109948439A (zh) * 2019-02-13 2019-06-28 平安科技(深圳)有限公司 Liveness detection method and system, and terminal device
CN111241989A (zh) * 2020-01-08 2020-06-05 腾讯科技(深圳)有限公司 Image recognition method and apparatus, and electronic device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822901A (zh) * 2021-07-21 2021-12-21 南京旭锐软件科技有限公司 Image segmentation method and apparatus, storage medium, and electronic device
CN113822901B (zh) * 2021-07-21 2023-12-12 南京旭锐软件科技有限公司 Image segmentation method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN111241989A (zh) 2020-06-05
US20220172518A1 (en) 2022-06-02
CN111241989B (zh) 2023-06-13

Similar Documents

Publication Publication Date Title
WO2021139324A1 (zh) Image recognition method and apparatus, computer-readable storage medium, and electronic device
WO2021077984A1 (zh) Object recognition method and apparatus, electronic device, and readable storage medium
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN109558832B (zh) Human body posture detection method, apparatus, device, and storage medium
CN110728209B (zh) Posture recognition method and apparatus, electronic device, and storage medium
WO2022161286A1 (zh) Image detection method, model training method, device, medium, and program product
WO2019218824A1 (zh) Movement track acquisition method and device thereof, storage medium, and terminal
US10891465B2 (en) Methods and apparatuses for searching for target person, devices, and media
CN108229297B (zh) Face recognition method and apparatus, electronic device, and computer storage medium
US11126827B2 (en) Method and system for image identification
CN111626163B (zh) Face liveness detection method and apparatus, and computer device
CN111291863B (zh) Face-swap discrimination model training method, face-swap discrimination method, apparatus, and device
CN111738120B (zh) Person recognition method and apparatus, electronic device, and storage medium
Sedik et al. An efficient cybersecurity framework for facial video forensics detection based on multimodal deep learning
CN115719428A (zh) Classification-model-based face image clustering method, apparatus, device, and medium
Anwar et al. Perceptual judgments to detect computer generated forged faces in social media
CN115457620A (zh) User expression recognition method and apparatus, computer device, and storage medium
CN113763313A (zh) Text image quality detection method and apparatus, medium, and electronic device
CN108694347B (zh) Image processing method and apparatus
Singh et al. Development of accurate face recognition process flow for authentication
Shojaeilangari et al. Dynamic facial expression analysis based on extended spatio-temporal histogram of oriented gradients
Narwal et al. Image Systems and Visualizations
Wang et al. Framework for facial recognition and reconstruction for enhanced security and surveillance monitoring using 3D computer vision
Jayabharathi et al. POC-net: pelican optimization-based convolutional neural network for recognizing large pose variation from video
CN116543145A (zh) Image processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20911643

Country of ref document: EP

Kind code of ref document: A1