CN112084858A - Object recognition method and device, electronic equipment and storage medium - Google Patents

Object recognition method and device, electronic equipment and storage medium

Info

Publication number
CN112084858A
CN112084858A (application CN202010780055.4A)
Authority
CN
China
Prior art keywords
information
target
living
face
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010780055.4A
Other languages
Chinese (zh)
Inventor
邱尚锋
黄颖
张文伟
Current Assignee (the listed assignees may be inaccurate)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (the priority date is an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202010780055.4A priority Critical patent/CN112084858A/en
Publication of CN112084858A publication Critical patent/CN112084858A/en
Priority to PCT/CN2021/110358 priority patent/WO2022028425A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 - Spoof detection, e.g. liveness detection
    • G06V40/45 - Detection of the body part being alive

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an object recognition method and apparatus, an electronic device, and a storage medium, relating to the technical field of face recognition. First, multiple frames of target images obtained by photographing a target object are acquired, where each frame of target image includes face information of the target object. Second, similarity information between the frames of target images is determined based on the face information. Finally, whether the target object belongs to a living object is determined based on the similarity information. This method addresses the low accuracy of recognition results in existing face recognition technology.

Description

Object recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular, to an object recognition method and apparatus, an electronic device, and a storage medium.
Background
As face recognition technology is applied more widely, the requirements on the accuracy of its results grow accordingly. To improve the accuracy of face recognition results, the recognized subject is typically required to perform a specified action, such as blinking, or the recognition device must be equipped with a depth image sensor to acquire the subject's facial depth information.
However, the inventors found that in many applications the recognized subject cannot be asked to perform a specified action (for example, to provide a better user experience), and the recognition device has no depth image sensor (for example, due to device cost); in such cases, recognition results based on existing face recognition technology remain insufficiently accurate.
Disclosure of Invention
In view of the above, an object recognition method and apparatus, an electronic device, and a storage medium are provided to address the low accuracy of recognition results in existing face recognition technology.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
an object recognition method, comprising:
acquiring a plurality of frames of target images obtained by shooting a target object, wherein each frame of target image comprises face information of the target object;
determining similarity information between a plurality of frames of the target images based on the face information;
determining whether the target object belongs to a living object based on the similarity information.
In a preferred option of the embodiment of the present application, in the object recognition method, the step of determining similarity information between the multiple frames of target images based on the face information includes:
performing feature extraction processing on face information in each frame of the target image based on a face recognition model obtained through pre-training to obtain target feature information of each frame of the target image;
and obtaining the similarity information between the multiple frames of target images based on the target feature information.
In a preferred option of the embodiment of the present application, in the object recognition method, the step of determining whether the target object belongs to a living object based on the similarity information includes:
comparing, through the face recognition model, at least one piece of target feature information with a plurality of pieces of comparison feature information formed in advance, to obtain confidence information that the target object belongs to a living object, wherein the comparison feature information is obtained based on multiple frames of images including face information of different objects, and the objects include living objects and non-living objects;
determining whether the target object belongs to a living object based on the confidence information and the similarity information.
In a preferred option of the embodiment of the present application, in the object recognition method, the step of comparing, through the face recognition model, at least one piece of target feature information with a plurality of pieces of comparison feature information formed in advance to obtain confidence information that the target object belongs to a living object includes:
performing identity class recognition processing on at least one piece of target feature information through the face recognition model to obtain identity class information of the at least one piece of target feature information;
determining, among a plurality of feature spaces included in the face recognition model, a target feature space of the at least one piece of target feature information based on the identity class information, wherein different feature spaces hold comparison feature information of different objects, and each piece of comparison feature information has first label information identifying the identity class of the corresponding object and second label information indicating whether the corresponding object is a living body;
and comparing, through the face recognition model, the at least one piece of target feature information with the comparison feature information in its corresponding target feature space, to obtain the confidence information that the target object belongs to a living object.
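The comparison described in these steps can be sketched as a similarity-weighted vote over a feature space whose entries carry both labels. This is a minimal illustration only, not the patent's actual implementation; the weighting scheme and all function names are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def live_confidence(target_feature, feature_space):
    """feature_space: list of (feature_vector, identity_tag, is_live) tuples,
    where identity_tag plays the role of the first label information and
    is_live the role of the second.  Returns a similarity-weighted estimate
    that the target belongs to a living object (weighting is illustrative)."""
    live_weight = 0.0
    total_weight = 0.0
    for feature, _identity_tag, is_live in feature_space:
        w = max(cosine(target_feature, feature), 0.0)  # ignore dissimilar entries
        total_weight += w
        if is_live:
            live_weight += w
    return live_weight / total_weight if total_weight else 0.0
```

A target feature close to live entries yields confidence near 1, and one close to non-live entries yields confidence near 0.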
In a preferred option of the embodiment of the present application, in the object recognition method, the step of comparing, through the face recognition model, at least one piece of target feature information with a plurality of pieces of comparison feature information formed in advance to obtain confidence information that the target object belongs to a living object includes:
for the target feature information corresponding to each frame of target image, comparing that target feature information with a plurality of pieces of comparison feature information formed in advance through the face recognition model, determining the confidence that the target object in that frame belongs to a living object, and thereby obtaining a plurality of confidences;
obtaining confidence information that the target object belongs to a living object based on the plurality of confidences.
In a preferred option of the embodiment of the present application, in the object recognition method, the method further includes a step of obtaining the face recognition model by training, where the step includes:
carrying out feature extraction processing on a plurality of sample images through a feature extraction layer in a preset neural network model to obtain a plurality of pieces of sample feature information;
respectively determining, through a loss determination layer in the neural network model, a first loss value of each piece of sample feature information based on first label information configured in advance for each sample image, and a second loss value of each piece of sample feature information based on second label information configured in advance for each sample image, wherein the first label information identifies the identity class of the object in the corresponding sample image, and the second label information identifies whether the object in the corresponding sample image is a living body;
and training the neural network model based on the first loss value and the second loss value to obtain the face recognition model.
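The two-loss training described above can be sketched as a pair of cross-entropy terms, one over identity classes (first label) and one over the live/non-live label (second label), summed into a joint objective. The balancing factor `alpha` and all function names are assumptions for illustration, not the patent's formulation:

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over raw scores against an integer label,
    computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def joint_loss(identity_logits, identity_label, live_logits, is_live, alpha=1.0):
    """First loss supervises the identity class of the sample; second loss
    supervises whether the object is a living body; alpha balances them."""
    first = softmax_cross_entropy(identity_logits, identity_label)
    second = softmax_cross_entropy(live_logits, 1 if is_live else 0)
    return first + alpha * second
```

Training the network against the sum of both losses encourages features that separate identities and, at the same time, separate live from non-live samples.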
In a preferred option of the embodiment of the present application, in the object recognition method, the step of determining whether the target object belongs to a living object based on the similarity information includes:
comparing at least one frame of the multiple frames of target images with multiple frames of images obtained in advance, to obtain confidence information that the target object belongs to a living object, wherein the multiple frames of images include face information of a plurality of different objects, and the objects include living and non-living objects;
determining whether the target object belongs to a living object based on the confidence information and the similarity information.
An embodiment of the present application further provides an object recognition apparatus, including:
the target image acquisition module is used for acquiring a plurality of frames of target images obtained by shooting a target object, wherein each frame of target image comprises face information of the target object;
the similarity information determining module is used for determining similarity information among the multi-frame target images based on the face information;
a living object determination module for determining whether the target object belongs to a living object based on the similarity information.
On the basis, an embodiment of the present application further provides an electronic device, including:
a memory for storing a computer program;
and a processor connected to the memory, configured to execute the computer program stored in the memory so as to implement the object recognition method.
On the basis of the foregoing, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the object recognition method is implemented.
According to the object recognition method and apparatus, electronic device, and storage medium provided herein, multiple frames of target images are obtained by photographing a target object, and whether the target object belongs to a living object is then determined from the similarity information between the face information in those frames. In this way, liveness of the target object can be recognized with high accuracy without requiring the target object to perform a specified action and without a depth image sensor on the device that photographs it. The inventors found that even when a living object makes no deliberate facial movement, its face still changes slightly at random moments (micro-expression changes), so at least fine differences exist between the captured frames, whereas a non-living object such as a photograph or a three-dimensional model shows no such changes. This addresses the low accuracy of recognition results in applications of existing face recognition technology where the subject cannot be asked to perform a specified action (for example, for a better user experience) or where the capture device has no depth image sensor (for example, due to device cost). The accuracy of face recognition is thus effectively ensured, the experience of the recognized subject is improved (no specified action is needed), the cost of the device can be reduced, and the applicable scope and practical value are broadened.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of an object identification method according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating an effect of obtaining multiple frames of target images according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating the sub-steps included in step S120 in fig. 2.
Fig. 5 is a flowchart illustrating sub-steps included in step S130 in fig. 2.
Fig. 6 is a flowchart illustrating other sub-steps included in step S130 in fig. 2.
Fig. 7 is a flowchart illustrating the sub-steps included in step S133 in fig. 6.
Fig. 8 is a flowchart illustrating other sub-steps included in step S133 in fig. 6.
Fig. 9 is a flowchart illustrating other steps of the object identification method according to the embodiment of the present application.
Fig. 10 is a block diagram illustrating functional modules of an object recognition apparatus according to an embodiment of the present disclosure.
Reference numerals: 10-electronic device; 12-memory; 14-processor; 100-object recognition apparatus; 110-target image acquisition module; 120-similarity information determination module; 130-living object determination module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1, an embodiment of the present application provides an electronic device 10. The electronic device 10 may include, among other things, a memory 12, a processor 14, and an object recognition apparatus 100.
In detail, the memory 12 and the processor 14 are electrically connected directly or indirectly to enable data transmission or interaction. For example, they may be electrically connected to each other via one or more communication buses or signal lines. The memory 12 may store at least one software functional module, which may be in the form of software or firmware (firmware), such as the object recognition apparatus 100. The processor 14 may be configured to execute an executable computer program stored in the memory 12, such as the object recognition apparatus 100, so as to implement an object recognition method provided in an embodiment (described later) of the present application to determine whether a target object belongs to a living object.
Alternatively, the memory 12 may be, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 14 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It is understood that the electronic device 10 may be a terminal device (e.g. a mobile phone, a computer, etc.) or a server with data processing capability.
Moreover, the structure shown in Fig. 1 is only illustrative; the electronic device 10 may include more or fewer components than shown, or have a different configuration. For example, it may further include a communication unit for exchanging information with other devices. If the electronic device 10 is a server, the communication unit may communicate with the device photographing the target object, for example to obtain target images or to feed back recognition results.
With reference to fig. 2, an embodiment of the present application further provides an object identification method, which can be applied to the electronic device 10. Wherein the method steps defined by the flow related to the object recognition method may be implemented by the electronic device 10. The specific process shown in FIG. 2 will be described in detail below.
And step S110, acquiring a multi-frame target image obtained by shooting a target object.
In this embodiment, when the target object needs to be recognized to determine whether it belongs to a living object, the electronic device 10 may first obtain multiple frames of target images.
Wherein the plurality of frames of target images may be obtained based on photographing the target object, and each frame of the target image may include face information of the target object.
And step S120, determining similarity information among the multiple frames of target images based on the face information.
In this embodiment, after obtaining the plurality of frames of target images based on step S110, the electronic device 10 may determine similarity information between the plurality of frames of target images based on face information in the plurality of frames of target images.
Step S130 of determining whether the target object belongs to a living object based on the similarity information.
In the present embodiment, after obtaining the similarity information between the target images of the plurality of frames based on step S120, the electronic device 10 may determine whether the target object belongs to a living object based on the similarity information.
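Steps S110 through S130 can be sketched as a small pipeline. Because the patent describes several alternatives for each stage, the feature extractor, similarity function, and decision rule are left as caller-supplied placeholders; all names here are assumptions:

```python
def liveness_pipeline(frames, extract_features, similarity_fn, decide_fn):
    """S110: the frames of target images are already acquired.
    S120: extract face features per frame, then reduce them to similarity info.
    S130: decide living vs. non-living from that similarity."""
    features = [extract_features(frame) for frame in frames]
    similarity = similarity_fn(features)
    return decide_fn(similarity)
```

For example, with a decision rule that treats a 100% similarity match as non-living, identical frames would yield a non-living verdict while slightly differing frames would yield a living one.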
Based on this method, whether the target object belongs to a living object can be recognized with high accuracy, without relying on the target object performing a specified action and without a depth image sensor on the device that photographs the target object. This addresses the low accuracy of recognition results in applications of existing face recognition technology where the subject cannot be asked to perform a specified action (for example, for a better user experience) or where the capture device has no depth image sensor (for example, due to device cost). The accuracy of face recognition is thus effectively ensured, the experience of the recognized subject is improved (no specified action is required), and the cost of the device can be reduced.
Even if a living object (such as a real person) makes no deliberate facial movement, its face still changes slightly at random moments; that is, micro-expression changes occur. Therefore at least fine differences exist between the captured frames of target images, whereas no such changes occur in a non-living object such as a photograph or a three-dimensional model.
Based on the above finding, the inventors of the present application, after long-term study, proposed a technical solution for determining whether a target object belongs to a living object based on similarity information between images, namely the object recognition method provided in the embodiments of the present application.
In the first aspect, it should be noted that, in step S110, a specific manner of obtaining multiple frames of target images is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, if the electronic device 10 is a terminal device, such as a mobile phone or a computer, the electronic device 10 may capture a target object based on a carried image capture device (such as a camera) to obtain multiple frames of target images with face information of the target object.
That is to say, when the face of the target object needs to be recognized, the electronic device 10 may control the image capturing device to be turned on to capture the target object, and then, after the image capturing device captures multiple frames of target images, the image capturing device may send the multiple frames of target images to the electronic device 10, so that the electronic device 10 may obtain the multiple frames of target images.
For another example, in another alternative example, if the electronic device 10 is a server, the electronic device 10 may capture a target object based on an image capture device (e.g., a camera) on a connected terminal device to obtain multiple frames of target images with face information of the target object.
That is, when the face of the target object needs to be recognized, the electronic device 10 is connected to a terminal device, and may control the image capturing device to be turned on to capture the target object. Then, the terminal device may obtain multiple frames of target images captured by the image capturing device, and send the multiple frames of target images to the electronic device 10, so that the electronic device 10 may obtain multiple frames of target images.
It should be further explained in step S110 that the obtained multi-frame target image may be all target images obtained by capturing the target object, or may be a part of all target images obtained by capturing the target object.
For example, in an alternative example, if the demand for the recognition accuracy is particularly high, or the number of all the target images obtained by photographing is not particularly large, the obtained multi-frame target images may be all the target images obtained by photographing the target object.
For another example, in another alternative example, in order to reduce the data processing amount of the electronic device 10, the obtained multi-frame target image may be a partial target image of all target images obtained by photographing the target object.
Based on different application requirements, the manner of obtaining part of the target images from all the target images obtained by shooting is not limited, and the selection can be performed according to actual application requirements.
For example, in an alternative example, the first N frames may be taken from all target images obtained by shooting (research shows that a living subject is more likely to exhibit a large change in expression when shooting has just begun, so such changes are easier to recognize, which sufficiently avoids misrecognition while reducing the data processing load of the electronic device 10). Alternatively, the last N frames, or the middle N frames, may be taken from all the target images obtained by shooting.
For another example, the inventors found that, because the time interval between the shooting of adjacent frames is extremely short, the expression (face information) differences between adjacent frames are extremely small and hard to recognize effectively. Therefore, balancing the data processing load against recognition accuracy, one frame may be taken every preset number of frames from all target images obtained by shooting, thereby obtaining the multiple frames of target images.
Furthermore, combining this with the earlier finding (a living subject is more likely to exhibit a large change in expression at the start of shooting), the inventors also provide a scheme in which the value of the preset number of frames increases stepwise.
That is, among the multiple target images, two adjacent target images with earlier time information may have a smaller time difference between them.
In detail, in a specific application example, with reference to Fig. 3, suppose the captured target images are, in chronological order, the first through tenth frames. Then the obtained multiple frames of target images may be, in order: the first frame, the third frame (one frame skipped), the sixth frame (two frames skipped), and the tenth frame (three frames skipped).
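The stepwise-increasing sampling interval in this example can be sketched as follows. The gap sequence matches the example above (skip one, then two, then three frames), but the helper function and its defaults are otherwise assumptions:

```python
def sample_frames(frames, gaps=(1, 2, 3)):
    """Pick frames with a gap that grows over time: earlier frames are
    sampled more densely because expression changes tend to be larger just
    after shooting begins.  gaps[i] is the number of frames skipped before
    the next pick."""
    picked = [0]          # always keep the first frame
    index = 0
    for gap in gaps:
        index += gap + 1  # skip `gap` frames, then take the next one
        if index >= len(frames):
            break
        picked.append(index)
    return [frames[i] for i in picked]
```

Applied to ten frames numbered 1 through 10, this yields frames 1, 3, 6, and 10, matching the example in the text.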
Further, it should be explained for step S110 that, to further improve the accuracy of recognizing whether the target object belongs to a living object, the total capture duration of the obtained multiple frames of target images may be kept below a preset duration, for example 1 s or 0.5 s.
In this way, limiting the total duration sufficiently avoids misrecognition caused by faking a living object through swapping photographs or playing a video.
In the second aspect, it should be noted that, in step S120, a specific manner for determining the similarity information between the target images of multiple frames is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, the face information in the target images of the plurality of frames may be compared based on some image processing algorithm, such as extracting a face contour in the target image based on a contour extraction algorithm, and then comparing the extracted face contours to determine similarity information between the target images of the plurality of frames.
For another example, in another alternative example, the determination may be made based on a neural network in order to improve the accuracy of the determined similarity information. Based on this, in conjunction with fig. 4, step S120 may include step S121 and step S122, which are described in detail below.
And step S121, performing feature extraction processing on the face information in each frame of target image based on the face recognition model obtained through pre-training to obtain the target feature information of each frame of target image.
In this embodiment, after obtaining multiple frames of target images based on step S110, the multiple frames of target images may be input into a face recognition model obtained through pre-training, and feature extraction processing may be performed on face information in the multiple frames of target images based on the face recognition model, so that target feature information of the multiple frames of target images may be obtained.
And step S122, obtaining the similarity information between the multi-frame target images based on the target characteristic information.
In the present embodiment, after the target feature information of the multiple frames of target images is obtained based on step S121, the similarity information between the multiple frames of target images may be obtained based on the target feature information.
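One simple way to reduce the per-frame feature vectors to a single similarity value is the mean pairwise cosine similarity. This is a sketch only; the patent does not specify the similarity metric, so the choice of cosine and the function name are assumptions:

```python
import math

def mean_pairwise_similarity(features):
    """Mean cosine similarity over all pairs of per-frame feature vectors
    (each vector a plain list of floats)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    sims = [cosine(features[i], features[j])
            for i in range(len(features))
            for j in range(i + 1, len(features))]
    return sum(sims) / len(sims)
```

Identical frames produce a value of 1.0; frames with micro-expression differences produce a value slightly below 1.0.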
Therefore, because the face recognition model, a neural network model, has stronger information processing capability, the extracted target feature information is richer, the similarity information determined from it is more accurate, and the accuracy of liveness recognition is improved accordingly.
It should be further explained that, in step S120, the target images of the multiple frames for which the similarity information is determined may be all target images in the target images of the multiple frames obtained in step S110, or may be part of the target images in the target images of the multiple frames obtained in step S110.
In the third aspect, it should be noted that, for step S130, the specific manner of determining whether the target object belongs to the living object is not limited, and may be selected according to the actual application requirement.
For example, in an alternative example, it may be determined whether the target object belongs to a living object based only on the similarity information.
In detail, in one specific example, if the similarity information is smaller than a preset similarity, the target object may be determined to belong to a non-living object, such as a photograph or a three-dimensional model; if the similarity information is greater than the preset similarity, the target object may be determined to belong to a living object, such as a real person.
Alternatively, in another specific application example, if the similarity information is 100%, that is, the frames are completely identical, the target object may be determined to belong to a non-living object, such as a photograph or a three-dimensional model; if the similarity information is not 100%, the target object may be determined to belong to a living object.
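The two alternative decision rules above can be sketched as follows; the preset similarity of 0.9 and the numerical tolerance are illustrative values, not taken from the patent:

```python
def living_by_threshold(similarity, preset=0.9):
    # First example: similarity below the preset value -> non-living,
    # above it -> living.
    return similarity > preset

def living_by_exact_match(similarity, eps=1e-6):
    # Second example: frames that are 100% identical suggest a static
    # photograph or 3D model; any deviation suggests a live face.
    return abs(similarity - 1.0) > eps
```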
For another example, in another alternative example, in order to further improve the accuracy of recognizing whether the target object is a living object, the similarity information may be combined with other information for a comprehensive judgment. The inventors of the present application found that when a first image is obtained by photographing a target object, and a second image is then obtained by photographing a photograph or three-dimensional model formed from the first image, certain differences may exist between the first image and the second image.
On this basis, the inventors of the present application provide the following technical solution: in addition to the similarity information, confidence information that the target object belongs to a living object, obtained by comparing the target image with certain preset images, is used in determining whether the target object belongs to a living object.
Based on different requirements, the implementation of the above technical solutions may also be different, that is, the specific manner of determining the confidence level information may have different choices.
For example, in an alternative example, in order to make the determination of the confidence information independent of a complex neural network to reduce the requirement for the data processing performance of the electronic device 10, in conjunction with fig. 5, step S130 may include step S131 and step S132, as described in detail below.
Step S131, comparing at least one frame of target image in the multiple frames of target images with multiple frames of images obtained in advance to obtain confidence information that the target object belongs to a living object.
In the present embodiment, after obtaining the multiple frame target images based on step S110, at least one of the multiple frame target images may be compared with the multiple frame images obtained in advance.
Wherein the multi-frame image may include face information of a plurality of different objects, and the object includes a living object and a non-living object. In this way, by comparing the target image with the multi-frame image, the confidence information that the target object belongs to a living object can be obtained.
For example, suppose the target image is captured of the front of the target object at recognition time, the first image was captured in advance of the front of a living object (e.g., the same person as the target object), and the second image was captured of a photograph formed from the first image. If the similarity between the target image and the first image is higher than the similarity between the target image and the second image, a higher confidence is obtained, i.e., the target object is more likely to be a living object; if the former similarity is lower than the latter, a lower confidence is obtained, i.e., the target object is more likely to be a non-living object.
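A minimal sketch of step S131 under the example above. The feature vectors and the cosine comparison are illustrative stand-ins for the image comparison routine, which the patent does not specify:

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def living_confidence(target_feature, gallery):
    # gallery: (feature, is_living) pairs derived from the pre-obtained
    # multi-frame images of living and non-living objects.
    live = max((_cos(target_feature, f) for f, liv in gallery if liv),
               default=0.0)
    fake = max((_cos(target_feature, f) for f, liv in gallery if not liv),
               default=0.0)
    # Confidence rises when the target resembles the living reference more
    # than the photograph reference, as in the example above.
    return live / (live + fake) if live + fake > 0 else 0.5
```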
Step S132 of determining whether the target object belongs to a living object based on the confidence information and the similarity information.
In the present embodiment, after the confidence information is obtained based on step S131, it may be determined whether the target object belongs to a living object based on the confidence information in combination with the similarity information obtained in step S120.
For another example, in another alternative example, in order to improve the accuracy of the determined confidence information and thereby keep the accuracy of living-object recognition high, with reference to fig. 6, step S130 may also include step S133 and step S134, described in detail below.
Step S133, comparing at least one of the target feature information with a plurality of pre-formed contrast feature information through the face recognition model, to obtain confidence information that the target object belongs to a living object.
In this embodiment, after feature extraction processing is performed on the multiple frames of target images based on the face recognition model obtained through pre-training, for example after the target feature information is obtained in step S121, at least one piece of the target feature information may be further processed by the face recognition model, that is, compared with a plurality of pieces of comparison feature information formed in advance.
Wherein the plurality of contrast characteristic information may be obtained based on a plurality of frames of images including face information of a plurality of different objects, and the object includes a living object and a non-living object. In this way, confidence information that the target object belongs to a living object can be obtained.
Step S134 of determining whether the target object belongs to a living object based on the confidence information and the similarity information.
In the present embodiment, after the confidence information is obtained based on step S133, it may be determined whether the target object belongs to a living object based on the confidence information in combination with the similarity information obtained in step S120.
Optionally, the specific manner of obtaining the confidence information through step S133 is not limited, and may be selected according to the actual application requirement.
For example, in an alternative example, the obtained at least one piece of target feature information may be compared directly against the full set of pre-formed comparison feature information to determine the confidence information that the target object belongs to a living object.
This full set of comparison feature information may be formed from images captured of a plurality of different objects, including objects with different identity categories; objects sharing one identity category include both a living object and a non-living object. For example, a first image is captured of living object A, and a second image is captured of a photograph B formed from the first image; photograph B is in fact a non-living object but shares the identity category of living object A, that is, both belong to the same person.
For another example, in another alternative example, in order to improve the accuracy of the comparison analysis when comparing the information, in conjunction with fig. 7, step S133 may include step S133a, step S133b, and step S133c, which are described in detail below.
Step S133a, performing identity type recognition processing on at least one piece of target feature information through the face recognition model, to obtain identity type information of the at least one piece of target feature information.
In this embodiment, after feature extraction processing is performed on a plurality of frames of the target images based on the face recognition model, for example, after the target feature information is obtained based on step S121, identity class recognition processing may be performed on at least one obtained target feature information through the face recognition model, so as to determine identity class information corresponding to the target feature information.
That is, the face recognition model may first determine which person the target feature information, and hence the target object, belongs to.
Step S133b, determining a target feature space of the at least one target feature information based on the identity class information, in a plurality of feature spaces included in the face recognition model.
In this embodiment, after the identity class information of the target feature information is determined based on step S133a, the target feature space of the target feature information may be determined among a plurality of feature spaces included in the face recognition model based on the identity class information.
The different feature spaces have contrast feature information of different objects, and each of the contrast feature information has first tag information identifying an identity category of a corresponding object and second tag information identifying whether the corresponding object is a living body.
That is, the comparison feature information of the same feature space may have the same first tag information, but may have different second tag information.
Step S133c, comparing, by the face recognition model, the at least one target feature information with the comparison feature information in the target feature space corresponding to the target feature information, to obtain confidence information that the target object belongs to a living object.
In the present embodiment, after the target feature space is determined based on step S133b, the target feature information may be compared with the comparison feature information in the target feature space by the face recognition model, and then, the confidence information that the target object belongs to the living object is determined according to the comparison result and the second tag information of the comparison feature information.
That is, in the above example, by determining the identity category of the target feature information (i.e., the target object) first and then determining whether or not the target object belongs to the living object, a more refined comparison process can be performed when the living object determination is performed, so that the comparison result can be more accurate.
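Steps S133a–S133c can be sketched as follows. The identity classification of step S133a is approximated here by a nearest-feature lookup over the feature spaces; in the patent, a trained face recognition model would supply both the classification and the feature spaces, so this structure is an illustrative assumption:

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def refined_confidence(target_feature, feature_spaces):
    # feature_spaces: {identity: [(feature, is_living), ...]}; the dict key
    # plays the role of the first label, is_living that of the second label.
    def best(space):
        return max(_cos(target_feature, f) for f, _ in space)
    # Step S133a: determine the identity category of the target features.
    identity = max(feature_spaces, key=lambda k: best(feature_spaces[k]))
    # Steps S133b/S133c: compare only within that identity's feature space;
    # the second label of the closest entry decides the confidence.
    _, is_living = max(feature_spaces[identity],
                       key=lambda e: _cos(target_feature, e[0]))
    return 1.0 if is_living else 0.0
```

Restricting the comparison to one identity's feature space is what makes the check "more refined" than comparing against every stored feature.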
For step S133, the specific manner of obtaining the confidence information may also differ depending on how many pieces of target feature information are compared, and may be selected according to actual application requirements.
For example, in an alternative example, in order to reduce the data processing amount of the electronic device 10, to improve the efficiency of identification and reduce the performance requirement of the electronic device 10, after the target feature information of multiple frames of target images is obtained based on step S121, the target feature information of one frame of target image may be selected, and then corresponding confidence information is obtained based on the target feature information.
For another example, in another alternative example, in order to improve the accuracy of the identification, in conjunction with fig. 8, step S133 may include step S133d and step S133e, which are described in detail below.
Step S133d, for the target feature information corresponding to each frame of target image, comparing the target feature information with a plurality of pieces of comparison feature information formed in advance through the face recognition model, determining a confidence that the target object belongs to a living object in the frame of target image, and obtaining a plurality of confidences.
In this embodiment, for each frame of the obtained target images, the target feature information corresponding to that frame may be compared, through the face recognition model, with a plurality of pieces of comparison feature information formed in advance, so as to determine a confidence that the target object in that frame belongs to a living object (see the foregoing explanation of steps S133a, S133b and S133c and the parallel alternatives); in this way, a plurality of confidences can be obtained.
Step S133e, obtaining confidence information that the target object belongs to a living object based on the plurality of confidences.
In the present embodiment, after obtaining a plurality of confidences based on step S133d, confidence information that the target object belongs to a living object may be obtained based on the plurality of confidences.
For example, in an alternative example, a minimum confidence level may be determined among a plurality of confidence levels as the confidence level information that the target object belongs to the living object.
For another example, in another alternative example, a maximum confidence may be determined among the plurality of confidences as the confidence information that the target object belongs to the living object.
For another example, in another alternative example, an average value may be calculated based on a plurality of confidences as the confidence information that the target object belongs to the living object.
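The three aggregation choices named above, gathered into one helper:

```python
def aggregate_confidences(confidences, mode="mean"):
    # "min" is the most conservative choice, "max" the most permissive,
    # and "mean" averages the per-frame confidences.
    if mode == "min":
        return min(confidences)
    if mode == "max":
        return max(confidences)
    return sum(confidences) / len(confidences)
```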
Alternatively, in step S132 and step S134, the specific manner of determining whether the target object belongs to the living object based on the confidence information and the similarity information is not limited, and may also be selected according to actual application requirements.
For example, in an alternative example, the larger of the confidence information and the similarity information may be selected as the judgment basis (if the two lie in different value ranges, normalization may be performed first and the selection made afterwards): if the larger value exceeds a preset value, the target object is determined to belong to a living object.
For another example, in another alternative example, the smaller of the confidence information and the similarity information may be selected as the judgment basis; for example, the smaller value is compared with a preset value, and if it exceeds the preset value, the target object is determined to belong to a living object.
For another example, in another alternative example, in order to further improve the accuracy of determining whether the target object belongs to the living object, different weight coefficients may be configured for the confidence information and the similarity information, respectively, and then a weighted sum of the confidence information and the similarity information is calculated, and it is determined whether the target object belongs to the living object based on the weighted sum.
In detail, in a specific application example, the weight coefficient corresponding to the similarity information may be larger than the weight coefficient corresponding to the confidence information, so that the judgment of whether the target object is living relies more on the similarity between target images, i.e., on slight changes in the face information.
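A sketch of the weighted combination. The weights (with the similarity weight larger, per the application example above) and the decision threshold are illustrative values, and both inputs are assumed to already be normalized to [0, 1]:

```python
def is_living_weighted(confidence, similarity,
                       w_conf=0.4, w_sim=0.6, threshold=0.5):
    # Weighted sum of the two signals; the target object is judged
    # living when the fused score exceeds the threshold.
    score = w_conf * confidence + w_sim * similarity
    return score > threshold
```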
Further, in the above examples, the feature extraction processing may be performed by a face recognition model to obtain the target feature information. In order for the model to extract richer target feature information containing more detail, in this embodiment, with reference to fig. 9, the object recognition method may further include a model training step, specifically comprising step S140, step S150 and step S160, described below.
Step S140, performing feature extraction processing on the plurality of sample images through a feature extraction layer in a preset neural network model to obtain a plurality of sample feature information.
In this embodiment, after obtaining a plurality of sample images, feature extraction processing may be performed on the plurality of sample images based on a feature extraction layer in a preset neural network model, and thus, a plurality of sample feature information may be obtained.
Step S150, determining, by a loss determination layer in the neural network model, a first loss value of each sample feature information based on first label information configured for each sample image in advance, and a second loss value of each sample feature information based on second label information configured for each sample image in advance, respectively.
In this embodiment, after obtaining a plurality of sample feature information based on step S140, for each sample feature information, a first loss value of the sample feature information may be determined based on first label information configured for each sample image in advance and a second loss value of the sample feature information may be determined based on second label information configured for each sample image in advance through a loss determination layer in the neural network model. In this way, a plurality of first loss values and a plurality of second loss values may be obtained.
Wherein the first tag information may be used to identify an identity category of an object in the corresponding sample image, such as which person is specific. The second label information may be used to identify whether an object in the corresponding sample image is a living object, such as a real person.
Step S160, training the neural network model based on the first loss value and the second loss value to obtain the face recognition model.
In this embodiment, after obtaining the first loss value and the second loss value based on step S150, the neural network model may be trained (for example, parameters of the neural network model are updated) based on the first loss value and the second loss value, so as to obtain the face recognition model.
In the fourth aspect, it should be noted that, in step S140, a specific architecture of the neural network model is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, the neural network model may be a residual model, such as a Deep Residual Network (DRN).
Also, the specific configuration of the feature extraction layer is not limited. For example, in an alternative example, the feature extraction layer may be an encoder.
In the fifth aspect, it should be noted that, in step S150, the specific configuration of the loss determining layer is not limited, and may be selected according to the actual application requirement.
For example, in an alternative example, the loss determination layer may include an image classification network (e.g., fully connected layers, FC) for performing a feature classification process on each sample feature information to obtain a plurality of feature vectors. In this way, the first loss value may be obtained by calculating the feature vector and a first tag vector formed based on the first tag information; and, the second loss value may be obtained by calculating the feature vector and a second tag vector formed based on the second tag information.
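The loss-determination layer described above can be sketched as a softmax cross-entropy per head: one head classifies the identity category (first label) and one classifies living vs. non-living (second label). The two-head structure follows the patent; the logit values fed in are illustrative:

```python
import math

def cross_entropy(logits, label_index):
    # Softmax cross-entropy between a classified feature vector and the
    # one-hot label vector built from the corresponding tag information.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[label_index] / sum(exps))

def dual_losses(identity_logits, identity_label,
                liveness_logits, liveness_label):
    # First loss value: identity category; second loss value: living or not.
    return (cross_entropy(identity_logits, identity_label),
            cross_entropy(liveness_logits, liveness_label))
```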
In the sixth aspect, it should be noted that, in step S160, a specific way of performing the training based on the first loss value and the second loss value is not limited, and may also be selected according to actual application requirements.
For example, in an alternative example, the sum of the first loss value and the second loss value may be calculated and used as a total loss value, and then the neural network model may be trained based on the total loss value.
For another example, in another alternative example, the first loss value and the second loss value may be calculated to obtain a weighted sum, the weighted sum is used as a total loss value, and then the neural network model is trained based on the total loss value.
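Both total-loss variants above fit in one helper; the default weights reproduce the plain sum of the first example, while other weights give the weighted sum of the second:

```python
def total_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    # Equal weights give the plain sum; unequal weights give the
    # weighted sum used in the alternative example.
    return w1 * first_loss + w2 * second_loss
```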
The specific manner of training is likewise not limited.
For example, in an alternative example, the neural network model may be updated through the back propagation (BP) algorithm based on the calculated total loss value, thereby obtaining the face recognition model.
With reference to fig. 10, an object recognition apparatus 100 is also provided in the embodiment of the present application, and can be applied to the electronic device 10. The object recognition apparatus 100 may include a target image obtaining module 110, a similarity information determining module 120, and a living object determining module 130, among others.
The target image obtaining module 110 is configured to obtain multiple frames of target images obtained by shooting a target object, where each frame of the target image includes face information of the target object. In this embodiment, the target image obtaining module 110 may be configured to perform step S110 shown in fig. 2, and reference may be made to the foregoing description of step S110 regarding the relevant content of the target image obtaining module 110.
The similarity information determining module 120 is configured to determine similarity information between multiple frames of the target images based on the face information. In this embodiment, the similarity information determining module 120 may be configured to perform step S120 shown in fig. 2, and reference may be made to the foregoing description of step S120 for relevant contents of the similarity information determining module 120.
The living object determination module 130 is configured to determine whether the target object belongs to a living object based on the similarity information. In the present embodiment, the living object determination module 130 is configured to perform step S130 shown in fig. 2, and reference may be made to the foregoing description of step S130 for relevant contents of the living object determination module 130.
In an embodiment of the present application, there is also provided a computer-readable storage medium, in which a computer program is stored, and the computer program executes the steps of the object identification method when running.
The steps executed when the computer program runs are not described in detail herein, and reference may be made to the foregoing explanation of the object identification method.
It is to be understood that, in the foregoing description, "a plurality of", "multiple frames", and the like refer to two or more; for example, a multi-frame target image refers to two or more frames of target images.
In summary, the object recognition method and apparatus, electronic device, and storage medium provided by the present application obtain multiple frames of target images by photographing a target object, and then determine whether the target object belongs to a living object based on similarity information between the face information in those frames. High-accuracy recognition of whether the target object is living can thus be achieved without requiring the target object to perform a specified action and without requiring the capture device to have a depth image sensor. (The inventors' research shows that even when a living object does not deliberately perform any facial action, its face changes slightly at random moments, i.e., exhibits micro-expression changes, so at least fine differences exist between the captured frames, whereas a non-living object such as a photograph or three-dimensional model does not change.) This improves on existing face recognition technology, where recognition accuracy is low in applications that cannot require the recognized object to perform a specified action (for the sake of user experience) or that use capture devices without a depth image sensor (for the sake of cost). The accuracy of face recognition is thereby effectively guaranteed, the experience of the recognized object is improved (no specified action is needed), equipment cost can be reduced, the application scope is wider, and the practical value is higher.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An object recognition method, comprising:
acquiring a plurality of frames of target images obtained by shooting a target object, wherein each frame of target image comprises face information of the target object;
determining similarity information between a plurality of frames of the target images based on the face information;
determining whether the target object belongs to a living object based on the similarity information.
2. The object recognition method according to claim 1, wherein the step of determining similarity information between a plurality of frames of the target images based on the face information includes:
performing feature extraction processing on face information in each frame of the target image based on a face recognition model obtained through pre-training to obtain target feature information of each frame of the target image;
and obtaining the similarity information between the target images of multiple frames based on the target characteristic information.
3. The object recognition method according to claim 2, wherein the step of determining whether the target object belongs to a living object based on the similarity information includes:
comparing at least one piece of the target feature information with a plurality of pieces of comparison feature information formed in advance through the face recognition model to obtain confidence information that the target object belongs to a living object, wherein the comparison feature information is obtained based on multiple frames of images including face information of a plurality of different objects, and the objects include a living object and a non-living object;
determining whether the target object belongs to a living object based on the confidence information and the similarity information.
4. The object recognition method according to claim 3, wherein the step of comparing at least one of the target feature information with a plurality of pieces of comparison feature information formed in advance by the face recognition model to obtain confidence information that the target object belongs to a living object includes:
performing identity type identification processing on at least one target characteristic information through the face identification model to obtain identity type information of the at least one target characteristic information;
determining a target feature space of the at least one target feature information based on the identity class information in a plurality of feature spaces included in the face recognition model, wherein different feature spaces have contrast feature information of different objects, and each of the contrast feature information has first tag information identifying an identity class of a corresponding object and second tag information indicating whether the corresponding object is a living body;
and comparing the at least one piece of target feature information with the comparison feature information in the target feature space corresponding to the target feature information through the face recognition model, to obtain the confidence information that the target object belongs to a living object.
5. The object recognition method according to claim 3, wherein the step of comparing at least one of the target feature information with a plurality of pieces of comparison feature information formed in advance by the face recognition model to obtain confidence information that the target object belongs to a living object includes:
for the target feature information corresponding to each frame of target image, comparing the target feature information with a plurality of pieces of comparison feature information formed in advance through the face recognition model, and determining a confidence that the target object in that frame of target image belongs to a living object, thereby obtaining a plurality of confidences;
obtaining confidence information that the target object belongs to a living object based on the plurality of confidences.
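Claim 5 leaves the aggregation of the per-frame confidences unspecified; a minimal sketch, assuming a simple mean (one of several plausible choices, alongside voting or taking the minimum), is:

```python
def aggregate_confidences(per_frame_confidences):
    """Illustrative aggregation for claim 5: each frame of the target
    image yields its own liveness confidence, and the final confidence
    information is their mean, so a single blurry or anomalous frame
    cannot dominate the decision on its own."""
    if not per_frame_confidences:
        raise ValueError("at least one frame is required")
    return sum(per_frame_confidences) / len(per_frame_confidences)
```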
6. The object recognition method according to any one of claims 2 to 5, further comprising the step of training the face recognition model, the step comprising:
performing feature extraction processing on a plurality of sample images through a feature extraction layer in a preset neural network model to obtain a plurality of pieces of sample feature information;
determining, through a loss determination layer in the neural network model, a first loss value of each piece of sample feature information based on first label information configured in advance for each sample image, and a second loss value of each piece of sample feature information based on second label information configured in advance for each sample image, wherein the first label information identifies the identity class of the object in the corresponding sample image, and the second label information identifies whether the object in the corresponding sample image is a living body; and
training the neural network model based on the first loss values and the second loss values to obtain the face recognition model.
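The two-loss training objective of claim 6 can be sketched numerically. The patent does not state the loss functions or how they are combined, so the softmax cross-entropy choice and the `alpha` weighting below are assumptions; in a real pipeline these values would feed a gradient-based optimizer.

```python
import numpy as np

def joint_loss(id_logits, live_logits, id_labels, live_labels, alpha=1.0):
    """Illustrative sketch of claim 6's loss determination layer: a first
    (identity-class) loss and a second (living/non-living) loss are
    computed from the same sample features and combined into one
    training objective. `alpha` is an assumed weighting, not in the claim."""
    def cross_entropy(logits, labels):
        # numerically stable softmax cross-entropy, averaged over samples
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    first_loss = cross_entropy(id_logits, id_labels)       # first label info
    second_loss = cross_entropy(live_logits, live_labels)  # second label info
    return first_loss + alpha * second_loss
```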
7. The object recognition method according to claim 1, wherein the step of determining whether the target object belongs to a living object based on the similarity information comprises:
comparing at least one frame of the multi-frame target images with a plurality of frames of images obtained in advance to obtain confidence information that the target object belongs to a living object, wherein the plurality of frames of images include face information of a plurality of different objects, and the plurality of different objects include living objects and non-living objects; and
determining whether the target object belongs to a living object based on the confidence information and the similarity information.
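Claim 7 combines the gallery-based confidence with the inter-frame similarity of claim 1 but does not say how. One plausible reading, sketched here with assumed thresholds, is that a living face should both match living gallery faces with high confidence and show some natural frame-to-frame variation, whereas a replayed photo tends to produce near-identical frames:

```python
def is_living(confidence, frame_similarity,
              conf_threshold=0.5, sim_threshold=0.9):
    """Hypothetical decision rule for claim 7 (both thresholds are
    illustrative assumptions): require a high gallery-match confidence
    AND an inter-frame similarity below a spoof-suspicion ceiling."""
    return confidence >= conf_threshold and frame_similarity < sim_threshold
```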
8. An object recognition apparatus, comprising:
a target image acquisition module, configured to acquire multiple frames of target images obtained by photographing a target object, wherein each frame of target image includes face information of the target object;
a similarity information determination module, configured to determine similarity information among the multiple frames of target images based on the face information; and
a living object determination module, configured to determine whether the target object belongs to a living object based on the similarity information.
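The three modules of claim 8 map naturally onto one class. The sketch below is purely illustrative: the injected `acquire_frames` and `similarity_fn` callables stand in for the camera pipeline and the face-similarity model, and the mean-similarity threshold rule is an assumption about how the living object determination module uses the similarity information.

```python
class ObjectRecognitionApparatus:
    """Illustrative mapping of claim 8's three modules onto one class."""

    def __init__(self, acquire_frames, similarity_fn, sim_threshold=0.9):
        self.acquire_frames = acquire_frames  # target image acquisition module
        self.similarity_fn = similarity_fn    # similarity determination module
        self.sim_threshold = sim_threshold

    def is_living_object(self):
        # Acquire the multi-frame target images of the target object.
        frames = self.acquire_frames()
        # Determine similarity information between consecutive frames.
        sims = [self.similarity_fn(a, b) for a, b in zip(frames, frames[1:])]
        mean_sim = sum(sims) / len(sims)
        # Living object determination module: near-identical frames are
        # taken here (an assumption) to suggest a static spoof such as a
        # photo rather than a living, moving face.
        return mean_sim < self.sim_threshold
```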
9. An electronic device, comprising:
a memory for storing a computer program;
a processor, coupled to the memory, configured to execute the computer program stored in the memory to implement the object recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed, implements the object recognition method of any one of claims 1 to 7.
CN202010780055.4A 2020-08-05 2020-08-05 Object recognition method and device, electronic equipment and storage medium Pending CN112084858A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010780055.4A CN112084858A (en) 2020-08-05 2020-08-05 Object recognition method and device, electronic equipment and storage medium
PCT/CN2021/110358 WO2022028425A1 (en) 2020-08-05 2021-08-03 Object recognition method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010780055.4A CN112084858A (en) 2020-08-05 2020-08-05 Object recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112084858A true CN112084858A (en) 2020-12-15

Family

ID=73735301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010780055.4A Pending CN112084858A (en) 2020-08-05 2020-08-05 Object recognition method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112084858A (en)
WO (1) WO2022028425A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822152A (en) * 2021-08-09 2021-12-21 中标慧安信息技术股份有限公司 Method for monitoring clothing condition of commercial tenant of food in market
WO2022028425A1 (en) * 2020-08-05 2022-02-10 广州虎牙科技有限公司 Object recognition method and apparatus, electronic device and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066983B (en) * 2017-04-20 2022-08-09 腾讯科技(上海)有限公司 Identity verification method and device
CN107818313B (en) * 2017-11-20 2019-05-14 腾讯科技(深圳)有限公司 Vivo identification method, device and storage medium
US10417501B2 (en) * 2017-12-06 2019-09-17 International Business Machines Corporation Object recognition in video
CN109871834A (en) * 2019-03-20 2019-06-11 北京字节跳动网络技术有限公司 Information processing method and device
CN110188715A (en) * 2019-06-03 2019-08-30 广州二元科技有限公司 A kind of video human face biopsy method of multi frame detection ballot
CN110991432A (en) * 2020-03-03 2020-04-10 支付宝(杭州)信息技术有限公司 Living body detection method, living body detection device, electronic equipment and living body detection system
CN112084858A (en) * 2020-08-05 2020-12-15 广州虎牙科技有限公司 Object recognition method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
WO2022028425A1 (en) 2022-02-10

Similar Documents

Publication Publication Date Title
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN109271870B (en) Pedestrian re-identification method, device, computer equipment and storage medium
CN110490076B (en) Living body detection method, living body detection device, computer equipment and storage medium
EP3333768A1 (en) Method and apparatus for detecting target
US11017215B2 (en) Two-stage person searching method combining face and appearance features
CN111814902A (en) Target detection model training method, target identification method, device and medium
JP6332937B2 (en) Image processing apparatus, image processing method, and program
US9547808B2 (en) Head-pose invariant recognition of facial attributes
JP6921694B2 (en) Monitoring system
CN110580428A (en) image processing method, image processing device, computer-readable storage medium and electronic equipment
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
EP2557524A1 (en) Method for automatic tagging of images in Internet social networks
CN110418204B (en) Video recommendation method, device, equipment and storage medium based on micro expression
CN111814690B (en) Target re-identification method, device and computer readable storage medium
US10657628B2 (en) Method of providing a sharpness measure for an image
CN112084858A (en) Object recognition method and device, electronic equipment and storage medium
CN111368772A (en) Identity recognition method, device, equipment and storage medium
CN109635625B (en) Intelligent identity verification method, equipment, storage medium and device
CN112232140A (en) Crowd counting method and device, electronic equipment and computer storage medium
CN111611944A (en) Identity recognition method and device, electronic equipment and storage medium
JP2012048624A (en) Learning device, method and program
CN113688804A (en) Multi-angle video-based action identification method and related equipment
Hadiprakoso Face anti-spoofing method with blinking eye and hsv texture analysis
CN115984977A (en) Living body detection method and system
CN113255549B (en) Intelligent recognition method and system for behavior state of wolf-swarm hunting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination