CN110866508B - Method, device, terminal and storage medium for identifying form of target object

Method, device, terminal and storage medium for identifying form of target object

Info

Publication number
CN110866508B
CN110866508B (application CN201911141195.0A)
Authority
CN
China
Prior art keywords: feature, image, training, target, target object
Prior art date
Legal status: Active
Application number
CN201911141195.0A
Other languages
Chinese (zh)
Other versions
CN110866508A (en)
Inventor
颜波
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911141195.0A
Publication of CN110866508A
Application granted
Publication of CN110866508B
Status: Active
Anticipated expiration

Classifications

    • G06V 40/193 Eye characteristics, e.g. of the iris: preprocessing; feature extraction
    • G06F 18/253 Pattern recognition; analysing: fusion techniques of extracted features
    • G06N 3/08 Computing arrangements based on biological models; neural networks: learning methods
    • G06V 40/165 Human faces: detection, localisation or normalisation using facial parts and geometric relationships
    • G06V 40/171 Human faces; feature extraction: local features and components; facial parts, e.g. occluding parts such as glasses; geometrical relationships


Abstract

The embodiments of this application disclose a method, an apparatus, a terminal and a storage medium for identifying the form of a target object, belonging to the field of image processing. The target object has a corresponding symmetric object, and the symmetric relationship between the two objects may be mirror symmetry or central symmetry. Whether the target object is a single object or one of two objects in a symmetric relationship, its form can therefore be determined in a single detection flow, which improves the efficiency of determining the form of the target object and, through feature comparison, reduces the probability of missed detections and false detections.

Description

Method, device, terminal and storage medium for identifying form of target object
Technical Field
The embodiment of the application relates to the field of image processing, in particular to a method, a device, a terminal and a storage medium for identifying the form of a target object.
Background
With the development of image recognition technology, techniques for recognizing whether the eyes are open have also advanced.
In some application scenarios, a person skilled in the art locates facial key points in an image. Once the facial key points are located, the eye region is determined from these key points, and it is further judged whether the eye is open.
Disclosure of Invention
The embodiment of the application provides a method, a device, a terminal and a storage medium for identifying the form of a target object. The technical scheme is as follows:
according to an aspect of the present application, there is provided a method of identifying a morphology of a target object, the target object having a corresponding symmetric object in a physical world, the target object and the symmetric object being centrosymmetric or specularly symmetric, the method comprising:
acquiring a target image of the target object;
extracting a first feature of the target image, wherein the first feature is used for indicating the target image;
acquiring a second feature according to the first feature, wherein the second feature is a flip feature corresponding to the first feature;
fusing the first feature and the second feature to obtain a fused feature;
and determining the result form of the target object according to the fusion characteristic, wherein the result form is a normal form or an abnormal form.
According to another aspect of the present application, there is provided an apparatus for identifying a morphology of a target object, the target object having a corresponding symmetric object in a physical world, the target object and the symmetric object being centrally symmetric or specularly symmetric, the apparatus comprising:
the image acquisition module is used for acquiring a target image of the target object;
a first feature extraction module for extracting a first feature of the target image, the first feature being used to indicate the target image;
the second feature extraction module is used for acquiring a second feature according to the first feature, wherein the second feature is a flip feature corresponding to the first feature;
the feature fusion module is used for fusing the first feature and the second feature to obtain a fusion feature;
and the morphology determining module is used for determining the result morphology of the target object according to the fusion characteristics, wherein the result morphology is a normal morphology or an abnormal morphology.
According to another aspect of the present application, there is provided a terminal comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement a method of identifying a morphology of a target object as provided by an implementation of the present application.
According to another aspect of the present application, there is provided a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement a method of identifying a morphology of a target object as provided by an implementation of the present application.
The technical solutions provided in the embodiments of this application may have the following beneficial effects:
By acquiring a target image of the target object, a first feature can be extracted from the target image, a corresponding second feature can be obtained from the first feature, the first feature and the second feature can be fused into a fusion feature, and the result form of the target object, which is either a normal form or an abnormal form, can be determined from the fusion feature. The target object has a corresponding symmetric object, and the symmetric relationship between the two objects may be mirror symmetry or central symmetry. Whether the target object is a single object or one of two objects in a symmetric relationship, its form can therefore be determined in a single detection flow, which improves the efficiency of determining the form of the target object and, through feature comparison, reduces the probability of missed detections and false detections.
Drawings
In order to describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a block diagram of a terminal according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method of identifying a morphology of a target object provided in an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying a morphology of a target object according to another exemplary embodiment of the present application;
FIG. 4 is a schematic illustration of a training process based on a first feature extraction model and a first fully connected layer provided in FIG. 3;
FIG. 5 is a schematic diagram of a process for identifying a target object provided based on the embodiment shown in FIG. 3;
fig. 6 is a block diagram of an apparatus for recognizing a morphology of a target object according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct or indirect through an intermediate medium. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art according to the specific context. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the field of image processing technology, techniques for recognizing the human eye include methods based on geometric features. In practice, however, the shape of the human eye varies between people and with facial expression, so schemes that rely on a fixed geometric description produce more false detections and missed detections.
In one possible implementation manner provided by the embodiment of the present application, the terminal may extract a first feature and a second feature from a target object having a symmetrical characteristic, where the second feature is a flip feature corresponding to the first feature. According to the scheme, the first feature and the second feature are fused to obtain the fusion feature, and whether the target object is in the normal form or the abnormal form is determined according to the fusion feature, so that the efficiency of identifying the form of the target object with the symmetrical characteristic is improved.
In another possible implementation provided in the present application, a deep-learning-based model can be used for human eye recognition. Because such a model has strong self-learning and nonlinear fitting abilities and can automatically extract high-level features from images, it offers better robustness and adaptability in human eye recognition applications.
For ease of understanding of the schemes shown in the embodiments of the present application, several terms appearing in the embodiments of the present application are described below.
ADAM (Adaptive Moment Estimation) model: an optimization algorithm model that iteratively updates neural network weights based on training data and assigns an independent adaptive learning rate to each parameter by computing first-moment and second-moment estimates of the gradients.
MSE (Mean Squared Error): used to evaluate how far predicted values deviate from target values; the smaller the MSE, the better the performance of a model built on a convolutional neural network.
The method for identifying the form of the target object according to the embodiments of the present application may be applied to a terminal that has a display screen and the function of identifying the form of a target object. Such terminals may include mobile electronic devices such as a mobile phone, a tablet computer, smart glasses, a smart watch, a digital camera, an MP4 player, an MP5 player, a learning machine, a point-reading machine, an e-book reader, an electronic dictionary, a vehicle-mounted terminal, a Virtual Reality (VR) playback terminal, or an Augmented Reality (AR) playback terminal.
Referring to fig. 1, fig. 1 is a block diagram of a terminal according to an exemplary embodiment of the present application, where, as shown in fig. 1, the terminal includes a processor 120 and a memory 140, where at least one instruction is stored in the memory 140, and the instruction is loaded and executed by the processor 120 to implement a method for identifying a morphology of a target object according to various method embodiments of the present application. Optionally, the terminal 100 may further include an image capturing component and a display component, which is not limited in this embodiment of the present application.
In the present application, the terminal 100 is an electronic device having the function of identifying the form of a target object. When the terminal 100 acquires a target image of a target object, the terminal 100 is able to extract a first feature of the target image, the first feature being used to indicate the target image; acquire a second feature according to the first feature, the second feature being a flip feature corresponding to the first feature; fuse the first feature and the second feature to obtain a fused feature; and determine the result form of the target object according to the fused feature, the result form being a normal form or an abnormal form.
Processor 120 may include one or more processing cores. The processor 120 connects various parts within the overall terminal 100 using various interfaces and lines, performs various functions of the terminal 100 and processes data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 140, and invoking data stored in the memory 140. Alternatively, the processor 120 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 120 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 120 and may be implemented by a single chip.
The Memory 140 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (ROM). Optionally, the memory 140 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 140 may be used to store instructions, programs, code sets, or instruction sets. The memory 140 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described below, etc.; the storage data area may store data and the like referred to in the following respective method embodiments.
Referring to fig. 2, fig. 2 is a flowchart of a method for identifying a morphology of a target object according to an exemplary embodiment of the present application. The method of recognizing the morphology of the target object may be applied to the terminal shown above. The object has corresponding symmetrical objects in the physical world, and the object and the symmetrical objects are centrally symmetrical or mirror symmetrical. In fig. 2, a method of identifying a morphology of a target object includes:
Step 210, a target image of a target object is acquired.
In the embodiment of the application, the target image may be an image acquired by the terminal in real time, or an image acquired by the terminal from a network. In one possible manner, the target image may be a photograph taken by the terminal through the image capturing component, a live-view image displayed in a viewfinder, or a frame of image in a video taken by the terminal.
In another possible manner, the target image may be an image file obtained by the terminal from the network or from another device through wired data transmission or wireless data transmission, etc. In this application scenario, the target image may still be a picture or an image frame.
In another classification mode of the target image, the target image may be an image acquired by the image acquisition component, or the target image may be an image drawn by the electronic device, and the source of the target image is not limited in this embodiment of the present application.
In the embodiment of the application, the terminal acquires the target image of the target object, which can be read from the local or downloaded from the network.
It should be noted that the target object has a corresponding symmetric object in the physical world, and the target object and the symmetric object are centrally symmetric or mirror symmetric. For human beings, many body surface organs are morphologically symmetric, such as the eyes, hands, arms, legs and feet. Each pair of symmetric organs can be distinguished as a left one and a right one; for example, the left eye and the right eye form a pair of mirror-symmetric organs.
In the basic application embodiment, a human eye is taken as an example for illustration, and the target object may be a left eye or a right eye. In one aspect, when the target object is a left eye, its corresponding symmetrical object is a right eye, and the symmetrical relationship of the left and right eyes is specular. On the other hand, when the target object is the right eye, the corresponding symmetrical object is the left eye, and the symmetrical relationship of the left eye and the right eye is specular symmetry.
In one possible implementation, the target object and the symmetric object belong to the same subject, and the target object and the symmetric object may appear in the same image.
It should be noted that the target object and the symmetrical object are not limited to the body surface organ of the human body. The target object and the symmetric object may also be organs in the human body, for example, the left and right lungs, the left and right kidneys of the human body. The method can also be applied to a scene of morphological analysis of human organ conditions.
Alternatively, the target object and the symmetrical object may be other centrosymmetric objects, which are not limited in this embodiment of the present application.
Step 220, extracting a first feature of the target image, the first feature being used to indicate the target image.
In the embodiment of the application, the terminal can extract the first feature from the target image, wherein the first feature is used for indicating the target image. It should be noted that, in performing this operation, the terminal may use a model tool preset therein. The model can be designed by a mathematical modeling method or can be obtained based on deep learning training.
When the model used in the embodiment of the present application is a model obtained based on deep learning training, the first feature may be a feature extracted by any one feature extraction layer in the model.
Alternatively, the dimension of the first feature may be a dimension set in advance by a designer. Different designs are made according to the complexity of the target object. In one possible approach, the dimension of the first feature may be a lower dimension such as 16, 32 or 64 when the image complexity of the target object is lower. In another possible way, the dimension of the first feature may be a higher dimension such as 128, 256 or 512 when the image complexity of the target object is higher. The dimension of the first feature is not limited in this embodiment, and the specific value thereof may be determined according to the actual situation.
Step 230, obtaining a second feature according to the first feature, where the second feature is a flip feature corresponding to the first feature.
It should be noted that, in the embodiment of the present application, the terminal may obtain the second feature according to the first feature. Similar to the processing manner in step 220, the terminal can also obtain the second feature from the first feature through a preset data processing model.
In an embodiment of the present application, two ways of obtaining the second feature from the first feature are provided.
In one possible manner, the terminal may flip the target image according to the symmetric relationship between the target image and its symmetric counterpart to obtain a flipped image, and then process the flipped image with the same model as in step 220 to obtain the second feature. Although its accuracy is high, this approach involves a large amount of computation and is unfavorable for fast execution when hardware resources are limited.
In another possible manner, the terminal can obtain the second feature directly from the first feature through a deep-learning-based neural network. This approach is fast and efficient, reduces the storage space occupied by models on the terminal, and reduces the computing resources consumed at run time.
It should be noted that the second feature may be used to indicate a flipped image of the target image.
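As an illustration of the two manners above, the following Python sketch contrasts them; it assumes a PyTorch-style backbone whose forward pass returns the first feature, and all function and variable names (backbone, flip_fc, and so on) are illustrative rather than taken from this application:

    import torch

    def flip_feature_by_reextraction(backbone, target_image: torch.Tensor) -> torch.Tensor:
        # First manner: flip the image itself and run the backbone a second time.
        # Accuracy is high, but the feature extraction cost is doubled.
        flipped_image = torch.flip(target_image, dims=[-1])  # horizontal flip for mirror symmetry
        return backbone(flipped_image)

    def flip_feature_by_mapping(flip_fc, first_feature: torch.Tensor) -> torch.Tensor:
        # Second manner: map the already-extracted first feature to the flip feature
        # with a trained fully connected layer; only one backbone pass is needed.
        return flip_fc(first_feature)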
Step 240, fusing the first feature and the second feature to obtain a fused feature.
In the embodiment of the application, the terminal can also fuse the first feature and the second feature to obtain a fused feature. In one possible way, if the first feature and the second feature are both represented by a form of a vector, then the dimensions of the first feature and the second feature are the same. In a specified fusion process, the specified dimensions in the first feature are fused with the dimensions of the same location in the second feature. For example, the dimension of the first feature and the dimension of the second feature are both 128. Then the 26 th dimension of the first feature will be fused with the 26 th dimension of the second feature when the first feature and the second feature are fused, resulting in the 26 th dimension of the fused feature.
In a possible implementation of this embodiment, the first feature and the second feature may be fused by adding them to obtain a sum feature, and then taking 1/2 of the sum feature as the fusion feature. For example, if the first feature is (1, 2, 4, 1, 2) and the second feature is (3, 1, 2, 3, 2), the sum feature is (4, 3, 6, 4, 4) and the fusion feature is (2, 1.5, 3, 2, 2).
Alternatively, the relationship of the first feature (English: logit) and the second feature (English: flip-Logit) and the fusion feature (English: final Logit) can be expressed as follows:
Final Logit=1/2(Logit+Flip-Logit)
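A minimal sketch of this fusion rule in Python follows; the example vectors are the ones used above, and the helper name is illustrative:

    import torch

    def fuse_features(first_feature: torch.Tensor, second_feature: torch.Tensor) -> torch.Tensor:
        # Element-wise fusion: Final Logit = 1/2 (Logit + Flip-Logit).
        # The two features must have the same dimension.
        return 0.5 * (first_feature + second_feature)

    first = torch.tensor([1., 2., 4., 1., 2.])
    second = torch.tensor([3., 1., 2., 3., 2.])
    fused = fuse_features(first, second)   # tensor([2.0, 1.5, 3.0, 2.0, 2.0])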
step 250, determining the result form of the target object according to the fusion characteristics, wherein the result form is a normal form or an abnormal form.
In the embodiment of the application, the terminal can determine the result form of the target object according to the fusion characteristics.
In one possible implementation, the terminal can classify the fusion feature with a two-class classifier to determine the result form of the target object. The two-class classifier determines, from the fusion feature, whether the result form of the target object is the normal form or the abnormal form. Alternatively, the classifier may be a Softmax classifier.
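A sketch of such a two-class Softmax classification step, assuming the fusion feature has already been computed; the feature dimension, module names and the mapping of class indices to forms are illustrative:

    import torch
    import torch.nn as nn

    feature_dim = 128                       # illustrative fusion-feature dimension
    classifier = nn.Linear(feature_dim, 2)  # two-class head over the fusion feature

    def classify_morphology(fused_feature: torch.Tensor) -> str:
        probs = torch.softmax(classifier(fused_feature), dim=-1)  # Softmax classifier
        # Class 0 is taken here as the normal form, class 1 as the abnormal form.
        return "normal form" if int(probs.argmax(dim=-1)) == 0 else "abnormal form"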
In another possible implementation manner, the terminal may also obtain the result form of the target object through other classifiers, which is not limited in the embodiment of the present application.
In summary, the method for identifying the form of a target object provided in this embodiment can acquire a target image of the target object, extract a first feature from the target image, obtain a corresponding second feature from the first feature, fuse the first feature and the second feature into a fusion feature, and determine the result form of the target object, a normal form or an abnormal form, from the fusion feature. The target object has a corresponding symmetric object, and the symmetric relationship between the two objects may be mirror symmetry or central symmetry. Whether the target object is a single object or one of two objects in a symmetric relationship, its form can therefore be determined in a single detection flow, which improves the efficiency of determining the form of the target object and, through feature comparison, reduces the probability of missed detections and false detections.
Based on the solution disclosed in the previous embodiment, the terminal can also refer to the following embodiments.
Referring to fig. 3, fig. 3 is a flowchart of a method for identifying a morphology of a target object according to another exemplary embodiment of the present application. The method of recognizing the morphology of the target object may be applied to the terminal shown above. In fig. 3, the method for identifying the morphology of the target object includes:
In step 310, a target image of a target object is acquired.
In the embodiment of the present application, the execution process of step 310 is the same as the execution process of step 210, and will not be described herein.
Step 320, inputting the target image into the first feature extraction model, and obtaining the first feature output by the first feature extraction model.
In this embodiment of the present application, the first feature extraction model is a feature extraction model that completes training through a first training sample, the first training sample is a first training image labeled with a result form, and the first training image and the target image are images belonging to the same appearance form type.
Referring to fig. 4, fig. 4 is a schematic diagram of a training process based on a first feature extraction model and a first full-connection layer provided in fig. 3. In fig. 4, the terminal is capable of face detection and face keypoint determination for the first training image 410. Alternatively, face detection and face keypoint determination may be achieved by already trained face detection models and face keypoint detection models. After determining the left eye key point and the right eye key point in the first training image 410, the terminal cuts out a left eye sub-image 411 from the first training image 410 according to the left eye key point; the terminal will crop the right eye sub-image 412 from the first training image 410 based on the right eye keypoints. It should be noted that, the left-eye sub-image 411 and the right-eye sub-image 412 may be used as training samples for training the first feature extraction model and the first full-connection layer (i.e., the first training image 410 is actually the left-eye sub-image 411 and the right-eye sub-image 412 cut out therefrom as training samples). In another possible manner, if the first training image 410 includes only the left-eye sub-image 411, or the first training image 410 includes only the right-eye sub-image 412, the terminal uses the only sub-image as the training sample.
Optionally, the cropped region may be a square centered on the eye key point, with the distance between the eye key point and the eyebrow key point taken as 1/2 of the side length.
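A sketch of this cropping convention in Python (NumPy), where the eye and eyebrow key points are pixel coordinates produced by the key-point model; the function name and coordinate convention are assumptions for illustration:

    import numpy as np

    def crop_eye_region(image: np.ndarray, eye_point, eyebrow_point) -> np.ndarray:
        # Square crop centered on the eye key point; the eye-to-eyebrow distance
        # is taken as half of the side length, as described above.
        eye = np.asarray(eye_point, dtype=np.float32)
        brow = np.asarray(eyebrow_point, dtype=np.float32)
        half_side = np.linalg.norm(brow - eye)          # distance = 1/2 of the side length
        x0 = int(max(eye[0] - half_side, 0))
        y0 = int(max(eye[1] - half_side, 0))
        x1 = int(min(eye[0] + half_side, image.shape[1]))
        y1 = int(min(eye[1] + half_side, image.shape[0]))
        return image[y0:y1, x0:x1]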
In fig. 4, during one training pass the terminal inputs either a left eye sub-image 411 or a right eye sub-image 412; only one picture is fed to the original feature extraction model at a time. Optionally, the original feature extraction model 420 in this embodiment may be a MobileNetV2 model, which is high-performing and lightweight and is therefore suitable for deployment on a mobile device. The original feature extraction model 420 extracts training features 430, which are evaluated by a loss function 440, and the model is then optimized with an optimization model.
In the training process shown in fig. 4, the terminal may train the original feature extraction model using the training samples, and the loss function 440 may be a Softmax Loss function of the following form:

Softmax Loss = -log( e^(W_y·x + b_y) / Σ_j e^(W_j·x + b_j) )

In the Softmax Loss function, x denotes the output vector of the MobileNetV2 model, W is the weight vector, b is the bias, y is the label, and the sum over j runs over all classes. After the loss on a training sample has been computed with this function, the terminal can optimize the MobileNetV2 model with a specified optimization model; optionally, the optimization model may be an ADAM model. After the MobileNetV2 model is trained, the first feature extraction model of this embodiment is obtained.
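A hedged training sketch for this stage, assuming a PyTorch MobileNetV2 backbone with a two-class head; the learning rate, batch handling and helper names are placeholders rather than values given in this application:

    import torch
    import torch.nn as nn
    from torchvision.models import mobilenet_v2

    # MobileNetV2 with a two-class head (e.g. open eye / closed eye); cross-entropy
    # over softmax outputs plays the role of the Softmax Loss described above.
    model = mobilenet_v2(num_classes=2)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM optimization model

    def train_step(eye_images: torch.Tensor, labels: torch.Tensor) -> float:
        # eye_images: batch of cropped eye sub-images, labels: 0/1 result form
        optimizer.zero_grad()
        logits = model(eye_images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()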
In step 331, a symmetry relationship between the target object and the corresponding symmetry object is determined.
In the present embodiment, the symmetric relationship includes central symmetry and mirror symmetry. The terminal is able to determine the symmetric relationship between the target object and its corresponding symmetric object. For example, the symmetric relationship between the left eye and the right eye is mirror symmetry, whereas the white part and the black part in the traditional Chinese pattern are centrally symmetric.
Step 332, flipping the second training image according to the symmetric relationship to obtain a corresponding real flipped image.
In this embodiment, the terminal can flip the second training image according to the determined symmetric relationship to obtain the corresponding real flipped image. It should be noted that the image obtained by flipping the second training image is the real flipped image; equivalently, flipping the real flipped image again according to the same symmetric relationship recovers the corresponding second training image.
Optionally, in one possible application, the flip corresponding to mirror symmetry may be a horizontal flip.
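A minimal sketch of the flipping step, assuming images are NumPy arrays indexed as (height, width, channels); flipping twice recovers the original image:

    import numpy as np

    def flip_training_image(image: np.ndarray, mirror_symmetric: bool = True) -> np.ndarray:
        # Horizontal flip for mirror-symmetric objects such as the left and right eye;
        # a 180-degree rotation is used here to illustrate the centrally symmetric case.
        if mirror_symmetric:
            return image[:, ::-1].copy()
        return image[::-1, ::-1].copy()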
Step 333, inputting the real flip image into the first feature extraction model to obtain the real flip feature.
In this embodiment, the terminal inputs the real flipped image into the first feature extraction model and extracts the real flip feature (Flip-Logit) through the model.
Step 334, labeling the corresponding second training image with the real flip feature to obtain a second training sample.
In this embodiment, the terminal can store the real flip feature in correspondence with the second training image. In one possible manner, the terminal labels the second training image with the real flip feature to obtain the second training sample.
Step 335, training the original fully connected layer according to the second training sample and the second loss function to obtain the first fully connected layer.
In this embodiment, the terminal can train the original fully connected layer according to the second training sample and the second loss function. The terminal inputs the second training sample into the first feature extraction model to obtain the first feature, the first feature passes through the original fully connected layer to produce a predicted flip feature, and the loss function trains the original fully connected layer by comparing the predicted flip feature with the real flip feature.
It should be noted that the dimension and the layer number of the original full connection layer (abbreviated as FC) may be set according to the actual use requirement, which is not limited in the embodiment of the present application. In one possible implementation, the dimension of the original fully-connected layer may be 32, 64, 128, 256, 512, or the like. The number of layers of the original full connection layer can be 1 layer, 2 layers, 3 layers or 4 layers, etc.
In an embodiment of the present application, the second loss function may be an MSE loss function of the following form:

MSE = (1/n) Σ_{i=1..n} (ŷ_i - y_i)²

where ŷ_i is the flip feature predicted by the original fully connected layer for the i-th sample, y_i is the corresponding real flip feature, and n is the number of samples.
It should be noted that, in this embodiment, the terminal completes the training of the first fully connected layer by performing steps 331 to 335, which, combined with the training process for the first feature extraction model described above, realizes the training of the deep-learning-based neural network shown in this embodiment. During this optimization, the network weights of the first feature extraction model are kept unchanged, and the ADAM optimization model optimizes only the original fully connected layer, finally yielding the first fully connected layer.
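A sketch of steps 331 to 335 under the assumptions above: backbone(x) returns the first feature, its weights are frozen, and only the mapping layer is optimized with the MSE loss against features extracted from the flipped images. The feature dimension and learning rate are illustrative:

    import torch
    import torch.nn as nn

    feature_dim = 128                                   # must match the first-feature dimension
    flip_fc = nn.Linear(feature_dim, feature_dim)       # original fully connected layer to train
    mse_loss = nn.MSELoss()                             # second loss function
    fc_optimizer = torch.optim.Adam(flip_fc.parameters(), lr=1e-3)

    def fc_train_step(backbone, eye_images: torch.Tensor) -> float:
        with torch.no_grad():                           # first feature extraction model is fixed
            first_feature = backbone(eye_images)                 # Logit
            flipped = torch.flip(eye_images, dims=[-1])          # mirror symmetry -> horizontal flip
            real_flip_feature = backbone(flipped)                # Flip-Logit label
        predicted_flip = flip_fc(first_feature)                  # predicted flip feature
        loss = mse_loss(predicted_flip, real_flip_feature)
        fc_optimizer.zero_grad()
        loss.backward()
        fc_optimizer.step()
        return loss.item()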
The training process shown in fig. 4 also includes the original fully connected layer 450, the predicted flip feature 460 and the second loss function 470.
Optionally, in an embodiment of the present application, the output dimension of the first fully-connected layer is the same as the dimension of the input first feature.
In one possible implementation, the terminal is capable of encapsulating the first feature extraction model and the first full connectivity layer as a target neural network. In a practical application scenario, a terminal inputs a target image into a target neural network, and a result form of the target image can be obtained.
And step 340, training the target neural network through the third training sample.
In this embodiment, the third training sample is a third training image labeled with the result form, and the third training image and the target image are images belonging to the same appearance form type. The learning rate used to train the target neural network is a target learning rate, which is smaller than the learning rate used to train the first feature extraction model and smaller than the learning rate used to train the first fully connected layer.
In this embodiment, the terminal can fine-tune the target neural network with the third training sample, so that after the first feature extraction model and the first fully connected layer are connected in series, their weights are fine-tuned as a whole, which makes the target neural network identify the result form of the target object more accurately.
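A minimal sketch of this fine-tuning stage; the specific target learning rate below is only an illustration of a rate smaller than the ones used in the earlier stages:

    import torch
    import torch.nn as nn

    def build_fine_tune_optimizer(target_network: nn.Module, target_lr: float = 1e-5):
        # The whole serial network (first feature extraction model followed by the
        # first fully connected layer) is optimized together, but with a learning
        # rate smaller than those used when training either part alone, so the
        # weights are only fine-tuned rather than re-learned.
        return torch.optim.Adam(target_network.parameters(), lr=target_lr)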
Step 350, inputting the first feature into the first fully-connected layer to obtain a second feature.
The first fully connected layer is a fully connected layer trained with a second training sample; the second training sample is a second training image labeled with a real flip feature, the second training image and the target image are images belonging to the same appearance form type, and the output dimension of the first fully connected layer is equal to the dimension of the first feature.
In step 360, the first feature and the second feature are fused to obtain a fused feature.
In the embodiment of the present application, the execution of step 360 is the same as the execution of step 240, and will not be described here again.
Step 370, determining the result form of the target object according to the fusion characteristics.
In the embodiment of the present application, the execution of step 370 is the same as the execution of step 250, and will not be described here again.
It should be noted that the present application can encapsulate the target neural network and the classifier into an overall morphology recognition model. Referring to fig. 5, fig. 5 is a schematic diagram of a process for identifying a target object according to the embodiment shown in fig. 3. After the target image 510 is input into the morphology recognition model, a human eye image 511 is first obtained; the human eye image 511 is input into the first feature extraction model 520 to obtain a first feature 531, the first feature then passes through the first fully connected layer 521 to obtain a second feature 532, the first feature 531 and the second feature 532 are fused to obtain a fused feature 533, and the fused feature 533 passes through the two-class classifier 540 to obtain the result form 550. When the target object is a human eye, the result form 550 indicates an open-eye form or a closed-eye form. The morphology recognition model includes the first feature extraction model 520, the first fully connected layer 521 and the classifier 540.
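A sketch of the overall recognition flow in fig. 5, assuming the components above; the face detection and key-point step that produces the eye image is outside this sketch, and the class name is illustrative:

    import torch
    import torch.nn as nn

    class MorphologyRecognitionModel(nn.Module):
        # Wraps the first feature extraction model, the first fully connected layer
        # and the two-class classifier into one forward pass, as in fig. 5.
        def __init__(self, backbone: nn.Module, flip_fc: nn.Module, classifier: nn.Module):
            super().__init__()
            self.backbone = backbone
            self.flip_fc = flip_fc
            self.classifier = classifier

        def forward(self, eye_image: torch.Tensor) -> torch.Tensor:
            first_feature = self.backbone(eye_image)        # Logit (531)
            second_feature = self.flip_fc(first_feature)    # Flip-Logit (532)
            fused = 0.5 * (first_feature + second_feature)  # Final Logit (533)
            return self.classifier(fused)                   # open-eye / closed-eye scores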
In one possible application scenario of this embodiment, when the target image is a viewfinder image of the terminal and the result form is the open-eye form, the viewfinder image is captured. In this scenario, the method and the apparatus can capture an image at the moment the person's eyes are open, improving the mobile terminal's ability to capture open-eye images.
In another possible application scenario of the embodiment of the present application, when the target image is an image captured by the terminal and the result form is a closed-eye form, a reminder message is displayed, where the reminder message is used to prompt that the closed-eye condition occurs in the target image. In the scene, the terminal can prompt the user in time which images contain the eye-closing condition, so that the user can conveniently carry out subsequent processing.
In another possible application scenario of the embodiment of the present application, when the target image is an image captured by the terminal and the result form is a closed-eye form, the target image is moved to the closed-eye image album. In the scene, the terminal can automatically arrange the images in the eye-closing form into an album, so that the user can conveniently and rapidly process the images in batches.
In another possible application scenario of this embodiment, when the target image is used as an image for unlocking the terminal by face, the result form is obtained first; when the result form is the closed-eye form, unlocking of the terminal is refused; and when the result form is the open-eye form and the target image matches a preset image template, the terminal is unlocked. In this scenario, the terminal can prevent others from unlocking it illegitimately with the user's face while the user's eyes are closed, protecting the security of the information in the terminal.
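A hedged sketch of how a terminal might branch on the recognized form in the scenarios above; the scenario labels and handler strings are hypothetical and only illustrate the control flow:

    def handle_result(open_eye: bool, scenario: str) -> str:
        # Illustrative dispatch over the application scenarios described above.
        if scenario == "viewfinder":
            return "capture_photo" if open_eye else "wait"
        if scenario == "captured_photo":
            return "no_action" if open_eye else "show_closed_eye_reminder"  # or move to album
        if scenario == "face_unlock":
            if not open_eye:
                return "refuse_unlock"                  # guard against closed-eye unlocking
            return "match_template_then_unlock"
        return "no_action"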
It should be noted that, the first training image, the second training image and the third training image in the embodiments of the present application may be different training samples, so as to improve the overall recognition capability of the method provided in the present application.
In summary, this embodiment can acquire a target image of the target object, extract a first feature from the target image, obtain a corresponding second feature from the first feature, fuse the first feature and the second feature into a fusion feature, and determine the result form of the target object, a normal form or an abnormal form, from the fusion feature. The target object has a corresponding symmetric object, and the symmetric relationship between the two objects may be mirror symmetry or central symmetry. Whether the target object is a single object or one of two objects in a symmetric relationship, its form can therefore be determined in a single detection flow, which improves the efficiency of determining the form of the target object and, through feature comparison, reduces the probability of missed detections and false detections.
Optionally, the embodiments of this application can also detect the eye state in real time so that closed-eye pictures can be selected and filtered out effectively, improving the user's photographing experience. On the one hand, the human eye state recognition model provided by the invention is based on the MobileNetV2 network, so the human eye state can be judged accurately and in real time on a mobile terminal. On the other hand, existing human eye state recognition models do not consider the difference between the left eye and the right eye: apart from some differences in angle and pose, the left eye and the right eye of the same person are roughly mirror-symmetric, and although increasing the amount of data and applying data augmentation can compensate to a certain degree, the difference caused by mirror symmetry still affects accuracy. To solve this problem, one model could be trained for the left eye and another for the right eye, and in actual use the input would first be judged to be a left eye or a right eye and then fed into the corresponding model for eye state recognition; however, this requires two models plus an additional judging module and cannot satisfy the real-time and lightweight requirements of a mobile terminal. In the method provided by the invention, the original feature of the eye picture is first extracted with the MobileNetV2 model, the flip feature is then obtained from the mapping relationship between the original feature and the flip feature, and finally the original feature and the flip feature are fused into the final feature. Whether a left-eye region picture or a right-eye region picture is input, the final feature therefore contains both the information of the original picture and the information of its mirror-symmetric counterpart, which improves the robustness and adaptability of the model; all feature extraction and fusion is completed within one model without a redundant judging module, satisfying the real-time, lightweight and high-performance requirements of a mobile terminal.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to fig. 6, fig. 6 is a block diagram illustrating an apparatus for recognizing a morphology of a target object according to an exemplary embodiment of the present application. The means for identifying the morphology of the target object may be implemented as all or part of the terminal by software, hardware or a combination of both. The target object has a corresponding symmetrical object in the physical world, and the target object and the symmetrical object are centrally symmetrical or mirror symmetrical. The device comprises:
an image acquisition module 610, configured to acquire a target image of the target object;
a first feature extraction module 620, configured to extract a first feature of the target image, where the first feature is used to indicate the target image;
a second feature extraction module 630, configured to obtain a second feature according to the first feature, where the second feature is a flip feature corresponding to the first feature;
a feature fusion module 640, configured to fuse the first feature and the second feature to obtain a fused feature;
The morphology determining module 650 is configured to determine a resulting morphology of the target object according to the fusion feature, where the resulting morphology is a normal morphology or an abnormal morphology.
In an alternative embodiment, the first feature extraction module 620 is configured to input the target image into a first feature extraction model, and obtain the first feature output by the first feature extraction model; the first feature extraction model is a feature extraction model which is trained through a first training sample, the first training sample is a first training image marked with the result form, and the first training image and the target image are images belonging to the same appearance form type.
In an optional embodiment, the second feature extraction module 630 is configured to input the first feature into a first fully connected layer to obtain the second feature; the first fully connected layer is a fully connected layer trained with a second training sample, the second training sample is a second training image labeled with a real flip feature, the second training image and the target image are images belonging to the same appearance form type, and the output dimension of the first fully connected layer is equal to the dimension of the first feature.
In an alternative embodiment, the apparatus further comprises an execution module configured to determine a symmetric relationship between the target object and the corresponding symmetric object, the symmetric relationship including central symmetry and mirror symmetry; flip the second training image according to the symmetric relationship to obtain a corresponding real flipped image; input the real flipped image into the first feature extraction model to obtain the real flip feature; label the corresponding second training image with the real flip feature to obtain the second training sample; and train an original fully connected layer according to the second training sample and the second loss function to obtain the first fully connected layer.
In an alternative embodiment, the first feature extraction model and the first fully connected layer involved in the apparatus are packaged as a target neural network.
In an optional embodiment, the apparatus further includes a fine tuning module, configured to train the target neural network through a third training sample, where the third training sample is a third training image labeled with the result form, and the third training image and the target image are images belonging to a same appearance form type; the learning rate of training the target neural network is a target learning rate, the target learning rate is smaller than the learning rate of training the first feature extraction model, and the target learning rate is smaller than the learning rate of training the first full-connection layer.
In an alternative embodiment, when the target object involved in the apparatus is a human eye, the relationship between the target object and the symmetric object is mirror symmetry, the abnormal form is a closed-eye form, and the normal form is an open-eye form; the execution module is further configured to capture the viewfinder image when the target image is a viewfinder image of a terminal and the result form is the open-eye form; and/or display a reminder message when the target image is an image captured by a terminal and the result form is the closed-eye form, the reminder message being used to prompt that a closed-eye condition occurs in the target image; and/or move the target image to a closed-eye image album when the target image is an image captured by a terminal and the result form is the closed-eye form; and/or obtain the result form when the target image is used as an image for unlocking the terminal by face, refuse to unlock the terminal when the result form is the closed-eye form, and unlock the terminal when the result form is the open-eye form and the target image matches a preset image template.
In an alternative embodiment, the feature fusion module 640 is configured to add the first feature and the second feature to obtain a sum feature; and determining one half of the sum feature as the fusion feature.
Embodiments of the present application also provide a computer readable medium storing at least one instruction that is loaded and executed by the processor to implement the method for identifying a morphology of a target object according to the above embodiments.
It should be noted that: in the apparatus for recognizing the form of the target object according to the above embodiment, only the division of the functional modules is used for illustration, and in practical application, the above-mentioned functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the functions described above. In addition, the apparatus for identifying the form of the target object provided in the above embodiment belongs to the same concept as the method embodiment for identifying the form of the target object, and the detailed implementation process of the apparatus is referred to the method embodiment and is not repeated here.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is merely illustrative of the possible embodiments of the present application and is not intended to limit the present application, but any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (9)

1. A method of identifying a morphology of a target object, wherein the target object has a corresponding symmetric object in a physical world, the target object and the symmetric object being centrally symmetric or specularly symmetric, the method comprising:
acquiring a target image of the target object;
inputting the target image into a first feature extraction model, and acquiring a first feature output by the first feature extraction model, wherein the first feature is used for indicating the target image;
inputting the first feature into a first full-connection layer to obtain a second feature, wherein the second feature is a flip feature corresponding to the first feature;
fusing the first feature and the second feature to obtain a fused feature;
determining the result form of the target object according to the fusion characteristics, wherein the result form is a normal form or an abnormal form;
the first full-connection layer is a full-connection layer which is trained through a second training sample, the second training sample is a second training image labeled with a real flip feature, the second training image and the target image are images belonging to the same appearance form type, and the output dimension of the first full-connection layer is equal to the dimension of the first feature;
The training method of the first full-connection layer comprises the following steps: determining a symmetry relationship between the target object and the corresponding symmetric object, the symmetry relationship including center symmetry and mirror symmetry; flipping the second training image according to the symmetry relationship to obtain a corresponding real flipped image; inputting the real flipped image into the first feature extraction model to obtain the real flip feature; labeling the corresponding second training image with the real flip feature to obtain the second training sample; and training an original full-connection layer according to the second training sample and the second loss function to obtain the first full-connection layer.
2. The method of claim 1, wherein the first feature extraction model is a feature extraction model trained on a first training sample, the first training sample being a first training image labeled with the result form, and the first training image and the target image being images belonging to the same appearance form type.
3. The method of claim 2, wherein the first feature extraction model and the first fully connected layer are encapsulated as a target neural network.
4. The method of claim 3, wherein the method further comprises:
training the target neural network with a third training sample, wherein the third training sample is a third training image labeled with the result form, and the third training image and the target image are images belonging to the same appearance form type;
wherein the learning rate used to train the target neural network is a target learning rate, the target learning rate being smaller than the learning rate used to train the first feature extraction model and smaller than the learning rate used to train the first fully connected layer.
5. The method of any one of claims 1 to 4, wherein when the target object is a human eye, the symmetry relationship between the target object and the symmetric object is mirror symmetry, the abnormal form is a closed-eye form, and the normal form is an open-eye form, the method further comprising:
when the target image is a viewfinder image of a terminal and the result form is the open-eye form, capturing the viewfinder image;
and/or,
when the target image is an image captured by a terminal and the result form is the closed-eye form, displaying a reminder message, wherein the reminder message is used to indicate that a closed-eye condition occurs in the target image;
and/or,
when the target image is an image captured by a terminal and the result form is the closed-eye form, moving the target image to a closed-eye image album;
and/or,
when the target image is used as an image for face unlocking of a terminal, acquiring the result form;
when the result form is the closed-eye form, refusing to unlock the terminal;
and unlocking the terminal when the result form is the open-eye form and the target image matches a preset image template.
6. The method of any one of claims 1 to 4, wherein the fusing the first feature and the second feature to obtain a fused feature comprises:
adding the first feature and the second feature to obtain a sum feature;
and determining one half of the sum feature as the fused feature.
7. An apparatus for identifying a form of a target object, wherein the target object has a corresponding symmetric object in the physical world, the target object and the symmetric object being centrally symmetric or mirror symmetric, the apparatus comprising:
an image acquisition module, configured to acquire a target image of the target object;
a first feature extraction module, configured to input the target image into a first feature extraction model and acquire a first feature output by the first feature extraction model, wherein the first feature is used to represent the target image;
a second feature extraction module, configured to input the first feature into a first fully connected layer to obtain a second feature, wherein the second feature is a flip feature corresponding to the first feature;
a feature fusion module, configured to fuse the first feature and the second feature to obtain a fused feature; and
a form determination module, configured to determine a result form of the target object according to the fused feature, wherein the result form is a normal form or an abnormal form;
wherein the first fully connected layer is a fully connected layer trained on a second training sample, the second training sample is a second training image labeled with a real flip feature, the second training image and the target image are images belonging to the same appearance form type, and an output dimension of the first fully connected layer is equal to a dimension of the first feature; and
the training method of the first fully connected layer comprises: determining a symmetry relationship between the target object and the corresponding symmetric object, the symmetry relationship comprising central symmetry and mirror symmetry; flipping the second training image according to the symmetry relationship to obtain a corresponding real flipped image; inputting the real flipped image into the first feature extraction model to obtain the real flip feature; labeling the corresponding second training image with the real flip feature to obtain the second training sample; and training an original fully connected layer according to the second training sample and a second loss function to obtain the first fully connected layer.
8. A terminal, comprising a processor, a memory coupled to the processor, and program instructions stored on the memory, wherein the program instructions, when executed by the processor, implement the method of identifying a form of a target object according to any one of claims 1 to 6.
9. A computer-readable storage medium having program instructions stored therein, wherein the program instructions, when executed by a processor, implement the method of identifying a form of a target object according to any one of claims 1 to 6.
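
The following is a minimal, non-authoritative sketch (in PyTorch-style Python) of the inference path recited in claims 1 and 6: a first feature is extracted from the target image, a first fully connected layer maps it to a predicted flip feature of the same dimension, the two features are averaged, and the fused feature is classified as a normal or abnormal form. The backbone, the feature dimension of 128, and the single-linear-layer classifier head are illustrative assumptions; the claims do not specify them.

import torch
import torch.nn as nn

class FormClassifier(nn.Module):
    # Sketch of claims 1 and 6; module and variable names are assumed, not taken from the patent.
    def __init__(self, feature_extractor: nn.Module, feature_dim: int = 128):
        super().__init__()
        # First feature extraction model: any backbone mapping an image to a feature vector.
        self.feature_extractor = feature_extractor
        # First fully connected layer: predicts the flip feature; its output dimension
        # equals the dimension of the first feature, as required by claim 1.
        self.flip_fc = nn.Linear(feature_dim, feature_dim)
        # Classifier head mapping the fused feature to a normal/abnormal form (assumed).
        self.classifier = nn.Linear(feature_dim, 2)

    def forward(self, target_image: torch.Tensor) -> torch.Tensor:
        first_feature = self.feature_extractor(target_image)     # first feature
        second_feature = self.flip_fc(first_feature)              # predicted flip feature
        fused_feature = (first_feature + second_feature) / 2      # claim 6: one half of the sum
        return self.classifier(fused_feature)                     # logits for normal vs. abnormal form

# Example usage with a toy backbone (for illustration only):
# backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 24 * 24, 128))
# model = FormClassifier(backbone, feature_dim=128)
# logits = model(torch.randn(1, 3, 24, 24))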
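
The training of the first fully connected layer recited in claims 1 and 7 can likewise be sketched. Here the "second loss function" is assumed to be a mean-squared-error loss, and the flip is realized as a horizontal flip for mirror symmetry or a 180-degree rotation for central symmetry; the patent names neither choice, so these, like the function names, are assumptions.

import torch
import torch.nn.functional as F

def flip_image(images: torch.Tensor, symmetry: str) -> torch.Tensor:
    # Flip a batch of NCHW images according to the symmetry relationship (assumed realization).
    if symmetry == "mirror":
        return torch.flip(images, dims=[3])       # horizontal flip
    if symmetry == "center":
        return torch.flip(images, dims=[2, 3])    # 180-degree rotation
    raise ValueError("unknown symmetry: " + symmetry)

def train_flip_fc(feature_extractor, flip_fc, loader, symmetry="mirror", lr=1e-3):
    # Train the first fully connected layer to regress the real flip feature,
    # keeping the already-trained feature extraction model fixed.
    feature_extractor.eval()
    optimizer = torch.optim.SGD(flip_fc.parameters(), lr=lr)
    for second_training_images, _ in loader:
        with torch.no_grad():
            first_feature = feature_extractor(second_training_images)
            # Real flip feature: the flipped image passed through the same extractor.
            real_flip_feature = feature_extractor(flip_image(second_training_images, symmetry))
        predicted_flip_feature = flip_fc(first_feature)
        loss = F.mse_loss(predicted_flip_feature, real_flip_feature)  # assumed second loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

After both parts are trained, claim 4 fine-tunes the packaged target neural network end to end at a target learning rate smaller than either of the earlier rates, e.g. torch.optim.SGD(model.parameters(), lr=1e-4) if the feature extraction model and the fully connected layer were trained at 1e-2 and 1e-3 respectively; the concrete values are illustrative.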
CN201911141195.0A 2019-11-20 2019-11-20 Method, device, terminal and storage medium for identifying form of target object Active CN110866508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911141195.0A CN110866508B (en) 2019-11-20 2019-11-20 Method, device, terminal and storage medium for identifying form of target object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911141195.0A CN110866508B (en) 2019-11-20 2019-11-20 Method, device, terminal and storage medium for identifying form of target object

Publications (2)

Publication Number Publication Date
CN110866508A CN110866508A (en) 2020-03-06
CN110866508B (en) 2023-06-27

Family

ID=69655610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911141195.0A Active CN110866508B (en) 2019-11-20 2019-11-20 Method, device, terminal and storage medium for identifying form of target object

Country Status (1)

Country Link
CN (1) CN110866508B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792355B (en) * 2022-06-24 2023-02-24 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7689008B2 (en) * 2005-06-10 2010-03-30 Delphi Technologies, Inc. System and method for detecting an eye
CN100452081C (en) * 2007-06-01 2009-01-14 华南理工大学 Human eye positioning and human eye state recognition method
IL215883A0 (en) * 2011-10-24 2012-03-01 Iriss Medical Technologies Ltd System and method for identifying eye conditions
CN104346621A (en) * 2013-07-30 2015-02-11 展讯通信(天津)有限公司 Method and device for creating eye template as well as method and device for detecting eye state
CN104463081A (en) * 2013-09-16 2015-03-25 展讯通信(天津)有限公司 Detection method of human eye state
CN105095879A (en) * 2015-08-19 2015-11-25 华南理工大学 Eye state identification method based on feature fusion
CN108615014B (en) * 2018-04-27 2022-06-21 京东方科技集团股份有限公司 Eye state detection method, device, equipment and medium
CN108921117A (en) * 2018-07-11 2018-11-30 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN109711309B (en) * 2018-12-20 2020-11-27 北京邮电大学 Method for automatically identifying whether portrait picture is eye-closed
CN110163160A (en) * 2019-05-24 2019-08-23 北京三快在线科技有限公司 Face identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110866508A (en) 2020-03-06

Similar Documents

Publication Publication Date Title
RU2714096C1 (en) Method, equipment and electronic device for detecting a face vitality
CN109697416B (en) Video data processing method and related device
CN107886032B (en) Terminal device, smart phone, authentication method and system based on face recognition
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN112733802B (en) Image occlusion detection method and device, electronic equipment and storage medium
CN110569808A (en) Living body detection method and device and computer equipment
WO2023098128A1 (en) Living body detection method and apparatus, and training method and apparatus for living body detection system
CN112215180B (en) Living body detection method and device
CN108805047A (en) A kind of biopsy method, device, electronic equipment and computer-readable medium
CN111242090B (en) Human face recognition method, device, equipment and medium based on artificial intelligence
WO2024001095A1 (en) Facial expression recognition method, terminal device and storage medium
CN116048244B (en) Gaze point estimation method and related equipment
CN110298327A (en) A kind of visual effect processing method and processing device, storage medium and terminal
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111488774A (en) Image processing method and device for image processing
CN113449623A (en) Light living body detection method based on deep learning
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN113269010B (en) Training method and related device for human face living body detection model
CN112560584A (en) Face detection method and device, storage medium and terminal
CN111259757A (en) Image-based living body identification method, device and equipment
CN110866508B (en) Method, device, terminal and storage medium for identifying form of target object
CN114741559A (en) Method, apparatus and storage medium for determining video cover
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN115937938A (en) Training method of face identity recognition model, face identity recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant