WO2020108041A1 - Detection method and device for key points of ear region and storage medium - Google Patents


Info

Publication number
WO2020108041A1
WO2020108041A1 (PCT/CN2019/107104)
Authority
WO
WIPO (PCT)
Prior art keywords
ear
area
key point
key points
region
Prior art date
2018-11-28
Application number
PCT/CN2019/107104
Other languages
French (fr)
Chinese (zh)
Inventor
李宣平
李岩
张国鑫
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2020108041A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Definitions

  • the present application belongs to the field of image processing, and particularly relates to an ear key point detection method, device and storage medium.
  • In one related approach, a circular region of a certain size is moved over a face image containing the ear region; pixels whose gray levels stand out are determined to be outer pinna edge points, so that multiple outer pinna edge points are obtained by moving the circular region several times, and the ear region of the face image is determined according to the multiple outer pinna edge points.
  • The ear key points are then determined according to the gray level of each pixel in the ear region.
  • the present application discloses a method, device and storage medium for detecting key points of the ear.
  • a method for detecting key points of an ear includes:
  • acquiring a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • an ear key point detection device includes:
  • An image acquisition unit configured to acquire a face image, the face image including key points of a face contour, and the key points of the face contour are used to determine an ear region in the face image;
  • a model acquisition unit configured to acquire an ear key point detection model, the ear key point detection model is used to detect an ear key point in any ear area;
  • the determining unit is configured to detect the ear key points in the face image based on the ear key point detection model and the position of the face contour key points in the face image.
  • an ear key point detection device comprising:
  • a memory for storing processor-executable instructions;
  • the processor is configured to:
  • acquire a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquire an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detect the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • a non-transitory computer-readable storage medium is provided; when instructions in the storage medium are executed by a processor of a detection device, the detection device is caused to perform an ear key point detection method, the method including:
  • acquiring a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • an application program/computer program product is provided; when instructions in the application program/computer program product are executed by a processor of a detection device, the detection device is caused to perform an ear key point detection method, the method including:
  • acquiring a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • A face image including face contour key points is acquired and an ear key point detection model is acquired, where the face contour key points are used to determine the ear region in the face image
  • and the ear key point detection model is used to detect ear key points in an ear region; the ear key points in the face image are then detected based on the ear key point detection model and the face contour key points.
  • Because the ear region is determined from the face contour key points and the ear key point detection model is used to detect the ear key points in the face image, the positional relationship between the ear region and the face contour is taken into account; in addition,
  • the ear key point detection model learns how to detect key points within an ear region, which improves the accuracy of the ear key points and reduces errors.
  • Fig. 1 is a flowchart of a method for detecting key points of an ear according to an exemplary embodiment
  • Fig. 2 is a flowchart of a method for detecting key points of an ear according to an exemplary embodiment
  • Fig. 3 is a schematic diagram of a face image according to an exemplary embodiment
  • Fig. 4 is a flow chart of a method for detecting key points of an ear according to an exemplary embodiment
  • Fig. 5 is a block diagram of an ear key point detection device according to an exemplary embodiment
  • Fig. 6 is a block diagram of a terminal for key point detection of an ear according to an exemplary embodiment
  • Fig. 7 is a schematic structural diagram of a server according to an exemplary embodiment.
  • Fig. 1 is a flowchart of an ear key point detection method according to an exemplary embodiment. As shown in Fig. 1, the ear key point detection method is used in a detection device and includes the following steps:
  • In step 101, a face image is obtained.
  • The face image includes face contour key points, and the face contour key points are used to determine the ear region in the face image.
  • In step 102, an ear key point detection model is obtained; the ear key point detection model is used to detect ear key points in any ear region.
  • In step 103, the ear key points in the face image are detected based on the ear key point detection model and the positions of the face contour key points in the face image.
  • In the method provided in this embodiment of the present application, a face image including face contour key points is acquired and an ear key point detection model is acquired, where the face contour key points are used to determine the ear region in the face image
  • and the ear key point detection model is used to detect ear key points in an ear region; the ear key points in the face image are then detected based on the ear key point detection model and the face contour key points.
  • Because the ear region is determined from the face contour key points and the ear key point detection model is used to detect the ear key points in the face image, the positional relationship between the ear region and the face contour is taken into account; in addition,
  • the ear key point detection model learns how to detect key points within an ear region, which improves the accuracy of the ear key points and reduces errors.
  • In some embodiments, detecting the ear key points in the face image includes: determining the first ear region and the second ear region in the face image according to the positions of the face contour key points in the face image; detecting, based on the ear key point detection model, the first ear region, and the second ear region, the ear key points in the first ear region and the ear key points in the second ear region; and
  • determining the position of each ear key point in the face image according to the position of each ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
  • the first ear region and the second ear region in the face image are determined according to the positions of the key points of the face contour in the face image, including:
  • the first ear region including the first designated key point and the second ear region including the second designated key point are determined.
  • the first ear region belongs to a first type of ear region,
  • the second ear region belongs to a second type of ear region,
  • the first type of ear region is the ear region located on the first side of the human face,
  • and the second type of ear region is the ear region located on the second side of the human face;
  • detecting the ear key points in the first ear region and the ear key points in the second ear region includes: horizontally flipping the first ear region to obtain a third ear region, the third ear region belonging to the second type of ear region;
  • determining, based on the ear key point detection model, the second ear region, and the third ear region, the ear key points in the second ear region and the ear key points in the third ear region; and
  • horizontally flipping the third ear region containing the ear key points to obtain the first ear region containing the ear key points.
  • the method further includes:
  • acquiring multiple sample images, each sample image including an ear region and the ear key points in the ear region; extracting the ear regions from the multiple sample images; and performing model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
  • the model training is performed according to the extracted ear region and the ear key points in the ear region to obtain an ear key point detection model, including:
  • horizontally flipping the first type of ear regions among the extracted ear regions to obtain flipped ear regions, where the first type of ear region is the ear region located on the first side of the human face;
  • determining the second type of ear regions among the extracted ear regions, together with the flipped ear regions, as sample ear regions, where the second type of ear region is the ear region located on the second side of the human face; and
  • performing model training according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model.
  • Fig. 2 is a flowchart of an ear key point detection method according to an exemplary embodiment. As shown in Fig. 2, the ear key point detection method is used in a detection device.
  • The detection device may be a mobile phone, a computer, a server, a camera, a monitoring device, or another device with image processing capability. The method includes the following steps:
  • In step 201, a face image is obtained; the face image includes face contour key points.
  • The face image may be captured by the detection device, extracted from a video captured by the detection device, downloaded by the detection device from the Internet, or sent to the detection device by another device. Alternatively, during a live video broadcast on the detection device, each image in the video stream can be obtained and used as a face image to be detected, so that ear key point detection is performed on each image in the video stream.
  • the face image includes multiple face contour key points, that is, key points on the face contour in the face image, and the multiple face contour key points are connected to form a face contour.
  • the face image includes 19 face contour key points, and the 19 face contour key points are evenly distributed on the face contour in the face image.
  • the multiple key points of the face contour are obtained by performing face detection on the face image.
  • The face detection algorithm used in the face detection process may be a recognition algorithm based on facial feature points, a template-based recognition algorithm, a neural-network-based recognition algorithm, or the like.
  • In one possible implementation, after the detection device acquires the original face image, it performs face detection on the face image to obtain the multiple face contour key points in the face image.
  • Alternatively, another device performs face detection on the face image and, after obtaining the multiple face contour key points, sends the face image including those key points to the detection device. A sketch of this step is given below.
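  • Illustrative sketch only, not part of the patent: the patent does not name a face detection algorithm, so the 17 jawline landmarks of a standard dlib 68-point shape predictor stand in here for the 19 evenly spaced face contour key points described above.

```python
# Hypothetical helper: obtain face contour key points with dlib (an assumption,
# not the patent's algorithm).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def face_contour_keypoints(image_bgr):
    """Return (x, y) face contour key points for the first detected face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return []
    shape = predictor(gray, faces[0])
    # Landmarks 0..16 trace the jawline / face contour in the 68-point scheme.
    return [(shape.part(i).x, shape.part(i).y) for i in range(17)]
```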
  • In step 202, an ear key point detection model is obtained; the ear key point detection model is used to detect ear key points in any ear region.
  • Based on the ear key point detection model, the ear key points in any ear region can be detected, so that the ear key points in the face image can be determined.
  • the ear key point detection model can be trained by the detection device and stored by the detection device, or the ear key point detection model can be sent to the detection device after being trained by other equipment and stored by the detection device.
  • In one possible implementation, an initial ear key point detection model is first constructed and multiple sample images are obtained, where each sample image includes an ear region and the ear key points in the ear region.
  • The ear regions are extracted from the multiple sample images, and model training is performed according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
  • The multiple ear regions and the corresponding ear key points are divided into a training data set and a test data set. The ear regions in the training data set are used as the input of the ear key point detection model and the positions of the ear key points in the corresponding ear regions are used as its output, so that the trained model learns how to detect ear key points and acquires the ability to do so.
  • Each ear region in the test data set is then input into the ear key point detection model, the positions of the predicted ear key points in the ear region are determined by the model, these positions are compared with the positions of the labeled ear key points in the ear region, and the model is corrected according to the comparison result to improve its accuracy.
  • a preset training algorithm may be used when training the ear key point detection model, and the preset training algorithm may be a convolutional neural network algorithm, a decision tree algorithm, an artificial neural network algorithm, or the like.
  • the trained ear key point detection model can be a convolutional neural network model, a decision tree model or an artificial neural network model.
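  • The following is a minimal training sketch, not the patent's implementation: it assumes ear crops resized to 64x64 pixels and a fixed number K of ear key points per crop, and uses a small convolutional regressor as one possible instance of the preset training algorithm mentioned above.

```python
# Minimal keypoint-regression sketch (assumed architecture and sizes).
import torch
import torch.nn as nn

K = 20  # assumed number of ear key points per ear region

class EarKeypointNet(nn.Module):
    def __init__(self, num_points=K):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.head = nn.Linear(64 * 8 * 8, num_points * 2)  # (x, y) per key point

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_step(model, optimizer, ear_crops, keypoints):
    """ear_crops: (N, 3, 64, 64); keypoints: (N, K*2), normalized to [0, 1]."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(ear_crops), keypoints)
    loss.backward()
    optimizer.step()
    return loss.item()
```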
  • In step 203, the first ear region and the second ear region in the face image are determined according to the positions of the face contour key points in the face image.
  • The detection device detects the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • The face contour key points are used to determine the ear regions, each containing a complete ear, in the face image. Since there is a relatively fixed positional relationship between the face contour and the ear regions, the ear regions in the face image can be determined based on this relationship and the positions of the face contour key points in the face image, so that ear key point detection can then be performed.
  • the left ear area and the right ear area are usually included in the face image
  • Accordingly, the first ear region and the second ear region are determined, where the first ear region is the left ear region and the second ear region is the right ear region, or the first ear region is the right ear region and the second ear region is the left ear region.
  • the face image usually includes the face area, the ear area and other areas.
  • the ear area is extracted according to the key points of the face contour.
  • In this way, the prior knowledge that the ear is adjacent to the face contour can be used to exclude areas other than the ear regions, and detection is performed only on the ear regions, which not only reduces the amount of computation but also eliminates interference from irrelevant areas and improves accuracy.
  • the multiple face contour key points in the face image are located at different positions in the face image.
  • The relative positional relationships between the different face contour key points and the ear regions also differ. Therefore, in order to extract accurate ear regions, the face contour key points closest to the ear regions can first be determined as designated key points based on the positions of the face contour key points on the face contour, and the ear regions in the face image can then be determined according to the designated key points.
  • Specifically, the first designated key point and the second designated key point among the face contour key points are obtained, where the first designated key point and the second designated key point are the face contour key points closest to the respective ear regions, and the first ear region containing the first designated key point and the second ear region containing the second designated key point are determined.
  • The first designated key point and the second designated key point are determined in advance according to the distances between the face contour key points and the ears. For example, when the face detection algorithm produces a fixed number of sequentially numbered face contour key points,
  • the sequence numbers of the two key points closest to the ears can be determined in advance.
  • Then, after the face detection algorithm is used to obtain a face image including the multiple face contour key points,
  • the first designated key point and the second designated key point can be selected from the face image according to the two predetermined sequence numbers.
  • As for the size and shape of the ear region containing the first designated key point (and likewise the second designated key point),
  • the size is set according to the size of a typical human face so that the determined ear region can contain the entire ear, and the shape may be a rectangle, a circle, a shape similar to the human ear, or another shape.
  • As for the position of the ear region, it may be determined according to the relative positional relationship between the first designated key point or the second designated key point and the corresponding ear region.
  • For example, the first designated key point and the second designated key point are the face contour key points closest to the ear lobes on the face contour; each may be used as the center of the ear region to be extracted, or as the center of the lower edge of the ear region to be extracted, so that the first ear region and the second ear region are extracted respectively. A cropping sketch is given after this list.
  • the first specified key point is the face contour key point closest to the left ear area in the face image
  • the second specified key point is the face contour key point closest to the right ear area in the face image
  • the first The ear area is the left ear area
  • the second ear area is the right ear area.
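  • Sketch under assumptions: a fixed-size rectangular ear region is cropped around a designated face contour key point. The patent leaves the exact size and shape open (rectangle, circle, ear-like shape); the crop size used here is invented for illustration, and the returned origin is kept for the coordinate mapping described later.

```python
import numpy as np

def crop_ear_region(face_image, designated_point, size=(96, 64)):
    """Crop an ear region (height, width = size) centered on a designated key point."""
    h, w = size
    x, y = designated_point
    img_h, img_w = face_image.shape[:2]
    top = int(np.clip(y - h // 2, 0, img_h - h))
    left = int(np.clip(x - w // 2, 0, img_w - w))
    return face_image[top:top + h, left:left + w], (left, top)  # crop and its origin

# first/second designated key points -> first/second ear regions (illustrative names)
# left_ear, left_origin = crop_ear_region(face_img, first_designated_point)
# right_ear, right_origin = crop_ear_region(face_img, second_designated_point)
```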
  • In step 204, based on the ear key point detection model, the first ear region, and the second ear region, the ear key points in the first ear region and the ear key points in the second ear region are detected.
  • In some embodiments, the detection device inputs the first ear region and the second ear region into the ear key point detection model,
  • and the ear key points in the first ear region and the ear key points in the second ear region are detected separately based on the model, so that the ear key points in both ear regions are determined.
  • In step 205, the position of each ear key point in the face image is determined according to the position of that ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
  • The detection of the ear key points in the first ear region and the second ear region in step 204 actually determines the positions of the ear key points within the ear regions. Therefore, the position of an ear key point in the face image is determined according to its position in the ear region and the position of that ear region in the face image.
  • For example, a certain point in the face image (such as the designated key point) is taken as the origin of the ear region and a coordinate system is established; the coordinates of an ear key point within the ear region are then added to the coordinates of that origin in the face image to obtain the coordinates of the ear key point in the face image, thereby determining the position of the ear key point in the face image.
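  • A sketch of the coordinate superposition described above: key points predicted inside an ear crop are mapped back to face-image coordinates by adding the crop origin. The names follow the crop_ear_region() sketch earlier and are assumptions, not the patent's API.

```python
def to_face_coordinates(keypoints_in_crop, crop_origin):
    """keypoints_in_crop: list of (x, y) in ear-region coordinates;
    crop_origin: (left, top) of the ear region within the face image."""
    left, top = crop_origin
    return [(x + left, y + top) for (x, y) in keypoints_in_crop]
```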
  • Various operations can then be performed based on the ear key points in the face image. For example, during a live video broadcast, each image in the video stream can be obtained, and after the ear key points of each image are detected, virtual decorations, stickers, glow special effects, and the like can be added at the locations of the ear key points to enhance the live broadcast effect.
  • In the method provided in this embodiment of the present application, a face image including face contour key points is acquired and an ear key point detection model is acquired, where the face contour key points are used to determine the ear region in the face image
  • and the ear key point detection model is used to detect ear key points in an ear region; the ear key points in the face image are then detected based on the ear key point detection model and the face contour key points.
  • Because the ear region is determined from the face contour key points and the ear key point detection model is used to detect the ear key points in the face image, the positional relationship between the ear region and the face contour is taken into account; in addition,
  • the ear key point detection model learns how to detect key points within an ear region, which improves the accuracy of the ear key points and reduces errors.
  • the face image usually includes the face area, the ear area and other areas.
  • the ear area is extracted according to the key points of the face contour.
  • In this way, the prior knowledge that the ear is adjacent to the face contour can be used to exclude areas other than the ear regions, and only the ear regions are used for detection, which not only reduces the amount of computation but also eliminates interference from irrelevant areas and improves accuracy.
  • In addition, the ear key points can be used as operation targets, and a variety of operations can be performed based on the ear key points in the face image, which expands the application functions, improves flexibility, and makes the face image more entertaining.
  • Fig. 4 is a flowchart of a method for detecting key points of the ear according to an exemplary embodiment. As shown in Fig. 4, the method for detecting key points of the ear is used in a detection device.
  • The detection device may be a mobile phone, a computer, a server, a camera, a monitoring device, or another device with image processing capability. The method includes the following steps:
  • In step 401, a face image is obtained; the face image includes face contour key points, and the face contour key points are used to determine the ear region in the face image.
  • This step 401 is similar to step 201 above.
  • In step 402, an ear key point detection model is acquired.
  • Because the ear key point detection model used in the embodiment shown in Fig. 2 needs to detect the left ear region and the right ear region separately, the model has to be trained on both the left ear regions and the right ear regions to learn how to detect ear key points, which results in a relatively complex ear key point detection model.
  • In the embodiment of the present application, ear regions are divided into a first type of ear region and a second type of ear region.
  • The first type of ear region is the ear region located on the first side of the human face,
  • and the second type of ear region is the ear region located on the second side of the human face.
  • The ear key point detection model is used to detect ear key points only in the second type of ear region, and does not directly detect ear key points in the first type of ear region.
  • For example, the first type of ear region is the left ear region
  • and the second type of ear region is the right ear region,
  • or the first type of ear region is the right ear region
  • and the second type of ear region is the left ear region.
  • Accordingly, during training, the type of each extracted ear region is determined, the first type of ear regions among the extracted ear regions are horizontally flipped to obtain flipped ear regions, and the second type of ear regions together with the flipped ear regions are taken as sample ear regions.
  • Model training is then performed according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model, so that the model learns how to detect ear key points in the second type of ear region.
  • Because the ear key point detection model only needs to learn to detect ear key points on one side of the face rather than on both sides, the complexity of the model is reduced and the training speed is improved.
  • In some embodiments, the multiple sample ear regions and the corresponding ear key points are divided into a training data set and a test data set, the sample ear regions in the training data set are used as the input of the ear key point detection model,
  • and the positions of the ear key points in the corresponding ear regions are used as the output of the model; the ear key point detection model is then trained
  • so that it learns how to detect ear key points in the second type of ear region and acquires the ability to do so.
  • Each sample ear region in the test data set is then input into the ear key point detection model, and the positions of the predicted ear key points in the ear region are determined by the model. If the sample ear region is an original second type of ear region, the predicted ear key points are compared with the labeled ear key points in that region, and the ear key point
  • detection model is corrected according to the comparison result; if the sample ear region was obtained by flipping a first type of ear region, the predicted ear key points are compared with the labeled ear key points after the same flip is applied to them, and the ear key point detection model is corrected according to the comparison result.
  • In some embodiments, horizontally flipping any ear region includes: determining the position of each pixel in the ear region and the vertical central axis of the ear region; determining, for each pixel, the target position symmetric to it about the central axis according to the position of the pixel and the position of the central axis; and exchanging the pixel information of each pixel with the pixel information of the pixel at the corresponding target position, thereby achieving the horizontal flip.
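  • A sketch of the horizontal flip described above. Exchanging each pixel with its mirror about the vertical central axis is equivalent to reversing the column order, which numpy does directly; flipping the key point coordinates consistently is also shown. These helpers are illustrative, not the patent's implementation.

```python
import numpy as np

def horizontal_flip(ear_region):
    """Flip an H x W (x C) ear region about its vertical central axis."""
    return ear_region[:, ::-1].copy()

def flip_keypoints(keypoints, region_width):
    """Mirror (x, y) key points inside an ear region of the given width."""
    return [(region_width - 1 - x, y) for (x, y) in keypoints]
```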
  • In step 403, the first ear region and the second ear region in the face image are determined according to the positions of the face contour key points in the face image.
  • Step 403 is similar to step 203 above.
  • For details, please refer to step 203 above, which will not be repeated here.
  • In step 404, the first ear region is horizontally flipped to obtain a third ear region, which belongs to the second type of ear region.
  • In step 405, based on the ear key point detection model, the second ear region, and the third ear region, the ear key points in the second ear region and the ear key points in the third ear region are determined.
  • In step 406, the third ear region containing the ear key points is horizontally flipped to obtain the first ear region containing the ear key points.
  • In the embodiment shown in Fig. 2, both the first type of ear region and the second type of ear region can be detected based on the ear key point detection model, whereas in this embodiment of the present application the ear key point detection model can only detect the second type of ear region.
  • Therefore, the first ear region, which belongs to the first type of ear region, is horizontally flipped to obtain the third ear region, so that the third ear region belongs to the second type of ear region and can be detected based on the ear
  • key point detection model. After the ear key points in the third ear region are detected, the third ear region containing the ear key points is flipped back horizontally, which determines the ear key points in the first ear region.
  • That is, the first type of ear region is flipped into the second type before detection is performed.
  • In step 407, the position of each ear key point in the face image is determined according to the position of that ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image. An end-to-end sketch of steps 404 to 407 is given below.
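  • End-to-end sketch of steps 404 to 407 under the same assumptions as the earlier snippets: crop_ear_region, horizontal_flip, flip_keypoints, to_face_coordinates and the trained model (returning a list of (x, y) key points for a crop) are illustrative names, not the patent's API.

```python
def detect_ears(face_img, first_point, second_point, model):
    first_crop, first_origin = crop_ear_region(face_img, first_point)      # first-type region
    second_crop, second_origin = crop_ear_region(face_img, second_point)   # second-type region

    third_crop = horizontal_flip(first_crop)                     # step 404: flip first -> third
    kps_third = model(third_crop)                                # step 405: detect in flipped crop
    kps_second = model(second_crop)                              # step 405: detect in second crop
    kps_first = flip_keypoints(kps_third, third_crop.shape[1])   # step 406: flip back

    # step 407: map crop coordinates back into face-image coordinates
    return (to_face_coordinates(kps_first, first_origin),
            to_face_coordinates(kps_second, second_origin))
```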
  • Step 407 is similar to step 205 described above.
  • For details, please refer to step 205 described above, which will not be repeated here.
  • In the method provided in this embodiment of the present application, a face image and an ear key point detection model are acquired, the first ear region and the second ear region in the face image are determined according to the face contour key points, and the first ear
  • region is flipped horizontally to obtain a third ear region belonging to the second type of ear region.
  • The ear key points in the second ear region and the third ear region are then detected, and the third ear region containing the ear key points
  • is flipped back horizontally to obtain the first ear region containing the ear key points, after which the position of each ear key point in the face image is determined.
  • Because the ear regions are determined from the face contour key points and the ear key point detection model is used to detect the ear key points in the face image, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model learns how to detect key points within an ear region, which improves the accuracy of the ear key points and reduces errors.
  • In addition, the ear key point detection model only needs to detect ear key points in the second type of ear region and does not need to detect ear key points in the first type of ear region, which reduces the complexity of the model.
  • Fig. 5 is a block diagram of a device for detecting key points of an ear according to an exemplary embodiment.
  • the device includes an image acquisition unit 501, a model acquisition unit 502 and a determination unit 503.
  • the image acquisition unit 501 is configured to acquire a face image, and the face image includes key points of the face contour, and the key points of the face contour are used to determine the ear region in the face image;
  • the model acquisition unit 502 is configured to acquire an ear key point detection model, and the ear key point detection model is used to detect ear key points in any ear region;
  • the determining unit 503 is configured to detect the ear key points in the face image based on the ear key point detection model and the position of the face contour key points in the face image.
  • The device provided in this embodiment of the present application acquires a face image including face contour key points and acquires an ear key point detection model, where the face contour key points are used to determine the ear region in the face image
  • and the ear key point detection model is used to detect ear key points in an ear region; the ear key points in the face image are then detected based on the ear key point detection model and the face contour key points.
  • Because the ear region is determined from the face contour key points and the ear key point detection model is used to detect the ear key points in the face image, the positional relationship between the ear region and the face contour is taken into account; in addition,
  • the ear key point detection model learns how to detect key points within an ear region, which improves the accuracy of the ear key points and reduces errors.
  • the determining unit 503 includes:
  • the area determination subunit is configured to determine the first ear area and the second ear area in the face image according to the positions of the key points of the face contour in the face image;
  • the key point determination subunit is configured to detect the ear key points in the first ear region and the ear key points in the second ear region based on the ear key point detection model, the first ear region, and the second ear region; and
  • the position determination subunit is configured to determine the position of each ear key point in the face image according to the position of that ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
  • In some embodiments, the area determination subunit is further configured to obtain the first designated key point and the second designated key point among the face contour key points, and to determine the first ear region including the first designated key point and the second ear region including the second designated key point.
  • the first ear region belongs to a first type of ear region,
  • the second ear region belongs to a second type of ear region,
  • the first type of ear region is the ear region located on the first side of the human face,
  • and the second type of ear region is the ear region located on the second side of the human face;
  • the key point determination subunit is further configured to: horizontally flip the first ear region to obtain a third ear region, which belongs to the second type of ear region; determine, based on the ear key point detection model, the second ear region, and the third ear region, the ear key points in the second ear region and the ear key points in the third ear region; and horizontally flip the third ear region containing the ear key points to obtain the first ear region containing the ear key points.
  • the device further includes:
  • the acquiring unit is configured to acquire a plurality of sample images, and each sample image includes an ear region and ear key points in the ear region;
  • An extraction unit configured to extract the ear region from multiple sample images, respectively;
  • the training unit is configured to perform model training based on the extracted ear region and ear key points in the ear region to obtain an ear key point detection model.
  • the training unit includes:
  • the flip subunit is configured to horizontally flip the first type of ear regions among the extracted ear regions to obtain flipped ear regions,
  • where the first type of ear region is the ear region located on the first side of the human face;
  • the sample determination subunit is configured to determine the second type of ear regions among the extracted ear regions, together with the flipped ear regions, as sample ear regions, where the second type of ear region is the ear region located on the second side of the human face; and
  • the training unit is further configured to perform model training based on the sample ear area and the ear key points in the sample ear area to obtain an ear key point detection model.
  • Fig. 6 is a block diagram of a terminal 600 for key point detection of an ear according to an exemplary embodiment.
  • The terminal 600 is used to perform the steps performed by the detection device in the ear key point detection method described above, and may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 600 may also be called other names such as user equipment, portable terminal, laptop terminal, and desktop terminal.
  • the terminal 600 includes a processor 601 and a memory 602.
  • the processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • The processor 601 may be implemented in at least one of the following hardware forms: a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
  • the processor 601 may also include a main processor and a coprocessor.
  • the main processor is a processor for processing data in a wake-up state, also called a central processing unit (Central Processing Unit, CPU); the coprocessor is A low-power processor for processing data in the standby state.
  • the processor 601 may be integrated with a graphics processor (Graphics Processing Unit, GPU), and the GPU is used to render and draw the content required to be displayed on the display screen.
  • the processor 601 may further include an artificial intelligence (Artificial Intelligence, AI) processor, which is used to process computing operations related to machine learning.
  • the memory 602 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices and flash storage devices.
  • The non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, and the at least one instruction is executed by the processor 601 to implement the ear key point detection method provided by the method embodiments of the present application.
  • the terminal 600 may optionally include a peripheral device interface 603 and at least one peripheral device.
  • the processor 601, the memory 602, and the peripheral device interface 603 may be connected by a bus or a signal line.
  • Each peripheral device may be connected to the peripheral device interface 603 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
  • the peripheral device interface 603 may be used to connect at least one peripheral device related to input/output (Input/Output, I/O) to the processor 601 and the memory 602.
  • In some embodiments, the processor 601, the memory 602, and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 604 is used to receive and transmit radio frequency (Radio Frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 604 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 604 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 604 can communicate with other terminals through at least one wireless communication protocol.
  • The wireless communication protocol includes but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or wireless fidelity (WiFi) networks.
  • the radio frequency circuit 604 may further include a circuit related to near field communication (Near Field Communication, NFC), which is not limited in this application.
  • the display screen 605 is used to display a user interface (User Interface, UI).
  • the UI may include graphics, text, icons, video, and any combination thereof.
  • the display screen 605 also has the ability to collect touch signals on or above the surface of the display screen 605.
  • the touch signal may be input to the processor 601 as a control signal for processing.
  • the display screen 605 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • In some embodiments, there may be one display screen 605, provided on the front panel of the terminal 600; in other embodiments, there may be at least two display screens 605, respectively disposed on different surfaces of the terminal 600 or adopting a folded design; in still other embodiments, the display screen 605 may be a flexible display screen disposed on a curved or folded surface of the terminal 600. The display screen 605 may even be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • The display screen 605 may be made of materials such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
  • the camera component 606 is used to collect images or videos.
  • the camera assembly 606 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 606 may also include a flash.
  • The flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, which can be used for light compensation at different color temperatures.
  • the audio circuit 607 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 601 for processing, or input them to the radio frequency circuit 604 to implement voice communication.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 601 or the radio frequency circuit 604 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 607 may further include a headphone jack.
  • the positioning component 608 is used to locate the current geographic location of the terminal 600 to implement navigation or location-based services (Location Based Services, LBS).
  • The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 609 is used to supply power to various components in the terminal 600.
  • the power source 609 may be alternating current, direct current, disposable batteries, or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 600 further includes one or more sensors 610.
  • the one or more sensors 610 include, but are not limited to: an acceleration sensor 611, a gyro sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
  • the acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 600.
  • the acceleration sensor 611 can be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 601 may control the touch display 605 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 611.
  • the acceleration sensor 611 can also be used for game or user movement data collection.
  • the gyro sensor 612 can detect the body direction and the rotation angle of the terminal 600, and the gyro sensor 612 can cooperate with the acceleration sensor 611 to collect a 3D action of the user on the terminal 600. Based on the data collected by the gyro sensor 612, the processor 601 can realize the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 613 may be disposed on the side frame of the terminal 600 and/or the lower layer of the touch display 605.
  • the processor 601 can perform left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 613.
  • the processor 601 controls the operability control on the UI interface according to the user's pressure operation on the touch screen 605.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • The fingerprint sensor 614 is used to collect the user's fingerprint, and the processor 601 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 614 may be provided on the front, back, or side of the terminal 600. When a physical button or manufacturer logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or manufacturer logo.
  • the optical sensor 615 is used to collect the ambient light intensity.
  • the processor 601 can control the display brightness of the touch display 605 according to the ambient light intensity collected by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display 605 is increased; when the ambient light intensity is low, the display brightness of the touch display 605 is decreased.
  • the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
  • the proximity sensor 616 also called a distance sensor, is usually provided on the front panel of the terminal 600.
  • the proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600.
  • When the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually decreases, the processor 601 controls the touch display 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually increases, the processor 601 controls the touch display 605 to switch from the screen-off state to the screen-on state.
  • The structure shown in Fig. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
  • FIG. 7 is a schematic structural diagram of a server according to an exemplary embodiment.
  • The server 700 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the methods provided by the foregoing method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input-output interface for input and output.
  • the server may also include other components for implementing device functions, which will not be repeated here.
  • the server 700 may be used to perform the steps performed by the ear key point detection device in the ear key point detection method.
  • a non-transitory computer-readable storage medium is also provided.
  • When instructions in the storage medium are executed by a processor of a detection device, the detection device is caused to perform an ear key point detection method, the method including:
  • acquiring a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
  • In an exemplary embodiment, an application program/computer program product is also provided; when instructions in the application program/computer program product are executed by a processor of a detection device, the detection device is caused to perform an ear key point detection method, the method including:
  • acquiring a face image, where the face image includes face contour key points used to determine an ear region in the face image;
  • acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
  • detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a detection method and device for key points of the ear region and a storage medium, relating to the technical field of image processing. The method comprises the following steps: obtaining a facial image, the facial image comprising facial contour key points used for determining ear regions in the facial image; obtaining an ear key point detection model used for detecting ear key points in any ear region; and detecting the ear key points in the facial image based on the ear key point detection model and the positions of the facial contour key points in the facial image. The method uses the facial contour key points to determine the ear regions and uses the ear key point detection model to detect the ear key points in the facial image, thereby taking into consideration the positional relationship between the ear regions and the facial contour; the ear key point detection model also learns how to detect ear key points, which improves the accuracy of the ear key points and reduces errors.

Description

Ear key point detection method, device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 28, 2018, with application number 201811437331.6 and entitled "Ear key point detection method, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the field of image processing, and particularly relates to an ear key point detection method, device and storage medium.
Background
In recent years, with the rapid development and wide application of image processing technology, in fields such as virtual reality and short video it is often necessary to detect the ear key points in a face image and to operate on the ear region of the face image according to the detected ear key points, for example by adding decorations.
In one related approach, a circular region of a certain size is moved over a face image containing the ear region, and the part of the face image inside the circular region is scanned. Based on the fact that pixels in different parts of the ear region have different gray levels, pixels whose gray levels stand out are determined to be outer pinna edge points; multiple outer pinna edge points are obtained by moving the circular region several times, the ear region of the face image is determined according to the multiple outer pinna edge points, and the ear key points are then determined according to the gray level of each pixel in the ear region.
The inventors found that the above solution determines the ear key points only according to the gray level of each pixel in the ear region, so the detected ear key points are not accurate enough and the error is large.
发明内容Summary of the invention
为克服相关技术中存在的问题,本申请公开一种耳部关键点检测方法、装置及存储介质。In order to overcome the problems in the related art, the present application discloses a method, device and storage medium for detecting key points of the ear.
根据本申请实施例的第一方面,提供一种耳部关键点检测方法,所述方法包括:According to a first aspect of the embodiments of the present application, a method for detecting key points of an ear is provided. The method includes:
获取人脸图像,所述人脸图像包括人脸轮廓关键点,所述人脸轮廓关键点用于确定所述人脸图像中的耳部区域;Acquiring a face image, the face image includes key points of a face contour, and the key points of the face contour are used to determine an ear region in the face image;
获取耳部关键点检测模型,所述耳部关键点检测模型用于检测任一耳部区域中的耳部关键点;Acquiring an ear key point detection model, the ear key point detection model is used to detect an ear key point in any ear region;
基于所述耳部关键点检测模型和所述人脸轮廓关键点在所述人脸图像中的位置,检测所述人脸图像中的耳部关键点。Based on the ear key point detection model and the position of the face contour key point in the face image, the ear key point in the face image is detected.
根据本申请实施例的第二方面,提供一种耳部关键点检测装置,所述装置包括:According to a second aspect of the embodiments of the present application, an ear key point detection device is provided, and the device includes:
图像获取单元,被配置为获取人脸图像,所述人脸图像包括人脸轮廓关键点,所述人脸轮廓关键点用于确定所述人脸图像中的耳部区域;An image acquisition unit configured to acquire a face image, the face image including key points of a face contour, and the key points of the face contour are used to determine an ear region in the face image;
模型获取单元,被配置为获取耳部关键点检测模型,所述耳部关键点检测模型用于检测任一耳部区域中的耳部关键点;A model acquisition unit configured to acquire an ear key point detection model, the ear key point detection model is used to detect an ear key point in any ear area;
确定单元,被配置为基于所述耳部关键点检测模型和所述人脸轮廓关键点在所述人脸图像中的位置,检测所述人脸图像中的耳部关键点。The determining unit is configured to detect the ear key points in the face image based on the ear key point detection model and the position of the face contour key points in the face image.
根据本申请实施例的第三方面,提供一种耳部关键点检测装置,所述装置包括:According to a third aspect of the embodiments of the present application, there is provided an ear key point detection device, the device comprising:
处理器;processor;
用于存储处理器可执行命令的存储器;Memory for storing processor executable commands;
其中,所述处理器被配置为:Wherein, the processor is configured to:
获取人脸图像,所述人脸图像包括人脸轮廓关键点,所述人脸轮廓关键点用于确定所述人脸图像中的耳部区域;Acquiring a face image, the face image includes key points of a face contour, and the key points of the face contour are used to determine an ear region in the face image;
获取耳部关键点检测模型,所述耳部关键点检测模型用于检测任一耳部区域中的耳部关键点;Acquiring an ear key point detection model, the ear key point detection model is used to detect an ear key point in any ear region;
基于所述耳部关键点检测模型和所述人脸轮廓关键点在所述人脸图像中的位置,检测所述人脸图像中的耳部关键点。Based on the ear key point detection model and the position of the face contour key point in the face image, the ear key point in the face image is detected.
According to a fourth aspect of the embodiments of the present application, a non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of a detection device, the detection device is enabled to perform an ear key point detection method, the method including:
acquiring a face image, the face image including face contour key points, the face contour key points being used to determine an ear region in the face image;
acquiring an ear key point detection model, the ear key point detection model being used to detect ear key points in any ear region; and
detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
According to a fifth aspect of the embodiments of the present application, an application program/computer program product is provided. When instructions in the application program/computer program product are executed by a processor of a detection device, the detection device is enabled to perform an ear key point detection method, the method including:
acquiring a face image, the face image including face contour key points, the face contour key points being used to determine an ear region in the face image;
acquiring an ear key point detection model, the ear key point detection model being used to detect ear key points in any ear region; and
detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
A face image including face contour key points is acquired and an ear key point detection model is acquired; the face contour key points are used to determine the ear regions in the face image, and the ear key point detection model is used to detect the ear key points in an ear region, so the ear key points in the face image are detected based on the ear key point detection model and the face contour key points. Because the ear regions are determined by using the face contour key points and the ear key points in the face image are detected by the ear key point detection model, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model has learned how to detect ear key points within an ear region, which improves the accuracy of the detected ear key points and reduces errors.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings herein are incorporated into and constitute a part of the specification, illustrate embodiments consistent with the present application, and are used together with the specification to explain the principles of the present application.
Fig. 1 is a flowchart of an ear key point detection method according to an exemplary embodiment;
Fig. 2 is a flowchart of an ear key point detection method according to an exemplary embodiment;
Fig. 3 is a schematic diagram of a face image according to an exemplary embodiment;
Fig. 4 is a flowchart of an ear key point detection method according to an exemplary embodiment;
Fig. 5 is a block diagram of an ear key point detection device according to an exemplary embodiment;
Fig. 6 is a block diagram of a terminal for ear key point detection according to an exemplary embodiment;
Fig. 7 is a schematic structural diagram of a server according to an exemplary embodiment.
DETAILED DESCRIPTION
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
Fig. 1 is a flowchart of an ear key point detection method according to an exemplary embodiment. As shown in Fig. 1, the ear key point detection method is used in a detection device and includes the following steps.
In step 101, a face image is acquired. The face image includes face contour key points, and the face contour key points are used to determine the ear regions in the face image.
In step 102, an ear key point detection model is acquired. The ear key point detection model is used to detect ear key points in any ear region.
In step 103, the ear key points in the face image are detected based on the ear key point detection model and the positions of the face contour key points in the face image.
In the method provided by this embodiment of the present application, a face image including face contour key points is acquired and an ear key point detection model is acquired; the face contour key points are used to determine the ear regions in the face image, and the ear key point detection model is used to detect the ear key points in an ear region, so the ear key points in the face image are detected based on the ear key point detection model and the face contour key points. Because the ear regions are determined by using the face contour key points and the ear key points in the face image are detected by the ear key point detection model, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model has learned how to detect ear key points within an ear region, which improves the accuracy of the detected ear key points and reduces errors.
In a possible implementation, detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image includes:
determining a first ear region and a second ear region in the face image according to the positions of the face contour key points in the face image;
detecting the ear key points in the first ear region and the ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region; and
determining the position of each ear key point in the face image according to the determined position of each ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
In another possible implementation, determining the first ear region and the second ear region in the face image according to the positions of the face contour key points in the face image includes:
acquiring a first designated key point and a second designated key point among the face contour key points; and
determining the first ear region including the first designated key point and the second ear region including the second designated key point.
In another possible implementation, the first ear region belongs to a first type of ear region, the second ear region belongs to a second type of ear region, the first type of ear region is an ear region located on a first side of the face, and the second type of ear region is an ear region located on a second side of the face; and
detecting the ear key points in the first ear region and the ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region includes:
horizontally flipping the first ear region to obtain a third ear region, the third ear region belonging to the second type of ear region;
determining the ear key points in the second ear region and the ear key points in the third ear region based on the ear key point detection model, the second ear region and the third ear region; and
horizontally flipping the third ear region containing the ear key points to obtain the first ear region containing the ear key points.
In another possible implementation, the method further includes:
acquiring a plurality of sample images, each sample image including an ear region and ear key points in the ear region;
extracting the ear regions from the plurality of sample images respectively; and
performing model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
In another possible implementation, performing model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model includes:
horizontally flipping the first type of ear regions among the extracted ear regions to obtain flipped ear regions, the first type of ear region being an ear region located on the first side of the face;
determining the second type of ear regions among the extracted ear regions and the flipped ear regions as sample ear regions, the second type of ear region being an ear region located on the second side of the face; and
performing model training according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model.
Fig. 2 is a flowchart of an ear key point detection method according to an exemplary embodiment. As shown in Fig. 2, the ear key point detection method is used in a detection device. The detection device may be a mobile phone, a computer, a server, a camera, a monitoring device or another device with an image processing function. The method includes the following steps.
In step 201, a face image is acquired, and the face image includes face contour key points.
The face image may be captured by the detection device, extracted from a video captured by the detection device, downloaded by the detection device from the Internet, or sent to the detection device by another device. Alternatively, during a live video broadcast performed by the detection device, each image in the video stream may be acquired and used as a face image to be detected, so that ear key point detection is performed on each image in the video stream.
The face image includes a plurality of face contour key points, that is, key points on the face contour in the face image; the plurality of face contour key points are connected to form the face contour. For example, the face image includes 19 face contour key points evenly distributed along the face contour in the face image.
The plurality of face contour key points are obtained by performing face detection on the face image. The face detection algorithm used in the face detection process may be a recognition algorithm based on facial feature points, a template-based recognition algorithm, a neural-network-based recognition algorithm, or the like. When the detection device acquires an original face image, it performs face detection on the face image to obtain the plurality of face contour key points in the face image. Alternatively, another device performs face detection on the face image to obtain the plurality of face contour key points, and then sends the face image including the plurality of face contour key points to the detection device.
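As one hedged illustration of this step, the sketch below obtains face contour key points with dlib's pretrained 68-point landmark model; the choice of library, the model file path and the use of the 17 jawline points as contour key points are assumptions made only for the example and are not the specific detector or the 19-point layout described above.

```python
# Hypothetical sketch: face contour key points via dlib's 68-point landmark model.
# The library, the model file and the 17 jawline points are illustrative assumptions.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local file

def face_contour_keypoints(image_bgr):
    """Return (x, y) jawline points approximating the face contour key points."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)          # detected face rectangles
    if not faces:
        return []
    shape = predictor(gray, faces[0])  # 68 landmarks for the first detected face
    # Points 0-16 trace the jawline, i.e. the visible face contour.
    return [(shape.part(i).x, shape.part(i).y) for i in range(17)]
```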
In step 202, an ear key point detection model is acquired. The ear key point detection model is used to detect ear key points in any ear region.
In this embodiment of the present application, the ear key points in any ear region can be detected based on the ear key point detection model, so that the ear key points in the face image are determined.
The ear key point detection model may be trained and stored by the detection device, or the ear key point detection model may be trained by another device, sent to the detection device and stored by the detection device.
In a possible implementation, when the ear key point detection model is trained, an initial ear key point detection model is first constructed, a plurality of sample images are acquired, each sample image including an ear region and the ear key points in the ear region, the ear regions are extracted from the plurality of sample images respectively, and model training is performed according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
During training, the plurality of ear regions and the corresponding ear key points are divided into a training data set and a test data set. The ear regions in the training data set are used as the input of the ear key point detection model, and the positions of the ear key points in the corresponding ear regions are used as the output of the ear key point detection model, so that the model learns how to detect ear key points and acquires the ability to detect them. Afterwards, each ear region in the test data set is input into the ear key point detection model, the positions of the predicted ear key points in the ear region are determined based on the model, these positions are compared with the positions of the labeled actual ear key points in the ear region, and the ear key point detection model is corrected according to the comparison result to improve its accuracy.
In a possible implementation, a preset training algorithm may be used when training the ear key point detection model. The preset training algorithm may be a convolutional neural network algorithm, a decision tree algorithm, an artificial neural network algorithm, or the like. Correspondingly, the trained ear key point detection model may be a convolutional neural network model, a decision tree model, an artificial neural network model, or the like.
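As a hedged example of such a model, the following is a minimal sketch of a convolutional network that regresses ear key point coordinates from an ear region crop; the framework (PyTorch), the 64x64 input size, the number of key points and the layer sizes are assumptions for illustration rather than parameters specified by the application.

```python
# Minimal sketch of a convolutional ear key point regressor (assumed design).
import torch
import torch.nn as nn

NUM_KEYPOINTS = 6  # assumed number of ear key points per ear region

class EarKeypointNet(nn.Module):
    def __init__(self, num_keypoints=NUM_KEYPOINTS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Ear crops are assumed to be resized to 64x64 before being fed in.
        self.regressor = nn.Linear(64 * 8 * 8, num_keypoints * 2)

    def forward(self, x):
        x = self.features(x)
        return self.regressor(x.flatten(1))  # (x, y) pairs, normalized to [0, 1]

model = EarKeypointNet()
criterion = nn.MSELoss()  # compares predicted positions with labeled positions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(ear_batch, keypoint_batch):
    """ear_batch: (N, 3, 64, 64) crops; keypoint_batch: (N, NUM_KEYPOINTS * 2) labels."""
    optimizer.zero_grad()
    loss = criterion(model(ear_batch), keypoint_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under these assumptions, the training and test procedure described above would amount to repeatedly calling train_step on batches from the training data set and then comparing the model's predictions on the test data set with the labeled key points.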
In step 203, the first ear region and the second ear region in the face image are determined according to the positions of the face contour key points in the face image.
In this embodiment of the present application, the detection device detects the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
The face contour key points are used to determine ear regions that each include an entire ear in the face image. Because there is a fixed relative positional relationship between the face contour and the ear regions, the ear regions in the face image can be determined according to this relative positional relationship and the positions of the face contour key points in the face image, so that ear key point detection can be performed.
Since a face image usually includes a left ear region and a right ear region, a first ear region and a second ear region are determined when the ear regions in the face image are determined, where the first ear region is the left ear region and the second ear region is the right ear region, or the first ear region is the right ear region and the second ear region is the left ear region.
A face image usually includes a face region, ear regions and other regions. Extracting the ear regions according to the face contour key points makes use of the prior knowledge that the ears are adjacent to the face contour, excludes the regions other than the ear regions, and performs detection only on the ear regions, which both reduces the amount of computation and eliminates interference from irrelevant regions, improving accuracy.
The plurality of face contour key points are located at different positions in the face image, and their relative positional relationships with the ear regions also differ. Therefore, in order to extract accurate ear regions, the face contour key points closest to the ear regions may first be determined, based on the positions of the face contour key points on the face contour, as designated key points, and the ear regions in the face image are determined according to the designated key points.
In a possible implementation, a first designated key point and a second designated key point among the face contour key points are acquired, the first designated key point and the second designated key point being the face contour key points closest to the ear regions, and the first ear region including the first designated key point and the second ear region including the second designated key point are determined.
The first designated key point and the second designated key point are determined in advance according to the distances between the face contour key points and the ears. For example, when a face detection algorithm produces a fixed number of face contour key points arranged in order along the face contour, the indices of the two key points closest to the ears can be determined in advance. Then, when this face detection algorithm is used to obtain a face image including a plurality of face contour key points, the first designated key point and the second designated key point can be determined from the face image according to the two predetermined indices.
Optionally, after the first designated key point and the second designated key point are determined, the first ear region including the first designated key point and the second ear region including the second designated key point are determined according to the positions of the first designated key point and the second designated key point in the face image, following a fixed size and shape. The size is set according to the size of a typical face, so that the determined ear region can include the entire ear, and the shape may be a rectangle, a circle, a shape similar to a human ear, or another shape.
In addition, the position of an ear region may be determined according to the relative positional relationship between the corresponding designated key point and that ear region. As shown in Fig. 3, if the first designated key point and the second designated key point are the face contour key points closest to the earlobes, the first designated key point and the second designated key point may respectively be used as the center of the ear region to be extracted, or as the center of the lower edge of the ear region to be extracted, to extract the first ear region and the second ear region.
When the first designated key point is the face contour key point closest to the left ear region in the face image and the second designated key point is the face contour key point closest to the right ear region, the first ear region is the left ear region and the second ear region is the right ear region. When the first designated key point is the face contour key point closest to the right ear region and the second designated key point is the face contour key point closest to the left ear region, the first ear region is the right ear region and the second ear region is the left ear region.
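The cropping described above can be sketched as follows; the crop size, the assumption that the image is larger than the crop, and the placement of the designated key point at the center of the crop's lower edge (matching the earlobe example of Fig. 3) are choices made only for illustration.

```python
# Sketch of cropping a fixed-size rectangular ear region around a designated
# face contour key point (crop size and placement are illustrative assumptions).
import numpy as np

CROP_W, CROP_H = 64, 96  # assumed ear crop size in pixels

def crop_ear_region(image, keypoint):
    """image: HxWx3 array (assumed larger than the crop); keypoint: (x, y)."""
    h, w = image.shape[:2]
    kx, ky = keypoint
    # Place the designated key point at the center of the crop's lower edge.
    x0 = int(np.clip(kx - CROP_W // 2, 0, w - CROP_W))
    y0 = int(np.clip(ky - CROP_H, 0, h - CROP_H))
    region = image[y0:y0 + CROP_H, x0:x0 + CROP_W]
    return region, (x0, y0)  # also return the crop origin in image coordinates
```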
In step 204, the ear key points in the first ear region and the ear key points in the second ear region are detected based on the ear key point detection model, the first ear region and the second ear region.
In a possible implementation, the detection device inputs the first ear region and the second ear region into the ear key point detection model respectively, and detects the ear key points of the first ear region and the ear key points of the second ear region based on the ear key point detection model, thereby determining the ear key points in the first ear region and the ear key points in the second ear region.
In step 205, the position of each ear key point in the face image is determined according to the determined position of each ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
Detecting the ear key points in the first ear region and the second ear region in step 204 actually determines the positions of the ear key points within the ear regions. Therefore, the position of each ear key point in the face image is determined according to the position of the ear key point in its ear region and the position of that ear region in the face image.
In a possible implementation, a certain point in the face image (such as the designated key point) is taken as the origin of the ear region and a coordinate system is created. After the coordinates of an ear key point in its ear region are determined, the coordinates of the ear key point in the ear region are added to the coordinates of the origin in the face image to obtain the coordinates of the ear key point in the face image, thereby determining the position of the ear key point in the face image.
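A minimal sketch of this coordinate mapping, assuming the detection model outputs key point coordinates normalized to the ear crop as in the earlier sketches:

```python
# Sketch: map key points predicted inside an ear crop back to face image coordinates.
# Assumes normalized [0, 1] crop coordinates, as in the sketches above.
def to_image_coordinates(normalized_keypoints, crop_origin, crop_size):
    """normalized_keypoints: list of (x, y) in [0, 1]; crop_origin: (x0, y0);
    crop_size: (width, height) of the ear region."""
    x0, y0 = crop_origin
    w, h = crop_size
    return [(x0 + nx * w, y0 + ny * h) for nx, ny in normalized_keypoints]
```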
After the ear key points are detected through steps 201-205, various operations can be performed based on the ear key points in the face image. For example, during a live video broadcast, each image in the video stream can be acquired, and after the ear key points of each image are detected, virtual ornaments, stickers, glow effects and the like can be added at the position of an ear key point to enhance the live broadcast effect.
In the method provided by this embodiment of the present application, a face image including face contour key points is acquired and an ear key point detection model is acquired; the face contour key points are used to determine the ear regions in the face image, and the ear key point detection model is used to detect the ear key points in an ear region, so the ear key points in the face image are detected based on the ear key point detection model and the face contour key points. Because the ear regions are determined by using the face contour key points and the ear key points in the face image are detected by the ear key point detection model, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model has learned how to detect ear key points within an ear region, which improves the accuracy of the detected ear key points and reduces errors.
In addition, a face image usually includes a face region, ear regions and other regions. Extracting the ear regions according to the face contour key points makes use of the prior knowledge that the ears are adjacent to the face contour, excludes the regions other than the ear regions, and performs detection only on the ear regions, which both reduces the amount of computation and eliminates interference from irrelevant regions, improving accuracy.
In addition, after the ear key points are detected, the ear key points can be used as operation targets, and various operations can be performed based on the ear key points in the face image, which expands application functions, improves flexibility, and makes the face image more engaging.
Fig. 4 is a flowchart of an ear key point detection method according to an exemplary embodiment. As shown in Fig. 4, the ear key point detection method is used in a detection device. The detection device may be a mobile phone, a computer, a server, a camera, a monitoring device or another device with an image processing function. The method includes the following steps.
In step 401, a face image is acquired. The face image includes face contour key points, and the face contour key points are used to determine the ear regions in the face image.
Step 401 is similar to step 201 above; for a detailed description, refer to step 201, which is not repeated here.
In step 402, an ear key point detection model is acquired.
In this embodiment of the present application, the ear key point detection model used in the embodiment shown in Fig. 2 needs to detect the left ear region and the right ear region separately, which requires the model to be trained on both left and right ear regions so that it learns how to detect ear key points on both sides, resulting in a highly complex ear key point detection model.
To solve this problem, in this embodiment of the present application the ear regions are divided into a first type of ear region and a second type of ear region. The first type of ear region is an ear region located on the first side of the face, and the second type of ear region is an ear region located on the second side of the face. The ear key point detection model is used to detect the ear key points in the second type of ear region, and no longer detects the ear key points in the first type of ear region.
The first type of ear region is the left ear region and the second type of ear region is the right ear region, or the first type of ear region is the right ear region and the second type of ear region is the left ear region.
Correspondingly, in the process of training the ear key point detection model, the type of each extracted ear region is determined, and the first type of ear regions among the extracted ear regions are horizontally flipped to obtain flipped ear regions, so that the flipped ear regions belong to the second type of ear region. The second type of ear regions among the extracted ear regions and the flipped ear regions are determined as sample ear regions, and model training is performed according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model, so that the model can learn how to detect ear key points in the second type of ear region. Because the ear key point detection model does not need to learn how to detect ear key points on both sides of the face but only on one side, the complexity of the model is reduced and the training speed is improved.
During training, the plurality of sample ear regions and the corresponding ear key points are divided into a training data set and a test data set. The sample ear regions in the training data set are used as the input of the ear key point detection model, and the positions of the ear key points in the corresponding ear regions are used as the output, so that the model learns how to detect ear key points in the second type of ear region and acquires the ability to detect them. Afterwards, each sample ear region in the test data set is input into the ear key point detection model, and the positions of the predicted ear key points in the ear region are determined based on the model. If the sample ear region is an original second type of ear region, the detected test ear key points are compared with the actual ear key points in the sample ear region, and the model is corrected according to the comparison result. If the sample ear region is an ear region obtained by flipping a first type of ear region, the detected test ear key points are compared with the actual ear key points after the first type of ear region is flipped, and the model is corrected according to the comparison result.
Horizontally flipping any ear region includes: determining the position of each pixel in the ear region and the central axis of the ear region; determining, according to the position of each pixel and the position of the central axis, the target position of each pixel that is symmetric about the central axis; and exchanging the pixel information of each pixel with the pixel information of the pixel at the corresponding target position, thereby achieving the horizontal flip.
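The horizontal flip described here amounts to mirroring the region about its vertical central axis, and the labeled key points can be mirrored the same way; a minimal sketch (the normalized key point coordinates are an assumption carried over from the earlier sketches):

```python
# Sketch of the horizontal flip: swap each column with its mirror about the
# vertical central axis, and mirror key point x-coordinates accordingly.
import numpy as np

def flip_region(region):
    """region: HxWxC array; mirrors columns about the vertical central axis."""
    return region[:, ::-1].copy()

def flip_keypoints(normalized_keypoints):
    """Mirror the x-coordinates of (x, y) pairs normalized to [0, 1]."""
    return [(1.0 - x, y) for x, y in normalized_keypoints]
```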
In step 403, the first ear region and the second ear region in the face image are determined according to the positions of the face contour key points in the face image.
Step 403 is similar to step 203 above; for a detailed description, refer to step 203, which is not repeated here.
In step 404, the first ear region is horizontally flipped to obtain a third ear region, and the third ear region belongs to the second type of ear region.
In step 405, the ear key points in the second ear region and the ear key points in the third ear region are determined based on the ear key point detection model, the second ear region and the third ear region.
In step 406, the third ear region containing the ear key points is horizontally flipped to obtain the first ear region containing the ear key points.
In the embodiment shown in Fig. 2, both the first type of ear region and the second type of ear region can be detected based on the ear key point detection model, whereas in this embodiment of the present application only the second type of ear region can be detected based on the ear key point detection model.
Therefore, before detection, the first ear region, which belongs to the first type of ear region, is horizontally flipped to obtain the third ear region, so that the third ear region belongs to the second type of ear region, and the third ear region is detected based on the ear key point detection model. After the ear key points in the third ear region are detected, the third ear region containing the ear key points is horizontally flipped again, so that the ear key points in the first ear region are determined and detection of the first type of ear region is achieved.
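Putting the pieces together, the following is a sketch of this flip-detect-flip procedure for a first-type ear region, reusing the hypothetical helpers from the earlier sketches (model_predict, flip_region, flip_keypoints and to_image_coordinates are assumptions of those sketches, not named components of the application):

```python
# Sketch of flip-detect-flip for a first-type ear region, under the
# assumptions of the earlier sketches.
def detect_first_ear(model_predict, first_region, crop_origin, crop_size):
    """model_predict: callable returning normalized key points for a
    second-type ear crop; first_region: crop belonging to the first type."""
    third_region = flip_region(first_region)           # now second-type orientation
    keypoints_third = model_predict(third_region)      # detect in the flipped crop
    keypoints_first = flip_keypoints(keypoints_third)  # flip back to first-type coords
    return to_image_coordinates(keypoints_first, crop_origin, crop_size)
```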
In step 407, the position of each ear key point in the face image is determined according to the determined position of each ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
Step 407 is similar to step 205 above; for a detailed description, refer to step 205, which is not repeated here.
In the method provided by this embodiment of the present application, a face image is acquired, an ear key point detection model is acquired, the first ear region and the second ear region in the face image are determined according to the face contour key points, the first ear region is horizontally flipped to obtain a third ear region belonging to the second type of ear region, the ear key points in the ear regions are detected based on the ear key point detection model, the third ear region containing the ear key points is horizontally flipped to obtain the first ear region containing the ear key points, and the position of each ear key point in the face image is determined. Because the ear regions are determined by using the face contour key points and the ear key points in the face image are detected by the ear key point detection model, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model has learned how to detect ear key points within an ear region, which improves the accuracy of the detected ear key points and reduces errors.
Moreover, by dividing the ear regions into the first type of ear region and the second type of ear region, the ear key point detection model is used to detect ear key points in the second type of ear region and does not detect ear key points in the first type of ear region. When the ear key point detection model is trained, it does not need to learn how to detect ear key points on both sides of the face but only on one side, which reduces the complexity of the model and improves the training speed.
Fig. 5 is a block diagram of an ear key point detection device according to an exemplary embodiment. Referring to Fig. 5, the device includes an image acquisition unit 501, a model acquisition unit 502 and a determining unit 503.
The image acquisition unit 501 is configured to acquire a face image, the face image including face contour key points, the face contour key points being used to determine the ear regions in the face image.
The model acquisition unit 502 is configured to acquire an ear key point detection model, the ear key point detection model being used to detect ear key points in any ear region.
The determining unit 503 is configured to detect the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
In the device provided by this embodiment of the present application, a face image including face contour key points is acquired and an ear key point detection model is acquired; the face contour key points are used to determine the ear regions in the face image, and the ear key point detection model is used to detect the ear key points in an ear region, so the ear key points in the face image are detected based on the ear key point detection model and the face contour key points. Because the ear regions are determined by using the face contour key points and the ear key points in the face image are detected by the ear key point detection model, the positional relationship between the ear regions and the face contour is taken into account, and the ear key point detection model has learned how to detect ear key points within an ear region, which improves the accuracy of the detected ear key points and reduces errors.
In a possible implementation, the determining unit 503 includes:
a region determination subunit configured to determine the first ear region and the second ear region in the face image according to the positions of the face contour key points in the face image;
a key point determination subunit configured to detect the ear key points in the first ear region and the ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region; and
a position determination subunit configured to determine the position of each ear key point in the face image according to the determined position of each ear key point in the ear region where it is located and the positions of the first ear region and the second ear region in the face image.
In another possible implementation, the region determination subunit is further configured to acquire a first designated key point and a second designated key point among the face contour key points, and to determine the first ear region including the first designated key point and the second ear region including the second designated key point.
In another possible implementation, the first ear region belongs to a first type of ear region, the second ear region belongs to a second type of ear region, the first type of ear region is an ear region located on the first side of the face, and the second type of ear region is an ear region located on the second side of the face; and
the key point determination subunit is further configured to horizontally flip the first ear region to obtain a third ear region, the third ear region belonging to the second type of ear region; determine the ear key points in the second ear region and the ear key points in the third ear region based on the ear key point detection model, the second ear region and the third ear region; and horizontally flip the third ear region containing the ear key points to obtain the first ear region containing the ear key points.
In another possible implementation, the device further includes:
an acquisition unit configured to acquire a plurality of sample images, each sample image including an ear region and ear key points in the ear region;
an extraction unit configured to extract the ear regions from the plurality of sample images respectively; and
a training unit configured to perform model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
In another possible implementation, the training unit includes:
a flipping subunit configured to horizontally flip the first type of ear regions among the extracted ear regions to obtain flipped ear regions, the first type of ear region being an ear region located on the first side of the face; and
a sample determination subunit configured to determine the second type of ear regions among the extracted ear regions and the flipped ear regions as sample ear regions, the second type of ear region being an ear region located on the second side of the face;
the training unit being further configured to perform model training according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model.
With regard to the devices in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method, and will not be elaborated here.
Fig. 6 is a block diagram of a terminal 600 for ear key point detection according to an exemplary embodiment. The terminal 600 is used to perform the steps performed by the detection device in the ear key point detection methods above, and may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The terminal 600 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal or another name.
Generally, the terminal 600 includes a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA) and a programmable logic array (PLA). The processor 601 may also include a main processor and a coprocessor. The main processor is a processor for processing data in the awake state, also called a central processing unit (CPU); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 601 may be integrated with a graphics processing unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 601 may further include an artificial intelligence (AI) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, and the at least one instruction is executed by the processor 601 to implement the ear key point detection methods provided by the method embodiments of the present application.
In some embodiments, the terminal 600 may further include a peripheral device interface 603 and at least one peripheral device. The processor 601, the memory 602 and the peripheral device interface 603 may be connected by a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 603 through a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608 and a power supply 609.
The peripheral device interface 603 may be used to connect at least one peripheral device related to input/output (I/O) to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602 and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602 and the peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used to receive and transmit radio frequency (RF) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card and the like. The radio frequency circuit 604 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or wireless fidelity (WiFi) networks. In some embodiments, the radio frequency circuit 604 may further include a circuit related to near field communication (NFC), which is not limited in the present application.
The display screen 605 is used to display a user interface (UI). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display screen 605 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, disposed on the front panel of the terminal 600; in other embodiments, there may be at least two display screens 605, respectively disposed on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display screen 605 may be a flexible display screen disposed on a curved surface or a folding surface of the terminal 600. The display screen 605 may even be set in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 605 may be made of materials such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
The camera component 606 is used to capture images or videos. Optionally, the camera component 606 includes a front camera and a rear camera. Usually, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blur function, and the main camera and the wide-angle camera are fused to realize panoramic shooting, virtual reality (VR) shooting or other fused shooting functions. In some embodiments, the camera component 606 may further include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals and input them to the processor 601 for processing, or input them to the radio frequency circuit 604 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively disposed at different parts of the terminal 600. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert the electrical signal from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 607 may further include a headphone jack.
The positioning component 608 is used to locate the current geographic position of the terminal 600 to implement navigation or location-based services (LBS). The positioning component 608 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the terminal 600 further includes one or more sensors 610. The one or more sensors 610 include, but are not limited to, an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 600. For example, the acceleration sensor 611 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 601 may control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 may also be used to collect motion data for games or for the user.
The gyroscope sensor 612 can detect the body orientation and rotation angle of the terminal 600, and can cooperate with the acceleration sensor 611 to collect the user's 3D actions on the terminal 600. Based on the data collected by the gyroscope sensor 612, the processor 601 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or a lower layer of the touch display screen 605. When the pressure sensor 613 is disposed on the side frame of the terminal 600, it can detect the user's grip signal on the terminal 600, and the processor 601 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 613. When the pressure sensor 613 is disposed on the lower layer of the touch display screen 605, the processor 601 controls operable controls on the UI according to the user's pressure operation on the touch display screen 605. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used to collect the user's fingerprint. The processor 601 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 614 may be disposed on the front, back, or side of the terminal 600. When a physical button or a manufacturer logo is provided on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or the manufacturer logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, the processor 601 may control the display brightness of the touch display screen 605 according to the ambient light intensity collected by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 605 is decreased. In another embodiment, the processor 601 may also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
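As a purely schematic illustration of the brightness control just described, and not part of this application, the following Python sketch maps a measured ambient light value to a display brightness level. The lux range, the linear mapping, and the brightness scale are assumptions chosen for the example.

```python
def adjust_brightness(ambient_lux: float, min_level: int = 10, max_level: int = 255) -> int:
    """Map ambient light intensity to a display brightness level.

    The 0-1000 lux working range and the linear mapping are illustrative
    assumptions; a real terminal would apply its own calibration curve.
    """
    clamped = max(0.0, min(ambient_lux, 1000.0))
    return int(min_level + (max_level - min_level) * clamped / 1000.0)
```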
The proximity sensor 616, also called a distance sensor, is usually disposed on the front panel of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
A person skilled in the art can understand that the structure shown in FIG. 6 does not constitute a limitation on the terminal 600, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
FIG. 7 is a schematic structural diagram of a server according to an exemplary embodiment. The server 700 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described in detail here.
The server 700 may be used to perform the steps performed by the ear key point detection device in the above ear key point detection method.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided. When the instructions in the storage medium are executed by a processor of a detection device, the detection device is enabled to perform an ear key point detection method, the method including:
acquiring a face image, where the face image includes face contour key points, and the face contour key points are used to determine an ear region in the face image;
acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
In an exemplary embodiment, an application program/computer program product is also provided. When the instructions in the application program/computer program product are executed by a processor of a detection device, the detection device is enabled to perform an ear key point detection method, the method including:
acquiring a face image, where the face image includes face contour key points, and the face contour key points are used to determine an ear region in the face image;
acquiring an ear key point detection model, where the ear key point detection model is used to detect ear key points in any ear region; and
detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image.
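For illustration only, the following Python sketch walks through the flow described above: ear regions are located from designated face contour key points, the ear key point detection model is applied to each cropped region, and the detected points are mapped back to face-image coordinates. The contour key point indices, the fixed region size, and the `model.predict` interface are assumptions made for this example, not details specified in this application.

```python
import numpy as np

def ear_region_box(center_xy, size, height, width):
    """Square crop box of side `size` centered on a designated contour key
    point, clamped to the image bounds (the fixed size is an assumption)."""
    cx, cy = int(center_xy[0]), int(center_xy[1])
    half = size // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, width), min(cy + half, height)
    return x0, y0, x1, y1

def detect_ear_keypoints(face_image: np.ndarray, contour_keypoints: np.ndarray,
                         model, left_idx: int = 0, right_idx: int = 16,
                         region_size: int = 128):
    """Locate two ear regions from designated face contour key points, run the
    ear key point detection model on each region, and map the predicted points
    back to face-image coordinates.

    `model.predict` is assumed to return an (N, 2) array of key points in the
    cropped region's local pixel coordinates; the key point indices and the
    region size are illustrative assumptions.
    """
    height, width = face_image.shape[:2]
    keypoints_per_region = []
    for idx in (left_idx, right_idx):
        x0, y0, x1, y1 = ear_region_box(contour_keypoints[idx], region_size,
                                        height, width)
        region = face_image[y0:y1, x0:x1]
        local_points = np.asarray(model.predict(region), dtype=float)
        keypoints_per_region.append(local_points + np.array([x0, y0]))
    return keypoints_per_region
```

In such a sketch, `model` would be the ear key point detection model obtained from the training step, and `contour_keypoints` the output of an existing face key point detector.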
Those skilled in the art will readily conceive of other embodiments of the present application after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in the present application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (14)

  1. An ear key point detection method, the method comprising:
    acquiring a face image, wherein the face image includes face contour key points, and the face contour key points are used to determine an ear region in the face image;
    acquiring an ear key point detection model, wherein the ear key point detection model is used to detect ear key points in any ear region; and
    detecting the ear key points in the face image based on the ear key point detection model and positions of the face contour key points in the face image.
  2. The method according to claim 1, wherein the detecting the ear key points in the face image based on the ear key point detection model and the positions of the face contour key points in the face image comprises:
    determining a first ear region and a second ear region in the face image according to the positions of the face contour key points in the face image;
    detecting ear key points in the first ear region and ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region; and
    determining a position of each determined ear key point in the face image according to a position of the ear key point in the ear region in which it is located and positions of the first ear region and the second ear region in the face image.
  3. The method according to claim 2, wherein the determining a first ear region and a second ear region in the face image according to the positions of the face contour key points in the face image comprises:
    acquiring a first designated key point and a second designated key point among the face contour key points; and
    determining the first ear region including the first designated key point, and the second ear region including the second designated key point.
  4. The method according to claim 2, wherein the first ear region belongs to a first type of ear region, the second ear region belongs to a second type of ear region, the first type of ear region is an ear region located on a first side of a human face, and the second type of ear region is an ear region located on a second side of the human face; and
    the detecting ear key points in the first ear region and ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region comprises:
    horizontally flipping the first ear region to obtain a third ear region, wherein the third ear region belongs to the second type of ear region;
    determining the ear key points in the second ear region and ear key points in the third ear region based on the ear key point detection model, the second ear region and the third ear region; and
    horizontally flipping the third ear region including the ear key points to obtain the first ear region including the ear key points.
  5. The method according to any one of claims 1-4, further comprising:
    acquiring a plurality of sample images, each sample image including an ear region and ear key points in the ear region;
    extracting ear regions from the plurality of sample images respectively; and
    performing model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
  6. The method according to claim 5, wherein the performing model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model comprises:
    horizontally flipping ear regions of a first type among the extracted ear regions to obtain flipped ear regions, wherein an ear region of the first type is an ear region located on a first side of a human face;
    determining ear regions of a second type among the extracted ear regions and the flipped ear regions as sample ear regions, wherein an ear region of the second type is an ear region located on a second side of the human face; and
    performing model training according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model.
  7. An ear key point detection device, the device comprising:
    an image acquisition unit, configured to acquire a face image, wherein the face image includes face contour key points, and the face contour key points are used to determine an ear region in the face image;
    a model acquisition unit, configured to acquire an ear key point detection model, wherein the ear key point detection model is used to detect ear key points in any ear region; and
    a determining unit, configured to detect the ear key points in the face image based on the ear key point detection model and positions of the face contour key points in the face image.
  8. The device according to claim 7, wherein the determining unit comprises:
    a region determination subunit, configured to determine a first ear region and a second ear region in the face image according to the positions of the face contour key points in the face image;
    a key point determination subunit, configured to detect ear key points in the first ear region and ear key points in the second ear region based on the ear key point detection model, the first ear region and the second ear region; and
    a position determination subunit, configured to determine a position of each determined ear key point in the face image according to a position of the ear key point in the ear region in which it is located and positions of the first ear region and the second ear region in the face image.
  9. The device according to claim 8, wherein the region determination subunit is further configured to acquire a first designated key point and a second designated key point among the face contour key points, and to determine the first ear region including the first designated key point and the second ear region including the second designated key point.
  10. The device according to claim 8, wherein the first ear region belongs to a first type of ear region, the second ear region belongs to a second type of ear region, the first type of ear region is an ear region located on a first side of a human face, and the second type of ear region is an ear region located on a second side of the human face; and
    the key point determination subunit is further configured to horizontally flip the first ear region to obtain a third ear region, wherein the third ear region belongs to the second type of ear region; determine the ear key points in the second ear region and ear key points in the third ear region based on the ear key point detection model, the second ear region and the third ear region; and horizontally flip the third ear region including the ear key points to obtain the first ear region including the ear key points.
  11. The device according to any one of claims 7-10, further comprising:
    an acquisition unit, configured to acquire a plurality of sample images, each sample image including an ear region and ear key points in the ear region;
    an extraction unit, configured to extract ear regions from the plurality of sample images respectively; and
    a training unit, configured to perform model training according to the extracted ear regions and the ear key points in the ear regions to obtain the ear key point detection model.
  12. The device according to claim 11, wherein the training unit comprises:
    a flipping subunit, configured to horizontally flip ear regions of a first type among the extracted ear regions to obtain flipped ear regions, wherein an ear region of the first type is an ear region located on a first side of a human face; and
    a sample determination subunit, configured to determine ear regions of a second type among the extracted ear regions and the flipped ear regions as sample ear regions, wherein an ear region of the second type is an ear region located on a second side of the human face;
    wherein the training unit is further configured to perform model training according to the sample ear regions and the ear key points in the sample ear regions to obtain the ear key point detection model.
  13. An ear key point detection device, the device comprising:
    a processor; and
    a memory for storing commands executable by the processor;
    wherein the processor is configured to execute the ear key point detection method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a detection device, the detection device is enabled to execute the ear key point detection method according to any one of claims 1-6.
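The following Python sketch, provided only as an illustration of the horizontal-flip handling recited in claims 4, 6, 10 and 12, shows how a single model trained on second-side ear regions could serve both ears: a first-side region is mirrored before detection, the predicted x-coordinates are mirrored back afterwards, and first-type training samples are mirrored before model training. The helper names and the assumption that key points are returned as (x, y) pixel coordinates in the region are illustrative, not details specified in this application.

```python
import numpy as np

def detect_on_first_side_region(first_region: np.ndarray, model) -> np.ndarray:
    """Detect ear key points in a first-side ear region using a model trained
    only on second-side regions: flip, detect, then flip the x-coordinates back.

    `model.predict` is assumed to return an (N, 2) array of (x, y) key points
    in the region's own pixel coordinates.
    """
    flipped = first_region[:, ::-1]                   # mirror to second-side appearance
    points = np.asarray(model.predict(flipped), dtype=float)
    width = first_region.shape[1]
    points[:, 0] = (width - 1) - points[:, 0]         # undo the mirror on x
    return points

def prepare_training_regions(regions, sides):
    """Canonicalize training samples as in claims 5 and 6: first-side regions
    are flipped horizontally so every sample looks like a second-side region.
    `sides` holds 'first' or 'second' per region; the labelled key points would
    need the same x-coordinate flip, omitted here for brevity."""
    return [r[:, ::-1] if s == "first" else r for r, s in zip(regions, sides)]
```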

Applications Claiming Priority (2)

- CN201811437331.6A (published as CN109522863B), priority date 2018-11-28, filing date 2018-11-28: Ear key point detection method and device and storage medium
- CN201811437331.6, priority date 2018-11-28

Publications (1)

- WO2020108041A1 (en)

Family ID: 65793704

Family Applications (1)

- PCT/CN2019/107104 (published as WO2020108041A1), priority date 2018-11-28, filing date 2019-09-20: Detection method and device for key points of ear region and storage medium

Country Status (2)

- CN: CN109522863B (en)
- WO: WO2020108041A1 (en)


Families Citing this family (3)

- CN109522863B (priority 2018-11-28, published 2020-11-27), 北京达佳互联信息技术有限公司: Ear key point detection method and device and storage medium
- CN110197149B (priority 2019-05-23, published 2021-05-18), 北京达佳互联信息技术有限公司: Ear key point detection method and device, storage medium and electronic equipment
- CN110929651B (priority 2019-11-25, published 2022-12-06), 北京达佳互联信息技术有限公司: Image processing method, image processing device, electronic equipment and storage medium


Family Cites Families (6)

- CN104794464B (priority 2015-05-13, published 2019-06-07), 上海依图网络科技有限公司: A kind of biopsy method based on relative priority
- CN106327801B (priority 2015-07-07, published 2019-07-26), 北京易车互联信息技术有限公司: Method for detecting fatigue driving and device
- KR102443214B1 (priority 2016-01-08, published 2022-09-15), 삼성전자 주식회사: Image processing apparatus and control method thereof
- CN113205040A (priority 2017-08-09, published 2021-08-03), 北京市商汤科技开发有限公司: Face image processing method and device and electronic equipment
- CN107679446B (priority 2017-08-17, published 2019-03-15), 平安科技(深圳)有限公司: Human face posture detection method, device and storage medium
- CN108764048B (priority 2018-04-28, published 2021-03-16), 中国科学院自动化研究所: Face key point detection method and device

Patent Citations (3)

- JP2007272435A (priority 2006-03-30, published 2007-10-18), Univ Of Electro-Communications: Face feature extraction device and face feature extraction method
- CN104268591A (priority 2014-09-19, published 2015-01-07), 海信集团有限公司: Face key point detecting method and device
- CN109522863A (priority 2018-11-28, published 2019-03-26), 北京达佳互联信息技术有限公司: Ear key point detection method, apparatus and storage medium

Cited By (5)

- CN112070021A (priority 2020-09-09, published 2020-12-11), 深圳数联天下智能科技有限公司: Distance measurement method, distance measurement system, distance measurement equipment and storage medium based on face detection
- CN112465695A (priority 2020-12-01, published 2021-03-09), 北京达佳互联信息技术有限公司: Image processing method, image processing device, electronic equipment and storage medium
- CN112465695B (priority 2020-12-01, published 2024-01-02), 北京达佳互联信息技术有限公司: Image processing method, device, electronic equipment and storage medium
- CN112489169A (priority 2020-12-17, published 2021-03-12), 脸萌有限公司: Portrait image processing method and device
- CN112489169B (priority 2020-12-17, published 2024-02-13), 脸萌有限公司: Portrait image processing method and device

Also Published As

- CN109522863A, published 2019-03-26
- CN109522863B, published 2020-11-27


Legal Events

- 121 (EP): The EPO has been informed by WIPO that EP was designated in this application (ref document number 19890020; country of ref document: EP; kind code of ref document: A1)
- NENP: Non-entry into the national phase (ref country code: DE)
- 122 (EP): PCT application non-entry in European phase (ref document number 19890020; country of ref document: EP; kind code of ref document: A1)