WO2019179441A1 - Focus tracking method and device of smart apparatus, smart apparatus, and storage medium - Google Patents

Focus tracking method and device of smart apparatus, smart apparatus, and storage medium Download PDF

Info

Publication number
WO2019179441A1
Authority
WO
WIPO (PCT)
Prior art keywords
smart device
face
center point
human body
image
Prior art date
Application number
PCT/CN2019/078747
Other languages
French (fr)
Chinese (zh)
Inventor
周子傲
谢长武
王雪松
马健
Original Assignee
北京猎户星空科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京猎户星空科技有限公司 filed Critical 北京猎户星空科技有限公司
Publication of WO2019179441A1 publication Critical patent/WO2019179441A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 - Manipulators not otherwise provided for
    • B25J11/0005 - Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/165 - Detection; Localisation; Normalisation using facial parts and geometric relationships

Definitions

  • the present disclosure relates to the field of smart device technologies, and in particular, to a focus following method, device, smart device, and storage medium for a smart device.
  • Smart devices interact with users more and more frequently.
  • A smart device can follow a user's movement through focus following, achieving the effect that the smart device pays attention to user behavior.
  • In the related art, the smart device uses face recognition technology to collect the user's face center point, calculates the distance between the face center point and the center position of the collected image, and controls the smart device to rotate so that the user's face is located at the image center position.
  • the present disclosure proposes a focus following method of a smart device.
  • The method supplements human body key points as a following focus.
  • When the smart device does not detect face key points, it detects human body key points from the collected image as the following focus, thereby preventing the user from being lost in cases such as bowing the head or turning around, and improving the success rate and accuracy of focus following.
  • the present disclosure proposes a focus following device of a smart device.
  • the present disclosure proposes a smart device.
  • the present disclosure proposes a non-transitory computer readable storage medium.
  • the first aspect of the present disclosure provides a focus following method of a smart device, including:
  • detecting a face key point of the target user from an environment image collected by the smart device, determining a face center point according to the face key point, and controlling the smart device to perform focus following on the face center point; and
  • if the face key point is not detected from the environment image, detecting a human body key point of the target user from the environment image, determining a body center point according to the body key point, and controlling the smart device to perform focus following on the body center point.
  • In the focus following method of the smart device of the embodiment of the present disclosure, a face key point of the target user is first detected from the environment image collected by the smart device, a face center point is determined according to the face key point, and the smart device is controlled to perform focus following on the face center point. If the face key point is not detected from the environment image, a human body key point of the target user is detected from the environment image, a body center point is determined according to the body key point, and the smart device is controlled to perform focus following on the body center point. The method thus solves the technical problem in the related art that focus cannot be maintained when face key points are not detected, by using human body key points as a supplementary following focus.
  • When the smart device does not detect face key points, it detects human body key points from the collected image as the following focus, which prevents the user from being lost when bowing the head, turning around, and the like, and improves the success rate and accuracy of focus following.
  • In an embodiment, before the face key point of the target user is detected from the environment image collected by the smart device, the method further includes: identifying a center point of the environment image collected by the smart device, and, with the center point of the environment image as a reference point, generating a circular image area for focus following.
  • In an embodiment, performing focus following includes: periodically determining whether the detected face center point or body center point is within the image area; when the face center point or body center point is not within the image area, acquiring a shortest path between the face center point or body center point and the center point of the image area; acquiring, according to the shortest path, control information for controlling movement of the smart device; and moving the smart device according to the control information, so that the detected face center point or body center point falls within the image area.
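The periodic check and shortest-path movement described above can be sketched as follows (a minimal illustration in pixel coordinates; the names `FRAME_CENTER`, `REGION_RADIUS`, and `shortest_path` are assumptions for the sketch and do not appear in the patent):

```python
import math

# Illustrative sketch of the periodic focus-follow check; FRAME_CENTER,
# REGION_RADIUS, and shortest_path are hypothetical names, not from the patent.
FRAME_CENTER = (320.0, 240.0)   # center point of the environment image, in pixels
REGION_RADIUS = 72.0            # radius of the circular focus-follow image area

def shortest_path(focus_point):
    """Vector from the detected center point toward the region center, scaled so
    the point just re-enters the circular image area (the shortest path)."""
    dx = FRAME_CENTER[0] - focus_point[0]
    dy = FRAME_CENTER[1] - focus_point[1]
    dist = math.hypot(dx, dy)
    if dist <= REGION_RADIUS:
        return (0.0, 0.0)            # already inside the image area: no movement
    scale = (dist - REGION_RADIUS) / dist
    return (dx * scale, dy * scale)  # pixel offset the camera view should shift by
```

The returned offset would then be translated into device control information (for example, a pan-tilt rotation) by hardware-specific code.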
  • In an embodiment, detecting the face key point of the target user from the environment image collected by the smart device and determining the face center point according to the face key point includes: identifying a head region of the target user in the environment image according to a preset head feature; extracting face key points from the head region; if one face key point is extracted, using that face key point as the face center point; and if two or more face key points are extracted, obtaining a first center point of all the extracted face key points and using the first center point as the face center point.
  • In an embodiment, acquiring the first center point of all the extracted face key points includes: taking each face key point as a node, taking one of the nodes as a starting node, and connecting all the nodes one by one to form a key point graph covering all the nodes; and obtaining a center point of the key point graph and determining it as the first center point.
  • In an embodiment, detecting the human body key point of the target user from the collected environment image includes: identifying, from the collected environment image, a human body area located below the head area; after the human body area is identified, controlling the imaging angle of the pan-tilt camera of the smart device to move in the direction of the head region; after the imaging angle is moved, capturing an environment image; determining whether the head region is included in the environment image; if the head region is included in the environment image, identifying the face key point from the head region; and if the head region is not included in the environment image, detecting the human body key point of the target user from the environment image.
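The fallback flow in this embodiment (tilt toward the head, recapture, then choose face or body key points) can be sketched roughly as below; the detector and camera functions are hypothetical stubs supplied by the caller, not an API defined by the patent:

```python
# Illustrative sketch of the fallback flow; all callables are hypothetical stubs.
def choose_focus(capture, detect_head, detect_face_keypoints, detect_body_keypoints,
                 tilt_camera_up):
    """Tilt the pan-tilt camera toward the head region, recapture, and pick
    face key points when the head is visible, body key points otherwise."""
    tilt_camera_up()                 # move the imaging angle toward the head region
    image = capture()                # capture a fresh environment image
    head = detect_head(image)
    if head is not None:
        return ("face", detect_face_keypoints(head))
    return ("body", detect_body_keypoints(image))
```

The tuple tag ("face" or "body") indicates which kind of center point the follow step should compute next.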
  • In an embodiment, before the face key point of the target user is detected from the environment image collected by the smart device, the method further includes: performing human body recognition on the environment image; when a plurality of human bodies are identified from the environment image, acquiring a distance between each human body and the smart device; and selecting the human body closest to the smart device as the human body corresponding to the target user.
  • In an embodiment, selecting the human body closest to the smart device as the human body corresponding to the target user includes: when there are multiple human bodies closest to the smart device, querying whether a face image corresponding to any of those human bodies exists in the registered-user face image library of the smart device; if one such face image exists, using the matching human body as the human body corresponding to the target user; if no such face image exists, randomly selecting one of the human bodies closest to the smart device as the human body corresponding to the target user; and if multiple such face images exist, using the human body whose face image is queried first as the human body corresponding to the target user.
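The selection rule above can be expressed as a short decision function. This is a hedged sketch: the data shapes (body ids, a distance dict, a set of ids with registered faces) are illustrative assumptions, not structures named in the patent:

```python
import random

# Hedged sketch of the target-user selection rule; data shapes are assumptions.
def select_target(bodies, distances, registered_faces):
    """bodies: list of body ids; distances: body id -> distance to the device;
    registered_faces: ids whose faces match the registered face image library."""
    nearest = min(distances[b] for b in bodies)
    candidates = [b for b in bodies if distances[b] == nearest]
    if len(candidates) == 1:
        return candidates[0]              # a single nearest body is the target
    matched = [b for b in candidates if b in registered_faces]
    if matched:
        return matched[0]                 # take the first queried registered face
    return random.choice(candidates)      # no registered face: pick one at random
```

For example, with one body strictly nearest, that body is returned regardless of the face library; a tie falls through to the registered-face check.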
  • the second aspect of the present disclosure provides a focus following device of a smart device, including:
  • a detecting module configured to detect a face key point of the target user from the environment image collected by the smart device, and to detect a human body key point of the target user from the environment image when the face key point is not detected;
  • a determining module configured to determine a face center point according to the face key point, and to determine a body center point according to the body key point when the human body key point is detected; and
  • a control module configured to control the smart device to perform focus following on the face center point, and to control the smart device to perform focus following on the body center point when the body center point is determined.
  • the focus following device of the smart device according to the above-described embodiments of the present disclosure may further have the following additional technical features:
  • In an embodiment, the focus following device of the above embodiment further includes: a generating module configured to, before the face key point of the target user is identified from the environment image collected by the smart device, identify a center point of the environment image and, using the center point of the environment image as a reference point, generate a circular image area for focus following.
  • In an embodiment, the control module is specifically configured to: periodically determine whether the detected face center point or body center point is within the image area; when the face center point or body center point is not within the image area, acquire a shortest path between the face center point or body center point and the center point of the image area; acquire, according to the shortest path, control information for controlling movement of the smart device; and move the smart device according to the control information, so that the detected face center point or body center point falls within the image area.
  • In an embodiment, the detecting module is specifically configured to: identify a head area of the target user from the environment image according to a preset head feature; and extract the face key points from the head area.
  • In an embodiment, the determining module is specifically configured to: if one face key point is extracted, use that face key point as the face center point; and if two or more face key points are extracted, obtain a first center point of all the extracted face key points and use it as the face center point.
  • In an embodiment, the determining module is specifically configured to: take each face key point as a node, take one of the nodes as a starting node, and connect all the nodes one by one to form a key point graph covering all the nodes; and obtain a center point of the key point graph and determine it as the first center point.
  • In an embodiment, the detecting module is specifically configured to: identify, from the collected environment image, a human body region located below the head region; after the human body region is identified, control the imaging angle of the pan-tilt camera of the smart device to move in the direction of the head region; after the imaging angle is moved, capture an environment image; determine whether the head region is included in the environment image; if the head region is included, identify the face key point from the head region; and if the head region is not included, detect the human body key point of the target user from the environment image.
  • In an embodiment, the focus following device of the foregoing embodiment further includes: a human body recognition module configured to perform human body recognition on the environment image before the face key point of the target user is detected from the environment image collected by the smart device; a distance detecting module configured to acquire a distance between each human body and the smart device when a plurality of human bodies are identified from the environment image; and a selecting module configured to select the human body closest to the smart device as the human body corresponding to the target user.
  • In an embodiment, the selecting module is specifically configured to: when there are multiple human bodies closest to the smart device, query whether a face image corresponding to any of those human bodies exists in the registered-user face image library of the smart device; if one such face image exists, use the matching human body as the human body corresponding to the target user; if no such face image exists, randomly select one of the human bodies closest to the smart device as the human body corresponding to the target user; and if multiple such face images exist, use the human body whose face image is queried first as the human body corresponding to the target user.
  • In the focus following device of the smart device of the embodiment of the present disclosure, a face key point of the target user is first detected from the environment image collected by the smart device, a face center point is determined according to the face key point, and the smart device is controlled to perform focus following on the face center point. If the face key point is not detected from the environment image, a human body key point of the target user is detected from the environment image, a body center point is determined according to the body key point, and the smart device is controlled to perform focus following on the body center point. The device thus solves the technical problem in the related art that focus cannot be maintained when face key points are not detected, by using human body key points as a supplementary following focus.
  • When the smart device does not detect face key points, the device detects human body key points from the collected image as the following focus, which prevents the user from being lost when bowing the head, turning around, and the like, and improves the success rate and accuracy of focus following.
  • A third aspect of the present disclosure provides a smart device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the smart device; the memory is configured to store executable program code; and the processor is configured to run a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the focus following method of the smart device as described in the above embodiments.
  • The fourth aspect of the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the focus following method of the smart device as described in the above embodiments.
  • FIG. 1 is a schematic flowchart of a focus following method of a smart device according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a position of a key point of a human body according to an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for determining a face center point according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a location of a face key point according to an embodiment of the present disclosure
  • FIG. 5 is a schematic flowchart of a focus following method of another smart device according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a focus following process according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic flowchart of a specific focus following method of a smart device according to an embodiment of the present disclosure
  • FIG. 8 is a schematic flowchart of a method for determining a target user according to an embodiment of the present disclosure
  • FIG. 9 is a schematic diagram of the principle of calculating distance by binocular vision according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a focus following device of a smart device according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a focus following device of another smart device according to an embodiment of the present disclosure.
  • FIG. 12 is a block diagram of an exemplary smart device suitable for implementing an implementation of the present disclosure, in accordance with an embodiment of the present disclosure.
  • the execution body of the focus following method of the smart device of the embodiment of the present disclosure may be a smart device that captures an image of the surrounding environment by the camera device and follows the focus on the image, such as an intelligent robot or the like.
  • FIG. 1 is a schematic flowchart of a focus following method of a smart device according to an embodiment of the present disclosure. As shown in FIG. 1, the focus following method of the smart device includes the following steps:
  • Step 101 Detect a face key point of the target user from the environment image collected by the smart device, determine a face center point according to the face key point, and control the smart device to perform focus following on the face center point.
  • the smart device may be a robot, a smart home appliance, or the like.
  • The smart device is equipped with a camera device, such as a camera, through which it can collect environment images within its monitoring range in real time. After an environment image is acquired, the image can be analyzed to identify a human body entering the monitoring range.
  • For the environment image, combined with face recognition technology, it is detected whether a face is present in the collected image.
  • The outline of an object is extracted, and the extracted outline is compared with a pre-stored face contour or human body contour.
  • If the similarity exceeds a preset threshold, it can be considered that a user is recognized in the environment image.
  • the smart device detects the face key point of the target user, and determines the face center point according to the face key point.
  • A face key point can be a facial feature of the target user, such as an eye, the nose, or the mouth.
  • The smart device can determine face key points by detecting the shapes of facial organs and the positions of different organs within the face, and then determine the face center point according to the detected face key points.
  • The camera or vision system of the smart device is controlled to follow the focus in real time and keep the focus within the following area of the collected environment image, where the following area covers a partial area of the environment image; it is not fixed within the environment image but moves in real time with the monitoring field of view.
  • The following area generally needs to cover the central area of the environment image, so that the smart device and the monitored target user can interact face to face.
  • For example, when the imaging device is mounted on the head of the robot, the camera device of the robot is controlled to perform focus following on the face center point, thereby achieving the effect that the robot always "gazes at" the target user and enhancing the user experience.
  • Step 102 If the face key point is not detected from the environment image, detect a human body key point of the target user from the environment image, determine the body center point according to the body key point, and control the smart device to perform focus following on the body center point.
  • In cases such as the user bowing the head or turning around, face key points may not be detected in the environment image.
  • In this case, the smart device detects human body key points of the target user from the environment image, where a human body key point is a key point of a part of the target user's body other than the head.
  • FIG. 2 is a schematic diagram of positions of human body key points according to an embodiment of the present disclosure. As shown in FIG. 2, the smart device identifies the contour edge of the target user's torso in the environment image; the intersections of the limbs and the torso are human body key points, and the body center point is determined according to these key points.
  • For example, the camera device of the smart device moves downward and detects the intersection point P1 of the user's neck and torso as a human body key point; this key point is taken as the body center point.
  • Alternatively, the smart device detects in the environment image that the intersections of the user's two arms with the torso are P2 and P3, and the midpoint of the line connecting P2 and P3 is taken as the body center point.
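The two body-center rules above (the neck-torso point P1, or the midpoint of the arm-torso points P2 and P3) reduce to a few lines; this is a minimal sketch with hypothetical parameter names following FIG. 2:

```python
# Minimal sketch of the body-center rules; parameter names follow FIG. 2 loosely.
def body_center(neck_torso=None, left_shoulder=None, right_shoulder=None):
    """Prefer the neck-torso intersection P1; otherwise use the midpoint of the
    two arm-torso intersections P2 and P3."""
    if neck_torso is not None:
        return neck_torso
    (x2, y2), (x3, y3) = left_shoulder, right_shoulder
    return ((x2 + x3) / 2.0, (y2 + y3) / 2.0)
```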
  • The smart device performs focus following on the body center point and keeps the focus within the following area of the collected environment image.
  • The method of performing focus following on the body center point may refer to the method of performing focus following on the face center point in the above example, and is not repeated here.
  • The focus following method of the smart device of the embodiment of the present disclosure detects a face key point of the target user from the environment image collected by the smart device, determines a face center point according to the face key point, and controls the smart device to perform focus following on the face center point. If the face key point is not detected from the environment image, a human body key point of the target user is detected from the environment image, a body center point is determined according to the body key point, and the smart device is controlled to perform focus following on the body center point. The method thus solves the technical problem that focus cannot be maintained when face key points are not detected, using human body key points as a supplementary focus: when the smart device does not detect face key points, it detects human body key points from the collected image as the following focus, which avoids losing the user when bowing the head, turning around, and the like, and improves the success rate and accuracy of focus following.
  • FIG. 3 is a schematic flowchart of a method for determining a face center point according to an embodiment of the present disclosure.
  • As shown in FIG. 3, the method for determining a face center point includes the following steps:
  • Step 201 Identify the head area of the target user.
  • The smart device presets head features according to a pre-stored head model, for example, the shape and structure of the head, its basic proportions, and its positional relationship with the human torso; the smart device then identifies the head area of the target user from the environment image according to the preset head features.
  • Step 202 Detect face key points in the head area.
  • The process of detecting face key points of the target user in the identified head area can be referred to the description of related content in the foregoing embodiments, and is not repeated here.
  • Step 203 Determine the number of detected face key points. If the number of face key points is one, step 204 is performed; if the number of face key points is two or more, step 205 is performed.
  • Step 204 Use the detected face key point as the face center point.
  • If only one face key point is detected in the target user's head area, that key point is used as the face center point; for example, if only one eye is detected, the eye is used as the target user's face center point.
  • Step 205 Acquire a first center point of all detected face key points, and use the first center point as the face center point.
  • The first center point is the center point of the key point graph enclosed by all detected face key points.
  • FIG. 4 is a schematic diagram of positions of face key points according to an embodiment of the present disclosure. As shown in FIG. 4, each face key point is used as a connection node of the key point graph; one of the nodes is used as a starting node, and all the nodes are connected one by one to form a key point graph covering all the nodes. If the resulting key point graph is a symmetric figure (as shown in FIG. 4), the midpoint of the symmetry axis of the graph is the first center point of the key point graph, and this first center point is determined as the face center point; if the key point graph is an irregular figure, the intersection of the longest axis and the shortest axis of the figure is the first center point of the key point graph, and this first center point is determined as the face center point.
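As a rough stand-in for the key point graph center described above, the centroid of all face key points gives the same result for symmetric layouts; this is a simplification for illustration, not the patent's exact axis-based construction:

```python
# Simplified stand-in for the key-point-graph center: the centroid of all face
# key points, which coincides with the graph center for symmetric layouts.
def first_center_point(keypoints):
    """keypoints: list of (x, y) face key points; returns the face center point."""
    if len(keypoints) == 1:
        return keypoints[0]          # a single key point is itself the face center
    n = float(len(keypoints))
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    return (cx, cy)
```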
  • The method for determining a face center point of the embodiment of the present disclosure determines the face center point from the detected face key points and performs focus following on the face center point, ensuring that the target user's face area is within the following area of the smart device, so that the smart device and the monitored target user can interact face to face.
  • FIG. 5 is a schematic flowchart of a focus following method of another smart device according to an embodiment of the present disclosure.
  • the focus following method includes the following steps:
  • Step 301 Acquire a reference point of an image area for focus following.
  • The smart device takes the intersection of the horizontal symmetry axis and the vertical symmetry axis of the collected environment image as the center point of the environment image, and then uses this center point as the reference point of the image area for focus following.
  • Step 302 Generate an image area for focus following.
  • The smart device takes a preset pixel value as the radius and the reference point of the image area as the center of the circle, and generates a circular image area for focus following.
  • The pixel value is preset according to the maximum pixel count of the camera device and the distance between the camera device and the target user. For example, with a 2-megapixel camera on the smart device, a large amount of experimental data on face detection areas at different user-to-camera distances shows that, when the target user is 2 meters away from the smart device, a radius of 72 pixels (the rounded average of the face detection area) ensures that the face area falls within the circular image area.
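A containment check against this circular area is a one-liner; the sketch below uses the 72-pixel example radius from the passage above, with the center point and function name as illustrative assumptions:

```python
import math

# Sketch of the circular focus-follow area check; the 72-pixel radius follows the
# example above (2-megapixel camera, target about 2 meters from the device).
def in_follow_area(point, center, radius=72.0):
    """True when the detected face/body center point lies inside the circular area."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius
```

Compared with a square region, this check naturally excludes the four corners, matching the circular area described in this embodiment.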
  • Step 303 Control the smart device to perform focus following based on the image area.
  • The smart device periodically determines whether the detected face center point is within the image area, and when the face center point is not within the image area, the smart device is controlled to move so that focus following is maintained.
  • FIG. 6 is a schematic diagram of a focus following process according to an embodiment of the present disclosure.
  • as shown in FIG. 6, a coordinate system is generated by taking the reference point of the image area as the origin and the horizontal and vertical symmetry axes of the image area as the X axis and the Y axis. The shortest path between the detected face center point and the reference point is obtained, and control information for moving the smart device is generated from this path, for example an instruction to move a certain distance (such as 5 cm) in a certain direction; the smart device is then controlled to move according to the control information, so that the detected face center point falls into the image area.
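The shortest-path step can be sketched as the straight-line offset from the face center point back to the boundary of the circular area. How this pixel offset is translated into actual movement commands (such as "move 5 cm in a direction") is device-specific and not specified in the text, so the sketch stops at the pixel offset.

```python
# Sketch of the control step: when the face center point is outside the
# circular area, move along the straight line (the shortest path) from
# the face center point toward the area's reference point, just far
# enough for the point to reach the area boundary.
import math

def control_vector(face_center: tuple[float, float],
                   reference: tuple[float, float],
                   radius_px: float) -> tuple[float, float]:
    """Pixel offset the view must shift so the face center re-enters the area."""
    dx = face_center[0] - reference[0]
    dy = face_center[1] - reference[1]
    dist = math.hypot(dx, dy)
    if dist <= radius_px:
        return (0.0, 0.0)              # already inside: no movement needed
    scale = (dist - radius_px) / dist  # just enough to reach the boundary
    return (dx * scale, dy * scale)

print(control_vector((172.0, 0.0), (0.0, 0.0), 72.0))  # approx. (100.0, 0.0)
```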
  • the focus following method of the embodiment of the present disclosure generates a circular image area for focus following according to the center point of the collected environment image and a preset pixel radius. Compared with the "井"-shaped (well-character) following area or the square following area in the related art, the four corners are removed, so that focus following based on the image area is more accurate. In addition, the focus is followed along the shortest path between the face center point and the center point of the image area, which shortens the movement time of the camera or vision system and improves the timeliness of focus following.
  • when face key points cannot be detected, the smart device detects the target user's human body key points for focus following.
  • actions such as bowing the head or turning away may last only a short time. It can be understood that, on the basis of ensuring that the focus is not lost, following the target user's face key points makes it easier for the user to notice that the smart device is observing them.
  • accordingly, the embodiment of the present disclosure proposes a specific focus tracking method of the smart device.
  • FIG. 7 is a schematic flowchart of a focus tracking method of a specific smart device according to an embodiment of the present disclosure. As shown in FIG. 7, the method includes:
  • Step 401 Identify, from the collected environment image, the human body area located below the head area.
  • when the smart device cannot collect face key points, for example because the user has lowered or turned their head, it identifies the human body area below the target user's head area in the environment image.
  • deep learning techniques are used to acquire feature models of the human body in different postures, and the collected environment image is matched against these feature models to identify human body regions in various postures such as standing, sitting, and walking.
  • Step 402 After the human body area is identified, control the imaging angle of the pan-tilt camera of the smart device to move toward the direction of the head region.
  • specifically, the shooting angle of the pan-tilt camera, or the pan-tilt camera itself, is moved in the direction in which the head region is located, that is, adjusted upward from the current shooting angle or position.
  • the camera may be moved up slowly or raised at a preset fixed speed.
  • alternatively, the camera movement can be controlled at different speeds depending on the position of the human body center point. For example, when the body center point is the intersection of the target user's neck and torso, the camera moves upward slowly at 10°/s;
  • when the body center point is located at the center of the target user's torso, the camera moves upward at 20°/s, thereby reducing the focus search time and avoiding loss of focus following.
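The variable-speed rule above can be sketched as a simple lookup; the two speed values come from the text, while the category names are illustrative labels, not terms from the disclosure.

```python
# Sketch of the variable-speed rule: move the camera upward slowly
# (10 deg/s) when the body center point is at the neck/torso
# intersection, faster (20 deg/s) when it is at the torso center.

def upward_speed(body_center_kind: str) -> float:
    """Degrees per second for the pan-tilt camera's upward motion."""
    speeds = {
        "neck_torso_intersection": 10.0,  # head likely just above: go slowly
        "torso_center": 20.0,             # head farther up: go faster
    }
    if body_center_kind not in speeds:
        raise ValueError(f"unknown body center kind: {body_center_kind}")
    return speeds[body_center_kind]

print(upward_speed("torso_center"))  # 20.0
```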
  • Step 403 After the imaging angle is moved, capture the environment image.
  • Step 404 Determine whether a head area is included in the environment image.
  • head area identification is performed on the currently collected environment image. If it is recognized that the environment image includes the head area, step 405 is performed; if it is recognized that the environment image does not include the head area, step 406 is performed.
  • Step 405 identifying a face key point from the head area.
  • the face center point is determined according to the face key point, and the face center point is subjected to focus follow.
  • Step 406 Detect a human key point of the target user from the environment image.
  • if the environment image does not include the head area, the human body key points of the target user are detected from the environment image. Further, after the human body key points are extracted, the body center point is determined according to them, and focus following is then performed on the body center point.
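The fallback logic of Steps 401-406 can be sketched as follows; the detector callables stand in for the actual face and body key point models, which the disclosure does not specify.

```python
# Sketch of the Step 401-406 fallback: prefer face key points; if the
# head area cannot be found after tilting the camera up, fall back to
# body key points so the focus is not lost.
from typing import Callable, Optional

Point = tuple[float, float]

def pick_focus(detect_face: Callable[[], Optional[list[Point]]],
               detect_body: Callable[[], Optional[list[Point]]]) -> Optional[list[Point]]:
    """Return face key points if available, otherwise body key points."""
    face = detect_face()   # Steps 403-405: re-captured image, head area check
    if face:
        return face
    return detect_body()   # Step 406: body key points as fallback

print(pick_focus(lambda: None, lambda: [(1.0, 2.0)]))  # [(1.0, 2.0)]
```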
  • the focus following method of the smart device of the embodiment of the present disclosure detects face key points on the basis of detecting human body key points. If face key points are detected, the face center point is determined from them and followed as the focus; if not, the body center point is determined from the human body key points and followed instead. On the basis of ensuring that the focus is not lost, following the target user's face improves the vividness and flexibility of interaction with the smart device.
  • FIG. 8 is a schematic flowchart of a method for determining a target user according to an embodiment of the present disclosure. As shown in FIG. 8, the method for determining a target user includes:
  • Step 501 Perform human body recognition on the environment image.
  • the smart device can identify the human body in the environment image through face detection or human body detection.
  • Step 502 When a plurality of human bodies are identified from the environment image, obtain a distance between each human body and the smart device.
  • the smart device can recognize each human body that enters the monitoring range from the collected environmental image.
  • each human body identified is regarded as a candidate.
  • for the method of human body identification, reference may be made to the description of the foregoing embodiments, and details are not repeated here.
  • the smart device acquires the distance between each human body in the environment image and the smart device. It can be understood that the closer a candidate target is to the smart device, the more likely it is that the candidate target intends to interact with the smart device; therefore, the distance between the candidate target and the smart device is used as one basis for determining whether the candidate target has an intention to interact with the smart device.
  • the distance between the candidate target and the smart device can be obtained by a depth camera or a binocular vision camera or a laser radar.
  • the smart device is configured with a depth camera, and the depth map of the candidate target is obtained through the depth camera.
  • for example, a controllable light spot, light strip, or smooth-surface structured pattern can be projected onto the surface of the candidate target by a structured light projector, an image is obtained by the image sensor in the depth camera, and the distance of the candidate target is calculated from the geometric relationship by using the triangulation principle.
  • a binocular vision camera is configured in the smart device, and the candidate target is captured by the binocular vision camera. Then, the parallax of the image captured by the binocular vision camera is calculated, and the distance between the candidate target and the smart device is calculated based on the parallax.
  • FIG. 9 is a schematic diagram of the principle of calculating binocular vision distance according to an embodiment of the present disclosure.
  • in Fig. 9, the positions O_l and O_r of the two cameras in actual space are plotted, together with the optical axes of the left and right cameras and the focal planes of the two cameras; the focal plane is at a distance f from the plane in which the two cameras lie.
  • p and p' are the positions of the same candidate target P in the images captured by the two cameras, respectively.
  • the distance from point p to the left boundary of its captured image is x_l, and the distance from point p' to the left boundary of its captured image is x_r.
  • O_l and O_r are the two cameras, which lie in the same plane; the distance between the two cameras is Z.
  • the distance b between P in Fig. 9 and the plane where the two cameras are located satisfies the following relationship:

    b = (Z × f) / d,  where d = x_l − x_r

  • d is the disparity of the same candidate target between the images captured by the two cameras of the binocular system. Since Z and f are constant values, the distance b between the candidate target and the plane of the cameras, that is, the distance between the candidate target and the smart device, can be determined according to the disparity d.
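With the quantities defined above (baseline Z between the cameras, focal distance f, disparity d = x_l − x_r), the depth follows the standard triangulation relation b = Z·f/d. A minimal sketch, with purely illustrative numbers:

```python
# Depth from binocular disparity: b = Z * f / d, where Z is the
# baseline between the two cameras, f the focal length (in pixels),
# and d = x_l - x_r the disparity of the same point in the two images.

def depth_from_disparity(baseline_m: float, focal_px: float,
                         x_left_px: float, x_right_px: float) -> float:
    """Return the distance (meters) from the camera plane to the target."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline_m * focal_px / disparity

# Example: 12 cm baseline, 700 px focal length, 42 px disparity -> 2.0 m
print(depth_from_disparity(0.12, 700.0, 421.0, 379.0))  # 2.0
```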
  • in another example, a laser radar is arranged in the smart device; the laser radar emits laser light into the monitoring range, and the emitted laser is reflected when it encounters obstacles within the monitoring range.
  • the smart device receives the laser returned by each obstacle within the monitored range and generates a binary map of each obstacle based on the returned laser.
  • each binary image is fused with the environment image, and the binary image corresponding to the candidate target is identified from all the binary images.
  • the contour or size of each obstacle can be identified according to the binary map of each obstacle, and then the contour or size of each target in the environment image is matched, so that the binary map corresponding to the candidate target can be obtained.
  • the laser return time of the binary image corresponding to the candidate target is multiplied by the speed of light, and divided by 2 to obtain the distance between the candidate target and the smart device.
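The time-of-flight computation just described (return time times the speed of light, divided by 2) can be sketched directly:

```python
# Time-of-flight distance for the lidar branch: the round-trip time of
# the returned laser pulse times the speed of light, divided by 2.

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_distance(round_trip_s: float) -> float:
    """Distance to the obstacle in meters from the laser round-trip time."""
    return round_trip_s * SPEED_OF_LIGHT / 2.0

# Example: a pulse returning after ~13.34 ns corresponds to ~2 m.
print(round(lidar_distance(13.34e-9), 3))
```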
  • Step 503 Select a human body that is closest to the smart device as the human body corresponding to the target user.
  • some candidate targets may have no intention of interacting with the smart device, so the human body closest to the smart device is selected as the human body corresponding to the target user for focus tracking.
  • when there are multiple human bodies at the same closest distance to the smart device, the smart device can query the registered user face image database for face images corresponding to these human bodies, and the human body corresponding to the target user can be determined in different manners according to the query result.
  • if the face image database contains a face image corresponding to exactly one of the closest human bodies, that human body is used as the human body corresponding to the target user.
  • if the face image database contains no face image corresponding to any of the closest human bodies, one of them is randomly selected as the human body corresponding to the target user.
  • if the face image database contains face images corresponding to multiple of the closest human bodies, the human body queried first is used as the human body corresponding to the target user.
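The target-selection rule of Steps 501-503 can be sketched as follows. The body identifiers, distance map, and registered-identity set are illustrative assumptions; the disclosure only specifies the selection logic itself.

```python
# Sketch of target selection: pick the nearest body; break ties with
# the registered-face database (exactly one registered -> use it; none
# registered -> random; several registered -> first queried).
import random
from typing import Optional

def select_target(distances: dict[str, float],
                  registered: set[str],
                  rng: random.Random = random.Random(0)) -> Optional[str]:
    """Pick the target among detected bodies (id -> distance to device)."""
    if not distances:
        return None
    nearest = min(distances.values())
    candidates = [b for b, d in distances.items() if d == nearest]
    if len(candidates) == 1:
        return candidates[0]
    known = [b for b in candidates if b in registered]  # preserves query order
    if len(known) == 1:
        return known[0]                  # exactly one registered face: use it
    if not known:
        return rng.choice(candidates)    # none registered: pick randomly
    return known[0]                      # several registered: first queried

print(select_target({"a": 2.0, "b": 2.0, "c": 3.5}, {"b"}))  # b
```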
  • the focus tracking method of the smart device uses the distance between each candidate target and the smart device to select, from all candidate targets, the candidate target that intends to interact with the smart device. Compared with directly treating any person whose face is detected as the interaction target, this can reduce false starts of the smart device.
  • FIG. 10 is a schematic structural diagram of a focus following device of a smart device according to an embodiment of the present disclosure.
  • the focus following device of the smart device includes: a detecting module 110, a determining module 120, and a control module 130.
  • the detecting module 110 is configured to detect a face key point of the target user from the environment image collected by the smart device, and when the face key point is not detected from the environment image, from the environment image Detecting key points of the human body of the target user.
  • the determining module 120 is configured to determine a face center point according to the face key point, and determine a body center point according to the body key point when the body key point is detected.
  • the control module 130 is configured to control the smart device to perform focus tracking on the face center point, and when the body center point is determined, control the smart device to perform focus tracking on the body center point.
  • the control module 130 is specifically configured to: periodically determine whether the detected face center point or body center point is in the image area; when the face center point or body center point is not in the image area, obtain the shortest path between that center point and the center point of the image area; acquire, according to the shortest path, control information for controlling the movement of the smart device; and control the smart device to move according to the control information, so that the detected face center point or body center point falls within the image area.
  • the detecting module 110 is specifically configured to: identify a head area of the target user from the environment image according to a preset head feature; and from the head area Extract the key points of the face.
  • the determining module 120 is specifically configured to: if one face key point is extracted, use that face key point as the face center point; and if two or more face key points are extracted, obtain the first center point of all the extracted face key points and use the first center point as the face center point.
  • the determining module 120 is specifically configured to: use each face key point as a node, and use one of the nodes as a starting node to connect all the nodes one by one to form an overlay all. a key point graph of the node; acquiring a center point of the key point graph, and determining a center point of the key point graph as the first center point.
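The face-center rule can be sketched as follows. Using the centroid of the key point coordinates as the "center point of the key point graph" is an assumption made here for illustration; the disclosure only states that the center point of the graph covering all nodes is used.

```python
# Sketch of the face-center rule: one key point is used directly; for
# two or more, the center of the figure covering all key points is
# approximated by the centroid of the coordinates (an assumption).

def face_center(points: list[tuple[float, float]]) -> tuple[float, float]:
    if not points:
        raise ValueError("no face key points detected")
    if len(points) == 1:
        return points[0]          # a single key point is the center itself
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(points), sum(ys) / len(points))

print(face_center([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]))  # (1.0, 1.0)
```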
  • the detecting module 110 is specifically configured to: perform identification on the collected human body region located below the head region; when the human body region is identified, control the imaging angle of the pan-tilt camera of the smart device to move toward the direction of the head region; after the imaging angle is moved, capture the environment image; determine whether the head region is included in the environment image; if the environment image includes the head region, identify the face key points from the head region; and if the environment image does not include the head region, detect the human body key points of the target user from the environment image.
  • FIG. 11 is a schematic structural diagram of a focus following device of another smart device according to an embodiment of the present disclosure. As shown in FIG. 11, on the basis of the focus following device of the smart device of the foregoing embodiment, the device further includes a human body recognition module 210, a distance detecting module 220, a selecting module 230, and a generating module 240.
  • the human body recognition module 210 is configured to perform human body recognition on the environment image before detecting a key point of the target user from the environment image collected by the smart device;
  • the distance detecting module 220 is configured to acquire a distance between each human body and the smart device when a plurality of human bodies are identified from the environment image;
  • the selecting module 230 is configured to select a human body that is closest to the smart device as a human body corresponding to the target user.
  • the generating module 240 is configured to identify the center point of the environment image collected by the smart device before the face key points of the target user are identified in that image, and to generate a circular image area for focus tracking with the center point of the environment image as the reference point.
  • the focus following device of the smart device of the embodiment of the present disclosure first detects face key points of the target user from the environment image collected by the smart device, determines the face center point according to the face key points, and controls the smart device to perform focus tracking on the face center point. If no face key points are detected from the environment image, the target user's human body key points are detected from the environment image, the body center point is determined according to the body key points, and the smart device is controlled to perform focus tracking on the body center point. The device thus solves the technical problem that focus following cannot be maintained when face key points are not detected, by using human body key points as a focus supplement: when the smart device detects no face key points, it detects human body key points from the captured image as the focus to follow, which avoids losing focus on the user when they bow their head or turn away, and improves the success rate and accuracy of focus following.
  • an embodiment of the present disclosure further provides a smart device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the smart device; the memory is configured to store executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code for executing the focus following method of the smart device described in the above embodiments.
  • an embodiment of the present disclosure further provides a non-transitory computer readable storage medium on which a computer program is stored, and the program, when executed by a processor, implements the focus following method of the smart device described in the above embodiments.
  • FIG. 12 illustrates a block diagram of an exemplary smart device suitable for use in implementing embodiments of the present application.
  • as shown in FIG. 12, the smart device includes a housing 310, a processor 320, a memory 330, a circuit board 340, and a power supply circuit 350.
  • the circuit board 340 is disposed inside the space enclosed by the housing 310, and the processor 320 and the memory 330 are disposed on the circuit board 340;
  • the power supply circuit 350 is configured to supply power to the respective circuits or devices of the smart device;
  • the memory 330 is used to store executable program code; and the processor 320, by reading the executable program code stored in the memory 330, runs a program corresponding to the executable program code for executing the focus following method of the smart device described in the above embodiments.
  • the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • thus, features defined with "first" or "second" may explicitly or implicitly include at least one such feature.
  • in the description of the present disclosure, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
  • Any process or method description in the flowcharts or otherwise described herein may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing the steps of a custom logic function or process.
  • the scope of the preferred embodiments of the present disclosure includes additional implementations in which the functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved; this will be understood by those skilled in the art to which the embodiments of the present disclosure pertain.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
  • computer readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM).
  • the computer readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • portions of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: discrete logic circuits with logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present disclosure have been shown and described above, it can be understood that the foregoing embodiments are exemplary and are not to be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.

Abstract

The present disclosure provides a focus tracking method and device of a smart apparatus, a smart apparatus, and a storage medium. The method comprises: detecting a face key point of a target user in an environmental image acquired by a smart apparatus, determining a face center point according to the face key point, and controlling the smart apparatus to perform focus tracking on the face center point; and if no face key point is found in the environment image, detecting a body key point of the target user in the environmental image, determining a body center point according to the body key point, and controlling the smart apparatus to perform focus tracking on the body center point. A body key point is used as a substitute in the above method to resolve the technical issue in which focus tracking cannot be maintained when no face key point has been detected, thereby preventing loss or missed detection of a focus point, and improving the success rate and accuracy of focus tracking.

Description

Focus tracking method and device of smart device, smart device, and storage medium

Cross-Reference to Related Applications

The present disclosure claims priority to Chinese Patent Application No. 201810236920.1, filed on March 21, 2018 by Beijing Orion Star Technology Co., Ltd. and entitled "Focus Tracking Method, Device, Smart Device and Storage Medium of Smart Device".
Technical Field

The present disclosure relates to the technical field of smart devices, and in particular to a focus tracking method and device of a smart device, a smart device, and a storage medium.

Background

With the development of artificial intelligence technology, the ways in which smart devices interact with users have become increasingly rich. Among other things, a smart device can follow a user's movement through a focus following method, achieving the effect that the smart device pays attention to the user's behavior.

In the related art, a smart device uses face recognition technology to collect the center point of the user's face, calculates the distance between the face center point and the center of the collected image, and controls the smart device to rotate so that the user's face is located at the center of the image.
Summary

The present disclosure proposes a focus following method of a smart device. The method uses human body key points as a focus supplement: when the smart device does not detect face key points, it detects human body key points from the collected image as the focus to follow, which prevents the focus from being lost when the user bows their head or turns away, and improves the success rate and accuracy of focus following.

The present disclosure further proposes a focus following device of a smart device.

The present disclosure further proposes a smart device.

The present disclosure further proposes a non-transitory computer readable storage medium.
An embodiment of a first aspect of the present disclosure proposes a focus following method of a smart device, including:

detecting a face key point of a target user from an environment image collected by the smart device, determining a face center point according to the face key point, and controlling the smart device to perform focus following on the face center point; and

if the face key point is not detected from the environment image, detecting a human body key point of the target user from the environment image, determining a human body center point according to the human body key point, and controlling the smart device to perform focus following on the human body center point.

In the focus following method of the smart device of the embodiment of the present disclosure, a face key point of the target user is first detected from the environment image collected by the smart device, the face center point is determined according to the face key point, and the smart device is controlled to perform focus following on the face center point; if no face key point is detected from the environment image, the target user's human body key point is detected from the environment image, the human body center point is determined according to the human body key point, and the smart device is controlled to perform focus following on the human body center point. The method thereby solves the technical problem in the prior art that focus following cannot be maintained when face key points are not detected, by using human body key points as a focus supplement: when the smart device detects no face key points, it detects human body key points from the captured image as the focus to follow, which prevents the focus from being lost when the user bows their head or turns away, and improves the success rate and accuracy of focus following.
另外,根据本公开上述实施例的智能设备的焦点跟随方法,还可以具有如下附加的技术特征:In addition, the focus following method of the smart device according to the above-described embodiments of the present disclosure may further have the following additional technical features:
在本公开一个实施例中,从智能设备采集的环境图像中识别目标用户的人脸关键点之前,还包括:识别所述智能设备所采集的环境图像的中心点,以所述环境图像的中心点为基准点,生成一个圆形用于焦点跟随的图像区域。In an embodiment of the present disclosure, before identifying a face key point of the target user from the environment image collected by the smart device, the method further includes: identifying a center point of the environment image collected by the smart device, and centering the environment image The point is the reference point and a circle is created for the image area that the focus follows.
在本公开一个实施例中,进行焦点跟随,包括:定时判断检测出的所述人脸中心点或者人体中心点是否处于所述图像区域内;当所述人脸中心点或者人体中心点未处于所述图像区域内时,获取所述人脸中心点或者人体中心点与所述图像区域中心点之间的最短路径;根据所述最短路径,获取用于控制智能设备移动的控制信息;控制所述智能设备按照所述控制信息移动,使得检测到的所述人脸中心点或者人体中心点落入所述图像区域内。In an embodiment of the present disclosure, performing focus following includes: periodically determining whether the detected face center point or the body center point is in the image area; when the face center point or the body center point is not in Obtaining, in the image area, a shortest path between the face center point or the body center point and the image area center point; acquiring control information for controlling the movement of the smart device according to the shortest path; The smart device moves according to the control information, so that the detected face center point or body center point falls within the image area.
在本公开一个实施例中,从智能设备采集的环境图像中检测目标用户的人脸关键点,根据所述人脸关键点确定人脸中心点,包括:根据预设的头部特征,从所述环境图像中识别所述目标用户的头部区域;从所述头部区域提取所述人脸关键点;如果提取出的所述人脸关键点为一个,将所述人脸关键点作为所述人脸中心点;如果提取出的所述人脸关键点为两个以及两个以上,获取提取出的所有的所述人脸关键点的第一中心点,将所述第一中心点作为所述人脸中心点。In an embodiment of the present disclosure, the face key of the target user is detected from the environment image collected by the smart device, and the face center point is determined according to the face key point, including: according to the preset head feature, Identifying a head region of the target user in the environment image; extracting the face key point from the head region; if the extracted face key point is one, the face key point is taken as a face center point; if the extracted face key points are two or more, obtaining the first center point of all the extracted face key points, the first center point is taken as The face center point.
在本公开一个实施例中,获取提取出的所有的所述人脸关键点的第一中心点,包括:将每个人脸关键点作为节点,以其中一个节点作为起始节点,将所有的节点逐个连接起来,形成一个覆盖所有节点的关键点图形;获取所述关键点图形的中心点,将所述关键点图形的中心点,确定为所述第一中心点。In an embodiment of the present disclosure, acquiring the first center point of all the extracted face key points includes: using each face key point as a node, using one of the nodes as a starting node, and all the nodes Connected one by one to form a key point graph covering all nodes; obtain a center point of the key point graph, and determine a center point of the key point graph as the first center point.
在本公开一个实施例中,从采集的环境图像中检测所述目标用户的人体关键点,包括:从采集的位于所述头部区域下方的人体区域进行识别;当识别到所述人体区域后,控制所述智能设备的云台摄像头的摄像角度向所述头部区域所在方向移动;在所述摄像角度移动后,拍摄获取环境图像;判断所述环境图像中是否包括所述头部区域;如果所述环境图像 中包括所述头部区域,则从所述头部区域识别所述人脸关键点;如果所述环境图像中未包括所述头部区域,则从所述环境图像中检测所述目标用户的人体关键点。In an embodiment of the present disclosure, detecting a key point of the human body of the target user from the collected environment image includes: identifying from a collected human body area located below the head area; and after identifying the human body area And controlling an imaging angle of the pan-tilt camera of the smart device to move in a direction of the head region; after the camera angle is moved, capturing an environment image; determining whether the head region is included in the environment image; If the head region is included in the environment image, identifying the face key point from the head region; if the head region is not included in the environment image, detecting from the environment image The key point of the human body of the target user.
在本公开一个实施例中,从智能设备采集的环境图像中检测目标用户的人脸关键点之前,还包括:对所述环境图像进行人体识别;当从所述环境图像中识别出多个人体时,获取每个人体与智能设备之间的距离;选取与所述智能设备距离最近的人体作为所述目标用户对应的人体In an embodiment of the present disclosure, before detecting a face key point of the target user from the environment image collected by the smart device, the method further includes: performing human body recognition on the environment image; and identifying a plurality of human bodies from the environment image Obtaining a distance between each human body and the smart device; selecting a human body closest to the smart device as the human body corresponding to the target user
In an embodiment of the present disclosure, selecting the human body closest to the smart device as the human body corresponding to the target user includes: when there are multiple human bodies closest to the smart device, querying whether the registered-user face image library of the smart device contains face images corresponding to those closest human bodies; if the face image library contains one face image corresponding to a closest human body, taking that human body as the human body corresponding to the target user; if the face image library contains no face image corresponding to any of the closest human bodies, randomly selecting one of the closest human bodies as the human body corresponding to the target user; and if the face image library contains multiple face images corresponding to the closest human bodies, taking the closest human body whose face image is found first as the human body corresponding to the target user.
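The selection rules above can be sketched as follows; the `(body_id, distance)` pairs and the set of registered body IDs are hypothetical data structures for illustration, not the patent's implementation:

```python
import random

def select_target_body(bodies, registered_ids):
    """Pick the target user's body from detected (body_id, distance) pairs.

    registered_ids: IDs of bodies whose faces exist in the registered-user
    face image library (hypothetical lookup result).
    """
    if not bodies:
        return None
    # Keep only the bodies at the minimum distance from the device.
    min_dist = min(dist for _, dist in bodies)
    closest = [bid for bid, dist in bodies if dist == min_dist]
    if len(closest) == 1:
        return closest[0]
    # Several bodies are equally close: prefer registered users,
    # taking the first one found; otherwise pick one at random.
    registered = [bid for bid in closest if bid in registered_ids]
    if registered:
        return registered[0]
    return random.choice(closest)
```

With a single closest body the registered-user query is skipped entirely, matching the claim, which only branches on the library when several bodies tie for the minimum distance.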
An embodiment of the second aspect of the present disclosure provides a focus tracking device of a smart device, including:
a detection module, configured to detect face key points of a target user from an environment image collected by the smart device, and, when no face key points are detected from the environment image, detect body key points of the target user from the environment image;
a determination module, configured to determine a face center point according to the face key points, and, when body key points are detected, determine a body center point according to the body key points; and
a control module, configured to control the smart device to perform focus tracking on the face center point, and, when the body center point is determined, control the smart device to perform focus tracking on the body center point.
In addition, the focus tracking device of the smart device according to the above embodiments of the present disclosure may further have the following additional technical features:
In an embodiment of the present disclosure, the focus tracking device further includes a generation module, configured to, before the face key points of the target user are recognized from the environment image collected by the smart device, identify the center point of the environment image and, using that center point as a reference point, generate a circular image region for focus tracking.
In an embodiment of the present disclosure, the control module is specifically configured to: periodically determine whether the detected face center point or body center point lies within the image region; when it does not, acquire the shortest path between the face center point or body center point and the center point of the image region; acquire, according to the shortest path, control information for controlling the movement of the smart device; and control the smart device to move according to the control information, so that the detected face center point or body center point falls within the image region.
In an embodiment of the present disclosure, the detection module is specifically configured to: identify the head region of the target user from the environment image according to preset head features, and extract the face key points from the head region;
the determination module is specifically configured to: if one face key point is extracted, take that face key point as the face center point; and if two or more face key points are extracted, acquire a first center point of all the extracted face key points and take the first center point as the face center point.
In an embodiment of the present disclosure, the determination module is specifically configured to: treat each face key point as a node, take one of the nodes as a starting node, and connect all the nodes one by one to form a key-point graph covering all the nodes; and obtain the center point of the key-point graph and determine it as the first center point.
In an embodiment of the present disclosure, the detection module is specifically configured to: perform recognition on a collected human body region located below the head region; after the human body region is recognized, control the imaging angle of the pan-tilt camera of the smart device to move toward the head region; after the imaging angle has moved, capture an environment image; determine whether the environment image includes the head region; if it does, recognize the face key points from the head region; and if it does not, detect the body key points of the target user from the environment image.
In an embodiment of the present disclosure, the focus tracking device further includes: a human body recognition module, configured to perform human body recognition on the environment image before the face key points of the target user are detected from the environment image collected by the smart device; a distance detection module, configured to acquire the distance between each human body and the smart device when multiple human bodies are recognized from the environment image; and a selection module, configured to select the human body closest to the smart device as the human body corresponding to the target user.
In an embodiment of the present disclosure, the selection module is specifically configured to: when there are multiple human bodies closest to the smart device, query whether the registered-user face image library of the smart device contains face images corresponding to those closest human bodies; if the face image library contains one face image corresponding to a closest human body, take that human body as the human body corresponding to the target user; if the face image library contains no face image corresponding to any of the closest human bodies, randomly select one of the closest human bodies as the human body corresponding to the target user; and if the face image library contains multiple face images corresponding to the closest human bodies, take the closest human body whose face image is found first as the human body corresponding to the target user.
With the focus tracking device of the embodiments of the present disclosure, face key points of the target user are first detected from the environment image collected by the smart device, a face center point is determined according to the face key points, and the smart device is controlled to perform focus tracking on the face center point; if no face key points are detected from the environment image, body key points of the target user are detected from the environment image, a body center point is determined according to the body key points, and the smart device is controlled to perform focus tracking on the body center point. The device thereby solves the technical problem in the prior art that focus tracking cannot be maintained when face key points cannot be detected. With body key points serving as a fallback focus, when the smart device detects no face key points it detects body key points from the collected image as the focus to follow, which prevents loss of focus when the user lowers or turns the head and improves the success rate and accuracy of focus tracking.
An embodiment of the third aspect of the present disclosure provides a smart device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the smart device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code, by reading the executable program code stored in the memory, to implement the focus tracking method of the smart device described in the above embodiments.
An embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the focus tracking method of the smart device described in the above embodiments is implemented.
Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and will in part become apparent from the following description or be learned through practice of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a focus tracking method of a smart device according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of body key point positions according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a method for determining a face center point according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of face key point positions according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of another focus tracking method of a smart device according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a focus tracking process according to an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of a specific focus tracking method of a smart device according to an embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of a method for determining a target user according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the principle of binocular vision distance calculation according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a focus tracking device of a smart device according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of another focus tracking device of a smart device according to an embodiment of the present disclosure; and
FIG. 12 is a block diagram of an exemplary smart device suitable for implementing embodiments of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure are described in detail below. Examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements, or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.
The focus tracking method and device of a smart device according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
The execution body of the focus tracking method of the embodiments of the present disclosure may be a smart device that collects images of the surrounding environment through a camera device and follows a focus in those images, such as an intelligent robot.
FIG. 1 is a schematic flowchart of a focus tracking method of a smart device according to an embodiment of the present disclosure. As shown in FIG. 1, the focus tracking method of the smart device includes the following steps:
Step 101: detect face key points of the target user from the environment image collected by the smart device, determine a face center point according to the face key points, and control the smart device to perform focus tracking on the face center point.
In this embodiment, the smart device may be a robot, a smart home appliance, or the like.
The smart device is equipped with a camera device, such as a camera, through which it can collect environment images within its monitoring range in real time. After an environment image is acquired, the image can be analyzed to identify any human body entering the monitoring range.
Specifically, face recognition technology is applied to the environment image to detect whether a face is present in the collected image. As an example, object contours are extracted from the environment image and compared with pre-stored face or body contours; when the similarity between an extracted contour and a preset contour exceeds a preset threshold, a user is considered to have been recognized in the environment image. All users in the environment image can thus be identified by this method.
Further, if the face of the target user is present in the environment image, the smart device detects the face key points of the target user and determines the face center point according to them. The face key points may be the target user's facial features, such as the eyes, nose, and mouth. The smart device may determine the face key points by detecting the shapes of the facial organs and the positions of the different organs on the face, and then determine the face center point from the detected key points.
Further, after the smart device acquires the face center point, it takes the face center point as the focus and controls its camera device or vision system to follow the focus in real time, keeping the focus within the following region of the collected environment image. The following region covers part of the environment image; it is not fixed within the environment image but moves in real time with the monitoring field of view. The following region generally needs to cover the central area of the environment image, so that the smart device and the monitored target user can interact face to face.
For example, when the smart device is an intelligent robot whose head is the camera device, the robot's camera device is controlled to perform focus tracking with the face center point as the focus, so that the robot always appears to "gaze" at the target user, improving the user experience.
Step 102: if no face key points are detected from the environment image, detect body key points of the target user from the environment image, determine a body center point according to the body key points, and control the smart device to perform focus tracking on the body center point.
Specifically, when the target user turns around or lowers the head, face key points may not be detectable in the environment image, so the smart device detects the target user's body key points from the environment image, where the body key points are key points of the target user's body other than the head. FIG. 2 is a schematic diagram of body key point positions according to an embodiment of the present disclosure. As shown in FIG. 2, the smart device identifies the contour edges of the target user's torso in the environment image, takes the intersections of the limbs and the torso as body key points, and determines the body center point from the body key points. For example, when the user lowers the head and the smart device cannot detect face key points, the camera device of the smart device moves downward and detects the intersection P1 of the user's neck and torso as a body key point, which serves as the body center point. As another example, when the target user turns around, the smart device detects in the environment image that the intersections of the user's two arms with the torso are P2 and P3, and takes the midpoint of the line segment connecting P2 and P3 as the body center point.
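As a minimal sketch of the examples above, the body center can be taken as the mean of the detected body key points in pixel coordinates: a single neck-torso point P1 maps to itself, and the two arm-torso points P2 and P3 map to their midpoint. Treating more than two key points the same way is an assumption, not something the text specifies.

```python
def body_center_point(keypoints):
    """Body center as the mean of (x, y) body key points in pixels."""
    if not keypoints:
        raise ValueError("no body key points detected")
    n = len(keypoints)
    return (sum(x for x, _ in keypoints) / n,
            sum(y for _, y in keypoints) / n)
```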
Further, the smart device performs focus tracking with the body center point as the focus, keeping the focus within the following region of the collected environment image. For the method of performing focus tracking on the body center point, reference may be made to the method for the face center point in the above example, which is not repeated here.
With the focus tracking method of the embodiments of the present disclosure, face key points of the target user are detected from the environment image collected by the smart device, a face center point is determined according to the face key points, and the smart device is controlled to perform focus tracking on the face center point; if no face key points are detected from the environment image, body key points of the target user are detected from the environment image, a body center point is determined according to the body key points, and the smart device is controlled to perform focus tracking on the body center point. The method thereby solves the technical problem that focus tracking cannot be maintained when face key points cannot be detected. With body key points serving as a fallback focus, when the smart device detects no face key points it detects body key points from the collected image as the focus to follow, which prevents loss of focus when the user lowers or turns the head and improves the success rate and accuracy of focus tracking.
Based on the above embodiments, in order to describe the process of determining the face center point more clearly, an embodiment of the present disclosure proposes a method for determining a face center point. FIG. 3 is a schematic flowchart of a method for determining a face center point according to an embodiment of the present disclosure.
As shown in FIG. 3, the method for determining a face center point includes the following steps:
Step 201: identify the head region of the target user.
Specifically, the smart device sets head features according to a pre-stored head model, such as the shape and structure of the head, its basic proportions, and its positional relationship with the human torso, and identifies the head region of the target user from the environment image according to the preset head features.
Step 202: detect face key points in the head region.
Specifically, the face key points of the target user are detected within the identified head region. For the process of recognizing face key points from the head region, reference may be made to the related description in the above embodiments, which is not repeated here.
Step 203: determine the number of detected face key points; if one face key point is detected, perform step 204; if two or more face key points are detected, perform step 205.
Step 204: take the single detected face key point as the face center point.
Specifically, the single face key point detected in the head region of the target user serves as the face center point; for example, if only the target user's eyes are detected, the eyes serve as the face center point of the target user.
Step 205: acquire the first center point of all the detected face key points, and take the first center point as the face center point.
The first center point is the center point of the key-point graph enclosed by all the detected face key points. FIG. 4 is a schematic diagram of face key point positions according to an embodiment of the present disclosure. As shown in FIG. 4, each face key point serves as a connection node of the key-point graph; taking one of the nodes as a starting node, all the nodes are connected one by one to form a key-point graph covering all the nodes. If the resulting key-point graph is symmetric (as shown in FIG. 4), the midpoint of its axis of symmetry is taken as the first center point of the key-point graph, and this first center point is determined as the face center point; if the key-point graph is irregular, the intersection of its longest axis and shortest axis is taken as the first center point, and this first center point is determined as the face center point.
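As a rough illustration of the first center point, the sketch below uses the center of the key-point graph's bounding box; for a symmetric five-point layout like FIG. 4 this coincides with the midpoint of the axis of symmetry, but it is only an approximation of the longest-axis/shortest-axis construction described for irregular graphs, not the patent's exact geometry:

```python
def keypoint_graph_center(keypoints):
    """Approximate the key-point graph's center with its bounding-box center."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
```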
With the method for determining a face center point of the embodiments of the present disclosure, the face center point is determined from the detected face key points. Performing focus tracking on the face center point ensures that the face region of the target user stays within the following region of the smart device, so that the smart device and the monitored target user can interact face to face.
Based on the above embodiments, before the face is detected, an image region needs to be generated in advance; this image region is the following region. FIG. 5 is a schematic flowchart of another focus tracking method of a smart device according to an embodiment of the present disclosure.
As shown in FIG. 5, the focus tracking method includes the following steps:
Step 301: acquire the reference point of the image region used for focus tracking.
Specifically, the smart device takes the intersection of the horizontal and vertical axes of symmetry of the collected environment image as the center point of the environment image, and then takes the center point of the environment image as the reference point of the image region used for focus tracking.
Step 302: generate the image region used for focus tracking.
Specifically, the smart device generates a circular image region for focus tracking, with a preset pixel value as the radius and the reference point of the image region as the center of the circle. The pixel value is preset according to the maximum pixel count of the camera device and the distance between the camera device and the target user. For example, when the camera of the smart device has 2 million pixels, the average face detection area of users at different distances from the camera device is obtained from a large amount of experimental data; when the target user is 2 meters away from the smart device, a circle with a radius of 72 pixels ensures that the face area lies within the circular image region.
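The region construction and the containment check it implies can be sketched as follows; the 72-pixel radius is the example value for a 2 MP camera with the user 2 m away, and would be recalibrated for other setups:

```python
import math

def make_focus_region(image_width, image_height, radius_px=72):
    """Circular focus-tracking region centered on the environment image.

    Returns (center_x, center_y, radius) in pixels.
    """
    return (image_width / 2, image_height / 2, radius_px)

def point_in_region(point, region):
    """True when a face/body center point lies inside the circular region."""
    cx, cy, radius = region
    return math.hypot(point[0] - cx, point[1] - cy) <= radius
```

For a 640x480 frame, for instance, the region is a 72-pixel circle around (320, 240).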
Step 303: control the image region to perform focus tracking.
Specifically, the smart device periodically determines whether the detected face center point lies within the image region; when the face center point does not lie within the image region, the smart device controls the image region to perform focus tracking.
In a specific implementation, FIG. 6 is a schematic diagram of a focus tracking process according to an embodiment of the present disclosure. As shown in FIG. 6, a coordinate system is generated with the reference point of the image region as the origin and the horizontal and vertical axes of symmetry of the image region as the X axis and Y axis. When the face center point does not lie within the image region, the shortest path between the face center point and the center point of the image region is acquired, namely the directed line segment starting at the reference point of the image region and ending at the face center point. According to the shortest path, control information for controlling the movement of the smart device is acquired, for example, moving the image region 5 centimeters along the direction of the directed line segment. The smart device is then controlled to move according to the control information, so that the detected face center point falls within the image region.
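The shortest-path correction can be sketched as the straight-line vector from the region's reference point to the detected center point; how that vector is converted into motor commands (e.g. "move 5 cm") is device-specific and not modeled here:

```python
import math

def shortest_path_correction(center_point, region_reference):
    """Distance and unit direction from the region reference point to the
    detected face/body center point, i.e. the directed line segment the
    smart device should move along."""
    dx = center_point[0] - region_reference[0]
    dy = center_point[1] - region_reference[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return 0.0, (0.0, 0.0)  # already centered: no movement needed
    return distance, (dx / distance, dy / distance)
```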
本公开实施例的焦点跟随方法,根据采集的环境图像的中心点和预设的像素值为半径生成用于焦点跟随的圆形图像区域,相比于相关技术中的“井”字格跟随区域或方格跟随区域去掉了四个边角,使焦点跟随的图像区域更加准确,并且按照人脸中心点与图像区域中心点之间的最短路径进行焦点跟随,缩短了摄像装置或视觉系统的移动时间提高了焦点跟随的时效性。The focus following method of the embodiment of the present disclosure generates a circular image region for focus following according to a center point of the collected environment image and a preset pixel value radius, compared to the “well” character following region in the related art. Or the square following area has four corners removed, so that the image area following the focus is more accurate, and the focus is followed according to the shortest path between the center point of the face and the center point of the image area, which shortens the movement of the camera or the visual system. Time increases the timeliness of focus follow-up.
基于上述实施例,在目标用户低头或转身等无法检测到人脸关键点的情况下,智能设备检测目标用户的人体关键点进行焦点跟随。然而,用户的低头或转身等动作可能仅持续较短的时间,可以理解,在保证焦点跟随不丢失的基础上,对目标用户的人脸关键点进行焦点跟随更容易使用户观察到智能设备的“注视”效果,为了进一步提高智能设备的主动交互效果,本公开实施例提出了一种具体的智能设备的焦点跟随方法。Based on the above embodiment, in the case where the target user is unable to detect the face key point such as turning or turning, the smart device detects the target user's human key point for focus follow. However, the user's action such as bowing or turning may only last for a short period of time. It can be understood that focusing on the target user's face key points is easier to make the user observe the smart device on the basis of ensuring that the focus is not followed. In order to further improve the active interaction effect of the smart device, the embodiment of the present disclosure proposes a specific focus tracking method of the smart device.
具体而言,图7为本公开实施例所提供的一种具体的智能设备的焦点跟随方法的流程示意图,如图7所示,该方法包括:Specifically, FIG. 7 is a schematic flowchart of a focus tracking method of a specific smart device according to an embodiment of the present disclosure. As shown in FIG. 7, the method includes:
Step 401: identify the human body region located below the head region from the collected image.
When the smart device cannot collect face key points — for example, because the user has lowered his or her head — it identifies the human body region below the head region of the target user in the environment image. For example, feature models of the human body in different postures may be obtained through deep learning, and the collected environment image is matched against these feature models to identify the target user's human body region in postures such as standing, sitting, and walking.
Step 402: after the human body region is identified, control the imaging angle of the pan-tilt camera of the smart device to move toward the direction of the head region.
To enable "face-to-face" interaction between the smart device and the target user, after the human body region is identified, the imaging angle of the pan-tilt camera, or the pan-tilt camera itself, can be raised to search upward for the target user's head. Specifically, the imaging angle of the pan-tilt camera, or the camera, is moved toward the direction of the head region; that is, the shooting angle or position is adjusted upward from the current shooting angle or position.
As an example, the camera may be moved or raised slowly upward at a preset fixed speed.
As another example, the camera may be moved at different speeds depending on the position of the body center point. For instance, when the body center point is the junction of the target user's neck and torso, the camera moves slowly upward at 10°/s; when the body center point is at the center of the target user's torso, it moves upward at 20°/s. This reduces the focus search time and avoids losing focus following.
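The two-speed scheme above can be sketched as follows; the function name, the image-coordinate convention, and the pixel tolerance `tol` are illustrative assumptions, while the 10°/s and 20°/s values are taken from the example in the text.

```python
def tilt_speed_deg_per_s(body_center_y: float, neck_y: float,
                         torso_center_y: float, tol: float = 5.0) -> float:
    """Choose an upward tilt speed for the pan-tilt camera based on where the
    body center point was detected (image y-coordinates, origin at the top).

    Per the example in the text: ~10 deg/s when the body center is at the
    neck/torso junction (the head is nearby), ~20 deg/s when it is at the
    torso center (the head is farther up). The tolerance `tol` is an
    illustrative assumption, not from the source."""
    if abs(body_center_y - neck_y) <= tol:
        return 10.0   # head is near: move slowly to avoid overshooting it
    if abs(body_center_y - torso_center_y) <= tol:
        return 20.0   # head is farther away: move faster to cut search time
    return 10.0       # conservative default for in-between positions
```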
Step 403: after the imaging angle has been moved, capture an environment image.
Step 404: determine whether the environment image includes the head region.
Head region recognition is performed on the currently collected environment image. If the environment image is recognized to include the head region, step 405 is performed; if not, step 406 is performed.
It should be noted that, for the process of head region recognition on the currently collected environment image, reference may be made to the related description in the foregoing embodiments, which is not repeated here.
Step 405: identify the face key points from the head region.
It should be noted that, for the process of identifying face key points from the head region, reference may be made to the related description in the foregoing embodiments, which is not repeated here.
Further, after the face key points are identified from the head region, the face center point is determined from the face key points, and focus following is performed on the face center point.
Step 406: detect the human-body key points of the target user from the environment image.
For the process of identifying human-body key points from the environment image, reference may be made to the related description in the foregoing embodiments, which is not repeated here.
If the environment image does not include the head region, or if face key points still cannot be detected in the head region, the human-body key points of the target user are detected from the environment image. Further, after the human-body key points are extracted, the body center point is determined from them, and focus following is performed on the body center point.
In the focus following method of the smart device of this embodiment of the present disclosure, the camera is moved on the basis of the detected human-body key points in order to detect face key points. If face key points are detected, the face center point is determined from them and focus following is performed on it; if face key points cannot be detected, the body center point is determined from the human-body key points and focus following is performed on it. On the premise that focus following is not lost, focus following is performed on the target user's face key points whenever possible, which improves the vividness and flexibility of interaction with the smart device.
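The fallback order described above — face center first, body center otherwise — can be summarized in a short sketch. The function and helper names are illustrative assumptions, and keypoints are assumed to be (x, y) tuples:

```python
def choose_focus_point(face_keypoints, body_keypoints):
    """Return (focus_point, source) following the method of FIG. 7: prefer
    the face center when face key points are available, and fall back to
    the body center determined from the human-body key points otherwise."""
    def centroid(points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    if face_keypoints:                  # face detected: follow the face center
        return centroid(face_keypoints), "face"
    if body_keypoints:                  # otherwise follow the body center
        return centroid(body_keypoints), "body"
    return None, "none"                 # nothing detected: focus is lost
```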
Based on the foregoing embodiment, if multiple users are present in the environment image collected by the smart device, the smart device needs to identify the target user who is willing to interact with it and perform focus following on that user. As a possible implementation, the target user may be selected according to the distance between each candidate target's body and the smart device. FIG. 8 is a schematic flowchart of a method for determining a target user according to an embodiment of the present disclosure. As shown in FIG. 8, the method includes the following steps.
Step 501: perform human body recognition on the environment image.
In this embodiment, the smart device can recognize human bodies in the environment image through face detection or human body detection.
Step 502: when multiple human bodies are recognized in the environment image, obtain the distance between each human body and the smart device.
Specifically, the smart device can recognize, from the collected environment image, every human body that has entered the monitoring range. In this embodiment, each recognized human body is treated as a candidate target. For the method of human body recognition, reference may be made to the description of the foregoing embodiments, which is not repeated here.
Further, the smart device obtains the distance between each human body in the environment image and the smart device. It can be understood that the closer a candidate target is to the smart device, the more likely there is an intention to interact between the candidate target and the smart device. Therefore, in this embodiment, the distance between a candidate target and the smart device is used as one basis for judging whether the candidate target intends to interact with the smart device.
In this embodiment, the distance between a candidate target and the smart device can be obtained through a depth camera, a binocular vision camera, or a lidar.
As one possible implementation, the smart device is equipped with a depth camera, through which a depth map of the candidate target is obtained. In a specific implementation, a structured-light projector projects controllable light spots, light stripes, or smooth-surface patterns onto the surface of the candidate target, and an image is obtained by the image sensor in the depth camera. Using the geometric relationship and the triangulation principle, the three-dimensional coordinates of the candidate target are calculated, from which the distance between the candidate target and the smart device can be obtained.
As another possible implementation, a binocular vision camera is arranged in the smart device and captures the candidate target. The disparity between the images captured by the binocular vision camera is then calculated, and the distance between the candidate target and the smart device is calculated from the disparity.
FIG. 9 is a schematic diagram of the principle of distance calculation by binocular vision according to an embodiment of the present disclosure. In FIG. 9, the positions O_l and O_r of the two cameras are drawn in the actual space, together with the optical axes of the left and right cameras and the focal planes of the two cameras; the focal planes are at a distance f from the plane in which the two cameras lie.
As shown in FIG. 9, p and p′ are the positions of the same candidate target P in the two captured images. The distance from point p to the left boundary of its captured image is x_l, and the distance from point p′ to the left boundary of its captured image is x_r. O_l and O_r are the two cameras, which lie in the same plane at a distance Z from each other.
Based on the triangulation principle, the distance b between the point P in FIG. 9 and the plane of the two cameras satisfies:
(Z − (x_l − x_r)) / Z = (b − f) / b
From this it can be derived that:
b = Z · f / d
where d = x_l − x_r is the disparity between the two images of the same candidate target captured by the binocular camera. Since Z and f are fixed, the distance b between the candidate target and the plane of the cameras — that is, the distance between the candidate target and the smart device — can be determined from the disparity d.
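Under the notation of FIG. 9, the disparity-to-distance computation reduces to a one-line formula. A minimal sketch, assuming the baseline Z and focal length f are known from calibration (the function name is illustrative):

```python
def depth_from_disparity(baseline_z: float, focal_f: float,
                         x_l: float, x_r: float) -> float:
    """Stereo triangulation as in FIG. 9: b = Z * f / d, with disparity
    d = x_l - x_r. The units of `baseline_z` determine the units of the
    result; `focal_f` and the image coordinates must share the same
    (pixel) units."""
    d = x_l - x_r
    if d <= 0:
        raise ValueError("non-positive disparity: point at infinity or matching error")
    return baseline_z * focal_f / d
```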
As a further possible implementation, a lidar is arranged in the smart device, and lasers are emitted into the monitoring range through the lidar; emitted laser light that encounters an obstacle within the monitoring range is reflected. The smart device receives the laser light returned by each obstacle within the monitoring range and generates a binary map of each obstacle from the returned laser light. Each binary map is then fused with the environment image, and the binary map corresponding to the candidate target is identified from all the binary maps. Specifically, the contour or size of each obstacle can be identified from its binary map and then matched against the contour or size of each target in the environment image, so that the binary map corresponding to the candidate target is obtained. After that, the laser return time of the binary map corresponding to the candidate target is multiplied by the speed of light and divided by 2 to obtain the distance between the candidate target and the smart device.
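The final time-of-flight step — multiplying the laser return time by the speed of light and dividing by 2 for the round trip — is a one-liner; a minimal sketch (the function name is illustrative):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def lidar_distance_m(return_time_s: float) -> float:
    """Round-trip time-of-flight distance as described in the text:
    the laser return time times the speed of light, divided by 2."""
    return return_time_s * SPEED_OF_LIGHT_M_PER_S / 2.0
```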
It should be noted that other methods for calculating the distance between a candidate target and the smart device are also within the scope of the embodiments of the present disclosure.
Step 503: select the human body closest to the smart device as the human body corresponding to the target user.
Specifically, since a candidate target that is far from the smart device may have no intention of interacting with it, the human body closest to the smart device is selected as the human body corresponding to the target user for focus following.
It should be noted that there may be more than one human body closest to the smart device; for example, multiple users may stand side by side in a row while visiting the smart device, while only the guide intends to interact with it. In this case, the smart device may query the registered-user face image library for the face images corresponding to the human bodies closest to the smart device in order to determine the target user; depending on the actual situation, the human body corresponding to the target user may be determined in different ways.
In a first example, if the face image library contains a face image corresponding to one human body closest to the smart device, that human body is taken as the human body corresponding to the target user.
In a second example, if the face image library contains no face image corresponding to any human body closest to the smart device, one of the human bodies closest to the smart device is randomly selected as the human body corresponding to the target user.
In a third example, if the face image library contains face images corresponding to multiple human bodies closest to the smart device, the human body found first in the query is taken as the human body corresponding to the target user.
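The selection rule of Step 503 together with the three examples above can be sketched as follows. The data shapes — a list of `(body_id, distance, face_id)` tuples and a set standing in for the registered-user face image library — are illustrative assumptions:

```python
def pick_target_body(bodies, registered_face_ids, eps=1e-6):
    """Select the target user per Step 503: take the nearest body; when
    several bodies are equally near, prefer the first one whose face is
    found in the registered-user face image library, otherwise fall back
    to an arbitrary nearest body."""
    if not bodies:
        return None
    d_min = min(b[1] for b in bodies)
    nearest = [b for b in bodies if b[1] - d_min <= eps]  # ties: same distance
    if len(nearest) == 1:
        return nearest[0][0]
    # Several nearest bodies: take the first whose face is registered.
    for body_id, _dist, face_id in nearest:
        if face_id is not None and face_id in registered_face_ids:
            return body_id
    # No registered face among them: fall back to a (here: the first) nearest body.
    return nearest[0][0]
```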
In the focus following method of the smart device of this embodiment of the present disclosure, candidate targets that intend to interact with the smart device are screened out from all candidate targets according to the distance between each candidate target and the smart device. Compared with directly treating a person as the interaction target as soon as a face is detected, this reduces false activations of the smart device.
To implement the foregoing embodiments, an embodiment of the present disclosure further provides a focus following device of a smart device. FIG. 10 is a schematic structural diagram of a focus following device of a smart device according to an embodiment of the present disclosure.
As shown in FIG. 10, the focus following device of the smart device includes: a detecting module 110, a determining module 120, and a control module 130.
The detecting module 110 is configured to detect face key points of a target user from an environment image collected by the smart device, and, when no face key points are detected in the environment image, to detect human-body key points of the target user from the environment image.
The determining module 120 is configured to determine a face center point from the face key points, and, when human-body key points are detected, to determine a body center point from the human-body key points.
The control module 130 is configured to control the smart device to perform focus following on the face center point, and, when the body center point is determined, to control the smart device to perform focus following on the body center point.
In a possible implementation of this embodiment, the control module 130 is specifically configured to: periodically determine whether the detected face center point or body center point lies within the image region; when the face center point or body center point does not lie within the image region, obtain the shortest path between the face center point or body center point and the center point of the image region; obtain, according to the shortest path, control information for controlling movement of the smart device; and control the smart device to move according to the control information, so that the detected face center point or body center point falls within the image region.
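A minimal sketch of the control module's region check, assuming the circular image region is given by its center and a radius in pixels; the returned displacement stands in for the control information, and the function name is illustrative:

```python
import math

def control_offset(focus_point, image_center, radius_px):
    """If the face or body center point already lies inside the circular
    image region, no movement is needed; otherwise return the displacement
    along the shortest (straight-line) path from the point to the region
    center, as described for the control module."""
    dx = image_center[0] - focus_point[0]
    dy = image_center[1] - focus_point[1]
    if math.hypot(dx, dy) <= radius_px:   # already inside the circle
        return (0.0, 0.0)
    return (dx, dy)                       # move toward the region center
```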
In a possible implementation of this embodiment, the detecting module 110 is specifically configured to: identify the head region of the target user from the environment image according to preset head features; and extract the face key points from the head region.
The determining module 120 is specifically configured to: if one face key point is extracted, take that face key point as the face center point; and if two or more face key points are extracted, obtain a first center point of all the extracted face key points and take the first center point as the face center point.
In a possible implementation of this embodiment, the determining module 120 is specifically configured to: take each face key point as a node and, starting from one of the nodes, connect all the nodes one by one to form a key-point figure covering all the nodes; and obtain the center point of the key-point figure and determine it as the first center point.
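The determining module's rule can be sketched as follows. The source does not define the center of the key-point figure precisely; this sketch uses the centroid of the key points as one plausible reading (the function name is illustrative):

```python
def face_center(keypoints):
    """Determine the face center point as described above: a single key
    point is used directly; with two or more, the center of the figure
    formed by connecting all key points is taken — approximated here by
    the centroid of the vertices."""
    if len(keypoints) == 1:
        return keypoints[0]
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```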
In a possible implementation of this embodiment, the detecting module 110 is specifically configured to: identify, from the collected image, the human body region located below the head region; after the human body region is identified, control the imaging angle of the pan-tilt camera of the smart device to move toward the direction of the head region; after the imaging angle has been moved, capture an environment image; determine whether the environment image includes the head region; if the environment image includes the head region, identify the face key points from the head region; and if the environment image does not include the head region, detect the human-body key points of the target user from the environment image.
Based on the foregoing embodiment, if multiple users are present in the environment image collected by the smart device, the smart device needs to identify the target user who is willing to interact with it, perform focus following, and generate an image region for focus following. FIG. 11 is a schematic structural diagram of a focus following device of another smart device according to an embodiment of the present disclosure. As shown in FIG. 11, in addition to the focus following device of the foregoing embodiment, the device further includes: a human body recognition module 210, a distance detecting module 220, a selecting module 230, and a generating module 240.
The human body recognition module 210 is configured to perform human body recognition on the environment image before the face key points of the target user are detected from the environment image collected by the smart device.
The distance detecting module 220 is configured to obtain the distance between each human body and the smart device when multiple human bodies are recognized in the environment image.
The selecting module 230 is configured to select the human body closest to the smart device as the human body corresponding to the target user.
The generating module 240 is configured to, before the face key points of the target user are identified in the environment image collected by the smart device, identify the center point of the environment image collected by the smart device and generate, with the center point of the environment image as a reference point, a circular image region for focus following.
In the focus following device of the smart device of this embodiment of the present disclosure, face key points of the target user are first detected from the environment image collected by the smart device, the face center point is determined from the face key points, and the smart device is controlled to perform focus following on the face center point. If no face key points are detected in the environment image, human-body key points of the target user are detected from the environment image, the body center point is determined from them, and the smart device is controlled to perform focus following on the body center point. The device thereby solves the technical problem that focus following cannot be maintained when face key points cannot be detected: the human-body key points serve as a complementary focus, and when the smart device detects no face key points it detects human-body key points from the collected image as the focus to follow. This avoids losing focus when the user lowers his or her head or turns around, and improves the success rate and accuracy of focus following.
To achieve the foregoing objects, an embodiment of the present disclosure further provides a smart device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit. The circuit board is arranged inside the space enclosed by the housing; the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to the circuits or devices of the smart device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the focus following method of the smart device described in the foregoing embodiments.
To achieve the foregoing objects, an embodiment of the present disclosure further provides a non-transitory computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the focus following method of the smart device described in the foregoing embodiments.
FIG. 12 is a block diagram of an exemplary smart device suitable for implementing the embodiments of the present application. As shown in FIG. 12, the smart device includes: a housing 310, a processor 320, a memory 330, a circuit board 340, and a power supply circuit 350. The circuit board 340 is arranged inside the space enclosed by the housing 310; the processor 320 and the memory 330 are arranged on the circuit board 340; the power supply circuit 350 is configured to supply power to the circuits or devices of the smart device; the memory 330 is configured to store executable program code; and the processor 320 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 330, so as to execute the focus following method of the smart device described in the foregoing embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine different embodiments or examples described in this specification, and features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "multiple" means at least two, such as two or three, unless otherwise specifically defined.
Any process or method description in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing steps of a custom logic function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be executed out of the order shown or discussed — including in a substantially concurrent manner or in the reverse order, depending on the functions involved — as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection portion (an electronic device) having one or more wires, a portable computer disk cartridge (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically — for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary — and then stored in a computer memory.
应当理解,本公开的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。如,如果用硬件来实现和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA),现场可编程门阵列(FPGA)等。It should be understood that portions of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware and in another embodiment, it can be implemented by any one or combination of the following techniques well known in the art: discrete with logic gates for implementing logic functions on data signals Logic circuits, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,该程序在执行时,包括方法实施例的步骤之一或其组合。One of ordinary skill in the art can understand that all or part of the steps carried by the method of implementing the above embodiments can be completed by a program to instruct related hardware, and the program can be stored in a computer readable storage medium. When executed, one or a combination of the steps of the method embodiments is included.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically separately, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure; a person of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.

Claims (18)

  1. A focus following method for a smart device, comprising the following steps:
    detecting face key points of a target user from an environment image collected by the smart device, determining a face center point according to the face key points, and controlling the smart device to perform focus following on the face center point;
    if no face key point is detected from the environment image, detecting body key points of the target user from the environment image, determining a body center point according to the body key points, and controlling the smart device to perform focus following on the body center point.
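The face-first, body-fallback control flow of claim 1 can be sketched as follows. This is an illustrative Python sketch only; the detector and motion callables are hypothetical stand-ins, and the patent does not prescribe any particular implementation.

```python
def compute_center(keypoints):
    """Center of a set of (x, y) key points: the point itself when there
    is one, or the centroid when there are two or more."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def follow_focus(detect_face_keypoints, detect_body_keypoints, move_focus, image):
    """Claim 1 control flow: prefer the face center point; fall back to
    the body center point when no face key points are detected."""
    face_kps = detect_face_keypoints(image)
    if face_kps:
        move_focus(compute_center(face_kps))  # follow the face center point
        return "face"
    body_kps = detect_body_keypoints(image)
    if body_kps:
        move_focus(compute_center(body_kps))  # follow the body center point
        return "body"
    return None  # nothing detected: no focus target this frame
```

The fallback ordering matters: a face center gives a tighter focus target, so the body center is consulted only when the face is lost (e.g., the user turns away).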
  2. The method according to claim 1, wherein before detecting the face key points of the target user from the environment image collected by the smart device, the method further comprises:
    identifying a center point of the environment image collected by the smart device, and generating, with the center point of the environment image as a reference point, a circular image region used for focus following.
  3. The method according to claim 2, wherein performing focus following comprises:
    periodically determining whether the detected face center point or body center point is within the image region;
    when the face center point or body center point is not within the image region, obtaining a shortest path between the face center point or body center point and the center point of the image region;
    obtaining, according to the shortest path, control information for controlling movement of the smart device;
    controlling the smart device to move according to the control information, so that the detected face center point or body center point falls within the image region.
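The region check and shortest-path computation described in claims 2 and 3 reduce to simple plane geometry: the shortest path between two points is the straight segment joining them. A minimal sketch, with hypothetical names not taken from the patent:

```python
import math

def focus_correction(center, region_center, region_radius):
    """Claims 2-3 sketch: if the tracked center point lies outside the
    circular image region, return the straight-line (shortest-path)
    displacement toward the region's center point; otherwise return None,
    meaning no movement control information is needed."""
    dx = region_center[0] - center[0]
    dy = region_center[1] - center[1]
    if math.hypot(dx, dy) <= region_radius:
        return None       # center point already inside the circular region
    return (dx, dy)       # displacement vector to derive control information
```

In practice the returned displacement would be converted into device-specific control information (e.g., pan/tilt increments); that mapping is hardware-dependent and left open here, as in the claim.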
  4. The method according to any one of claims 1-3, wherein detecting the face key points of the target user from the environment image collected by the smart device and determining the face center point according to the face key points comprises:
    identifying a head region of the target user from the environment image according to preset head features;
    extracting the face key points from the head region;
    if one face key point is extracted, taking the face key point as the face center point;
    if two or more face key points are extracted, obtaining a first center point of all the extracted face key points, and taking the first center point as the face center point.
  5. The method according to claim 4, wherein obtaining the first center point of all the extracted face key points comprises:
    taking each face key point as a node, and, starting from one of the nodes, connecting all the nodes one by one to form a key point graph covering all the nodes;
    obtaining a center point of the key point graph, and determining the center point of the key point graph as the first center point.
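Claim 5 connects the key points into a graph and takes that graph's center as the first center point, but it does not fix the exact geometric definition of "center". One plausible reading, sketched below, is the center of the graph's bounding box; the centroid of the vertices would be another valid interpretation.

```python
def keypoint_graph_center(keypoints):
    """Claim 5 sketch (one interpretation): treat the key points as nodes
    of a connected graph and approximate the graph's center as the center
    of its bounding box. The patent leaves the precise definition open."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)
```

For roughly symmetric facial key points the bounding-box center and the vertex centroid nearly coincide, so either choice yields a stable face center point for tracking.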
  6. The method according to any one of claims 1-5, wherein detecting the body key points of the target user from the environment image comprises:
    performing identification on a collected body region located below the head region;
    after the body region is identified, controlling an imaging angle of a pan-tilt camera of the smart device to move toward the direction of the head region;
    after the imaging angle has moved, capturing an environment image;
    determining whether the environment image includes the head region;
    if the environment image includes the head region, identifying the face key points from the head region;
    if the environment image does not include the head region, detecting the body key points of the target user from the environment image.
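The reacquisition loop of claim 6 — tilt the pan-tilt camera toward where the head should be, recapture, then branch on whether the head is back in view — can be sketched as below. All callables and the camera interface are illustrative assumptions, not APIs from the patent.

```python
def reacquire_face(camera, detect_body_region, detect_head_region, detect_face_keypoints):
    """Claim 6 sketch: when a body region (below the head region) is
    identified, move the pan-tilt camera's imaging angle toward the head,
    capture a fresh environment image, and branch on head visibility."""
    image = camera.capture()
    if detect_body_region(image) is not None:
        camera.tilt_toward_head()   # move imaging angle toward the head region
        image = camera.capture()    # capture a new environment image
    head = detect_head_region(image)
    if head is not None:
        return ("face", detect_face_keypoints(head))  # head back in view
    return ("body", None)           # head still absent: fall back to body key points
```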
  7. The method according to any one of claims 1-6, wherein before detecting the face key points of the target user from the environment image collected by the smart device, the method further comprises:
    performing human body recognition on the environment image;
    when a plurality of human bodies are identified from the environment image, obtaining a distance between each human body and the smart device;
    selecting the human body closest to the smart device as the human body corresponding to the target user.
  8. The method according to claim 7, wherein selecting the human body closest to the smart device as the human body corresponding to the target user comprises:
    when there are a plurality of human bodies closest to the smart device, querying whether a registered-user face image library of the smart device contains face images corresponding to the human bodies closest to the smart device;
    if the face image library contains a face image corresponding to one of the human bodies closest to the smart device, taking that human body as the human body corresponding to the target user;
    if the face image library contains no face image corresponding to any of the human bodies closest to the smart device, randomly selecting one of the human bodies closest to the smart device as the human body corresponding to the target user;
    if the face image library contains face images corresponding to a plurality of the human bodies closest to the smart device, taking the first-queried human body closest to the smart device as the human body corresponding to the target user.
  9. A focus following apparatus for a smart device, comprising:
    a detection module, configured to detect face key points of a target user from an environment image collected by the smart device, and, when no face key point is detected from the environment image, detect body key points of the target user from the environment image;
    a determination module, configured to determine a face center point according to the face key points, and, when the body key points are detected, determine a body center point according to the body key points;
    a control module, configured to control the smart device to perform focus following on the face center point, and, when the body center point is determined, control the smart device to perform focus following on the body center point.
  10. The apparatus according to claim 9, further comprising:
    a generation module, configured to, before the face key points of the target user are identified in the environment image collected by the smart device, identify a center point of the environment image collected by the smart device, and generate, with the center point of the environment image as a reference point, a circular image region used for focus following.
  11. The apparatus according to claim 10, wherein the control module is specifically configured to:
    periodically determine whether the detected face center point or body center point is within the image region;
    when the face center point or body center point is not within the image region, obtain a shortest path between the face center point or body center point and the center point of the image region;
    obtain, according to the shortest path, control information for controlling movement of the smart device;
    control the smart device to move according to the control information, so that the detected face center point or body center point falls within the image region.
  12. The apparatus according to any one of claims 9-11, wherein:
    the detection module is specifically configured to: identify a head region of the target user from the environment image according to preset head features, and extract the face key points from the head region;
    the determination module is specifically configured to: if one face key point is extracted, take the face key point as the face center point; and if two or more face key points are extracted, obtain a first center point of all the extracted face key points and take the first center point as the face center point.
  13. The apparatus according to claim 12, wherein the determination module is specifically configured to:
    take each face key point as a node, and, starting from one of the nodes, connect all the nodes one by one to form a key point graph covering all the nodes;
    obtain a center point of the key point graph, and determine the center point of the key point graph as the first center point.
  14. The apparatus according to any one of claims 9-13, wherein the detection module is specifically configured to:
    perform identification on a collected body region located below the head region;
    after the body region is identified, control an imaging angle of a pan-tilt camera of the smart device to move toward the direction of the head region;
    after the imaging angle has moved, capture an environment image;
    determine whether the environment image includes the head region;
    if the environment image includes the head region, identify the face key points from the head region;
    if the environment image does not include the head region, detect the body key points of the target user from the environment image.
  15. The apparatus according to any one of claims 9-14, further comprising:
    a human body recognition module, configured to perform human body recognition on the environment image before the face key points of the target user are detected from the environment image collected by the smart device;
    a distance detection module, configured to, when a plurality of human bodies are identified from the environment image, obtain a distance between each human body and the smart device;
    a selection module, configured to select the human body closest to the smart device as the human body corresponding to the target user.
  16. The apparatus according to claim 15, wherein the selection module is specifically configured to:
    when there are a plurality of human bodies closest to the smart device, query whether a registered-user face image library of the smart device contains face images corresponding to the human bodies closest to the smart device;
    if the face image library contains a face image corresponding to one of the human bodies closest to the smart device, take that human body as the human body corresponding to the target user;
    if the face image library contains no face image corresponding to any of the human bodies closest to the smart device, randomly select one of the human bodies closest to the smart device as the human body corresponding to the target user;
    if the face image library contains face images corresponding to a plurality of the human bodies closest to the smart device, take the first-queried human body closest to the smart device as the human body corresponding to the target user.
  17. A smart device, comprising a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside a space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to the circuits or components of the smart device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the focus following method for a smart device according to any one of claims 1-8.
  18. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the focus following method for a smart device according to any one of claims 1-8 is implemented.
PCT/CN2019/078747 2018-03-21 2019-03-19 Focus tracking method and device of smart apparatus, smart apparatus, and storage medium WO2019179441A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810236920.1A CN108733280A (en) 2018-03-21 2018-03-21 Focus follower method, device, smart machine and the storage medium of smart machine
CN201810236920.1 2018-03-21

Publications (1)

Publication Number Publication Date
WO2019179441A1 true WO2019179441A1 (en) 2019-09-26

Family

ID=63941065

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078747 WO2019179441A1 (en) 2018-03-21 2019-03-19 Focus tracking method and device of smart apparatus, smart apparatus, and storage medium

Country Status (3)

Country Link
CN (1) CN108733280A (en)
TW (1) TWI705382B (en)
WO (1) WO2019179441A1 (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733280A (en) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 Focus follower method, device, smart machine and the storage medium of smart machine
CN109373904A (en) * 2018-12-17 2019-02-22 石家庄爱赛科技有限公司 3D vision detection device and 3D vision detection method
CN109740464B (en) * 2018-12-21 2021-01-26 北京智行者科技有限公司 Target identification following method
CN109781008B (en) * 2018-12-30 2021-05-25 北京猎户星空科技有限公司 Distance measuring method, device, equipment and medium
CN110197117B (en) * 2019-04-18 2021-07-06 北京奇艺世纪科技有限公司 Human body contour point extraction method and device, terminal equipment and computer readable storage medium
CN110084207A (en) * 2019-04-30 2019-08-02 惠州市德赛西威智能交通技术研究院有限公司 Automatically adjust exposure method, device and the storage medium of face light exposure
CN111639515A (en) * 2020-01-16 2020-09-08 上海黑眸智能科技有限责任公司 Target loss retracing method, device, system, electronic terminal and storage medium
CN113518474A (en) * 2020-03-27 2021-10-19 阿里巴巴集团控股有限公司 Detection method, device, equipment, storage medium and system
CN111860403A (en) * 2020-07-28 2020-10-30 商汤国际私人有限公司 Scene information detection method and device and electronic equipment
CN112672062B (en) * 2020-08-21 2022-08-09 海信视像科技股份有限公司 Display device and portrait positioning method
CN112702652A (en) * 2020-12-25 2021-04-23 珠海格力电器股份有限公司 Smart home control method and device, storage medium and electronic device
CN113572957B (en) * 2021-06-26 2022-08-05 荣耀终端有限公司 Shooting focusing method and related equipment
CN113183157A (en) * 2021-07-01 2021-07-30 德鲁动力科技(成都)有限公司 Method for controlling robot and flexible screen interactive quadruped robot

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101167086A (en) * 2005-05-31 2008-04-23 实物视频影像公司 Human detection and tracking for security applications
CN103077403A (en) * 2012-12-30 2013-05-01 信帧电子技术(北京)有限公司 Pedestrian counting method and device
CN103890498A (en) * 2011-10-18 2014-06-25 三菱电机株式会社 Air conditioner indoor unit
WO2016202764A1 (en) * 2015-06-15 2016-12-22 Thomson Licensing Apparatus and method for video zooming by selecting and tracking an image area
CN107038418A (en) * 2017-03-24 2017-08-11 厦门瑞为信息技术有限公司 A kind of intelligent air condition dual camera follows the trail of the method for obtaining clear human body image
WO2018020275A1 (en) * 2016-07-29 2018-02-01 Unifai Holdings Limited Computer vision systems
CN108733280A (en) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 Focus follower method, device, smart machine and the storage medium of smart machine

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201001043A (en) * 2008-06-25 2010-01-01 Altek Corp Method of auto focusing on faces used by digital imaging device
JP5978639B2 (en) * 2012-02-06 2016-08-24 ソニー株式会社 Image processing apparatus, image processing method, program, and recording medium
CN104732210A (en) * 2015-03-17 2015-06-24 深圳超多维光电子有限公司 Target human face tracking method and electronic equipment
CN104935844A (en) * 2015-06-17 2015-09-23 四川长虹电器股份有限公司 Method for automatically adjusting screen orientation according to face orientation of looker and television
CN106407882A (en) * 2016-07-26 2017-02-15 河源市勇艺达科技股份有限公司 Method and apparatus for realizing head rotation of robot by face detection


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241961A (en) * 2020-01-03 2020-06-05 精硕科技(北京)股份有限公司 Face detection method and device and electronic equipment
CN111241961B (en) * 2020-01-03 2023-12-08 北京秒针人工智能科技有限公司 Face detection method and device and electronic equipment
CN111968163A (en) * 2020-08-14 2020-11-20 济南博观智能科技有限公司 Thermopile array temperature measurement method and device
CN111968163B (en) * 2020-08-14 2023-10-10 济南博观智能科技有限公司 Thermopile array temperature measurement method and device
CN112866773A (en) * 2020-08-21 2021-05-28 海信视像科技股份有限公司 Display device and camera tracking method in multi-person scene
CN112866773B (en) * 2020-08-21 2023-09-26 海信视像科技股份有限公司 Display equipment and camera tracking method in multi-person scene

Also Published As

Publication number Publication date
TW201941098A (en) 2019-10-16
CN108733280A (en) 2018-11-02
TWI705382B (en) 2020-09-21

Similar Documents

Publication Publication Date Title
WO2019179441A1 (en) Focus tracking method and device of smart apparatus, smart apparatus, and storage medium
WO2019179442A1 (en) Interaction target determination method and apparatus for intelligent device
CN111989537B (en) System and method for detecting human gaze and gestures in an unconstrained environment
CN109034013B (en) Face image recognition method, device and storage medium
JP5950973B2 (en) Method, apparatus and system for selecting a frame
WO2019179443A1 (en) Continuous wake-up method and apparatus for intelligent device, intelligent device, and storage medium
JP5001930B2 (en) Motion recognition apparatus and method
JP7113013B2 (en) Subject head tracking
JP2012022411A (en) Information processing apparatus and control method thereof, and program
JP2006343859A (en) Image processing system and image processing method
EP2198391A1 (en) Long distance multimodal biometric system and method
JP4992823B2 (en) Face detection apparatus and face detection method
JP2015184054A (en) Identification device, method, and program
CN111212226A (en) Focusing shooting method and device
WO2022021093A1 (en) Photographing method, photographing apparatus, and storage medium
CN112655021A (en) Image processing method, image processing device, electronic equipment and storage medium
JP2008203995A (en) Object shape generation method, object shape generation device and program
CN113093907B (en) Man-machine interaction method, system, equipment and storage medium
Debnath et al. Detection and controlling of drivers' visual focus of attention
Yu et al. Perspective-aware convolution for monocular 3d object detection
Kim et al. Simulation of face pose tracking system using adaptive vision switching
JP6468755B2 (en) Feature point detection system, feature point detection method, and feature point detection program
EP3246793A1 (en) Virtual reality display
CN112711324B (en) Gesture interaction method and system based on TOF camera
US11435745B2 (en) Robot and map update method using the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19770929

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19770929

Country of ref document: EP

Kind code of ref document: A1