WO2021008207A1 - Target tracking method and apparatus, intelligent mobile device and storage medium - Google Patents

Target tracking method and apparatus, intelligent mobile device and storage medium

Info

Publication number
WO2021008207A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
offset
target object
value
target
Prior art date
Application number
PCT/CN2020/089620
Other languages
French (fr)
Chinese (zh)
Inventor
张军伟 (Zhang Junwei)
Original Assignee
上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Priority to KR1020217014152A (published as KR20210072808A)
Priority to JP2021525569A (published as JP2022507145A)
Publication of WO2021008207A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47 Detecting features for summarising video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • The embodiments of the present application relate to the field of computer vision technology, and in particular, but not exclusively, to a target tracking method and apparatus, a smart mobile device, and a storage medium.
  • smart mobile devices such as remote control cars and mobile robots are used in various fields.
  • remote control cars can be used as teaching tools to achieve target tracking.
  • the embodiment of the present application proposes a target tracking method and device, smart mobile equipment and storage medium.
  • An embodiment of the present application provides a target tracking method, including: acquiring a captured image; determining the position of a target object in the image; and determining, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, wherein the control instruction is used to cause the target object to be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to each offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
  • Before determining the position of the target object in the image, the method further includes performing a preprocessing operation on the image, the preprocessing operation including: adjusting the image to a grayscale image of a preset specification, and performing normalization processing on the grayscale image; wherein determining the position of the target object in the image includes: performing target detection processing on the image obtained after the preprocessing operation to obtain the position of the target object in the preprocessed image; and determining the position of the target object in the original image based on the position of the target object in the preprocessed image.
  • Performing normalization processing on the grayscale image includes: determining the average value and standard deviation of the pixel values of the pixels in the grayscale image; obtaining the difference between the pixel value of each pixel and the average value; and determining the ratio between that difference and the standard deviation as the normalized pixel value of each pixel.
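As a minimal sketch of this normalization step (assuming the grayscale image is a NumPy array; the small epsilon guard against a zero standard deviation is an added implementation detail, not part of the original):

```python
import numpy as np

def normalize_grayscale(gray: np.ndarray) -> np.ndarray:
    """Normalize a grayscale image as described above: subtract the mean
    pixel value, then divide by the standard deviation."""
    gray = gray.astype(np.float32)
    mean = gray.mean()   # average of all pixel values
    std = gray.std()     # standard deviation of all pixel values
    return (gray - mean) / (std + 1e-8)  # epsilon avoids division by zero
```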
  • Determining the location of the target object in the image includes: extracting image features of the image; performing classification processing on the image features to obtain the location area of the target object in the image; and determining the center position of the location area as the position of the target object.
  • The target object includes a human face; correspondingly, determining the position of the target object in the image includes determining the position of the human face in the image.
  • Determining the control instruction for controlling the rotation of the smart mobile device based on the distance between the position of the target object and the center position of the image includes: determining a target offset based on the distance between the position of the target object in the image and the center position of the image; generating multiple sets of offset sequences based on the target offset, where the sum of the offset values in each set of offset sequences is the target offset; and using a reinforcement learning algorithm to select an offset sequence that meets the requirements from the multiple sets of offset sequences and determine the control instruction corresponding to that offset sequence.
  • Using a reinforcement learning algorithm to select an offset sequence that meets the requirements from the multiple sets of offset sequences includes: for each offset value in the multiple sets of offset sequences, determining the maximum value corresponding to the offset value in a value table, where the value table includes the values corresponding to the offset value under different rotation instructions; obtaining the reward value corresponding to the offset value, and determining the final value of the offset value based on the reward value and the maximum value, where the reward value is the distance between the position of the target object and the center position of the image when the rotation instruction corresponding to the maximum value of the offset value has not yet been executed; and determining the offset sequence with the largest sum of final values among the multiple sets of offset sequences as the offset sequence that meets the requirements.
  • Determining the control instruction corresponding to the offset sequence that meets the requirements includes: determining the control instruction based on the rotation instruction corresponding to the maximum value of each offset value in the offset sequence that meets the requirements.
  • the method further includes: driving the smart mobile device to perform rotation based on the control instruction.
  • The method further includes: determining a control instruction for controlling the movement of the smart mobile device based on the location area of the target object, wherein, in response to the area corresponding to the location area of the target object being greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and in response to the area corresponding to the location area of the target object being less than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, where the first threshold is greater than the second threshold.
  • An embodiment of the application provides a target tracking device, which includes: an image acquisition module configured to acquire an image; a target detection module configured to determine the position of a target object in the image; and a control module configured to determine, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of the smart mobile device, wherein the control instruction is used to cause the position of the target object to be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to each offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
  • The device further includes a preprocessing module configured to perform a preprocessing operation on the image, the preprocessing operation including: adjusting the image to a grayscale image of a preset specification, and performing normalization processing on the grayscale image. The target detection module is further configured to perform target detection processing on the image obtained after the preprocessing operation to obtain the position of the target object in the preprocessed image, and to determine the position of the target object in the original image based on the position of the target object in the preprocessed image.
  • The step in which the preprocessing module performs the normalization processing on the grayscale image includes: determining the average value and standard deviation of the pixel values of the pixels in the grayscale image; obtaining the difference between the pixel value of each pixel and the average value; and determining the ratio between that difference and the standard deviation as the normalized pixel value of each pixel.
  • The target detection module is further configured to extract image features of the image, perform classification processing on the image features to obtain the location area of the target object in the image, and determine the center position of the location area as the position of the target object.
  • the target object includes a human face; correspondingly, the target detection module is further configured to determine the position of the human face in the image.
  • The control module is further configured to determine a target offset based on the distance between the position of the target object in the image and the center position of the image; generate multiple sets of offset sequences based on the target offset, where the sum of the offset values in each set of offset sequences is the target offset; and use a reinforcement learning algorithm to select an offset sequence that meets the requirements from the multiple sets of offset sequences and obtain the control instruction corresponding to that offset sequence.
  • The control module is further configured to: for each offset value in the multiple sets of offset sequences, determine the maximum value corresponding to the offset value in the value table, where the value table includes the values corresponding to the offset value under different rotation instructions; obtain the reward value corresponding to the offset value, and determine the final value of the offset value based on the reward value and the maximum value, where the reward value is the distance between the position of the target object and the center of the image when the rotation instruction corresponding to the maximum value of the offset value has not yet been executed; and determine the offset sequence with the largest sum of final values among the multiple sets of offset sequences as the offset sequence that meets the requirements.
  • control module is further configured to determine the control instruction based on the rotation instruction corresponding to the maximum value of each offset value in the offset sequence that meets the requirements.
  • The target detection module is further configured to determine a control instruction for controlling the movement of the smart mobile device based on the location area of the target object, wherein, if the area corresponding to the location area of the target object is greater than the first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and if the area corresponding to the location area of the target object is less than the second threshold, a control instruction for controlling the smart mobile device to move forward is generated, where the first threshold is greater than the second threshold.
  • An embodiment of the present application provides a smart mobile device, which includes the target tracking device. The target detection module in the target tracking device is integrated in the management device of the smart mobile device, and the management device performs the target detection processing on the image collected by the image acquisition module to obtain the position of the target object. The control module is connected with the management device and is used to generate the control instruction according to the position of the target object obtained by the management device, and to control the rotation of the smart mobile device through the control instruction.
  • The management device is also integrated with the preprocessing module of the target tracking device, which is used to perform preprocessing operations on the images; target detection processing is then performed on the preprocessed images to obtain the position of the target object in the image.
  • the smart mobile device includes an educational robot.
  • An embodiment of the present application provides a smart mobile device, which includes: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the target tracking method described in any one of the above items.
  • An embodiment of the present application provides a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the target tracking method described in any one of the first aspect is implemented.
  • An embodiment of the present application provides a computer program, including computer-readable code; when the computer-readable code is executed, the processor in the smart mobile device executes the target tracking method.
  • The target tracking method and apparatus, smart mobile device, and storage medium provided by the embodiments of the application can obtain the position of the target object in the collected image, and obtain the control instruction of the smart mobile device according to the distance between the position of the target object and the image center. The control instruction is used to control the rotation of the smart mobile device, and includes a rotation instruction corresponding to each of at least one offset value, where the offset sequence formed by the offset values is determined by the distance between the position of the target object and the image center. The obtained control instruction can cause the target object to be at the center of the collected image after the rotation, so that the target object is within the tracking range of the smart mobile device.
  • the target tracking method and device, smart mobile device, and storage medium provided in the embodiments of the present application can perform target tracking according to the position of the target object in real time, which is more convenient and accurate.
  • FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of this application
  • FIG. 2 is a schematic diagram of a process of performing preprocessing on an image provided by an embodiment of the application
  • FIG. 3 is a schematic flowchart of step S20 in a target tracking method provided by an embodiment of this application;
  • FIG. 4 is a schematic flowchart of step S30 in a target tracking method provided by an embodiment of this application;
  • FIG. 5 is a schematic flowchart of step S303 in a target tracking method provided by an embodiment of this application;
  • FIG. 6 is a schematic diagram of another process of a target tracking method provided by an embodiment of the application.
  • FIG. 7 is an application example diagram of a target tracking method provided by an embodiment of the application.
  • FIG. 8 is a schematic flowchart of a preprocessing process provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the training process of the target detection network provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of the application process of the target detection network provided by an embodiment of this application.
  • FIG. 11 is a schematic flowchart of a path planning algorithm based on reinforcement learning provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a target tracking device provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of a smart mobile device provided by an embodiment of this application.
  • the embodiment of the application provides a target tracking method, which can be applied to any smart mobile device with image processing function.
  • the target tracking method can be applied to devices such as mobile robots, remote-controlled vehicles, and aircraft.
  • the target tracking method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the application. As shown in FIG. 1, the target tracking method includes:
  • Step S10 Obtain the collected image
  • The smart mobile device to which the target tracking method of the embodiments of the present application is applied may include an image acquisition device, such as a camera.
  • images can be directly collected by an image collection device, or video data can be collected by the image collection device, and the video data can be subjected to frame division or frame selection processing to obtain corresponding images.
  • Step S20 Determine the position of the target object in the image
  • Target detection processing can be performed on the captured image; that is, it is detected whether the target object exists in the captured image, and when the target object exists, the position of the target object is determined.
  • the target detection processing can be realized through a neural network.
  • the target object detected by the embodiment of the present application may be any type of object, for example, the target object may be a human face, or the target object may be another object to be tracked, which is not specifically limited in the embodiment of the present application.
  • The target object may be an object with a specific known identity; that is, the embodiments of the present application can perform tracking of a corresponding type of object (such as all face images) or tracking of an object with a specific identity (such as a known specific face image), which can be set according to requirements and is not specifically limited in the embodiment of the present application.
  • The neural network that implements the target detection processing may be a convolutional neural network; after training, the neural network can accurately detect the position of the target object in the image. The embodiment of the present application does not limit the form of the neural network.
  • In the process of performing target detection processing on the image, feature extraction is performed on the image to obtain image features, classification processing is then performed on the image features to obtain the location area of the target object in the image, and the location of the target object is determined based on the location area.
  • The classification result obtained by the classification processing may include an identifier of whether the target object exists in the image, such as a first identifier or a second identifier, where the first identifier indicates that the pixel corresponding to the current position in the image belongs to the target object, and the second identifier indicates that it does not. The position of the target object in the image can be determined by the area formed by the first identifiers; for example, the center position of that area can be determined as the position of the target object.
  • the position of the target object in the image can be directly obtained, for example, the position of the target object can be expressed in the form of coordinates.
  • the center position of the position area of the target object in the image may be used as the position of the target object.
  • In the case where no target object is detected, the output position is empty.
  • Step S30 Determine a control instruction for controlling the rotation of the smart mobile device based on the distance between the position of the target object and the center position of the image, wherein the control instruction is used to cause the position of the target object to be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to each offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
  • When the position of the target object in the image is obtained, the smart mobile device can be controlled to move according to the position, so that the target object can be located at the center of the collected image, thereby realizing tracking of the target object.
  • the embodiment of the present application can obtain a control instruction for controlling the rotation of the smart mobile device according to the distance between the position of the target object in the image and the center position of the image, so that the position of the target object can be located at the center of the currently collected image .
  • The control instruction may include rotation instructions respectively corresponding to at least one offset value, where the distance between the position of the target object and the center position of the image is determined according to the offset sequence formed by the at least one offset value. The distance in the embodiment of the present application can be a directed distance (such as a direction vector), and each offset value can also be a direction vector. The direction vector corresponding to the distance can be obtained by adding the direction vectors corresponding to the offset values; that is, by executing the rotation instruction corresponding to each offset value, the offset of each offset value can be realized, finally placing the target object at the center of the currently collected image.
  • In this way, starting from the moment when the image following the current image is captured, the target object may always be located at the center of the captured image.
  • The embodiment of the application can quickly adjust the rotation of the smart mobile device according to the position of the target object in the previous image, so that the target object is at the center of the collected image; even when the target object is moving, it is possible to track and shoot the target object so that the target object remains in the frame of the collected image.
  • the embodiment of the present application may use a reinforcement learning algorithm to execute the planning of the rotation path of the smart mobile device, and obtain a control instruction for positioning the target object in the center of the image.
  • the control instruction may be determined based on the reinforcement learning algorithm
  • the reinforcement learning algorithm may be a value learning algorithm (Q-learning algorithm).
  • The movement path of the smart mobile device is optimized based on a comprehensive evaluation of the movement time, the convenience of the movement path, and the energy consumption of the smart mobile device, and the control instructions corresponding to the optimal movement path are obtained.
  • the embodiment of the present application can conveniently and accurately realize real-time tracking of the target object, and control the rotation of the smart mobile device according to the position of the target object, so that the target object is located in the center of the collected image.
  • The control instruction of the smart mobile device can be obtained according to the distance between the position of the target object in the image and the center position of the image. The control instruction is used to control the rotation of the smart mobile device, and includes a rotation instruction corresponding to each of at least one offset value, where the offset sequence formed by the offset values is determined by the distance between the position of the target object and the center of the image. The obtained control instruction can place the target object at the center of the captured image after the rotation, thereby keeping the target object within the tracking range of the smart mobile device.
  • The embodiment of the present application can perform target tracking according to the position of the target object in real time, which is more convenient and accurate and improves the performance of the smart mobile device.
  • the embodiment of the present application may perform target detection processing on the image when the image is collected.
  • Since the specifications, types, and other parameters of the collected images may differ, preprocessing operations can be performed on the images before target detection processing to obtain normalized images.
  • the method further includes performing a preprocessing operation on the image.
  • FIG. 2 is a schematic diagram of the image preprocessing process provided by an embodiment of the application. As shown in FIG. 2, the preprocessing operation includes:
  • Step S11 Adjust the image to a grayscale image of a preset specification.
  • The captured image may be a color image or an image in another form; the captured image can be converted into an image of a preset specification, and the image of the preset specification can then be converted into a grayscale image.
  • The preset specification may be 640*480, but this is not a specific limitation of the embodiment of the present application. Converting a color image or an image in another form into a grayscale image can be based on processing of the pixel values; for example, the pixel value of each pixel can be divided by the maximum pixel value, and the corresponding grayscale value can be obtained based on the result. This is only illustrative, and the embodiment of the present application does not specifically limit the process.
  • By converting the image into a grayscale image and sending it directly to the network model for detection, the embodiment of the present application can reduce resource consumption and increase processing speed.
  • Step S12 Perform normalization processing on the grayscale image.
  • normalization processing can be performed on the grayscale image.
  • the pixel values of the image can be normalized to the same scale range.
  • The normalization processing may include: determining the average value and standard deviation of the pixel values of the pixels in the grayscale image; determining the difference between the pixel value of each pixel and the average value; and determining the ratio between that difference and the standard deviation as the normalized pixel value of the pixel.
  • The images collected in the embodiment of the present application may be one or multiple. In the case of one collected image, one grayscale image is obtained, and the average value and standard deviation of its pixel values can be computed; the pixel value of each pixel is then updated to the ratio between its difference from the average value and the standard deviation.
  • In the case of multiple grayscale images, the average value and standard deviation of the pixel values can be determined over the pixels of all the grayscale images; that is, the average value and standard deviation in the embodiment of the present application may be for one image or for multiple images. The difference between the pixel value of each pixel of each image and the average value is obtained, the ratio between that difference and the standard deviation is computed, and this ratio is used to update the pixel value of the pixel.
  • the pixel value of each pixel in the grayscale image can be unified to the same scale, and the normalization processing of the collected image can be realized.
  • The preprocessing may also be performed in other manners; for example, it is possible to only convert the image to the preset specification and perform normalization processing on the image of the preset specification. That is, the embodiment of the present application may also perform normalization processing on color images.
  • In that case, the average value and standard deviation of the feature value of each channel over the pixels in the color image can be obtained; for example, the mean and standard deviation of the feature value (R value) of the red (Red, R) channel, the mean and standard deviation of the feature value (G value) of the green (Green, G) channel, and the mean and standard deviation of the feature value (B value) of the blue (Blue, B) channel can be obtained. Then, the new feature value of the corresponding color channel is obtained as the ratio between the difference between the feature value of that channel and its average value, and the standard deviation. In this way, the updated feature value of each color channel of each pixel of each image is obtained, and a normalized image is obtained.
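A hedged sketch of this per-channel color normalization, assuming an H×W×3 NumPy array in R, G, B channel order (the channel order and the epsilon guard are assumptions):

```python
import numpy as np

def normalize_color(image: np.ndarray) -> np.ndarray:
    """Normalize each color channel (R, G, B) independently: for every
    channel, subtract that channel's mean and divide by its standard
    deviation, as described above."""
    image = image.astype(np.float32)
    out = np.empty_like(image)
    for ch in range(image.shape[2]):  # iterate over the R, G, B channels
        mean = image[:, :, ch].mean()
        std = image[:, :, ch].std()
        out[:, :, ch] = (image[:, :, ch] - mean) / (std + 1e-8)
    return out
```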
  • the embodiments of the present application can be applied to different types of images and images of different scales during implementation, thereby improving the applicability of the embodiments of the present application.
  • After the position of the target object in the preprocessed image is obtained, the position of the target object in the originally collected image can be determined according to the position of the target object in the preprocessed image.
  • The following only takes target detection processing on the collected image as an example for description; the process of performing target detection on the preprocessed image is the same, and the description is not repeated here.
  • FIG. 3 is a schematic flowchart of step S20 in a target tracking method according to an embodiment of the application. As shown in FIG. 3, the determining the position of the target object in the image includes:
  • Step S201 Extract image features of the image
  • the image features of the image can be extracted first, for example, the image features can be obtained by convolution processing.
  • The target detection processing can be realized by a neural network, where the neural network can include a feature extraction module and a classification module; the feature extraction module may include at least one convolutional layer, and may also include a pooling layer.
  • the feature extraction module can extract the features of the image.
  • the feature extraction process may also be performed in the structure of the residual network to obtain image features, which is not specifically limited in the embodiment of the present application.
  • Step S202 Perform classification processing on the image features to obtain the location area of the target object in the image.
  • classification processing can be performed on image features.
  • the classification module performing the classification processing can include a fully connected layer, and the detection result of the target object in the image, that is, the location area of the target object, is obtained through the fully connected layer.
  • The location area of the target object in the embodiments of the present application can be expressed in the form of coordinates, such as the location coordinates of two diagonal vertices of the detection frame corresponding to the location area of the detected target object, or the location coordinates of one vertex together with the height or width of the detection frame.
  • the result of the classification process in the embodiment of the present application may include whether there is an object of the target type in the image, that is, the target object, and the location area of the target object.
  • The first identifier and the second identifier can be used to indicate whether an object of the target type exists, and the location area where the target object is located can be indicated in the form of coordinates. For example, the first identifier can be 1, indicating that a target object exists; conversely, the second identifier can be 0, indicating that no target object exists; and (x1, x2, y1, y2) are the horizontal and vertical coordinate values corresponding to two vertices of the detection frame.
  • Step S203 Determine the center position of the position area as the position of the target object.
  • the center position of the detected position area of the target object may be determined as the position of the target object.
  • The average value of the coordinate values of the four vertices of the location area where the target object is located can be taken to obtain the coordinates of the center position, and the coordinates of the center position are then determined as the position of the target object.
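A minimal sketch of turning such a detection result into the target position, assuming the result is given as a presence flag plus two diagonal vertices of the detection frame (the function name and argument layout are illustrative):

```python
from typing import Optional, Tuple

def target_position(flag: int, x1: float, y1: float,
                    x2: float, y2: float) -> Optional[Tuple[float, float]]:
    """flag: 1 if a target object was detected (first identifier),
    0 otherwise (second identifier). (x1, y1) and (x2, y2) are two
    diagonal vertices of the detection frame."""
    if flag == 0:
        return None  # no target object: the output position is empty
    # The center of the detection frame is used as the target's position.
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```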
  • The target object can be a face, and the target detection processing can be face detection processing; that is, the location area where the face is located in the image can be detected, the position of the face can be obtained according to the center of that location area, and target tracking can then be performed for the face.
  • the embodiments of the present application can obtain the position of the target object with high accuracy, and improve the accuracy of target tracking.
  • the above-mentioned preprocessing and target detection process can be performed by the management device of the smart mobile device.
  • the management device may be a Raspberry Pi chip.
  • Raspberry Pi chip has high scalability and high processing speed.
  • the obtained information about the location of the target object, etc. may be transmitted to the control terminal of the smart mobile device to obtain the control instruction.
  • In the embodiment of the present application, the detection result of the target object may be encapsulated and transmitted according to a preset data format.
  • the detection result indicates the position of the target object in the image.
  • The data corresponding to the transmitted detection result can be 80 bytes, and can include a mode flag, detection result information, a cyclic redundancy check (CRC), a retransmission threshold, a control field, and an optional field.
  • The mode flag bit can indicate the current working mode of the Raspberry Pi chip; the detection result information can be the position of the target object; the CRC check bit is used for security verification; the retransmission threshold is used to indicate the maximum number of retransmissions of the data; the control field is used to indicate the desired working mode of the smart mobile device; and the optional field carries information that can be added as needed.
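The exact byte layout of the 80-byte packet is not given in the text, so the following sketch only illustrates the idea: pack the described fields, append a CRC over the body, and pad the optional field out to 80 bytes. All field widths and their order are assumptions.

```python
import struct
import zlib

# Hypothetical layout: mode flag, 4 bounding-box coordinates,
# retransmission threshold, control field (widths are illustrative).
PACKET_FMT = "<B4iBB"

def build_packet(mode: int, bbox: tuple, retrans: int, control: int) -> bytes:
    body = struct.pack(PACKET_FMT, mode, *bbox, retrans, control)
    crc = struct.pack("<I", zlib.crc32(body))  # CRC computed over the body
    return (body + crc).ljust(80, b"\x00")     # pad optional field to 80 bytes

packet = build_packet(mode=1, bbox=(120, 80, 200, 160), retrans=3, control=0)
assert len(packet) == 80
```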
  • FIG. 4 is a schematic flowchart of step S30 in a target tracking method provided by an embodiment of the application. As shown in FIG. 4, step S30 can be implemented through the following steps:
  • Step S301 Determine a target offset based on the distance between the position of the target object in the image and the center position of the image;
  • When tracking the target object in the embodiments of the present application, the position of the target object can be maintained at the center of the image, and tracking of the target object is achieved in this way. Therefore, when the position of the target object is obtained, the distance between the position of the target object and the center position of the image can be detected, and this distance is used as the target offset. The Euclidean distance between the coordinates of the position of the target object and the coordinates of the center position of the image can be used as the target offset.
  • The distance can also be expressed in the form of a vector; for example, it can be expressed as a directed vector between the center position of the image and the position of the target object. That is, the obtained target offset may include the distance between the position of the target object and the center position of the image, and may also include the direction of the center of the image relative to the position of the target object.
  • Step S302 Generate multiple sets of offset sequences based on the target offset, where each offset sequence includes at least one offset value, and the sum of the offset values in each set of offset sequences is the target offset;
  • The embodiment of the present application may generate multiple sets of offset sequences according to the obtained target offset; each offset sequence includes at least one offset value, and the sum of the at least one offset value is the target offset. For example, if the position of the target object is (100, 0) and the position of the image center is (50, 0), the target offset is 50 on the x-axis.
  • multiple offset sequences can be generated.
  • For example, the offset values of the first offset sequence may be 10, 20, and 20, and the offset values of the second offset sequence may be 10, 25, and 15, where the direction of each offset value can be the positive direction of the x-axis.
  • multiple sets of offset sequences corresponding to the target offset can be obtained.
  • the number of offset values in the generated multiple sets of offset sequences may be set, for example, it may be 3, but it is not a specific limitation in the embodiment of the present application.
  • The method of generating the multiple sets of offset sequences may be random generation. In practice, there may be many combinations of offset values that can achieve the target offset; the embodiment of the present application may randomly select a preset number of combinations from these combinations, that is, a preset number of offset sequences.
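A small sketch of such random generation, assuming non-negative integer offset values that must sum to the target offset (the sequence length and count are illustrative parameters):

```python
import random

def random_offset_sequences(target: int, n_values: int = 3,
                            n_sequences: int = 5) -> list:
    """Randomly generate offset sequences whose values sum to the target."""
    sequences = []
    for _ in range(n_sequences):
        # Draw n_values - 1 cut points in [0, target]; the gaps between
        # consecutive cut points form one offset sequence.
        cuts = sorted(random.randint(0, target) for _ in range(n_values - 1))
        seq = [b - a for a, b in zip([0] + cuts, cuts + [target])]
        sequences.append(seq)
    return sequences

# e.g. a target offset of 50 might yield [10, 20, 20] or [10, 25, 15]
print(random_offset_sequences(50))
```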
  • Step S303 Using a reinforcement learning algorithm, select an offset sequence that meets the requirements from the multiple sets of offset sequences, and obtain a control instruction corresponding to the offset sequence that meets the requirements.
  • a reinforcement learning algorithm when the generated offset sequence is obtained, a reinforcement learning algorithm may be used to select an offset sequence that meets the requirements.
  • the reinforcement learning algorithm can be used to obtain the total value corresponding to the offset sequence, and the offset sequence with the highest total value is determined as the offset sequence that meets the requirements.
  • Fig. 5 is a schematic flowchart of step S303 in a target tracking method provided by an embodiment of the application. As shown in Fig. 5, step S303 ("using a reinforcement learning algorithm, select an offset sequence that meets the requirements from the multiple sets of offset sequences, and obtain the control instruction corresponding to the offset sequence that meets the requirements") may include:
  • Step S3031 For each offset value in the multiple sets of offset sequences, determine the maximum value corresponding to the offset value in the value table, and the value table includes the value corresponding to the offset value under different rotation commands;
  • The reinforcement learning algorithm may be a value learning algorithm (Q-learning algorithm), and the corresponding value table (Q-table) may indicate the value (quality) corresponding to different offset values under different rotation instructions.
  • Rotation instructions refer to instructions that control the rotation of smart mobile devices, which can include parameters such as motor rotation angle, motor speed, and motor rotation time.
  • The value table in the embodiment of the present application may be a value table obtained in advance through reinforcement learning, where the parameters of the value table can accurately distinguish and reflect the values corresponding to different rotation instructions under different offset values.
  • Table 1 shows at least a part of the parameters of the rotation command
  • Table 2 shows a schematic table of the value table.
  • In Table 2, the horizontal parameters a1, a2, and a3 are different rotation instructions, and the vertical parameters s1, s2, and s3 are different offset values.
  • the parameter in the table indicates the value of the corresponding offset value and the corresponding rotation command.
  • The value can represent the worth of the corresponding rotation instruction under the corresponding offset value; generally, a larger number indicates a higher value, meaning that performing target tracking through that instruction is more valuable.
  • Each offset sequence may include multiple offset values, and the embodiment of the present application may determine the maximum value corresponding to each offset value in each sequence based on the value table. For example, for the offset value s1 the maximum value is 3, for the offset value s2 the maximum value is 2, and for the offset value s3 the maximum value is 4.
  • the foregoing is only an exemplary description, and the obtained value may be different for different value tables, which is not specifically limited in the embodiment of the present application.
  • Step S3032 Obtain the reward value corresponding to the offset value, and determine the final value of the offset value based on the reward value corresponding to the offset value and the maximum value, wherein the reward value is the distance between the position of the target object and the center position of the image when the rotation instruction corresponding to the offset value has not yet been executed;
  • The reward value of each offset value in the offset sequence can be obtained, where the reward value is related to the position of the target object before the rotation corresponding to that offset value is executed.
  • For the first offset value, the position of the target object is the initially detected position of the target object in the image; for each subsequent offset value, the position of the target object is the assumed position after the rotation instructions corresponding to the maximum values of the preceding offset values have been executed.
  • For example, assuming that the position of the target object in the detected image is (100, 0) and the obtained offset sequence that satisfies the condition is 20, 15, and 15: the reward value of the first offset value is determined based on the position (100, 0) of the target object; after the first offset is executed, the position of the target object can be determined to be (120, 0), and the reward value of the second offset value is determined based on this position; when the third offset value is executed, the position of the target object is determined to be (135, 0), and the reward value of the third offset value is determined based on this position.
  • The expression for obtaining the reward value can be as shown in formula (1-1):

    R(s, a) = sqrt((s(x) - b)^2 + (s(y) - c)^2)    (1-1)

  • where R(s, a) is the reward value of the rotation instruction a of the maximum value corresponding to the offset value s, that is, the reward value corresponding to the offset value s; s(x) and s(y) are respectively the abscissa and ordinate of the position of the target object when the rotation instruction a corresponding to the maximum value of the offset value has not yet been executed; and b and c represent the abscissa and ordinate of the center position of the image, respectively.
  • The final value of the offset value can be determined according to the reward value corresponding to the offset value and the maximum value corresponding to the offset value.
  • The weighted sum of the reward value and the maximum value can be used to determine the final value.
  • The expression for determining the final value of the offset value in the embodiment of the present application may be as shown in formula (1-2):

    Q'(s, a) = R(s, a) + γ·maxQ(s, a)    (1-2)

  • where Q'(s, a) is the final value corresponding to the offset value s, R(s, a) is the reward value of the rotation instruction a of the maximum value corresponding to the offset value s, maxQ(s, a) is the maximum value corresponding to the offset value s in the value table, and γ is the weighting coefficient.
  • Step S3033 Determine the offset sequence with the largest sum of the final value as the offset sequence that meets the requirements.
  • the final value of each offset value in the offset sequence may be summed to obtain the total value corresponding to the offset sequence. Then select the offset sequence with the largest total value as the offset sequence that meets the requirements.
  • the offset sequence with the largest total value can be obtained, and the maximum total value indicates that the rotation instruction corresponding to the rotation path corresponding to the offset sequence is the optimal choice.
  • The control instruction can be generated by combining the rotation instructions corresponding, in the value table, to the maximum value of each offset value in the selected offset sequence.
  • the control instruction can then be transmitted to the smart mobile device, so that the smart mobile device performs a rotation operation according to the control instruction.
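A sketch of this selection procedure, combining formulas (1-1) and (1-2): for each offset value, look up the best rotation instruction in the value table, compute the reward as the distance from the image center before that rotation, accumulate the final values, and keep the sequence with the largest total. The Q-table layout, the value of the weighting coefficient γ, and the assumption that offsets act along the horizontal axis are all illustrative.

```python
import math

GAMMA = 0.8  # weighting coefficient γ of formula (1-2); value is an assumption

def select_offset_sequence(sequences, q_table, start_pos, center):
    """q_table maps an offset value to {rotation_instruction: value},
    mirroring the value table (Table 2). Returns the best sequence and
    the rotation instructions forming the control instruction."""
    best = None
    for seq in sequences:
        pos = list(start_pos)
        total, instructions = 0.0, []
        for offset in seq:
            # Rotation instruction with the maximum value for this offset.
            action, max_q = max(q_table[offset].items(), key=lambda kv: kv[1])
            # Reward (formula 1-1): distance of the target object from the
            # image center before the rotation instruction is executed.
            reward = math.hypot(pos[0] - center[0], pos[1] - center[1])
            total += reward + GAMMA * max_q  # final value, formula (1-2)
            instructions.append(action)
            pos[0] += offset                 # assumed horizontal offset
        if best is None or total > best[0]:
            best = (total, seq, instructions)
    return best  # (total value, chosen sequence, rotation instructions)
```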
  • the smart mobile device can be controlled to move according to the generated control instruction.
  • the control command may include parameters such as the rotation angle and direction of the motor, or may also include control commands such as the motor speed, the motor rotation time, whether to stop or not.
  • the embodiment of the present application may control the movement of the mobile device by means of differential steering.
  • The smart mobile device may be a smart mobile vehicle, which may include left and right drive wheels. The embodiment of the present application may control the rotation speeds of the left and right drive wheels based on the control instructions to realize steering and movement: when the drive wheels rotate at different speeds, the body will turn even if there is no steering wheel or the steering wheel does not move.
  • the difference in the rotational speed of the two driving wheels can be realized by operating two separate clutches or braking devices installed on the left and right half shafts.
  • The intelligent mobile device can realize different rotation trajectories according to the different rotation speeds and rotation angles of the left and right driving wheels. Under different rotation trajectories, the images collected by the vehicle differ; through continuous optimization, the position of the intelligent mobile vehicle is adjusted to ensure that the target object is at the center of the image, achieving tracking of the target object.
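A minimal sketch of the differential-steering idea, assuming a standard differential-drive model (the wheel-base value and function signature are illustrative, not taken from the original):

```python
def wheel_speeds(linear: float, angular: float, wheel_base: float = 0.12):
    """Convert a desired forward speed and turning rate into left/right
    wheel speeds. A speed difference between the two drive wheels turns
    the body without any steering wheel."""
    left = linear - angular * wheel_base / 2.0
    right = linear + angular * wheel_base / 2.0
    return left, right

# Pure rotation in place: equal and opposite wheel speeds.
print(wheel_speeds(0.0, 1.0))  # -> (-0.06, 0.06)
```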
  • FIG. 6 is a schematic diagram of another process of a target tracking method provided by an embodiment of the application. As shown in FIG. 6, the target tracking method further includes:
  • Step S41 Determine a control instruction for controlling the movement of the smart mobile device based on the location area of the target object, wherein it can be determined whether the area of the location area of the target object is within the range between the second threshold and the first threshold.
  • the location area of the target object in the collected image can be obtained, and the embodiment of the present application can control the moving direction of the smart mobile device according to the area of the location area.
  • the area of the location area can be determined according to the obtained location area of the target object, and the area can be compared with the first threshold and the second threshold.
  • the first threshold and the second threshold may be preset reference thresholds, the first threshold is greater than the second threshold, and the embodiment of the present application does not limit specific values.
  • Step S42 In the case that the area corresponding to the location area of the target object is greater than the first threshold, generate a control instruction for controlling the smart mobile device to move backward;
  • In this case, a control instruction for controlling the smart mobile device to move backward can be generated until the area of the detected location area of the target object is less than the first threshold and greater than the second threshold.
  • Step S43 In the case that the area corresponding to the location area of the target object is smaller than the second threshold, generate a control instruction for controlling the smart mobile device to move forward, where the first threshold is greater than the second threshold.
  • When the area of the detected location area of the target object is smaller than the second threshold, it means that the target object is far from the smart mobile device, and the smart mobile device can be moved forward at this time.
  • a control instruction for controlling the advancement of the smart mobile device can be generated until the area of the detected location area of the target object is less than the first threshold and greater than the second threshold.
  • the smart mobile device can perform a forward or backward operation according to the received forward or backward control instruction.
  • The movement of the smart mobile device can thus be controlled according to the size of the target object: by keeping the area corresponding to the location area of the detected target object (such as a human face) between the second threshold and the first threshold, control of the moving direction of the smart mobile device is realized.
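A minimal sketch of this area-based movement control (the command names are illustrative):

```python
def movement_command(area: float, first_threshold: float,
                     second_threshold: float) -> str:
    """Keep the target's apparent size between the two thresholds:
    back up when it looks too large (too close), advance when it looks
    too small (too far). first_threshold > second_threshold."""
    if area > first_threshold:
        return "backward"
    if area < second_threshold:
        return "forward"
    return "stop"  # area within range: hold position
```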
  • the application body of the target tracking method in the embodiment of the present application may be a smart mobile device, or may also be a device installed in the smart mobile device, and the device is used to control the movement of the smart mobile device.
  • In the following, the intelligent mobile device to which the target tracking method of the embodiment of the present application is applied is an educational robot, the management device of the educational robot is a Raspberry Pi, and the target object is a human face; this example is described to clearly illustrate the embodiments of the present application.
  • FIG. 7 is an application example diagram of a target tracking method provided by an embodiment of the application, in which the camera 701 is connected to the Raspberry Pi 702 to transmit the image or video collected by the camera 701 to the Raspberry Pi 702. The camera 701 can be connected to the Raspberry Pi 702 through a Universal Serial Bus (USB) port for data transmission, but the embodiment of the application is not limited to this connection method. The following process can then be performed.
  • the application field of the embodiment of the present application may be an intelligent robot in an educational background, and the intelligent robot may realize the functions of face detection and tracking.
  • The Raspberry Pi 702 can perform image processing; specifically, in the embodiment of the present application, the Raspberry Pi 702 can perform image preprocessing and target detection processing.
  • The Raspberry Pi can be integrated with a target detection network. Since the types of images collected by the camera 701 vary, the Raspberry Pi 702 needs to perform the necessary preprocessing on the image data before transmitting the images to the target detection network model.
  • Fig. 8 is a schematic flow chart of the preprocessing process provided by an embodiment of the application, as shown in Fig. 8, including:
  • Step S51 Receive the collected video data.
  • Step S52 Framing the video data into picture data.
  • Step S53 unify the picture size.
  • Step S54 Convert the picture into a grayscale image.
  • Step S55 Normalize the picture.
  • Image framing refers to decomposing the collected video data into individual frames; the image size is then unified to 640*480. Since color images consume considerable resources during processing but have little impact on the detection effect, the embodiment of the present application ignores color features, directly converts the image to a grayscale image, and sends it to the target detection network for detection. Finally, for convenience of image processing, the image is normalized: the average value of each dimension is subtracted from the original data of that dimension, the result replaces the original data, and the data of each dimension is then divided by the standard deviation of that dimension, so that the image data is normalized to the same scale.
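A sketch of the whole preprocessing pipeline described above, assuming OpenCV is available for video framing, resizing, and grayscale conversion (the function name and epsilon guard are illustrative):

```python
import cv2
import numpy as np

def preprocess_video(path: str):
    """Split video into frames, resize to 640*480, convert to grayscale,
    and normalize (subtract the mean, divide by the standard deviation)."""
    frames = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()  # framing: one image per video frame
        if not ok:
            break
        frame = cv2.resize(frame, (640, 480))            # unify picture size
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # drop color features
        gray = gray.astype(np.float32)
        gray = (gray - gray.mean()) / (gray.std() + 1e-8)  # normalize
        frames.append(gray)
    cap.release()
    return frames
```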
  • The input of this stage is the picture collected by the camera 701, and the output is the face detection coordinate position.
  • The target detection network in the Raspberry Pi 702 can perform face recognition and detection in the image; that is, the embodiment of the application can use deep learning technology to realize face detection, where the deep-learning-based face detection technology is divided into two stages: model training and model application.
  • FIG. 9 is a schematic diagram of the training process of the target detection network provided in an embodiment of the application. As shown in FIG. 9, the training process includes:
  • Step S61: Collect face data set pictures.
  • the face data set pictures include face pictures of various ages and regions, and the face pictures are manually labeled to obtain the coordinate positions of the faces. A face data set is constructed and divided into three parts: a training set, a test set, and a validation set.
  • Step S62: Construct a neural network model.
  • step S62 can be implemented through the following steps:
  • Step S621: Feature extraction is achieved by stacking convolutional layers and pooling layers.
  • Step S622: Use a classifier to classify the extracted features.
  • classification can be achieved through a fully connected layer (classifier).
  • Step S63: Train the neural network model.
  • Model training is achieved through a series of gradient optimization algorithms. After many training iterations, a trained model is obtained for model testing.
  • Step S64: Obtain the trained neural network model.
  • the training process of the model is the training process of the target detection network (neural network model).
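As a concrete illustration of steps S621-S622, the following PyTorch sketch stacks convolution and pooling layers for feature extraction and ends with a fully connected layer. The framework choice, layer sizes, and the 4-value bounding-box output are assumptions; the disclosure only specifies the conv/pool stack plus a classifier.

```python
# A minimal sketch of the network structure in steps S621-S622; the
# layer sizes and output parameterization are illustrative assumptions.
import torch
import torch.nn as nn

class FaceDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Step S621: feature extraction by stacking conv and pooling layers.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Step S622: a fully connected layer maps the extracted features
        # to a face bounding-box coordinate position.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 120 * 160, 4),  # (x1, y1, x2, y2) for a 480x640 input
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```

In the application stage (steps S71-S73), the preprocessed picture would be passed through this model's forward computation to obtain the face coordinate position; training would proceed with a standard gradient optimizer over the labeled face data set.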
  • FIG. 10 is a schematic diagram of the application process of the target detection network provided by an embodiment of the application. As shown in FIG. 10, the application process includes:
  • Step S71: Collect a face picture.
  • Step S72: Send the preprocessed picture to the trained model.
  • Step S73: Obtain the coordinate position of the face.
  • the pre-processed picture is sent to the trained model, and the coordinate position of the face in the picture can be output after forward calculation.
  • the face coordinate position detection can be completed by the Raspberry Pi 702, after which the face coordinate position is encapsulated into a data packet according to a defined communication protocol specification. After the data encapsulation is completed, the packet is sent through the serial port to the processor or controller in the smart mobile device 703, where the smart mobile device 703 may be an educational robot EV3; the smart mobile device 703 then completes subsequent face tracking according to the received face position.
  • EV3 performs path planning according to the coordinates of the face position.
  • the educational robot EV3 receives and parses the data packet sent from the Raspberry Pi 702 side to obtain the coordinate position of the face, and then completes the path planning.
  • reinforcement learning algorithms can be used to realize path planning.
  • Reinforcement learning mainly involves three elements: state, reward, and action.
  • the state is the face coordinate position detected each time;
  • the reward can be defined as the Euclidean distance between the center of the face and the center of the picture;
  • the action is the motor motion instruction executed each time.
  • the motor motion can be controlled as shown in Table 1.
  • on this basis, path planning can be performed. The Q function takes a state and an action as input and returns the value of performing the given action in the given state (see the sketch after the steps below).
  • FIG. 11 is a schematic flowchart of a path planning algorithm based on reinforcement learning provided by an embodiment of the application, as shown in FIG. 11, including:
  • Step S81: Initialize the Q value table.
  • Step S82: Select a specific motor execution command from the action set.
  • Step S83: Execute the specific motor execution instruction.
  • Step S84: Calculate the Q value for this state.
  • Step S85: Update the Q value table.
  • the action set of the educational robot EV3 is shown in Table 1.
  • the state set uses the face coordinates to determine the tracking effect; that is, the distance between the face position and the center of the picture is used as the reward function, and the Q value table is updated by measuring the reward function of different actions.
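The loop in steps S81-S85 corresponds to standard tabular Q-learning. The sketch below is one way to realize it; the state discretization, action count, learning rate, discount factor, and the sign convention of the reward (the negative face-to-center distance, so that smaller distances score higher) are assumptions for illustration.

```python
# A minimal tabular Q-learning sketch of the update loop in Fig. 11;
# sizes, hyperparameters, and the reward sign are assumptions.
import numpy as np

N_STATES, N_ACTIONS = 100, 4               # discretized face positions x motor commands
q_table = np.zeros((N_STATES, N_ACTIONS))  # Step S81: initialize the Q value table
alpha, gamma = 0.1, 0.9                    # learning rate and discount factor

def choose_action(state: int, epsilon: float = 0.1) -> int:
    """Step S82: select a motor command from the action set."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)  # explore
    return int(q_table[state].argmax())      # exploit the current Q table

def q_update(state: int, action: int, reward: float, next_state: int) -> None:
    """Steps S84-S85: compute and update the Q value for this state."""
    best_next = q_table[next_state].max()    # best value over the action set
    q_table[state, action] += alpha * (reward + gamma * best_next
                                       - q_table[state, action])
```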
  • the smart mobile device 703 implements face tracking according to the motion instructions (same as the control instructions in the above embodiments).
  • Smart mobile devices such as educational robots use a differential steering mechanism: the vehicle steers by controlling the speeds of the left and right driving wheels 704 and 705.
  • When the driving wheels rotate at different speeds, the body turns even if there is no steering wheel or the steering wheel is not moved.
  • the difference in driving-wheel speed can be realized by operating two separate clutches or braking devices mounted on the left and right axles.
  • the smart mobile device 703 can realize different rotation trajectories according to different rotation speeds and rotation angles of the left and right wheels. Under different rotation trajectories, the pictures collected by the vehicle differ; the action is then continuously optimized and the position of the vehicle adjusted, finally ensuring that the face position is at the center of the picture, realizing the face tracking function.
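A differential-steering rule of this kind can be sketched as a simple proportional controller on the horizontal face offset; the base speed and gain below are assumed values, not taken from the disclosure.

```python
# Illustrative differential-steering rule: left/right driving-wheel speeds
# derived from the horizontal face offset; parameters are assumptions.
def wheel_speeds(face_x: float, image_width: int = 640,
                 base_speed: float = 0.5, gain: float = 0.002):
    """Return (left, right) wheel speeds; a speed difference turns the body."""
    offset = face_x - image_width / 2   # signed pixel offset from center
    turn = gain * offset                # proportional steering term
    # Rotating the wheels at different speeds turns the body toward the face.
    return base_speed + turn, base_speed - turn
```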
  • the smart mobile device in the embodiment of the present application may also be provided with a sensor 706, such as a distance sensor or a touch sensor, for sensing information about the surrounding environment of the smart mobile device 703; the working mode and movement parameters of the smart mobile device 703 can be controlled based on the sensed information about the surrounding environment.
  • the target tracking method can obtain the position of the target object in the collected image and obtain a control instruction for the smart mobile device according to the distance between the position of the target object and the image center.
  • the control instruction is used to adjust the rotation angle of the smart mobile device.
  • the obtained control instruction includes a rotation instruction corresponding to at least one offset value, wherein the offset sequence formed by the offset values is determined from the distance between the target object and the image center;
  • the obtained control instruction enables the target object, after the rotation, to be at the center of the collected image, so that the target object is within the tracking range of the smart mobile device.
  • the embodiments of the present application can perform target tracking in real time according to the position of the target object, are more convenient and accurate, and improve the performance of the smart mobile device.
  • the embodiments of the present application can use deep learning technology to complete face detection (using neural networks to achieve target detection), which has significantly improved accuracy and speed compared to traditional target detection methods.
  • a reinforcement learning algorithm may also be used to perform path planning through Q-learning technology, and the optimal rotation path may be selected.
  • the embodiments of the present application can also be adapted to the requirements of different scenarios and have good scalability.
  • the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
  • the embodiments of the present application also provide target tracking devices, smart mobile devices, computer-readable storage media, and programs, all of which can be used to implement any target tracking method provided in the embodiments of the present application.
  • FIG. 12 is a schematic structural diagram of a target tracking device provided by an embodiment of the application. As shown in FIG. 12, the target tracking device includes:
  • the image acquisition module 10 is configured to acquire images
  • the target detection module 20 is configured to determine the position of the target object in the image
  • the control module 30 is configured to determine a control instruction for controlling the rotation of the smart mobile device based on the distance between the position of the target object and the center position of the image, wherein the control instruction is used to make the position of the target object be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to an offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
  • the device further includes a preprocessing module configured to perform a preprocessing operation on the image, the preprocessing operation including: adjusting the image to a grayscale image of a preset specification, and performing normalization processing on the grayscale image;
  • the target detection module is further configured to perform target detection processing on the image obtained after the preprocessing operation to obtain the position of the target object in the image after the preprocessing operation;
  • the step of the preprocessing module performing the normalization processing on the grayscale image includes: determining the mean and standard deviation of the pixel values in the grayscale image; obtaining the difference between the pixel value of each pixel and the mean; and determining the ratio between the difference corresponding to each pixel and the standard deviation as the normalized pixel value of that pixel.
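Written out, this normalization is a standard per-image standardization (a restatement of the steps above, not additional disclosure), where p_i is the pixel value of the i-th of N pixels:

```latex
p_i' = \frac{p_i - \mu}{\sigma}, \qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(p_i - \mu\right)^2}
```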
  • the target detection module is further configured to extract image features of the image, perform classification processing on the image features to obtain the location area of the target object in the image, and determine the center position of the location area as the position of the target object.
  • the target object includes a human face
  • the target detection module is further configured to determine the position of the human face in the image.
  • the control module is further configured to determine a target offset based on the distance between the position of the target object in the image and the center position of the image, generate multiple sets of offset sequences based on the target offset such that the offset values in each set sum to the target offset, and use a reinforcement learning algorithm to select an offset sequence that meets the requirements;
  • the control module is further configured to determine, for each offset value in the multiple sets of offset sequences, the maximum value corresponding to the offset value in a value table, the value table including the values corresponding to offset values under different rotation instructions;
  • the reward value corresponding to the offset value is obtained, and the final value of the offset value is determined based on the reward value and the maximum value corresponding to the offset value, the reward value being the distance between the position of the target object and the image center in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed;
  • the offset sequence with the largest sum of the final values of its offset values among the multiple sets of offset sequences is determined as the offset sequence that meets the requirements.
  • the control module is further configured to determine the control instruction based on the rotation instruction corresponding to the maximum value of each offset value in the offset sequence that meets the requirements.
  • the target detection module is further configured to determine a control instruction for controlling the movement of the smart mobile device based on the location area of the target object, wherein: when the area corresponding to the location area of the target object is greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and when the area corresponding to the location area of the target object is smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
  • an embodiment of the present application also provides a smart mobile device that includes the target tracking device described in the above embodiment, where the target detection network in the target tracking device is integrated in the management device of the smart mobile device, and the management device performs target detection processing on the image collected by the image acquisition module to obtain the position of the target object;
  • the control module is connected to the management device and is used to generate the control instruction according to the position of the target object obtained by the management device and to control the rotation of the smart mobile device according to the control instruction.
  • the management device is a Raspberry Pi.
  • the smart mobile device includes an educational robot.
  • the management device is also integrated with the preprocessing module of the target tracking device, configured to perform preprocessing operations on the images and to perform target detection processing on the preprocessed images to obtain the position of the target object in the image.
  • the functions or modules included in the apparatus provided in the embodiments of the present application can be configured to execute the methods described in the above method embodiments, and for specific implementation, refer to the description of the above method embodiments.
  • the embodiment of the present application also proposes a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the foregoing method when executed by a processor.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present application also proposes an intelligent mobile device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the above method.
  • FIG. 13 is a schematic structural diagram of a smart mobile device provided by an embodiment of this application.
  • the smart mobile device 800 may be any device capable of performing image processing or a mobile device capable of performing target tracking.
  • the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • the processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
  • the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
  • the memory 804 is configured to store various types of data to support the operation of the device 800. Examples of these data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 806 provides power for various components of the device 800.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
  • the multimedia component 808 includes a screen that provides an output interface between the device 800 and the user.
  • the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Pad, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC).
  • the microphone When the device 800 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive external audio signals.
  • the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
  • the audio component 810 further includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 814 includes one or more sensors for providing the device 800 with various aspects of status assessment.
  • the sensor component 814 can detect the on/off status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor component 814 can also detect a position change of the device 800 or a component of the device 800, the presence or absence of contact between the user and the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800.
  • the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices.
  • the device 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to implement the above methods.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the device 800 to implement the foregoing methods.
  • the embodiments of the application may be systems, methods and/or computer program products.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the embodiments of the present application.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the above.
  • the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of the embodiments of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is customized by using the state information of the computer-readable program instructions, and the electronic circuit can execute computer-readable program instructions to implement various aspects of the embodiments of the present application.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, such that when these instructions are executed by the processor of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, program segment, or part of an instruction that contains one or more executable instructions for implementing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from that marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the embodiment of the application discloses a target tracking method and device, a smart mobile device, and a storage medium.
  • the method includes: acquiring a captured image; determining the position of a target object in the image; and obtaining, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling a smart mobile device, wherein the control instruction is used to make the position of the target object be located at the center position of the image, the control instruction includes a rotation instruction corresponding to an offset value in an offset sequence constituting the distance, and the offset sequence includes at least one offset value.
  • the embodiments of the present application can realize real-time tracking of target objects.

Abstract

A target tracking method and apparatus, an intelligent mobile device and a storage medium. The method comprises: obtaining an acquired image (S10); determining the position of a target object in the image (S20); and obtaining, on the basis of the distance between the position of the target object and the central position of the image, a control instruction for controlling an intelligent mobile device to rotate (S30), wherein the control instruction is used for positioning the target object at the center of the image, the control instruction comprises a rotation instruction corresponding to an offset value in an offset sequence constituting the distance, and the offset sequence comprises at least one offset value.

Description

Target tracking method and apparatus, intelligent mobile device and storage medium
Cross-reference to related applications
This application is filed based on the Chinese patent application with application number 201910646696.8, filed on July 17, 2019, and claims priority to that Chinese patent application, the entire content of which is hereby incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of computer vision technology, and relate to, but are not limited to, a target tracking method and apparatus, an intelligent mobile device, and a storage medium.
Background
At present, smart mobile devices such as remote-control cars and mobile robots are applied in various fields. For example, in the education industry, a remote-control car can be used as a teaching tool to realize target tracking.
Summary of the invention
The embodiments of the present application propose a target tracking method and apparatus, an intelligent mobile device, and a storage medium.
An embodiment of the present application provides a target tracking method, including: acquiring a captured image; determining the position of a target object in the image; and determining, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, wherein the control instruction is used to make the target object be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to an offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
In some embodiments of the present application, before determining the position of the target object in the image, the method further includes performing a preprocessing operation on the image, the preprocessing operation including: adjusting the image to a grayscale image of a preset specification, and performing normalization processing on the grayscale image; wherein determining the position of the target object in the image includes: performing target detection processing on the image obtained after the preprocessing operation to obtain the position of the target object in the preprocessed image; and determining the position of the target object in the image based on the position of the target object in the preprocessed image.
In some embodiments of the present application, performing the normalization processing on the grayscale image includes: determining the mean and standard deviation of the pixel values of the pixels in the grayscale image; obtaining the difference between the pixel value of each pixel and the mean; and determining the ratio between the difference corresponding to each pixel and the standard deviation as the normalized pixel value of that pixel.
In some embodiments of the present application, determining the position of the target object in the image includes: extracting image features of the image; performing classification processing on the image features to obtain the location area of the target object in the image; and determining the center position of the location area as the position of the target object.
In some embodiments of the present application, the target object includes a human face; correspondingly, determining the position of the target object in the image includes: determining the position of the human face in the image.
In some embodiments of the present application, determining the control instruction for controlling the rotation of the smart mobile device based on the distance between the position of the target object and the center position of the image includes: determining a target offset based on the distance between the position of the target object in the image and the center position of the image; generating multiple sets of offset sequences based on the target offset, the sum of the offset values in each set of offset sequences being the target offset; and using a reinforcement learning algorithm to select an offset sequence that meets the requirements from the multiple sets of offset sequences, and determining the control instruction corresponding to the offset sequence that meets the requirements.
In some embodiments of the present application, using a reinforcement learning algorithm to select the offset sequence that meets the requirements from the multiple sets of offset sequences includes: for each offset value in the multiple sets of offset sequences, determining the maximum value corresponding to the offset value in a value table, the value table including the values corresponding to offset values under different rotation instructions; obtaining the reward value corresponding to the offset value, and determining the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, the reward value being the distance between the position of the target object and the image center position in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed; and determining the offset sequence with the largest sum of the final values of its offset values among the multiple sets of offset sequences as the offset sequence that meets the requirements.
In some embodiments of the present application, determining the control instruction corresponding to the offset sequence that meets the requirements includes: determining the control instruction based on the rotation instruction corresponding to the maximum value of each offset value in the offset sequence that meets the requirements.
In some embodiments of the present application, the method further includes: driving the smart mobile device to perform the rotation based on the control instruction.
In some embodiments of the present application, the method further includes: determining, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, wherein in response to the area corresponding to the location area of the target object being greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and in response to the area corresponding to the location area of the target object being smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
An embodiment of the present application provides a target tracking apparatus, including: an image acquisition module configured to acquire an image; a target detection module configured to determine the position of a target object in the image; and a control module configured to determine, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, wherein the control instruction is used to make the position of the target object be located at the center position of the image, and the control instruction includes a rotation instruction corresponding to an offset value in an offset sequence constituting the distance, the offset sequence including at least one offset value.
In some embodiments of the present application, the apparatus further includes a preprocessing module configured to perform a preprocessing operation on the image, the preprocessing operation including: adjusting the image to a grayscale image of a preset specification, and performing normalization processing on the grayscale image; the target detection module is further configured to perform target detection processing on the image obtained after the preprocessing operation to obtain the position of the target object in the preprocessed image, and to determine the position of the target object in the image based on the position of the target object in the preprocessed image.
In some embodiments of the present application, the step of the preprocessing module performing the normalization processing on the grayscale image includes: determining the mean and standard deviation of the pixel values of the pixels in the grayscale image; obtaining the difference between the pixel value of each pixel and the mean; and determining the ratio between the difference corresponding to each pixel and the standard deviation as the normalized pixel value of that pixel.
In some embodiments of the present application, the target detection module is further configured to extract image features of the image, perform classification processing on the image features to obtain the location area of the target object in the image, and determine the center position of the location area as the position of the target object.
In some embodiments of the present application, the target object includes a human face; correspondingly, the target detection module is further configured to determine the position of the human face in the image.
In some embodiments of the present application, the control module is further configured to determine a target offset based on the distance between the position of the target object in the image and the center position of the image, generate multiple sets of offset sequences based on the target offset with the sum of the offset values in each set of offset sequences being the target offset, and use a reinforcement learning algorithm to select an offset sequence that meets the requirements from the multiple sets of offset sequences and obtain the control instruction corresponding to the offset sequence that meets the requirements.
In some embodiments of the present application, the control module is further configured to determine, for each offset value in the multiple sets of offset sequences, the maximum value corresponding to the offset value in a value table, the value table including the values corresponding to offset values under different rotation instructions; obtain the reward value corresponding to the offset value and determine the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, the reward value being the distance between the position of the target object and the image center in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed; and determine the offset sequence with the largest sum of the final values of its offset values among the multiple sets of offset sequences as the offset sequence that meets the requirements.
In some embodiments of the present application, the control module is further configured to determine the control instruction based on the rotation instruction corresponding to the maximum value of each offset value in the offset sequence that meets the requirements.
In some embodiments of the present application, the target detection module is further configured to determine, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, wherein when the area corresponding to the location area of the target object is greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and when the area corresponding to the location area of the target object is smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
An embodiment of the present application provides a smart mobile device, which includes the target tracking apparatus described above, where the target detection module in the target tracking apparatus is integrated in a management device of the smart mobile device; the management device performs target detection processing on the image collected by the image acquisition module to obtain the position of the target object; and the control module is connected to the management device and is configured to generate the control instruction according to the position of the target object obtained by the management device and to control the rotation of the smart mobile device according to the control instruction.
In some embodiments of the present application, the management device is further integrated with the preprocessing module of the target tracking apparatus, for performing a preprocessing operation on the image and performing target detection processing on the preprocessed image to obtain the position of the target object in the image.
In some embodiments of the present application, the smart mobile device includes an educational robot.
An embodiment of the present application provides a smart mobile device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute any one of the target tracking methods described above.
An embodiment of the present application provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the target tracking method of any one of the above aspects is implemented.
An embodiment of the present application provides a computer program including computer-readable code; when the computer-readable code runs in a smart mobile device, a processor in the smart mobile device executes instructions for implementing any one of the target tracking methods described above.
The target tracking method and apparatus, smart mobile device, and storage medium provided by the embodiments of the present application can obtain the position of the target object in a collected image, and obtain, according to the distance between the position of the target object and the image center, a control instruction used to control the rotation of the smart mobile device. The obtained control instruction includes a rotation instruction corresponding to at least one offset value, where the offset sequence formed by the offset values is determined from the distance between the target object and the image center. The obtained control instruction enables the target object, after the rotation, to be at the center of the collected image, so that the target object remains within the tracking range of the smart mobile device. The target tracking method and apparatus, smart mobile device, and storage medium provided by the embodiments of the present application can perform target tracking in real time according to the position of the target object, and are more convenient and accurate.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of the present application.
Other features and aspects of the embodiments of the present application will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the drawings
The drawings here are incorporated into and constitute a part of the specification. These drawings show embodiments consistent with the present application and are used, together with the specification, to explain the technical solutions of the embodiments of the present application.
FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the application;
FIG. 2 is a schematic flowchart of performing preprocessing on an image provided by an embodiment of the application;
FIG. 3 is a schematic flowchart of step S20 in a target tracking method provided by an embodiment of the application;
FIG. 4 is a schematic flowchart of step S30 in a target tracking method provided by an embodiment of the application;
FIG. 5 is a schematic flowchart of step S303 in a target tracking method provided by an embodiment of the application;
FIG. 6 is another schematic flowchart of a target tracking method provided by an embodiment of the application;
FIG. 7 is an application example diagram of a target tracking method provided by an embodiment of the application;
FIG. 8 is a schematic flowchart of a preprocessing process provided by an embodiment of the application;
FIG. 9 is a schematic diagram of the training process of a target detection network provided by an embodiment of the application;
FIG. 10 is a schematic diagram of the application process of a target detection network provided by an embodiment of the application;
FIG. 11 is a schematic flowchart of a reinforcement-learning-based path planning algorithm provided by an embodiment of the application;
FIG. 12 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the application;
FIG. 13 is a schematic structural diagram of a smart mobile device provided by an embodiment of the application.
Detailed description
Various exemplary embodiments, features, and aspects of the embodiments of the present application are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise noted.
The word "exemplary" here means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" need not be construed as superior to or better than other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description in order to better illustrate the embodiments of the present application. Those skilled in the art should understand that the embodiments of the present application can also be implemented without certain specific details. In some examples, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the embodiments of the present application.
The embodiments of the present application provide a target tracking method, which can be applied to any smart mobile device with an image processing function. For example, the target tracking method can be applied to devices such as mobile robots, remote-control cars, and aircraft; the above is merely illustrative, and any device capable of moving can adopt the target tracking method provided by the embodiments of the present application. In some possible implementations, the target tracking method can be implemented by a processor calling computer-readable instructions stored in a memory.
FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the application. As shown in FIG. 1, the target tracking method includes:
Step S10: acquiring a captured image;
In some embodiments of the present application, the smart mobile device to which the target tracking method of the embodiments of the present application is applied may include an image acquisition device, such as a camera. In the embodiments of the present application, images can be collected directly by the image acquisition device, or video data can be collected by the image acquisition device and subjected to framing or frame-selection processing to obtain corresponding images.
Step S20: determining the position of a target object in the image;
In some embodiments of the present application, once the captured image is obtained, target detection processing can be performed on it, that is, detecting whether a target object exists in the captured image and, if so, determining the position of the target object.
In some embodiments of the present application, the target detection processing can be implemented by a neural network. The target object detected in the embodiments of the present application can be any type of object; for example, the target object can be a human face, or another object to be tracked, which is not specifically limited in the embodiments of the present application. Alternatively, in some embodiments, the target object can be an object of a specific known identity; that is, the embodiments of the present application can track objects of a corresponding type (such as all face images) or track an object of a specific identity (such as a known, specific face image), which can be set as required and is not specifically limited in the embodiments of the present application.
In some embodiments of the present application, the neural network that implements the target detection processing can be a convolutional neural network. After training, the neural network can accurately detect the position of the target object in the image; the form of the neural network is not limited.
In one example, in the process of performing the target detection processing on the image, feature extraction can be performed on the image to obtain image features, and classification processing can then be performed on the image features to obtain the location area of the target object in the image, based on which the position of the target object can be determined. The classification result obtained by the classification processing can include an identifier of whether the target object exists in the image, such as a first identifier or a second identifier, where the first identifier indicates that the pixel corresponding to the current position in the image belongs to the target object, and the second identifier indicates that it does not. The position of the target object in the image can be determined from the region formed by the first identifiers; for example, the center position of that region can be determined as the position of the target object. Through the above, when the image includes the target object, the position of the target object in the image can be obtained directly, for example expressed in the form of coordinates. In the embodiments of the present application, the center position of the location area of the target object in the image can be used as the position of the target object. In addition, when no target object is detected in the image, the output position is empty.
Step S30: Determine, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of the smart mobile device, where the control instruction is used to bring the position of the target object to the center position of the image, and the control instruction includes rotation instructions corresponding to the offset values in an offset sequence constituting the distance, the offset sequence including at least one offset value.
In some embodiments of the present application, once the position of the target object in the image is obtained, the smart mobile device can be controlled to move according to that position, so that the target object comes to lie at the center of the captured image, thereby achieving tracking of the target object. The embodiments of the present application can derive, from the distance between the position of the target object in the image and the center position of the image, a control instruction that controls the rotation of the smart mobile device so that the position of the target object lies at the center of the currently captured image. The control instruction may include rotation instructions corresponding respectively to at least one offset value, where the offset sequence formed by the at least one offset value determines the distance between the position of the target object and the center position of the image; for example, the sum of the offset values equals the distance value. The distance in the embodiments of the present application may be a directed distance (such as a direction vector), and each offset value may also be a direction vector; the direction vector corresponding to the distance is obtained by summing the direction vectors corresponding to the offset values. By executing the rotation instruction corresponding to each offset value, the corresponding offset is realized, and the target object is finally brought to the center of the currently captured image. If the target object remains stationary, then from the moment the image following the current image is captured, the target object will remain at the center of the captured images. If the target object moves, the embodiments of the present application can quickly adjust the rotation of the smart mobile device with respect to the position of the target object in the previous image so that the target object stays at the center of the captured image; even when the target object is moving, it can still be tracked and photographed so that it stays within the frame of the captured image.
In some embodiments of the present application, a reinforcement learning algorithm may be used to plan the rotation path of the smart mobile device and obtain the control instruction that places the target object at the center of the image; this control instruction may correspond to the optimal movement scheme determined by the reinforcement learning algorithm. In one example, the reinforcement learning algorithm may be a value learning algorithm (Q-learning algorithm).
Through the reinforcement learning algorithm, the movement path of the smart mobile device is optimized, yielding the control instruction corresponding to the movement path that is optimal under a comprehensive evaluation of movement time, convenience of the movement path, and energy consumption of the smart mobile device.
Based on the above configuration, the embodiments of the present application can conveniently and accurately achieve real-time tracking of the target object, controlling the rotation of the smart mobile device according to the position of the target object so that the target object lies at the center of the captured image. The control instruction of the smart mobile device can be obtained from the distance between the position of the target object in the image and the center position of the image; this control instruction is used to control the rotation of the smart mobile device and includes rotation instructions corresponding to at least one offset value, where the offset sequence formed by the offset values is determined by the distance between the target object and the image center. The obtained control instruction enables the target object, after the rotation, to lie at the center of the captured image, keeping the target object within the tracking range of the smart mobile device. The embodiments of the present application can perform target tracking in real time according to the position of the target object, and are more convenient and accurate while improving the efficiency of the smart mobile device.
The embodiments of the present application are described in detail below with reference to the drawings.
As described in the foregoing embodiments, the embodiments of the present application may perform target detection processing on an image as soon as it is captured. Since the captured images may differ in specification, type, and other parameters, a preprocessing operation may be performed on the image before the target detection processing to obtain a normalized image.
Before determining the position of the target object in the image, the method further includes performing a preprocessing operation on the image. FIG. 2 is a schematic flowchart of preprocessing an image according to an embodiment of the present application. As shown in FIG. 2, the preprocessing operation includes:
Step S11: Adjust the image into a grayscale image of a preset specification.
In some embodiments of the present application, the captured image may be a color image or an image in another form. The captured image may be converted into an image of a preset specification, which is then converted into a grayscale image; alternatively, the captured image may first be converted into a grayscale image, which is then converted to the preset specification. The preset specification may be 640*480, but this is not a specific limitation of the embodiments of the present application. Converting a color image or an image in another form into a grayscale image can be based on processing of the pixel values; for example, the pixel value of each pixel may be divided by the maximum pixel value, and the corresponding grayscale value obtained from the result. This is only an exemplary description, and the embodiments of the present application do not specifically limit this process.
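As a minimal sketch of step S11, assuming OpenCV is available, the resizing to a preset specification and the grayscale conversion might look as follows; the function name and default size are illustrative, not part of the original disclosure.

    import cv2

    def to_preset_grayscale(image, size=(640, 480)):
        """Resize a captured frame to the preset specification and convert it
        to grayscale. `image` is an H x W x 3 BGR array such as one returned
        by cv2.VideoCapture.read()."""
        resized = cv2.resize(image, size)                 # preset specification, e.g. 640*480
        gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)  # drop color features
        return gray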
Because processing color pictures or images in other forms may consume a large amount of resources, while the form of the picture has little influence on the detection result, the embodiments of the present application convert the image into a grayscale image and send it directly to the network model for detection, which reduces resource consumption and increases processing speed.
Step S12: Perform normalization processing on the grayscale image.
Once the grayscale image is obtained, normalization processing can be performed on it. Through normalization, the pixel values of the image can be brought onto the same scale. The normalization processing may include: determining the average and the standard deviation of the pixel values of the pixels in the grayscale image; determining the difference between the pixel value of each pixel and the average; and taking, for each pixel, the ratio of that difference to the standard deviation as the normalized pixel value of the pixel.
One or more images may be captured in the embodiments of the present application. When there is a single image, a single grayscale image is obtained. The average and the standard deviation are then computed over the pixel values (grayscale values) of the pixels of that grayscale image, and the pixel value of each pixel is updated to the ratio of the difference between that pixel value and the average to the standard deviation.
In addition, when multiple images are captured, multiple grayscale images are obtained correspondingly. The average and the standard deviation of the pixel values can then be determined over the pixels of all these grayscale images. That is, the average and standard deviation in the embodiments of the present application may be computed for one image or across multiple images. Given the average and standard deviation over the pixels of multiple images, the difference between each pixel value of each image and the average is obtained, the ratio of that difference to the standard deviation is computed, and the pixel value is updated with that ratio.
In this way, the pixel values of the pixels in the grayscale image are unified onto the same scale, achieving normalization of the captured images.
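The normalization just described translates directly into code; the sketch below assumes NumPy and images of equal size, and the small epsilon guard is an added safeguard not in the original description.

    import numpy as np

    def normalize(gray_images):
        """Zero-mean, unit-std normalization over one or more grayscale images.
        `gray_images` is a list of equally sized H x W arrays; the mean and
        standard deviation are computed jointly over all of them, matching
        the multi-image case described above."""
        stack = np.stack([img.astype(np.float32) for img in gray_images])
        mean = stack.mean()
        std = stack.std() + 1e-8   # guard against division by zero (an added safeguard)
        return [(img.astype(np.float32) - mean) / std for img in gray_images]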
The foregoing exemplifies the manner in which the embodiments of the present application perform preprocessing; in other embodiments, the preprocessing may also be performed in other manners. For example, it is possible to only convert the image to the preset specification and perform normalization on the image of the preset specification; that is, the embodiments of the present application may also normalize color images. In that case, the average and the standard deviation of the channel values of each pixel of the color image can be obtained, for example the average and standard deviation of the red (R) channel values, of the green (G) channel values, and of the blue (B) channel values of the pixels of the image. Then, from the ratio of the difference between the channel value and the corresponding average to the corresponding standard deviation, the new value of that color channel is obtained. This yields updated channel values for each pixel of each image, and thus the normalized image.
By preprocessing the images, the embodiments of the present application can be applied to images of different types and different scales, improving the applicability of the embodiments of the present application.
After the image is preprocessed, target detection processing may also be performed on the preprocessed image to obtain the position of the target object in the preprocessed image; then, based on the correspondence between pixel positions in the preprocessed image and in the original image, the position of the target object in the original captured image can be obtained from its position in the preprocessed image. The following description takes performing target detection on the captured image as an example; the process of performing target detection on the preprocessed image is the same and is not repeated here.
FIG. 3 is a schematic flowchart of step S20 in a target tracking method according to an embodiment of the present application. As shown in FIG. 3, determining the position of the target object in the image includes:
Step S201: Extract image features of the image.
In some embodiments of the present application, the image features of the image may first be extracted, for example through convolution processing. As described above, the target detection processing may be implemented by a neural network, which may include a feature extraction module and a classification module. The feature extraction module may include at least one convolutional layer and may also include a pooling layer, and is used to extract the features of the image. In other embodiments, the feature extraction may also be performed with a residual network structure to obtain the image features, which is not specifically limited in the embodiments of the present application.
Step S202: Perform classification processing on the image features to obtain the location area of the target object in the image.
In some embodiments of the present application, classification processing may be performed on the image features. For example, the classification module performing the classification processing may include a fully connected layer, through which the detection result of the target object in the image, that is, the location area of the target object, is obtained. The location area of the target object in the embodiments of the present application may be expressed in the form of coordinates, such as the position coordinates of two diagonal corners of the detection frame corresponding to the detected location area, or the position coordinates of one vertex together with the height or width of the detection frame. From these, the location area of the target object is obtained. In other words, the result of the classification processing in the embodiments of the present application may include whether an object of the target type, that is, the target object, is present in the image, as well as the location area of the target object. A first identifier and a second identifier may be used to indicate whether an object of the target type is present, and the location area of the target object may be expressed in the form of coordinates. For example, the first identifier may be 1, indicating that the target object is present; conversely, the second identifier may be 0, indicating that it is not, and (x1, x2, y1, y2) are the horizontal and vertical coordinate values of the two vertices of the detection frame.
Step S203: Determine the center position of the location area as the position of the target object.
In some embodiments of the present application, the center position of the detected location area of the target object may be determined as the position of the target object. The coordinates of the center position can be obtained by averaging the coordinate values of the four vertices of the location area where the target object is located, and the coordinates of the center position are then determined as the position of the target object.
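As a minimal sketch of steps S202 and S203, the snippet below turns an assumed detection output (the presence flag and the (x1, x2, y1, y2) corner coordinates described above) into the target position; the function and variable names are illustrative.

    def target_position(flag, x1, x2, y1, y2):
        """Return the target position as the center of the detection frame,
        or None when the flag indicates no target object (steps S202-S203)."""
        if flag == 0:  # second identifier: no target object, the output is empty
            return None
        # Averaging the four vertex coordinates reduces to the corner midpoints.
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)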
In one example, the target object may be a human face and the target detection processing may be face detection processing; that is, the location area of the face in the image is detected, and the position of the face is further obtained from the center of the detected location area. Target tracking for the face is then performed.
Through the foregoing implementations, the embodiments of the present application can obtain the position of the target object with high accuracy and improve the accuracy of target tracking.
In addition, in some embodiments of the present application, the above preprocessing and target detection processes may be performed by a management apparatus of the smart mobile device. In the embodiments of the present application, the management apparatus may be a Raspberry Pi chip, which offers high extensibility together with high processing speed.
In some embodiments of the present application, the obtained information such as the position of the target object may be transmitted to the control terminal of the smart mobile device so as to obtain the control instruction. The detection result of the target object, which indicates the position of the target object in the image, may be encapsulated and transmitted according to a preset data format. The data corresponding to the transmitted detection result may be 80 bytes and may include a mode flag bit, detection result information, a cyclic redundancy check (CRC), a retransmission threshold, a control field, and an optional field. The mode flag bit may indicate the current working mode of the Raspberry Pi chip; the detection result information may be the position of the target object; the CRC check bits are used for security verification; the retransmission threshold indicates the maximum number of data retransmissions; the control field indicates the desired working mode of the smart mobile device; and the optional field carries information that may be appended.
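A sketch of how such a packet might be assembled is shown below. The exact field widths and ordering are not specified in the text, so the layout and the helper name here are assumptions for illustration only; only the 80-byte total and the field list come from the description above.

    import struct
    import zlib

    def pack_detection(mode, x, y, retry_max, control, optional=b""):
        """Pack a detection result into an assumed 80-byte frame: mode flag,
        target position, CRC32 of the payload, retransmission threshold,
        control field, then optional bytes padded to 80 bytes."""
        payload = struct.pack("<Bii", mode, x, y)          # mode flag + position
        crc = zlib.crc32(payload)                          # CRC for verification
        frame = payload + struct.pack("<IBB", crc, retry_max, control) + optional
        return frame.ljust(80, b"\x00")                    # pad to the 80-byte size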
Once the position of the target object in the image is obtained, path planning for the smart mobile device can be performed to obtain the control instruction for controlling the smart mobile device. FIG. 4 is a schematic flowchart of step S30 in a target tracking method according to an embodiment of the present application. As shown in FIG. 4, step S30 can be implemented through the following steps:
Step S301: Determine a target offset based on the distance between the position of the target object in the image and the center position of the image.
In some embodiments of the present application, when tracking the target object, the position of the target object is kept at the center position of the image, and tracking of the target object is achieved in this way. Therefore, once the position of the target object is obtained, the distance between the position of the target object and the center position of the image can be measured and taken as the target offset. The Euclidean distance between the coordinates of the position of the target object and the coordinates of the center position of the image may be used as the target offset. The distance may also be expressed in vector form, for example as a directed vector between the center position of the image and the position of the target object; that is, the obtained target offset may include the distance between the position of the target object and the center position of the image as well as the direction of the center of the image relative to the position of the target object.
Step S302: Generate multiple groups of offset sequences based on the target offset, each offset sequence including at least one offset value, where the sum of the offset values in each group equals the target offset.
In some embodiments of the present application, multiple groups of offset sequences may be generated according to the obtained target offset. Each offset sequence includes at least one offset value, and the sum of these offset values equals the target offset. For example, if the position of the target object is (100, 0) and the position of the image center is (50, 0), the target offset is 50 along the x-axis. To realize this target offset, multiple offset sequences can be generated: for instance, the offset values of the first offset sequence may be 10, 20 and 20, and those of the second may be 10, 25 and 15, with the direction of each offset value being the positive direction of the x-axis. In the same way, multiple groups of offset sequences corresponding to the target offset can be obtained.
In a possible implementation, the number of offset values in each generated offset sequence may be set in advance, for example to 3, but this is not a specific limitation of the embodiments of the present application. In addition, the multiple groups of offset sequences may be generated randomly. In practice, there may be many combinations of offset values whose sequence realizes the target offset, and the embodiments of the present application may randomly select a preset number of combinations, that is, a preset number of offset sequences, from among them.
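A minimal sketch of step S302 is given below, assuming the fixed-length random decomposition just described; the sampling scheme and names are illustrative, not the specific method of the original disclosure.

    import random

    def generate_offset_sequences(target_offset, n_sequences=5, seq_len=3):
        """Randomly decompose a scalar target offset into `n_sequences`
        sequences of `seq_len` offset values that each sum to the target."""
        sequences = []
        for _ in range(n_sequences):
            # Draw seq_len - 1 cut points, then take the gaps as offset values.
            cuts = sorted(random.uniform(0, target_offset) for _ in range(seq_len - 1))
            bounds = [0.0] + cuts + [target_offset]
            sequences.append([bounds[i + 1] - bounds[i] for i in range(seq_len)])
        return sequences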
Step S303: Use a reinforcement learning algorithm to select, from the multiple groups of offset sequences, an offset sequence that meets the requirement, and obtain the control instruction corresponding to that offset sequence.
In some embodiments of the present application, once the generated offset sequences are obtained, a reinforcement learning algorithm can be used to select the offset sequence that meets the requirement. The reinforcement learning algorithm can be used to obtain the total value corresponding to each offset sequence, and the offset sequence with the highest total value is determined as the one that meets the requirement.
FIG. 5 is a schematic flowchart of step S303 in a target tracking method according to an embodiment of the present application. As shown in FIG. 5, step S303, "use a reinforcement learning algorithm to select, from the multiple groups of offset sequences, an offset sequence that meets the requirement, and obtain the control instruction corresponding to that offset sequence", may include:
Step S3031: For each offset value in the multiple groups of offset sequences, determine the maximum value corresponding to the offset value in a value table, the value table including the values of offset values under different rotation instructions.
In some embodiments of the present application, the reinforcement learning algorithm may be a value learning algorithm (Q-learning algorithm), and the corresponding value table (Q-table) may express the value (quality) of different offset values under different rotation instructions. A rotation instruction is an instruction controlling the rotation of the smart mobile device, and may include parameters such as the motor rotation angle, the motor speed, and the motor rotation time. The value table in the embodiments of the present application may be obtained in advance through reinforcement learning, and its parameters can accurately distinguish and reflect the values of different rotation instructions for different offset values. For example, Table 1 shows at least part of the parameters of the rotation instructions, and Table 2 shows a schematic value table, in which the horizontal entries a1, a2 and a3 are different rotation instructions, the vertical entries s1, s2 and s3 are different offset values, and each cell gives the value of the corresponding rotation instruction for the corresponding offset value. In general, the larger the number, the higher the value, indicating a higher value of achieving target tracking through that instruction.
Table 1. Partial rotation parameters corresponding to rotation instructions

    Action                 Value
    Motor speed            0-1000
    Motor rotation angle   0-360
    Motor rotation time    ~
    Motor stop action      hold, interrupt

Table 2. Value table for rotation parameters

          a1    a2    a3
    s1    1     2     3
    s2    1     1     2
    s3    4     2     1
As described in the foregoing embodiments, each offset sequence may include multiple offset values, and the embodiments of the present application may determine, based on the value table, the maximum value corresponding to each offset value in each sequence. For example, for the offset value s1 the maximum value is 3, for s2 it is 2, and for s3 it is 4. This is only an exemplary description; for a different value table the obtained values may differ, which is not specifically limited in the embodiments of the present application.
Step S3032: Obtain the reward value corresponding to the offset value, and determine the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, where the reward value is the distance between the position of the target object and the center position of the image when the rotation instruction corresponding to the offset value has not been executed.
In some embodiments of the present application, the reward value of each offset value in the offset sequence can be obtained, where the reward value is related to the position of the target object before the corresponding offset is applied. For the first offset value of each offset sequence, when the rotation instruction corresponding to that offset value has not been executed, the position of the target object is the initially detected position of the target object in the image. For the other offset values in the offset sequence, the reward may be based on the position the target object would have after executing the rotation instructions of maximum value corresponding to the preceding offset values. For example, assume the detected position of the target object in the image is (100, 0) and the offset sequence satisfying the condition is 20, 15, 15. For the first offset value, its reward value can be determined from the position (100, 0) of the target object. For the second offset value, the position of the target object can be determined as (120, 0), and its reward value can be determined based on that position; when the third offset value is considered, the position of the target object can be determined as (135, 0), and its reward value can be determined based on that position.
In one example, the expression for obtaining the reward value may be as shown in formula (1-1):

    R(s, a) = (s(x) - b)^2 + (s(y) - c)^2        (1-1)

where R(s, a) is the reward value of the rotation instruction a of maximum value corresponding to the offset value s, that is, the reward value corresponding to the offset value s; s(x) and s(y) are respectively the abscissa and ordinate of the position of the target object before the rotation instruction a of maximum value corresponding to the offset value is executed; and b and c respectively denote the abscissa and ordinate of the center position of the image.
Given the reward value and the maximum value corresponding to an offset value, the final value of that offset value can be determined from them, for example as a weighted sum of the reward value and the maximum value. The expression for determining the final value of an offset value in the embodiments of the present application may be as shown in formula (1-2):

    Q'(s, a) = R(s, a) + r · max{Q(s, a)} · 0.2 · 0.5        (1-2)

where Q'(s, a) is the final value corresponding to the offset value s, R(s, a) is the reward value of the rotation instruction a of maximum value corresponding to the offset value s, and max{Q(s, a)} is the maximum value corresponding to the offset value s.
In this way, the final value corresponding to each offset value can be obtained.
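The two formulas translate directly into code; the sketch below assumes a value-table row (the Q-values of one offset value under the different rotation instructions) is passed in as a list, and treats `r` as the coefficient appearing in formula (1-2). The names and the default for r are assumptions.

    def reward(position, center):
        """Formula (1-1): squared distance between the target position and
        the image center before the chosen rotation instruction is executed."""
        (sx, sy), (b, c) = position, center
        return (sx - b) ** 2 + (sy - c) ** 2

    def final_value(position, center, q_row, r=0.9):
        """Formula (1-2): reward plus the maximum entry of the value-table
        row for this offset value, scaled by r * 0.2 * 0.5."""
        return reward(position, center) + r * max(q_row) * 0.2 * 0.5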
Step S3033: Determine the offset sequence with the largest sum of final values as the offset sequence that meets the requirement.
In some embodiments of the present application, the final values of the offset values in an offset sequence may be summed to obtain the total value corresponding to that offset sequence. The offset sequence with the largest total value is then selected as the offset sequence that meets the requirement.
In this way, the offset sequence with the largest total value can be obtained; the largest total value indicates that the rotation instructions corresponding to the rotation path of that offset sequence are the optimal choice.
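Combining the pieces above, step S3033 might be sketched as follows, reusing the `final_value` helper from the previous snippet. The value-table lookup keyed by offset value and the simulated position update (adding each offset along the x-axis, as in the worked example) are assumptions for illustration.

    def select_best_sequence(sequences, start_pos, center, q_table):
        """Score each offset sequence by summing the final values of its
        offset values, and return the sequence with the largest total."""
        best_seq, best_total = None, float("-inf")
        for seq in sequences:
            pos, total = start_pos, 0.0
            for offset in seq:
                q_row = q_table[offset]            # value-table row for this offset value
                total += final_value(pos, center, q_row)
                pos = (pos[0] + offset, pos[1])    # assumed x-axis update between steps
            if total > best_total:
                best_seq, best_total = seq, total
        return best_seq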
After the offset sequence meeting the requirement is obtained, the control instruction can be generated by combining, for each offset value in that sequence, the rotation instruction corresponding to its maximum value in the value table. The control instruction can then be transmitted to the smart mobile device so that the smart mobile device performs the rotation operation according to it.
In some embodiments of the present application, the smart mobile device can be controlled to move according to the generated control instruction. The control instruction may include parameters such as the rotation angle and rotation direction of the motors, or control commands such as the motor speed, the motor rotation time, and whether to stop.
The embodiments of the present application may control the movement of the mobile device by means of differential steering. For example, the smart mobile device may be a smart mobile vehicle with two drive wheels, left and right, and the embodiments of the present application may control the speeds of the two drive wheels based on the control instruction to achieve steering and movement. When the drive wheels rotate at different speeds, the body turns even if there is no steering wheel or the steering wheel does not act. In the embodiments of the present application, the difference between the speeds of the two drive wheels can be realized by operating two separate clutches or braking devices mounted on the left and right half-shafts.
According to the different speeds and rotation angles of the left and right drive wheels, the smart mobile device can follow different rotation trajectories. Under different rotation trajectories the vehicle captures different pictures; through continuous optimization the position of the smart mobile vehicle is adjusted until the target object is finally guaranteed to be at the center of the image, achieving tracking of the target object.
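A toy sketch of differential steering is given below, assuming a simple kinematic model with wheel separation `track` and commanded linear and angular velocities; none of these names or the specific model appear in the original text.

    def differential_wheel_speeds(v, omega, track=0.15):
        """Map a commanded linear velocity v (m/s) and angular velocity
        omega (rad/s) to left/right wheel speeds for a differential drive:
        a speed difference between the wheels turns the body even without
        a steering wheel."""
        v_left = v - omega * track / 2.0
        v_right = v + omega * track / 2.0
        return v_left, v_right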
In addition, in some embodiments of the present application, the forward or backward movement of the smart mobile device can also be determined according to the size of the detected target object. FIG. 6 is another schematic flowchart of a target tracking method according to an embodiment of the present application. As shown in FIG. 6, the target tracking method further includes:
Step S41: Determine, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, where it can be determined whether the area of the location area of the target object lies in the range between a first threshold and a second threshold. In the process of performing step S20 of the embodiments of the present application, the location area of the target object in the captured image can be obtained, and the embodiments of the present application can control the movement direction of the smart mobile device according to the area of that location area.
Specifically, the area of the location area can be determined from the obtained location area of the target object and compared with the first threshold and the second threshold. The first threshold and the second threshold may be preset reference thresholds, with the first threshold greater than the second threshold; the embodiments of the present application do not limit the specific values.
Step S42: When the area corresponding to the location area of the target object is greater than the first threshold, generate a control instruction for controlling the smart mobile device to move backward.
In the embodiments of the present application, when the area of the detected location area of the target object is greater than the first threshold, the target object is relatively close to the smart mobile device, and the smart mobile device can be moved backward. A control instruction for controlling the smart mobile device to move backward can be generated until the area of the detected location area of the target object is smaller than the first threshold and greater than the second threshold.
Step S43: When the area corresponding to the location area of the target object is smaller than the second threshold, generate a control instruction for controlling the smart mobile device to move forward, where the first threshold is greater than the second threshold.
In the embodiments of the present application, when the area of the detected location area of the target object is smaller than the second threshold, the target object is relatively far from the smart mobile device, and the smart mobile device can be moved forward. A control instruction for controlling the smart mobile device to move forward can be generated until the area of the detected location area of the target object is smaller than the first threshold and greater than the second threshold.
Correspondingly, the smart mobile device can perform the forward or backward operation according to the received forward or backward control instruction.
In this way, the movement of the smart mobile device can be controlled according to the size of the target object, keeping the area corresponding to the location area of the detected target object (such as a human face) between the second threshold and the first threshold, thereby controlling the movement direction of the smart mobile device.
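Steps S41 to S43 reduce to a simple threshold check on the detection-frame area; a sketch follows, with the parameter names assumed for illustration.

    def distance_command(x1, x2, y1, y2, first_threshold, second_threshold):
        """Return 'backward', 'forward', or 'hold' from the detection-frame
        area, per steps S41-S43 (first_threshold > second_threshold)."""
        area = abs(x2 - x1) * abs(y2 - y1)
        if area > first_threshold:    # target too close: back away
            return "backward"
        if area < second_threshold:   # target too far: move closer
            return "forward"
        return "hold"                 # area within [second, first]: keep distance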
The subject performing the target tracking method in the embodiments of the present application may be the smart mobile device itself, or an apparatus installed in the smart mobile device that controls its movement. In the following, for clear illustration of the embodiments of the present application, the smart mobile device applying the target tracking method is an educational robot, the management apparatus of the educational robot is a Raspberry Pi, and the target object is a human face. FIG. 7 is an application example diagram of a target tracking method according to an embodiment of the present application, in which a camera 701 is connected to a Raspberry Pi 702 to transmit the images or video captured by the camera 701 to the Raspberry Pi 702; the camera 701 and the Raspberry Pi 702 may be connected through a Universal Serial Bus (USB) port for data transmission, but this connection manner is not a limitation of the embodiments of the present application. The following process can then be performed.
1. Raspberry Pi image acquisition and image preprocessing.
The application field of the embodiments of the present application may be intelligent robots in an educational context, and such an intelligent robot may implement face detection and tracking functions. The Raspberry Pi 702 can perform the image processing: in the embodiments of the present application it performs the image preprocessing and the target detection processing, and may have the target detection network integrated. Since the types of images captured by the camera 701 vary, the Raspberry Pi 702 needs to perform the necessary preprocessing on the image data before passing the images to the target detection network model.
The preprocessing flow includes four parts. FIG. 8 is a schematic flowchart of the preprocessing process according to an embodiment of the present application, which, as shown in FIG. 8, includes:
Step S51: Receive the captured video data.
Step S52: Split the video data into frames of picture data.
Step S53: Unify the picture size.
Step S54: Convert the picture into a grayscale image.
Step S55: Normalize the picture.
Image framing means decomposing the captured video data into individual image frames, whose size is then unified to 640*480. Since color images consume a large amount of resources during processing but have little influence on the detection result, the embodiments of the present application ignore the color features, convert the image directly into a grayscale image, and send it to the target detection network for detection. Finally, for convenience of image processing, the image is normalized: the mean of each dimension of the image data is subtracted from the original data of that dimension, the result replaces the original data, and the data of each dimension is then divided by the standard deviation of that dimension, so that the image data is normalized to the same scale.
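The four parts could be chained as below; this is a sketch assuming OpenCV video capture and reusing the illustrative helpers sketched earlier (`to_preset_grayscale` and `normalize`).

    import cv2

    def preprocess_stream(device=0):
        """Steps S51-S55: capture video, split it into frames, resize,
        convert to grayscale, and normalize each frame."""
        cap = cv2.VideoCapture(device)          # S51: receive video data
        while True:
            ok, frame = cap.read()              # S52: one frame of picture data
            if not ok:
                break
            gray = to_preset_grayscale(frame)   # S53 + S54: resize and grayscale
            yield normalize([gray])[0]          # S55: normalize
        cap.release()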
2. Face detection based on a deep neural network model.
Input: the picture captured by the camera 701.
Output: the coordinate position of the detected face.
In the embodiments of the present application, face recognition and detection in the image can be performed by the target detection network in the Raspberry Pi 702; that is, the embodiments of the present application can implement face detection with deep learning technology, which involves two stages: model training and model application. FIG. 9 is a schematic diagram of the training process of the target detection network according to an embodiment of the present application. As shown in FIG. 9, the training process includes:
Step S61: Collect face data set pictures.
The face data set pictures include face pictures of various ages and regions, and the face pictures are manually annotated to obtain the face coordinate positions. A face data set is constructed and divided into three parts: a training set, a test set, and a validation set.
Step S62: Construct the neural network model.
In actual implementation, step S62 can be realized through the following steps:
Step S621: Implement feature extraction by stacking convolutional layers and pooling layers.
Step S622: Classify the extracted features with a classifier.
In implementation, the classification can be realized through a fully connected layer (the classifier), for example as in the sketch below.
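As an illustration of steps S621 and S622, a minimal model of this shape might look as follows in PyTorch; the layer sizes and the five-way output layout (presence flag plus four box coordinates) are assumptions, not the architecture of the original disclosure.

    import torch
    import torch.nn as nn

    class FaceDetector(nn.Module):
        """Stacked convolution + pooling for feature extraction (S621),
        then a fully connected classifier head (S622)."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 120 * 160, 5)  # flag + (x1, x2, y1, y2)

        def forward(self, x):                 # x: (N, 1, 480, 640) grayscale input
            f = self.features(x)
            return self.classifier(f.flatten(1))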
Step S63: Train the neural network model.
Model training is carried out through a series of gradient optimization algorithms; after a large number of training iterations, a trained model is obtained for model testing.
Step S64: Obtain the trained neural network model.
In the embodiments of the present application, the training process of the model is the training process of the target detection network (the neural network model).
FIG. 10 is a schematic diagram of the application process of the target detection network according to an embodiment of the present application. As shown in FIG. 10, the application process includes:
Step S71: Capture a face picture.
Step S72: Feed the preprocessed picture into the trained model.
Step S73: Obtain the coordinate position of the face.
In the embodiments of the present application, the preprocessed picture is fed into the trained model, and after forward computation the coordinate position of the face in the picture is output.
3. Send the detection result to the educational robot EV3 (the intelligent robot of the foregoing embodiments).
Through the foregoing embodiments, the face coordinate position detection can be completed by the Raspberry Pi 702, and the face coordinate position can then be encapsulated into a data packet according to the already defined communication protocol specification. After the data encapsulation is completed, the packet is sent through the serial port to the processor or controller in the smart mobile device 703, where the smart mobile device 703 may be the educational robot EV3; the smart mobile device 703 can then complete the subsequent face tracking according to the received face position.
4. EV3 performs path planning according to the face position coordinates.
The educational robot EV3 receives and parses the data packet sent from the Raspberry Pi 702 side, obtains the face coordinate position, and then completes the path planning, for which a reinforcement learning algorithm can be used. Reinforcement learning mainly involves state, reward, and action factors. Here, the state is the face coordinate position obtained at each detection; the reward can be defined from the Euclidean distance between the face center and the picture center; and the action is the motor motion instruction executed each time. In the educational robot EV3, the motor actions that can be controlled are as in Table 1. Path planning can be performed through the Q-learning algorithm model. The Q function is defined so that its input includes a state and an action, and it returns the reward value of executing that action in that state.
FIG. 11 is a schematic flowchart of the reinforcement-learning-based path planning algorithm according to an embodiment of the present application, which, as shown in FIG. 11, includes:
Step S81: Initialize the Q-value table.
Step S82: Select a specific motor execution instruction from the action set.
Step S83: Execute the specific motor execution instruction.
Step S84: Compute the Q-value table for the current state.
Step S85: Update the Q-value table.
The action set of the educational robot EV3 is shown in Table 1. The state set determines the tracking effect through the face coordinates: the distance between the face position and the picture center serves as the reward function, and the Q-value table is updated by evaluating the reward function of different actions. Finally, the optimal Q-value table is obtained, which contains the best action sequence, that is, the motor execution instructions. A sketch of this loop is given below.
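Steps S81 to S85 form the standard tabular Q-learning loop. The sketch assumes discretized states and actions, a `step` function that executes an action and returns the next state (drawn from the same state set) together with its reward, and conventional learning-rate and discount hyperparameters; these are illustrative, not from the text.

    import random

    def q_learning(states, actions, step, episodes=100, alpha=0.1,
                   gamma=0.9, epsilon=0.1):
        """Tabular Q-learning (steps S81-S85): initialize the Q table, pick a
        motor instruction, execute it, compute the target, update the table."""
        q = {(s, a): 0.0 for s in states for a in actions}         # S81
        for _ in range(episodes):
            s = random.choice(states)
            for _ in range(50):
                a = (random.choice(actions) if random.random() < epsilon
                     else max(actions, key=lambda x: q[(s, x)]))   # S82
                s_next, reward = step(s, a)                        # S83: act, observe
                target = reward + gamma * max(q[(s_next, b)] for b in actions)  # S84
                q[(s, a)] += alpha * (target - q[(s, a)])          # S85
                s = s_next
        return q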
5. The smart mobile device 703 implements face tracking according to the motion instructions (the control instructions of the foregoing embodiments).
Smart mobile devices such as the educational robot use a differential steering mechanism: the vehicle steers by controlling the speeds of the two drive wheels 704 and 705. When the drive wheels rotate at different speeds, the body turns even if there is no steering wheel or the steering wheel does not act. The difference in drive wheel speed can be realized by operating two separate clutches or braking devices mounted on the left and right half-shafts.
The smart mobile device 703 can follow different rotation trajectories according to the different speeds and rotation angles of the left and right wheels. Under different rotation trajectories the vehicle captures different pictures; the actions are then continuously optimized and the vehicle position adjusted until the face position is finally guaranteed to be at the picture center, realizing the face tracking function.
In addition, the smart mobile device in the embodiments of the present application may also be provided with sensors 706, such as distance sensors and touch sensors, for sensing information about the surroundings of the smart mobile device 703; the working mode, movement parameters, and the like of the smart mobile device 703 can be controlled according to the sensed information about the surroundings.
The foregoing is merely an illustrative example and is not a specific limitation of the embodiments of the present application.
In summary, the target tracking method provided by the embodiments of the present application can obtain the position of the target object in the captured image and, from the distance between that position and the image center, obtain the control instruction of the smart mobile device. The control instruction is used to adjust the rotation angle of the smart mobile device and includes rotation instructions corresponding to at least one offset value, where the offset sequence formed by the offset values is determined by the distance between the target object and the image center. The obtained control instruction enables the target object, after the rotation, to lie at the center of the captured image, keeping the target object within the tracking range of the smart mobile device. The embodiments of the present application can perform target tracking in real time according to the position of the target object, and are more convenient and accurate while improving the efficiency of the smart mobile device.
In addition, the embodiments of the present application can complete face detection with deep learning technology (implementing target detection with a neural network), with clearly improved accuracy and speed compared with traditional target detection methods. The embodiments of the present application can also use a reinforcement learning algorithm to perform path planning through Q-learning technology, selecting the best rotation path. The embodiments of the present application can further be adapted to the requirements of different scenarios and have good extensibility.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
It can be understood that the various method embodiments mentioned in this application can be combined with one another to form combined embodiments without violating principle or logic.
In addition, the embodiments of the present application further provide a target tracking apparatus, a smart mobile device, a computer-readable storage medium, and a program, all of which can be used to implement any target tracking method provided in the embodiments of the present application; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.
FIG. 12 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the application. As shown in FIG. 12, the target tracking apparatus includes:
an image acquisition module 10, configured to acquire an image;
a target detection module 20, configured to determine the position of a target object in the image; and
a control module 30, configured to determine, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, where the control instruction is used to bring the position of the target object to the center position of the image and includes the control instructions corresponding to the offset values in an offset sequence constituting the distance, the offset sequence including at least one offset value.
In some embodiments of the present application, the apparatus further includes a preprocessing module, configured to perform a preprocessing operation on the image, the preprocessing operation including: adjusting the image into a grayscale image of a preset specification, and performing normalization processing on the grayscale image.
The target detection module is further configured to perform target detection processing on the image obtained after the preprocessing operation, to obtain the position of the target object in the image after the preprocessing operation; and
to determine the position of the target object in the image based on the position of the target object in the image after the preprocessing operation.
In some embodiments of the present application, the step in which the preprocessing module performs the normalization processing on the grayscale image includes:
determining the mean and the standard deviation of the pixel values of the pixels in the grayscale image;
obtaining the difference between the pixel value of each pixel and the mean; and
determining the ratio of the difference corresponding to each pixel to the standard deviation as the normalized pixel value of that pixel.
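A minimal sketch of this preprocessing step follows; the 64x64 preset specification is an assumption, while the mean/standard-deviation normalization mirrors the description above:

```python
# Hypothetical preprocessing sketch: grayscale at a preset size, then
# z-score normalization using the image's own pixel statistics.
import cv2
import numpy as np

PRESET_SIZE = (64, 64)  # assumed preset specification (width, height)


def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, PRESET_SIZE).astype(np.float32)
    mean = gray.mean()                      # average pixel value
    std = gray.std()                        # standard deviation of pixel values
    return (gray - mean) / max(std, 1e-6)   # ratio of difference to std; guard zero std
```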
In some embodiments of the present application, the target detection module is further configured to extract image features of the image;
to perform classification processing on the image features to obtain the location area of the target object in the image; and
to determine the center position of the location area as the position of the target object.
In some embodiments of the present application, the target object includes a human face;
correspondingly, the target detection module is further configured to determine the position of the human face in the image.
In some embodiments of the present application, the control module is further configured to determine a target offset based on the distance between the position of the target object in the image and the center position of the image;
to generate multiple groups of offset sequences based on the target offset, where the offset values in each group of offset sequences sum to the target offset; and
to select, using a reinforcement learning algorithm, an offset sequence that meets requirements from the multiple groups of offset sequences, and to obtain the control instruction corresponding to the offset sequence that meets the requirements.
In some embodiments of the present application, the control module is further configured: for each offset value in the multiple groups of offset sequences, to determine the maximum value corresponding to the offset value in a value table, the value table including the values corresponding to offset values under different rotation instructions;
to obtain the reward value corresponding to the offset value and determine the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, where the reward value is the distance between the position of the target object and the image center in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed; and
to determine, as the offset sequence that meets the requirements, the offset sequence whose offset values have the largest sum of final values among the multiple groups of offset sequences.
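A minimal sketch, assuming a tabular Q-learning setup, of how candidate offset sequences could be scored with a value table and a distance-based reward as described above; the table contents, the reward function, and the candidate sequences are all illustrative assumptions:

```python
# Hypothetical scoring of offset sequences with a Q-table.
# q_table[offset][action] is the assumed learned value of issuing
# rotation `action` when the offset to cover is `offset`.

def sequence_value(sequence, q_table, reward_fn):
    """Sum each offset's reward plus its maximum table value (its
    "final value" in the text) over one candidate sequence."""
    total = 0.0
    for offset in sequence:
        max_q = max(q_table[offset].values())  # best value for this offset
        total += reward_fn(offset) + max_q     # final value of this offset
    return total


def best_sequence(candidates, q_table, reward_fn):
    """Pick the candidate whose summed final value is largest."""
    return max(candidates, key=lambda s: sequence_value(s, q_table, reward_fn))


# Toy usage: three ways to split a 6-pixel target offset into steps.
q_table = {2: {"L": 0.5, "R": 1.0}, 3: {"L": 0.2, "R": 0.9},
           6: {"L": 0.1, "R": 0.4}}
candidates = [[2, 2, 2], [3, 3], [6]]
print(best_sequence(candidates, q_table, lambda off: -abs(off)))
```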
In some embodiments of the present application, the control module is further configured to determine the control instruction based on the rotation instructions corresponding to the maximum values of the offset values in the offset sequence that meets the requirements.
In some embodiments of the present application, the target detection module is further configured to determine, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, where:
in the case that the area corresponding to the location area of the target object is greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and
in the case that the area corresponding to the location area of the target object is smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
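A minimal sketch of this distance-keeping rule; the threshold values are assumptions, and the intuition is that a large detection box means the target is close (so the device backs up) and a small one means it is far (so the device advances):

```python
# Hypothetical area thresholds, in squared pixels (first > second).
from typing import Optional

AREA_FIRST = 1600   # first threshold: target looks large, i.e. too close
AREA_SECOND = 400   # second threshold: target looks small, i.e. too far


def advance_command(box_w: int, box_h: int) -> Optional[str]:
    area = box_w * box_h
    if area > AREA_FIRST:
        return "backward"   # move back from a too-close target
    if area < AREA_SECOND:
        return "forward"    # move toward a too-far target
    return None             # comfortable band: no forward/backward motion


print(advance_command(50, 40))  # area 2000 -> "backward"
```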
In addition, an embodiment of the present application further provides a smart mobile device that includes the target tracking apparatus described in the above embodiments, where the target detection network in the target tracking apparatus is integrated in a management device of the smart mobile device, and the management device performs the target detection processing on the image acquired by the image acquisition module to obtain the position of the target object.
The control module is connected to the management device, and is used to generate the control instruction according to the position of the target object obtained by the management device and to control the rotation of the smart mobile device according to the control instruction.
In some embodiments of the present application, the management device is a Raspberry Pi.
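On a Raspberry Pi, the rotation could plausibly be driven through GPIO-controlled motor PWM; the pin numbers, PWM frequency, and pivot-style turn below are wiring assumptions for illustration only, not a configuration specified by this application:

```python
# Hypothetical wheel driver for a Raspberry Pi (pins and duty cycles assumed).
import RPi.GPIO as GPIO

LEFT_PIN, RIGHT_PIN = 12, 13       # assumed PWM-capable BCM pins

GPIO.setmode(GPIO.BCM)
GPIO.setup(LEFT_PIN, GPIO.OUT)
GPIO.setup(RIGHT_PIN, GPIO.OUT)
left = GPIO.PWM(LEFT_PIN, 100)     # 100 Hz PWM for each wheel
right = GPIO.PWM(RIGHT_PIN, 100)
left.start(0)
right.start(0)


def pivot_right(duty: float) -> None:
    """Pivot to the right by driving only the left wheel."""
    left.ChangeDutyCycle(duty)
    right.ChangeDutyCycle(0)
```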
In some embodiments of the present application, the smart mobile device includes an educational robot.
In some embodiments of the present application, the management device further integrates the preprocessing module of the target tracking apparatus, configured to perform the preprocessing operation on the image and to perform the target detection processing on the image after the preprocessing operation, to obtain the position of the target object in the image.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present application can be configured to execute the methods described in the above method embodiments; for specific implementations, refer to the descriptions of the above method embodiments.
The embodiments of the present application further propose a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the foregoing method is implemented. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
The embodiments of the present application further propose a smart mobile device, including: a processor; and a memory for storing processor-executable instructions, where the processor is configured to perform the above method.
FIG. 13 is a schematic structural diagram of a smart mobile device provided by an embodiment of this application. For example, the smart mobile device 800 may be any device capable of performing image processing, or any mobile device capable of performing target tracking.
Referring to FIG. 13, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operations of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions to complete all or some of the steps of the foregoing method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components; for example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 806 provides power for the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel; the touch sensors may not only sense the boundary of a touch or slide action but also detect the duration and pressure related to the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC); when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules; the peripheral interface modules may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing the device 800 with status assessments of various aspects. For example, the sensor component 814 can detect the on/off state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800), and can also detect a position change of the device 800 or of one of its components, the presence or absence of contact between the user and the device 800, the orientation or acceleration/deceleration of the device 800, and temperature changes of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication; for example, the NFC module can be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for executing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 804 including computer program instructions, which can be executed by the processor 820 of the device 800 to implement the foregoing method.
The embodiments of the application may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the embodiments of the present application.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or in-groove raised structure on which instructions are stored, and any suitable combination of the above. The computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the embodiments of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, via the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is customized by using the state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to implement various aspects of the embodiments of the present application.
Various aspects of the embodiments of the present application are described here with reference to the flowcharts and/or block diagrams of the methods, apparatuses (systems), and computer program products according to the embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, thereby producing a machine, such that when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus is produced that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause computers, programmable data processing apparatuses, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions can also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operation steps is executed on the computer, the other programmable data processing apparatus, or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus, or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of the systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present application have been described above; the above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and changes are obvious to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Industrial Applicability
The embodiments of the application disclose a target tracking method and apparatus, a smart mobile device, and a storage medium. The method includes: acquiring a captured image; determining the position of a target object in the image; and obtaining, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling a smart mobile device, where the control instruction is used to bring the position of the target object to the center position of the image, and the control instruction includes the rotation instructions corresponding to the offset values in an offset sequence constituting the distance, the offset sequence including at least one offset value. The embodiments of the present application can realize real-time tracking of a target object.

Claims (25)

  1. A target tracking method, comprising:
    acquiring a captured image;
    determining the position of a target object in the image; and
    determining, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, wherein the control instruction is used to bring the position of the target object to the center position of the image, and the control instruction comprises rotation instructions corresponding to the offset values in an offset sequence constituting the distance, the offset sequence comprising at least one offset value.
  2. The method according to claim 1, wherein, before the determining of the position of the target object in the image, the method further comprises performing a preprocessing operation on the image, the preprocessing operation comprising: adjusting the image into a grayscale image of a preset specification, and performing normalization processing on the grayscale image;
    wherein the determining of the position of the target object in the image comprises:
    performing target detection processing on the image obtained after the preprocessing operation, to obtain the position of the target object in the image after the preprocessing operation; and
    determining the position of the target object in the image based on the position of the target object in the image after the preprocessing operation.
  3. The method according to claim 2, wherein the performing of the normalization processing on the grayscale image comprises:
    determining the mean and the standard deviation of the pixel values of the pixels in the grayscale image;
    obtaining the difference between the pixel value of each pixel and the mean; and
    determining the ratio of the difference corresponding to each pixel to the standard deviation as the normalized pixel value of that pixel.
  4. The method according to any one of claims 1 to 3, wherein the determining of the position of the target object in the image comprises:
    extracting image features of the image;
    performing classification processing on the image features to obtain a location area of the target object in the image; and
    determining the center position of the location area as the position of the target object.
  5. The method according to any one of claims 1 to 4, wherein the target object comprises a human face;
    correspondingly, the determining of the position of the target object in the image comprises: determining the position of the human face in the image.
  6. The method according to any one of claims 1 to 5, wherein the determining, based on the distance between the position of the target object and the center position of the image, of the control instruction for controlling the rotation of the smart mobile device comprises:
    determining a target offset based on the distance between the position of the target object in the image and the center position of the image;
    generating multiple groups of offset sequences based on the target offset, wherein the offset values in each group of offset sequences sum to the target offset; and
    selecting, using a reinforcement learning algorithm, an offset sequence that meets requirements from the multiple groups of offset sequences, and determining the control instruction corresponding to the offset sequence that meets the requirements.
  7. The method according to claim 6, wherein the selecting, using the reinforcement learning algorithm, of the offset sequence that meets the requirements from the multiple groups of offset sequences comprises:
    for each offset value in the multiple groups of offset sequences, determining the maximum value corresponding to the offset value in a value table, the value table comprising the values corresponding to offset values under different rotation instructions;
    obtaining a reward value corresponding to the offset value, and determining the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, wherein the reward value is the distance between the position of the target object and the center position of the image in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed; and
    determining, as the offset sequence that meets the requirements, the offset sequence whose offset values have the largest sum of final values among the multiple groups of offset sequences.
  8. The method according to claim 6 or 7, wherein the determining of the control instruction corresponding to the offset sequence that meets the requirements comprises:
    determining the control instruction based on the rotation instructions corresponding to the maximum values of the offset values in the offset sequence that meets the requirements.
  9. The method according to any one of claims 1 to 8, further comprising:
    driving the smart mobile device to rotate based on the control instruction.
  10. The method according to claim 4, further comprising:
    determining, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, wherein:
    in response to the area corresponding to the location area of the target object being greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and
    in response to the area corresponding to the location area of the target object being smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
  11. A target tracking apparatus, comprising:
    an image acquisition module, configured to acquire an image;
    a target detection module, configured to determine the position of a target object in the image; and
    a control module, configured to determine, based on the distance between the position of the target object and the center position of the image, a control instruction for controlling the rotation of a smart mobile device, wherein the control instruction is used to bring the position of the target object to the center position of the image, and the control instruction comprises rotation instructions corresponding to the offset values in an offset sequence constituting the distance, the offset sequence comprising at least one offset value.
  12. The apparatus according to claim 11, further comprising a preprocessing module configured to perform a preprocessing operation on the image, the preprocessing operation comprising: adjusting the image into a grayscale image of a preset specification, and performing normalization processing on the grayscale image;
    wherein the target detection module is further configured to perform target detection processing on the image obtained after the preprocessing operation, to obtain the position of the target object in the image after the preprocessing operation; and
    to determine the position of the target object in the image based on the position of the target object in the image after the preprocessing operation.
  13. The apparatus according to claim 12, wherein the step in which the preprocessing module performs the normalization processing on the grayscale image comprises:
    determining the mean and the standard deviation of the pixel values of the pixels in the grayscale image;
    obtaining the difference between the pixel value of each pixel and the mean; and
    determining the ratio of the difference corresponding to each pixel to the standard deviation as the normalized pixel value of that pixel.
  14. The apparatus according to any one of claims 11 to 13, wherein the target detection module is further configured to extract image features of the image;
    to perform classification processing on the image features to obtain a location area of the target object in the image; and
    to determine the center position of the location area as the position of the target object.
  15. The apparatus according to any one of claims 11 to 14, wherein the target object comprises a human face;
    correspondingly, the target detection module is further configured to determine the position of the human face in the image.
  16. The apparatus according to any one of claims 11 to 15, wherein the control module is further configured to determine a target offset based on the distance between the position of the target object in the image and the center position of the image;
    to generate multiple groups of offset sequences based on the target offset, wherein the offset values in each group of offset sequences sum to the target offset; and
    to select, using a reinforcement learning algorithm, an offset sequence that meets requirements from the multiple groups of offset sequences, and to obtain the control instruction corresponding to the offset sequence that meets the requirements.
  17. The apparatus according to claim 16, wherein the control module is further configured: for each offset value in the multiple groups of offset sequences, to determine the maximum value corresponding to the offset value in a value table, the value table comprising the values corresponding to offset values under different rotation instructions;
    to obtain a reward value corresponding to the offset value, and determine the final value of the offset value based on the reward value and the maximum value corresponding to the offset value, wherein the reward value is the distance between the position of the target object and the center position of the image in the case where the rotation instruction corresponding to the maximum value of the offset value has not been executed; and
    to determine, as the offset sequence that meets the requirements, the offset sequence whose offset values have the largest sum of final values among the multiple groups of offset sequences.
  18. The apparatus according to claim 16 or 17, wherein the control module is further configured to determine the control instruction based on the rotation instructions corresponding to the maximum values of the offset values in the offset sequence that meets the requirements.
  19. The apparatus according to claim 14, wherein the target detection module is further configured to determine, based on the location area of the target object, a control instruction for controlling the movement of the smart mobile device, wherein:
    in the case that the area corresponding to the location area of the target object is greater than a first threshold, a control instruction for controlling the smart mobile device to move backward is generated; and
    in the case that the area corresponding to the location area of the target object is smaller than a second threshold, a control instruction for controlling the smart mobile device to move forward is generated, the first threshold being greater than the second threshold.
  20. A smart mobile device, comprising the target tracking apparatus according to any one of claims 11 to 19,
    wherein the target detection module in the target tracking apparatus is integrated in a management device of the smart mobile device, and the management device performs the target detection processing on the image acquired by the image acquisition module to obtain the position of the target object; and
    the control module is connected to the management device and is configured to generate the control instruction according to the position of the target object obtained by the management device, and to control the rotation of the smart mobile device according to the control instruction.
  21. The device according to claim 20, wherein the management device further integrates the preprocessing module of the target tracking apparatus, configured to perform the preprocessing operation on the image and to perform the target detection processing on the image after the preprocessing operation, to obtain the position of the target object in the image.
  22. The device according to claim 20 or 21, wherein the smart mobile device comprises an educational robot.
  23. A smart mobile device, comprising:
    a processor; and
    a memory configured to store processor-executable instructions,
    wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of claims 1 to 10.
  24. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
  25. A computer program, comprising computer-readable code, wherein, when the computer-readable code runs in a smart mobile device, a processor in the smart mobile device executes the method according to any one of claims 1 to 10.
PCT/CN2020/089620 2019-07-17 2020-05-11 Target tracking method and apparatus, intelligent mobile device and storage medium WO2021008207A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020217014152A KR20210072808A (en) 2019-07-17 2020-05-11 Target tracking method and device, smart mobile device and storage medium
JP2021525569A JP2022507145A (en) 2019-07-17 2020-05-11 Target tracking methods and equipment, intelligent mobile equipment and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910646696.8 2019-07-17
CN201910646696.8A CN110348418B (en) 2019-07-17 2019-07-17 Target tracking method and device, intelligent mobile device and storage medium

Publications (1)

Publication Number Publication Date
WO2021008207A1 true WO2021008207A1 (en) 2021-01-21

Family

ID=68175655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089620 WO2021008207A1 (en) 2019-07-17 2020-05-11 Target tracking method and apparatus, intelligent mobile device and storage medium

Country Status (5)

Country Link
JP (1) JP2022507145A (en)
KR (1) KR20210072808A (en)
CN (1) CN110348418B (en)
TW (2) TWI755762B (en)
WO (1) WO2021008207A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348418B (en) * 2019-07-17 2022-03-11 上海商汤智能科技有限公司 Target tracking method and device, intelligent mobile device and storage medium
CN112207821B (en) * 2020-09-21 2021-10-01 大连遨游智能科技有限公司 Target searching method of visual robot and robot
CN113409220A (en) * 2021-06-28 2021-09-17 展讯通信(天津)有限公司 Face image processing method, device, medium and equipment
CN115037877A (en) * 2022-06-08 2022-09-09 湖南大学重庆研究院 Automatic following method and device and safety monitoring method and device
CN117238039B (en) * 2023-11-16 2024-03-19 暗物智能科技(广州)有限公司 Multitasking human behavior analysis method and system based on top view angle

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1178467C (en) * 1998-04-16 2004-12-01 三星电子株式会社 Method and apparatus for automatically tracing moving object
US7430315B2 (en) * 2004-02-13 2008-09-30 Honda Motor Co. Face recognition system
JP3992026B2 (en) * 2004-07-09 2007-10-17 船井電機株式会社 Self-propelled robot
JP2010176504A (en) * 2009-01-30 2010-08-12 Canon Inc Image processor, image processing method, and program
JP2012191265A (en) * 2011-03-08 2012-10-04 Nikon Corp Image processing apparatus and program
CN102411368B (en) * 2011-07-22 2013-10-09 北京大学 Active vision human face tracking method and tracking system of robot
CN102307297A (en) * 2011-09-14 2012-01-04 镇江江大科茂信息系统有限责任公司 Intelligent monitoring system for multi-azimuth tracking and detecting on video object
KR102131477B1 (en) * 2013-05-02 2020-07-07 퀄컴 인코포레이티드 Methods for facilitating computer vision application initialization
JP6680498B2 (en) * 2015-09-28 2020-04-15 株式会社日立システムズ Autonomous flying vehicle, target tracking method
WO2017120336A2 (en) * 2016-01-05 2017-07-13 Mobileye Vision Technologies Ltd. Trained navigational system with imposed constraints
CN113589833A (en) * 2016-02-26 2021-11-02 深圳市大疆创新科技有限公司 Method for visual target tracking
CN108292141B (en) * 2016-03-01 2022-07-01 深圳市大疆创新科技有限公司 Method and system for target tracking
CN107798723B (en) * 2016-08-30 2021-11-19 北京神州泰岳软件股份有限公司 Target tracking control method and device
US10140719B2 (en) * 2016-12-22 2018-11-27 TCL Research America Inc. System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles
JP6856914B2 (en) * 2017-07-18 2021-04-14 ハンジョウ タロ ポジショニング テクノロジー カンパニー リミテッドHangzhou Taro Positioning Technology Co.,Ltd. Intelligent object tracking
CN108549413A (en) * 2018-04-27 2018-09-18 全球能源互联网研究院有限公司 A kind of holder method of controlling rotation, device and unmanned vehicle
CN108806146A (en) * 2018-06-06 2018-11-13 合肥嘉仕诚能源科技有限公司 A kind of safety monitoring dynamic object track lock method and system
CN109992000B (en) * 2019-04-04 2020-07-03 北京航空航天大学 Multi-unmanned aerial vehicle path collaborative planning method and device based on hierarchical reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101888479A (en) * 2009-05-14 2010-11-17 汉王科技股份有限公司 Method and device for detecting and tracking target image
CN104751486A (en) * 2015-03-20 2015-07-01 安徽大学 Moving object relay tracing algorithm of multiple PTZ (pan/tilt/zoom) cameras
CN105740644A (en) * 2016-03-24 2016-07-06 苏州大学 Cleaning robot optimal target path planning method based on model learning
CN109040574A (en) * 2017-06-08 2018-12-18 北京君正集成电路股份有限公司 A kind of method and device of rotation head-shaking machine tracking target
CN107992099A (en) * 2017-12-13 2018-05-04 福州大学 A kind of target sport video tracking and system based on improvement frame difference method
CN110348418A (en) * 2019-07-17 2019-10-18 上海商汤智能科技有限公司 Method for tracking target and device, Intelligent mobile equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139655A (en) * 2021-03-31 2021-07-20 北京大学 Target tracking training method and tracking method based on reinforcement learning
CN113139655B (en) * 2021-03-31 2022-08-19 北京大学 Target tracking training method and tracking method based on reinforcement learning
CN115250329A (en) * 2021-04-28 2022-10-28 深圳市三诺数字科技有限公司 Camera control method and device, computer equipment and storage medium
CN115250329B (en) * 2021-04-28 2024-04-19 深圳市三诺数字科技有限公司 Camera control method and device, computer equipment and storage medium
CN113625658A (en) * 2021-08-17 2021-11-09 杭州飞钛航空智能装备有限公司 Offset information processing method and device, electronic equipment and hole making mechanism

Also Published As

Publication number Publication date
CN110348418A (en) 2019-10-18
TW202215364A (en) 2022-04-16
KR20210072808A (en) 2021-06-17
TW202105326A (en) 2021-02-01
TWI755762B (en) 2022-02-21
CN110348418B (en) 2022-03-11
JP2022507145A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
WO2021008207A1 (en) Target tracking method and apparatus, intelligent mobile device and storage medium
US20200387698A1 (en) Hand key point recognition model training method, hand key point recognition method and device
US11468581B2 (en) Distance measurement method, intelligent control method, electronic device, and storage medium
WO2020187153A1 (en) Target detection method, model training method, device, apparatus and storage medium
TWI728621B (en) Image processing method and device, electronic equipment, computer readable storage medium and computer program
WO2021164469A1 (en) Target object detection method and apparatus, device, and storage medium
TWI724736B (en) Image processing method and device, electronic equipment, storage medium and computer program
TWI766286B (en) Image processing method and image processing device, electronic device and computer-readable storage medium
CN111985265B (en) Image processing method and device
WO2021017836A1 (en) Method for controlling display of large-screen device, and mobile terminal and first system
WO2020216054A1 (en) Sight line tracking model training method, and sight line tracking method and device
WO2022127919A1 (en) Surface defect detection method, apparatus, system, storage medium, and program product
CN104156915A (en) Skin color adjusting method and device
CN110443366B (en) Neural network optimization method and device, and target detection method and device
WO2021035812A1 (en) Image processing method and apparatus, electronic device and storage medium
US20190271940A1 (en) Electronic device, external device capable of being combined with the electronic device, and a display method thereof
KR101623642B1 (en) Control method of robot cleaner and terminal device and robot cleaner control system including the same
CN104063865A (en) Classification model creation method, image segmentation method and related device
WO2020220973A1 (en) Photographing method and mobile terminal
CN108156374A (en) A kind of image processing method, terminal and readable storage medium storing program for executing
CN111435422B (en) Action recognition method, control method and device, electronic equipment and storage medium
TWI770531B (en) Face recognition method, electronic device and storage medium thereof
US20230245344A1 (en) Electronic device and controlling method of electronic device
WO2023137923A1 (en) Person re-identification method and apparatus based on posture guidance, and device and storage medium
CN105608469A (en) Image resolution determination method and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20841368; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021525569; Country of ref document: JP; Kind code of ref document: A. Ref document number: 20217014152; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20841368; Country of ref document: EP; Kind code of ref document: A1)

32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.09.2022))
