CN114862915A - Target object tracking method and device, electronic equipment and storage medium

Target object tracking method and device, electronic equipment and storage medium

Info

Publication number
CN114862915A
CN114862915A (application CN202210604535.4A)
Authority
CN
China
Prior art keywords
determining
detection frame
side length
processing parameter
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210604535.4A
Other languages
Chinese (zh)
Inventor
陈子亮 (Chen Ziliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210604535.4A
Publication of CN114862915A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Abstract

The present disclosure provides a target object tracking method, which relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, image processing, computer vision, etc., and can be applied to scenes such as object recognition and object detection. The specific implementation scheme is as follows: determining an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame; determining a second processing parameter according to the offset value and the first processing parameter; determining an updated detection frame according to the second processing parameter, the first detection frame and the second detection frame; and performing target tracking according to the updated detection frame. The present disclosure also provides a target object tracking apparatus, an electronic device, and a storage medium.

Description

Target object tracking method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to the field of deep learning, image processing, and computer vision technologies, which can be applied to object recognition, object detection, and other scenes. More specifically, the present disclosure provides a target object tracking method, apparatus, electronic device, and storage medium.
Background
With the development of artificial intelligence technology, deep learning models are widely applied to scenes such as target tracking, object recognition and the like.
Disclosure of Invention
The present disclosure provides a target object tracking method and apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a target object tracking method, including: determining an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame; determining a second processing parameter according to the offset value and the first processing parameter; determining an updated detection frame according to the second processing parameter, the first detection frame and the second detection frame; and performing target tracking according to the updated detection frame.
According to another aspect of the present disclosure, there is provided a target object tracking apparatus, the apparatus including: a first determining module, configured to determine an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame; a second determining module, configured to determine a second processing parameter according to the offset value and the first processing parameter; a third determining module, configured to determine an updated detection frame according to the second processing parameter, the first detection frame, and the second detection frame; and a target tracking module, configured to perform target tracking according to the updated detection frame.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method provided according to the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary system architecture to which the target object tracking method and apparatus may be applied, according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a target object tracking method according to one embodiment of the present disclosure;
fig. 3A is a schematic diagram of a first video frame according to one embodiment of the present disclosure;
fig. 3B is a schematic diagram of a second video frame, according to one embodiment of the present disclosure;
FIG. 4 is a block diagram of a target object tracking device according to one embodiment of the present disclosure; and
fig. 5 is a block diagram of an electronic device to which a target object tracking method may be applied according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In fields such as security and video surveillance, a target object tracking task may be executed. For example, in a sequence of video frames, multiple detection frames may be determined for the same target object across multiple video frames. Target tracking can then be performed by associating these detection frames.
During target tracking, the target detection model may be unstable, or the difference between consecutive video frames may be large. In these situations, the detection frames output by the target detection model fluctuate in size from frame to frame, which visually appears as detection-frame jitter and degrades the user experience.
To mitigate detection-frame jitter, anti-jitter processing may be performed based on the information of two video frames, or using the information of a plurality of video frames.
For example, an anti-jitter method based on two video frames performs target detection on two consecutive video frames to obtain two detection frames, then averages their position information to obtain a two-frame average detection frame, which serves as the output detection frame for tracking the target object. When the same target object changes significantly between frames, this method struggles to keep the detection frame stable because it uses only two frames of information. Moreover, when the position of the target object changes significantly between two consecutive frames, the output detection frame may not match the target's actual position, causing a distortion problem.
For another example, an anti-jitter method based on multiple video frames performs target detection on several consecutive video frames to obtain multiple detection frames, then averages their position information to obtain a multi-frame average detection frame as the output detection frame for tracking. Because it uses the information of several preceding video frames, this method alleviates distortion under violent target motion and also reduces jitter between detection frames. However, because it buffers multiple video frames, its overall memory overhead and algorithmic complexity are high, as the sketch below illustrates.
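As a concrete illustration of these two baselines (a sketch only; the (x, y, width, height) box tuple and the function names are assumptions for illustration, not from this disclosure):

```python
from collections import deque

def two_frame_average(box1, box2):
    """Two-frame baseline: element-wise mean of two (x, y, width, height) boxes."""
    return tuple((a + b) / 2 for a, b in zip(box1, box2))

class MultiFrameAverager:
    """N-frame baseline: moving average over the last `window` boxes.

    Smoother than the two-frame version, but it must buffer `window`
    detection boxes, which is the memory overhead noted above.
    """

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, box):
        self.history.append(box)
        n = len(self.history)
        return tuple(sum(b[i] for b in self.history) / n for i in range(4))
```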
FIG. 1 is a schematic diagram of an exemplary system architecture to which the target object tracking method and apparatus may be applied, according to one embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the target object tracking method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the target object tracking device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The target object tracking method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the target object tracking apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
FIG. 2 is a flow diagram of a target object tracking method according to one embodiment of the present disclosure.
As shown in fig. 2, the method 200 may include operations S210 to S240.
In operation S210, an offset value and a first processing parameter are determined according to a first detection frame of a first video frame and a second detection frame of a second video frame.
For example, the first video frame and the second video frame may be two adjacent video frames.
For example, by performing target detection on the first video frame and the second video frame, respectively, using a target detection model, a first detection frame bbox1 and a second detection frame bbox2 can be obtained. In one example, the width of the first detection frame bbox1 is width1 and its height is height1. Similarly, the width of the second detection frame bbox2 is width2 and its height is height2.
In one example, the first processing parameter may be obtained by performing various operations on the width1 of the first detection frame bbox1 and the width2 of the second detection frame bbox2. The operations may include, for example, summation, weighted summation, and averaging.
For example, the offset value may characterize an offset between the first detection frame and the second detection frame.
In one example, the distance between the first detection box and the second detection box may be taken as an offset value.
In operation S220, a second processing parameter is determined according to the offset value and the first processing parameter.
For example, the second processing parameter can be obtained by performing various operations based on the offset value and the first processing parameter. The various operations may include, for example: summation, weighted summation, averaging, division, multiplication, and the like.
In operation S230, an updated detection frame is determined according to the second processing parameter, the first detection frame, and the second detection frame.
For example, the position, length, and width of the first detection frame or the second detection frame are adjusted according to the second processing parameter, and the adjusted detection frame is taken as the updated detection frame.
In operation S240, target tracking is performed according to the updated detection frame.
For example, one of the first detection frame and the second detection frame is replaced with the updated detection frame so as to track the target object. In one example, the second detection frame may be replaced with the updated detection frame for target tracking.
With the embodiments of the present disclosure, only the information of two video frames is used, which reduces the number of video frames required to determine the updated detection frame and thus the resource (e.g., memory) overhead. Further, the second processing parameter is determined according to the offset between the detection frames, so that it reflects the moving speed of the target object; even when the target object moves rapidly, the updated detection frame can be accurately determined for target tracking. Thus, the updated detection frame can be accurately determined whether the target object moves rapidly or slowly.
In some embodiments, such as some implementations of operation S210 of the method 200, determining the offset value and the first processing parameter according to the first detection frame of the first video frame and the second detection frame of the second video frame includes: determining the offset value according to the preset point coordinates of the first detection frame and the preset point coordinates of the second detection frame.
For example, the preset point may be any point on the detection frame. In one example, the preset point may be the center point of the detection frame or a vertex of the detection frame, such as the top-left or top-right vertex.
For example, the preset point coordinates of the first detection frame bbox1 may be (x1, y1), and the preset point coordinates of the second detection frame bbox2 may be (x2, y2). The distance between the two preset points may be used as the offset value. In one example, the offset Value may be determined by the following formula:
Value = sqrt((x2 - x1)^2 + (y2 - y1)^2)    (Formula 1)
sqrt() represents the square-root operation.
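As a minimal sketch of Formula 1 (the function name and bare-coordinate signature are illustrative assumptions, not from the disclosure):

```python
import math

def offset_value(x1, y1, x2, y2):
    """Formula 1: Euclidean distance between the preset points
    (e.g., center points) of the first and second detection frames."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
```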
In some embodiments, such as some implementations of operation S210 of the method 200, determining the offset value and the first processing parameter according to the first detection frame of the first video frame and the second detection frame of the second video frame includes: determining the first processing parameter according to a preset side length weight, the side length of the first detection frame, and the side length of the second detection frame.
For example, the side length includes at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
For example, the side lengths of the sides in the width direction may include the width1 of the first detection frame bbox1 and the width2 of the second detection frame bbox2 described above. In one example, the first processing parameter Tmp_width may be determined by the following formula:
Tmp_width = α_w × width1 + β_w × width2    (Formula 2)
α_w and β_w are preset width weights, which can serve as the preset side length weights described above. In one example, α_w is 0.3 and β_w is 0.7.
For another example, the side lengths of the sides in the height direction may include the height1 of the first detection frame bbox1 and the height2 of the second detection frame bbox2 described above. In one example, the first processing parameter Tmp_height may be determined by the following formula:
Tmp_height = α_h × height1 + β_h × height2    (Formula 3)
α_h and β_h are preset height weights, which can serve as the preset side length weights described above. In one example, α_h is 0.3 and β_h is 0.7.
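A short sketch of Formulas 2 and 3 with the example weights above (the helper name and signature are illustrative assumptions):

```python
def first_processing_parameter(side1, side2, alpha=0.3, beta=0.7):
    """Formulas 2/3: weighted sum of one side length (width or height)
    of the first and second detection frames."""
    return alpha * side1 + beta * side2

# Tmp_width  = first_processing_parameter(width1, width2)    # Formula 2
# Tmp_height = first_processing_parameter(height1, height2)  # Formula 3
```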
With the embodiments of the present disclosure, a larger weight is assigned to the side length of the second detection frame, which suits scenes in which the target object is moving. It is particularly suitable for scenes in which the target object moves rapidly, so that the updated detection frame indicates the position of the target object more accurately, facilitating target tracking.
In some embodiments, such as some implementations of operation S220 of the method 200, determining the second processing parameter according to the offset value and the first processing parameter includes: determining a ratio between the offset value and the first processing parameter; and determining the second processing parameter according to a preset processing parameter and the ratio.
For example, the offset value may be used as the numerator and the first processing parameter as the denominator to determine the ratio between them. In one example, the ratio Scale1 may be determined by the following formula:
Scale1 = Value / Tmp_width    (Formula 4)
For another example, the preset processing parameter may be a number greater than 0 and less than 1. In one example, the preset processing parameter may be, for example, 0.5.
For another example, the second processing parameter may be obtained by performing various operations on the preset processing parameter and the ratio. The operations may include, for example, summation, weighted summation, and averaging. In one example, the second processing parameter Momentum2 may be determined by the following formula:
Momentum2 = Momentum1 + Scale1    (Formula 5)
Momentum1 is the preset processing parameter.
In other embodiments, determining the second processing parameter according to the preset processing parameter and the ratio may include: taking the larger of the ratio and a first preset value as a first processed ratio; determining a difference between a second preset value and the preset processing parameter; taking the smaller of the difference and the first processed ratio as a second processed ratio; and determining the second processing parameter according to the second processed ratio and the preset processing parameter.
For example, the ratio may be determined by Formula 4 described above.
For another example, the first predetermined value may be 0, and the second predetermined value may be 1.
For another example, based on the ratio and the preset processing parameter, the second processing parameter may be determined by the following formulas:
Scale2 = max(0, Scale1)    (Formula 6)
Scale3 = min(Scale2, 1 - Momentum1)    (Formula 7)
Momentum2 = Momentum1 + Scale3    (Formula 8)
Scale1 is the ratio, Scale2 is the first processed ratio, Scale3 is the second processed ratio, Momentum1 is the preset processing parameter, and Momentum2 is the second processing parameter.
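A sketch combining Formulas 4 and 6 to 8 (function and argument names are illustrative; Momentum1 = 0.5 follows the example above):

```python
def second_processing_parameter(value, tmp_width, momentum1=0.5):
    """Formulas 4, 6-8: scale the offset by the first processing
    parameter, clamp the ratio to [0, 1 - Momentum1], then add the
    preset processing parameter; the result lies in [Momentum1, 1]."""
    scale1 = value / tmp_width             # Formula 4
    scale2 = max(0.0, scale1)              # Formula 6
    scale3 = min(scale2, 1.0 - momentum1)  # Formula 7
    return momentum1 + scale3              # Formula 8
```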
With the embodiments of the present disclosure, the offset value is scaled by the first processing parameter, so that the method provided by the present disclosure can be applied both to scenes in which the target object moves steadily and to scenes in which it moves rapidly.
In some embodiments, such as some implementations of operation S230 of the method 200, determining the updated detection frame according to the second processing parameter, the first detection frame, and the second detection frame includes: determining a difference between the second processing parameter and a preset value as a first weight; determining the second processing parameter as a second weight; and determining the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame.
For example, the preset value may be 1. In one example, the first weight may be 1 - Momentum2.
As another example, the second weight may be Momentum2.
In an embodiment of the present disclosure, determining the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame includes: obtaining a first weighted coordinate according to the preset point coordinates of the first detection frame and the first weight; obtaining a second weighted coordinate according to the preset point coordinates of the second detection frame and the second weight; and determining the preset point coordinates of the updated detection frame according to the first weighted coordinate and the second weighted coordinate.
For example, as described above, the preset point coordinates of the first detection frame bbox1 may be (x1, y1), and the preset point coordinates of the second detection frame bbox2 may be (x2, y2).
For example, the preset point coordinates (x, y) of the updated detection frame may be determined by the following formulas:
x = (1 - Momentum2) × x1 + Momentum2 × x2    (Formula 9)
y = (1 - Momentum2) × y1 + Momentum2 × y2    (Formula 10)
((1 - Momentum2) × x1, (1 - Momentum2) × y1) may be used as the first weighted coordinate, and (Momentum2 × x2, Momentum2 × y2) as the second weighted coordinate.
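Formulas 9 and 10 in code form, under the same illustrative naming:

```python
def updated_preset_point(x1, y1, x2, y2, momentum2):
    """Formulas 9/10: blend the preset-point coordinates of the two
    detection frames, weighting the newer frame by Momentum2."""
    x = (1 - momentum2) * x1 + momentum2 * x2
    y = (1 - momentum2) * y1 + momentum2 * y2
    return x, y
```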
In an embodiment of the present disclosure, determining the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame includes: determining a first weighted side length according to the side length of the first detection frame and the first weight; determining a second weighted side length according to the side length of the second detection frame and the second weight; and determining the side length of the updated detection frame according to the first weighted side length and the second weighted side length.
For example, the side length includes at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
For example, the side lengths of the sides in the width direction may include the width1 of the first detection frame bbox1 and the width2 of the second detection frame bbox2 described above. In one example, the width of the updated detection frame may be determined by the following formula:
width = (1 - Momentum2) × width1 + Momentum2 × width2    (Formula 11)
(1 - Momentum2) × width1 is the first weighted width, and Momentum2 × width2 is the second weighted width.
For example, the side lengths of the sides in the height direction may include the height1 of the first detection frame bbox1 and the height2 of the second detection frame bbox2 described above. In one example, the height of the updated detection frame may be determined by the following formula:
height = (1 - Momentum2) × height1 + Momentum2 × height2    (Formula 12)
(1 - Momentum2) × height1 is the first weighted height, and Momentum2 × height2 is the second weighted height.
According to the embodiments of the present disclosure, the information of the first detection frame and the second detection frame is combined by weighted summation, and the weights are related to the moving speed of the target object, so an accurate updated detection frame can be output even in a scene where the target object moves rapidly.
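Putting the pieces together, here is a minimal end-to-end sketch of operations S210 to S240 for one pair of frames. The (center_x, center_y, width, height) box representation, the center as the preset point, and the use of Tmp_width alone as the denominator follow the examples above and are assumptions wherever the disclosure leaves a choice open:

```python
import math

def update_detection_box(box1, box2, alpha=0.3, beta=0.7, momentum1=0.5):
    """One anti-jitter step; each box is (center_x, center_y, width, height)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2

    # S210: offset value (Formula 1) and first processing parameter (Formula 2)
    value = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    tmp_width = alpha * w1 + beta * w2

    # S220: second processing parameter (Formulas 4, 6-8)
    momentum2 = momentum1 + min(max(0.0, value / tmp_width), 1.0 - momentum1)

    # S230: weighted blend of coordinates and side lengths (Formulas 9-12)
    def blend(a, b):
        return (1 - momentum2) * a + momentum2 * b

    return (blend(x1, x2), blend(y1, y2), blend(w1, w2), blend(h1, h2))

# S240: the second detection frame is replaced with the updated frame, e.g.:
# new_box = update_detection_box((50.0, 50.0, 40.0, 80.0), (58.0, 52.0, 42.0, 84.0))
```

When the offset is small relative to the blended width, Momentum2 stays near Momentum1 and the result behaves like a fixed-weight average of the two frames; when the target moves fast, Momentum2 approaches 1 and the updated frame follows the newer detection closely, which is how jitter is suppressed without lagging behind fast motion.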
It is to be understood that the manner of determining the updated detection frame has been described in detail above. The principles of the method provided by the present disclosure will be described below in conjunction with the first video frame and the second video frame.
Fig. 3A is a schematic diagram of a first video frame, according to one embodiment of the present disclosure.
An image capture device captures a video, which comprises a sequence of video frames.
As shown in fig. 3A, a target object 301 is contained in a first video frame 310. A first detection frame 302 may be obtained by performing target detection on the first video frame using a target detection model.
Fig. 3B is a schematic diagram of a second video frame, according to one embodiment of the present disclosure.
As shown in fig. 3B, a second video frame 320 is captured after the first video frame. Target detection is performed on the second video frame using the target detection model to obtain a second detection frame.
As shown in fig. 3A and 3B, when the target object 301 moves rapidly, the updated detection frame 303 may be determined by the method 200 described above for target tracking. As shown in fig. 3B, the difference between the side length (height or width) of the updated detection frame 303 and that of the first detection frame 302 is small, so the result is visually stable and the detection-frame jitter phenomenon is alleviated.
It is to be understood that the target object described above may be a human, an animal, an object, or the like.
FIG. 4 is a block diagram of a target object tracking device according to one embodiment of the present disclosure.
As shown in fig. 4, the apparatus 400 may include a first determination module 410, a second determination module 420, a third determination module 430, and a target tracking module 440.
A first determining module 410, configured to determine an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame.
A second determining module 420, configured to determine a second processing parameter according to the offset value and the first processing parameter.
A third determining module 430, configured to determine an updated detection frame according to the second processing parameter, the first detection frame, and the second detection frame.
A target tracking module 440, configured to perform target tracking according to the updated detection frame.
In some embodiments, the second determining module comprises: a first determining submodule, configured to determine a ratio between the offset value and the first processing parameter; and a second determining submodule, configured to determine the second processing parameter according to a preset processing parameter and the ratio.
In some embodiments, the first determining module comprises: a third determining submodule, configured to determine the offset value according to the preset point coordinates of the first detection frame and the preset point coordinates of the second detection frame.
In some embodiments, the first determining module comprises: a fourth determining submodule, configured to determine the first processing parameter according to a preset side length weight, the side length of the first detection frame, and the side length of the second detection frame, wherein the side length includes at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
In some embodiments, the third determining module comprises: a fifth determining submodule, configured to determine a difference between the second processing parameter and a preset value as a first weight; a sixth determining submodule, configured to determine the second processing parameter as a second weight; and a seventh determining submodule, configured to determine the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame.
In some embodiments, the seventh determining submodule includes: a first determining unit, configured to obtain a first weighted coordinate according to the preset point coordinates of the first detection frame and the first weight; a second determining unit, configured to obtain a second weighted coordinate according to the preset point coordinates of the second detection frame and the second weight; and a third determining unit, configured to determine the preset point coordinates of the updated detection frame according to the first weighted coordinate and the second weighted coordinate.
In some embodiments, the seventh determining submodule includes: a fourth determining unit, configured to determine a first weighted side length according to the side length of the first detection frame and the first weight; a fifth determining unit, configured to determine a second weighted side length according to the side length of the second detection frame and the second weight; and a sixth determining unit, configured to determine the side length of the updated detection frame according to the first weighted side length and the second weighted side length, wherein the side length includes at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 includes a computing unit 501, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 executes the respective methods and processes described above, such as the target object tracking method. For example, in some embodiments, the target object tracking method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the target object tracking method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the target object tracking method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A target object tracking method, comprising:
determining an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame;
determining a second processing parameter according to the offset value and the first processing parameter;
determining an updated detection frame according to the second processing parameter, the first detection frame and the second detection frame; and
performing target tracking according to the updated detection frame.
2. The method of claim 1, wherein the determining a second processing parameter from the offset value and the first processing parameter comprises:
determining a ratio between the offset value and the first processing parameter; and
determining the second processing parameter according to a preset processing parameter and the ratio.
3. The method of claim 1, wherein the determining an offset value and a first processing parameter from a first detection box of a first video frame and a second detection box of a second video frame comprises:
determining the offset value according to the preset point coordinates of the first detection frame and the preset point coordinates of the second detection frame.
4. The method of claim 1, wherein the determining an offset value and a first processing parameter from a first detection box of a first video frame and a second detection box of a second video frame comprises:
determining the first processing parameter according to a preset side length weight, the side length of the first detection frame and the side length of the second detection frame,
wherein the side length comprises at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
5. The method of claim 1, wherein the determining an updated detection frame according to the second processing parameter, the first detection frame, and the second detection frame comprises:
determining a difference between the second processing parameter and a preset value as a first weight;
determining the second processing parameter as a second weight; and
determining the updated detection frame according to the first weight, the second weight, the first detection frame and the second detection frame.
6. The method of claim 5, wherein the determining the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame comprises:
obtaining a first weighted coordinate according to the preset point coordinate of the first detection frame and the first weight;
obtaining a second weighted coordinate according to the preset point coordinate of the second detection frame and the second weight; and
determining the preset point coordinates of the updated detection frame according to the first weighted coordinate and the second weighted coordinate.
7. The method of claim 5, wherein the determining the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame comprises:
determining a first weighted side length according to the side length of the first detection frame and the first weight;
determining a second weighted side length according to the side length of the second detection frame and the second weight; and
determining the side length of the updated detection frame according to the first weighted side length and the second weighted side length,
wherein the side length comprises at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
8. A target object tracking apparatus, comprising:
a first determining module, configured to determine an offset value and a first processing parameter according to a first detection frame of a first video frame and a second detection frame of a second video frame;
a second determining module, configured to determine a second processing parameter according to the offset value and the first processing parameter;
a third determining module, configured to determine an updated detection frame according to the second processing parameter, the first detection frame, and the second detection frame; and
a target tracking module, configured to perform target tracking according to the updated detection frame.
9. The apparatus of claim 8, wherein the second determining module comprises:
a first determining sub-module for determining a ratio between the offset value and the first processing parameter; and
a second determining submodule, configured to determine the second processing parameter according to a preset processing parameter and the ratio.
10. The apparatus of claim 8, wherein the first determining module comprises:
a third determining submodule, configured to determine the offset value according to the preset point coordinates of the first detection frame and the preset point coordinates of the second detection frame.
11. The apparatus of claim 8, wherein the first determining module comprises:
a fourth determining submodule, configured to determine the first processing parameter according to a preset side length weight, the side length of the first detection frame, and the side length of the second detection frame,
wherein the side length comprises at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
12. The apparatus of claim 8, wherein the third determining module comprises:
a fifth determining submodule, configured to determine a difference between the second processing parameter and a preset value as a first weight;
a sixth determining submodule for determining the second processing parameter as a second weight; and
a seventh determining submodule, configured to determine the updated detection frame according to the first weight, the second weight, the first detection frame, and the second detection frame.
13. The apparatus of claim 12, wherein the seventh determination submodule comprises:
a first determining unit, configured to obtain a first weighted coordinate according to the preset point coordinates of the first detection frame and the first weight;
a second determining unit, configured to obtain a second weighted coordinate according to the preset point coordinates of the second detection frame and the second weight; and
a third determining unit, configured to determine the preset point coordinates of the updated detection frame according to the first weighted coordinate and the second weighted coordinate.
14. The apparatus of claim 12, wherein the seventh determination submodule comprises:
a fourth determining unit, configured to determine a first weighted side length according to the side length of the first detection frame and the first weight;
a fifth determining unit, configured to determine a second weighted side length according to the side length of the second detection frame and the second weight; and
a sixth determining unit configured to determine a side length of the updated detection box according to the first weighted side length and the second weighted side length,
wherein the side length comprises at least one of: the side length of the side representing the height direction and the side length of the side representing the width direction.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
Application CN202210604535.4A, filed 2022-05-30 (priority 2022-05-30): Target object tracking method and device, electronic equipment and storage medium. Status: Pending. Publication: CN114862915A (en).

Priority Applications (1)

Application CN202210604535.4A: CN114862915A (en), Target object tracking method and device, electronic equipment and storage medium


Publications (1)

CN114862915A, published 2022-08-05

Family

ID=82641988

Family Applications (1)

CN202210604535.4A (pending): CN114862915A (en), Target object tracking method and device, electronic equipment and storage medium

Country Status (1)

CN: CN114862915A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination