CN111601013B - Method and apparatus for processing video frames


Info

Publication number
CN111601013B
CN111601013B (application CN202010472966.0A)
Authority
CN
China
Prior art keywords
frame
detection
target
video frame
video
Prior art date
Legal status
Active
Application number
CN202010472966.0A
Other languages
Chinese (zh)
Other versions
CN111601013A (en)
Inventor
贾金让 (Jia Jinrang)
Current Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202010472966.0A
Publication of CN111601013A
Application granted
Publication of CN111601013B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/14: Picture signal circuitry for video frequency region
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and apparatus for processing video frames, and relates to the technical field of intelligent traffic. A specific implementation comprises: detecting a target object in each video frame of a video frame sequence, and obtaining a detection frame position set of the target object from the position of the detection frame in each video frame; acquiring, in the detection frame position set, the position of the detection frame in a previous video frame and the position of the detection frame in a subsequent video frame of the video frame sequence as a first target position and a second target position, respectively; determining the distance between the first target position and the second target position; and determining a state of the target object based on the distance. By using the detection frame position set, the positions of the detection frames for the target object can be collected comprehensively across the video frames, and the positions detected in the previous and subsequent video frames can be used to determine the state of the target object accurately, improving the accuracy of roadside or vehicle-side perception in vehicle-road cooperation.

Description

Method and apparatus for processing video frames
Technical Field
Embodiments of the present application relate to the field of intelligent traffic technology, and in particular to a method and apparatus for processing video frames.
Background
In fields such as roadside perception and intelligent transportation, a camera may be used to shoot a live-action scene, thereby obtaining live-action pictures for analysis and processing. Specifically, the camera may be mounted on a light pole to capture pictures, or pictures shot by city surveillance cameras may be used directly.
In many application scenarios, it is necessary to determine whether a detected object is stationary or moving in the pictures taken by the camera, for example in zombie-car detection, queuing-vehicle determination, queue-length counting, and the like. However, in current image detection algorithms, because each frame of the image changes (for example, a vehicle moves or the illumination varies), the detection result for the same stationary object is unstable, which affects the judgment of the object's state and thereby increases the difficulty of that judgment.
Disclosure of Invention
A method, an apparatus, an electronic device, and a storage medium for processing video frames are provided.
According to a first aspect, there is provided a method for processing video frames, comprising: storing the positions of detection frames detected for the target object in each video frame of the video frame sequence into a detection frame position set of the target object; acquiring, in the detection frame position set, the position of the detection frame in a previous video frame and the position of the detection frame in a subsequent video frame of the video frame sequence as a first target position and a second target position, respectively; determining a distance between the first target position and the second target position; and determining, based on the distance, a state of the target object, wherein the state includes moving or stationary.
According to a second aspect, there is provided an apparatus for processing video frames, comprising: the detection unit is configured to detect a target object in each video frame of a video frame sequence to obtain the position of a detection frame in each video frame; a set obtaining unit configured to obtain a set of detection frame positions of the target object according to positions of the detection frames in the respective video frames; an acquisition unit configured to acquire, in the set of detection frame positions, a position of a detection frame in a preceding video frame and a position of a detection frame in a following video frame in the sequence of video frames as a first target position and a second target position, respectively; a distance determination unit configured to determine a distance between the first target position and the second target position; a state determination unit configured to determine a state of the target object based on the distance, wherein the state includes moving or stationary.
According to a third aspect, there is provided an electronic device comprising: one or more processors; a storage device to store one or more programs that, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method for processing video frames.
According to a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the method according to any embodiment of the method for processing video frames.
According to the scheme of the application, the detection frame position set allows the positions of the detection frames for the target object to be collected comprehensively across the video frames, and the positions detected in the previous and subsequent video frames of the sequence allow the state of the target object to be determined accurately.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing video frames according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for processing video frames according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing video frames according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for processing video frames according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for processing video frames according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, made with reference to the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for processing video frames or apparatus for processing video frames may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on the received data such as the video frame sequence, and feed back a processing result (e.g., a state of the target object) to the terminal device.
It should be noted that the method for processing video frames provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the apparatus for processing video frames may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing video frames in accordance with the present application is shown. The method for processing video frames comprises the following steps:
step 201, detecting a target object in each video frame of the video frame sequence to obtain a position of a detection frame in each video frame.
In this embodiment, an execution body (for example, the server or the terminal device shown in fig. 1) on which the method for processing video frames runs may detect the target object included in each video frame of the video frame sequence, so as to obtain the position of the detected detection frame. Specifically, detecting the target object means detecting the position of the detection frame (bounding box) that encloses it. The target object may appear in every frame of the video frame sequence, or only in some of the video frames.
The position of the detection frame indicates the position of the target object. Specifically, the position of the detection frame may be (x, y, w, h), where x and y are the abscissa and ordinate of a specified point of the detection frame, such as its center point or its top-left corner, and w and h are the width and height of the detection frame, respectively. Alternatively, the position of the detection frame may be represented by the coordinates of its four corner points, or by the coordinates of two diagonally opposite corners.
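For illustration, a minimal Python sketch of such a position record, assuming the (x, y, w, h) layout with the top-left corner as the specified point (the class and property names are ours, not part of the embodiment), might be:

```python
from dataclasses import dataclass

@dataclass
class DetectionBox:
    """Position of a detection frame: top-left corner plus width and height."""
    x: float  # abscissa of the specified point (here: the top-left corner)
    y: float  # ordinate of the specified point
    w: float  # width of the detection frame
    h: float  # height of the detection frame

    @property
    def center(self) -> tuple[float, float]:
        """Center point, an alternative choice of 'specified point'."""
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)
```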
The target object may be various objects or obstacles. For example, the obstacle may be a person, a car, an animal, etc.
Step 202, obtaining a detection frame position set of the target object according to the positions of the detection frames in each video frame.
In this embodiment, the execution body may obtain the detection frame position set of the target object from the detection frame positions detected in the video frames. Specifically, the execution body may obtain the set in various ways. For example, it may store the positions of the detected detection frames into the detection frame position set of the target object and use the stored set as the obtained set; the stored set is the set established for the target object. Alternatively, the execution body may select only some of the detected positions, for example the positions detected in every other frame of the video frame sequence, and form the selected positions into the detection frame position set.
Step 203, in the position set of the detection frame, acquiring the position of the detection frame in the previous video frame and the position of the detection frame in the subsequent video frame in the sequence of the video frames as the first target position and the second target position, respectively.
In this embodiment, the execution body may acquire, from the detection frame position set, the position of the detection frame of the previous video frame and the position of the detection frame of the subsequent video frame. Specifically, the previous video frame and the subsequent video frame are both video frames in the video frame sequence, and the previous video frame comes before the subsequent video frame. In practice, they may be various video frames. For example, they may be the first and last video frames of the sequence (in the case where the number of positions in the detection frame position set does not reach the target number). They may also be video frames located in the middle of the sequence, such as the 2nd frame and the 10th frame.
Step 204, determining a distance between the first target position and the second target position.
In this embodiment, the execution subject may determine a distance between the first target position and the second target position, that is, a distance between positions of two detection frames. In practice, the distance between the positions of the two detection frames may be the distance between the specified points of the two detection frames.
Step 205, determining a state of the target object based on the distance, wherein the state includes moving or stationary.
In this embodiment, the execution body may determine the state of the target object based on the distance, that is, determine whether the target object was moving or stationary while the video frame sequence was being captured.
In practice, the execution body may determine the state of the target object in various ways. For example, it may directly compare the distance with a preset distance threshold to obtain the state of the target object: the target object is determined to be moving when the distance reaches the preset distance threshold, and stationary when it does not.
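As a sketch of this direct comparison, assuming the Euclidean distance between box centers and an arbitrary pixel threshold (the embodiment fixes neither the distance metric nor the threshold value), one could write:

```python
import math

def state_from_distance(first_pos, second_pos,
                        distance_threshold: float = 10.0) -> str:
    """Compare the distance between the specified points of two detection
    frames (here: the centers from the DetectionBox sketch above) with a
    preset threshold. The threshold of 10 pixels is purely illustrative."""
    (x1, y1), (x2, y2) = first_pos.center, second_pos.center
    distance = math.hypot(x2 - x1, y2 - y1)
    return "moving" if distance >= distance_threshold else "stationary"
```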
The method provided by the above embodiment of the present application uses the detection frame position set to comprehensively collect the detection frame positions for the target object across the video frames, and uses the positions detected in the previous and subsequent video frames of the sequence to determine the state of the target object accurately, thereby improving the accuracy of roadside or vehicle-side perception in vehicle-road cooperation.
In some optional implementations of this embodiment, step 203 may include: in response to the number of positions of the detection frame in the detection frame position set reaching the target number, performing the steps of: acquiring, in the detection frame position set, the positions of the detection frames of the target object in at least two previous video frames, determining an average value to obtain a first position average value, and taking the first position average value as the first target position; acquiring, in the detection frame position set, the positions of the detection frames of the target object in at least two subsequent video frames, determining an average value to obtain a second position average value, and taking the second position average value as the second target position; wherein, in the video frame sequence, the number of video frames in the interval between a previous video frame and a subsequent video frame is greater than a second number threshold.
In these alternative implementations, there may be at least two previous video frames and at least two subsequent video frames, all distinct from one another. The execution body may determine the average of the detection frame positions detected in the previous video frames and take that average as the first target position; likewise, it may determine the average of the detection frame positions detected in the subsequent video frames and take that average as the second target position.
The executing body may execute the step of determining the average value as the first target position and the second target position only in a case where the number of positions of the detection frame in the detection frame position set reaches a target number.
In practice, the at least two previous video frames and the at least two subsequent video frames may be separated by a relatively large number of video frames. For example, the at least two previous video frames may be a specified number of video frames including the first frame of the sequence, such as the first through fifth frames, and the at least two subsequent video frames may be a specified number of video frames including the last frame, such as the last through fifth-from-last frames. In the present application, both groups may be consecutive video frames.
When the number of positions in the detection frame position set reaches the target number, these alternative implementations use the average position over at least two previous video frames and the average position over at least two subsequent video frames, which can, to a certain extent, eliminate inaccurate state determination caused by picture jitter in the video. Using a previous video frame and a subsequent video frame that are far apart also yields a more accurate state determination result.
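A sketch of this averaging step, assuming five frames at each end and a target number of 20 positions (both figures are taken from examples in the text and are not prescribed by the embodiment), might be:

```python
def mean_position(boxes) -> tuple[float, float]:
    """Average the specified points (here: the centers) of several detection frames."""
    xs, ys = zip(*(b.center for b in boxes))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def target_positions(position_set, k: int = 5, target_number: int = 20):
    """First and second target positions as means over the k earliest and the
    k latest positions, once the set holds at least target_number entries."""
    positions = list(position_set)  # accepts a list or a deque
    if len(positions) < target_number:
        return None  # target number not reached; fall back to single frames
    return mean_position(positions[:k]), mean_position(positions[-k:])
```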
In some optional implementation manners of this embodiment, the position set of the detection frames is a queue formed by positions of the detection frames, the number of the positions of the detection frames in the queue does not exceed a preset number, and the positions of the detection frames in the queue are arranged according to a sequence of video frames corresponding to the detection frames in the sequence of the video frames.
In these alternative implementations, the queue may be of various kinds, such as a first-in-first-out queue. The number of detection frame positions in the queue has an upper limit, that is, a preset number such as 20. When the queue is full and a new position is detected for the target object, storing the new position requires removing the earliest stored position (for example, the position of the detection frame of the target object in the first target video frame) from the queue. The positions in the queue follow the order of their corresponding video frames in the sequence, where the video frame corresponding to a detection frame is the video frame in which that position was detected.
These alternative implementations may employ queues to accurately arrange the positions of the detection frames in the order of the video frames and limit the number of positions included in the queues to ensure real-time performance of the positions in the queues, thereby improving the accuracy of determining the state of the target object.
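In Python, collections.deque with a maxlen argument gives exactly this bounded first-in-first-out behavior; a minimal sketch, assuming the preset number of 20 from the example above:

```python
from collections import deque

PRESET_NUMBER = 20  # illustrative upper limit, echoing the example in the text

# First-in-first-out queue of detection frame positions. Once maxlen is
# reached, appending a newly detected position silently evicts the oldest
# entry, e.g. the position detected in the first target video frame.
position_queue: deque = deque(maxlen=PRESET_NUMBER)

def record_position(queue: deque, box) -> None:
    """Append a newly detected position; queue order follows the frame order."""
    queue.append(box)
```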
In some optional application scenarios of these implementations, the sequence of video frames includes a first target video frame and a second target video frame subsequent to the first target video frame. The method may further comprise: in response to detecting a new object in the first target video frame that is not detected in a video frame prior to the first target video frame, treating the new object as a target object, and establishing a queue of positions of detection frames of the target object; step 202 may include: storing the position of a detection frame of a target object in a first target video frame into a queue; in response to detecting the target object in the second target video frame, the position of the detection frame of the target object in the second target video frame is stored in the queue.
In these optional application scenarios, if the execution body detects a new object, it may establish a queue corresponding to that object, used to store the positions of the detection frames detected for the object in each video frame. The execution body may store the position of the target object detected in the first target video frame, that is, the position of its detection frame, as the first-arranged position in the queue.
If the execution body then detects the target object in the second target video frame, it may store the newly detected detection frame position in the queue as well. If the positions detected in the first and second target video frames are both present in the queue, the former precedes the latter in the queue order.
These application scenarios establish a queue for each new object, so that the detection frame positions of different objects are recorded accurately and a more accurate state is obtained for each target object; a sketch of this per-object bookkeeping follows.
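Assuming integer object identifiers supplied by some upstream tracker (the tracker itself is not part of this embodiment), such bookkeeping could look like:

```python
from collections import deque

PRESET_NUMBER = 20  # illustrative; the same upper limit as above

# One queue per tracked object, created on first detection of a new object.
queues: dict[int, deque] = {}

def on_detection(object_id: int, box) -> None:
    if object_id not in queues:
        # A new object: establish its own queue and store the position
        # detected in the first target video frame as the first entry.
        queues[object_id] = deque(maxlen=PRESET_NUMBER)
    queues[object_id].append(box)  # later detections keep the frame order
```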
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing video frames according to the present embodiment. In the application scenario of fig. 3, the execution body 301 detects a red car A in each video frame of the video frame sequence, obtains the position of the detection frame in each video frame, and stores those positions in the detection frame position set 302 of the red car A. The execution body 301 acquires, from the set, the position of the detection frame in the first video frame and the position of the detection frame in the last video frame of the sequence as the first target position and the second target position 303, respectively. The execution body 301 determines the distance 304 between the first target position and the second target position, and based on the distance 304 determines the state 305 of the red car A, where the state 305 is either moving or stationary.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for processing video frames is shown. The process 400 may include the following steps:
step 401, detecting a target object in each video frame of the video frame sequence to obtain a position of a detection frame in each video frame.
In this embodiment, the execution body (for example, the server or the terminal device shown in fig. 1) on which the method for processing video frames runs may detect the target object included in each video frame of the video frame sequence, so as to obtain the position of the detected detection frame. Specifically, detecting the target object means detecting the position of the detection frame that encloses it. The target object may appear in every frame of the video frame sequence, or only in some of the video frames.
Step 402, according to the positions of the detection frames in each video frame, a detection frame position set of the target object is obtained.
In this embodiment, the execution body may obtain the detection frame position set of the target object from the detection frame positions detected in the video frames. Specifically, the execution body may obtain the set in various ways, for example by storing the detected positions into the detection frame position set of the target object and using the stored set as the obtained set.
Step 403, in the detection frame position set, acquiring the position of the detection frame in the previous video frame and the position of the detection frame in the subsequent video frame in the video frame sequence as the first target position and the second target position, respectively.
In this embodiment, the execution body may acquire, from the detection frame position set, the position of the detection frame of the previous video frame and the position of the detection frame of the subsequent video frame. Specifically, both are video frames in the video frame sequence, and the previous video frame comes before the subsequent video frame. In practice, they may be various video frames.
Step 404, determining a distance between the first target position and the second target position.
In this embodiment, the execution subject may determine a distance between the first target position and the second target position, that is, a distance between positions of two detection frames.
Step 405, determining a target value according to the quotient of two quantities: the distance, and the height of the detection frame of the target object in a specified subsequent video frame of the video frame sequence, wherein the number of video frames in the interval between the specified subsequent video frame and the last video frame of the sequence is less than a first number threshold.
In this embodiment, the execution body (e.g., the server or terminal device shown in fig. 1) may determine the target value according to the quotient of two quantities: the distance between the first target position and the second target position, and the height of the detection frame of the target object in the specified subsequent video frame. The specified subsequent video frame may be the subsequent video frame corresponding to the second target position, one of the subsequent video frames, or another video frame that is neither a subsequent video frame nor the first frame of the sequence. In practice, the number of video frames in the interval between the specified subsequent video frame and the last video frame of the sequence may be smaller than the first number threshold; for example, the interval may be 0 frames, in which case the specified subsequent video frame is the last video frame of the sequence.
In practice, the execution body may determine the target value from the quotient in various ways, for example by directly taking the quotient as the target value. Alternatively, it may apply preset processing to the quotient, for example feeding it into a preset model or processing it with a preset algorithm, and take the processing result as the target value.
Step 406, determining the state of the target object based on the target value and a preset value threshold.
In this embodiment, the execution body may determine the state of the target object based on the target value and the preset value threshold in various ways. For example, it may directly compare the target value with the preset value threshold: the state of the target object is determined to be moving when the target value is not less than the threshold, and stationary when it is less. Alternatively, the execution body may first update the target value, for example by feeding it into a preset model or multiplying it by a preset coefficient, compare the updated value with the preset value threshold, and make the same determination using the updated value.
According to this embodiment, taking the quotient of the distance and the height of the detection frame in a newer video frame removes the influence of the object's apparent size in the video frame on the determined state, so that the determined state of the target object is more accurate.
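A minimal sketch of this normalization, assuming a value threshold of 0.5 (an illustrative figure; the embodiment leaves the threshold unspecified), might be:

```python
def state_from_normalized_distance(distance: float, box_height: float,
                                   value_threshold: float = 0.5) -> str:
    """Divide the displacement by the detection frame height in a recent
    frame, so that large nearby objects and small distant objects become
    comparable. The threshold of 0.5 is an assumed, illustrative value."""
    target_value = distance / box_height
    return "moving" if target_value >= value_threshold else "stationary"
```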
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing video frames, the apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the apparatus embodiment may include the same or corresponding features or effects as the method embodiment shown in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for processing a video frame of the present embodiment includes: a detection unit 501, a set acquisition unit 502, an acquisition unit 503, a distance determination unit 504, and a state determination unit 505. The detection unit 501 is configured to detect a target object in each video frame of a video frame sequence to obtain a position of a detection frame in each video frame; a set obtaining unit 502 configured to obtain a detection frame position set of the target object according to positions of detection frames in respective video frames; an acquiring unit 503 configured to acquire, in the set of detection frame positions, a position of a detection frame in a preceding video frame and a position of a detection frame in a following video frame in the sequence of video frames as a first target position and a second target position, respectively; a distance determining unit 504 configured to determine a distance between the first target position and the second target position; a state determination unit 505 configured to determine a state of the target object based on the distance, wherein the state includes moving or stationary.
In this embodiment, specific processes of the detecting unit 501, the set obtaining unit 502, the obtaining unit 503, the distance determining unit 504, and the state determining unit 505 of the apparatus 500 for processing a video frame and technical effects brought by the specific processes may refer to related descriptions of step 201, step 202, step 203, step 204, and step 205 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of the embodiment, the state determination unit is further configured to perform the distance-based determination of the state of the target object as follows: determining a target value according to the quotient of the distance and the height of the detection frame of the target object in a specified subsequent video frame of the video frame sequence, wherein the number of video frames in the interval between the specified subsequent video frame and the last video frame of the sequence is less than a first number threshold; and determining the state of the target object based on the target value and a preset value threshold.
In some optional implementations of this embodiment, the obtaining unit is further configured to acquire, in the detection frame position set, the position of the detection frame in a previous video frame and the position of the detection frame in a subsequent video frame of the video frame sequence as the first target position and the second target position, respectively, as follows: in response to the number of positions of the detection frame in the detection frame position set reaching the target number, performing the steps of: acquiring, in the detection frame position set, the positions of the detection frames of the target object in at least two previous video frames, determining an average value to obtain a first position average value, and taking the first position average value as the first target position; acquiring, in the detection frame position set, the positions of the detection frames of the target object in at least two subsequent video frames, determining an average value to obtain a second position average value, and taking the second position average value as the second target position; wherein, in the video frame sequence, the number of video frames in the interval between a previous video frame and a subsequent video frame is greater than a second number threshold.
In some optional implementation manners of this embodiment, the position set of the detection frames is a queue formed by positions of the detection frames, the number of the positions of the detection frames in the queue does not exceed a preset number, and the positions of the detection frames in the queue are arranged according to a sequence of video frames corresponding to the detection frames in the sequence of the video frames.
In some optional implementations of the present embodiment, the sequence of video frames includes a first target video frame and a second target video frame subsequent to the first target video frame; the device still includes: an establishing unit configured to, in response to detection of a new object in the first target video frame that is not detected in a video frame preceding the first target video frame, take the new object as a target object, and establish a queue of positions of detection frames of the target object; and a set obtaining unit further configured to perform obtaining a set of detection frame positions of the target object based on the positions of the detection frames in the respective video frames as follows: storing the position of a detection frame of a target object in a first target video frame into a queue; in response to detecting the target object in the second target video frame, the position of the detection frame of the target object in the second target video frame is stored in the queue.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 6, is a block diagram of an electronic device for processing a video frame according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for processing video frames provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for processing video frames provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for processing video frames in the embodiments of the present application (for example, the detection unit 501, the set obtaining unit 502, the acquisition unit 503, the distance determination unit 504, and the state determination unit 505 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, that is, implements the method for processing video frames in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of an electronic device for processing a video frame, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected to an electronic device for processing video frames over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of processing video frames may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus for processing video frames; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, track ball, or joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor includes a detection unit, a set acquisition unit, an acquisition unit, a distance determination unit, and a state determination unit. The names of these units do not in some cases limit the units themselves; for example, the state determination unit may also be described as "a unit that determines the state of the target object based on the distance".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: detect a target object in each video frame of a video frame sequence to obtain the position of the detection frame in each video frame; obtain a detection frame position set of the target object according to the positions of the detection frames in the video frames; acquire, in the detection frame position set, the position of the detection frame in a previous video frame and the position of the detection frame in a subsequent video frame of the video frame sequence as a first target position and a second target position, respectively; determine a distance between the first target position and the second target position; and determine a state of the target object based on the distance, wherein the state includes moving or stationary.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A method for processing video frames, the method comprising:
detecting a target object in each video frame of a video frame sequence to obtain the position of a detection frame in each video frame, wherein the video frames in the video frame sequence are traffic scene live-action pictures;
acquiring a detection frame position set of the target object according to the positions of the detection frames in the video frames;
in the detection frame position set, acquiring the position of a detection frame in a previous video frame and the position of a detection frame in a subsequent video frame in the video frame sequence, and respectively taking the positions as a first target position and a second target position;
determining a distance between the first target location and the second target location;
determining a state of the target object based on the distance, wherein the state comprises moving or stationary;
the determining the state of the target object based on the distance comprises:
determining a target value according to the quotient of: the distance, and a height of a detection frame of the target object in a specified following video frame in the sequence of video frames, wherein a number of video frames in an interval between the specified following video frame and a last video frame in the sequence of video frames is less than a first number threshold; and determining the state of the target object based on the target value and a preset value threshold;
in the detecting frame position set, acquiring a position of a detecting frame in a previous video frame and a position of a detecting frame in a subsequent video frame in the video frame sequence as a first target position and a second target position respectively, including:
in response to the number of positions of the detection frame in the set of detection frame positions reaching a target number, performing the steps of:
acquiring the positions of the target object in the detection frames of at least two previous video frames respectively in the detection frame position set, determining an average value to obtain a first position average value, and taking the first position average value as a first target position;
and acquiring the positions of the target object in the detection frames of at least two subsequent video frames respectively in the detection frame position set, determining an average value to obtain a second position average value, and taking the second position average value as a second target position.
2. The method of claim 1, wherein, in the sequence of video frames, a number of video frames that are spaced between the preceding video frame and the following video frame is greater than a second number threshold.
3. The method according to one of claims 1-2, wherein the set of positions of the detection frame is a queue formed by positions of the detection frames, the number of the positions of the detection frames in the queue does not exceed a preset number, and the positions of the detection frames in the queue are arranged according to the sequence of the video frames corresponding to the detection frames in the sequence of the video frames.
4. The method of claim 3, wherein the sequence of video frames comprises a first target video frame and a second target video frame subsequent to the first target video frame;
the method further comprises the following steps:
in response to detecting a new object in the first target video frame that is not detected in a video frame preceding the first target video frame, treating the new object as a target object, and establishing a queue of positions of detection frames of the target object; and
the obtaining a detection frame position set of the target object according to the positions of the detection frames in the video frames includes:
storing the position of the detection frame of the target object in the first target video frame into the queue;
in response to detecting the target object in the second target video frame, storing a position of a detection frame of the target object in the second target video frame in the queue.
5. A device for processing video frames, the device comprising:
the detection unit is configured to detect a target object in each video frame of a video frame sequence to obtain the position of a detection frame in each video frame, wherein the video frames in the video frame sequence are traffic scene live-action pictures;
a set obtaining unit configured to obtain a detection frame position set of the target object according to positions of detection frames in the respective video frames;
an acquisition unit configured to acquire, in the set of detection frame positions, a position of a detection frame in a preceding video frame and a position of a detection frame in a following video frame in the sequence of video frames as a first target position and a second target position, respectively;
a distance determination unit configured to determine a distance between the first target position and the second target position;
a state determination unit configured to determine a state of the target object based on the distance, wherein the state includes moving or stationary;
the state determination unit is further configured to perform the determining the state of the target object based on the distance as follows:
determining a target value according to the quotient of: the distance, and a height of a detection frame of the target object in a specified following video frame in the sequence of video frames, wherein a number of video frames in an interval between the specified following video frame and a last video frame in the sequence of video frames is less than a first number threshold; and determining the state of the target object based on the target value and a preset value threshold;
wherein the acquisition unit is further configured to acquire the first target position and the second target position from the detection frame position set by:
in response to the number of detection frame positions in the detection frame position set reaching a target number, performing the following steps:
acquiring, from the detection frame position set, the positions of the detection frames of the target object in at least two preceding video frames, determining their average to obtain a first position average value, and taking the first position average value as the first target position;
and acquiring, from the detection frame position set, the positions of the detection frames of the target object in at least two following video frames, determining their average to obtain a second position average value, and taking the second position average value as the second target position.
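Read together, the acquisition and state-determination clauses of claim 5 amount to: average the centers of the earliest few and latest few detection frames, take the distance between the two averages, divide it by the detection-frame height near the end of the sequence (so the same pixel displacement counts less for closer, larger objects), and compare the quotient with a threshold. A minimal sketch under assumed values follows; the (x, y, w, h) tuple layout, k, TARGET_NUMBER, and VALUE_THRESHOLD are all illustrative, not values from the patent.

```python
import math

TARGET_NUMBER = 6      # assumed "target number" of queued positions
VALUE_THRESHOLD = 0.1  # assumed "preset value threshold"


def mean_center(boxes):
    # Average the center points of several (x, y, w, h) detection frames.
    xs = [x + w / 2 for (x, y, w, h) in boxes]
    ys = [y + h / 2 for (x, y, w, h) in boxes]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def object_state(positions, k=2):
    """positions: detection-frame tuples in video-frame order."""
    if len(positions) < TARGET_NUMBER:
        return None  # not enough detections queued yet
    first_target = mean_center(positions[:k])    # average over preceding frames
    second_target = mean_center(positions[-k:])  # average over following frames
    distance = math.dist(first_target, second_target)
    # Use the box height from the last frame: its interval to the end of the
    # sequence is zero, which trivially stays under any first number threshold.
    height = positions[-1][3]
    target_value = distance / height
    return "moving" if target_value > VALUE_THRESHOLD else "stationary"


# An object drifting 10 px per frame is classified as moving.
print(object_state([(100 + 10 * i, 50, 40, 80) for i in range(6)]))
```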
6. The apparatus of claim 5, wherein the number of video frames in the video frame sequence in the interval between the preceding video frame and the following video frame is greater than a second number threshold.
7. The apparatus according to claim 5 or 6, wherein the detection frame position set is a queue of detection frame positions, the number of detection frame positions in the queue does not exceed a preset number, and the detection frame positions in the queue are arranged in the order in which their corresponding video frames appear in the video frame sequence.
8. The apparatus of claim 7, wherein the video frame sequence comprises a first target video frame and a second target video frame subsequent to the first target video frame;
the apparatus further comprising:
an establishing unit configured to, in response to detecting, in the first target video frame, a new object that was not detected in any video frame preceding the first target video frame, take the new object as a target object and establish a queue for the positions of the detection frames of the target object; and
wherein the set obtaining unit is further configured to obtain the detection frame position set of the target object according to the positions of the detection frames in the respective video frames by:
storing the position of the detection frame of the target object in the first target video frame into the queue; and
in response to detecting the target object in the second target video frame, storing the position of the detection frame of the target object in the second target video frame into the queue.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-4.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN202010472966.0A 2020-05-29 2020-05-29 Method and apparatus for processing video frames Active CN111601013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010472966.0A CN111601013B (en) 2020-05-29 2020-05-29 Method and apparatus for processing video frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010472966.0A CN111601013B (en) 2020-05-29 2020-05-29 Method and apparatus for processing video frames

Publications (2)

Publication Number Publication Date
CN111601013A (en) 2020-08-28
CN111601013B (en) 2023-03-31

Family

ID=72189627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010472966.0A Active CN111601013B (en) 2020-05-29 2020-05-29 Method and apparatus for processing video frames

Country Status (1)

Country Link
CN (1) CN111601013B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560726B * 2020-12-22 2023-08-29 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Target detection confidence determining method, road side equipment and cloud control platform
CN113033458B * 2021-04-09 2023-11-07 Jingdong Technology Holding Co., Ltd. Action recognition method and device
CN114816044A * 2022-03-08 2022-07-29 Insta360 Innovation Technology Co., Ltd. Method and device for determining interaction gesture and electronic equipment
CN114842376A * 2022-04-21 2022-08-02 Shanghai Shanma Intelligent Technology Co., Ltd. State determination method, state determination device, storage medium and electronic device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596949A (en) * 2018-03-23 2018-09-28 云南大学 Video frequency object tracking state analysis method, device and realization device
CN111010590A (en) * 2018-10-08 2020-04-14 传线网络科技(上海)有限公司 Video clipping method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2012119843A (en) * 2012-05-15 2013-11-20 Sinezis LLC (ООО «Синезис») METHOD FOR DISPLAYING VIDEO DATA ON A MOBILE DEVICE
US9928875B2 (en) * 2016-03-22 2018-03-27 Nec Corporation Efficient video annotation with optical flow based estimation and suggestion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596949A (en) * 2018-03-23 2018-09-28 云南大学 Video frequency object tracking state analysis method, device and realization device
CN111010590A (en) * 2018-10-08 2020-04-14 传线网络科技(上海)有限公司 Video clipping method and device

Also Published As

Publication number Publication date
CN111601013A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111601013B (en) Method and apparatus for processing video frames
CN111722245B (en) Positioning method, positioning device and electronic equipment
US9830736B2 (en) Segmenting objects in multimedia data
CN111586319B (en) Video processing method and device
CN110659600B (en) Object detection method, device and equipment
US9373174B2 (en) Cloud based video detection and tracking system
CN111524166B (en) Video frame processing method and device
CN110675635B (en) Method and device for acquiring external parameters of camera, electronic equipment and storage medium
US20220036731A1 (en) Method for detecting vehicle lane change, roadside device, and cloud control platform
CN111563541B (en) Training method and device of image detection model
CN111767853A (en) Lane line detection method and device
CN110968718A (en) Target detection model negative sample mining method and device and electronic equipment
CN110874853A (en) Method, device and equipment for determining target motion and storage medium
CN112184837A (en) Image detection method and device, electronic equipment and storage medium
CN111524165A (en) Target tracking method and device
CN111191619A (en) Method, device and equipment for detecting virtual line segment of lane line and readable storage medium
CN111696134A (en) Target detection method and device and electronic equipment
CN112561059A (en) Method and apparatus for model distillation
CN112270303A (en) Image recognition method and device and electronic equipment
JPWO2018179119A1 (en) Video analysis device, video analysis method, and program
CN110798681A (en) Monitoring method and device of imaging equipment and computer equipment
CN109145681B (en) Method and device for judging target rotation direction
CN115908218A (en) Third-view shooting method, device, equipment and storage medium for XR scene
CN111814634B (en) Real-time distance determining method, device, equipment and medium
CN110689575B (en) Image collector calibration method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211011

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: 2 / F, baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant