CN114663793A - Target behavior identification method and device, storage medium and terminal - Google Patents

Target behavior identification method and device, storage medium and terminal

Info

Publication number
CN114663793A
Authority
CN
China
Prior art keywords
pixel coordinate
image
lane line
identified
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011401980.8A
Other languages
Chinese (zh)
Inventor
邓志东
张睿文
鹿红超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Toyota Motor Corp
Original Assignee
Tsinghua University
Toyota Motor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Toyota Motor Corp filed Critical Tsinghua University
Priority to CN202011401980.8A priority Critical patent/CN114663793A/en
Publication of CN114663793A publication Critical patent/CN114663793A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

A target behavior identification method and device, a storage medium and a terminal are provided, and the target behavior identification method comprises the following steps: acquiring a video to be identified; carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified; mapping the moving target and the lane line which are identified in each frame of image to be identified in the same mapping image according to the pixel coordinates; determining a contact line of the moving object and the road surface as a horizontal reference line in each mapping image, and determining a first pixel coordinate of the moving object on the horizontal reference line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal reference line; and calculating the relative position relation of the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image to obtain behavior time sequence data of the moving object in the video to be identified. The technical scheme of the invention can realize the action recognition of the moving target in a high-speed long-distance scene.

Description

Target behavior identification method and device, storage medium and terminal
Technical Field
The invention relates to the technical field of image processing, in particular to a target behavior identification method and device, a storage medium and a terminal.
Background
The main technology for the lane intrusion behavior prediction problem is video action recognition, or more precisely, video action recognition in a high-speed long-distance scene. Video action recognition refers to recognizing the action of a target (e.g. a person) in a video from a continuous piece of video (i.e., a sequence of image frames). A video contains two types of features: temporal features and spatial features. Temporal features are features of pixel changes corresponding to different frames (different time instants) of the image; spatial features are pixel features within a single frame of image. Simultaneous extraction of spatio-temporal features requires computing the relationships of pixels both within a frame and between different frames.
For video action recognition, there are two main methods: the 3D convolutional neural network method and the dual-stream method. The 3D spatio-temporal convolution method captures both temporal and spatial features using 3D convolution kernels. Compared with two-dimensional convolution, a 3D convolution kernel has one additional dimension and can compute the pixel relationships of the same region across consecutive frames, so that temporal and spatial features can be extracted simultaneously to realize action recognition on the video. The dual-stream method comprises an optical flow network branch and an image network branch, with the temporal features represented by optical flow images. Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. The optical flow method calculates the motion information of an object between adjacent frames by using the changes of pixels in an image sequence over the time domain and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame. The dual-stream method extracts temporal features in the optical flow network branch using 2D convolution kernels and extracts spatial features in the image network branch, then fuses the two to obtain spatio-temporal features, and finally classifies the video action.
However, the 3D convolution and dual-stream methods are not suitable for high-speed long-distance scenes. A high-speed long-distance scene is one in which the camera moves at high speed and the distance between the action target and the camera is greater than a preset distance, such as 100 meters. In this case, irregular visual changes of the moving target in the image frames and severe image shaking both cause the changes in the image frames to not truly reflect the actual motion of the target. Both the 3D convolution method and the optical flow method extract change features of corresponding pixels across image frames; therefore, the overall behavior of the target clearly cannot be recognized from the absolute pixel changes of the target region alone. In addition, because the distance is long, the moving target occupies only a small part of the pixel space in the video image frame plane, whereas both the 3D convolution and dual-stream methods extract the pixel features of the whole image, including the background; these methods therefore have difficulty capturing the motion features of such small targets and accurately predicting their behavioral intentions.
Disclosure of Invention
The invention solves the technical problem of how to realize the action recognition of the moving target in a high-speed long-distance scene.
In order to solve the above technical problem, an embodiment of the present invention provides a target behavior identification method, where the target behavior identification method includes: acquiring a video to be identified, wherein the video to be identified comprises a plurality of frames of images to be identified; carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified; mapping the moving target and the lane line which are identified in each frame of image to be identified in the same mapping image according to the pixel coordinates, wherein each frame of image to be identified has a corresponding mapping image; determining a contact line of the moving target and a road surface as a horizontal datum line in each mapping image, and determining a first pixel coordinate of the moving target on the horizontal datum line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal datum line; and calculating the relative position relationship among the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image to obtain behavior time sequence data of the moving object in the video to be recognized, wherein the behavior time sequence data comprises a plurality of relative position relationships arranged according to the time sequence corresponding to a plurality of frames of images to be recognized.
Optionally, the calculating the relative position relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate in each mapping image includes: determining a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image; and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image to be used as the relative position relation.
Optionally, the calculating the relative position relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate in each mapping image includes: determining the width of the lane line and a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image; and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image, and normalizing by using the width of the lane line to be used as the relative position relation.
Optionally, the performing target detection on each frame of image to be identified includes: and respectively inputting each frame of image to be recognized into a trained moving target detection model and a trained lane line detection model, wherein the moving target detection model outputs the recognized pixel coordinates of the moving target, and the lane line detection model recognizes the pixel coordinates of the lane line.
Optionally, the moving object is represented by pixel coordinates of an envelope box, and the lane line is represented by coordinates of pixels of the lane line in the image to be recognized.
Optionally, the determining the first pixel coordinate of the moving target on the horizontal reference line, and the second pixel coordinate and the third pixel coordinate of the two lines of the lane line on the horizontal reference line further includes: and performing Kalman filtering on the first pixel coordinate, the second pixel coordinate and the third pixel coordinate corresponding to each frame of image to be identified.
Optionally, the target behavior identification method further includes: and classifying the intrusion behavior of the moving target to the lane line according to the relative position relation in the behavior time series data.
In order to solve the above technical problem, an embodiment of the present invention further discloses a target behavior recognition apparatus, where the target behavior recognition apparatus includes: the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring a video to be recognized, and the video to be recognized comprises a plurality of frames of images to be recognized; the target detection module is used for carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified; the mapping module is used for mapping the moving target and the lane line which are obtained by identification in each frame of image to be identified in the same mapping image according to the pixel coordinates, and each frame of image to be identified has a corresponding mapping image; the pixel coordinate calculation module is used for determining a contact line of the moving target and the road surface as a horizontal datum line in each mapping image, and determining a first pixel coordinate of the moving target on the horizontal datum line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal datum line; and the relative position relation calculation module is used for calculating the relative position relation among the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image so as to obtain behavior time sequence data of the moving object in the video to be recognized, wherein the behavior time sequence data comprises a plurality of relative position relations which are arranged according to the time sequence corresponding to a plurality of frames of images to be recognized.
The embodiment of the invention also discloses a storage medium, wherein a computer program is stored on the storage medium, and the computer program executes the steps of the target behavior identification method when being executed by a processor.
The embodiment of the invention also discloses a terminal which comprises a memory and a processor, wherein the memory is stored with a computer program which can be run on the processor, and the processor executes the steps of the target behavior identification method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
according to the technical scheme, the moving target and the lane line in each frame of image to be recognized are mapped in the same mapping image, and the relative position relation of the moving target and the lane line on the horizontal reference line is calculated in the mapping image, so that the position characteristics of the small target can be captured, the problem that the small target action is difficult to capture in a long-distance scene is solved, and a foundation is laid for accurate prediction of target behaviors. In addition, according to the technical scheme, the target behavior recognition is converted into the behavior time sequence data for recognition, so that the recognition difficulty can be reduced, and the recognition accuracy and the system robustness can be improved.
Further, the width of the lane line and a fourth pixel coordinate of the center point of the lane line on the horizontal reference line are determined according to the second pixel coordinate and the third pixel coordinate in each mapping image; the difference between the first pixel coordinate and the fourth pixel coordinate in each mapping image is calculated and normalized by the width of the lane line to serve as the relative position relation. Because a video shot in a high-speed long-distance scene suffers from scale change, image jitter and the like, the position relation between the moving target and the lane line in each frame of image can only represent the position relation within the current frame, and behavior time series data based on such position relations cannot accurately reflect the movement behavior of the target. Normalizing the difference by the lane line width stabilizes the lane width across frames and thereby overcomes the influence of scale change and image jitter.
Drawings
FIG. 1 is a flow chart of a method for identifying target behavior according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an exemplary application scenario of the present invention;
FIG. 3 is a diagram illustrating another exemplary application scenario of an embodiment of the present invention;
FIG. 4 is a flowchart of one embodiment of step S105 shown in FIG. 1;
FIG. 5 is a detailed diagram of the relative position relationship according to an embodiment of the present invention;
FIG. 6 is a detailed schematic diagram of another relative position relationship according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a target behavior recognition apparatus according to an embodiment of the present invention.
Detailed Description
As described in the background, the 3D convolution and dual-stream methods are not suitable for high-speed long-distance scenes. A high-speed long-distance scene is one in which the camera moves at high speed and the distance between the action target and the camera is greater than a preset distance, such as 100 meters. In this case, irregular visual changes of the moving target in the image frames and severe image shaking both cause the changes in the image frames to not truly reflect the actual motion of the target. Both the 3D convolution method and the optical flow method extract change features of corresponding pixels across image frames; therefore, the overall behavior clearly cannot be recognized from the absolute pixel changes of the target region alone. Furthermore, due to the greater distance, the moving target occupies only a small portion of the pixel space in the video image frame plane. Both the 3D convolution and dual-stream methods extract the pixel features of the whole image, including the background, in which the proportion of moving-target features is small; these methods therefore have difficulty capturing the motion features of such small targets and accurately predicting their behavioral intentions.
According to the technical scheme, the moving target and the lane line in each frame of image to be recognized are mapped in the same mapping image, and the relative position relation of the moving target and the lane line on the horizontal reference line is calculated in the mapping image, so that the position characteristics of the small target can be captured, the problem that the small target action is difficult to capture in a long-distance scene is solved, and a foundation is laid for accurate prediction of target behaviors. In addition, according to the technical scheme, the target behavior recognition is converted into the behavior time sequence data for recognition, so that the recognition difficulty can be reduced, and the recognition accuracy and the system robustness can be improved.
Furthermore, because a video shot in a high-speed long-distance scene suffers from scale change, image jitter and the like, the position relation between a moving target and the lane line in each frame of image can only represent the position relation within the current frame, and behavior time series data based on such position relations cannot accurately reflect the movement behavior of the target. For this reason, in an embodiment, the difference between the first pixel coordinate and the fourth pixel coordinate is normalized by the width of the lane line, which stabilizes the lane width across frames and overcomes the influence of scale change and image jitter.
The term "moving object" in embodiments of the present invention may refer to an object that is small relative to a vehicle, such as a pedestrian, a person riding a non-motorized vehicle, or other animal. Further, the moving object in the embodiment of the present invention refers to a pedestrian performing a lane line intrusion behavior of a motor vehicle or a person riding a non-motor vehicle.
The lane line referred to in the embodiments of the present invention may refer to a lane line of a motor vehicle, which is usually a pair, and two lines of the lane line can define a driving range of the motor vehicle, that is, a prohibited area of a non-motor vehicle.
A high-speed long-distance scene in the embodiments of the present invention means a scene in which the movement speed of the camera collecting the video is higher than a preset speed, such as 60 km/h, and the distance from the camera to the moving target to be shot is greater than a preset distance, such as 100 meters. In one non-limiting example, the camera may be located in a vehicle, the vehicle travels at a speed greater than 60 km/h, and the vehicle is located more than 100 meters from the moving target.
The target behavior referred to in the embodiments of the present invention may be a lane intrusion behavior, that is, a behavior in which a moving target enters the range defined by the two lines of a lane line.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart of a target behavior identification method according to an embodiment of the present invention.
The target behavior identification method in the embodiment of the present invention may be used at a terminal device side, for example, may be a vehicle-mounted device, that is, the vehicle-mounted device may execute each step of the method shown in fig. 1.
Specifically, the target behavior identification method shown in fig. 1 may include the following steps:
step S101: acquiring a video to be identified, wherein the video to be identified comprises a plurality of frames of images to be identified;
step S102: carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified;
step S103: mapping the moving target and the lane line which are identified in each frame of image to be identified into the same mapping image according to the pixel coordinates, wherein each frame of image to be identified has a corresponding mapping image;
step S104: determining a contact line of the moving target and a road surface as a horizontal datum line in each mapping image, and determining a first pixel coordinate of the moving target on the horizontal datum line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal datum line;
step S105: and calculating the relative position relationship among the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image to obtain behavior time sequence data of the moving object in the video to be recognized, wherein the behavior time sequence data comprises a plurality of relative position relationships arranged according to the time sequence corresponding to a plurality of frames of images to be recognized.
It should be noted that the sequence numbers of the steps in this embodiment do not represent a limitation on the execution sequence of the steps.
In a specific implementation of step S101, the vehicle-mounted device may obtain the video to be recognized from a vehicle-mounted camera, and the vehicle-mounted camera may be disposed inside the vehicle-mounted device or may be externally coupled to the vehicle-mounted device. Specifically, the vehicle-mounted camera can be a monocular wide-angle front-view camera, and can collect videos in real time and transmit the videos to the vehicle-mounted equipment for target behavior identification.
In the specific implementation of step S102, the vehicle-mounted device performs target detection on each frame of image to be recognized in the video to be recognized, that is, the moving target and the lane line are recognized independently. The moving object and the lane line in the image to be recognized of each frame are represented by the pixel coordinates thereof in the image to be recognized. More specifically, the moving object is represented in the coordinates of the pixels of the envelope box, and the lane lines are represented in the coordinates of the pixels thereof in the image to be recognized.
In one non-limiting example, the envelope box of the moving target is defined by its coordinates in the image pixel UV coordinate system, including its upper-left coordinate (u_{l,t}, v_{l,t}) and its lower-right coordinate (u_{r,b}, v_{r,b}).
In one non-limiting example, the lane line may be identified by a grayscale map. The smaller the gray value of a pixel, the higher the probability that the pixel belongs to a lane line; therefore, the coordinates of the left and right lines of the lane line in the image coordinate system can be obtained from the grayscale map.
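As a non-limiting illustration of this step, the following Python sketch extracts lane-line positions from one row of such a grayscale map; the function name, the peak-finding parameters, and the use of scipy are assumptions made for illustration and are not prescribed by this disclosure.

```python
import numpy as np
from scipy.signal import find_peaks

def lane_line_peaks(gray_row: np.ndarray, min_distance: int = 20):
    """Locate candidate lane-line positions on one image row (e.g. the horizontal reference line).

    gray_row: 1-D array of gray values along the row; a smaller gray value means
    a higher probability that the pixel belongs to a lane line.
    Returns the u-coordinates of the two most prominent peaks, sorted left to right.
    """
    prob = 1.0 - gray_row.astype(np.float32) / 255.0       # lane-line probability p per pixel
    peaks, props = find_peaks(prob, distance=min_distance, prominence=0.2)
    order = np.argsort(props["prominences"])[::-1][:2]     # keep the two strongest peaks
    return np.sort(peaks[order])                           # candidate left/right lane lines
```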
In one non-limiting embodiment, step S102 shown in fig. 1 may include the following steps: and respectively inputting each frame of image to be recognized into a trained moving target detection model and a trained lane line detection model, wherein the moving target detection model outputs the recognized pixel coordinates of the moving target, and the lane line detection model recognizes the pixel coordinates of the lane line.
In this embodiment, the moving target detection model and the lane line detection model may be constructed by using a deep learning algorithm and trained in advance, and are used for detecting the moving target and the lane line respectively. Specifically, the moving target detection model may be trained with labeled video data that includes labeled moving targets, and the lane line detection model may be trained with labeled video data that includes labeled lane lines.
It should be noted that, as for the specific way of constructing and training the moving object detection model and the lane line detection model by using the deep learning way, reference may be made to the prior art, and details of the embodiment of the present invention are not described herein again.
In the implementation of step S103, since the moving target and the lane line are detected separately, they need to be mapped into the same image. Compared with the original image to be identified, the proportion of the mapping image occupied by the moving target and the lane line is increased, laying a foundation for the accuracy of the behavior prediction of the moving target.
In the specific implementation of step S104, it can be reasonably assumed that the lower line of the moving object envelope box is the contact line between the moving object and the road surface, i.e. the horizontal reference line of the current frame image.
Referring specifically to FIG. 2, the contact line is extended into a straight line, and this line is a horizontal reference line L passing through the moving target T and perpendicular to the current lane lines V1 and V2. By constructing a one-dimensional coordinate system on the horizontal reference line L, the positions of the moving target T and the lane lines V1 and V2 can be clearly represented. The one-dimensional coordinate system may be the u-coordinate axis of the image coordinate system. The interval [u_{l,t}, u_{r,b}] represents the area occupied by the moving target, and point O represents the center of the moving target, whose coordinate value is (u_{l,t} + u_{r,b})/2, where u_{l,t} and u_{r,b} are the upper-left and lower-right coordinates of the envelope box of the moving target T on the u-coordinate axis. At this point, the pixel positions of the moving target and the lane line under a unified one-dimensional coordinate system can be determined.
Further, referring also to FIG. 3, the abscissa in FIG. 3 represents the coordinate of a pixel on the horizontal reference line, and the ordinate represents the probability p that the pixel belongs to a lane line. When determining the second pixel coordinate and the third pixel coordinate of the two lines of the lane line on the horizontal reference line, each peak represents one line of the lane line; that is, the peaks L_1 and L_2 in FIG. 3 are the center points of the left and right lines of the lane line, respectively. In general, the distance between the two lines of the lane line, i.e. the distance between point L_1 and point L_2, should be 2 to 4 times the width of the area of the moving target O (i.e. the width occupied by the envelope box on the horizontal reference line).
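As a non-limiting sketch of how these quantities may be combined on the horizontal reference line, the code below computes the target center O from the envelope box and applies the 2-4x spacing check; the function name, argument layout, and rejection handling are assumptions.

```python
def reference_line_quantities(box, lane_peaks):
    """box: (u_lt, v_lt, u_rb, v_rb) envelope box of the moving target in pixel coordinates.
    lane_peaks: (L1, L2) u-coordinates of the two lane-line peaks on the reference line.
    Returns (O, L1, L2), or None if the peak spacing is implausible."""
    u_lt, _, u_rb, _ = box
    O = (u_lt + u_rb) / 2.0          # first pixel coordinate: center of the target on the u axis
    target_width = u_rb - u_lt       # width occupied by the envelope box on the reference line
    L1, L2 = sorted(lane_peaks)
    lane_width = L2 - L1
    # per the description above, the lane width should be roughly 2-4 times the target width
    if not (2 * target_width <= lane_width <= 4 * target_width):
        return None                  # reject an implausible detection (assumed handling)
    return O, L1, L2
```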
Specifically, some pixel points of the lane line may not be detected due to occlusion and the like; in this case, a quadratic polynomial can be used to fit the lane-line center points L_1 and L_2, so as to ensure that each lane line is a continuous and complete curve.
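A minimal sketch of such a fit, assuming NumPy and a set of detected lane-line points (all names are illustrative), is:

```python
import numpy as np

def fit_lane_curve(vs: np.ndarray, us: np.ndarray, v_query: np.ndarray) -> np.ndarray:
    """Fit a quadratic polynomial u = f(v) through the detected lane-line points
    and evaluate it at the requested rows, so that missing (occluded) points
    are filled in and the lane line becomes a continuous, complete curve."""
    coeffs = np.polyfit(vs, us, deg=2)   # quadratic fit of column coordinate vs. row coordinate
    return np.polyval(coeffs, v_query)   # lane u-coordinate at the rows of interest
```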
In a specific implementation of step S105, a relative position relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate may be calculated to represent a relative position relationship between the moving object and the lane line.
Specifically, the relative positional relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate may be represented by a difference between the first pixel coordinate and a pixel coordinate of a midpoint of the lane line on the horizontal reference line, where the midpoint of the lane line on the horizontal reference line is a sum of the second pixel coordinate and the third pixel coordinate divided by 2. Alternatively, the relative positional relationship may be directly expressed by the second pixel coordinate, the third pixel coordinate, and the midpoint position of the lane line on the horizontal reference line.
Further, since the video to be recognized includes a plurality of frames of images to be recognized, one relative positional relationship can be determined from each frame of image to be recognized, and thus a behavior time-series data is finally obtained, the behavior time-series data including a plurality of relative positional relationships arranged in the time order of the plurality of frames of images to be recognized. Specifically, the behavior time-series data may include a difference between a first pixel coordinate at a plurality of time instants and a pixel coordinate of a midpoint of the lane line on the horizontal reference line, or a second pixel coordinate, a third pixel coordinate, and a midpoint position of the lane line on the horizontal reference line at the plurality of time instants.
In one non-limiting embodiment, step S105 shown in fig. 1 may include the following steps: determining a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image; and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image to be used as the relative position relation.
In this embodiment, the fourth pixel coordinate of the center point of the lane line on the horizontal reference line is first calculated as L_c = (L_1 + L_2)/2. The difference between the first pixel coordinate and the fourth pixel coordinate is then calculated as P_r = O - L_c, where L_c represents the fourth pixel coordinate of the center point of the current lane, L_1 and L_2 represent the pixel coordinates of the left and right lines of the current lane respectively, O is the pixel coordinate of the moving target such as a pedestrian or rider, and P_r represents the positional relationship of the moving target with respect to the center point.
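A minimal sketch of this calculation in Python (function and variable names are assumptions) is:

```python
def relative_position(O: float, L1: float, L2: float) -> float:
    """Relative position of the moving target with respect to the lane center point.
    O: first pixel coordinate (target center); L1, L2: second and third pixel coordinates."""
    Lc = (L1 + L2) / 2.0   # fourth pixel coordinate: center point of the lane
    return O - Lc          # Pr > 0: target right of the center point; Pr < 0: left of it
```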
In another non-limiting embodiment, referring to fig. 4, the step S105 shown in fig. 1 may include the following steps:
step S401: determining the width of the lane line and a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image;
step S402: and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image, and normalizing by using the width of the lane line to be used as the relative position relation.
Unlike the foregoing embodiment, the embodiment of the present invention performs normalization processing on the difference value between the first pixel coordinate and the fourth pixel coordinate.
Specifically, in a high-speed long-distance scene, the camera moves at a high speed along with the vehicle, so that problems such as obvious scale change of a moving object, image jitter and the like occur. At this time, the positions of the moving object and the lane line can only represent the position relationship in the current frame, and the position time sequence of the moving object based on the position relationship can not accurately reflect the moving behavior of the object. In order to accurately describe the motion of the target, the embodiment adopts a normalized relative positioning mode to solve the problems of scale change and image jitter.
Specifically, w = |L_1 - L_2|, L_c = (L_1 + L_2)/2, and P_r = (O - L_c)/w, where w represents the pixel width of the current lane, L_c represents the fourth pixel coordinate of the center point of the current lane, L_1 and L_2 represent the pixel coordinates of the left and right lines of the current lane respectively, O is the pixel coordinate of the moving target such as a pedestrian or rider, and P_r represents the positional relationship of the moving target with respect to the center point.
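A minimal sketch of the normalized variant (again with assumed names) is shown below; dividing by the lane width w makes the value dimensionless and comparable across frames despite zoom and jitter.

```python
def normalized_relative_position(O: float, L1: float, L2: float) -> float:
    """Normalized relative position Pr = (O - Lc) / w, with w = |L1 - L2|."""
    w = abs(L1 - L2)         # pixel width of the current lane
    Lc = (L1 + L2) / 2.0     # fourth pixel coordinate: center point of the lane
    return (O - Lc) / w      # scale-invariant relative position of the target
```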
Referring specifically to FIG. 5 and FIG. 6, the lines l1 and l2 in FIG. 5 represent the two lines of the lane line, and the line o, obtained by fitting the first pixel coordinates in the behavior time series data, represents the moving target. As can be seen from FIG. 5, due to camera shake and scale change in a high-speed scene, the width of the lane line also changes, which may cause inaccurate judgment of the intrusion behavior of the moving target.
FIG. 6 shows the relative relationship between the line o and the lines l1 and l2 after normalization. The abscissa represents time (the frame sequence), and the ordinate represents the normalized value of the difference between the first pixel coordinate and the fourth pixel coordinate. A negative value indicates that the moving target is on the left side of the center point of the lane line, and a positive value indicates that the moving target is on the right side of the center point of the lane line. As can be seen from FIG. 5 and FIG. 6, the lane width becomes stable through the normalization processing, which overcomes the problems of scale change and image jitter in images shot by the camera in a high-speed scene.
In one non-limiting embodiment, step S104 shown in fig. 1 may include: and performing Kalman filtering on the first pixel coordinate, the second pixel coordinate and the third pixel coordinate corresponding to each frame of image to be identified.
Specifically, when a traditional or deep-learning-based visual target detection method is used to track and detect the moving target and the lane line in real time, false detections and missed detections inevitably occur. In order to reduce the adverse effects caused by false detections and missed detections, this embodiment performs filtering processing on the calculated moving target and lane line coordinates. For high-speed application scenarios, the number of moving targets is small, e.g. within 5. The SORT multi-target tracking algorithm, which has good real-time performance, can be used to track the moving targets, thereby achieving target alignment. After alignment, the ratio r between the number n of frames in which each moving target appears and the total number N of image frames is counted, i.e. r = n/N. If r < T (T being a preset threshold), the target is considered a false detection and is discarded. The preset threshold T may be sized with reference to the precision and recall of the target detection model. After the aligned moving target sequence is obtained, the false detections and missed detections generated by the target detection model are treated as outliers and random noise, and a Kalman filter is used to remove the outliers and smooth the noise. Accurate and continuous time series data of the moving target and the lane line in the one-dimensional coordinate system can then be obtained.
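As a non-limiting illustration of the false-detection filtering and smoothing described above, the following sketch combines the frame-ratio test r = n/N with a simple one-dimensional constant-velocity Kalman filter; the threshold, noise parameters, and function name are assumptions, and an off-the-shelf SORT implementation would be used separately for the target alignment step.

```python
import numpy as np

def filter_coordinate_track(coords, total_frames, min_ratio=0.5):
    """Discard a likely false detection and smooth one pixel-coordinate track.

    coords: list of (frame_index, value) observations of one aligned coordinate
            (e.g. the first pixel coordinate of a tracked moving target).
    Returns the smoothed values, or None if the target appears in too few frames.
    """
    if len(coords) / float(total_frames) < min_ratio:    # r = n / N < T  ->  false detection
        return None

    # one-dimensional constant-velocity Kalman filter (state: position, velocity)
    x = np.array([coords[0][1], 0.0])
    P = np.eye(2) * 10.0
    F = np.array([[1.0, 1.0], [0.0, 1.0]])               # state transition over one frame
    H = np.array([[1.0, 0.0]])                           # only the position is observed
    Q = np.eye(2) * 1e-2                                 # process noise (assumed)
    R = np.array([[4.0]])                                # measurement noise (assumed)

    smoothed = []
    for _, z in coords:
        x = F @ x                                        # predict
        P = F @ P @ F.T + Q
        y = np.array([z]) - H @ x                        # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                   # Kalman gain
        x = x + K @ y                                    # update: outliers are pulled back toward the track
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(float(x[0]))
    return smoothed
```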
In a non-limiting embodiment, step S105 shown in fig. 1 may be followed by the following steps: and classifying the intrusion behavior of the moving target to the lane line according to the relative position relation in the behavior time series data.
In a specific implementation, the lane intrusion behavior may be classified into intrusion and non-intrusion. The intrusion class can be further subdivided into left intrusion and right intrusion.
In a specific application scenario, in conjunction with FIG. 6, the target o enters the range defined by the lane lines l1 and l2, so the type of its behavior is intrusion. Further, since the target o enters across the lane line l1, which is the right line of the lane, the type of the intrusion behavior is right intrusion.
It can be understood that any implementable algorithm in the prior art may be used as the specific algorithm for classifying the intrusion behavior, and the type of the intrusion behavior may also be adaptively set according to the actual application scenario, which is not limited in this embodiment of the present invention.
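One simple possibility, sketched below for illustration only, is to threshold the normalized relative-position time series; the 0.5 threshold (half the normalized lane width), the class labels, and the function name are assumptions.

```python
def classify_intrusion(pr_series, inside_threshold=0.5):
    """Classify lane intrusion from a normalized relative-position time series.

    With the normalization above, |Pr| < 0.5 means the target lies between the
    two lane lines; the sign of Pr just before entry tells the entry side.
    """
    inside_frames = [i for i, p in enumerate(pr_series) if abs(p) < inside_threshold]
    if not inside_frames:
        return "non_intrusion"
    first = inside_frames[0]
    if first == 0:
        return "intrusion"                 # already inside at the start of the sequence
    return "right_intrusion" if pr_series[first - 1] > 0 else "left_intrusion"
```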
Referring to fig. 7, the embodiment of the invention further discloses a target behavior recognition device. The target behavior recognition device 70 may include:
an obtaining module 701, configured to obtain a video to be identified, where the video to be identified includes multiple frames of images to be identified;
a target detection module 702, configured to perform target detection on each frame of image to be identified, so as to obtain a moving target and a lane line in each frame of image to be identified respectively;
the mapping module 703 is configured to map the moving target and the lane line identified in each frame of image to be identified in the same mapping image according to the pixel coordinates, where each frame of image to be identified has a corresponding mapping image;
a pixel coordinate calculation module 704, configured to determine, in each mapping image, that a contact line between the moving target and a road surface is a horizontal reference line, and determine a first pixel coordinate of the moving target on the horizontal reference line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal reference line;
the relative position relationship calculating module 705 is configured to calculate a relative position relationship between the first pixel coordinate and the second pixel coordinate and between the first pixel coordinate and the third pixel coordinate in each mapping image, so as to obtain behavior time series data of the moving object in the video to be recognized, where the behavior time series data includes a plurality of relative position relationships arranged according to a time sequence corresponding to a plurality of frames of images to be recognized.
According to the embodiment of the invention, the moving target and the lane line in each frame of image to be recognized are mapped in the same mapping image, and the relative position relation of the moving target and the lane line on the horizontal reference line is calculated in the mapping image, so that the position characteristic of the small target can be captured, the problem that the small target action is difficult to capture in a long-distance scene is solved, and a foundation is laid for accurate prediction of target behavior. In addition, the embodiment of the invention can reduce the identification difficulty and improve the identification accuracy and the system robustness by converting the target behavior identification into the behavior time sequence data for identification.
For more details of the operation principle and the operation manner of the target behavior recognition device 70, reference may be made to the relevant descriptions in fig. 1 to fig. 6, and details are not repeated here.
The embodiment of the invention also discloses a storage medium, which is a computer-readable storage medium and stores a computer program thereon, and the computer program can execute the steps of the target behavior identification method shown in fig. 1 when running. The storage medium may include ROM, RAM, magnetic or optical disks, etc. The storage medium may further include a non-volatile memory (non-volatile) or a non-transitory memory (non-transient), and the like.
The embodiment of the invention also discloses a terminal which can comprise a memory and a processor, wherein the memory is stored with a computer program which can run on the processor. The processor, when running the computer program, may perform the steps of the target behavior recognition method shown in fig. 1. The user equipment includes but is not limited to a mobile phone, a computer, a tablet computer and other terminal equipment.
It should be understood that the processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a system on chip (SoC), a Central Processing Unit (CPU), a Network Processor (NP), a Micro Controller Unit (MCU), a Programmable Logic Device (PLD), or other integrated chip. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
It will also be appreciated that the memory referred to in this embodiment of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), double data rate SDRAM, enhanced SDRAM, SLDRAM, Synchronous Link DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor. It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A target behavior identification method is characterized by comprising the following steps:
acquiring a video to be identified, wherein the video to be identified comprises a plurality of frames of images to be identified;
carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified;
mapping the moving target and the lane line which are identified in each frame of image to be identified in the same mapping image according to the pixel coordinates, wherein each frame of image to be identified has a corresponding mapping image;
determining a contact line of the moving target and a road surface as a horizontal datum line in each mapping image, and determining a first pixel coordinate of the moving target on the horizontal datum line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal datum line;
and calculating the relative position relationship among the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image to obtain behavior time sequence data of the moving object in the video to be recognized, wherein the behavior time sequence data comprises a plurality of relative position relationships arranged according to the time sequence corresponding to a plurality of frames of images to be recognized.
2. The method according to claim 1, wherein the calculating the relative position relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate in each mapping image comprises:
determining a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image;
and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image to be used as the relative position relation.
3. The method according to claim 1, wherein the calculating the relative position relationship between the first pixel coordinate and the second pixel coordinate and the third pixel coordinate in each mapping image comprises:
determining the width of the lane line and a fourth pixel coordinate of the center point of the lane line on the horizontal reference line according to the second pixel coordinate and the third pixel coordinate in each mapping image; and calculating the difference value of the first pixel coordinate and the fourth pixel coordinate in each mapping image, and normalizing by using the width of the lane line to be used as the relative position relation.
4. The method for recognizing the target behavior according to claim 1, wherein the target detection of each frame of the image to be recognized comprises:
and respectively inputting each frame of image to be recognized into a trained moving target detection model and a trained lane line detection model, wherein the moving target detection model outputs the recognized pixel coordinates of the moving target, and the lane line detection model recognizes the pixel coordinates of the lane line.
5. The object behavior recognition method according to claim 4, characterized in that the moving object is represented in pixel coordinates of an envelope box, and the lane line is represented in coordinates of its pixels in the image to be recognized.
6. The method of claim 1, wherein the determining the first pixel coordinate of the moving object on the horizontal reference line and the second pixel coordinate and the third pixel coordinate of the two lines of the lane line on the horizontal reference line further comprises:
and performing Kalman filtering on the first pixel coordinate, the second pixel coordinate and the third pixel coordinate corresponding to each frame of image to be identified.
7. The method of claim 1, further comprising:
and classifying the intrusion behavior of the moving target to the lane line according to the relative position relation in the behavior time series data.
8. An object behavior recognition apparatus, comprising:
the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring a video to be recognized, and the video to be recognized comprises a plurality of frames of images to be recognized; the target detection module is used for carrying out target detection on each frame of image to be identified so as to respectively obtain a moving target and a lane line in each frame of image to be identified;
the mapping module is used for mapping the moving target and the lane line which are obtained by identification in each frame of image to be identified in the same mapping image according to the pixel coordinates, and each frame of image to be identified has a corresponding mapping image;
the pixel coordinate calculation module is used for determining a contact line of the moving target and the road surface as a horizontal datum line in each mapping image, and determining a first pixel coordinate of the moving target on the horizontal datum line, and a second pixel coordinate and a third pixel coordinate of two lines of the lane line on the horizontal datum line;
and the relative position relation calculation module is used for calculating the relative position relation among the first pixel coordinate, the second pixel coordinate and the third pixel coordinate in each mapping image so as to obtain behavior time sequence data of the moving object in the video to be recognized, wherein the behavior time sequence data comprises a plurality of relative position relations which are arranged according to the time sequence corresponding to a plurality of frames of images to be recognized.
9. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, performing the steps of the target behavior recognition method according to any one of claims 1 to 7.
10. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the target behavior recognition method according to any of claims 1 to 7.
CN202011401980.8A 2020-12-04 2020-12-04 Target behavior identification method and device, storage medium and terminal Pending CN114663793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011401980.8A CN114663793A (en) 2020-12-04 2020-12-04 Target behavior identification method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011401980.8A CN114663793A (en) 2020-12-04 2020-12-04 Target behavior identification method and device, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN114663793A true CN114663793A (en) 2022-06-24

Family

ID=82024943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011401980.8A Pending CN114663793A (en) 2020-12-04 2020-12-04 Target behavior identification method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN114663793A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024037660A1 (en) * 2022-08-16 2024-02-22 顺丰科技有限公司 Method and apparatus for determining abnormal sorting areas, electronic device, and storage medium



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination