CN110991397B - Travel direction determining method and related equipment

Info

Publication number
CN110991397B
Authority
CN
China
Prior art keywords
image areas
image
determining
person object
same person
Prior art date
Legal status
Active
Application number
CN201911304545.0A
Other languages
Chinese (zh)
Other versions
CN110991397A (en)
Inventor
唐健
张军
祝严刚
石伟
王志元
陶昆
Current Assignee
Shenzhen Jieshun Science and Technology Industry Co Ltd
Original Assignee
Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Jieshun Science and Technology Industry Co Ltd filed Critical Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority to CN201911304545.0A priority Critical patent/CN110991397B/en
Publication of CN110991397A publication Critical patent/CN110991397A/en
Application granted granted Critical
Publication of CN110991397B publication Critical patent/CN110991397B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The embodiment of the application discloses a method for judging the traveling direction of a pedestrian, which comprises the following steps: acquiring multi-frame video images containing a person object, wherein the multi-frame video images have a sequential acquisition order; determining image areas comprising a single person object in each frame of the video image respectively by using a deep learning neural network model, to obtain a plurality of image areas; determining a plurality of target image areas containing the same person object from the plurality of image areas by using a deep-learning-based tracking algorithm; determining the change trend of the target image areas according to the sequence in which they were collected; and determining the traveling direction of the same person object according to the change trend.

Description

Travel direction determining method and related equipment
Technical Field
The embodiment of the application relates to the field of image recognition, in particular to a traveling direction determining method and related equipment.
Background
Image recognition is a fast-progressing, leading-edge technology with broad prospects. It divides into two parts, learning and recognition: learning aims to form the various image features required for recognition, while recognition correctly distinguishes the type of an image to be recognized according to those features. Image processing and recognition technology plays an important role in fields such as intelligent equipment and Internet of Things technology.
An intelligent device can collect images of a person and identify the person's traveling direction from those images. The traveling direction information has practical uses. For example, an access control device can use image recognition to identify the traveling direction of a person and judge the user's need from it: if a pedestrian needs to use the access control device, the device starts its access recognition flow; if the pedestrian has no such need, the device does not start the flow.
Therefore, a technical solution is required to recognize the traveling direction of the person from the image.
Disclosure of Invention
An embodiment of the present application provides a method for determining a traveling direction, including:
acquiring multi-frame video images containing person objects, wherein the multi-frame video images have a sequential acquisition sequence;
determining image areas comprising a single person object in each frame of the video image respectively by using a deep learning neural network model to obtain a plurality of image areas;
determining a plurality of target image areas containing the same person object from the plurality of image areas using a tracking algorithm based on deep learning;
determining the change trend of the target image area according to the sequence of collecting the target image areas;
and determining the advancing direction of the same person object according to the change trend.
According to the first aspect of the embodiments of the present application, optionally, the determining, using a deep learning neural network model, an image area including a single person object in each frame of the video image, to obtain a plurality of image areas includes:
and respectively determining image areas comprising a single person object in each frame of the video image by using a single-shot multibox detector to obtain a plurality of image areas, wherein the single-shot multibox detector neural network model is trained on an image data set comprising human heads.
According to the first aspect of the embodiments of the present application, optionally, after determining image areas comprising a single person object in each frame of the video image respectively by using the deep learning neural network model to obtain a plurality of image areas, and before determining a plurality of target image areas comprising the same person object from the plurality of image areas by using a tracking algorithm based on deep learning, the method further includes:
enlarging the plurality of image areas according to a preset proportion, so that the image areas comprise the upper-body image area of the person object.
According to the first aspect of the embodiment of the present application, optionally, the tracking algorithm based on deep learning includes a pedestrian re-recognition neural network model, and the loss function of the pedestrian re-recognition neural network model is a boundary loss function.
According to a first aspect of the embodiments of the present application, optionally, the determining, using a tracking algorithm based on deep learning, a plurality of target image areas including the same person object from the plurality of image areas includes:
performing feature comparison on the image areas by using a pedestrian re-recognition neural network model to obtain a first probability that each image area and other image areas belong to the same person object;
predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object;
setting different weights for the first probability and the second probability;
performing a weighted operation on the first probability and the second probability to obtain a weighted result;
comparing the weighted result with a preset threshold value, and if the weighted result is larger than the preset threshold value, determining that two image areas corresponding to the weighted result belong to the same person object;
and combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object.
According to the first aspect of the embodiments of the present application, optionally, the determining the trend of the change of the target image area according to the sequential acquisition sequence of the plurality of target image areas includes:
extracting the size and position information of a plurality of target image areas;
and calculating the change trend of the target image area by using the size, the position information and the sequence of acquisition of the plurality of target image areas.
According to the first aspect of the embodiment of the present application, optionally, after determining the traveling direction of the same pedestrian, the method further includes:
processing the multi-frame video image by using a multi-frame voting method to verify whether the advancing direction of the same pedestrian is correct.
According to a first aspect of embodiments of the present application, optionally, the method further includes:
determining a scene included in the multi-frame video image;
obtaining a preset travelling direction range corresponding to the scene;
judging whether the advancing direction of the same person object belongs to the preset advancing direction range or not to obtain a judging result; the judging result is used for triggering and executing a preset processing action corresponding to the judging result.
According to the first aspect of the embodiment of the application, optionally, the multi-frame video image includes a multi-frame video image acquired by an image acquirer of the access control system.
A second aspect of the embodiments of the present application provides a travel direction determining apparatus, including:
a video image acquisition unit, configured to acquire multi-frame video images containing a person object, wherein the multi-frame video images have a sequential acquisition order;
a portrait region determining unit, configured to determine an image region including a single person object in each frame of the video image, respectively, using a deep learning neural network model, to obtain a plurality of image regions;
a same person determining unit configured to determine a plurality of target image areas containing the same person object from the plurality of image areas using a tracking algorithm based on deep learning;
the change trend determining unit is used for determining the change trend of the target image areas according to the sequence of collecting the target image areas;
and the advancing direction determining unit is used for determining the advancing direction of the same person object according to the change trend.
According to a second aspect of the embodiments of the present application, optionally, the image region determining unit is configured to determine, using a deep learning neural network model, image regions including a single person object in each frame of the video image, so as to obtain a plurality of image regions, where the image region determining unit is specifically configured to:
respectively determine image areas comprising a single person object in each frame of the video image by using a single-shot multibox detector to obtain a plurality of image areas, wherein the single-shot multibox detector neural network model is trained on an image data set comprising human heads.
According to a second aspect of embodiments of the present application, optionally, the apparatus further includes:
an enlarging unit, configured to enlarge the plurality of image areas according to a preset proportion, so that the image areas comprise the upper-body image area of the person object.
According to a second aspect of the embodiments of the present application, optionally, the deep learning-based tracking algorithm includes a pedestrian re-recognition neural network model, and the loss function of the pedestrian re-recognition neural network model is a boundary loss function.
According to a second aspect of the embodiments of the present application, optionally, the same person determining unit is configured to, when determining a plurality of target image areas including the same person object from the plurality of image areas using a tracking algorithm based on deep learning, specifically:
performing feature comparison on the image areas by using a pedestrian re-recognition neural network model to obtain a first probability that each image area and other image areas belong to the same person object;
predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object;
setting different weights for the first probability and the second probability;
performing a weighted operation on the first probability and the second probability to obtain a weighted result;
comparing the weighted result with a preset threshold value, and if the weighted result is larger than the preset threshold value, determining that two image areas corresponding to the weighted result belong to the same person object;
and combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object.
According to the second aspect of the embodiments of the present application, optionally, the change trend determining unit is configured to, when determining the change trend of the target image area according to the sequential acquisition sequences of the plurality of target image areas, specifically:
extracting the size and position information of a plurality of target image areas;
and calculating the change trend of the target image area by using the size, the position information and the sequence of acquisition of the plurality of target image areas.
According to a second aspect of embodiments of the present application, optionally, the apparatus further includes:
a verification unit, configured to process the multi-frame video images using a multi-frame voting method to verify whether the advancing direction of the same pedestrian is correct.
According to a second aspect of embodiments of the present application, optionally, the apparatus further includes:
a scene determining unit, configured to determine a scene included in the multi-frame video image;
the direction range unit is used for obtaining a preset travelling direction range corresponding to the scene;
the judging direction unit is used for judging whether the advancing direction of the same person object belongs to the preset advancing direction range or not to obtain a judging result; the judging result is used for triggering and executing a preset processing action corresponding to the judging result.
According to a second aspect of the embodiment of the present application, optionally, the multi-frame video image includes a multi-frame video image acquired by an image acquirer of the access control system.
A third aspect of the embodiments of the present application provides a travel direction determining apparatus, including:
the device comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a volatile memory or a persistent memory;
The central processor is configured to communicate with the memory and to execute instruction operations in the memory to perform the method according to any of the first aspects of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspects of the embodiments of the present application.
From the above technical solutions, the embodiments of the present application have the following advantages: the traveling direction of a pedestrian is judged from the change of the pedestrian's image across the video stream, so that a device can refer both to whether a pedestrian is present and to the pedestrian's traveling direction when executing its next strategy.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a method for determining a traveling direction of the present application;
FIG. 2 is another flow chart of an embodiment of a method for determining a direction of travel of the present application;
FIG. 3 is a schematic diagram of an apparatus for determining a direction of travel according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another apparatus according to an embodiment of a method of determining a direction of travel of the present application;
fig. 5 is a schematic diagram of another apparatus according to an embodiment of the travel direction determining method of the present application.
Detailed Description
The embodiments of the present application provide a travel direction determining method and related equipment, for use in intelligent equipment and related fields, such as the execution of device strategies.
Image recognition, a practical application of deep learning algorithms, is a technique by which a computer processes, analyzes, and understands images in order to recognize targets and objects of various patterns. Image recognition technology at the present stage is generally divided into face recognition and commodity recognition: face recognition is mainly applied to security inspection, identity verification, and mobile payment, while commodity recognition is mainly applied to the commodity circulation process, in particular to unmanned retail fields such as unmanned shelves and intelligent retail cabinets.
Under the current technical environment, an intelligent device can collect real-time images of certain scenes and process the collected images using image recognition technology. In the aspect of face recognition, after an image is acquired, a video frame containing a face image can be identified and the next strategy executed according to that information. For example, face recognition based on video image processing is commonly adopted in current intelligent access control systems. However, such a system opens after a face is recognized in a specific area of the image, and it often falsely detects persons not traveling in the passing direction: when a person walks past in parallel or passes in the reverse direction with the face toward the camera, the face is recognized by mistake and the door is opened erroneously. This phenomenon is especially pronounced where multiple access control channels exist in parallel, and it seriously affects the accuracy and efficiency of opening the access control system.
An intelligent device can collect images of a person and identify the person's traveling direction from those images. The traveling direction information has practical uses. For example, an access control device can use image recognition to identify the traveling direction of a person and judge the user's need from it: if a pedestrian needs to use the access control device, the device starts its access recognition flow; if the pedestrian has no such need, the device does not start the flow.
Referring to fig. 1, an embodiment of a travel direction determining method of the present application may be applied to, but is not limited to, an access control system. The embodiment specifically comprises: steps 101-105.
101. A multi-frame video image is acquired that includes a person object.
Multi-frame video images containing a person object are acquired, and the frames have a sequential acquisition order. The multi-frame video images generally originate from an image collector in the intelligent device or system, and the images it acquires are tied to the operations the device executes. The multi-frame video images may be a continuous video stream; for devices not in constant use, frames may instead be collected at regular time intervals to form the multi-frame video images for processing. The multi-frame video images should contain a person object, i.e., the pedestrian whose traveling direction needs to be judged.
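As a concrete reading of step 101, the sketch below samples an ordered frame sequence from a camera. OpenCV, the device index, the frame count, and the sampling interval are illustrative assumptions, not part of the patent.

```python
import time
import cv2  # assumed capture library; the patent only requires ordered frames

def acquire_frames(device=0, num_frames=16, interval_s=0.1):
    """Grab an ordered sequence of frames from a camera."""
    cap = cv2.VideoCapture(device)
    frames = []
    while cap.isOpened() and len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)      # list order preserves the acquisition order
        time.sleep(interval_s)    # fixed sampling interval for low-duty devices
    cap.release()
    return frames
```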
102. Determining image areas comprising a single person object in each frame of the video image respectively by using a deep learning neural network model to obtain a plurality of image areas.
Image areas comprising a single person object are determined in each frame of the video image respectively by using a deep learning neural network model, to obtain a plurality of image areas. The deep learning neural network model performs recognition processing on the video image and obtains image areas comprising a single person object, marked on the video image in the form of recognition boxes; the image enclosed by a recognition box is referred to as an image area comprising a single person object. Person recognition based on face recognition technology is mature and widely applied in public facilities such as stations, supermarkets, and intersections.
103. A plurality of target image areas containing the same person object are determined from the plurality of image areas using a deep learning based tracking algorithm.
A plurality of target image areas containing the same person object are determined from the plurality of image areas using a deep-learning-based tracking algorithm. The tracking algorithm identifies and groups the single-person image areas obtained in step 102, assigning image areas to different persons according to the differing features of the person objects they contain. Such algorithms are also applied to vehicle recognition and tracking: a trained neural network combined with other algorithms can accurately identify images belonging to the same vehicle and obtain information such as vehicle speed from the motion trajectory those images represent. The principle of the deep-learning-based tracking algorithm applied to person objects is similar to that for vehicles; only the training set used to train the neural network needs to be adjusted, so the details are not repeated here.
The image areas obtained in step 102 each contain a person object, and across areas these may be the same person object or different person objects. In this step, the image areas containing the same person object are extracted; these are referred to as the target image areas of that person object. It should be noted that this step may group all image areas, yielding a set of target image areas for every person object, or it may extract the sets of target image areas of only some of the person objects.
104. Determining the change trend of the target image area according to the sequence of collecting the target image areas.
The change trend of the target image areas is determined according to the sequence in which they were collected. The target image areas are analyzed in the time order of their video frames, and the change trend can be obtained from the size of the image areas, their position on the video image, and so on. The trend may be that the image area becomes smaller or larger, or that its position moves, for example, from left to right or from top to bottom.
105. Determining the advancing direction of the same person object according to the change trend.
The traveling direction of the person object is determined from analysis of the change trend of the target image areas, with which it generally agrees. Where the accuracy requirement is low, the change trend can be taken directly as the traveling direction. The trend can also be analyzed in finer detail: if the target image area becomes smaller and smaller, then by the perspective principle the traveling direction includes a component moving away from the image collector; that component can be estimated from the magnitude of the image change and then superposed with velocity components in other directions to obtain the determined traveling direction.
From the above technical solutions, the embodiments of the present application have the following advantages: the traveling direction of a pedestrian is judged from the change of the pedestrian's image across the video stream, so that a device can refer both to whether a pedestrian is present and to the pedestrian's traveling direction when executing its next strategy.
Referring to fig. 2, one embodiment of the travel direction determining method of the present application includes: step 201-step 216.
201. A multi-frame video image is acquired that includes a person object.
And acquiring a plurality of frames of video images containing the person object, wherein the plurality of frames of video images have a sequential acquisition sequence. This step is similar to step 101 in the corresponding embodiment of fig. 1, and is not repeated here.
202. Determining image areas comprising a single person object in each frame of the video image respectively by using a single-shot multibox detector (SSD) neural network model, to obtain a plurality of image areas.
Image areas comprising a single person object are determined in each frame of the video image respectively by using a single-shot multibox detector, to obtain a plurality of image areas. The main idea of the Single Shot MultiBox Detector (SSD) is to sample uniformly and densely at different positions of the picture; different scales and aspect ratios can be adopted during sampling. Features are then extracted with a convolutional neural network (CNN), and classification and regression are performed directly, so the whole process requires only one step. SSD uses feature maps at multiple scales, detects with convolutions, and sets up prior boxes, among other characteristics. As a result, SSD achieves higher accuracy and also detects small targets well.
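A minimal inference sketch for this step follows. It uses torchvision's generic COCO-pretrained SSD300 as a stand-in; the patent instead trains the SSD on a head/face data set, which would be loaded the same way. The score threshold is an assumed value.

```python
import numpy as np
import torch
from torchvision.models.detection import ssd300_vgg16

# Stand-in weights: the patent's detector is trained on a head image data set.
model = ssd300_vgg16(weights="DEFAULT").eval()

def detect_single_person_regions(frame_bgr: np.ndarray, score_thresh: float = 0.5):
    """Return one [x1, y1, x2, y2] box per detected person in a BGR frame."""
    rgb = frame_bgr[:, :, ::-1].copy()                    # BGR -> RGB
    img = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([img])[0]
    keep = (out["scores"] >= score_thresh) & (out["labels"] == 1)  # COCO id 1 = person
    return out["boxes"][keep].tolist()
```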
203. Enlarging the plurality of image areas according to a preset proportion, so that the image areas comprise the upper-body image area of the person object.
The plurality of image areas are enlarged according to a preset proportion, so that the image areas comprise the upper-body image area of the person object. When processing video images with the SSD, since the SSD detects small targets well, the training set can contain head or face information, so that the obtained image areas are more accurate and detections are less likely to be missed. For the subsequent tracking process, however, the smaller the image area, the fewer features it contains and the more difficult the tracking. The target image area is therefore enlarged so that it contains the upper-body image area of the person object, to facilitate the tracking process.
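Step 203 is plain box arithmetic; a sketch follows. The widening and lengthening ratios are assumed presets, since the patent leaves the proportion configurable.

```python
def enlarge_head_box(box, frame_w, frame_h, w_ratio=1.5, h_ratio=3.0):
    """Grow a head box sideways and downward so it covers the upper body.

    The ratios are illustrative presets, not values from the patent.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2.0
    new_w, new_h = w * w_ratio, h * h_ratio
    nx1 = max(0.0, cx - new_w / 2.0)
    nx2 = min(float(frame_w), cx + new_w / 2.0)
    ny1 = max(0.0, y1)                      # keep the top edge at the head
    ny2 = min(float(frame_h), y1 + new_h)   # extend downward over the torso
    return [nx1, ny1, nx2, ny2]
```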
204. Comparing the characteristics of the image areas by using a pedestrian re-recognition neural network model to obtain a first probability that each image area and other image areas belong to the same person object.
Feature comparison is performed on the image areas using a pedestrian re-recognition neural network model, to obtain a first probability that each image area and other image areas belong to the same person object. In the deep-learning-based tracking algorithm (the deep-sort tracking algorithm), the image areas are first compared by the pedestrian re-recognition neural network model, which is essentially a classification network. After the pedestrian re-recognition network has been trained with enough data, a test picture is input and the network automatically extracts a feature vector; this feature serves the pedestrian re-recognition task, and whether the two compared image areas belong to the same person object is judged by comparing the distance between their feature vectors.
The deep-learning-based tracking algorithm includes a pedestrian re-recognition neural network model, and the loss function of the pedestrian re-recognition neural network model is a boundary (margin) loss function. The boundary loss function improves the recognition capability and robustness of the pedestrian re-recognition network, is easier to implement in practice, and gives accurate results.
The result of judging from the feature-vector distance whether two compared image areas belong to the same person object is called the first probability. For example, call an image area contained in the first frame of the video images a first image area, an image area contained in the second frame a second image area, and so on. Feature comparison through the pedestrian re-recognition neural network model yields a probability value A that the person objects contained in the first image area and the second image area are the same person; this value A is called the first probability. In practice every pair of video images could be compared, but because of the computation cost, comparison is generally performed between video frames in time order; the concrete implementation is not limited here.
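A sketch of the feature comparison, assuming embed_net is any trained pedestrian re-recognition embedding network and the crops are preprocessed tensors; mapping cosine similarity to a [0, 1] "first probability" is an illustrative choice, not the patent's formula. Training such a network with a margin-style ("boundary") loss could, for example, use torch.nn.TripletMarginLoss.

```python
import torch
import torch.nn.functional as F

def first_probability(embed_net, crop_a, crop_b):
    """Distance between two re-ID feature vectors, mapped to a same-person score.

    embed_net: hypothetical trained re-recognition network returning a feature
    vector per image; crop_a / crop_b: CxHxW tensors of the two image areas.
    """
    with torch.no_grad():
        fa = F.normalize(embed_net(crop_a.unsqueeze(0)), dim=1)
        fb = F.normalize(embed_net(crop_b.unsqueeze(0)), dim=1)
    cos = float((fa * fb).sum())   # cosine similarity in [-1, 1]
    return (cos + 1.0) / 2.0       # squashed to [0, 1] as the first probability
```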
205. Predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object.
The image areas are predicted using a Kalman filtering algorithm, to obtain a second probability that each image area and other image areas belong to the same person object. The Kalman filtering algorithm is another sub-algorithm included in the deep-sort tracking algorithm. The essence of Kalman filtering is to reconstruct the state vector of a system from measurement values: it recurses in the order of prediction, actual measurement, and correction, eliminating random interference on the basis of the measured values and reproducing the state of the system, or restoring the original appearance of the system from contaminated measurements. Applied to the image tracking process, Kalman filtering infers where the image region of the next frame will appear from the attribute values of the image regions in the preceding video images and assigns probability values for the appearance of the next frame's image region to the different regions. With the position information of the next frame's image region determined, a probability value is obtained, called the second probability.
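A sketch of the prediction step on a constant-velocity box state follows. The state layout, noise levels, and the exponential mapping of the Mahalanobis distance to a [0, 1] "second probability" are assumptions; deep-sort itself uses the Mahalanobis distance for gating.

```python
import numpy as np

dt = 1.0  # one frame step
F_mat = np.eye(8)                      # state: [cx, cy, w, h, vcx, vcy, vw, vh]
F_mat[:4, 4:] = dt * np.eye(4)         # constant-velocity transition
Q = np.eye(8) * 1e-2                   # assumed process noise

def kalman_predict(x, P):
    """One predict step for a track's box state (x: (8,), P: (8, 8))."""
    x_pred = F_mat @ x
    P_pred = F_mat @ P @ F_mat.T + Q
    return x_pred, P_pred

def second_probability(x_pred, P_pred, box_meas):
    """Turn the distance between prediction and a measured box into a score."""
    H = np.hstack([np.eye(4), np.zeros((4, 4))])   # we observe [cx, cy, w, h]
    innov = box_meas - H @ x_pred
    S = H @ P_pred @ H.T + np.eye(4) * 1e-1        # assumed measurement noise
    d2 = innov @ np.linalg.solve(S, innov)         # squared Mahalanobis distance
    return float(np.exp(-0.5 * d2))                # illustrative mapping to [0, 1]
```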
206. Setting different weights for the first probability and the second probability, and carrying out a weighting operation on the first probability and the second probability to obtain a weighted result.
Different weights are set for the first probability and the second probability, and a weighted operation is performed on them to obtain a weighted result. Different weight values are assigned to the first probability obtained from the pedestrian re-recognition neural network model and the second probability obtained with the Kalman filtering algorithm. The weight values may be fixed or may change with the probability values; in general, the first probability obtained from the pedestrian re-recognition neural network model is given the higher weight, and the specific implementation is not limited. The probabilities obtained by the two methods for different video frames are each weighted to yield the corresponding weighted results, and using the two tracking methods together improves the robustness of the method.
207. Comparing the weighted result with a preset threshold value.
The weighted result is compared with a preset threshold value, which is set manually. For scenes with higher tracking requirements the threshold can be set to a correspondingly higher value, that is, the judgment that image areas belong to the same person object becomes stricter. All obtained weighted results are compared with the preset threshold value respectively to obtain the corresponding results.
208. If the weighted result is larger than the preset threshold value, determining that the two image areas corresponding to the weighted result belong to the same person object.
If the weighted result is larger than the preset threshold value, the two image areas corresponding to the weighted result are determined to belong to the same person object. The persons contained in the two compared image areas from different frames are then the same person object. This completes the most important part of the tracking process and yields, frame by frame, the conclusion of whether the person object contained in a later frame is the same person as in the previous frame.
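Steps 206 through 208 reduce to a few lines; the weights (re-recognition weighted higher, as suggested above) and the threshold are assumed values.

```python
def same_person(p_reid, p_kalman, w_reid=0.7, w_kalman=0.3, thresh=0.6):
    """Fuse the two cues and compare the weighted result with a preset threshold."""
    fused = w_reid * p_reid + w_kalman * p_kalman
    return fused > thresh, fused

# e.g. same_person(0.9, 0.4) -> (True, 0.75): the two areas are judged the
# same person object because 0.75 exceeds the assumed threshold of 0.6.
```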
209. Combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object.
The image areas belonging to the same person object are combined to obtain a plurality of target image areas of the same person object. The comparisons above yield a number of results stating that the person objects contained in two image areas are the same person object, and the image areas are grouped according to those results. For example, suppose the first frame of the video images comprises an A image area and a B image area, and the second frame comprises C, D, and E image areas. Using the preceding steps it is judged that the A and C image areas belong to the same person object, that the B and D image areas belong to the same person object, and that E belongs to an independent person object, with no conclusion that the A or C image area matches the B or D image area. The A and C image areas are then divided into one group corresponding to one person object, and the image areas in that group are called the target image areas corresponding to that person; the B and D image areas are divided into a group corresponding to another person object, the target image areas corresponding to that other person. Similarly, E alone is regarded as the target image area corresponding to a third person object different from the person objects corresponding to the other two groups.
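Grouping the pairwise decisions is a connected-components problem; the union-find sketch below reproduces the A/C, B/D, E example, taking the matched pair list from step 208 as input.

```python
def group_regions(num_regions, same_pairs):
    """Merge pairwise same-person decisions into per-person groups (union-find).

    same_pairs: (i, j) index pairs judged to be the same person object;
    unmatched regions end up as singleton groups, like region E above.
    """
    parent = list(range(num_regions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, j in same_pairs:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(num_regions):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())            # each group = one person's target areas

# A, B in frame 1 and C, D, E in frame 2, indexed 0..4:
# group_regions(5, [(0, 2), (1, 3)]) -> [[0, 2], [1, 3], [4]]
```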
In practical application, the acquired video frames may need to be processed in real time. After the grouping information is obtained, features can be extracted from the image areas included in each group to form a feature summary of the same person across the earlier video frames. After a new video frame is acquired, the image areas it contains need only be compared with the feature summary corresponding to each group of target image areas to decide which group of target image areas each new image area belongs to.
When the target image area corresponding to a certain person object can no longer be acquired, the person object may have left the video image or may be temporarily occluded. To ensure the robustness of the tracking process, the 10 video frames following the frame in which the person object's target image area was lost can be checked; if no target image area corresponding to the person object exists in them, it is determined that the person object has left the video image, and tracking of that person object ends.
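The 10-frame occlusion tolerance needs only a per-track counter; a minimal sketch follows (the class and field names are ours, not the patent's).

```python
MAX_MISSED = 10  # frames to wait before declaring the person has left

class Track:
    """Minimal bookkeeping for the occlusion tolerance described above."""
    def __init__(self, track_id):
        self.track_id = track_id
        self.missed = 0

    def mark(self, matched: bool) -> bool:
        """Update after each frame; returns False once the track should end."""
        self.missed = 0 if matched else self.missed + 1
        return self.missed <= MAX_MISSED
```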
210. Size and position information of a plurality of target image areas are extracted.
Size and position information of the plurality of target image areas are extracted and processed according to the grouping of step 209 described above. From the size information of a target image area, the velocity component by which the person object approaches or recedes from the image collector over time can be obtained: qualitatively, the conclusion of approaching or receding, and quantitatively, the speed of approaching or receding, according to how fast the size of the image area changes over time. The position information gives the velocity component of the target image area parallel to the image collector. Analyzing the two together yields the specific velocity direction of the person object, that is, the traveling direction information.
211. Calculating the change trend of the target image area by using the size, the position information and the sequence of acquisition of the plurality of target image areas.
The change trend of the target image areas is calculated using the size and position information and the sequence in which the plurality of target image areas were collected. The scene captured by the image collector generally remains unchanged, so the change trend can be obtained simply by analyzing how the size and position information of the image areas change along the time sequence. Analyzing each group of target image areas separately, that is, the target image areas belonging to different person objects, yields the change trend of the target image areas corresponding to each person object.
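A sketch of steps 210 and 211, fitting linear trends to the box size and centre position over the acquisition order; the linear fit and the sign conventions are illustrative assumptions.

```python
import numpy as np

def region_trend(boxes):
    """Fit per-frame slopes of area and centre position over time.

    boxes: one [x1, y1, x2, y2] per frame, in acquisition order. A positive
    d_area suggests the person is approaching the camera; d_cx and d_cy give
    the lateral drift across the image.
    """
    t = np.arange(len(boxes))
    b = np.asarray(boxes, dtype=float)
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    cx = (b[:, 0] + b[:, 2]) / 2.0
    cy = (b[:, 1] + b[:, 3]) / 2.0
    d_area = np.polyfit(t, area, 1)[0]   # slope of a degree-1 fit
    d_cx = np.polyfit(t, cx, 1)[0]
    d_cy = np.polyfit(t, cy, 1)[0]
    return d_area, d_cx, d_cy
```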
212. Determining the advancing direction of the same person object according to the change trend.
The traveling direction of the same person object is determined according to the change trend. The actual traveling direction of the person object is analyzed by combining the scene contained in the video images acquired by the image collector with the change trend of the target image areas. For example, when the image collector is mounted level with the horizontal direction, part of the person object's traveling component in the horizontal direction can be obtained from whether the person object approaches or recedes from the collector: if the image collector is set to the north, a person object approaching it (that is, the size of the image area trending larger and larger) has a northward velocity component. Analyzing the change trend of the image areas in sequence, combined with the scene contained in the video images and the position at which the image collector is set, yields the actual traveling direction of the person object.
213. The multi-frame video image is processed using a multi-frame voting method to verify whether the direction of travel of the same pedestrian is correct.
The multi-frame video images are processed using a multi-frame voting method to verify whether the traveling direction of the same pedestrian is correct. After the result for the pedestrian's traveling direction is obtained, it can be checked against the pedestrian's historical images, and the check can be expressed as a percentage representing the confidence of the traveling direction. Some pedestrians' traveling directions are not fixed, and using this confidence information strengthens the flexibility of the intelligent device in executing its next strategy.
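A sketch of the multi-frame voting check, assuming each frame (or frame window) has already yielded a direction label; the vote share serves as the confidence percentage mentioned above, and the labels themselves are assumptions.

```python
from collections import Counter

def vote_direction(per_frame_directions):
    """Majority vote over per-frame direction estimates.

    Returns the winning label plus the vote share, usable as a confidence.
    """
    votes = Counter(per_frame_directions)
    label, count = votes.most_common(1)[0]
    return label, count / len(per_frame_directions)

# e.g. vote_direction(["approach", "approach", "lateral", "approach"])
# -> ("approach", 0.75)
```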
It should be noted that this step may also be performed after the embodiment shown in fig. 1.
214. Determining the scene included in the multi-frame video image.
The scene included in the multi-frame video images is determined. Determining the actual usage scene corresponding to the image collector makes the traveling direction information more specific according to the scene information. For example, if the scene determination concludes that only one passable road exists in the scene, and the judged traveling direction of a pedestrian belongs neither to the direction of the road nor to its opposite, then the reliability of that traveling direction information is low, or the traveling direction is judged again; this improves the robustness and practicality of the method.
It should be noted that determining the scene included in the multi-frame video image may be performed at any point between the foregoing steps 201 and 213, has no influence on the other steps, and is not limited here.
215. Obtaining a preset travelling direction range corresponding to the scene.
The method is mainly applied to intelligent equipment, so the traveling directions need to be classified, and whether a pedestrian has the purpose of using the equipment is judged from the category of the traveling direction. For example, for an access control device, the directions can be classified into three categories: moving away from the gate, approaching the gate, and moving parallel to the gate direction. A pedestrian approaching the gate is regarded as a pedestrian with the purpose of use, and the access control device can directly perform the opening operation for such a pedestrian, or further judge the pedestrian's identity.
It should be noted that obtaining the preset travel direction range corresponding to the scene may be performed at any point between the foregoing steps 201 and 213, causes no influence on the other steps, and is not limited here.
216. Judging whether the advancing direction of the same person object belongs to the preset advancing direction range or not to obtain a judging result.
After the traveling direction of the person object is obtained, whether the traveling direction of the same person object belongs to the preset traveling direction range is judged, to obtain a judging result. The preset traveling direction range is generally defined around a purpose; for a road scene, for example, the direction information of traveling along a certain road can be obtained. Summarizing the various traveling directions into a judgment against a certain preset traveling direction range makes it easy for the intelligent device to trigger the next operation according to the judging result.
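If directions are represented as compass angles (an assumption; the patent only requires direction categories), the range test of step 216 can look like this:

```python
def within_preset_range(direction_deg, range_start_deg, range_end_deg):
    """Check whether a travel direction falls in a scene's preset range.

    Angles in degrees, taken modulo 360 so the range may wrap (e.g. 350-10).
    """
    d = direction_deg % 360.0
    lo, hi = range_start_deg % 360.0, range_end_deg % 360.0
    if lo <= hi:
        return lo <= d <= hi
    return d >= lo or d <= hi               # wrapped range crossing 0 degrees

# An access control device might open only when the pedestrian's direction
# lies inside the "approaching" range, e.g. within_preset_range(95, 45, 135).
```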
The execution order of the steps is not limited to the illustration and may be otherwise. For example, step 215 may be performed at other points in the sequence, as long as step 216 is performed after both step 212 and step 215. In addition, steps 214-216 may also be performed after the embodiment shown in FIG. 1.
Referring to fig. 3, one embodiment of the travel direction determining apparatus of the present application includes: a video image acquisition unit 301, a portrait area determination unit 302, the same person determination unit 303, a change trend determination unit 304, a travel direction determination unit 305, and the like. Wherein:
the video image acquisition unit 301 is configured to acquire a plurality of frames of video images including a person object, where the plurality of frames of video images have a sequential acquisition order.
And a portrait region determining unit 302, configured to determine image regions including a single portrait object in each frame of the video image respectively using a deep learning neural network model, so as to obtain a plurality of image regions.
The same person determining unit 303 is configured to determine a plurality of target image areas containing the same person object from the plurality of image areas using a tracking algorithm based on deep learning.
And the change trend determining unit 304 is configured to determine a change trend of the target image area according to a sequential acquisition sequence of the plurality of target image areas.
A traveling direction determining unit 305, configured to determine a traveling direction of the same person object according to the change trend.
The operation procedures executed by the video image acquisition unit 301, the portrait area determination unit 302, the same person determination unit 303, the change trend determination unit 304, the travel direction determination unit 305, and the like in this embodiment are similar to those in the corresponding embodiment of fig. 1, and are not repeated here.
Referring to fig. 4, one embodiment of the travel direction determining apparatus of the present application includes: a video image acquisition unit 401, a portrait area determination unit 402, an enlargement unit 403, the same person determination unit 404, a change trend determination unit 405, a travel direction determination unit 406, a verification unit 407, a scene determination unit 408, a direction range unit 409, a judgment direction unit 410, and the like.
The video image acquisition unit 401 and the traveling direction determining unit 406 serve the same purposes as the units of the same name in the embodiment corresponding to fig. 3, and are not described here again.
A portrait region determining unit 402, configured to determine image regions including a single person object in each frame of the video image using a single-shot multibox detector, to obtain a plurality of image regions, where the single-shot multibox detector neural network model is trained on an image data set comprising human heads.
An enlarging unit 403 for enlarging a plurality of the image areas by a preset ratio so that the image areas include an upper body image area of the person object.
The same person determining unit 404 is configured to perform feature comparison on the image areas by using a pedestrian re-recognition neural network model, so as to obtain a first probability that each image area and other image areas belong to the same person object; wherein the loss function of the pedestrian re-recognition neural network model is a boundary loss function.
predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object;
setting different weights for the first probability and the second probability;
performing a weighted operation on the first probability and the second probability to obtain a weighted result;
comparing the weighted result with a preset threshold value, and if the weighted result is larger than the preset threshold value, determining that two image areas corresponding to the weighted result belong to the same person object;
and combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object.
A change trend determining unit 405, configured to extract size and position information of a plurality of the target image areas;
and to calculate the change trend of the target image area using the size and position information and the sequence of acquisition of the plurality of target image areas.
A verification unit 407, configured to process the multiple frames of video images using a multiple frame voting method, so as to verify whether the traveling direction of the same pedestrian is correct.
A scene determining unit 408, configured to determine a scene included in the multi-frame video image.
A direction range unit 409, configured to obtain a preset travel direction range corresponding to the scene.
The judging direction unit 410 is configured to judge whether the traveling direction of the same person object belongs to the preset traveling direction range, so as to obtain a judging result; the judging result is used for triggering and executing a preset processing action corresponding to the judging result.
In this embodiment, the operation flows executed by the video image acquisition unit 401, the portrait area determination unit 402, the amplifying unit 403, the same person determination unit 404, the change trend determination unit 405, the traveling direction determination unit 406, the verification unit 407, the scene determination unit 408, the direction range unit 409, the direction determination unit 410, and other units are similar to the step flows in the corresponding embodiment of fig. 2, and are not repeated here.
Fig. 5 is a schematic structural diagram of a travel direction determining apparatus according to an embodiment of the present application. The server 500 may include one or more central processing units (CPUs) 501 and a memory 505, in which one or more application programs or data are stored.
In this embodiment, the division of functional modules within the central processor 501 may be similar to the division of the unit functional modules described above in fig. 3 or fig. 4, and is not repeated here.
Wherein the memory 505 may be volatile storage or persistent storage. The program stored in the memory 505 may include one or more modules, each of which may include a series of instruction operations on a server. Still further, the central processor 501 may be configured to communicate with the memory 505 and execute a series of instruction operations in the memory 505 on the server 500.
The server 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The central processor 501 may perform the operations performed by the travel direction determining apparatus in the embodiments shown in fig. 3 or fig. 4, which are not described in detail here.
The present embodiment also provides a computer storage medium for storing computer software instructions for the above travel direction determining apparatus, including a program designed to execute the travel direction determining method.
The travel direction determining device may be as described in the foregoing fig. 3 or fig. 4.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (18)

1. A travel direction determining method, characterized by comprising:
acquiring multi-frame video images containing person objects, wherein the multi-frame video images have a sequential acquisition order;
determining image areas comprising a single person object in each frame of the video image respectively by using a deep learning neural network model to obtain a plurality of image areas;
determining a plurality of target image areas containing the same person object from the plurality of image areas using a tracking algorithm based on deep learning;
the determining a plurality of target image areas containing the same person object from the plurality of image areas using a deep learning based tracking algorithm includes:
performing feature comparison on the image areas using a pedestrian re-identification neural network model to obtain a first probability that each image area and other image areas belong to the same person object;
predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object;
setting different weights for the first probability and the second probability;
performing weighted operation on the first probability and the second probability to obtain a weighted result;
comparing the weighted result with a preset threshold value, and if the weighted result is larger than the preset threshold value, determining that two image areas corresponding to the weighted result belong to the same person object;
combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object;
determining a change trend of the target image areas according to the sequential acquisition order of the plurality of target image areas; and
determining the travel direction of the same person object according to the change trend.
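For illustration only (this sketch is not part of the claimed subject matter), the matching step of claim 1 can be made concrete as follows: an appearance probability from pedestrian re-identification and a motion probability from Kalman-filter prediction are fused by weighting and compared against the preset threshold. The weight values, the threshold, and all names below are assumptions, not values disclosed in the patent.

```python
# Illustrative sketch of the weighted matching step of claim 1.
# The weights and threshold are assumed values, not taken from the patent.
W_REID = 0.7      # weight for the re-identification (appearance) probability
W_KALMAN = 0.3    # weight for the Kalman-filter (motion) probability
THRESHOLD = 0.5   # the "preset threshold" of the claim; value assumed

def same_person(p_reid: float, p_kalman: float) -> bool:
    """Fuse the two probabilities and compare against the preset threshold."""
    weighted = W_REID * p_reid + W_KALMAN * p_kalman
    return weighted > THRESHOLD

# Example: appearance similarity 0.8, motion prediction 0.6
# weighted = 0.7 * 0.8 + 0.3 * 0.6 = 0.74 > 0.5 -> same person object
assert same_person(0.8, 0.6)
```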
2. The travel direction determining method according to claim 1, wherein the determining image areas including a single person object in each frame of the video image using the deep learning neural network model, respectively, results in a plurality of image areas, comprises:
respectively determining image areas comprising a single person object in each frame of the video image using a single shot multibox detector, to obtain a plurality of image areas, wherein the single shot multibox detector neural network model is trained on an image data set containing human heads.
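As a concrete illustration of claim 2 (not part of the claims), the sketch below runs an SSD-style detector over one frame. It uses torchvision's COCO-pretrained SSD as a stand-in; the patent instead trains the detector on a head image data set, and the score threshold is an assumed value.

```python
# Sketch: per-frame detection of single-person image areas with an
# SSD-style detector. The COCO-pretrained torchvision model is only a
# stand-in for the head-trained detector described in claim 2.
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

model = ssd300_vgg16(weights=SSD300_VGG16_Weights.DEFAULT).eval()

def detect_image_areas(frame: torch.Tensor, score_thresh: float = 0.5):
    """frame: CHW float tensor in [0, 1]; returns boxes above the threshold."""
    with torch.no_grad():
        output = model([frame])[0]
    keep = output["scores"] > score_thresh
    return output["boxes"][keep]
```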
3. The travel direction determining method according to claim 1, wherein after the determining, using the deep learning neural network model, image areas comprising a single person object in each frame of the video image to obtain a plurality of image areas, and before the determining, using the deep learning-based tracking algorithm, a plurality of target image areas containing the same person object from the plurality of image areas, the method further comprises:
enlarging the plurality of image areas according to a preset proportion, so that the image areas comprise the upper-body image area of the person object.
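The enlargement of claim 3 can be sketched (for illustration only) as a simple box expansion around a detected head so that the area covers the upper body; the expansion factors and frame size below are assumptions, not disclosed values.

```python
# Sketch of the "preset proportion" enlargement of claim 3.
# The 1.5x width and 3.0x height factors are assumed values.
def enlarge_box(x1, y1, x2, y2, w_scale=1.5, h_scale=3.0,
                img_w=1920, img_h=1080):
    cx = (x1 + x2) / 2.0
    new_w = (x2 - x1) * w_scale
    new_h = (y2 - y1) * h_scale
    nx1 = max(0.0, cx - new_w / 2.0)
    nx2 = min(float(img_w), cx + new_w / 2.0)
    ny1 = max(0.0, y1)                     # keep the top edge at the head
    ny2 = min(float(img_h), y1 + new_h)    # extend downward over the torso
    return nx1, ny1, nx2, ny2
```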
4. The travel direction determining method according to claim 1, wherein the deep learning-based tracking algorithm includes a pedestrian re-identification neural network model whose loss function is a boundary loss function.
5. The travel direction determining method according to claim 1, wherein the determining the trend of the change in the target image area according to the sequential acquisition order of the plurality of target image areas includes:
extracting size and position information of the plurality of target image areas; and
calculating the change trend of the target image areas using the size, the position information, and the sequential acquisition order of the plurality of target image areas.
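One plausible reading of claim 5 (offered only as an illustration) is a least-squares fit of the box centre and box size over the acquisition order, so that the slopes give the change trend; the use of box area as the size measure and the per-frame slope interpretation are assumptions.

```python
# Sketch of the change-trend computation of claim 5: slopes of the box
# centre and box area over the acquisition order. Using area as the size
# measure is an assumption for illustration.
import numpy as np

def change_trend(boxes):
    """boxes: list of (x1, y1, x2, y2) tuples in acquisition order."""
    b = np.asarray(boxes, dtype=float)
    cx = (b[:, 0] + b[:, 2]) / 2.0
    cy = (b[:, 1] + b[:, 3]) / 2.0
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    t = np.arange(len(b))
    dx = np.polyfit(t, cx, 1)[0]    # horizontal drift per frame
    dy = np.polyfit(t, cy, 1)[0]    # vertical drift per frame
    ds = np.polyfit(t, area, 1)[0]  # growth in apparent size per frame
    return dx, dy, ds               # e.g. ds > 0 suggests approach
```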
6. The travel direction determining method according to claim 1, further comprising, after determining the travel direction of the same person object:
processing the multi-frame video image using a multi-frame voting method to verify whether the travel direction of the same person object is correct.
7. The travel direction determining method according to claim 1, wherein the method further comprises:
determining a scene included in the multi-frame video image;
obtaining a preset travel direction range corresponding to the scene; and
judging whether the travel direction of the same person object belongs to the preset travel direction range, to obtain a judgment result, wherein the judgment result is used for triggering execution of a preset processing action corresponding to the judgment result.
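One way to realize claim 7 (sketched here under stated assumptions, not as the patent's implementation) is to express the travel direction as a heading angle and test it against a per-scene allowed range, triggering a processing action on violation; the scene name, the angle range, and the alarm action are all assumed.

```python
# Sketch of the scene-dependent direction check of claim 7.
# The scene name, angle range, and alarm action are assumed values.
SCENE_RANGES = {"entrance": (45.0, 135.0)}  # allowed heading, in degrees

def check_direction(scene: str, heading_deg: float) -> bool:
    lo, hi = SCENE_RANGES[scene]
    allowed = lo <= heading_deg <= hi
    if not allowed:
        # the "preset processing action" of the claim; an alarm is assumed
        print(f"heading {heading_deg:.1f} outside {scene} range -> alarm")
    return allowed
```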
8. The travel direction determining method according to claim 1, wherein the multi-frame video image includes a multi-frame video image acquired by an image collector of an access control system.
9. A travel direction determining apparatus, characterized by comprising:
a video image acquisition unit, configured to acquire multi-frame video images containing person objects, wherein the multi-frame video images have a sequential acquisition order;
a portrait region determining unit, configured to determine an image region including a single person object in each frame of the video image, respectively, using a deep learning neural network model, to obtain a plurality of image regions;
a same person determining unit configured to determine a plurality of target image areas containing the same person object from the plurality of image areas using a tracking algorithm based on deep learning;
the same person determining unit is configured to, when determining a plurality of target image areas including the same person object from the plurality of image areas using a tracking algorithm based on deep learning, specifically:
performing feature comparison on the image areas using a pedestrian re-identification neural network model to obtain a first probability that each image area and other image areas belong to the same person object;
predicting the image areas by using a Kalman filtering algorithm to obtain a second probability that each image area and other image areas belong to the same person object;
setting different weights for the first probability and the second probability;
performing weighted operation on the first probability and the second probability to obtain a weighted result;
comparing the weighted result with a preset threshold value, and if the weighted result is larger than the preset threshold value, determining that two image areas corresponding to the weighted result belong to the same person object;
combining the image areas belonging to the same person object to obtain a plurality of target image areas of the same person object;
a change trend determining unit, configured to determine a change trend of the target image areas according to the sequential acquisition order of the plurality of target image areas; and
a travel direction determining unit, configured to determine the travel direction of the same person object according to the change trend.
10. The travel direction determining apparatus according to claim 9, wherein the portrait region determining unit is configured to, when determining image areas comprising a single person object in each frame of the video image using the deep learning neural network model to obtain a plurality of image areas, specifically:
respectively determining image areas comprising a single person object in each frame of the video image using a single shot multibox detector, to obtain a plurality of image areas, wherein the single shot multibox detector neural network model is trained on an image data set containing human heads.
11. The travel direction determining apparatus according to claim 9, further comprising:
an amplifying unit, configured to enlarge the plurality of image areas according to a preset proportion, so that the image areas comprise the upper-body image area of the person object.
12. The travel direction determining apparatus according to claim 9, wherein the deep learning-based tracking algorithm includes a pedestrian re-identification neural network model whose loss function is a boundary loss function.
13. The travel direction determining apparatus according to claim 9, wherein the change trend determining unit is configured to, when determining the change trend of the target image areas according to the sequential acquisition order of the plurality of target image areas, specifically:
extracting size and position information of the plurality of target image areas; and
calculating the change trend of the target image areas using the size, the position information, and the sequential acquisition order of the plurality of target image areas.
14. The travel direction determining apparatus according to claim 9, further comprising:
a verification unit, configured to process the multi-frame video image using a multi-frame voting method to verify whether the travel direction of the same person object is correct.
15. The travel direction determining apparatus according to claim 9, further comprising:
a scene determining unit, configured to determine a scene included in the multi-frame video image;
a direction range unit, configured to obtain a preset travel direction range corresponding to the scene; and
a direction judging unit, configured to judge whether the travel direction of the same person object belongs to the preset travel direction range, to obtain a judgment result, wherein the judgment result is used for triggering execution of a preset processing action corresponding to the judgment result.
16. The travel direction determining apparatus according to claim 9, wherein the multi-frame video image includes a multi-frame video image acquired by an image collector of an access control system.
17. A travel direction determining apparatus, characterized by comprising:
the device comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a short-term memory or a persistent memory;
the central processing unit is configured to communicate with the memory and to execute the instruction operations in the memory to perform the method according to any one of claims 1 to 8.
18. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
CN201911304545.0A 2019-12-17 2019-12-17 Travel direction determining method and related equipment Active CN110991397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911304545.0A CN110991397B (en) 2019-12-17 2019-12-17 Travel direction determining method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911304545.0A CN110991397B (en) 2019-12-17 2019-12-17 Travel direction determining method and related equipment

Publications (2)

Publication Number Publication Date
CN110991397A (en) 2020-04-10
CN110991397B (en) 2023-08-04

Family

ID=70094887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911304545.0A Active CN110991397B (en) 2019-12-17 2019-12-17 Travel direction determining method and related equipment

Country Status (1)

Country Link
CN (1) CN110991397B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111975776A (en) * 2020-08-18 2020-11-24 广州市优普科技有限公司 Robot movement tracking system and method based on deep learning and Kalman filtering
CN113112866B (en) * 2021-04-14 2022-06-03 深圳市旗扬特种装备技术工程有限公司 Intelligent traffic early warning method and intelligent traffic early warning system
CN114359973A (en) * 2022-03-04 2022-04-15 广州市玄武无线科技股份有限公司 Commodity state identification method and equipment based on video and computer readable medium
CN114937231B (en) * 2022-07-21 2022-09-30 成都西物信安智能系统有限公司 Target identification tracking method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008092393A1 (en) * 2007-01-25 2008-08-07 Shanghai Yao Wei Industry Co, Ltd Method of moving target tracking and number accounting
CN106560861A (en) * 2015-09-30 2017-04-12 徐贵力 Intelligent supervision method based on computer vision
CN105825166A (en) * 2015-12-15 2016-08-03 广东亿迅科技有限公司 Human body HOG feature-based pedestrian traffic statistical method and statistical system
CN108496350A (en) * 2017-09-27 2018-09-04 深圳市大疆创新科技有限公司 A kind of focusing process method and apparatus
CN108875542A (en) * 2018-04-04 2018-11-23 北京旷视科技有限公司 A kind of face identification method, device, system and computer storage medium
CN109711267A (en) * 2018-12-03 2019-05-03 浙江大华技术股份有限公司 A kind of pedestrian identifies again, pedestrian movement's orbit generation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Bin, et al. Design of Power Intelligent Safety Supervision System Based on Deep Learning. 2018 IEEE International Conference on Automation, Electronics and Electrical Engineering, 2019, pp. 154-157. *

Also Published As

Publication number Publication date
CN110991397A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110991397B (en) Travel direction determining method and related equipment
CN109146921B (en) Pedestrian target tracking method based on deep learning
CN102542289B (en) Pedestrian volume statistical method based on plurality of Gaussian counting models
CN107633226B (en) Human body motion tracking feature processing method
CN109816689A (en) A kind of motion target tracking method that multilayer convolution feature adaptively merges
CN104615986B (en) The method that pedestrian detection is carried out to the video image of scene changes using multi-detector
CN111027481B (en) Behavior analysis method and device based on human body key point detection
CN106295532B (en) A kind of human motion recognition method in video image
CN109919977A (en) A kind of video motion personage tracking and personal identification method based on temporal characteristics
CN105512618B (en) Video tracing method
CN104992453A (en) Target tracking method under complicated background based on extreme learning machine
Feng et al. Challenges on large scale surveillance video analysis
CN101996401A (en) Target analysis method and device based on intensity image and range image
Avgerinakis et al. Activity detection using sequential statistical boundary detection (ssbd)
CN104966305A (en) Foreground detection method based on motion vector division
CN112116635A (en) Visual tracking method and device based on rapid human body movement
CN111091057A (en) Information processing method and device and computer readable storage medium
CN112164093A (en) Automatic person tracking method based on edge features and related filtering
CN102663777A (en) Target tracking method and system based on multi-view video
CN103996207A (en) Object tracking method
CN114140663A (en) Multi-scale attention and learning network-based pest identification method and system
CN113850221A (en) Attitude tracking method based on key point screening
CN106991684B (en) Foreground extracting method and device
CN112348011B (en) Vehicle damage assessment method and device and storage medium
CN106446837A (en) Hand waving detection method based on motion historical images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant