CN112311734A - Image feature extraction method of multi-channel video, electronic equipment and storage medium


Info

Publication number
CN112311734A
CN112311734A
Authority
CN
China
Prior art keywords
frame
forged
video
target
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910695757.XA
Other languages
Chinese (zh)
Other versions
CN112311734B (en)
Inventor
周伟
陈喆
王鹏
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910695757.XA
Publication of CN112311734A
Application granted
Publication of CN112311734B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image feature extraction method for multi-channel video, an electronic device, and a storage medium, wherein the method is applied to the electronic device and comprises the following steps: for each of the multiple real-time video streams, selecting key video frames for feature extraction from that real-time video stream; for each key video frame, extracting a target area comprising a detection object of a predetermined type from the key video frame, generating a forged frame corresponding to the key video frame by using the target area, and storing the forged frame into a preset data area; when forged frames are stored in the preset data area, performing feature extraction on the forged frames in the preset data area to obtain the image features of the forged frames; and, according to the video stream identifier carried by each forged frame, taking the image features of the forged frame as the image features of the target real-time video stream. Compared with the prior art, the scheme provided by the embodiment of the invention can improve the efficiency of intelligent analysis of multi-channel video.

Description

Image feature extraction method of multi-channel video, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of intelligent analysis of video images, in particular to a multi-channel video image feature extraction method, electronic equipment and a storage medium.
Background
Currently, with the rapid development of artificial intelligence technology, intelligent analysis of videos plays an increasingly important role in people's daily life. The extraction of the image features of the video is the basis of the intelligent analysis of the video and is the part which consumes the longest time and the most resources in the intelligent analysis process of the video.
In the related art, when obtaining image features of a video stream, in order to ensure real-time performance of video intelligent analysis, a key video frame for feature extraction is generally determined first, and then image feature extraction is performed on the key frame, and the image feature extraction process is implemented in a single GPU (Graphics Processing Unit) in an electronic device.
However, in the related art, when there is a need for performing intelligent analysis on multiple paths of videos in parallel, since there is competition for GPU resources among the paths of video streams, it may not be possible to perform image feature extraction on multiple paths of videos at the same time, so that the task of extracting image features of a certain path or several paths of video streams is blocked. Therefore, the whole process of video intelligent analysis is inevitably delayed, and the efficiency of the video intelligent analysis is reduced.
Disclosure of Invention
The embodiment of the invention aims to provide a multi-channel video image feature extraction method, electronic equipment and a storage medium, so as to improve the efficiency of intelligent analysis of multi-channel videos. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides an image feature extraction method for a multi-channel video, which is applied to an electronic device, and the method includes:
selecting a video frame for feature extraction from each path of real-time video stream in the multi-path real-time video streams as a key video frame;
for each key video frame, extracting a target area comprising a detection object of a preset type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; the forged frame corresponding to each key video frame carries a video stream identifier for representing the real-time video stream to which the key video frame belongs;
when the preset data area stores the forged frame, performing feature extraction on the forged frame in the preset data area to obtain the image feature of the forged frame;
according to the video stream identification carried by the forged frame, taking the image characteristic of the forged frame as the image characteristic of the target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
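The target-area extraction and forged-frame generation step above can be sketched as follows. The `ForgedFrame` structure, the (x, y, w, h) region format, and the use of numpy arrays as stand-ins for decoded video frames are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ForgedFrame:
    """A frame built from the target area of a key video frame.

    Carries the identifier of the real-time video stream the key frame
    came from, so extracted features can later be routed back to it.
    """
    stream_id: str
    pixels: np.ndarray


def make_forged_frame(key_frame: np.ndarray, region, stream_id: str) -> ForgedFrame:
    """Crop the target area (x, y, w, h) from a key frame and tag the
    crop with its video-stream identifier."""
    x, y, w, h = region
    target_area = key_frame[y:y + h, x:x + w].copy()
    return ForgedFrame(stream_id=stream_id, pixels=target_area)


# Example: a 720p key frame from a hypothetical stream "cam-03" with a
# detected target region of size 64x64 at (100, 50).
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
forged = make_forged_frame(frame, (100, 50, 64, 64), "cam-03")
```

The forged frame would then be placed in the preset data area for the asynchronous feature-extraction program to consume.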
Optionally, in a specific implementation manner, each forged frame in the predetermined data area is stored in a queue; the step of extracting the features of each forged frame in the predetermined data area includes:
and according to a first-in first-out processing rule in the queue, performing feature extraction on each forged frame in the preset data area.
Optionally, in a specific implementation manner, before the step of storing the forged frame corresponding to the key video frame in the predetermined data area, the method further includes:
judging whether the number of the forged frames in the forged frame processing queue reaches a preset value or not; wherein, the forged frame processing queue is a queue formed by each forged frame in the preset data area;
if not, the step of storing the forged frame corresponding to the key video frame into the preset data area is executed.
Optionally, in a specific implementation, the method is applied to a target image processor GPU in the electronic device, where the electronic device includes a plurality of image processor GPUs, and the target GPU is one of the plurality of GPUs; the method further comprises the following steps:
if so, sending the forged frame corresponding to the key video frame to an auxiliary GPU, enabling the auxiliary GPU to receive the forged frame corresponding to the key video frame, performing feature extraction on the forged frame corresponding to the key video frame to obtain the image feature of the forged frame corresponding to the key video frame, and feeding back the image feature of the forged frame corresponding to the key video frame to the target GPU;
wherein the auxiliary GPU is one of the GPUs different from the target GPU.
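The queue-capacity check and auxiliary-GPU offload described in this implementation might look as follows. The capacity of 8, the string payloads, and the `send_to_auxiliary_gpu` stub are illustrative assumptions; a real system would dispatch the frame to another GPU and receive features back.

```python
import queue

# Forged-frame processing queue held in the preset data area.
# PRESET_VALUE is an illustrative capacity, not taken from the patent.
PRESET_VALUE = 8
forged_frame_queue: "queue.Queue" = queue.Queue(maxsize=PRESET_VALUE)

offloaded = []  # records frames handed to the auxiliary GPU (stub)


def send_to_auxiliary_gpu(forged_frame) -> None:
    """Hypothetical stand-in for dispatching a forged frame to an
    auxiliary GPU, which would extract its features and feed them back
    to the target GPU."""
    offloaded.append(forged_frame)


def store_forged_frame(forged_frame) -> None:
    """Store the frame in the preset data area unless the processing
    queue already holds the preset number of frames; otherwise offload
    the frame to the auxiliary GPU."""
    if forged_frame_queue.qsize() >= PRESET_VALUE:
        send_to_auxiliary_gpu(forged_frame)
    else:
        forged_frame_queue.put(forged_frame)


for i in range(10):  # 10 frames arrive; 8 fit locally, 2 are offloaded
    store_forged_frame(f"frame-{i}")
```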
Optionally, in a specific implementation manner, the manner of selecting a video frame for feature extraction from each real-time video stream includes:
performing target detection on each video frame in the real-time video stream to obtain a detection result; when the obtained detection result represents that the target object exists in the video frame, judging whether the target object also exists in a plurality of consecutive detected video frames before the video frame; if so, scoring the video frame based on the acquisition information of the target object in the video frame to obtain a score value of the video frame; wherein the target object is: the detection object of the predetermined type; the acquisition information includes: the relative position relationship between the target object and the image acquisition device when the video frame was captured;
and determining the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists.
Optionally, in a specific implementation manner, the step of determining, based on the score values of the video frames in which the target object exists, a video frame used for feature extraction in the real-time video stream includes:
judging whether the score value of each video frame with the target object is larger than a preset threshold value or not, if so, determining the video frame as a video frame for feature extraction in the real-time video stream;
alternatively,
obtaining the score value of each video frame which is continuous in the real-time video stream and has the target object; and determining the video frame corresponding to the obtained highest scoring value as the video frame for feature extraction in the real-time video stream.
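The two selection strategies above can be sketched as follows, assuming each candidate frame is a (frame_id, score) pair; the frame identifiers and score values here are made up for illustration.

```python
def select_by_threshold(scored_frames, threshold):
    """Strategy 1: keep every frame whose score value exceeds a preset
    threshold. `scored_frames` is a list of (frame_id, score) pairs."""
    return [fid for fid, score in scored_frames if score > threshold]


def select_best_of_run(scored_frames):
    """Strategy 2: from a run of consecutive frames in which the target
    object appears, keep only the frame with the highest score value."""
    return max(scored_frames, key=lambda item: item[1])[0]


# A run of three consecutive frames containing the target object,
# with illustrative score values.
run = [("f10", 0.42), ("f11", 0.87), ("f12", 0.63)]
```

Strategy 1 may yield several key frames per run, while strategy 2 yields exactly one; which fits better depends on how much downstream feature-extraction load is acceptable.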
In a second aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes at least one image processor GPU and a memory, and the at least one GPU includes a target GPU;
the memory is used for storing a computer program;
the target GPU is used for executing the program stored on the memory, and the program enables the target GPU to execute the following operations:
selecting a video frame for feature extraction from each path of real-time video stream in the multi-path real-time video streams as a key video frame;
for each key video frame, extracting a target area comprising a detection object of a preset type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; the forged frame corresponding to each key video frame carries a video stream identifier for representing the real-time video stream to which the key video frame belongs;
when the preset data area stores the forged frame, performing feature extraction on the forged frame in the preset data area to obtain the image feature of the forged frame;
according to the video stream identification carried by the forged frame, taking the image characteristic of the forged frame as the image characteristic of the target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
Optionally, in a specific implementation manner, before storing the forged frame corresponding to the key video frame in the predetermined data area, the target GPU is further configured to:
judging whether the number of the forged frames in the forged frame processing queue reaches a preset value or not; wherein, the forged frame processing queue is a queue formed by each forged frame in the preset data area;
if not, the step of storing the forged frame corresponding to the key video frame into the preset data area is executed.
Optionally, in a specific implementation manner, the at least one GPU further includes an auxiliary GPU;
the target GPU is also used for sending the forged frame corresponding to the key video frame to the auxiliary GPU when the number of forged frames in the forged frame processing queue reaches the preset value, and for receiving the image features of the forged frame fed back by the auxiliary GPU;
and the auxiliary GPU is used for receiving the forged frames corresponding to the key video frames, extracting the characteristics of the forged frames corresponding to the key video frames to obtain the image characteristics of the forged frames corresponding to the key video frames, and feeding the image characteristics of the forged frames corresponding to the key video frames back to the target GPU.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the image feature extraction method for multi-channel video provided in the first aspect are implemented.
As can be seen from the above, with the adoption of the scheme provided by the embodiment of the present invention, when there is a need for performing intelligent analysis on multiple paths of videos in parallel, a key video frame in each path of real-time video stream may be determined for each path of real-time video stream, and then, based on a target region extracted from each key video frame, a forged frame corresponding to the key video frame is generated and stored in a predetermined data region. Thus, when the forged frame exists in the preset data area, the electronic equipment can extract the characteristics of the forged frame in the preset data area to obtain the image characteristics of the forged frame. And then, according to the video stream identification carried by the forged frame, taking the image feature of the forged frame as the image feature of the real-time video stream to which the key video frame corresponding to the forged frame belongs. Thus, after the image characteristics of all the generated forged frames are obtained, the image characteristics of each real-time video stream can be obtained.
Obviously, in the solution provided in the embodiment of the present invention, for each video frame in each real-time video stream, after determining that the video frame is a key video frame, the feature extraction is not directly performed on the video frame in sequence; instead, a forged frame of the video frame is generated, and the feature extraction is then performed on the forged frame through a processing program asynchronous to the program that determines the key video frames. That is, two programs run on the target GPU: program 1 determines each key video frame in each real-time video stream and generates the forged frame of the key video frame, while program 2 performs feature extraction on each generated forged frame; the two programs run asynchronously and do not affect each other.
In this way, for multiple paths of real-time video streams needing image feature extraction, image detection, tracking, scoring and feature extraction are not performed for each path of real-time video stream independently, but forged frames for feature extraction corresponding to each path of real-time video stream are determined uniformly, and further feature extraction is performed on the generated forged frames uniformly. Therefore, competition of each path of video stream for GPU resources can be avoided, the intelligent analysis efficiency of multiple paths of videos is improved, and the utilization rate of the GPU can be improved to a certain extent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image feature extraction method for a multi-channel video according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a connection relationship between a state manager and each GPU according to an embodiment of the present invention;
fig. 3 is a schematic diagram of processing logic of a method for extracting image features of multiple paths of videos according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of another image feature extraction method for a multi-channel video according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of another image feature extraction method for a multi-channel video according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating processing logic of another method for extracting image features of multiple videos according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the related art, when there is a need for performing intelligent analysis on multiple channels of videos in parallel, because there is competition for GPU resources among the channels of video streams, it may not be possible to perform image feature extraction on multiple channels of videos at the same time, thereby causing a blocking of an image feature extraction task of a certain channel or several channels of video streams. Therefore, the whole process of video intelligent analysis is inevitably delayed, and the efficiency of the video intelligent analysis is reduced.
In order to solve the above technical problem, an embodiment of the present invention provides an image feature extraction method for a multi-channel video.
First, a method for extracting image features of a multi-channel video according to an embodiment of the present invention is described below.
Fig. 1 is a schematic flowchart of an image feature extraction method for a multi-channel video according to an embodiment of the present invention. The method is applied to an electronic device, which may be any electronic device that needs to perform image feature extraction on multiple paths of real-time video streams and that includes at least one GPU (Graphics Processing Unit), such as a notebook computer, a desktop computer, or a tablet computer. The embodiment of the present invention does not limit the specific type of device; hereinafter, it is simply referred to as the electronic device.
Furthermore, the electronic device may be an independent electronic device that does not form a device cluster with other electronic devices, and acquires the multiple paths of real-time video streams through communication with other electronic devices such as a camera, and further extracts image features of the multiple paths of real-time video streams by applying the scheme provided by the embodiment of the present invention. For example, the electronic device may be a surveillance video management server in communication with a plurality of surveillance cameras.
The electronic device may also be any electronic device in a cluster formed by a plurality of electronic devices, which needs to extract image features of multiple paths of real-time video streams, and the electronic device may acquire the multiple paths of real-time video streams from other electronic devices in the cluster and/or other electronic devices outside the cluster. For example, the electronic device may be an electronic device for image processing in a distributed cluster.
In addition, the GPU is common hardware for deep learning training. GPU resources are counted in units of the resources that each GPU can provide; the resources provided by one GPU may be recorded as 1 card, and at least one GPU may be installed in each electronic device. Based on this, in the solution provided in the embodiment of the present invention, the electronic device may include a state manager and at least one GPU.
The state manager is used for managing the state of each GPU installed in the electronic equipment and scheduling tasks of each GPU. Namely, the state manager can receive and record the self state information reported by each GPU, and distribute each task to be processed for each GPU according to the state information of each GPU. Accordingly, the status information may include: and various information related to the current running state of the GPU, such as GPU residual resources, resources which are used by the GPU, processing processes of tasks which are processed by the GPU and the like.
For example, as shown in fig. 2, each GPU may report its own state information to the state manager through an RPC (Remote Procedure Call) heartbeat.
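A minimal sketch of the heartbeat-based state reporting described above; the payload fields (free memory, running tasks) and the direct method call standing in for the RPC transport are illustrative assumptions, not details taken from the patent.

```python
import time


class StateManager:
    """Receives heartbeat reports and records the latest state per GPU,
    so that tasks can later be allocated according to GPU state."""

    def __init__(self):
        self.gpu_states = {}

    def report(self, gpu_id, state):
        # In the patent the report travels over RPC; a direct method
        # call stands in for the remote procedure here.
        self.gpu_states[gpu_id] = state


def heartbeat(manager, gpu_id, free_memory_mb, running_tasks):
    """One heartbeat tick: a GPU reports its current running state.
    The payload fields are illustrative assumptions."""
    manager.report(gpu_id, {
        "free_memory_mb": free_memory_mb,
        "running_tasks": running_tasks,
        "timestamp": time.time(),
    })


mgr = StateManager()
heartbeat(mgr, "gpu-0", 2048, ["feature-extraction"])
heartbeat(mgr, "gpu-1", 8192, [])
```

With such records, the state manager can pick the least-loaded GPU (here "gpu-1") as the target GPU for a new extraction task.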
Accordingly, in the embodiment of the present invention, the state manager of the electronic device may allocate the image feature extraction method for multiple videos provided in the embodiment of the present invention to a GPU installed in the electronic device for operation, and the GPU allocated to operate the image feature extraction method is the target GPU. Therefore, in other words, the method for extracting image features of multiple paths of videos provided by the embodiment of the present invention is specifically applied to the target GPU of the electronic device.
As shown in fig. 1, the method for extracting image features of a multi-channel video according to an embodiment of the present invention may include the following steps:
s101: selecting a video frame for feature extraction from each path of real-time video stream in the multi-path real-time video streams as a key video frame;
s102: for each key video frame, extracting a target area comprising a detection object of a preset type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area;
the fake frame corresponding to each key video frame carries a video stream identifier for representing the real-time video stream to which the key video frame belongs;
s103: when the forged frames are stored in the preset data area, extracting the characteristics of the forged frames in the preset data area to obtain the image characteristics of the forged frames;
s104: according to the video stream identification carried by the forged frame, taking the image characteristic of the forged frame as the image characteristic of the target real-time video stream;
wherein, the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
As can be seen from the above, in the solution provided in the embodiment of the present invention, for each video frame in each real-time video stream, after determining that the video frame is a key video frame, the feature extraction is not directly performed on the video frame in sequence, but a forged frame of the video frame is generated, and then, the feature extraction is performed on the forged frame through a processing program asynchronous to a program for determining the key video frame. In the solution provided in the embodiment of the present invention, two programs exist in the target GPU, where the target GPU running program 1 determines each key video frame in each real-time video stream and generates a forged frame of the key video frame, and the target GPU running program 2 performs feature extraction on each generated forged frame, and the two programs are performed asynchronously and do not affect each other.
In this way, for multiple paths of real-time video streams needing image feature extraction, image detection, tracking, scoring and feature extraction are not performed for each path of real-time video stream independently, but forged frames for feature extraction corresponding to each path of real-time video stream are determined uniformly, and further feature extraction is performed on the generated forged frames uniformly. Therefore, competition of each path of video stream for GPU resources can be avoided, the intelligent analysis efficiency of multiple paths of videos is improved, and the utilization rate of the GPU can be improved to a certain extent.
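The asynchronous two-program design described above can be sketched as a producer/consumer pair sharing a FIFO queue. The thread-based layout, the stream identifiers, and the string payloads are illustrative assumptions; an actual implementation might use separate GPU contexts or streams rather than CPU threads.

```python
import queue
import threading

frame_queue = queue.Queue()   # the preset data area, processed FIFO
features = {}                 # stream_id -> extracted image features
SENTINEL = None


def producer():
    """Program 1: determine key video frames in each real-time stream
    and enqueue the forged frames generated from them. The
    (stream_id, payload) tuples here are illustrative."""
    for stream_id, payload in [("s1", "fkA"), ("s2", "fkB"), ("s1", "fkC")]:
        frame_queue.put((stream_id, payload))
    frame_queue.put(SENTINEL)


def consumer():
    """Program 2: extract features from forged frames first-in
    first-out; runs asynchronously from the producer."""
    while True:
        item = frame_queue.get()
        if item is SENTINEL:
            break
        stream_id, payload = item
        # A real implementation would run a feature-extraction network
        # here; a string tag stands in for the extracted feature.
        features.setdefault(stream_id, []).append("feat:" + payload)


t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the stream identifier travels with every forged frame, the consumer can route each extracted feature back to the real-time video stream it belongs to without the streams competing for extraction resources.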
In the step S101, the target GPU may obtain multiple paths of real-time video streams in multiple ways, which is not limited in the embodiment of the present invention.
For example, the electronic device may be directly in communication connection with the plurality of video capture devices, so that when each video capture device captures a real-time video stream, the captured real-time video stream may be directly sent to the electronic device, and then, the state manager of the electronic device may send the received multiple paths of real-time video streams to the target GPU.
For another example, the electronic device may be directly in communication connection with a plurality of video capture devices, and further, a Central Processing Unit (CPU) of the electronic device executes a stream fetching operation, obtains a real-time video stream captured by the video capture device from each video capture device, and sends the obtained multiple video streams to the target GPU.
In this way, after the multiple real-time video streams are obtained, the target GPU may execute the step S101, and select, for each of the multiple real-time video streams, a video frame for feature extraction from the multiple real-time video streams as a key video frame.
It is understood that, in a video stream, not every video frame includes an object for feature extraction, and in some video frames, even if such an object is included, the object may not be suitable for feature extraction due to the shooting angle.
For example, when a video stream to be subjected to feature extraction is a road surveillance video and feature extraction is performed on a face image in the road surveillance video, a situation that a face region does not exist in a video frame, a situation that the face region exists in the video frame but is too small to have a feature extraction condition, or a situation that the face region exists in the video frame but is not complete enough to have a feature extraction condition due to an angular relationship between a face and an image acquisition device may occur.
Therefore, when image feature extraction is performed on a video, the target GPU needs to select a video frame for feature extraction from each video stream, and use the selected video frame as a key video frame.
The target GPU may select a video frame for feature extraction from each path of video stream in various ways, which is not limited in this embodiment of the present invention.
Optionally, in a specific implementation manner, the manner in which the target GPU selects a video frame for feature extraction from each real-time video stream may include the following steps:
step 1: performing target detection on each video frame in the real-time video stream to obtain a detection result; when the obtained detection result represents that the target object exists in the video frame, judging whether the target object exists in a plurality of continuous detected video frames before the video frame; if so, scoring the video frame based on the acquisition information of the target object in the video frame to obtain a scoring value of the video frame;
wherein the target object is: the detection object of the predetermined type; the acquisition information includes: the relative position relationship between the target object and the image acquisition device when the video frame was captured;
step 2: determining the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists.
In this specific implementation, the target GPU first determines the detection object of the predetermined type corresponding to the real-time video stream, that is, the type of object on which the target GPU needs to perform image feature extraction and which is included in each key video frame of the video stream. The detection object of the predetermined type corresponding to each real-time video stream may be set according to the requirements of the practical application, and the embodiment of the present invention is not particularly limited in this respect.
For example, when facial image features are to be extracted from the real-time video stream, the detection object of the predetermined type is the facial image; when license plate image features are to be extracted from the video stream, the detection object of the predetermined type is the license plate image.
The target GPU may determine the detection object of the predetermined type corresponding to the real-time video stream in multiple ways, which is not limited in this embodiment of the present invention.
For example, the target GPU may receive, as user input, the detection object of the predetermined type corresponding to each real-time video stream. For another example, the target GPU may pre-store a correspondence between predetermined types of detection objects and video stream sources, so that the target GPU may determine, according to the source of each obtained video stream, the detection object of the predetermined type corresponding to that stream. Illustratively, the detection object of the predetermined type corresponding to a video stream from road traffic monitoring is the license plate image, and the detection object of the predetermined type corresponding to a video stream from a residential community entrance/exit is the face image.
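The pre-stored correspondence between video stream sources and detection-object types might be sketched as a simple lookup table; the source names, object labels, and the face-detection fallback for unknown sources are illustrative assumptions.

```python
# Hypothetical correspondence between video-stream sources and the
# predetermined type of detection object; entries are illustrative.
DETECTION_OBJECT_BY_SOURCE = {
    "road-traffic-monitoring": "license_plate",
    "community-entrance-exit": "face",
}


def detection_object_for(stream_source: str) -> str:
    """Look up the predetermined detection-object type for a stream by
    its source, falling back to face detection when the source is not
    in the table (an illustrative default, not from the patent)."""
    return DETECTION_OBJECT_BY_SOURCE.get(stream_source, "face")
```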
In this way, after the detection object of the predetermined type corresponding to the real-time video stream is determined, for each video frame in the real-time video stream, the target GPU may perform target detection on the video frame to detect whether a target object exists in the video frame, and obviously, the target object is the detection object of the predetermined type corresponding to the real-time video stream. Further, the target GPU may obtain the detection result.
The target GPU can detect the video frame in various modes to obtain a detection result. The embodiment of the present invention is not particularly limited.
As an example, the target GPU may perform target detection on the video frame by using various target detection algorithms, and obtain a detection result.
For example, the target detection algorithm may be a machine learning method based on HOG (Histogram of Oriented Gradient) features and an SVM (Support Vector Machine) classifier. Of course, other target detection algorithms may also be adopted to perform target detection on the video frame and obtain a detection result, and the embodiment of the present invention is not specifically limited in this regard.
As an example, the target GPU may input the video frame into a preset target detection model and obtain the detection result output by the target detection model.
For example, the preset target detection model may be: deep learning models based on CNN (Convolutional Neural Network). Of course, other preset target detection models may also be adopted to obtain the detection result, and thus, the embodiment of the present invention is not particularly limited.
The target detection model is obtained by training a preset initial model based on a plurality of sample images and a label for each sample image, where each label marks the image area in which the target object is located in that sample image. The plurality of sample images and their labels are input into the preset initial model for training until a convergence condition is met, yielding the trained target detection model. Obviously, when the detection result of the target detection model indicates that the target object exists in the video frame, the detection result may also include the image area where the target object is located in the video frame.
The target detection model may be obtained through local training of the electronic device and input to the target GPU for use, or may be obtained by the electronic device from another electronic device in communication connection and input to the target GPU for use.
Further, when the detection result indicates that the target object exists in the video frame, the target object for feature extraction has been captured in that frame. The target GPU may then determine whether the same target object also exists in the consecutive detected video frames preceding this frame; that is, the target GPU tracks the target object across consecutive video frames, associating its detections over time.
In this way, when it is determined that the target object exists in the consecutive detected video frames before the video frame, the target GPU may determine that the target object appears in the video frame and the consecutive detected video frames before the video frame, and therefore, image feature extraction may need to be performed on the target object. Based on this, the target GPU may score the video frame based on the acquisition information of the target object in the video frame to obtain a score value of the video frame.
Here, the acquisition information may include the relative positional relationship between the target object and the image acquisition device at the moment the frame was captured; in the image, this relationship manifests as the angle and position of the target object within the video frame. For example, when the target object is a face image, the acquisition information may reflect whether the face is frontal and whether the face lies completely within the frame. Of course, the acquisition information may also include other information, which the embodiment of the present invention does not specifically limit.
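The scoring based on acquisition information could, for instance, be sketched as follows; the particular weighting of angle and position is an illustrative assumption, since the embodiment does not prescribe a concrete scoring formula:

```python
def score_frame(frontal_angle_deg, fully_inside):
    """Toy score for a video frame from the target object's acquisition info.

    frontal_angle_deg: deviation of the object (e.g. a face) from a head-on
    pose; smaller is better. fully_inside: whether the object lies completely
    within the frame. Both the 90-degree normalization and the 0.5 penalty
    are assumptions made for illustration only.
    """
    angle_score = max(0.0, 1.0 - abs(frontal_angle_deg) / 90.0)
    position_score = 1.0 if fully_inside else 0.5
    return angle_score * position_score
```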
In this way, when the same target object exists in a plurality of consecutive video frames of the video stream, the score values of those video frames can be obtained. Based on these, the target GPU may determine the video frames for feature extraction in the real-time video stream from the score values of the video frames in which the target object exists. Obviously, the key video frame so determined is specific to that target object.
The target GPU may execute the step 2 in multiple ways, and determine the video frames for feature extraction in the real-time video stream based on the score values of the video frames with the target object, which is not specifically limited in the embodiment of the present invention.
Preferably, in an embodiment, the manner in which the target GPU performs step 2 may include the following steps:
judging whether the score value of each video frame with the target object is greater than a preset threshold value or not, if so, determining the video frame as a video frame for feature extraction in the real-time video stream;
in this embodiment, after the step 1 is executed to obtain the score value of the video frame for each video frame in the real-time video stream, the target GPU may directly determine whether the score value of the video frame is greater than a preset threshold. And when the judgment result is yes, the target GPU can determine the video frame as the video frame for feature extraction in the real-time video stream.
The preset threshold value may be defined according to requirements in practical application, and therefore, the specific value of the preset threshold value is not defined in the embodiment of the present invention.
Optionally, after the target GPU determines the video frame as a video frame for feature extraction in the real-time video stream, when the target object is still detected in consecutive video frames after the video frame, the target GPU does not determine the video frame for feature extraction in the consecutive video frames after the video frame.
That is, for a plurality of consecutive video frames in which the same target object exists, the target GPU determines, as a video frame for feature extraction, a video frame in the plurality of consecutive video frames in which a score value appearing for the first time is greater than a preset threshold, without considering whether other video frames in the plurality of consecutive video frames are key video frames.
Based on this, after the target GPU determines the video frame as the video frame for feature extraction in the real-time video stream, if no target object other than this target object is detected in the consecutive video frames following the frame, the target GPU may skip those consecutive frames directly, without scoring them.
Optionally, for a plurality of consecutive video frames with the same target object, the target GPU may also determine, as the video frame for feature extraction, all video frames of which score values are greater than the preset threshold value among the plurality of consecutive video frames.
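The threshold-based implementation above — taking, for a run of consecutive frames containing the same target object, the first frame whose score exceeds the preset threshold and skipping the rest of the run — can be sketched as follows (the data layout is an illustrative assumption):

```python
def first_key_frame(scored_frames, threshold):
    """scored_frames: score values, in order, of consecutive video frames
    that all contain the same target object. Return the index of the first
    frame whose score exceeds the preset threshold, or None if no frame
    of the run qualifies; later frames of the run are not considered.
    """
    for index, score in enumerate(scored_frames):
        if score > threshold:
            return index
    return None
```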
Preferably, in another embodiment, the manner in which the target GPU performs step 2 may include the following steps:
obtaining the score value of each video frame which is continuous in the real-time video stream and has a target object; and determining the video frame corresponding to the obtained highest scoring value as the video frame for feature extraction in the real-time video stream.
In this embodiment, for a plurality of consecutive video frames containing the same target object, the target GPU may obtain the score value of each of those frames and then determine the frame corresponding to the highest obtained score value as the video frame for feature extraction in that real-time video stream.
Specifically, after step 1 is executed to obtain the score value of a video frame, the target GPU may store the video frame together with its score value in a cache, and continue to do so until the score value of the last frame of the consecutive video frames containing the target object has been obtained. At this point, the cache holds each of the consecutive video frames containing the target object along with its score value, so the target GPU may determine the frame with the highest score value among them as the video frame for feature extraction in the real-time video stream.
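The highest-score selection over the buffered run of frames might be sketched as follows; the (frame_id, score) pair layout is an illustrative assumption standing in for the cached frames and their score values:

```python
def best_key_frame(track_frames):
    """track_frames: (frame_id, score) pairs for the consecutive frames in
    which one target object appears, as buffered in the cache. The pair
    with the highest score identifies the key video frame for that object.
    """
    return max(track_frames, key=lambda pair: pair[1])[0]
```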
In step 1, when the obtained detection result indicates that the target object exists in the video frame, determining whether the target object exists in a plurality of consecutive detected video frames before the video frame may be understood as tracking the target object in a plurality of frame images of the real-time video stream.
It should be noted that, in the specific implementation manner of the above step 1 and step 2, the step of tracking the target object is executed by using the target GPU.
Alternatively, in other specific implementations, the step of tracking the target object may be performed by using a CPU of the electronic device.
After the step S101 is completed, and the key video frames are acquired from each real-time video stream, the target GPU may continue to perform the step S102, extract a target area including a detection object of a predetermined type from the key video frames for each key video frame, generate a fake frame corresponding to the key video frame by using the extracted target area, and store the fake frame corresponding to the key video frame in a predetermined data area.
Specifically, each key video frame is determined based on a target object existing in the video frame, the target object is an object used by the target GPU when extracting image features, and the target object is a detection object of a predetermined type corresponding to the real-time video stream to which the key video frame belongs, so that for each key video frame, the target GPU may extract a target area including the detection object of the predetermined type from the key video frame, and generate a fake frame corresponding to the key video frame by using the extracted target area.
After the forged frame corresponding to the key video frame is generated, the target GPU may then generate a video stream identifier of the forged frame, where the video stream identifier may be used to represent the real-time video stream to which the key video frame belongs.
Optionally, the target GPU may generate the video stream identifier of the forged frame of the video frame by using the frame number of the key video frame in the belonging real-time video stream and the identifier of the belonging real-time video stream.
Of course, the target GPU may also set a video stream identifier for the forged frame in other manners, as long as it is ensured that the video identifier can represent the real-time video stream to which the key video frame belongs.
Thus, the target GPU may store the generated fake frame carrying the video stream identifier in the predetermined data area.
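One possible shape for the video stream identifier built from the identifier of the stream and the key frame's frame number is sketched below; the separator and encoding are assumptions, since the embodiment leaves the concrete format open:

```python
def make_stream_identifier(stream_id, frame_number):
    """Build the identifier carried by a forged frame from the identifier
    of the real-time video stream and the key video frame's frame number."""
    return f"{stream_id}:{frame_number}"

def parse_stream_identifier(identifier):
    """Recover the originating stream and frame number from the identifier
    carried by a forged frame."""
    stream_id, frame_number = identifier.rsplit(":", 1)
    return stream_id, int(frame_number)
```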
Optionally, the predetermined data area may be designed as a two-level cache, where the first-level cache is used to store each forged frame together with the image features of the forged frame extracted in step S103, and the second-level cache is used to store the target region extracted from each key frame.
It should be emphasized that, in the method for extracting image features of multiple paths of videos provided by the above embodiment of the present invention, when the target GPU executes the above steps S101 to S102, the target GPU simultaneously acquires multiple paths of real-time video streams, synchronously selects key video frames from the multiple paths of real-time video streams, and generates a forged frame corresponding to the selected key video frames.
That is, the target GPU obtains the video frames of each real-time video stream frame by frame from the stream's transmission channel. When the target GPU selects a key video frame from one real-time video stream, it generates the corresponding forged frame, stores that forged frame in the predetermined data area, and then continues obtaining the subsequent video frames of that stream. Meanwhile, the target GPU may select a key video frame from another real-time video stream, generate its corresponding forged frame, and likewise store it in the predetermined data area.
Based on this, the key video frames corresponding to the forged frames stored in the predetermined data area may come from different real-time video streams. Moreover, the order in which forged frames are stored in the predetermined data area is not fixed: it depends on the order in which the target GPU selects key video frames and generates forged frames, not on the numbering of the real-time video streams to which the corresponding key video frames belong.
Furthermore, when the predetermined data area stores the fake frame, the target GPU may perform feature extraction on the fake frame in the predetermined data area by using a preset image feature extraction program asynchronous with the execution of steps S101 and S102, so as to obtain the image feature of the fake frame.
The target GPU may perform feature extraction on the forged frames in the predetermined data area one at a time, extracting the image features of a single forged frame per pass; alternatively, it may perform feature extraction on a batch of forged frames in the predetermined data area, obtaining the image features of several forged frames per pass.
That is to say, in the method for extracting image features of multiple videos according to the embodiment of the present invention, the process in which the target GPU performs steps S101 and S102 and the process in which it performs step S103 are asynchronous. The target GPU runs two programs that together implement the method: program 1 performs steps S101 and S102, program 2 performs step S103, and the trigger condition for the target GPU to run program 2 is that a forged frame has been stored in the predetermined data area.
It is apparent that, in the image feature extraction method for multiple videos provided in the embodiment of the present invention, when the method starts running, the target GPU runs program 1 to perform steps S101 and S102; once a forged frame is stored in the predetermined data area, the target GPU runs program 2 to perform step S103. Moreover, program 1 and program 2 are two completely asynchronous programs: their running and stopping neither conflict with nor affect each other. That is, the target GPU may run program 1 or program 2 alone, or run both asynchronously at the same time.
In this way, for multiple paths of real-time video streams needing image feature extraction, the forged frames for feature extraction corresponding to each path of real-time video stream are determined uniformly, and further, feature extraction is performed on the generated forged frames uniformly. Therefore, competition of each path of video stream for GPU resources can be avoided, the intelligent analysis efficiency of multiple paths of videos is improved, and the utilization rate of the GPU can be improved to a certain extent.
The target GPU can perform feature extraction on each forged frame in multiple modes to obtain the image features of each forged frame. The embodiment of the present invention is not particularly limited.
For example, the target GPU may perform face attribute feature extraction or body attribute feature extraction on each forged frame by using a deep learning method; for instance, it may use ResNet (Residual Network) for either task. The extracted face attribute features may include whether glasses are worn, whether a mask is worn, the person's age, and the like. The extracted body attribute features may include coat color, trouser or skirt color, and the like.
Of course, for each forged frame, the target GPU may also perform feature extraction on a vehicle image present in the forged frame; for example, the extracted vehicle features may include whether the driver wears a seat belt, whether the driver is using a mobile phone, the vehicle brand, and the like.
In addition, the forged frames in the predetermined data area may be stored in multiple ways, and the target GPU may perform feature extraction on them in the processing order prescribed by the corresponding processing rule; the embodiment of the present invention does not specifically limit this.
Optionally, in a specific implementation manner, each forged frame in the predetermined data area may be stored in a queue;
in this specific implementation manner, in step S103, the step of extracting the features of each forged frame in the predetermined data area by the target GPU may include the following steps:
and according to the first-in first-out processing rule in the queue, extracting the characteristics of each forged frame in the preset data area.
Here, first-in first-out means that forged frames are taken for feature extraction in the same order in which they were stored in the predetermined data area, i.e., from the head of the queue formed by the forged frames in the predetermined data area, so that the image features of the forged frames are obtained in storage order.
As shown in fig. 3, it is a processing logic of the image feature extraction method for multi-channel video according to this embodiment.
In fig. 3, the key frame is a selected key video frame; the target in a key frame is a target area in that key video frame; the target in a forged frame is the area of the generated forged frame corresponding to the extracted target area; the forged frame processing queue is the queue formed by all forged frames in the predetermined data area; and the target processing queue is the set of forged frames that the target GPU takes from the predetermined data area according to the first-in first-out processing rule of the queue.
Furthermore, the target GPU may perform feature extraction on the multiple forged frames in the target processing queue in batch, that is, simultaneously extract the image feature of each forged frame in the multiple forged frames in the target processing queue.
Of course, in this specific implementation, the target GPU may also perform feature extraction on each forged frame in the predetermined data area in sequence according to the processing rule of first-in first-out in the queue. That is, the target GPU may perform feature extraction on only one forged frame at a time according to the processing rule of first-in first-out in the queue, to obtain the image features of the forged frame. This is also reasonable.
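The first-in first-out processing of the forged frame queue, covering both the batch and the single-frame variants described above, might be sketched as follows (the extractor callable stands in for the actual feature extraction; a batch size of 1 reproduces the single-frame variant):

```python
from collections import deque

def process_forged_frames(queue, batch_size, extract_features):
    """Drain forged frames from the head of the queue (first-in first-out)
    in batches of up to batch_size, applying the feature extractor to each
    frame; returns the image features in storage order."""
    features = []
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        features.extend(extract_features(frame) for frame in batch)
    return features
```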
Furthermore, after obtaining the image features of the forged frames, the target GPU may continue to execute step S104, and use the image features of the forged frames as the image features of the target real-time video stream according to the video stream identifiers carried by the forged frames.
Because each forged frame carries a video stream identifier, the obtained image features of each forged frame also correspond to that identifier. Since the video stream identifier represents the real-time video stream to which the key video frame corresponding to the forged frame belongs, once the image features of a forged frame are obtained, the real-time video stream of its key video frame can be determined from the identifier the forged frame carries; that is, the target real-time video stream corresponding to the forged frame is determined, and the image features of the forged frame can be taken as image features of that target real-time video stream.
Thus, after all the generated forged frames are determined as the image characteristics of the corresponding target real-time video stream, the image characteristics of each real-time video stream in the obtained multiple paths of real-time video streams can be obtained.
The target GPU may, immediately after obtaining the image features of a forged frame, determine the target real-time video stream corresponding to that forged frame and take those image features as image features of the determined target real-time video stream; alternatively, it may first obtain the image features of all generated forged frames and then, for each forged frame, determine the corresponding target real-time video stream from the video stream identifier it carries and assign the image features accordingly. Either approach is reasonable.
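Assigning the obtained image features back to their target real-time video streams via the carried identifiers can be sketched as:

```python
def assign_features_to_streams(forged_frame_features):
    """forged_frame_features: iterable of (stream_identifier, feature)
    pairs produced from forged frames. Group the features under the
    real-time video stream that each carried identifier designates."""
    per_stream = {}
    for stream_id, feature in forged_frame_features:
        per_stream.setdefault(stream_id, []).append(feature)
    return per_stream
```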
It can be understood that, since the resources of the target GPU are limited, when the forged frames in the predetermined data area are stored in the form of a queue, the number of forged frames that the forged frame processing queue can contain is also limited.
Based on this, optionally, in a specific implementation manner, as shown in fig. 4, when the respective forged frames in the predetermined data area are stored in the form of a queue, the image feature extraction method may include the following steps:
s401: selecting a video frame for feature extraction from each real-time video stream in the multi-channel real-time video stream as a key video frame;
s402: extracting a target region including a predetermined type of detection object from each key video frame, generating a forged frame corresponding to the key video frame using the extracted target area;
s403: judging whether the number of the forged frames in the forged frame processing queue reaches a preset value or not; if not, executing step S404;
wherein, the forged frame processing queue is a queue formed by forged frames in a predetermined data area;
s404: storing the forged frame corresponding to the key video frame into a preset data area;
in this specific implementation, when a forged frame is generated and needs to be stored in the predetermined data area, the target GPU may first determine whether the number of forged frames in the forged frame processing queue has reached a preset value. When the determination result is no, the queue has not reached the preset value and can still accept forged frames; the target GPU may therefore store the forged frame corresponding to the key video frame into the predetermined data area, that is, into the forged frame processing queue.
S405: when the forged frames are stored in the preset data area, extracting the characteristics of the forged frames in the preset data area to obtain the image characteristics of the forged frames;
s406: and according to the video stream identification carried by the forged frame, taking the image characteristic of the forged frame as the image characteristic of the target real-time video stream.
In this specific implementation, steps S401 to S402 and steps S404 to S406 are the same as steps S101 to S104 in the embodiment shown in fig. 1 and are not described again here.
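The queue-limit check of steps S403 to S404 might be sketched as follows; the preset value of 4 is purely illustrative:

```python
from collections import deque

PRESET_QUEUE_LIMIT = 4  # illustrative preset value, not from the embodiment

def try_store_forged_frame(queue, forged_frame, limit=PRESET_QUEUE_LIMIT):
    """Store the forged frame only when the forged frame processing queue
    has not reached the preset value (step S403); return True on success,
    False when the frame must instead be handed off, e.g. to an auxiliary
    GPU (step S407)."""
    if len(queue) >= limit:
        return False
    queue.append(forged_frame)
    return True
```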
As described above, the image feature extraction methods shown in fig. 1 and fig. 4 are applied to a target GPU in an electronic device. Corresponding to the specific implementation shown in fig. 4, in another specific implementation the electronic device may include multiple GPUs, the target GPU being one of them. In that case, as shown in fig. 5, the image feature extraction method may further include the following step:
s407: when the determination result of step S403 is yes, sending the forged frame corresponding to the key video frame to an auxiliary GPU, so that the auxiliary GPU receives it, performs feature extraction on it to obtain the image features of the forged frame corresponding to the key video frame, and feeds those image features back to the target GPU;
wherein the auxiliary GPU is one of the multiple GPUs other than the target GPU.
In this specific implementation, a determination result of yes in step S403 indicates that the number of forged frames in the forged frame processing queue has reached the preset value and no further forged frames can be stored. The target GPU therefore cannot store the forged frame corresponding to the key video frame into the predetermined data area, i.e., into the forged frame processing queue.
Based on this, in order to ensure that the image features of every forged frame are extracted while maintaining the efficiency of intelligent analysis of the multiple videos, one GPU whose resources can satisfy feature extraction of forged frames is selected, from among the GPUs other than the target GPU, to serve as the auxiliary GPU. The target GPU can then send the forged frame corresponding to the key video frame to the selected auxiliary GPU.
Furthermore, after receiving the forged frame corresponding to the key video frame, the auxiliary GPU may perform feature extraction on the forged frame corresponding to the key video frame to obtain the image feature of the forged frame corresponding to the key video frame, and feed back the obtained image feature of the forged frame corresponding to the key video frame to the target GPU.
Further, preferably, when the number of forged frames awaiting feature extraction is large, in order to maintain the efficiency of intelligent analysis of the multiple videos, several GPUs whose resources can satisfy feature extraction of forged frames may be selected as auxiliary GPUs from among the GPUs of the electronic device other than the target GPU. When a forged frame is sent to the auxiliary GPUs, it can then be routed, according to the current load of each selected auxiliary GPU, to the GPU with the most available resources.
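Selecting the auxiliary GPU with the most available resources from the current loads might be sketched as follows (GPU names and load values are illustrative):

```python
def pick_auxiliary_gpu(gpu_loads, target_gpu):
    """gpu_loads: mapping of GPU name to current load fraction (0..1).
    Choose, among the GPUs other than the target GPU, the one with the
    most available resources (i.e. the lowest current load)."""
    candidates = {g: load for g, load in gpu_loads.items() if g != target_gpu}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```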
Obviously, in this specific implementation, when the target GPU performs S406 — taking the image features of each forged frame as image features of the target real-time video stream according to the video stream identifier the forged frame carries — those image features include both the image features the target GPU extracted from the forged frames in the forged frame processing queue and the image features the auxiliary GPU extracted from the forged frames it received.
The manner in which the auxiliary GPU performs feature extraction on forged frames is the same as that of the target GPU and is not repeated here.
Based on this, in the specific implementation manner, the auxiliary GPU assists the target GPU to extract the image features of the forged frames, so that it can be ensured that the forged frames are not lost, the image features of each forged frame are extracted, and the efficiency of intelligent analysis of multiple channels of videos is ensured.
Preferably, in an embodiment, the method for the target GPU to execute the step S407 may include the following steps:
sending the forged frame corresponding to the key video frame to a state manager of the electronic device. The state manager receives the forged frame and sends it on to the auxiliary GPU. The auxiliary GPU receives the forged frame, performs feature extraction on it to obtain the image features of the forged frame corresponding to the key video frame, and feeds those image features back to the state manager. The state manager then receives the image features and feeds them back to the target GPU.
In this embodiment, the target GPU may send the fake frame corresponding to the key video frame to the state manager of the electronic device.
Furthermore, after receiving the forged frame corresponding to the key video frame, the state manager may select, from among the GPUs of the electronic device other than the target GPU, one GPU whose resources can satisfy feature extraction of the forged frame to serve as the auxiliary GPU, and then send the forged frame corresponding to the key video frame to the selected auxiliary GPU.
Further, after receiving the forged frame corresponding to the key video frame, the auxiliary GPU may perform feature extraction on the forged frame corresponding to the key video frame to obtain the image feature of the forged frame corresponding to the key video frame, and feed back the obtained image feature of the forged frame corresponding to the key video frame to the state manager.
Thus, after receiving the image features of the forged frames corresponding to the key video frame, the state manager may feed back the image features of the forged frames corresponding to the key video frame to the target GPU.
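The relay role of the state manager described in this embodiment might be sketched, in highly simplified form, as follows (the extractor callables stand in for the auxiliary GPUs' feature extraction; class and method names are illustrative):

```python
class StateManager:
    """Toy relay between the target GPU and auxiliary GPUs: forwards a
    forged frame to the chosen auxiliary GPU's extractor and returns the
    resulting image features to the caller (the target GPU)."""

    def __init__(self, auxiliary_extractors):
        # Mapping of auxiliary GPU name -> feature-extraction callable.
        self.auxiliary_extractors = auxiliary_extractors

    def dispatch(self, forged_frame, gpu_name):
        extract = self.auxiliary_extractors[gpu_name]
        return extract(forged_frame)
```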
For example, as shown in fig. 6, it is a processing logic of a method for extracting image features of multiple paths of videos according to this embodiment.
The multi-GPU state manager is a state manager in the electronic equipment, the GPU1 is a target GPU, and the GPU2 is an auxiliary GPU.
Corresponding to the image feature extraction method for multiple videos provided by the embodiment of the invention, the embodiment of the invention further provides an electronic device. As shown in fig. 7, the electronic device includes at least one graphics processor (GPU) 701 and a memory 702; one of the at least one GPU 701 may serve as the target GPU;
a memory 702 for storing a computer program;
a target GPU configured to execute a program stored in the memory 702, the program causing the target GPU to:
selecting, from each real-time video stream of the multiple real-time video streams, a video frame for feature extraction as a key video frame;
for each key video frame, extracting a target area including a detection object of a predetermined type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; wherein the forged frame corresponding to each key video frame carries a video stream identifier representing the real-time video stream to which the key video frame belongs;
when forged frames are stored in the preset data area, performing feature extraction on the forged frames in the preset data area to obtain the image features of the forged frames;
according to the video stream identifier carried by a forged frame, taking the image feature of the forged frame as an image feature of the target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
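As a minimal sketch of the last operation, assuming a dictionary-based frame representation (the function and field names are illustrative assumptions), the attribution of image features to streams could look like:

```python
# Illustrative sketch: attribute each forged frame's image feature to its
# real-time video stream via the carried video stream identifier.
def assign_features(forged_frames, extract):
    """Return a mapping: video stream identifier -> list of image features."""
    features_by_stream = {}
    for frame in forged_frames:
        feature = extract(frame)           # feature extraction on the forged frame
        stream_id = frame["stream_id"]     # identifier carried by the forged frame
        features_by_stream.setdefault(stream_id, []).append(feature)
    return features_by_stream


frames = [{"stream_id": "s1", "data": [1, 2]},
          {"stream_id": "s2", "data": [3]},
          {"stream_id": "s1", "data": [4]}]
# A toy "extractor" that just sums the pixel data stands in for the network.
result = assign_features(frames, lambda f: sum(f["data"]))
# result == {"s1": [3, 4], "s2": [3]}
```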
As can be seen from the above, in the solution provided in the embodiment of the present invention, for each video frame in each real-time video stream, after the video frame is determined to be a key video frame, feature extraction is not performed directly on the video frame; instead, a forged frame of the video frame is generated, and feature extraction is then performed on the forged frame by a processing program that runs asynchronously with the program for determining key video frames. That is, the target GPU runs two programs: program 1 determines each key video frame in each real-time video stream and generates the forged frame of the key video frame, while program 2 performs feature extraction on each generated forged frame; the two programs run asynchronously and do not affect each other.
In this way, for multiple real-time video streams requiring image feature extraction, image detection, tracking, scoring, and feature extraction are not performed independently for each stream; instead, the forged frames for feature extraction corresponding to all the streams are determined uniformly, and feature extraction is then performed uniformly on the generated forged frames. This avoids competition among the video streams for GPU resources, improves the efficiency of intelligent analysis of multiple videos, and improves GPU utilization to a certain extent.
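The asynchronous two-program design might be realized, for example, as a producer/consumer pair sharing a FIFO queue (the "predetermined data area"). The thread split, sentinel value, and stand-in extraction step below are assumptions for illustration, not the patented implementation:

```python
# Illustrative sketch: program 1 produces forged frames into a FIFO queue,
# program 2 consumes them asynchronously for feature extraction.
import queue
import threading

forged_queue = queue.Queue()   # FIFO, as in the queue-based implementation
results = []


def program_1(key_frames):
    """Determines key video frames and enqueues their forged frames."""
    for frame in key_frames:
        forged_queue.put(frame)
    forged_queue.put(None)     # sentinel: no more forged frames


def program_2():
    """Dequeues forged frames first-in first-out and extracts features."""
    while True:
        frame = forged_queue.get()
        if frame is None:
            break
        results.append(("feature", frame))  # stand-in for feature extraction


producer = threading.Thread(target=program_1, args=(["f1", "f2", "f3"],))
consumer = threading.Thread(target=program_2)
producer.start()
consumer.start()
producer.join()
consumer.join()
# results preserves arrival order: features for f1, f2, f3 in turn
```

The queue decouples the two programs: program 1 never waits for extraction to finish, which mirrors the "asynchronous, mutually non-affecting" property described above.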
Optionally, in a specific implementation manner, the forged frames in the predetermined data area are stored in a queue; the target GPU performing feature extraction on each forged frame in the predetermined data area includes:
performing feature extraction on each forged frame in the predetermined data area according to the first-in first-out processing rule of the queue.
Optionally, in a specific implementation manner, before storing the forged frame corresponding to the key video frame in the predetermined data area, the target GPU is further configured to:
determine whether the number of forged frames in a forged frame processing queue reaches a preset value, wherein the forged frame processing queue is the queue formed by the forged frames in the predetermined data area;
if not, execute the step of storing the forged frame corresponding to the key video frame into the predetermined data area.
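A minimal sketch of this admission check, with an assumed preset value of 8 and an assumed fallback callback (e.g. offloading to an auxiliary GPU, as in the implementation below), could be:

```python
# Illustrative sketch: store a forged frame only while the processing queue
# is below the preset value; otherwise invoke a fallback.
PRESET_VALUE = 8   # assumed value for illustration


def try_enqueue(processing_queue, forged_frame, on_full):
    """Store the forged frame, or invoke the fallback when the queue is full."""
    if len(processing_queue) >= PRESET_VALUE:
        on_full(forged_frame)              # e.g. hand off to the state manager
        return False
    processing_queue.append(forged_frame)
    return True


full_queue = list(range(PRESET_VALUE))     # queue already at the preset value
offloaded = []
try_enqueue(full_queue, "frame-x", offloaded.append)
# the queue is unchanged and "frame-x" went to the fallback instead
```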
Optionally, in a specific implementation manner, the at least one GPU 701 further includes an auxiliary GPU;
the target GPU is further configured to send the forged frame corresponding to the key video frame to the auxiliary GPU when the number of forged frames in the forged frame processing queue reaches the preset value, and to receive the image feature of the forged frame fed back by the auxiliary GPU;
and the auxiliary GPU is configured to receive the forged frame corresponding to the key video frame, perform feature extraction on the forged frame to obtain the image feature of the forged frame, and feed the image feature of the forged frame back to the target GPU.
Optionally, in a specific implementation manner, the manner in which the target GPU selects a video frame for feature extraction from each real-time video stream includes:
performing target detection on each video frame in the real-time video stream to obtain a detection result; when the obtained detection result indicates that a target object exists in the video frame, determining whether the target object also exists in a plurality of consecutively detected video frames preceding the video frame; if so, scoring the video frame based on acquisition information of the target object in the video frame to obtain a score value of the video frame; wherein the target object is a detection object of the predetermined type, and the acquisition information includes the relative positional relationship between the target object and the image acquisition device at the time the video frame was acquired;
and determining the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists.
Optionally, in a specific implementation manner, the determining, by the target GPU, of the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists includes:
determining, for each video frame in which the target object exists, whether its score value is greater than a preset threshold, and if so, determining that video frame as a video frame for feature extraction in the real-time video stream;
or obtaining the score values of a plurality of consecutive video frames in the real-time video stream in which the target object exists, and determining the video frame corresponding to the highest obtained score value as the video frame for feature extraction in the real-time video stream.
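The two alternative selection strategies can be illustrated with a hypothetical scored run of frames; the frame indices, score values, and threshold below are made up for the sketch:

```python
# Illustrative sketch of the two key-frame selection strategies above. Each
# entry is (frame_index, score_value) for consecutive video frames in which
# the target object exists.
def select_by_threshold(scored_frames, threshold):
    """Strategy 1: every frame whose score value exceeds the preset threshold."""
    return [index for index, score in scored_frames if score > threshold]


def select_best_of_run(scored_frames):
    """Strategy 2: the single frame with the highest score value within a run
    of consecutive frames containing the target object."""
    return max(scored_frames, key=lambda item: item[1])[0]


run = [(10, 0.42), (11, 0.71), (12, 0.88), (13, 0.65)]
print(select_by_threshold(run, 0.7))   # [11, 12]
print(select_best_of_run(run))         # 12
```

Strategy 1 may pick several key frames per run, while strategy 2 picks exactly one, trading recall for a lower feature-extraction load.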
Optionally, in a specific implementation manner, the electronic device further includes a communication interface and a communication bus. The at least one GPU 701 and the memory 702 communicate with each other through the communication bus.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
Corresponding to the image feature extraction method for the multi-channel video provided by the embodiment of the present invention, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the image feature extraction method for the multi-channel video provided by the embodiment of the present invention.
Specifically, the method for extracting image features of the multi-channel video includes:
selecting, from each real-time video stream of the multiple real-time video streams, a video frame for feature extraction as a key video frame;
for each key video frame, extracting a target area including a detection object of a predetermined type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; wherein the forged frame corresponding to each key video frame carries a video stream identifier representing the real-time video stream to which the key video frame belongs;
when forged frames are stored in the preset data area, performing feature extraction on the forged frames in the preset data area to obtain the image features of the forged frames;
according to the video stream identifier carried by a forged frame, taking the image feature of the forged frame as an image feature of the target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
It should be noted that other implementation manners of the method for extracting image features of multiple paths of videos implemented when the computer program is executed by the processor are the same as the embodiment of the method for extracting image features of multiple paths of videos provided in the foregoing method embodiment section, and are not described herein again.
As can be seen from the above, in the solution provided in the embodiment of the present invention, for each video frame in each real-time video stream, after the video frame is determined to be a key video frame, feature extraction is not performed directly on the video frame; instead, a forged frame of the video frame is generated, and feature extraction is then performed on the forged frame by a processing program that runs asynchronously with the program for determining key video frames. That is, the target GPU runs two programs: program 1 determines each key video frame in each real-time video stream and generates the forged frame of the key video frame, while program 2 performs feature extraction on each generated forged frame; the two programs run asynchronously and do not affect each other.
In this way, for multiple real-time video streams requiring image feature extraction, image detection, tracking, scoring, and feature extraction are not performed independently for each stream; instead, the forged frames for feature extraction corresponding to all the streams are determined uniformly, and feature extraction is then performed uniformly on the generated forged frames. This avoids competition among the video streams for GPU resources, improves the efficiency of intelligent analysis of multiple videos, and improves GPU utilization to a certain extent.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiment of the electronic device and the embodiment of the computer-readable storage medium, since they are substantially similar to the embodiment of the method, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the embodiment of the method.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for extracting image features of multi-channel video, applied to an electronic device, the method comprising:
selecting, from each real-time video stream of multiple real-time video streams, a video frame for feature extraction as a key video frame;
for each key video frame, extracting a target area including a detection object of a predetermined type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; wherein the forged frame corresponding to each key video frame carries a video stream identifier representing the real-time video stream to which the key video frame belongs;
when the preset data area stores forged frames, performing feature extraction on the forged frames in the preset data area to obtain the image features of the forged frames;
according to the video stream identifier carried by a forged frame, taking the image feature of the forged frame as an image feature of a target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
2. The method according to claim 1, wherein the forged frames in the predetermined data area are stored in a queue, and the step of performing feature extraction on each forged frame in the predetermined data area comprises:
performing feature extraction on each forged frame in the predetermined data area according to the first-in first-out processing rule of the queue.
3. The method according to claim 2, wherein before the step of storing the forged frame corresponding to the key video frame in the predetermined data area, the method further comprises:
determining whether the number of forged frames in a forged frame processing queue reaches a preset value, wherein the forged frame processing queue is the queue formed by the forged frames in the predetermined data area;
if not, executing the step of storing the forged frame corresponding to the key video frame into the predetermined data area.
4. The method according to claim 3, applied to a target image processor (GPU) in the electronic device, wherein the electronic device comprises a plurality of GPUs and the target GPU is one of the plurality of GPUs, the method further comprising:
if so, sending the forged frame corresponding to the key video frame to an auxiliary GPU, so that the auxiliary GPU receives the forged frame, performs feature extraction on the forged frame to obtain the image feature of the forged frame, and feeds the image feature of the forged frame back to the target GPU;
wherein the auxiliary GPU is one of the GPUs different from the target GPU.
5. The method according to any one of claims 1-4, wherein the step of selecting a video frame for feature extraction from each real-time video stream comprises:
performing target detection on each video frame in the real-time video stream to obtain a detection result; when the obtained detection result indicates that a target object exists in the video frame, determining whether the target object also exists in a plurality of consecutively detected video frames preceding the video frame; if so, scoring the video frame based on acquisition information of the target object in the video frame to obtain a score value of the video frame; wherein the target object is the detection object of the predetermined type, and the acquisition information comprises the relative positional relationship between the target object and the image acquisition device at the time the video frame was acquired;
and determining the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists.
6. The method according to claim 5, wherein the step of determining the video frames for feature extraction in the real-time video stream based on the score values of the video frames in which the target object exists comprises:
determining, for each video frame in which the target object exists, whether its score value is greater than a preset threshold, and if so, determining that video frame as a video frame for feature extraction in the real-time video stream;
or obtaining the score values of a plurality of consecutive video frames in the real-time video stream in which the target object exists, and determining the video frame corresponding to the highest obtained score value as the video frame for feature extraction in the real-time video stream.
7. An electronic device, characterized in that it comprises at least one image processor GPU and a memory; the at least one GPU comprises a target GPU;
the memory is used for storing a computer program;
the target GPU is configured to execute the program stored in the memory, the program causing the target GPU to perform the following operations:
selecting a video frame for feature extraction from each path of real-time video stream in the multi-path real-time video streams as a key video frame;
for each key video frame, extracting a target area including a detection object of a predetermined type from the key video frame, generating a forged frame corresponding to the key video frame by using the extracted target area, and storing the forged frame corresponding to the key video frame into a preset data area; wherein the forged frame corresponding to each key video frame carries a video stream identifier representing the real-time video stream to which the key video frame belongs;
when the preset data area stores forged frames, performing feature extraction on the forged frames in the preset data area to obtain the image features of the forged frames;
according to the video stream identifier carried by a forged frame, taking the image feature of the forged frame as an image feature of the target real-time video stream; wherein the target real-time video stream is: the real-time video stream to which the key video frame corresponding to the forged frame belongs.
8. The electronic device of claim 7, wherein before storing the forged frame corresponding to the key video frame in the predetermined data area, the target GPU is further configured to:
determine whether the number of forged frames in a forged frame processing queue reaches a preset value, wherein the forged frame processing queue is the queue formed by the forged frames in the predetermined data area;
if not, execute the step of storing the forged frame corresponding to the key video frame into the predetermined data area.
9. The electronic device of claim 8, wherein the at least one GPU further comprises: an auxiliary GPU;
the target GPU is further configured to send the forged frame corresponding to the key video frame to the auxiliary GPU when the number of forged frames in the forged frame processing queue reaches the preset value, and to receive the image feature of the forged frame fed back by the auxiliary GPU;
and the auxiliary GPU is configured to receive the forged frame corresponding to the key video frame, perform feature extraction on the forged frame to obtain the image feature of the forged frame, and feed the image feature of the forged frame back to the target GPU.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201910695757.XA 2019-07-30 2019-07-30 Image feature extraction method of multi-channel video, electronic equipment and storage medium Active CN112311734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910695757.XA CN112311734B (en) 2019-07-30 2019-07-30 Image feature extraction method of multi-channel video, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112311734A true CN112311734A (en) 2021-02-02
CN112311734B CN112311734B (en) 2022-09-02

Family

ID=74485692

Country Status (1)

Country Link
CN (1) CN112311734B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110164045A1 (en) * 2010-01-06 2011-07-07 Apple Inc. Facilitating efficient switching between graphics-processing units
CN107241598A (en) * 2017-06-29 2017-10-10 贵州电网有限责任公司 A kind of GPU coding/decoding methods for multichannel h.264 video conference
CN107589931A (en) * 2017-09-06 2018-01-16 青岛海信电器股份有限公司 Multi-screen display method, device and the electronic equipment with split screen display available function
CN109040700A (en) * 2018-09-10 2018-12-18 合肥巨清信息科技有限公司 A kind of video-splicing system based on the more GPU modes of large scene
CN109040664A (en) * 2018-06-01 2018-12-18 深圳市商汤科技有限公司 video stream processing method and device, electronic equipment and storage medium
CN109348125A (en) * 2018-10-31 2019-02-15 Oppo广东移动通信有限公司 Video correction method, apparatus, electronic equipment and computer readable storage medium
CN110008789A (en) * 2018-01-05 2019-07-12 中国移动通信有限公司研究院 Multiclass object detection and knowledge method for distinguishing, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112311734B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
US11354901B2 (en) Activity recognition method and system
CN110443210B (en) Pedestrian tracking method and device and terminal
US20190130165A1 (en) System and method for selecting a part of a video image for a face detection operation
CN107679448B (en) Eyeball action-analysing method, device and storage medium
WO2019033525A1 (en) Au feature recognition method, device and storage medium
CN107798354B (en) Image clustering method and device based on face image and storage equipment
US20130342636A1 (en) Image-Based Real-Time Gesture Recognition
CN111079670A (en) Face recognition method, face recognition device, face recognition terminal and face recognition medium
CN111209818A (en) Video individual identification method, system, equipment and readable storage medium
WO2023077797A1 (en) Method and apparatus for analyzing queue
CN111402297A (en) Target tracking detection method, system, electronic device and storage medium
CN109086725B (en) Hand tracking method and machine-readable storage medium
CN108288025A (en) A kind of car video monitoring method, device and equipment
CN113987244A (en) Human body image gathering method and device, computer equipment and storage medium
CN111159476B (en) Target object searching method and device, computer equipment and storage medium
Sismananda et al. Performance comparison of yolo-lite and yolov3 using raspberry pi and motioneyeos
CN111340016A (en) Image exposure method and apparatus, storage medium, and electronic apparatus
CN107967743A (en) A kind of personal identification method being applied in e-bidding and system
KR101350882B1 (en) Server for analysing video
CN112560880A (en) Object classification method, object classification apparatus, and computer-readable storage medium
CN112311734B (en) Image feature extraction method of multi-channel video, electronic equipment and storage medium
KR20200046152A (en) Face recognition method and face recognition apparatus
US11132778B2 (en) Image analysis apparatus, image analysis method, and recording medium
CN112988337A (en) Task processing system, method, device, electronic equipment and storage medium
CN111050027A (en) Lens distortion compensation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant