WO2024045780A1

WO2024045780A1 - Video analysis method and apparatus, and storage medium

Info

Publication number: WO2024045780A1
Application number: PCT/CN2023/100986
Authority: WO
Inventors: 彭席汉
Original assignee: 华为技术有限公司
Priority date: 2022-08-29
Filing date: 2023-06-19
Publication date: 2024-03-07
Also published as: CN117667245A

Abstract

The present application relates to a video analysis method and apparatus, and a storage medium. The method comprises: a processor acquiring video data; acquiring execution engine state information of one or more video analysis cards; the processor generating data flow information according to the video data and the execution engine state information, wherein the data flow information comprises a calling sequence of plug-in instances and the video data; the processor sending the data flow information to the video analysis card according to the calling sequence of the plug-in instances; and according to the calling sequence of the plug-in instances, the video analysis card calling, by means of a plug-in instance, a corresponding execution engine to process the video data, so as to obtain a video analysis result. According to the embodiments of the present application, the utilization rate of hardware resources on each video analysis card can be improved, and the hardware resources on different video analysis cards are flexibly and fully utilized, thereby improving the efficiency of video analysis.

Description

Video analysis method, device and storage medium

This application claims priority to the Chinese patent application filed with the China Patent Office on August 29, 2022, with application number 202211042853.2 and entitled "Video Analysis Method, Device and Storage Medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a video analysis method, device and storage medium.

Background

The continuous development of artificial intelligence (AI) technology has driven rapid progress in intelligent video analysis. As modern smart cities place increasingly high demands on the content reflected by data, video data must be organized and analyzed more thoroughly. Currently, video data is usually processed using video analysis cards.

However, large-scale video analysis may involve mixed service flow scenarios. Current approaches usually adapt to such scenarios by virtualizing the hardware resources of the video analysis card and partitioning its computing power. This approach cannot fully utilize the hardware resources on the video analysis card. Therefore, new video analysis methods are urgently needed to utilize the underlying hardware resources more efficiently when analyzing video data and to improve video analysis efficiency.

Summary of the invention

In view of this, a video analysis method, device and storage medium are proposed.

In the first aspect, embodiments of the present application provide a video analysis method. The method includes:

Get video data;

Obtain the execution engine status information of one or more video analysis cards;

Generate data flow information based on the video data and the execution engine status information, where the data flow information includes the calling sequence of plug-in instances and the video data;

Send the data flow information to the video analysis card according to the calling sequence of the plug-in instances, where the data flow information is used by the video analysis card to call, through the plug-in instances and according to their calling sequence, the corresponding execution engines to process the video data and obtain the video analysis result.

According to the embodiments of this application, the processor deploys plug-in instances on the video analysis cards and calls the underlying execution engines through the plug-in instances. This separates the plug-in instances from the underlying execution engines, so that plug-in instances on different video analysis cards can be called flexibly, according to the needs of different video data, to analyze the video and obtain the video analysis result. At the same time, by acquiring the execution engine status information of each video analysis card to generate the corresponding data flow information and determine the calling sequence of plug-in instances, the utilization of hardware resources on each video analysis card is improved, and the hardware resources on different video analysis cards can be used flexibly and fully, which improves the efficiency of video analysis.
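The first-aspect steps can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `PluginCall`, `DataFlowInfo` and `build_data_flow`, the plug-in names, and the "pick the card with the most idle capacity" rule are all assumptions made for illustration. The only points taken from the text are that the data flow information carries both the calling sequence of plug-in instances and the video data, and that it is built from execution engine status information.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PluginCall:
    card_id: int   # video analysis card hosting the plug-in instance (hypothetical id)
    plugin: str    # hypothetical plug-in name, e.g. "decode"


@dataclass
class DataFlowInfo:
    calling_sequence: List[PluginCall]  # ordered plug-in instances to call
    video_data: bytes                   # the video data to be analyzed


def build_data_flow(video_data: bytes,
                    engine_state: Dict[int, Dict[str, int]],
                    plugins: List[str]) -> DataFlowInfo:
    """Assumed rule: for each plug-in, choose the card whose matching
    execution engine currently reports the most idle capacity."""
    sequence = []
    for plugin in plugins:
        card_id = max(sorted(engine_state),
                      key=lambda c: engine_state[c].get(plugin, 0))
        sequence.append(PluginCall(card_id, plugin))
    return DataFlowInfo(sequence, video_data)


def first_target_card(flow: DataFlowInfo) -> int:
    """The processor sends the data flow information to the card that
    hosts the first plug-in instance in the calling sequence."""
    return flow.calling_sequence[0].card_id
```

In this sketch `engine_state` maps a card id to the idle capacity of each engine, keyed by the plug-in that uses it; a real system would likely distinguish engines from plug-ins, as the later implementations do.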

According to the first aspect, in a first possible implementation of the video analysis method, the execution engine status information includes the resource idle status of one or more execution engines of the one or more video analysis cards, and generating the data flow information based on the video data and the execution engine status information includes:

According to the business type information of the video data, one or more plug-ins corresponding to the business type are determined, and the plug-ins are instantiated on the video analysis card into the corresponding one or more plug-in instances;

The calling sequence of plug-in instances in the data flow information is determined based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine.

According to the embodiment of the present application, the calling sequence of the plug-in instances corresponding to the video data is determined from the resource consumption information of the execution engines corresponding to the plug-ins that meet the business requirements and from the resource idle status of the execution engines. This allows the processor to schedule the video analysis process dynamically to meet the needs of different business types, and allows the underlying hardware resources to be fully used during video analysis, which maximizes hardware utilization and improves the efficiency of video analysis.
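One way to picture how resource consumption and idle status yield a calling sequence is the sketch below. It assumes, purely for illustration, that every card already hosts an instance of every required plug-in, that each plug-in uses exactly one engine, and that resources are expressed in abstract units; the function name `candidate_sequences` and the data layout are hypothetical. The sketch also ignores that two plug-ins on the same card may compete for the same engine.

```python
from itertools import product
from typing import Dict, List, Tuple


def candidate_sequences(plugins: List[str],
                        consumption: Dict[str, Tuple[str, int]],
                        idle: Dict[int, Dict[str, int]]) -> List[List[Tuple[int, str]]]:
    """Enumerate calling sequences in which every plug-in runs on a card
    whose engine has enough idle resources.

    plugins:     ordered plug-ins required by the business type
    consumption: plugin -> (engine name, resource units it consumes)
    idle:        card id -> {engine name: idle resource units}
    """
    per_plugin = []
    for plugin in plugins:
        engine, need = consumption[plugin]
        # Cards on which the engine used by this plug-in is idle enough.
        cards = [c for c in sorted(idle) if idle[c].get(engine, 0) >= need]
        per_plugin.append([(c, plugin) for c in cards])
    # Cartesian product: one feasible card per plug-in, order preserved.
    # If any plug-in has no feasible card, the result is an empty list.
    return [list(seq) for seq in product(*per_plugin)]
```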

According to the first possible implementation manner of the first aspect, in a second possible implementation manner of the video analysis method, determining the calling sequence of plug-in instances in the data flow information based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines includes:

Determine the candidate calling sequence based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine;

When there are multiple candidate calling sequences, the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards is determined to be the calling sequence of the plug-in instances.

According to the embodiment of the present application, the processor determines the candidate calling sequences from the resource consumption information of the execution engines corresponding to the plug-ins and the current resource idle status of the execution engines and, when there are multiple candidates, takes the candidate with the smallest amount of data exchanged across video analysis cards as the final calling sequence. Dynamic scheduling of this kind improves the utilization of the underlying hardware resources, reduces the data transmission cost of the video analysis process, and improves the overall efficiency of the analysis.
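The selection rule can be sketched as follows. The cost model is an assumption: the text only says that the candidate with the smallest amount of data exchanged across video analysis cards is chosen, so this sketch charges the (hypothetical) output size of a plug-in whenever the next plug-in in the sequence runs on a different card.

```python
from typing import Dict, List, Tuple

Sequence = List[Tuple[int, str]]  # (card id, plug-in name) pairs, in call order


def cross_card_bytes(sequence: Sequence, output_bytes: Dict[str, int]) -> int:
    """Total data that must move between cards for one candidate sequence."""
    total = 0
    for (card_a, plugin_a), (card_b, _) in zip(sequence, sequence[1:]):
        if card_a != card_b:
            # The output of plugin_a has to be transferred to another card.
            total += output_bytes[plugin_a]
    return total


def pick_sequence(candidates: List[Sequence],
                  output_bytes: Dict[str, int]) -> Sequence:
    """Choose the candidate with the least cross-card data exchange."""
    return min(candidates, key=lambda s: cross_card_bytes(s, output_bytes))
```

With ties, `min` returns the first candidate encountered; the source does not say how ties are broken.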

According to the first or second possible implementation manner of the first aspect, in a third possible implementation manner of the video analysis method, determining the calling sequence of plug-in instances in the data flow information based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines includes:

When the number of candidate calling sequences is zero, deploy a new plug-in instance on the video parsing card according to the resource idle status of the execution engine;

Determine the calling sequence of plug-in instances based on the resource idle status of the execution engine and the new plug-in instance.

According to the embodiment of the present application, the processor determines the candidate calling sequences from the resource consumption information of the execution engines corresponding to the plug-ins and the current resource idle status of the execution engines. When no feasible calling sequence exists, deploying a new plug-in instance dynamically extends the available calling paths, which meets the needs of dynamic analysis and improves the real-time performance of video analysis.
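A hedged sketch of this fallback is shown below. It simplifies the text by handling the "no candidate" case one plug-in at a time, and it assumes that a new instance is deployed on the card whose relevant engine is most idle; `plan_calls`, `deployed` and the unit-based resource model are hypothetical names and choices, not taken from the source.

```python
from typing import Dict, List, Set, Tuple


def plan_calls(plugins: List[str],
               deployed: Dict[str, Set[int]],
               need: Dict[str, Tuple[str, int]],
               idle: Dict[int, Dict[str, int]]) -> List[Tuple[int, str]]:
    """Build a calling sequence, deploying a new plug-in instance when no
    existing instance sits on a card with enough idle engine resources.

    deployed: plug-in -> set of card ids that already host an instance
    need:     plug-in -> (engine name, resource units it consumes)
    idle:     card id -> {engine name: idle resource units}
    """
    sequence = []
    for plugin in plugins:
        engine, units = need[plugin]
        usable = [c for c in sorted(deployed.get(plugin, set()))
                  if idle[c].get(engine, 0) >= units]
        if not usable:
            # No feasible instance: deploy one on the most idle card.
            card = max(sorted(idle), key=lambda c: idle[c].get(engine, 0))
            if idle[card].get(engine, 0) < units:
                raise RuntimeError("no card has enough idle resources for " + plugin)
            deployed.setdefault(plugin, set()).add(card)
            usable = [card]
        sequence.append((usable[0], plugin))
    return sequence
```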

According to the first aspect or the first, second or third possible implementation manner of the first aspect, in a fourth possible implementation manner of the video analysis method, the video analysis card calling the corresponding execution engine through the plug-in instance to process the video data and obtain the video analysis result includes:

The video analysis card determines the data input queue corresponding to the first plug-in instance based on the data flow information, and the first plug-in instance is the first plug-in instance in the calling sequence of the plug-in instance;

The video analysis card calls the corresponding execution engine to process the data input queue through the first plug-in instance to obtain the data output queue;

The video analysis card takes the second plug-in instance, which follows the first plug-in instance, as the new first plug-in instance, determines the new data input queue from the data output queue, and repeats the step in which the video analysis card calls the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence, at which point the video analysis result is obtained.

According to the embodiment of the present application, the data input queue corresponding to each plug-in instance is determined, and the plug-in instances are called one after another, in their calling sequence, to process each item of data in the data input queue, with each plug-in instance calling the corresponding execution engine to carry out the processing. This provides unified management of the plug-in instances on each video analysis card, so that the hardware resources on the video analysis card are fully used and the video analysis process is efficient.
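The queue hand-off described above can be sketched in a few lines. This is a single-threaded illustration under stated assumptions: each execution engine is modelled as a plain Python callable, and `run_pipeline` and the `engines` mapping are hypothetical names.

```python
from collections import deque
from typing import Callable, Dict, List


def run_pipeline(items: List[object],
                 calling_sequence: List[str],
                 engines: Dict[str, Callable[[object], object]]) -> List[object]:
    """Call each plug-in instance in order; the data output queue of one
    instance becomes the data input queue of the next."""
    input_queue = deque(items)
    for plugin in calling_sequence:
        engine = engines[plugin]        # engine called through this plug-in
        output_queue = deque()
        while input_queue:
            output_queue.append(engine(input_queue.popleft()))
        input_queue = output_queue      # next instance reads this queue
    # After the last plug-in instance, the queue holds the analysis result.
    return list(input_queue)
```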

According to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the video analysis method, the data input queue includes priority information corresponding to one or more items of data in the data input queue, and the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the priority information, the video analysis card calls the corresponding execution engine to process one or more data in the data input queue through the first plug-in instance to obtain the data output queue.

According to the embodiments of the present application, introducing priority information into the data input queue corresponding to a plug-in instance meets the needs of video analysis when emergency services are present: the plug-in instance processes the video data of the emergency service first, which ensures the real-time performance of video analysis.
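A data input queue with priority information could look like the sketch below, built on Python's standard `heapq` module. The convention that a lower number means a higher priority, and the class name `PriorityInputQueue`, are assumptions; the source does not specify how priority is encoded.

```python
import heapq
from itertools import count


class PriorityInputQueue:
    """Data input queue whose items carry priority information.

    Assumed convention: a lower number means a higher priority. Items with
    equal priority are returned in arrival order.
    """

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def put(self, item, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), item))

    def get(self):
        # Pop the highest-priority (lowest-numbered) item.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A plug-in instance would then drain this queue with `get()` before calling its execution engine, so that data from an emergency service is handled ahead of routine data.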

According to the fourth or fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the video analysis method, the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

The video analysis card calls, through the first plug-in instance, threads in the thread pool of the corresponding execution engine to process one or more items of data in the data input queue and obtain the data output queue, where the maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.

According to the embodiment of the present application, when the video analysis card calls an execution engine, one or more items of data in the data input queue can be processed by threads in the execution engine's thread pool. Multiple analysis tasks can therefore be processed concurrently within the execution engine, which improves processing efficiency, and the thread pool allows the underlying hardware resources to be fully used, which improves hardware utilization.
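The thread-pool idea can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. How the pool size is derived from the hardware is not specified in the text, so the formula in `max_workers_for` (engine capacity divided by the cost of one task, with a floor of one thread) is an assumption, as are both function names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def max_workers_for(engine_units: int, units_per_task: int) -> int:
    """Assumed sizing rule: how many tasks fit on the engine at once."""
    return max(1, engine_units // units_per_task)


def process_queue(items: List[object],
                  engine_fn: Callable[[object], object],
                  engine_units: int,
                  units_per_task: int) -> List[object]:
    """Process the data input queue concurrently and return the data
    output queue, preserving the input order."""
    workers = max_workers_for(engine_units, units_per_task)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs up to `workers` tasks at a time, yields results in order.
        return list(pool.map(engine_fn, items))
```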

In the second aspect, embodiments of the present application provide a video analysis method. The method includes:

Obtain the data flow information sent by the processor according to the calling sequence of the plug-in instances, where the data flow information is generated by the processor based on the video data and the execution engine status information and includes the calling sequence of the plug-in instances and the video data;

According to the calling sequence of the plug-in instance, the corresponding execution engine is called through the plug-in instance to process the video data and the video analysis result is obtained.

According to the embodiments of this application, the processor deploys plug-in instances on the video analysis cards and calls the underlying execution engines through the plug-in instances. This separates the plug-in instances from the underlying execution engines, so that plug-in instances on different video analysis cards can be called flexibly, according to the needs of different video data, to analyze the video and obtain the video analysis result. At the same time, by acquiring the execution engine status information of each video analysis card to generate the corresponding data flow information and determine the calling sequence of plug-in instances, the utilization of hardware resources on each video analysis card is improved, and the hardware resources on different video analysis cards can be used flexibly and fully, which improves the efficiency of video analysis.

According to the second aspect, in the first possible implementation of the video parsing method, according to the calling sequence of the plug-in instance, the corresponding execution engine is called through the plug-in instance to process the video data, and the video parsing result is obtained, including:

According to the data flow information, determine the data input queue corresponding to the first plug-in instance, where the first plug-in instance is the first plug-in instance in the calling sequence of the plug-in instance;

Through the first plug-in instance, the corresponding execution engine is called to process the data input queue to obtain the data output queue;

Take the second plug-in instance, which follows the first plug-in instance, as the new first plug-in instance, determine the new data input queue from the data output queue, and repeat the step of calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence, at which point the video analysis result is obtained.

According to the first possible implementation manner of the second aspect, in a second possible implementation manner of the video analysis method, the data input queue includes priority information corresponding to one or more items of data in the data input queue, and calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the priority information, through the first plug-in instance, the corresponding execution engine is called to process one or more data items in the data input queue to obtain the data output queue.

According to the first or second possible implementation manner of the second aspect, in a third possible implementation manner of the video analysis method, calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

Through the first plug-in instance, call threads in the thread pool of the corresponding execution engine to process one or more items of data in the data input queue and obtain the data output queue, where the maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.

According to the second aspect or the first, second or third possible implementation manner of the second aspect, in a fourth possible implementation manner of the video analysis method, the execution engine status information includes the resource idle status of one or more execution engines of the one or more video analysis cards. The calling sequence of plug-in instances in the data flow information is determined based on the resource consumption information of the execution engines corresponding to one or more plug-ins and the resource idle status of the execution engines. The plug-ins are instantiated on the video analysis card into the corresponding one or more plug-in instances, and the one or more plug-ins are determined based on the business type information of the video data.

According to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner of the video analysis method, when there are multiple candidate calling sequences, the calling sequence of the plug-in instances is the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards, where the candidate calling sequences are determined based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines.

According to the embodiment of the present application, the processor determines the candidate calling sequences from the resource consumption information of the execution engines corresponding to the plug-ins and the current resource idle status of the execution engines and, when there are multiple candidates, takes the candidate with the smallest amount of data exchanged across video analysis cards as the final calling sequence. Dynamic scheduling of this kind improves the utilization of the underlying hardware resources, reduces the data transmission cost of the video analysis process, and improves the overall efficiency of the analysis.

According to the fourth or fifth possible implementation manner of the second aspect, in a sixth possible implementation manner of the video analysis method, when the number of candidate calling sequences is zero, the calling sequence of the plug-in instances is determined based on the resource idle status of the execution engines and a new plug-in instance, where the new plug-in instance is deployed on the video analysis card based on the resource idle status of the execution engines, and the candidate calling sequences are determined based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines.

In a third aspect, embodiments of the present application provide a video analysis device. The device includes:

The first acquisition module is used to acquire video data;

The second acquisition module is used to acquire execution engine status information of one or more video analysis cards;

The generation module is used to generate data flow information based on the video data and execution engine status information. The data flow information includes the calling sequence of the plug-in instance and the video data;

The sending module is used to send the data flow information to the video analysis card according to the calling sequence of the plug-in instances, where the data flow information is used by the video analysis card to call, through the plug-in instances and according to their calling sequence, the corresponding execution engines to process the video data and obtain the video analysis result.

According to the third aspect, in a first possible implementation manner of the video analysis device, the execution engine status information includes the resource idle status of one or more execution engines of one or more video analysis cards, and the generation module is used to:

According to the first possible implementation manner of the third aspect, in a second possible implementation manner of the video analysis device, determining the calling sequence of plug-in instances in the data flow information based on the resource consumption information of the execution engines corresponding to one or more plug-ins and the resource idle status of the execution engines includes:

According to the first or second possible implementation manner of the third aspect, in a third possible implementation manner of the video analysis device, determining the calling sequence of plug-in instances in the data flow information based on the resource consumption information of the execution engines corresponding to one or more plug-ins and the resource idle status of the execution engines includes:

According to the third aspect or the first, second or third possible implementation manner of the third aspect, in a fourth possible implementation manner of the video analysis device, the video analysis card calling the corresponding execution engine through the plug-in instance to process the video data and obtain the video analysis result includes:

The video analysis card takes the second plug-in instance, which follows the first plug-in instance, as the new first plug-in instance, determines the new data input queue from the data output queue, and repeats the step in which the video analysis card calls the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence, at which point the video analysis result is obtained.

According to the fourth possible implementation manner of the third aspect, in a fifth possible implementation manner of the video analysis device, the data input queue includes priority information corresponding to one or more items of data in the data input queue, and the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the fourth or fifth possible implementation manner of the third aspect, in a sixth possible implementation manner of the video analysis device, the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

In a fourth aspect, embodiments of the present application provide a video analysis device. The device includes:

The third acquisition module is used to obtain the data flow information sent by the processor according to the calling sequence of the plug-in instances, where the data flow information is generated by the processor based on the video data and the execution engine status information and includes the calling sequence of the plug-in instances and the video data;

The determination module is used to call the corresponding execution engine through the plug-in instance to process the video data according to the calling sequence of the plug-in instance, and obtain the video analysis result.

According to the fourth aspect, in the first possible implementation manner of the video analysis device, the determining module is used for:

Take the second plug-in instance, which follows the first plug-in instance, as the new first plug-in instance, determine the new data input queue from the data output queue, and repeat the step of calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence, at which point the video analysis result is obtained.

According to the first possible implementation manner of the fourth aspect, in a second possible implementation manner of the video analysis device, the data input queue includes priority information corresponding to one or more items of data in the data input queue, and calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the first or second possible implementation manner of the fourth aspect, in a third possible implementation manner of the video analysis device, calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the fourth aspect or the first, second or third possible implementation manner of the fourth aspect, in a fourth possible implementation manner of the video analysis device, the execution engine status information includes the resource idle status of one or more execution engines of the one or more video analysis cards. The calling sequence of plug-in instances in the data flow information is determined based on the resource consumption information of the execution engines corresponding to one or more plug-ins and the resource idle status of the execution engines. The plug-ins are instantiated on the video analysis card into the corresponding one or more plug-in instances, and the one or more plug-ins are determined based on the business type information of the video data.

According to the fourth possible implementation manner of the fourth aspect, in a fifth possible implementation manner of the video analysis device, when there are multiple candidate calling sequences, the calling sequence of the plug-in instances is the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards, where the candidate calling sequences are determined based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines.

According to the fourth or fifth possible implementation manner of the fourth aspect, in a sixth possible implementation manner of the video analysis device, when the number of candidate calling sequences is zero, the calling sequence of the plug-in instances is determined based on the resource idle status of the execution engines and a new plug-in instance, where the new plug-in instance is deployed on the video analysis card based on the resource idle status of the execution engines, and the candidate calling sequences are determined based on the resource consumption information of the execution engines corresponding to the one or more plug-ins and the resource idle status of the execution engines.

In a fifth aspect, embodiments of the present application provide a video analysis device, which includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured, when executing the instructions, to implement the video analysis method of the first aspect or of one or more of the possible implementations of the first aspect, or the video analysis method of the second aspect or of one or more of the possible implementations of the second aspect.

In a sixth aspect, embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, they implement the video analysis method of the first aspect or of one or more of the possible implementations of the first aspect, or the video analysis method of the second aspect or of one or more of the possible implementations of the second aspect.

In a seventh aspect, embodiments of the present application provide a terminal device that can execute the video analysis method of the first aspect or of one or more of the possible implementations of the first aspect, or the video analysis method of the second aspect or of one or more of the possible implementations of the second aspect.

In an eighth aspect, embodiments of the present application provide a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device executes the video analysis method of the first aspect or of one or more of the possible implementations of the first aspect, or the video analysis method of the second aspect or of one or more of the possible implementations of the second aspect.

These and other aspects of the application will be better understood in the description of the embodiment(s) below.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the application and together with the description, serve to explain the principles of the application.

Figure 1 shows a schematic diagram of an application scenario according to an embodiment of the present application.

Figure 2 shows a flow chart of a video parsing method according to an embodiment of the present application.

Figure 3 shows a schematic diagram of video data processing by a video analysis card according to an embodiment of the present application.

Figure 4 shows a flow chart of a video parsing method according to an embodiment of the present application.

Figure 5 shows a schematic diagram of resource consumption information of an execution engine according to an embodiment of the present application.

Figure 6 shows a schematic diagram of calculating the cost of data interaction across video analysis cards according to an embodiment of the present application.

Figure 7 shows a flow chart of a video parsing method according to an embodiment of the present application.

Figure 8 shows a schematic diagram of calling an execution engine to process video data according to an embodiment of the present application.

Figure 9 shows a schematic diagram of a plug-in instance implementation according to an embodiment of the present application.

Figure 10 shows a structural diagram of a video analysis device according to an embodiment of the present application.

Figure 11 shows a structural diagram of a video analysis device according to an embodiment of the present application.

Figure 12 shows a structural diagram of an electronic device 1200 according to an embodiment of the present application.

Detailed description

Various exemplary embodiments, features, and aspects of the present application will be described in detail below with reference to the accompanying drawings. The same reference numbers in the drawings identify functionally identical or similar elements. Although various aspects of the embodiments are illustrated in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.

In addition, in order to better explain the present application, numerous specific details are given in the following detailed description. It will be understood by those skilled in the art that the present application may be practiced without certain specific details. In some instances, methods, means, components and circuits that are well known to those skilled in the art are not described in detail in order to highlight the subject matter of the present application.

With the continuous development of AI technology, intelligent video analysis has also developed rapidly. As modern smart cities place increasingly high requirements on the content reflected by data, video data needs to be organized and analyzed more completely. Currently, video data is usually processed using video parsing cards. In current solutions that use video analysis cards to parse video data, when large-scale video analysis and mixed service flow scenarios are involved, the hardware resources of the video analysis cards are usually partitioned by computing power, and virtualization is used to combine the resources. This approach makes the utilization of hardware resources depend on the granularity of the underlying computing power partitioning. The added virtualization layer also brings additional performance overhead, and when the number of video streams is dynamically adjusted, the resources of the virtual video parsing cards also need to be reallocated. As a result, the hardware resources on the video analysis cards cannot be fully utilized. Therefore, new video parsing methods are urgently needed to use the underlying hardware resources more efficiently when parsing video data.

In view of this, this application provides a video parsing method that can deploy plug-in instances on the video parsing card and call the underlying execution engine through the plug-in instances, thereby separating the plug-in instances from the underlying execution engine. For different video data parsing needs, plug-in instances on different video parsing cards can be flexibly called to parse the video and obtain the video parsing results. At the same time, by obtaining the execution engine status information of each video analysis card to generate the corresponding data flow information and determine the calling order of plug-in instances when parsing videos, the utilization of hardware resources on each video analysis card can be improved and the hardware resources on different video parsing cards can be fully utilized, which improves the efficiency of video parsing.

Figure 1 shows a schematic diagram of an application scenario according to an embodiment of the present application. As shown in Figure 1, the video analysis system 100 of the embodiment of the present application can be deployed on a terminal device or a server, and can include a processor 110 and video analysis cards 121-123. The video analysis system 100 can be connected to cameras 201-203. The camera can be a color camera, an infrared camera, etc. This application does not limit the type of camera.

For example, in multi-service scenarios (such as smart cities), the cameras 201-203 can be used for different services. For example, the cameras 201-202 can be used for human body information structuring services (such as face recognition and pedestrian re-identification), and the camera 203 can be used for vehicle information structuring services (such as vehicle target detection). The video analysis cards 121-123 can also be used for different services. For example, the video analysis cards 121-122 can be used for human body information structuring services, and the video analysis card 123 can be used for vehicle information structuring services.

The processor 110 can be used to obtain the video data sent by the cameras 201-203 (i.e., the video code stream in the figure) and the execution engine status information sent by the video analysis cards 121-123. Based on the video data and the execution engine status information, the processor 110 can generate the corresponding data stream and determine the calling sequence of the plug-in instances. The processor 110 can send the data stream to the corresponding video parsing card according to the calling sequence of the plug-in instances. For example, when the data stream corresponds to the human body information structuring service, the data stream can be processed by the video parsing cards 121-122; when the data stream corresponds to the vehicle information structuring service, the data stream can be processed by the video parsing card 123. According to the calling sequence of the plug-in instances, the video parsing cards 121-123 can call the corresponding execution engines through the plug-in instances on the cards to process the video data and obtain the video parsing results.

It should be noted that the cameras 201-203 in the figure are only an example, and a larger number of cameras may be included, such as 100 cameras for the human body information structuring service and 50 cameras for the vehicle information structuring service. The video analysis cards 121-123 in the figure are likewise only an example, and a larger number of video analysis cards may be included. There may also be multiple processors.

It should be noted that the embodiments of this application can also be used in other scenarios other than smart cities that require video analysis, such as autonomous driving, video content review and other scenarios, and this application does not limit this.

The terminal device involved in this application may refer to a device with a wireless connection function, that is, the ability to connect to other terminal devices or servers through wireless connection methods such as Wi-Fi and Bluetooth. The terminal device in this application may also have the ability to communicate via a wired connection. The terminal device may have a touch screen, a non-touch screen, or no screen. A touch-screen terminal device can be controlled by clicking or sliding on the display screen with a finger, stylus, etc.; a non-touch-screen device can be connected to input devices such as a mouse, keyboard, or touch panel and controlled through them; a device without a screen can be, for example, a Bluetooth speaker without a screen. For example, the terminal device of this application can be a smartphone, a netbook, a tablet, a laptop, a wearable electronic device (such as a smart bracelet or a smart watch), a TV, a virtual reality device, a stereo, an electronic ink device, etc.

The server involved in this application can be located in the cloud or locally. It can be a physical device or a virtual device, such as a virtual machine or a container, and has a wireless communication function, which can be provided on the chip (system) of the server or on other parts or components. The server can refer to a device with a wireless connection function, that is, it can connect to other servers or terminal devices through wireless connection methods such as Wi-Fi and Bluetooth. The server of this application can also communicate over a wired connection. The processor and video analysis card in the embodiment of the present application can also be deployed on different devices, which can communicate through wired or wireless connections between the devices.

In a possible implementation manner, the video analysis method in the embodiment of the present application can also be implemented through a software development kit (SDK).

The following is a detailed introduction to the video analysis method according to the embodiment of the present application through Figures 2 to 9.

Figure 2 shows a flow chart of a video parsing method according to an embodiment of the present application. This method can be used in the above-mentioned video analysis system 100. As shown in Figure 2, the method includes:

Step S201: The processor obtains video data.

The processor is, for example, the processor 110 in Figure 1. The video data (i.e., the video stream) can be captured by a camera (such as the cameras 201-202 in Figure 1), and the processor can obtain the video data of the camera through a video transmission protocol. The video transmission protocol can be the real-time streaming protocol (RTSP), etc.

A given video analysis business can be decomposed into one or more parts. Each part can be encapsulated into a corresponding business plug-in, and each business plug-in can represent an abstraction at the business level. Each business plug-in can also be instantiated into one or more objects, which the processor deploys on different video parsing cards, for example the video parsing cards 121-123 in Figure 1. Taking the information structuring business scenario as an example, the business process can include data decoding, data preprocessing, and model inference. The process of data decoding can be abstracted into one business plug-in, and the processes of data preprocessing and model inference can be abstracted into another business plug-in.

Figure 3 shows a schematic diagram of video data processing by a video analysis card according to an embodiment of the present application. As shown in Figure 3, several service plug-in instances can be deployed on the video analysis card 121 and the video analysis card 123, such as service plug-in instance 1, service plug-in instance 2 and service plug-in instance 3 in the figure. These service plug-in instances can be formed by instantiating different service plug-ins and correspond to different business abstractions. The business plug-in instance 1 on the video analysis card 121 and the business plug-in instance 1 on the video analysis card 123 can, for example, represent objects obtained by instantiating the same business plug-in on different video analysis cards.

Different hardware functional units on the video analysis card can be implemented as different execution engines. For example, these execution engines can include decoding engines, direct memory access (DMA) engines, scaling engines, and inference engines. The decoding engine can be used to decode video data, the DMA engine can be used to access storage devices, the scaling engine can be used to perform data preprocessing on video data, such as adjusting the size of the data, and the inference engine can be used to perform inference analysis on the data through models. , to obtain the video analysis results.

Through the business plug-in instances deployed on the video analysis card, one or more corresponding execution engines can be called to process the video data according to the functional requirements. As shown in Figure 3, for the video analysis card 121 and the video analysis card 123, business plug-in instance 1 can be used to call the decoding engine, business plug-in instance 2 can be used to call the scaling engine and the inference engine, and business plug-in instance 3 can be used to call the decoding engine. This separates the business plug-in instances from the underlying execution engines. During video parsing, the corresponding business plug-in instances are executed in sequence, each business plug-in instance calls its corresponding execution engines, and the execution engines use hardware resources to parse the video data.
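The separation between business plug-in instances and execution engines described above can be illustrated with a minimal Python sketch. The class names `ExecutionEngine` and `PluginInstance`, the string-tagging functions, and the sample input are hypothetical stand-ins for the hardware engines of Figure 3; this is an illustration under those assumptions, not the implementation of the application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExecutionEngine:
    """Hypothetical stand-in for a hardware functional unit on a card."""
    name: str
    run: Callable[[str], str]


@dataclass
class PluginInstance:
    """Hypothetical plug-in instance: it only knows which engines to call."""
    card_id: int
    instance_no: int
    engines: List[ExecutionEngine]

    def process(self, data: str) -> str:
        # Call each bound engine in order; the instance holds no engine logic.
        for engine in self.engines:
            data = engine.run(data)
        return data


# Engines are simulated by functions that tag the data they receive.
decode = ExecutionEngine("decode", lambda d: f"decoded({d})")
scale = ExecutionEngine("scale", lambda d: f"scaled({d})")
infer = ExecutionEngine("infer", lambda d: f"inferred({d})")

# Mirrors Figure 3: instance 1 calls the decoding engine,
# instance 2 calls the scaling engine and then the inference engine.
instance_1 = PluginInstance(card_id=121, instance_no=1, engines=[decode])
instance_2 = PluginInstance(card_id=121, instance_no=2, engines=[scale, infer])

result = instance_2.process(instance_1.process("frame"))
```

Because an instance only holds references to engines, the same engine object can be shared by several instances, which is the property the scheduling steps below rely on.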

In this process, in order to make full use of hardware resources and improve the processing efficiency of video data, when there is a need for video analysis, the processor can generate the optimal plug-in instance calling sequence for the corresponding video data. This process is described in detail below.

Step S202: The processor obtains execution engine status information of one or more video analysis cards.

Wherein, the processor can obtain the execution engine status information of each video analysis card every predetermined time (such as every 1 second) or when receiving video data.

Optionally, the execution engine status information may include resource idle status of one or more execution engines of one or more video analysis cards. The resource idle status of the execution engine may include, for example, the resource usage corresponding to each execution engine on the video analysis card. For example, 50% may indicate that half of the hardware resources of the execution engine are available for use.

Step S203: The processor generates data flow information based on the video data and execution engine status information.

The data flow information may include the calling sequence and video data of the plug-in instance.

The calling order of plug-in instances may refer to the order of business plug-in instances that the corresponding data flow passes through. As shown in Figure 3, data flow 1 and data flow 2 can respectively correspond to different video data, and data flow 1 and data flow 2 can flow through different business plug-in instances respectively to complete the analysis of the data.

As shown in the figure, in one example, after the processor searches for the optimal orchestration strategy, the calling sequence of the plug-in instances corresponding to data stream 1 is: business plug-in instance 1 in the video analysis card 121, then business plug-in instance 2 in the video analysis card 121. The calling sequence of the plug-in instances corresponding to data stream 2 is: business plug-in instance 3 in the video analysis card 121, then business plug-in instance 2 in the video analysis card 123. For data flow 1, the business plug-in instance 1 in the video analysis card 121 can call the decoding engine to process it, and then the business plug-in instance 2 in the video analysis card 121 calls the scaling engine and the inference engine in turn to process it. For data flow 2, the business plug-in instance 3 in the video analysis card 121 can call the decoding engine to process it, and then the business plug-in instance 2 in the video analysis card 123 calls the scaling engine and the inference engine in turn to process it. Finally, the video analysis results corresponding to data stream 1 and data stream 2 can be obtained respectively.
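As a rough illustration of what the data flow information carries, the sketch below models it as a record holding the calling sequence and the video data, using the two data flows of Figure 3 as sample values. The type `DataFlowInfo`, the `(card id, instance number)` tuple encoding, and the byte payloads are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical encoding of one step: (video analysis card id, plug-in instance number).
InstanceRef = Tuple[int, int]


@dataclass
class DataFlowInfo:
    """Assumed container for the calling sequence plus the video data."""
    calling_sequence: List[InstanceRef]
    video_data: bytes

    def cross_card_hops(self) -> int:
        # Count consecutive steps that run on different cards.
        cards = [card for card, _ in self.calling_sequence]
        return sum(1 for a, b in zip(cards, cards[1:]) if a != b)


# Data flow 1: instance 1 then instance 2, both on card 121.
flow_1 = DataFlowInfo([(121, 1), (121, 2)], b"stream-1")
# Data flow 2: instance 3 on card 121, then instance 2 on card 123.
flow_2 = DataFlowInfo([(121, 3), (123, 2)], b"stream-2")
```

Here `flow_1` stays on one card while `flow_2` crosses from card 121 to card 123 once, which is the quantity the later cost comparison is concerned with.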

The following is a detailed introduction to the process of the processor searching for the optimal orchestration strategy to obtain the calling sequence of plug-in instances in the data flow information.

Figure 4 shows a flow chart of a video parsing method according to an embodiment of the present application. As shown in Figure 4, optionally, step S203 may include:

Step S401: The processor determines one or more plug-ins corresponding to the service type based on the service type information of the video data.

The plug-in can be instantiated on the video parsing card as one or more corresponding plug-in instances. The process of instantiating each plug-in (ie, the business plug-in in Figure 3) into a plug-in instance (the business plug-in instance in Figure 3) can be referred to the relevant description in step S201 above.

The plug-ins to be used in the business process can be determined according to the business type of the video data. For example, for the information structuring business scenario, when the process of data decoding is abstracted into one business plug-in and the processes of data preprocessing and model inference are abstracted into another business plug-in, the plug-ins involved for the video data may include the business plug-in abstracted from data decoding (which can be called plug-in 1) and the business plug-in abstracted from data preprocessing and model inference (which can be called plug-in 2).

Step S402: The processor determines the calling sequence of plug-in instances in the data flow information based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine.

The plug-in instances into which each plug-in is instantiated on different video parsing cards can be determined based on the plug-ins determined above. For example, for the information structuring business scenario, in Figure 3, the plug-in instances corresponding to plug-in 1 may include the business plug-in instance 1 on the video analysis card 121 and the business plug-in instance 1 on the video analysis card 123; the plug-in instances corresponding to plug-in 2 may include the service plug-in instance 2 on the video analysis card 121 and the service plug-in instance 2 on the video analysis card 123. Based on the resource consumption information of the execution engine corresponding to each plug-in and the resource idle status of the execution engine corresponding to each plug-in instance, the feasible plug-in instance calling sequences for the data flow can be determined, and the optimal one among them can be selected as the final calling sequence of plug-in instances. The process is described below.

Optionally, this step S402 may include:

Step S4021: The processor determines the candidate calling sequence based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine.

Among them, the resource consumption information of the execution engine corresponding to the plug-in can indicate how much resources the execution engine needs to consume, which can be obtained through testing in advance.

FIG. 5 shows a schematic diagram of resource consumption information of an execution engine according to an embodiment of the present application. Taking the face recognition business as an example, as shown in Figure 5, for video data with a video display format of 1080p (that is, a resolution of 1920*1080), processing requires, for example, plug-in instance 1 (corresponding to plug-in 1) to call the decoding engine once, plug-in instance 2 (corresponding to plug-in 2) to call the scaling engine once, and plug-in instance 2 (corresponding to plug-in 2) to call the inference engine once. The plug-in resource information obtained through pre-testing is shown in the engine resource consumption table on the right side of Figure 5. In the business flow type, "face recognition@1080p" can indicate that the business type is face recognition of video data in 1080p format, and "face recognition@720p" can indicate that the business type is face recognition of video data in 720p format. An I/O (input/output) data amount of "x1" means that the video data is input/output once, that is, the corresponding data amount is 1 times; a data amount of "x2" means that the video data is input/output twice, that is, the corresponding data amount is 2 times. An occupancy ratio of "x1%" can mean that, when the plug-in calls the engine to process video data, the resources occupied are 1% of the total hardware resources corresponding to the engine; an occupancy ratio of "x2%" can mean that the resources occupied are 2% of the total hardware resources corresponding to the engine; and an occupancy ratio of "x3%" can mean that the resources occupied are 3% of the total hardware resources corresponding to the engine.

The candidate calling sequences can be determined based on the resource occupancy ratio of the engine corresponding to each plug-in instance in the table and the resource idle status of the execution engine corresponding to each plug-in instance. For example, for engine 1, suppose the processor obtains that the current resource usage of engine 1 corresponding to plug-in instance 1 in the video analysis card 121 is 95%, and determines from the resource occupancy ratio in the table that calling the engine through the plug-in instance will occupy 1% of the resources. If the plug-in instance is used to call engine 1 to process the video data, the resource usage during processing becomes 96%. If this resource usage does not exceed the predetermined maximum threshold (for example, 98%), plug-in instance 1 on the video parsing card 121 can be used. In the same way, all available plug-in instances can be obtained. By permuting and combining the plug-in instances available on each video analysis card, the candidate calling sequences of the plug-in instances can be obtained.
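The availability check and the combination step can be sketched in Python as follows. The 98% threshold and the 95% + 1% case come from the example in the text; every other number, the engine names, the dictionary layout, and the function name `candidate_sequences` are assumptions chosen for illustration.

```python
from itertools import product

MAX_USAGE = 98.0  # predetermined maximum threshold in percent (example value from the text)


def candidate_sequences(plugins, deployed, occupancy, usage):
    """Return every calling sequence built from currently usable plug-in instances.

    plugins   : ordered plug-in names required by the service
    deployed  : plug-in name -> card ids that host an instance of it
    occupancy : plug-in name -> {engine name: percent consumed per call}
    usage     : (card id, engine name) -> current usage in percent
    """
    per_plugin = []
    for plugin in plugins:
        usable_cards = [
            card for card in deployed[plugin]
            if all(usage[(card, engine)] + pct <= MAX_USAGE
                   for engine, pct in occupancy[plugin].items())
        ]
        per_plugin.append([(card, plugin) for card in usable_cards])
    # One usable instance per plug-in, in order; empty if any plug-in has none.
    return [list(seq) for seq in product(*per_plugin)]


plugins = ["plugin_1", "plugin_2"]
deployed = {"plugin_1": [121, 123], "plugin_2": [121, 123]}
occupancy = {"plugin_1": {"decode": 1.0}, "plugin_2": {"scale": 2.0, "infer": 3.0}}
usage = {
    (121, "decode"): 95.0, (121, "scale"): 50.0, (121, "infer"): 60.0,
    (123, "decode"): 98.0, (123, "scale"): 40.0, (123, "infer"): 90.0,
}

candidates = candidate_sequences(plugins, deployed, occupancy, usage)
```

With these sample values, the decoding engine on card 121 would reach 96% and stays usable, while the one on card 123 would reach 99% and is excluded, so two candidate sequences remain, both starting on card 121.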

Step S4022: When there are multiple candidate calling sequences, the processor determines, as the calling sequence of the plug-in instances, the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards.

That is to say, in step S4021, if there are multiple candidate calling sequences determined by the processor, one of the optimal calling sequences may be selected as the final calling sequence among the multiple candidate calling sequences. The optimal calling order can be determined based on the amount of data interacted across video parsing cards between upstream and downstream plug-in instances. The optimal calling order can be the calling order corresponding to the smallest amount of data interacted across video parsing cards. The amount of data exchanged across video parsing cards between upstream and downstream plug-in instances can be determined by calculating the total cost of data interaction across video parsing cards.

FIG. 6 shows a schematic diagram of calculating the cost of data interaction across video analysis cards according to an embodiment of the present application. As shown in Figure 6, taking video analysis cards 121-123 as an example, the three video analysis cards each include plug-in instance 1, plug-in instance 2, and plug-in instance 3, where plug-in instance 1 can represent the object instantiated by plug-in 1, plug-in instance 2 can represent the object instantiated by plug-in 2, and plug-in instance 3 can represent the object instantiated by plug-in 3. The cost of one data exchange within a video analysis card can be recorded as 0, the cost of one data exchange between the video analysis card 121 and the video analysis card 122 as 15, the cost of one data exchange between the video analysis card 121 and the video analysis card 123 as 10, and the cost of one data exchange between the video analysis card 122 and the video analysis card 123 as 7.5. From this, the total cost of analyzing the data stream can be calculated. For example, for the four candidate calling sequences in the figure, the total cost corresponding to the first candidate calling sequence (candidate 1 in the figure) is 0, the total cost corresponding to the second candidate calling sequence (candidate 2 in the figure) is 10, the total cost corresponding to the third candidate calling sequence (candidate 3 in the figure) is 15, and the total cost corresponding to the fourth candidate calling sequence (candidate 4 in the figure) is 15. Therefore, the candidate with the smallest total cost, that is, the calling sequence corresponding to candidate 1, can be selected as the final optimal calling sequence.

Optionally, if there are multiple candidate calling sequences with the smallest total cost and consistent values, one of them can be randomly selected as the final calling sequence. The calculation shown in Figure 6 is only an example.
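The cost comparison and the random tie-break can be sketched as follows. The per-link costs (0, 15, 10, 7.5) are the ones given for Figure 6. The four card paths are hypothetical: the text gives only the totals for candidates 1 to 4, so the paths below were chosen merely to reproduce totals of 0, 10, 15, and 15. The function names are also assumptions.

```python
import random

# Cost of one data exchange between two different cards (values from the Figure 6 example).
LINK_COST = {
    frozenset({121, 122}): 15.0,
    frozenset({121, 123}): 10.0,
    frozenset({122, 123}): 7.5,
}


def total_cost(card_path):
    """Sum the exchange cost over consecutive steps; same-card exchanges cost 0."""
    return sum(
        0.0 if a == b else LINK_COST[frozenset({a, b})]
        for a, b in zip(card_path, card_path[1:])
    )


def pick_sequence(candidates, rng=random):
    """Choose a minimum-cost candidate, breaking ties at random."""
    costs = [total_cost(path) for path in candidates]
    best = min(costs)
    return rng.choice([p for p, c in zip(candidates, costs) if c == best])


# Hypothetical paths: the card visited by each of three plug-in instances in turn.
candidates = [
    [121, 121, 121],  # candidate 1
    [121, 121, 123],  # candidate 2
    [121, 122, 122],  # candidate 3
    [121, 121, 122],  # candidate 4
]
costs = [total_cost(path) for path in candidates]
chosen = pick_sequence(candidates)
```

Only candidate 1 has cost 0 here, so it is always chosen; the random choice matters only when several candidates share the minimum.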

Optionally, if there is a sudden demand for video parsing, calling the execution engines through the currently deployed plug-in instances may not be able to meet the demand. In this case, a new plug-in instance can be deployed to dynamically expand the parsing path and meet the dynamic parsing requirements. Accordingly, step S402 may also include:

Step S4023: When the number of candidate calling sequences is zero, the processor deploys a new plug-in instance on the video analysis card according to the resource idle state of the execution engine.

That is to say, if the number of candidate calling sequences determined by the processor in step S4021 is zero, there is currently no feasible calling sequence, and an available calling sequence can be determined based on the resource idle status of the execution engine obtained by the processor. For example, an execution engine whose current resource utilization meets the requirements can be found (for example, the sum of its resource utilization and the occupancy ratio of the corresponding plug-in instance is less than 100%), and a new plug-in instance can be deployed on the video parsing card corresponding to that execution engine. This enables the new plug-in instance to call the execution engine to implement the corresponding functions.

Step S4024: The processor determines the calling sequence of plug-in instances based on the resource idle status of the execution engine and the new plug-in instance.

After a new plug-in instance is deployed, the newly deployed plug-in instance can be added to the calling sequence of the plug-in instance so that there is a feasible calling sequence, so that the video analysis card can perform video analysis accordingly.
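Steps S4023 and S4024 can be sketched as a small fallback routine. The "less than 100%" condition follows the example in the text. Scanning cards in ascending id order, skipping cards that already host the plug-in, and the name `deploy_new_instance` are assumptions of this sketch, as are all sample numbers.

```python
def deploy_new_instance(plugin, occupancy, usage, deployed):
    """Deploy a new instance of `plugin` on the first card with enough headroom.

    occupancy : {engine name: percent consumed per call} for this plug-in
    usage     : (card id, engine name) -> current usage in percent
    deployed  : plug-in name -> card ids hosting an instance (updated in place)
    Returns the chosen card id, or None if no card qualifies.
    """
    for card in sorted({card for card, _ in usage}):
        if card in deployed.get(plugin, []):
            continue  # assumption: look for a card without an instance yet
        if all(usage[(card, engine)] + pct < 100.0
               for engine, pct in occupancy.items()):
            deployed.setdefault(plugin, []).append(card)
            return card
    return None


occupancy_2 = {"scale": 2.0, "infer": 3.0}
usage = {
    (121, "scale"): 99.0, (121, "infer"): 97.0,  # existing instance, engines saturated
    (122, "scale"): 30.0, (122, "infer"): 40.0,  # idle engines, no instance yet
}
deployed = {"plugin_2": [121]}

new_card = deploy_new_instance("plugin_2", occupancy_2, usage, deployed)
```

After the call, the new instance on card 122 can be placed in the calling sequence in the same way as any previously deployed instance.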

Step S204: The processor sends the data flow information to the video analysis card according to the calling sequence of the plug-in instance.

Among them, the processor can send the data stream information to the video analysis card through a hardware bus. The hardware bus can be, for example, a peripheral component interconnect express (PCIe) or the like.

Taking Figure 3 as an example, for data flow 1, the calling sequence of the corresponding plug-in instances is: service plug-in instance 1 in the video analysis card 121, then service plug-in instance 2 in the video analysis card 121. Therefore, the data flow information can first be sent to the service plug-in instance 1 in the video analysis card 121 for processing.

Step S205: The video analysis card obtains the data flow information sent by the processor according to the calling sequence of the plug-in instance.

Among them, the video analysis card can obtain the data flow information sent by the processor through the hardware bus (such as PCIe bus).

Step S206: The video analysis card calls the corresponding execution engine through the plug-in instance to process the video data according to the calling sequence of the plug-in instance, and obtains the video analysis result.

Taking Figure 3 as an example, for data flow 2, the calling sequence of the corresponding plug-in instances is: business plug-in instance 3 in the video analysis card 121, then service plug-in instance 2 in the video analysis card 123. According to the calling sequence, the business plug-in instance 3 in the video analysis card 121 can call the decoding engine to process the video data, and the processed data is then sent to the video analysis card 123, where the business plug-in instance 2 calls the scaling engine and the inference engine in turn to process it. Finally, the video analysis result corresponding to data stream 2 is obtained. The detailed process is described below.

Figure 7 shows a flow chart of a video parsing method according to an embodiment of the present application. As shown in Figure 7, this step S206 may include:

Step S2061: The video analysis card determines the data input queue corresponding to the first plug-in instance based on the data flow information.

Wherein, the first plug-in instance is the first plug-in instance in the calling sequence of the plug-in instances. Each item of data in the data input queue can correspond to a data stream.

Step S2062: The video analysis card calls the corresponding execution engine to process the data input queue through the first plug-in instance to obtain the data output queue.

The queue can work in a first-in-first-out (FIFO) manner, that is, the first plug-in instance can prioritize the data stream that enters the queue first.

Optionally, each item of data in the data input queue may include queue identification, timestamp, priority information, and actual valid data (ie, video data). This step S2062 may include:

The priority information may indicate the urgency of the corresponding data. For example, the larger the priority value in the priority information, the higher the urgency of the corresponding data. Therefore, when processing each piece of data, the first plug-in instance can prioritize the data stream corresponding to a higher priority on the basis of FIFO.
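A queue that serves higher priorities first while keeping FIFO order among equal priorities can be sketched with Python's `heapq`. The four fields of `QueueItem` follow the item layout described above; the class names, the arrival counter, and the sample values are assumptions for this example.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class QueueItem:
    queue_id: int      # queue identification
    timestamp: float   # time the item was produced
    priority: int      # larger value = more urgent
    payload: bytes     # the actual valid data (video data)


class PluginInputQueue:
    """FIFO queue in which a larger priority value is served first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # arrival order, used to keep FIFO within a priority

    def put(self, item: QueueItem) -> None:
        # heapq is a min-heap, so the priority is negated; the arrival number
        # breaks ties and ensures QueueItem objects are never compared.
        heapq.heappush(self._heap, (-item.priority, next(self._arrival), item))

    def get(self) -> QueueItem:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = PluginInputQueue()
q.put(QueueItem(1, 0.0, priority=1, payload=b"a"))
q.put(QueueItem(1, 0.1, priority=5, payload=b"b"))
q.put(QueueItem(1, 0.2, priority=5, payload=b"c"))

served = [q.get().payload for _ in range(len(q))]
```

Items `b` and `c` share the highest priority and leave in arrival order, followed by the lower-priority item `a`.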

When the first plug-in instance processes the data input queue, the corresponding execution engine can be called to process the data. Optionally, step S2062 can include:

The video analysis card, through the first plug-in instance, calls threads in the thread pool corresponding to the corresponding execution engine to process one or more data items in the data input queue and obtain a data output queue.

Figure 8 shows a schematic diagram of calling an execution engine to process video data according to an embodiment of the present application. As shown in Figure 8, when calling the execution engine, a priority queue can be generated based on the data input queue. Each item of data in the priority queue can include the data to be processed (i.e., video data) and the method of processing the video data. (such as processing function), for example, for an inference engine, the way to process video data can be the inference model used.

As shown in Figure 8, when the execution engine processes each item of data in the priority queue, it can use a thread pool. Each thread in the thread pool can be obtained by dividing the underlying hardware resources. Among them, the maximum number of concurrent threads in the thread pool can be determined according to the size of the hardware resources. The larger the hardware resources, the larger the number of threads that can be executed concurrently in the thread pool. Threads in the thread pool can call hardware resources in the underlying hardware function modules for calculation through the underlying driver interface.
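The thread-pool arrangement of Figure 8 can be approximated in Python with `concurrent.futures.ThreadPoolExecutor`. In this sketch each priority-queue item carries the data and its processing function, as described above. The sizing rule (one thread per two resource units), the tuple layout, and the function names are assumptions, and ordinary Python functions stand in for calls into the underlying driver interface.

```python
from concurrent.futures import ThreadPoolExecutor


def max_workers_for(resource_units: int, units_per_thread: int = 2) -> int:
    """Assumed sizing rule: more hardware resources allow more concurrent threads."""
    return max(1, resource_units // units_per_thread)


def run_engine(tasks, resource_units):
    """Process (priority, data, func) items on a pool sized from the resources.

    Items are submitted highest priority first; Python's sort is stable, so
    items with equal priority keep their original (FIFO) order.
    """
    ordered = sorted(tasks, key=lambda task: -task[0])
    with ThreadPoolExecutor(max_workers=max_workers_for(resource_units)) as pool:
        futures = [pool.submit(func, data) for _, data, func in ordered]
        # Collect results in submission order to form the data output queue.
        return [future.result() for future in futures]


tasks = [
    (1, "frame-a", str.upper),
    (5, "frame-b", str.upper),
    (5, "frame-c", str.upper),
]
output_queue = run_engine(tasks, resource_units=8)
```

The pool bounds concurrency rather than ordering: items may finish in any order on the worker threads, and the output queue is rebuilt in submission order when results are collected.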

Step S2063: The video parsing card takes the second plug-in instance, which follows the first plug-in instance, as the new first plug-in instance, determines a new data input queue according to the data output queue, and repeats the step in which the video parsing card calls the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence of the plug-in instances, thereby obtaining the video parsing result.

After calling the execution engine to process the data, the first plug-in instance can obtain the data output queue, which can include each processed data stream corresponding to the data input queue. Determining the new data input queue according to the data output queue may consist of sending each processed data stream in the data output queue to the corresponding second plug-in instance (that is, the plug-in instance that follows the first plug-in instance in the corresponding calling sequence) to form the data input queue of the second plug-in instance. After the second plug-in instance is taken as the new first plug-in instance, the data input queue of the second plug-in instance is also the data input queue of the new first plug-in instance.

Referring to Figure 9, a schematic diagram of a plug-in instance implementation according to an embodiment of the present application is shown. As shown in Figure 9, the upstream plug-in instance may represent the first plug-in instance, and the downstream plug-in instance may represent the second plug-in instance. If the calling sequence of the plug-in instances corresponding to the data flow includes the first plug-in instance and the second plug-in instance, the data flow information may first be input into the data input queue of the upstream plug-in instance. After the upstream plug-in instance calls the execution engine to process the data, the processed data can be sent to the corresponding downstream plug-in instance and placed in its data input queue. After the downstream plug-in instance calls the execution engine to process the data, since the downstream plug-in instance is the last plug-in instance in the calling sequence of the plug-in instances corresponding to the data flow, the video analysis result corresponding to the data flow can be obtained.
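The upstream/downstream hand-off of Figure 9 and the loop of step S2063 can be sketched as follows. This is an illustrative sketch only: the class `PluginInstance`, the function `run_calling_sequence`, and the lambda "engines" are hypothetical stand-ins, and the sketch runs on a single card without priorities or threads.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class PluginInstance:
    """Hypothetical plug-in instance wrapping a call into an execution engine."""

    def __init__(self, name: str, engine: Callable[[Any], Any]):
        self.name = name
        self.engine = engine                  # stand-in for the execution engine call
        self.input_queue: Deque[Any] = deque()

    def process(self) -> List[Any]:
        """Drain the data input queue and return the data output queue."""
        output_queue = []
        while self.input_queue:
            output_queue.append(self.engine(self.input_queue.popleft()))
        return output_queue


def run_calling_sequence(sequence: List[PluginInstance], video_data: List[Any]) -> List[Any]:
    """Walk the calling sequence; each output queue feeds the next input queue."""
    if not sequence:
        raise ValueError("calling sequence must contain at least one plug-in instance")
    sequence[0].input_queue.extend(video_data)
    output_queue: List[Any] = []
    for position, plugin in enumerate(sequence):
        output_queue = plugin.process()
        if position + 1 < len(sequence):      # hand over to the downstream instance
            sequence[position + 1].input_queue.extend(output_queue)
    return output_queue                       # output of the last instance is the result


decode = PluginInstance("decode", lambda packet: packet.upper())
infer = PluginInstance("infer", lambda frame: f"{frame}:person")
result = run_calling_sequence([decode, infer], ["pkt0", "pkt1"])
```

Here `decode` plays the upstream plug-in instance and `infer` the downstream one; the output of the last instance in the sequence is treated as the video analysis result.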

Figure 10 shows a structural diagram of a video analysis device according to an embodiment of the present application. The device can be used for the above processor. As shown in Figure 10, the device includes:

The first acquisition module 1001 is used to acquire video data;

The second acquisition module 1002 is used to acquire execution engine status information of one or more video analysis cards;

The generation module 1003 is used to generate data flow information based on the video data and execution engine status information. The data flow information includes the calling sequence of the plug-in instance and the video data;

The sending module 1004 is used to send the data flow information to the video analysis card according to the calling sequence of the plug-in instances. The data flow information is used by the video analysis card to call, according to the calling sequence of the plug-in instances, the corresponding execution engine through the plug-in instance to process the video data and obtain the video analysis result.

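Optionally, the execution engine status information includes the resource idle status of one or more execution engines of one or more video analysis cards, and the generation module 1003 is used to:

Determine, according to the service type information of the video data, one or more plug-ins corresponding to the service type, the plug-ins being instantiated on the video analysis card as corresponding one or more plug-in instances;

Determine the calling sequence of the plug-in instances in the data flow information based on the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine.

Optionally, determining the calling sequence of the plug-in instances in the data flow information based on the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine includes:

Determining candidate calling sequences according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine;

When there are multiple candidate calling sequences, determining the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards to be the calling sequence of the plug-in instances.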
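The selection among multiple candidate calling sequences can be sketched as below. The sketch assumes that each candidate is an ordered list of (plug-in name, card id) pairs and that the output size of each plug-in is known in advance; the names `Candidate`, `cross_card_bytes`, `pick_calling_sequence` and the byte counts are hypothetical and chosen only to illustrate the minimum-cross-card-data rule.

```python
from typing import Dict, List, Optional, Tuple

# A candidate calling sequence: ordered (plug-in name, card id) pairs (assumed shape).
Candidate = List[Tuple[str, int]]


def cross_card_bytes(candidate: Candidate, output_bytes: Dict[str, int]) -> int:
    """Sum the data that must move between cards along one candidate."""
    total = 0
    for (plugin, card), (_, next_card) in zip(candidate, candidate[1:]):
        if card != next_card:                 # output of `plugin` crosses a card boundary
            total += output_bytes[plugin]
    return total


def pick_calling_sequence(candidates: List[Candidate],
                          output_bytes: Dict[str, int]) -> Optional[Candidate]:
    """Return the candidate with the least cross-card traffic, or None if there is none."""
    if not candidates:
        return None                           # the zero-candidate case is handled separately
    return min(candidates, key=lambda c: cross_card_bytes(c, output_bytes))


output_bytes = {"decode": 6_000_000, "detect": 2_000, "track": 500}
candidates = [
    [("decode", 0), ("detect", 1), ("track", 1)],   # decoded frames cross cards
    [("decode", 0), ("detect", 0), ("track", 1)],   # only detection results cross cards
]
best = pick_calling_sequence(candidates, output_bytes)
```

With these illustrative numbers the second candidate wins, because moving detection results between cards is far cheaper than moving decoded frames.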

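Optionally, the video analysis card calling, according to the calling sequence of the plug-in instances, the corresponding execution engine through the plug-in instance to process the video data and obtain the video analysis result includes:

The video analysis card determines the data input queue corresponding to the first plug-in instance according to the data flow information, the first plug-in instance being the first plug-in instance in the calling sequence of the plug-in instances;

The video analysis card calls the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue;

The video analysis card takes the second plug-in instance after the first plug-in instance as the new first plug-in instance, determines a new data input queue according to the data output queue, and repeats the step of calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence of the plug-in instances, thereby obtaining the video analysis result.

Optionally, the data input queue includes priority information corresponding to one or more data items in the data input queue, and the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the priority information, the video analysis card calls the corresponding execution engine through the first plug-in instance to process one or more data items in the data input queue and obtain the data output queue.

Optionally, the video analysis card calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

The video analysis card calls, through the first plug-in instance, a thread in the thread pool of the corresponding execution engine to process one or more data items in the data input queue and obtain the data output queue, where the maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.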

Figure 11 shows a structural diagram of a video analysis device according to an embodiment of the present application. This device can be used for the above video analysis card. As shown in Figure 11, the device includes:

The third acquisition module 1101 is used to acquire the data flow information sent by the processor according to the calling sequence of the plug-in instances. The data flow information is generated by the processor based on the video data and execution engine status information, and includes the calling sequence of the plug-in instances and the video data;

The determination module 1102 is used to call the corresponding execution engine through the plug-in instance to process the video data according to the calling sequence of the plug-in instance, and obtain the video analysis result.

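Optionally, the determination module 1102 is used for:

Determining the data input queue corresponding to the first plug-in instance according to the data flow information, the first plug-in instance being the first plug-in instance in the calling sequence of the plug-in instances;

Calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue;

Taking the second plug-in instance after the first plug-in instance as the new first plug-in instance, determining a new data input queue according to the data output queue, and repeating the step of calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence of the plug-in instances, thereby obtaining the video analysis result.

Optionally, the data input queue includes priority information corresponding to one or more data items in the data input queue, and calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

According to the priority information, calling the corresponding execution engine through the first plug-in instance to process one or more data items in the data input queue and obtain the data output queue.

Optionally, calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

Calling, through the first plug-in instance, a thread in the thread pool of the corresponding execution engine to process one or more data items in the data input queue and obtain the data output queue, where the maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.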

Optionally, the execution engine status information includes the resource idle status of one or more execution engines of one or more video analysis cards. The calling sequence of the plug-in instances in the data flow information is determined based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine. The plug-ins are instantiated on the video analysis card as corresponding one or more plug-in instances, and the one or more plug-ins are determined based on the service type information of the video data.

Optionally, when there are multiple candidate calling sequences, the calling sequence of the plug-in instances is the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards. The candidate calling sequences are determined based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine.

Optionally, when the number of candidate calling sequences is zero, the calling sequence of the plug-in instances is determined based on the resource idle status of the execution engine and a new plug-in instance. The new plug-in instance is deployed on the video analysis card according to the resource idle status of the execution engine, and the candidate calling sequences are determined based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine.
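The zero-candidate case can be sketched in the same spirit. The text only states that a new plug-in instance is deployed according to the resource idle status; the concrete policy below (place each new instance on the card with the most idle resources, tracked as one scalar per card) and the names `deploy_and_order`, `cost` and `idle` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def deploy_and_order(plugins: List[str],
                     cost: Dict[str, int],
                     idle: Dict[int, int]) -> List[Tuple[str, int]]:
    """Place a new instance of each plug-in on the card with the most idle resources."""
    idle = dict(idle)                         # work on a copy of the idle-resource table
    sequence = []
    for plugin in plugins:
        card = max(idle, key=idle.get)        # hypothetical policy: most idle card first
        if idle[card] < cost[plugin]:
            raise RuntimeError(f"no card has enough idle resources for {plugin}")
        idle[card] -= cost[plugin]            # the new instance consumes engine resources
        sequence.append((plugin, card))
    return sequence                           # calling sequence built from the new instances


sequence = deploy_and_order(
    plugins=["decode", "detect"],
    cost={"decode": 30, "detect": 50},
    idle={0: 40, 1: 90},
)
```

A real scheduler would track idle resources per execution engine rather than per card, and could combine this placement with the cross-card data criterion used when several candidates exist.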

Embodiments of the present application provide a video analysis device, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above video analysis method when executing the instructions.

Embodiments of the present application provide a terminal device that can execute the above video parsing method.

Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above video analysis method is implemented.

Embodiments of the present application provide a computer program product, including computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code. When the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above video analysis method.

Figure 12 shows a structural diagram of an electronic device 1200 according to an embodiment of the present application. As shown in Figure 12, the electronic device 1200 may be the above-mentioned video analysis system 100. The electronic device 1200 includes at least one processor 1801, at least one memory 1802, and at least one communication interface 1803. In addition, the electronic device may also include common components such as antennas, which will not be described in detail here.

Next, each component of the electronic device 1200 will be introduced in detail with reference to Figure 12.

The processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control program execution of the above scheme. The processor 1801 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units can be independent devices or can be integrated in one or more processors. The processor may also be the above-mentioned processor 110.

The communication interface 1803 is used to communicate with other electronic devices or communication networks, such as Ethernet, a radio access network (RAN), a core network, or a wireless local area network (WLAN).

The memory 1802 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, or a random access memory (RAM) or other type of dynamic storage device that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory can exist independently and be connected to the processor through a bus. The memory can also be integrated with the processor.

The memory 1802 is used to store the application code for executing the above solution, and execution is controlled by the processor 1801. The processor 1801 is used to execute the application code stored in the memory 1802.

Optionally, the electronic device 1200 may also include the above-mentioned video analysis cards 121-123. The video analysis cards may be connected to the processor through a hardware bus to implement the above-mentioned video analysis method.

In the above embodiments, each embodiment is described with its own emphasis. For parts that are not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

Computer-readable storage media may be tangible devices that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above.

Computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage on a computer-readable storage medium in the respective computing/processing device.

The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In situations involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is customized by utilizing state information of the computer-readable program instructions, and this electronic circuit can execute the computer-readable program instructions to implement various aspects of the present application.

Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions cause the computer, programmable data processing device and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operations of possible implementations of apparatuses, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.

It will also be noted that each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by hardware (such as circuits or application-specific integrated circuits (ASICs)) that performs the corresponding functions or actions, or can be implemented with a combination of hardware and software, such as firmware.

Although the present invention has been described herein in conjunction with various embodiments, those skilled in the art, in practicing the claimed invention, can understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may perform several of the functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not mean that these measures cannot be combined to advantageous effect.

The embodiments of the present application have been described above. The above description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the illustrated embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A video analysis method, characterized in that the method includes:

Get video data;

Obtain the execution engine status information of one or more video analysis cards;

Generate data flow information according to the video data and the execution engine status information, where the data flow information includes the calling sequence of the plug-in instance and the video data;

Send the data flow information to the video analysis card according to the calling sequence of the plug-in instance, where the data flow information is used by the video analysis card to call, according to the calling sequence of the plug-in instance, the corresponding execution engine through the plug-in instance to process the video data and obtain a video analysis result.
The method according to claim 1, wherein the execution engine status information includes the resource idle status of one or more execution engines of one or more video analysis cards, and generating data flow information according to the video data and the execution engine status information includes:

According to the service type information of the video data, one or more plug-ins corresponding to the service type are determined, and the plug-ins are instantiated on the video parsing card as corresponding one or more plug-in instances;

The calling sequence of the plug-in instances in the data flow information is determined according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine.
The method according to claim 2, characterized in that determining the calling sequence of the plug-in instances in the data flow information according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine includes:

Determine the candidate calling sequence according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine;

When there are multiple candidate calling sequences, the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards is determined to be the calling sequence of the plug-in instances.
The method according to claim 2 or 3, characterized in that determining the calling sequence of the plug-in instances in the data flow information according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine includes:

Determine the candidate calling sequence according to the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine;

When the number of candidate calling sequences is zero, deploy a new plug-in instance on the video parsing card according to the resource idle state of the execution engine;

The calling sequence of the plug-in instances is determined according to the resource idle state of the execution engine and the new plug-in instance.
The method according to any one of claims 1 to 4, characterized in that the video analysis card calling, according to the calling sequence of the plug-in instance, the corresponding execution engine through the plug-in instance to process the video data and obtain the video analysis result includes:

The video analysis card determines the data input queue corresponding to the first plug-in instance according to the data flow information, and the first plug-in instance is the first plug-in instance in the calling sequence of the plug-in instance;

The video analysis card calls the corresponding execution engine to process the data input queue through the first plug-in instance to obtain the data output queue;

The video analysis card takes the second plug-in instance after the first plug-in instance as a new first plug-in instance, determines a new data input queue according to the data output queue, and repeats the step in which the video analysis card calls the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence of the plug-in instance, thereby obtaining the video analysis result.
The method according to claim 5, characterized in that the data input queue includes priority information corresponding to one or more data items in the data input queue, and the video analysis card calling, through the first plug-in instance, the corresponding execution engine to process the data input queue and obtain the data output queue includes:

According to the priority information, the video analysis card calls the corresponding execution engine to process one or more data in the data input queue through the first plug-in instance to obtain a data output queue.
The method according to claim 5 or 6, characterized in that the video analysis card calling, through the first plug-in instance, the corresponding execution engine to process the data input queue and obtain the data output queue includes:

The video analysis card calls the thread in the thread pool corresponding to the corresponding execution engine through the first plug-in instance to process one or more data in the data input queue to obtain a data output queue. The maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.
A video analysis method, characterized in that the method includes:

Obtain the data flow information sent by the processor according to the calling sequence of the plug-in instance, where the data flow information is generated by the processor according to the video data and execution engine status information, and the data flow information includes the calling sequence of the plug-in instance and the video data;

According to the calling sequence of the plug-in instance, the corresponding execution engine is called through the plug-in instance to process the video data to obtain a video analysis result.
The method according to claim 8, characterized in that, according to the calling sequence of the plug-in instance, the corresponding execution engine is called through the plug-in instance to process the video data to obtain the video analysis result, including:

According to the data flow information, the data input queue corresponding to the first plug-in instance is determined, and the first plug-in instance is the first plug-in instance in the calling sequence of the plug-in instance;

Through the first plug-in instance, call the corresponding execution engine to process the data input queue to obtain the data output queue;

Using the second plug-in instance after the first plug-in instance as the new first plug-in instance, determine a new data input queue according to the data output queue, and repeat the step of calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue, together with the subsequent steps, until the second plug-in instance is the last plug-in instance in the calling sequence of the plug-in instance, thereby obtaining the video analysis result.
The method according to claim 9, characterized in that the data input queue includes priority information corresponding to one or more data items in the data input queue, and calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain a data output queue includes:

According to the priority information, through the first plug-in instance, the corresponding execution engine is called to process one or more data items in the data input queue to obtain a data output queue.
The method according to claim 9 or 10, characterized in that calling the corresponding execution engine through the first plug-in instance to process the data input queue and obtain the data output queue includes:

Through the first plug-in instance, a thread in the thread pool of the corresponding execution engine is called to process one or more data items in the data input queue and obtain a data output queue. The maximum number of concurrent threads in the thread pool is determined by the size of the hardware resources.
The method according to any one of claims 8-11, characterized in that the execution engine status information includes the resource idle status of one or more execution engines of one or more video analysis cards, the calling sequence of the plug-in instances in the data flow information is determined based on the resource consumption information of the execution engine corresponding to one or more plug-ins and the resource idle status of the execution engine, the plug-ins are instantiated on the video analysis card as corresponding one or more plug-in instances, and the one or more plug-ins are determined based on the service type information of the video data.
The method according to claim 12, characterized in that, when there are multiple candidate calling sequences, the calling sequence of the plug-in instances is the candidate calling sequence in which the plug-in instances exchange the smallest amount of data across video analysis cards, the candidate calling sequences being determined based on the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine.
The method according to claim 12 or 13, characterized in that, when the number of candidate calling sequences is zero, the calling sequence of the plug-in instances is determined according to the resource idle status of the execution engine and a new plug-in instance, the new plug-in instance being deployed on the video analysis card according to the resource idle status of the execution engine, and the candidate calling sequences being determined based on the resource consumption information of the execution engine corresponding to the one or more plug-ins and the resource idle status of the execution engine.
A video analysis device, characterized in that the device includes:

The first acquisition module is used to acquire video data;

The second acquisition module is used to acquire execution engine status information of one or more video analysis cards;

A generation module configured to generate data flow information based on the video data and the execution engine status information, where the data flow information includes the calling sequence of the plug-in instance and the video data;

A sending module, configured to send the data flow information to the video analysis card according to the calling sequence of the plug-in instance, where the data flow information is used by the video analysis card to call, according to the calling sequence of the plug-in instance, the corresponding execution engine through the plug-in instance to process the video data and obtain video analysis results.
A video analysis device, characterized in that the device includes:

The third acquisition module is used to acquire the data flow information sent by the processor according to the calling sequence of the plug-in instance, where the data flow information is generated by the processor according to the video data and execution engine status information, and the data flow information includes the calling sequence of the plug-in instance and the video data;

The determination module is configured to call the corresponding execution engine through the plug-in instance to process the video data according to the calling sequence of the plug-in instance, and obtain the video analysis result.
A video analysis device, characterized by including:

processor;

Memory used to store instructions executable by the processor;

Wherein, the processor is configured to implement the method described in any one of claims 1-7 or to implement the method described in any one of claims 8-14 when executing the instructions.
A non-volatile computer-readable storage medium with computer program instructions stored thereon, characterized in that when the computer program instructions are executed by a processor, the method described in any one of claims 1-7 is implemented, or the method described in any one of claims 8-14 is implemented.
A computer program product, including computer readable code, or a non-volatile computer readable storage medium carrying computer readable code, where, when the computer readable code runs in a processor of an electronic device, the processor in the electronic device performs the method described in any one of claims 1-7, or performs the method described in any one of claims 8-14.