CN113840169B - Video processing method, device, computing equipment and storage medium - Google Patents


Info

Publication number
CN113840169B
CN113840169B (application number CN202010581473.0A)
Authority
CN
China
Prior art keywords
image
effective area
frame image
video
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010581473.0A
Other languages
Chinese (zh)
Other versions
CN113840169A (en)
Inventor
权雪菲
吴昊男
栾媛媛
郭庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Liaoning Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010581473.0A
Publication of CN113840169A
Application granted
Publication of CN113840169B
Legal status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video processing method, a device, a computing device and a storage medium, wherein the method comprises the following steps: acquiring each frame image in a video to be processed; extracting a corresponding effective area image from each frame image; arranging and combining the effective area images corresponding to the frame images to obtain a recombined video containing a plurality of combined images; analyzing each combined image in the recombined video to obtain a recombined video analysis result; and restoring the recombined video analysis result to obtain a target video analysis result. Without adding hardware, the method reduces the number of computations in the video processing process, makes full use of existing hardware resources, and improves video processing efficiency; at the same time, the target video analysis result is obtained by restoring the recombined video analysis result, so video processing precision is guaranteed.

Description

Video processing method, device, computing equipment and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video processing method, apparatus, computing device, and storage medium.
Background
Artificial intelligence (AI) technology is evolving continuously, the accuracy and performance of machine learning keep improving, and machine-learning-driven technology is accelerating innovation in many industries and creating new business opportunities and value. AI has developed rapidly in image recognition and can greatly improve both processing speed and precision in video analysis. A video processing service typically requires multiple algorithms and models computed in combination to complete the analysis of a video image.
There are currently four approaches to AI-based accelerated video processing. (1) Cloud graphics processing unit (GPU) acceleration: the cloud generally uses physical or virtualized GPU virtual machines provided by a cloud service provider to supply Compute Unified Device Architecture (CUDA) computing power, so data must be transmitted to a cloud host and AI inference tasks are performed directly in the cloud. (2) Edge processing: for reasons of deployment convenience and energy consumption, processing is mostly performed directly on a mobile terminal or edge device; latency-sensitive visual recognition is generally placed on edge devices, which are typically low-power and miniaturized and therefore provide only limited AI capability. (3) Cloud-edge collaborative hybrid processing: when the AI computing power of an edge device is insufficient, preprocessing such as feature data extraction can be performed at the edge, and the remaining compute-intensive tasks are transmitted to a cloud server for processing, so that the cloud and the edge cooperate to improve the performance and efficiency of the overall AI computation. (4) Software acceleration: some software schemes accelerate neural network computation by optimizing the adaptation of AI algorithms to hardware, for example by accelerating convolution performance in software on a multi-core central processing unit (CPU) platform.
However, current AI computation mainly involves convolutional neural networks (CNNs), and CNN computation consumes considerable hardware resources, so it often needs to be accelerated by hardware. Since accelerated CNN computation depends mainly on the GPU, hardware acceleration is mostly achieved by increasing the number of CUDA cores; software optimization is mainly applied to the model itself, by pruning the model and reducing the number of CNN layers. Pruning, however, usually causes a loss of precision and leads to inaccurate computation results.
Disclosure of Invention
The present invention has been made in view of the above problems, and provides a video processing method, apparatus, computing device, and storage medium that overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a video processing method including the steps of:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to each frame image, and determining coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain a recombined video analysis result;
and restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result.
According to another aspect of the present invention, there is provided a video processing apparatus including:
the frame image acquisition module is used for acquiring each frame image in the video to be processed;
the extraction module is used for extracting corresponding effective area images from each frame image;
the video reorganization module is used for arranging and combining the effective area images corresponding to each frame image, and determining the coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
the analysis module is used for analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain a recombined video analysis result;
and the restoration module is used for restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
According to yet another aspect of the present invention, there is provided a computing device comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the video processing method.
According to still another aspect of the present invention, there is provided a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the video processing method described above.
According to the video processing method, the video processing device, the computing equipment and the storage medium, each frame image in the video to be processed is acquired; a corresponding effective area image is extracted from each frame image; the effective area images corresponding to each frame image are arranged and combined, and the coordinate position information of each effective area image in its combined image is determined, to obtain a recombined video containing a plurality of combined images; each combined image in the recombined video is analyzed with a neural network analysis model to obtain a recombined video analysis result; and the recombined video analysis result is restored according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result. Without adding hardware, the method extracts the effective area of the video to be processed and then arranges and combines the effective area images into a recombined video, thereby reducing the number of computations in the video processing process, making full use of existing hardware resources and improving video processing efficiency; at the same time, the target video analysis result is obtained by restoring the recombined video analysis result, so video processing precision is guaranteed.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and to make the above and other objects, features and advantages of the present invention more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 shows a flowchart of a video processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of extracting a corresponding effective area image from a frame image according to an embodiment of the present invention;
FIG. 3 illustrates a canvas placement point schematic diagram provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
FIG. 5 illustrates a schematic diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the invention, the AI processing procedure of the video is as follows:
1. Video decoding: the video stream is transmitted to a CPU or GPU for decoding, and the image of each frame is extracted; the video frame rate required for real-time processing is generally 24 frames per second, corresponding to 24 images per second.
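As a concrete illustration of step 1, the following minimal sketch decodes frames with OpenCV; the library choice and the file-based input are assumptions made for the sketch, since the description only requires that each frame image be extracted on a CPU or GPU.

import cv2

def decode_frames(video_path):
    # Yield each decoded frame image of the video to be processed.
    # At 24 frames per second this produces 24 images per second of video.
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()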
2. Target detection (Detection): detection locates the coordinate position at which a target object appears in one frame of image data; for example, the frame image is input into an AI model for detection and recognition, and in the present invention the AI model may specifically be a neural network analysis model. A common target detection algorithm is You Only Look Once (YOLO), a publicly available method that can be trained on whatever object a user actually needs to recognize: to recognize people, YOLO is used to train a model for detecting human bodies; to recognize trees, YOLO is used to train a model for detecting trees. For each target object detected in a frame image, a bounding box (the minimum rectangular frame containing the target image) is output, and an effective area image is extracted.
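To keep a runnable illustration of step 2 self-contained, the sketch below substitutes OpenCV's built-in HOG pedestrian detector for the YOLO model named above; like YOLO in the pedestrian case, it outputs one bounding box per detected person. The substitution is an assumption of the sketch, not the method of the description.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame):
    # Return one (x, y, w, h) bounding box per pedestrian detected in the frame image.
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    return [tuple(map(int, box)) for box in boxes]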
3. Feature extraction (Extract): feature vector data of the effective area image is extracted. A feature vector is a mathematical representation of image feature values; for example, the feature vectors of a person wearing yellow clothes and of a person wearing black clothes differ. After the effective area image (for example a pedestrian or a tree) is extracted, its feature vector is extracted by an AI model; feature extraction is generally performed on the GPU to increase processing speed.
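As an illustration of step 3, the sketch below uses a normalized color histogram as the feature vector, chosen because the example above (yellow versus black clothing) is exactly the kind of difference a color feature captures; a production system would extract features with an AI model on the GPU instead, so the histogram is an assumption of the sketch.

import cv2

def extract_feature(frame, box):
    # Return a 512-dimensional normalized color-histogram feature vector
    # for one effective area image cropped from the frame.
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    hist = cv2.calcHist([crop], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()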
4. Tracking: effective area images with the same features detected in each frame image are associated. By comparing the feature vector data extracted from each frame image, the same bounding box extracted from preceding and following frames of the video is associated by coordinates, so that a continuous, time-ordered trajectory is drawn. The tracking process can generally be performed directly on the CPU without GPU computation.
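A minimal sketch of the association in step 4 follows, matching each detection to the existing track with the most similar feature vector; the greedy strategy, the cosine similarity metric and the 0.5 threshold are assumptions of the sketch.

import numpy as np

def associate(tracks, detections, threshold=0.5):
    # tracks: {track_id: feature vector}; detections: list of (box, feature vector).
    # Returns {track_id: box}; a detection that matches no track starts a new one.
    matches = {}
    for box, feat in detections:
        best_id, best_sim = None, threshold
        for tid, tfeat in tracks.items():
            denom = np.linalg.norm(feat) * np.linalg.norm(tfeat) + 1e-9
            sim = float(np.dot(feat, tfeat) / denom)
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:
            best_id = max(tracks, default=-1) + 1  # open a new track
        tracks[best_id] = feat
        matches[best_id] = box
    return matches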
5. Business data acquisition: the required business data is obtained according to business requirements. For example, for pedestrian counting, the number of pedestrians passing through a specific area is counted: lines are drawn in the video in advance to mark the specific area, and according to the trajectory result of target tracking, each tracking trajectory that intersects a drawn line is counted as one unit. In addition, vehicle detection, dangerous goods detection and the like may be performed, with the extracted effective area images (vehicles or dangerous goods) associated across frame images.
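For the pedestrian-counting example in step 5, a tracking trajectory intersects the pre-drawn line exactly when some segment between consecutive trajectory points crosses it; the sketch below applies the standard cross-product segment-intersection test to one trajectory.

def _ccw(a, b, c):
    # True if the points a, b, c are in counter-clockwise order.
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    # True if segment p1-p2 properly crosses segment q1-q2.
    return _ccw(p1, q1, q2) != _ccw(p2, q1, q2) and _ccw(p1, p2, q1) != _ccw(p1, p2, q2)

def count_crossings(trajectory, line):
    # trajectory: ordered (x, y) centers of one tracked object; line: ((x1, y1), (x2, y2)).
    return sum(segments_intersect(trajectory[i], trajectory[i + 1], line[0], line[1])
               for i in range(len(trajectory) - 1))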
Fig. 1 shows a flowchart of an embodiment of a video processing method according to the present invention, as shown in fig. 1, the method comprising the steps of:
s101: and acquiring each frame image in the video to be processed.
In this step, the video to be processed is transmitted to a CPU or GPU for decoding, and each frame image is extracted.
S102: and extracting corresponding effective area images from each frame image.
In an alternative manner, step S102 further includes: for each frame image, detecting a target object contained in the frame image, and determining effective area information of the frame image; and extracting an effective area image from the frame image according to the effective area information.
Specifically, for each frame image, recognition processing is performed on the frame image to determine the tracking frame corresponding to each target object in the frame image, where the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image; the minimum rectangular envelope area of the tracking frames corresponding to all target objects in the frame image is then determined, and the area information of the minimum rectangular envelope area is taken as the effective area information of the frame image.
Taking pedestrian recognition as an example, the effective area image covers all areas where pedestrians appear. The tracking frames corresponding to all target objects are obtained by continuously running the detection algorithm on the video to be processed, and the minimum rectangular envelope area of the tracking frames corresponding to all target objects in the frame image is taken as the effective area information of the frame image. The longer the video to be processed, the more accurate the effective area image (selecting a full 24 hours of video gives better accuracy). This process can be performed for each frame image, or at preset time intervals, to obtain a plurality of effective area images.
Fig. 2 is a schematic diagram of extracting a corresponding effective area image from a certain frame image. As shown in fig. 2, taking pedestrians as the target objects, the coordinate positions at which pedestrians appear are detected in each frame image, a tracking frame is output for each detected pedestrian, the minimum rectangular envelope area (bounding box) enveloping the tracking frames corresponding to all target objects in the frame image is then determined, and the area information of this bounding box is taken as the effective area information of the frame image. Further, the effective area image is cropped out according to the effective area information, and each frame image of the video to be processed, together with the coordinates of its effective area image, is input into the neural network analysis model for computation. The coordinates of the effective area image may be input by manual marking, or may be dynamically calculated by artificial intelligence.
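The minimum rectangular envelope described above is simply the axis-aligned union of all tracking frames accumulated for the frame image, as the following sketch shows; the (x, y, w, h) box convention is an assumption of the sketch.

def minimal_envelope(boxes):
    # boxes: iterable of (x, y, w, h) tracking frames.
    # Returns the smallest rectangle enclosing all of them.
    x1 = min(x for x, y, w, h in boxes)
    y1 = min(y for x, y, w, h in boxes)
    x2 = max(x + w for x, y, w, h in boxes)
    y2 = max(y + h for x, y, w, h in boxes)
    return (x1, y1, x2 - x1, y2 - y1)

def crop_effective_area(frame, envelope):
    # Cut out the effective area image according to the effective area information.
    x, y, w, h = envelope
    return frame[y:y + h, x:x + w]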
S103: and (3) arranging and combining the effective area images corresponding to each frame image, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images.
In an alternative manner, step S103 further includes: adding the effective area image corresponding to each frame image into an initial gallery; according to a preset combination rule, arranging and combining the effective area images in the initial gallery to generate a plurality of combined images, and determining the coordinate position information of the effective area images within each combined image; and summarizing all the combined images to obtain the recombined video.
Further, since each effective area image is a rectangle, a plurality of effective area images can be combined to generate a combined image, and all combined images are then gathered into one video to obtain the recombined video. Specifically, the effective area image corresponding to each frame image is added to a gallery, which serves as the initial gallery; all effective area images in the initial gallery are arranged and combined so as to fill a canvas, no combination result may exceed the range of the canvas, and the effective area images can be assembled according to the following algorithm.
In an alternative manner, a combined image generation process includes steps 1-5:
step 1: a canvas is created.
In this step, the canvas may be polygonal, preferably rectangular.
Step 2: determining an effective canvas area in the canvas, and searching at least one placement point in the effective canvas area; the effective canvas area is a blank area in the canvas, and the placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area.
FIG. 3 is a schematic diagram of canvas placement points. As shown in FIG. 3, taking a rectangular canvas as an example, the figure shows, from left to right, a blank canvas, a canvas filled with one effective area image, and a canvas filled with two effective area images; the positions marked by circles are the placement points. The placement point of the blank canvas is the top-left vertex of the blank area of the canvas (i.e., the effective canvas area): the canvas is traversed from top to bottom, and the intersection of the leftmost vertical edge and the uppermost horizontal edge of the effective canvas area is selected. Similarly, for the canvas filled with one effective area image and the canvas filled with two effective area images, the effective canvas area is traversed from top to bottom and from left to right; the intersection of the leftmost vertical edge of the effective canvas area with the lower horizontal edge of a filled effective area image, and the intersection of the right vertical edge of a filled effective area image with the uppermost horizontal edge of the effective canvas area, are respectively selected as placement points. In the same way, placement points in the effective canvas area are found for a canvas filled with two or more effective area images.
Step 3: judging whether an initial gallery contains an effective area image capable of being used for filling an effective canvas area or not; if yes, executing the step 4; if not, go to step 5.
Step 4: and selecting an effective area image which can be used for filling an effective canvas area from the initial gallery, selecting a placement point from at least one placement point, filling the effective area image at a position corresponding to the placement point, and then jumping to execute the step 2.
Specifically, it is judged whether the initial gallery contains an effective area image that can be used to fill the effective canvas area. If so, one effective area image is selected from the initial gallery, a first placement point is selected, and filling of the effective canvas area begins; the selected effective area image must not exceed the range of the effective canvas area. After the image is filled in, step 2 is executed again, and the effective canvas area and all placement points in the canvas are recalculated. For a canvas with multiple placement points, all placement points can be traversed from top to bottom and from left to right, and the uppermost (or leftmost) placement point of the canvas is selected to begin filling the effective canvas area.
Step 5: and generating a combined image according to the filled effective area image in the canvas, and determining the coordinate position information of the effective area image in the combined image.
Specifically, it is judged whether the initial gallery contains an effective area image that can be used to fill the effective canvas area. If not, that is, if no effective area image that fits the effective canvas area can be found, the canvas is considered complete; the remaining effective area images are filled onto further canvases according to the same flow, and the permutation-and-combination process ends after all effective area images have been filled in (a simplified sketch of this packing loop is given below).
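The following is a simplified sketch of the packing loop of steps 1-5. It starts from one placement point at the canvas origin and, after each image is filled in, spawns new candidate points at the image's top-right and bottom-left corners; this point-spawning rule and the overlap test stand in for the exact edge-intersection rule of the description and are assumptions made to keep the sketch short.

def pack_images(canvas_w, canvas_h, sizes):
    # sizes: list of (w, h) effective-area-image sizes.
    # Returns ((index, x, y) placements, indexes left for the next canvas).
    points = [(0, 0)]
    placed = []        # (x, y, w, h) regions already filled
    placements = []    # coordinate position information of each filled image
    remaining = list(range(len(sizes)))

    def fits(x, y, w, h):
        if x + w > canvas_w or y + h > canvas_h:
            return False  # must not exceed the range of the canvas
        return all(x + w <= px or px + pw <= x or y + h <= py or py + ph <= y
                   for px, py, pw, ph in placed)

    progress = True
    while progress and remaining:
        progress = False
        for pt in sorted(points):          # leftmost, then topmost point first
            for idx in remaining:
                w, h = sizes[idx]
                if fits(pt[0], pt[1], w, h):
                    placed.append((pt[0], pt[1], w, h))
                    placements.append((idx, pt[0], pt[1]))
                    remaining.remove(idx)
                    points.remove(pt)
                    points += [(pt[0] + w, pt[1]), (pt[0], pt[1] + h)]
                    progress = True
                    break
            if progress:
                break
    return placements, remaining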
Further, there are many possible arrangement results: in the worst case each canvas holds only one effective area image, and the optimal result is that all effective area images are placed on the same canvas. The smaller the number of combined images, the smaller the amount of computation performed by the neural network analysis model; the number of final combined images is the result of this permutation and combination, and is expressed in terms of

C(n, m) = m! / (n! · (m − n)!)

where C(n, m) denotes the permutation-and-combination count, i.e., the number of ways of selecting n items from m. For a given combined image, if two combinations contain identical effective area images in a different arrangement order, the duplicate images are removed. According to this combined-image count, the arrangement with the minimum number of combined images is taken as the result of the permutation and combination, and the coordinate position of each effective area image in its combined image is obtained.
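As a small numerical check of the combination notation above, Python's standard library computes C(n, m) directly; the concrete numbers are illustrative only.

import math

# C(n, m): choosing n = 2 effective area images out of m = 5 gives 10 combinations.
print(math.comb(5, 2))  # -> 10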
S104: and analyzing and processing each combined image in the recombined video by utilizing the neural network analysis model to obtain a recombined video analysis result.
Training of a neural network analysis model is usually based on a fixed video resolution: models are generally trained on 720P (1280×720) video from close-range cameras, while 1080P (1920×1080) or 4K video may be used for long-range scenes. A model trained on video of a given resolution therefore also requires an input video source of matching resolution at inference time. Through the reduction of the video to be processed in steps S101-S103, the resolution of each obtained effective area image is smaller than that of the original video to be processed; the effective area images are gathered together and recombined into images at the video source resolution (i.e., combined images), the plurality of combined images is assembled into a virtual camera video stream, i.e., the recombined video, and the recombined video is output for AI computation.
S105: and carrying out reduction processing on the reconstructed video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
For the recombined video obtained in step S104, i.e., the virtual camera video stream, the analysis result must be mapped back to the original positions in the video to be processed according to the coordinate position information of each effective area image in the combined image; for example, the position of each effective area image in the video to be processed is restored to its original shooting range, thereby obtaining the target video analysis result. Subsequent business processing, such as line-crossing counting of pedestrians and vehicles, is then performed for the video to be processed according to business requirements.
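The restoration itself amounts to two coordinate translations, as the sketch below shows: a result box located in a combined image is first shifted by the placement offset of its effective area image within that combined image, then by the offset of the effective area within the original frame image. The argument conventions are assumptions of the sketch.

def restore_box(result_box, placement_xy, effective_area_xy):
    # result_box: (x, y, w, h) found by the model in the combined image.
    # placement_xy: where the effective area image was placed in the combined image.
    # effective_area_xy: where the effective area sits in the original frame image.
    x, y, w, h = result_box
    px, py = placement_xy
    ex, ey = effective_area_xy
    return (x - px + ex, y - py + ey, w, h)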
With the method provided by this embodiment, without adding hardware, the effective area of the video to be processed is dynamically calculated by extracting the effective area, and the effective area images are then arranged and combined to obtain the recombined video. The computer can thus determine the effective area automatically according to the detection algorithm, and merging multiple images into the combined images of the recombined video through the combination algorithm improves the efficiency of AI computation: the number of computations in the video processing process is reduced, existing hardware resources are fully utilized, and video processing efficiency is improved. At the same time, the target video analysis result is obtained by restoring the recombined video analysis result, so video processing precision is guaranteed; the overall method flow adapts to dynamic and complex execution environments, does not affect the precision of machine learning computation, and solves the problems of added hardware requirements and reduced precision caused by the various existing methods.
Fig. 4 shows a schematic structural diagram of an embodiment of a video processing apparatus according to the present invention. As shown in fig. 4, the apparatus includes: a frame image acquisition module 401, an extraction module 402, a video reorganization module 403, an analysis module 404, and a restoration module 405.
The frame image obtaining module 401 is configured to obtain each frame image in the video to be processed.
And an extraction module 402, configured to extract a corresponding effective area image from each frame image.
In an alternative way, the extraction module 402 is further configured to: for each frame image, detecting a target object contained in the frame image, and determining effective area information of the frame image; and extracting an effective area image from the frame image according to the effective area information.
In an alternative way, the extraction module 402 is further configured to: for each frame image, perform recognition processing on the frame image and determine the tracking frame corresponding to each target object in the frame image, where the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image; and determine the minimum rectangular envelope area of the tracking frames corresponding to all target objects in the frame image, and take the area information of the minimum rectangular envelope area as the effective area information of the frame image.
The video reorganization module 403 is configured to arrange and combine the effective area images corresponding to each frame image and determine the coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images.
In an alternative manner, the video reorganization module 403 is further configured to: add the effective area image corresponding to each frame image into an initial gallery; according to a preset combination rule, arrange and combine the effective area images in the initial gallery to generate a plurality of combined images, and determine the coordinate position information of the effective area images within each combined image; and summarize all the combined images to obtain the recombined video.
In an alternative manner, for a combined image generation process, the video reorganization module 403 is further configured to:
step 1: creating a canvas;
step 2: determining an effective canvas area in the canvas, and searching for at least one placement point in the effective canvas area; the effective canvas area is a blank area in the canvas, and a placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area;
step 3: judging whether an initial gallery contains an effective area image capable of being used for filling an effective canvas area or not; if yes, executing the step 4; if not, executing the step 5;
step 4: selecting an effective area image capable of being used for filling an effective canvas area from an initial gallery, selecting a placement point from at least one placement point, filling the effective area image at a position corresponding to the placement point, and then jumping to execute the step 2;
step 5: and generating a combined image according to the filled effective area image in the canvas, and determining the coordinate position information of the effective area image in the combined image.
And the analysis module 404 is used for analyzing and processing each combined image in the recombined video by utilizing the neural network analysis model to obtain a recombined video analysis result.
And the restoration module 405 is configured to restore the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result.
With the device provided by this embodiment, each frame image in the video to be processed is acquired; a corresponding effective area image is extracted from each frame image; the effective area images corresponding to each frame image are arranged and combined, and the coordinate position information of each effective area image in its combined image is determined, to obtain a recombined video containing a plurality of combined images; each combined image in the recombined video is analyzed with a neural network analysis model to obtain a recombined video analysis result; and the recombined video analysis result is restored according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result. Without adding hardware, the device extracts the effective area of the video to be processed and then arranges and combines the effective area images into a recombined video, thereby reducing the number of computations in the video processing process, making full use of existing hardware resources and improving video processing efficiency; at the same time, the target video analysis result is obtained by restoring the recombined video analysis result, so video processing precision is guaranteed.
An embodiment of the invention provides a non-volatile computer storage medium storing at least one executable instruction, which can execute the video processing method in any of the above method embodiments.
The executable instructions may specifically be configured to cause the processor to:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to each frame image, and determining coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain a recombined video analysis result;
and restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result.
FIG. 5 illustrates a schematic diagram of an embodiment of a computing device of the present invention, and the embodiments of the present invention are not limited to a particular implementation of the computing device.
As shown in fig. 5, the computing device may include: a processor (processor), a communication interface (Communications Interface), a memory (memory), and a communication bus.
The processor, communication interface, and memory communicate with each other via the communication bus. The communication interface is used for communicating with network elements of other devices, such as clients or other servers. The processor is used for executing the program, and may specifically perform the relevant steps in the above video processing method embodiment.
In particular, the program may include program code including computer-operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory is used for storing the program. The memory may comprise high-speed RAM memory, and may further comprise non-volatile memory, such as at least one disk memory.
The program may specifically be configured to cause the processor to:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arrange and combine the effective area images corresponding to each frame image, and determine coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain a recombined video analysis result;
and restore the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It should be understood that the teachings of the invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is given to disclose the best mode of carrying out the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting their order of execution unless specifically stated.

Claims (9)

1. A video processing method, comprising the steps of:
acquiring each frame image in a video to be processed;
extracting a corresponding effective area image from each frame image; wherein the effective area image is determined according to the minimum rectangular envelope area of the tracking frames corresponding to all target objects to be detected in each frame image;
arranging and combining the effective area images corresponding to each frame image, and determining coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model, performing target object detection on each combined image, and determining a recombined video analysis result according to the target object detection result;
restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result;
the method for obtaining the recombined video containing a plurality of combined images further comprises the steps of: adding the effective area image corresponding to each frame image into an initial gallery;
according to a preset combination rule, arranging and combining the effective area images in the initial gallery to generate a plurality of combined images, and determining the coordinate position information of the effective area images within each combined image; summarizing all the combined images to obtain the recombined video;
wherein the preset combination rule is: arranging and combining all effective area images in the initial gallery so as to fill a pre-created canvas, with no arrangement result exceeding the range of the canvas.
2. The method of claim 1, wherein extracting the corresponding active area image from each frame image further comprises:
for each frame image, detecting a target object contained in the frame image, and determining effective area information of the frame image;
and extracting an effective area image from the frame image according to the effective area information.
3. The method according to claim 2, wherein for each frame image, detecting a target object included in the frame image, and determining the effective area information of the frame image further comprises:
for each frame image, performing recognition processing on the frame image, and determining a tracking frame corresponding to each target object in the frame image; wherein the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image;
and determining a minimum rectangular envelope area of the tracking frame corresponding to all target objects in the frame image, and taking the area information of the minimum rectangular envelope area as the effective area information of the frame image.
4. The method of claim 1, wherein the generating of a combined image comprises:
step 1: creating a canvas;
step 2: determining an effective canvas area in the canvas, and searching at least one placement point in the effective canvas area; the effective canvas area is a blank area in the canvas, and the placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area;
step 3: judging whether the initial gallery contains an effective area image which can be used for filling the effective canvas area or not; if yes, executing the step 4; if not, executing the step 5;
step 4: selecting an effective area image capable of being used for filling the effective canvas area from the initial gallery, selecting a placement point from at least one placement point, filling the effective area image at a position corresponding to the placement point, and then jumping to execute the step 2;
step 5: and generating a combined image according to the filled effective area image in the canvas, and determining the coordinate position information of the effective area image in the combined image.
5. A video processing apparatus, comprising:
the frame image acquisition module is used for acquiring each frame image in the video to be processed;
the extraction module is used for extracting a corresponding effective area image from each frame image; wherein the effective area image is determined according to the minimum rectangular envelope area of the tracking frames corresponding to all target objects to be detected in each frame image;
the video reorganization module is used for arranging and combining the effective area images corresponding to each frame image, and determining the coordinate position information of each effective area image in its combined image, to obtain a recombined video containing a plurality of combined images;
the analysis module is used for analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model, performing target object detection on each combined image, and determining a recombined video analysis result according to the target object detection result;
the restoration module is used for restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result;
wherein the video reorganization module is further configured to: add the effective area image corresponding to each frame image into an initial gallery; according to a preset combination rule, arrange and combine the effective area images in the initial gallery to generate a plurality of combined images, and determine the coordinate position information of the effective area images within each combined image; and summarize all the combined images to obtain the recombined video;
wherein the preset combination rule is: arranging and combining all effective area images in the initial gallery so as to fill a pre-created canvas, with no arrangement result exceeding the range of the canvas.
6. The apparatus of claim 5, wherein the extraction module is further to:
for each frame image, detecting a target object contained in the frame image, and determining effective area information of the frame image;
and extracting an effective area image from the frame image according to the effective area information.
7. The apparatus of claim 6, wherein the extraction module is further to:
for each frame image, performing recognition processing on the frame image, and determining a tracking frame corresponding to each target object in the frame image; wherein the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image;
and determining a minimum rectangular envelope area of the tracking frame corresponding to all target objects in the frame image, and taking the area information of the minimum rectangular envelope area as the effective area information of the frame image.
8. A computing device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the video processing method according to any one of claims 1 to 4.
9. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the video processing method of any one of claims 1-4.
CN202010581473.0A 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium Active CN113840169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581473.0A CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010581473.0A CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113840169A CN113840169A (en) 2021-12-24
CN113840169B true CN113840169B (en) 2023-09-19

Family

ID=78964058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581473.0A Active CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113840169B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115689277B * 2022-10-12 2024-05-07 北京思路智园科技有限公司 Chemical industry park risk early-warning system based on cloud-edge collaboration technology


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8032840B2 (en) * 2006-01-10 2011-10-04 Nokia Corporation Apparatus, method and computer program product for generating a thumbnail representation of a video sequence
US11423548B2 (en) * 2017-01-06 2022-08-23 Board Of Regents, The University Of Texas System Segmenting generic foreground objects in images and videos

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845338A * 2016-12-13 2017-06-13 深圳市智美达科技股份有限公司 Pedestrian detection method and system in video stream
JP2018207178A (en) * 2017-05-30 2018-12-27 キヤノン株式会社 Imaging apparatus, control method and program of imaging apparatus
CN108171716A * 2017-12-25 2018-06-15 北京奇虎科技有限公司 Method and device for dressing up video characters based on adaptive tracking-frame segmentation
CN108460817A * 2018-01-23 2018-08-28 维沃移动通信有限公司 Image splicing method and mobile terminal
WO2020098158A1 (en) * 2018-11-14 2020-05-22 平安科技(深圳)有限公司 Pedestrian re-recognition method and apparatus, and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semantic segmentation and stitching of traffic surveillance video images; Liu Sichao; Wu Pengda; Zhao Zhanjie; Li Chengming; Acta Geodaetica et Cartographica Sinica (No. 04); full text *

Also Published As

Publication number Publication date
CN113840169A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
US10936911B2 (en) Logo detection
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN109543549B (en) Image data processing method and device for multi-person posture estimation, mobile terminal equipment and server
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
CN108875537B (en) Object detection method, device and system and storage medium
CN107959798B (en) Video data real-time processing method and device and computing equipment
CN110991560A (en) Target detection method and system in combination with context information
CN112508989B (en) Image processing method, device, server and medium
WO2023207778A1 (en) Data recovery method and device, computer, and storage medium
CN112101344B (en) Video text tracking method and device
CN114863539A (en) Portrait key point detection method and system based on feature fusion
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN112132164A (en) Target detection method, system, computer device and storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
US20220036106A1 (en) Method and apparatus for data calculation in neural network model, and image processing method and apparatus
CN113840169B (en) Video processing method, device, computing equipment and storage medium
CN108734712B (en) Background segmentation method and device and computer storage medium
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
Jiang et al. Improve object detection by data enhancement based on generative adversarial nets
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN111652181A (en) Target tracking method and device and electronic equipment
CN116152334A (en) Image processing method and related equipment
CN115760888A (en) Image processing method, image processing device, computer and readable storage medium
CN112101330B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN114581746B (en) Object detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant