
Video processing method and device, computing equipment and storage medium

Info

Publication number
CN113840169A
Authority
CN
China
Prior art keywords: image, effective area, video, frame image, images
Prior art date
Legal status: Granted
Application number
CN202010581473.0A
Other languages
Chinese (zh)
Other versions
CN113840169B (en)
Inventor
权雪菲
吴昊男
栾媛媛
郭庆
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Liaoning Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Liaoning Co Ltd
Priority to CN202010581473.0A
Publication of CN113840169A
Application granted
Publication of CN113840169B
Status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video processing method and apparatus, a computing device, and a storage medium, wherein the method comprises the following steps: acquiring each frame image in a video to be processed; extracting a corresponding effective area image from each frame image; arranging and combining the effective area images corresponding to the frame images to obtain a recombined video containing a plurality of combined images; analyzing each combined image in the recombined video to obtain an analysis result of the recombined video; and restoring the analysis result of the recombined video to obtain the analysis result of the target video. Without adding hardware, the invention reduces the number of computations in the video processing procedure, makes full use of existing hardware resources, and improves video processing efficiency; at the same time, because the target video analysis result is obtained by restoring the recombined video result, video processing precision is guaranteed.

Description

Video processing method and device, computing equipment and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video processing method, an apparatus, a computing device, and a storage medium.
Background
Artificial Intelligence (AI) technology is developing continuously, the precision and performance of machine learning keep improving, and machine-learning-driven technology is accelerating change in many industries and creating new business opportunities and value. AI has developed rapidly in image recognition, and processing speed and precision can be greatly improved in video analysis. A video processing service usually requires the combined computation of several algorithms and models to complete the analysis of a video image.
There are currently four approaches to AI-based accelerated video processing. (1) Cloud Graphics Processing Unit (GPU) acceleration: the cloud generally uses physical or virtualized GPU virtual machines provided by a cloud service provider to supply the computing capability of the Compute Unified Device Architecture (CUDA) platform, and the data to be computed must be transmitted to a cloud host, where the AI inference task is performed directly in the cloud. (2) Edge processing: considering factors such as convenience of deployment and energy consumption, processing is mostly performed directly on a mobile terminal or an edge device; for latency reasons, low-latency visual recognition is generally placed on the edge device, which is typically low-power and miniaturized and therefore provides limited AI capability. (3) Cloud-edge collaborative hybrid processing: when the AI computing power of the edge device is insufficient, preprocessing such as feature extraction can be performed at the edge, and the remaining compute-intensive tasks are transmitted to a cloud server for processing, so that the cloud and the edge collaboratively accelerate the performance and efficiency of the overall AI computation. (4) Software acceleration: some software solutions accelerate the flow of neural network computation by optimizing the adaptation of the AI algorithm to the hardware, for example accelerating convolution computation on a multi-core Central Processing Unit (CPU) platform.
However, AI computation at present is mainly Convolutional Neural Network (CNN) computation, which consumes substantial hardware resources, so AI computation often has to be accelerated by hardware. CNN acceleration mainly relies on the GPU, so hardware acceleration is chiefly achieved by increasing the number of CUDA cores. Optimization on the software side is mainly applied to the model: the number of CNN layers is reduced by pruning the model, but pruning usually brings a loss of precision, making the computation result inaccurate.
Disclosure of Invention
In view of the above, the present invention has been made to provide a video processing method, apparatus, computing device and storage medium that overcome or at least partially address the above-mentioned problems.
According to an aspect of the present invention, there is provided a video processing method including the steps of:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to the frame images, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images;
analyzing each combined image in the recombined video by using a neural network analysis model to obtain an analysis result of the recombined video;
and restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
According to another aspect of the present invention, there is provided a video processing apparatus including:
the frame image acquisition module is used for acquiring each frame image in the video to be processed;
the extraction module is used for extracting corresponding effective area images from each frame image;
the video recombination module is used for arranging and combining the effective area images corresponding to the frame images, determining the coordinate position information of each effective area image in the combined image and obtaining a recombined video containing a plurality of combined images;
the analysis module is used for analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain an analysis result of the recombined video;
and the restoration module is used for restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the video processing method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the video processing method as described above.
According to the video processing method and apparatus, computing device, and storage medium provided by the present invention, each frame image in a video to be processed is acquired; a corresponding effective area image is extracted from each frame image; the effective area images corresponding to the frame images are arranged and combined, and the coordinate position information of each effective area image in its combined image is determined, to obtain a recombined video containing a plurality of combined images; each combined image in the recombined video is analyzed using a neural network analysis model to obtain an analysis result of the recombined video; and the analysis result of the recombined video is restored according to the coordinate position information of each effective area image in its combined image and the effective area information of each effective area image in its frame image, to obtain the target video analysis result. Without adding hardware, the method extracts the effective area of the video to be processed and arranges and combines the effective area images into a recombined video, which reduces the number of computations during video processing, makes full use of existing hardware resources, and improves video processing efficiency; at the same time, the target video analysis result is obtained by restoring the recombined video result, so video processing precision is guaranteed.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a flow chart of a video processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of extracting the corresponding effective area image from a frame image according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of canvas placement points provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the present invention, the AI processing flow of a video is as follows:
1. video decoding: the video stream is transmitted to a CPU or a GPU for decoding, images of each frame are extracted, the video frame rate which generally meets the real-time processing requirement is 24 frames per second, and the video frame rate corresponds to 24 images per second.
2. Target Detection: detection locates the coordinate position of the target object in a frame of image data, for example by inputting the frame image into an AI model for detection and recognition. A common target detection algorithm is You Only Look Once (YOLO), a public method whose models can be trained on whatever objects the user actually needs to recognize: to recognize people, a human detection model is trained with YOLO; to recognize trees, a tree detection model is trained with YOLO. For each target object detected in a frame image, a bounding box (the minimum rectangular frame containing the target image) is output and the effective area image is extracted; this cropping process is generally performed in the GPU to accelerate processing.
3. Feature extraction (Extract): the feature vector data of the effective area image is extracted. The feature vector is a mathematical representation of the feature values of the image; for example, a person wearing yellow clothes and a person wearing black clothes have different feature vector values. After the effective area image (such as a pedestrian or a tree) is extracted, its feature vector is extracted by the AI model; feature extraction is generally performed in the GPU to accelerate processing.
4. Tracking: effective area images with the same features detected in each frame image are associated. The extracted feature vector data of the frame images are compared, and the same bounding boxes extracted from preceding and following frames of the video are coordinate-associated and drawn into a continuous time-series line; tracking can generally be performed directly in the CPU without GPU computation (a minimal sketch of this coordinate association appears after this list).
5. The required business data is acquired according to the business requirement. For example, for pedestrian counting, the number of pedestrians passing through a specific area is counted: a line is drawn in the video in advance to mark the specific area, and each tracking line that intersects the drawn line is taken as one counting unit according to the trajectory result of target tracking. Car detection, dangerous goods detection, and the like can be performed in the same way, associating the obtained effective area images (cars or dangerous goods) across the frame images.
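For illustration, the following is a minimal sketch of the coordinate-association part of step 4, assuming boxes are (x1, y1, x2, y2) tuples and using a greedy intersection-over-union (IoU) match; the function names and the threshold are illustrative assumptions, and the feature-vector comparison the patent also describes is omitted here.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(prev_boxes, cur_boxes, threshold=0.3):
    """Greedily match bounding boxes of consecutive frames, yielding the
    (previous_index, current_index) pairs that extend each tracking line."""
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        best, best_iou = None, threshold
        for j, c in enumerate(cur_boxes):
            score = iou(p, c)
            if j not in used and score > best_iou:
                best, best_iou = j, score
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches
```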
Fig. 1 shows a flow chart of an embodiment of a video processing method of the present invention, as shown in fig. 1, the method comprising the steps of:
s101: and acquiring each frame image in the video to be processed.
In this step, the video to be processed is transmitted to the CPU or GPU for decoding, and each frame image is extracted.
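A minimal decoding sketch for this step, assuming OpenCV as the decoder (the patent does not prescribe a particular library):

```python
import cv2  # assumed decoding library, not specified by the patent

def decode_frames(video_path):
    """Yield each frame image of the video to be processed (step S101)."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:      # end of stream
                break
            yield frame     # e.g. 24 images per second at 24 fps
    finally:
        cap.release()
```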
S102: and extracting corresponding effective area images from each frame image.
In an optional manner, step S102 further includes: detecting a target object contained in each frame image and determining effective area information of the frame image; and extracting the effective area image from the frame image according to the effective area information.
Specifically, for each frame image, recognition processing is performed on the frame image, and a tracking frame corresponding to each target object in the frame image is determined; the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image. A minimum rectangular envelope area enveloping the tracking frames corresponding to all target objects in the frame image is then determined, and the area information of the minimum rectangular envelope area is taken as the effective area information of the frame image.
The effective area images in the video to be processed are the areas where the target objects to be detected appear; for pedestrian recognition, they are the areas where pedestrians appear. The tracking frames corresponding to all target objects are obtained by continuously running the detection algorithm on the video to be processed, and the minimum rectangular envelope area enveloping the tracking frames of all target objects in the frame image is used as the effective area information of the frame image. The longer the video to be processed, the higher the accuracy of the effective area image (sampling a full 24-hour day gives better accuracy), and this process may be performed for each frame image or at preset time intervals to obtain a plurality of effective area images.
Fig. 2 is a schematic diagram of extracting the corresponding effective area image from a frame image. As shown in Fig. 2, taking a pedestrian as the target object, the coordinate position where each pedestrian appears is detected in every frame image, and a tracking frame is output for each detection; the minimum rectangular envelope area (bounding box) enveloping the tracking frames of all target objects in the frame image is then determined, and its area information is used as the effective area information of the frame image. The effective area image is cropped according to this information, and each frame image of the video to be processed, together with the coordinates of its effective area image, is input into the neural network analysis model for computation. It should be noted that the coordinates of the effective area image may be marked manually or calculated dynamically by artificial intelligence; a sketch of the envelope computation and crop follows.
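The envelope computation and crop can be sketched as follows, assuming the tracking frames are (x1, y1, x2, y2) pixel tuples and each frame image is an array as produced by common decoders; the function names are illustrative.

```python
def effective_area(tracking_boxes):
    """Minimum rectangular envelope (bounding box) enveloping all
    tracking frames of one frame image: its effective area information."""
    x1 = min(b[0] for b in tracking_boxes)
    y1 = min(b[1] for b in tracking_boxes)
    x2 = max(b[2] for b in tracking_boxes)
    y2 = max(b[3] for b in tracking_boxes)
    return (x1, y1, x2, y2)

def crop_effective_area(frame, area):
    """Cut the effective area image out of the frame image."""
    x1, y1, x2, y2 = area
    return frame[y1:y2, x1:x2]
```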
S103: and arranging and combining the effective area images corresponding to the frame images, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images.
In an optional manner, step S103 further includes: adding the effective area image corresponding to each frame image into an initial gallery; arranging and combining the effective area images in the initial gallery according to a preset combination rule to generate a plurality of combined images, and determining the coordinate position information of the effective area images within each combined image; and summarizing all the combined images to obtain the recombined video.
Further, since the effective area image is a rectangle, a plurality of effective area images can be combined to generate a combined image, and all the combined images are summarized into one video to obtain a recombined video.
In an alternative form, the generation of a combined image comprises steps 1-5:
step 1: a canvas is created.
In this step, the canvas may be a polygon, preferably a rectangle.
Step 2: determining an effective canvas area in the canvas, and searching for at least one placement point in the effective canvas area; the effective canvas area is the blank area in the canvas, and a placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area.
Fig. 3 is a schematic diagram of canvas placement points. As shown in Fig. 3, taking a rectangular canvas as an example, the figure shows, from left to right, a blank canvas, a canvas filled with one effective area image, and a canvas filled with two effective area images; the circled positions are the placement points. The placement point of the blank canvas is the top-left vertex of its blank area (i.e., the effective canvas area): the canvas is traversed from top to bottom and from left to right, and the intersection of the leftmost vertical edge and the uppermost horizontal edge of the effective canvas area is selected. Likewise, for the canvas filled with one effective area image and the canvas filled with two effective area images, the effective canvas area is traversed from top to bottom and from left to right, and the intersection of the leftmost vertical edge of the effective canvas area with the lower horizontal edge of a filled image, and the intersection of the right vertical edge of a filled image with the uppermost horizontal edge of the effective canvas area, are selected as placement points. The same search applies to any canvas filled with two or more effective area images.
Step 3: judging whether the initial gallery contains an effective area image that can be used for filling the effective canvas area; if yes, executing step 4; if not, executing step 5.
Step 4: selecting an effective area image that can be used for filling the effective canvas area from the initial gallery, selecting a placement point from the at least one placement point, filling the effective area image at the position corresponding to the placement point, and then jumping back to step 2.
Specifically, it is judged whether the initial gallery contains an effective area image that can be used to fill the effective canvas area. If so, one effective area image is selected from the initial gallery, the first placement point is selected, and filling of the effective canvas area begins; it should be noted that the selected effective area image must not exceed the range of the effective canvas area. After one picture is filled, step 2 is executed again, and the effective canvas area and all placement points in the canvas are recalculated. For a canvas with several placement points, all placement points can be traversed from top to bottom and from left to right, and the uppermost (or leftmost) one is selected as the first placement point for filling the effective canvas area.
Step 5: generating a combined image according to the effective area images filled in the canvas, and determining the coordinate position information of the effective area images in the combined image.
Specifically, if the initial gallery contains no effective area image that can be used to fill the effective canvas area, no fillable image can be found for this canvas, and the canvas is considered finished; the remaining effective area images fill new canvases according to the same procedure, and the arrangement and combination process ends once all effective area images have been placed.
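As a simplified sketch of steps 1 to 5, assuming rectangular canvases and effective area images given as (width, height) pairs, the following replaces the full placement-point search of Fig. 3 with a shelf-style fill that still traverses top to bottom and left to right; all names are illustrative, and every image is assumed to fit on an empty canvas.

```python
def pack_images(images, canvas_w, canvas_h):
    """Greedy shelf packing. `images` is a list of (width, height) pairs;
    returns a list of canvases, each a list of (image_index, x, y)
    placements: the coordinate position information of every effective
    area image in its combined image."""
    canvases, current = [], []
    x = y = shelf_h = 0
    for idx, (w, h) in enumerate(images):
        if x + w > canvas_w:            # row full: move to a new shelf
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > canvas_h:            # canvas full: close it (step 5)
            canvases.append(current)
            current, x, y, shelf_h = [], 0, 0, 0
        current.append((idx, x, y))     # step 4: fill at the placement point
        x += w
        shelf_h = max(shelf_h, h)
    if current:
        canvases.append(current)
    return canvases
```

For example, pack_images([(640, 360)] * 4, 1280, 720) places four effective area images on a single 720P canvas; each (image_index, x, y) triple is the coordinate position information that step S105 later uses for restoration.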
Further, there are many possible permutation-and-combination results: placing one effective area image per canvas is itself one such result, while the optimal result places all the effective area images on the same canvas. The smaller the number of combined images, the smaller the amount of computation performed by the neural network analysis model; the final number of combined images is the result of the permutation and combination, calculated as follows:
C(n, m) = m! / (n!(m - n)!)
wherein C(n, m) denotes the combination count, namely the number of ways of selecting n items from m items.
In addition, if two combined images contain the same effective area images merely in a different arrangement order, the duplicate is removed. The arrangement with the minimum number of combined images is selected according to the above formula, giving the coordinate position of each effective area image in its combined image.
S104: and analyzing each combined image in the recombined video by using the neural network analysis model to obtain an analysis result of the recombined video.
Training of the neural network analysis model is usually based on a fixed video resolution: video with 720P (1280x720) resolution is generally used for training for near-field cameras, while video with 1080P (1920x1080) or 4K resolution can be used for far-field cameras, and a model trained on video of a certain resolution also requires input of a consistent resolution at inference time. The video to be processed is reduced through steps S101 to S103, so the resulting images already have a resolution smaller than the original video; the obtained effective area images are collected and recombined into images of the video-source resolution (i.e., combined images), and the plurality of combined images is integrated into a virtual camera video stream, i.e., the recombined video, which is output for AI computation (a sketch of this assembly follows).
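A sketch of assembling the combined images into the virtual camera video stream, again assuming OpenCV and equally sized combined images stored as arrays; the codec and frame rate are illustrative choices.

```python
import cv2  # assumed library; codec and fps below are illustrative

def write_recombined_video(combined_images, out_path, fps=24):
    """Integrate the combined images into one virtual camera video
    stream (the recombined video) to be fed to the analysis model."""
    h, w = combined_images[0].shape[:2]
    writer = cv2.VideoWriter(out_path,
                             cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for img in combined_images:
        writer.write(img)
    writer.release()
```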
S105: and restoring the analysis result of the recombined video according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain the analysis result of the target video.
For the recombined video obtained in step S104, that is, the virtual camera video stream, the analysis result must be mapped back to the original positions in the video to be processed according to the coordinate position information of each effective area image in the combined image, for example restoring the position of the effective area image in the video to be processed to the original shooting range, to obtain the target video analysis result. Subsequent business processing, such as line-crossing counts of pedestrians and vehicles, is then performed according to the business requirements of the video to be processed. A sketch of this coordinate restoration follows.
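Under the same coordinate conventions as the sketches above, the restoration shifts a detection from combined-image coordinates back through the canvas placement and the effective area offset into original-frame coordinates; all names are illustrative.

```python
def restore_detection(det_box, placement, effective_area_box):
    """Map a detection box from combined-image coordinates back to the
    original frame of the video to be processed.

    det_box            -- (x1, y1, x2, y2) in the combined image
    placement          -- (px, py): top-left corner where the effective
                          area image was filled on the canvas
    effective_area_box -- effective area information, i.e. where the
                          crop sat in the original frame image
    """
    px, py = placement
    ex1, ey1 = effective_area_box[0], effective_area_box[1]
    x1, y1, x2, y2 = det_box
    # subtract the canvas placement, then add the original crop offset
    return (x1 - px + ex1, y1 - py + ey1, x2 - px + ex1, y2 - py + ey1)
```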
With the method provided by this embodiment, without adding hardware, the effective area of the video to be processed is dynamically calculated and extracted, and the effective area images are arranged and combined to obtain the recombined video; the computer can thus determine the effective area automatically according to the detection algorithm, and combining a plurality of combined images into the recombined video through the combination algorithm can multiply AI computing efficiency, reducing the number of computations in the video processing procedure, making full use of existing hardware resources, and improving video processing efficiency. At the same time, the target video analysis result is obtained by restoring the recombined video result, so video processing precision is guaranteed; the whole flow adapts to dynamic and complex execution environments without affecting the precision of the machine learning computation, avoiding the additional hardware requirements and precision-loss risks of the existing methods.
Fig. 4 is a schematic structural diagram of an embodiment of a video processing apparatus according to the present invention. As shown in fig. 4, the apparatus includes: a frame image acquisition module 401, an extraction module 402, a video recombination module 403, an analysis module 404, and a restoration module 405.
The frame image obtaining module 401 is configured to obtain each frame image in the video to be processed.
An extracting module 402, configured to extract corresponding effective region images from each frame image.
In an optional manner, the extraction module 402 is further configured to: detecting a target object contained in each frame image and determining effective area information of the frame image; and extracting the effective area image from the frame image according to the effective area information.
In an optional manner, the extraction module 402 is further configured to: perform recognition processing on each frame image and determine a tracking frame corresponding to each target object in the frame image, where the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image; and determine a minimum rectangular envelope area enveloping the tracking frames corresponding to all target objects in the frame image, and take the area information of the minimum rectangular envelope area as the effective area information of the frame image.
The video recombination module 403 is configured to perform permutation and combination on the effective region images corresponding to each frame image, determine coordinate position information of each effective region image in the combined image, and obtain a recombined video including multiple combined images.
In an alternative manner, the video recomposition module 403 is further configured to: add the effective area image corresponding to each frame image into an initial gallery; arrange and combine the effective area images in the initial gallery according to a preset combination rule to generate a plurality of combined images, and determine the coordinate position information of the effective area images within each combined image; and summarize all the combined images to obtain the recombined video.
In an alternative manner, for the generation process of a combined image, the video recomposition module 403 is further configured to:
step 1: creating a canvas;
step 2: determining an effective canvas area in the canvas, and searching at least one placement point in the effective canvas area; the effective canvas area is a blank area in the canvas, and the placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area;
step 3: judging whether the initial gallery contains an effective area image that can be used for filling the effective canvas area; if yes, executing step 4; if not, executing step 5;
step 4: selecting an effective area image that can be used for filling the effective canvas area from the initial gallery, selecting a placement point from the at least one placement point, filling the effective area image at the position corresponding to the placement point, and then jumping back to step 2;
step 5: generating a combined image according to the effective area images filled in the canvas, and determining the coordinate position information of the effective area images in the combined image.
And the analysis module 404 is configured to analyze and process each combined image in the recombined video by using the neural network analysis model to obtain an analysis result of the recombined video.
And a restoring module 405, configured to restore the reconstructed video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image, to obtain a target video analysis result.
With the apparatus provided by this embodiment, each frame image in the video to be processed is acquired; a corresponding effective area image is extracted from each frame image; the effective area images corresponding to the frame images are arranged and combined, and the coordinate position information of each effective area image in its combined image is determined, to obtain a recombined video containing a plurality of combined images; each combined image in the recombined video is analyzed using the neural network analysis model to obtain an analysis result of the recombined video; and the analysis result of the recombined video is restored according to the coordinate position information of each effective area image in its combined image and the effective area information of each effective area image in its frame image, to obtain the target video analysis result. Without adding hardware, the apparatus extracts the effective area of the video to be processed and arranges and combines the effective area images into a recombined video, which reduces the number of computations during video processing, makes full use of existing hardware resources, and improves video processing efficiency; at the same time, the target video analysis result is obtained by restoring the recombined video result, so video processing precision is guaranteed.
An embodiment of the present invention provides a non-volatile computer storage medium, where at least one executable instruction is stored in the computer storage medium, and the computer executable instruction may execute the video processing method in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to the frame images, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images;
analyzing each combined image in the recombined video by using a neural network analysis model to obtain an analysis result of the recombined video;
and restoring the analysis result of the recombined video according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain the analysis result of the target video.
Fig. 5 is a schematic structural diagram of an embodiment of a computing device according to the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 5, the computing device may include: a processor (processor), a Communications Interface (Communications Interface), a memory (memory), and a Communications bus.
Wherein: the processor, the communication interface, and the memory communicate with each other via a communication bus. A communication interface for communicating with network elements of other devices, such as clients or other servers. And the processor is used for executing the program, and specifically can execute the relevant steps in the video processing method embodiment.
In particular, the program may include program code comprising computer operating instructions.
The processor may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The server comprises one or more processors, which can be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.
And the memory is used for storing programs. The memory may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program may specifically be adapted to cause a processor to perform the following operations:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to the frame images, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images;
analyzing each combined image in the recombined video by using a neural network analysis model to obtain an analysis result of the recombined video;
and restoring the analysis result of the recombined video according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain the analysis result of the target video.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless otherwise specified.

Claims (10)

1. A video processing method, comprising the steps of:
acquiring each frame image in a video to be processed;
extracting corresponding effective area images from each frame image;
arranging and combining the effective area images corresponding to the frame images, and determining the coordinate position information of each effective area image in the combined image to obtain a recombined video containing a plurality of combined images;
analyzing each combined image in the recombined video by using a neural network analysis model to obtain an analysis result of the recombined video;
and restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
2. The method of claim 1, wherein extracting the corresponding active area image from each frame image further comprises:
detecting a target object contained in each frame image and determining effective area information of the frame image;
and extracting an effective area image from the frame image according to the effective area information.
3. The method according to claim 2, wherein the detecting a target object included in each frame image, and the determining the effective area information of the frame image further comprises:
performing recognition processing on each frame image, and determining a tracking frame corresponding to each target object in the frame image; wherein the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image;
and determining a minimum rectangular envelope area for enveloping the tracking frames corresponding to all target objects in the frame image, and taking the area information of the minimum rectangular envelope area as the effective area information of the frame image.
4. The method according to any one of claims 1 to 3, wherein the arranging and combining the effective area images corresponding to the respective frame images, and determining coordinate position information of the respective effective area images in the combined image to obtain the recombined video including a plurality of combined images further comprises:
adding the effective area image corresponding to each frame image into an initial gallery;
arranging and combining the effective area images in the initial gallery according to a preset combination rule to generate a plurality of combined images, and determining coordinate position information of the effective area images in each combined image in the combined image;
and summarizing all the combined images to obtain the recombined video.
5. The method of claim 4, wherein the generating of a combined image comprises:
step 1: creating a canvas;
step 2: determining an effective canvas area in the canvas, and searching at least one placement point in the effective canvas area; the effective canvas area is a blank area in the canvas, and the placement point is an intersection point of a horizontal edge and a vertical edge in the effective canvas area;
and step 3: judging whether the initial gallery contains an effective area image which can be used for filling the effective canvas area; if yes, executing step 4; if not, executing the step 5;
and 4, step 4: selecting an effective area image which can be used for filling the effective canvas area from the initial gallery, selecting a placing point from at least one placing point, filling the effective area image at the position corresponding to the placing point, and then jumping to execute the step 2;
and 5: and generating a combined image according to the filled effective area images in the canvas, and determining the coordinate position information of the effective area images in the combined image.
6. A video processing apparatus, comprising:
the frame image acquisition module is used for acquiring each frame image in the video to be processed;
the extraction module is used for extracting corresponding effective area images from each frame image;
the video recombination module is used for arranging and combining the effective area images corresponding to the frame images, determining the coordinate position information of each effective area image in the combined image and obtaining a recombined video containing a plurality of combined images;
the analysis module is used for analyzing and processing each combined image in the recombined video by utilizing a neural network analysis model to obtain an analysis result of the recombined video;
and the restoration module is used for restoring the recombined video analysis result according to the coordinate position information of each effective area image in the combined image and the effective area information of each effective area image in the frame image to obtain a target video analysis result.
7. The apparatus of claim 6, wherein the extraction module is further configured to:
detecting a target object contained in each frame image and determining effective area information of the frame image;
and extracting an effective area image from the frame image according to the effective area information.
8. The apparatus of claim 7, wherein the extraction module is further configured to:
performing recognition processing on each frame image, and determining a tracking frame corresponding to each target object in the frame image; wherein the tracking frame corresponding to each target object completely encloses the foreground image of that target object in the frame image;
and determining a minimum rectangular envelope area for enveloping the tracking frames corresponding to all target objects in the frame image, and taking the area information of the minimum rectangular envelope area as the effective area information of the frame image.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video processing method according to any one of claims 1-5.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the video processing method of any one of claims 1-5.
CN202010581473.0A 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium Active CN113840169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581473.0A CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010581473.0A CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113840169A (en) 2021-12-24
CN113840169B CN113840169B (en) 2023-09-19

Family

ID=78964058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581473.0A Active CN113840169B (en) 2020-06-23 2020-06-23 Video processing method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113840169B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070162873A1 (en) * 2006-01-10 2007-07-12 Nokia Corporation Apparatus, method and computer program product for generating a thumbnail representation of a video sequence
CN106845338A (en) * 2016-12-13 2017-06-13 深圳市智美达科技股份有限公司 Pedestrian detection method and system in video flowing
US20190355128A1 (en) * 2017-01-06 2019-11-21 Board Of Regents, The University Of Texas System Segmenting generic foreground objects in images and videos
JP2018207178A (en) * 2017-05-30 2018-12-27 キヤノン株式会社 Imaging apparatus, control method and program of imaging apparatus
CN108171716A (en) * 2017-12-25 2018-06-15 北京奇虎科技有限公司 Video personage based on the segmentation of adaptive tracing frame dresss up method and device
CN108460817A (en) * 2018-01-23 2018-08-28 维沃移动通信有限公司 A kind of pattern splicing method and mobile terminal
WO2020098158A1 (en) * 2018-11-14 2020-05-22 平安科技(深圳)有限公司 Pedestrian re-recognition method and apparatus, and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Sichao; WU Pengda; ZHAO Zhanjie; LI Chengming: "Semantic segmentation and stitching of traffic surveillance video images", Acta Geodaetica et Cartographica Sinica (测绘学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115689277A (en) * 2022-10-12 2023-02-03 北京思路智园科技有限公司 Chemical industry park risk early warning system under cloud limit collaborative technology
CN115689277B (en) * 2022-10-12 2024-05-07 北京思路智园科技有限公司 Chemical industry park risk early-warning system based on cloud-edge collaboration technology

Also Published As

Publication number Publication date
CN113840169B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
US10936911B2 (en) Logo detection
CN109543549B (en) Image data processing method and device for multi-person posture estimation, mobile terminal equipment and server
CN108875537B (en) Object detection method, device and system and storage medium
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
CN111709406B (en) Text line identification method and device, readable storage medium and electronic equipment
CN112232293A (en) Image processing model training method, image processing method and related equipment
CN111242122B (en) Lightweight deep neural network rotating target detection method and system
CN112862874A (en) Point cloud data matching method and device, electronic equipment and computer storage medium
CN114066718A (en) Image style migration method and device, storage medium and terminal
EP4211651A1 (en) Efficient three-dimensional object detection from point clouds
WO2023207778A1 (en) Data recovery method and device, computer, and storage medium
CN114863539A (en) Portrait key point detection method and system based on feature fusion
JP2022173321A (en) Object detection method, apparatus, device, medium, and program
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN113688839A (en) Video processing method and device, electronic equipment and computer readable storage medium
CN113840169B (en) Video processing method, device, computing equipment and storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN112734827A (en) Target detection method and device, electronic equipment and storage medium
CN111652181A (en) Target tracking method and device and electronic equipment
CN111062473A (en) Data calculation method, image processing method and device in neural network model
CN112651351B (en) Data processing method and device
WO2021237727A1 (en) Method and apparatus of image processing
CN113033337A (en) TensorRT-based pedestrian re-identification method and device
CN117830305B (en) Object measurement method, device, equipment and medium
CN113705690B (en) Face positioning method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant