CN116680438B - Video concentration method, system, storage medium and electronic equipment - Google Patents

Video concentration method, system, storage medium and electronic equipment

Info

Publication number
CN116680438B
CN116680438B (application CN202310542297.3A; earlier publication CN116680438A)
Authority
CN
China
Prior art keywords
video
frame image
video frame
read
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310542297.3A
Other languages
Chinese (zh)
Other versions
CN116680438A (en)
Inventor
蒋铭
梅雨
杨广学
段红伟
孙禄明
杨海松
周成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panoramic Zhilian Wuhan Technology Co ltd
Original Assignee
Panoramic Zhilian Wuhan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panoramic Zhilian Wuhan Technology Co ltd filed Critical Panoramic Zhilian Wuhan Technology Co ltd
Priority to CN202310542297.3A priority Critical patent/CN116680438B/en
Publication of CN116680438A publication Critical patent/CN116680438A/en
Application granted granted Critical
Publication of CN116680438B publication Critical patent/CN116680438B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0127 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G06F 16/732 Query formulation
    • G06F 16/7335 Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

A video concentration method, system, storage medium and electronic device relate to the technical field of video processing. The method comprises the following steps: acquiring a video to be processed, and determining a target area video to be detected in the video to be processed; detecting a target object in the target area video based on a deep learning target detection model and the frame skip interval number corresponding to the video to be processed, so as to obtain at least one video frame image containing the target object; and generating a concentrated video according to each video frame image. By implementing the technical scheme provided by the application, the efficiency of video information processing is improved.

Description

Video concentration method, system, storage medium and electronic equipment
Technical Field
The application relates to the technical field of video processing, in particular to a video concentration method, a video concentration system, a storage medium and electronic equipment.
Background
A surveillance camera records the changes in a specific scene minute by minute and second by second, and the captured surveillance video contains rich information. Surveillance video is therefore widely applied in fields such as national defense and military affairs, intelligent security, traffic control and home monitoring, provides important data support for related technologies, and is an extremely important form of data.
With the rapid development of smart-community construction and the intelligent security industry, intelligent video surveillance technology has been widely applied in community security engineering. Large numbers of security cameras are deployed in public areas of communities, especially at community entrance checkpoints, to monitor important event targets such as pedestrians and vehicles entering and leaving the community around the clock. Information can be obtained intuitively through video surveillance, but as society enters the age of informatization and big data, the volume of surveillance video is growing geometrically and contains a large amount of redundant background information. Storing and retrieving such massive video data therefore presents problems: for example, viewing and retrieving community surveillance consumes a great deal of labor cost and working time, which reduces the efficiency of video information processing.
Disclosure of Invention
The application provides a video concentration method, a video concentration system, a storage medium and electronic equipment, which have the effect of improving the efficiency of video information processing.
In a first aspect, the present application provides a video concentrating method, which adopts the following technical scheme:
acquiring a video to be processed, and determining a target area video to be detected in the video to be processed;
detecting a target object in the target area video based on a deep learning target detection model and the frame skip interval number corresponding to the video to be processed, so as to obtain at least one video frame image containing the target object;
and generating a concentrated video according to each video frame image.
By adopting the technical scheme, the target area video to be detected is determined in the original video to be processed, which reduces the detection of objects outside the target area and improves the video concentration effect. The target objects in the target area video are detected based on the deep learning target detection model and the frame skip interval number, so at least one video frame image containing a target object can be obtained, and the concentrated video is generated from these video frame images. The concentrated video thus contains the video frame images corresponding to all target objects in the target area video, the file size of the original video is reduced, the duration of the original video is shortened, and the efficiency of video information processing is further improved.
Optionally, before the obtaining the video to be processed, the method further includes: constructing an initial deep learning target detection model; inputting a video sample into the initial deep learning target detection model for training; and calculating a loss value in the training process based on the loss function, and stopping training when the loss value is lower than a loss threshold value to obtain a deep learning target detection model after training is completed.
By adopting the technical scheme, the initial deep learning target detection model is trained, the loss value during training is calculated, and training is stopped when the loss value falls below the loss threshold, so that a deep learning target detection model with higher accuracy is obtained and the detection of target objects is improved.
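For illustration, the following is a minimal Python sketch of the training procedure described above; it assumes a PyTorch-style model, data loader and loss function supplied by the caller, and the names `loss_threshold` and `max_epochs` are illustrative rather than values fixed by the application.

```python
import torch

def train_detector(model, dataloader, loss_fn, loss_threshold, max_epochs=100, lr=1e-3):
    """Train the initial deep learning target detection model on video samples and
    stop once the average loss drops below the loss threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for frames, labels in dataloader:            # annotated video frame samples
            optimizer.zero_grad()
            loss = loss_fn(model(frames), labels)    # e.g. box + class + objectness loss
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / max(len(dataloader), 1) < loss_threshold:
            break                                    # stop training below the loss threshold
    return model
```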
Optionally, the obtaining the video to be processed and determining the target area video to be detected in the video to be processed include: acquiring a video to be processed sent by a monitoring terminal; determining a target area to be detected of each frame of image in the video to be processed according to a preset effective area; and reducing each frame of image corresponding to the target area to a first resolution in the video to be processed to obtain the target area video.
By adopting the technical scheme, the original video to be processed may cover a larger area than is actually of interest, while in practice only one target region may need attention, for example the access-control area at a community gate. Dividing out the target region reduces the detection of objects outside it and thereby improves the video concentration effect, while scaling each frame of image corresponding to the target region down to the first resolution reduces the amount of computation in the deep learning target detection model.
Optionally, before detecting the target object in the target area video, the method further includes: acquiring shooting time corresponding to the video to be processed; and determining the corresponding frame skip interval number of the target area according to the time period of the shooting time.
By adopting the technical scheme, the frame skip interval number corresponding to the target area video is determined according to the time period to which the shooting time of the video to be processed belongs, for example according to day and night, and flexibly adjusting the frame skip interval number can further improve the efficiency of concentrating the original video.
Optionally, the detecting the target object in the target area video based on the deep learning target detection model and the frame skip interval number corresponding to the video to be processed to obtain at least one video frame image including the target object includes: reading a video frame image in the target area video; based on a deep learning target detection model, storing the currently read video frame image containing the target object, and reading the next video frame image according to the frame skip interval number; judging whether the next video frame image is successfully read or not; if the next video frame image is successfully read, taking the next video frame image as the currently read video frame image, and executing the step of saving the currently read video frame image containing the target object until the target area video is completely read, and saving at least one video frame image containing the target object; and if the next video frame image is not read, ending the detection of the target area video.
By adopting the technical scheme, the video frame images in the target area video are read in sequence and each currently read video frame image containing the target object is stored until the target area video has been completely read, so that all video frame images containing the target object are saved, providing a basis for subsequent video concentration.
Optionally, the storing the video frame image including the target object read by the current frame based on the deep learning target detection model includes: inputting the video frame image read currently into the deep learning target detection model, and judging whether a target object exists in the video frame image read currently;
if the video frame image read by the current frame has a target object, taking the video frame image read by the current frame as an effective frame, and storing the video frame image corresponding to the effective frame and the video frame image corresponding to the tailing frame of the effective frame, wherein the tailing frame is a preset number of frames after the effective frame; if the currently read video frame image does not have the target object, not storing the currently read video frame image, and executing the step of reading the next video frame image according to the frame skip interval number.
By adopting the technical scheme, the video frame image corresponding to the effective frame and the video frame images corresponding to the tailing frames of the effective frame are stored, which ensures the continuity of the picture in the concentrated video after the target object disappears.
Optionally, the generating the condensed video according to each video frame image includes: counting and numbering each video frame image according to a storage sequence; and converting the video frame image corresponding to the first resolution into a second resolution, and generating a concentrated video according to the number, wherein the second resolution is larger than the first resolution.
By adopting the technical scheme, the video frame image corresponding to the first resolution is converted into the second resolution, and the concentrated video is generated according to the number, so that the display quality of the concentrated video picture can be ensured.
In a second aspect of the present application, there is provided a video concentration system, the system comprising:
the target area video determining module is used for acquiring a video to be processed and determining a target area video to be detected in the video to be processed;
the video frame image acquisition module is used for detecting a target object in the target area video based on a deep learning target detection model and the frame skip interval number corresponding to the video to be processed to obtain at least one video frame image containing the target object;
and the concentrated video generation module is used for generating concentrated video according to each video frame image.
In a third aspect, the present application provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect of the present application, there is provided an electronic device comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps described above.
In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
1. According to the method and the device, the target area video to be detected is determined in the original video to be processed, which reduces the detection of objects outside the target area and improves the video concentration effect. The target objects in the target area video are detected based on the deep learning target detection model and the frame skip interval number, so at least one video frame image containing a target object can be obtained, and the concentrated video is generated from these video frame images. The concentrated video contains the video frame images corresponding to all target objects in the target area video, the file size of the original video is reduced, the duration of the original video is shortened, and the efficiency of video information processing is further improved.
2. By dividing out the target region, the method and the device can reduce the detection of objects outside the target region and improve the video concentration effect, while scaling each frame of image corresponding to the target region in the video to be processed down to the first resolution reduces the amount of computation in the deep learning target detection model.
3. According to the method and the device, the frame skip interval number corresponding to the target area video is determined according to the time period to which the shooting time of the video to be processed belongs, for example according to day and night, and flexibly adjusting the frame skip interval number can further improve the efficiency of concentrating the original video.
Drawings
Fig. 1 is a schematic flow chart of a video concentration method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of detecting a target object according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of video concentration provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a video concentrating system according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals illustrate: 1. a target area video determining module; 2. a video frame image acquisition module; 3. a concentrated video generation module; 500. an electronic device; 501. a processor; 502. a communication bus; 503. a user interface; 504. a network interface; 505. a memory.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments.
In the description of embodiments of the present application, words such as "such as" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described herein as "such as" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In order to facilitate understanding of the methods and systems provided in the embodiments of the present application, a description of the background of the embodiments of the present application is provided before the description of the embodiments of the present application.
With the rapid development of science, technology and informatization, the analysis and processing of massive video image data brooks no delay. At present, large numbers of high-definition surveillance cameras are deployed in urban areas, providing convenience and safety protection for people to a certain extent. However, they also put great pressure on data storage and present significant challenges for target retrieval, video browsing and the like. Because targets do not appear in the monitored area at all times, manually browsing surveillance videos consumes a great deal of labor cost and working time, which reduces the efficiency of video information processing.
In order to solve the above problems, the embodiments of the present application provide a video concentration method, and based on the video concentration method, the file size of an original video is reduced, and the duration of the original video is shortened, so that the efficiency of video information processing is improved.
In one embodiment, please refer to fig. 1, which is a schematic flow chart of a video concentration method. The method may be implemented by a computer program, may be implemented by a single-chip microcomputer, and may also run on a video concentration system based on the von Neumann architecture. The computer program can be integrated into an application or run as a separate tool application. Specifically, in the embodiment of the application, the method can be applied to a terminal device for video concentration, and comprises steps 10 to 30 as follows:
step 10: and acquiring the video to be processed, and determining the video of the target area to be detected in the video to be processed.
Specifically, the video to be processed can be understood as the original surveillance video that a person needs to review, and the target area video can be understood as the video corresponding to the target area, intercepted from the original surveillance video, that the person actually wants to view. Because the original surveillance video covers a larger monitored area while in practice only the picture of the target area is of interest, the target area video to be detected needs to be determined in the video to be processed, which reduces the detection of objects outside the target area and improves the video concentration effect.
On the basis of the above embodiment, as an optional embodiment, the step of acquiring the video to be processed and determining the target area video to be detected in the video to be processed may further include steps 101 to 103:
step 101: and acquiring the video to be processed sent by the monitoring terminal.
Specifically, in the embodiment of the application, the terminal device acquires a video to be processed sent by the monitoring terminal, where the video to be processed includes a video frame within a period of time within a monitoring area. The video to be processed can be photographed and recorded by monitoring devices installed in public places, enterprises and public institutions, residential communities and the like. The condition of the monitored area can be observed and mastered in real time through the video to be processed.
In this embodiment of the present application, the video to be processed is a surveillance video covering a community gate area, and it contains video frames of target objects such as persons or vehicles entering or exiting the community gate.
Step 102: and determining a target area to be detected of each frame of image in the video to be processed according to the preset effective area.
Specifically, the preset effective area may be understood as the target area that a person wants to view, such as the community checkpoint area in the embodiments of the present application. Because the installation position and angle of each surveillance camera are uncertain, for example a camera at the community checkpoint may also face the outer passage of the community while a camera inside the community may face parking spaces, each camera can capture the checkpoint area but also captures other areas besides it. Since people do not pay attention to the pictures of those other areas, moving persons or objects in them would affect the efficiency of video concentration. In the embodiment of the application, a preset effective area is therefore selected from the picture of the video to be processed as the target area to be detected in each frame of image.
Step 103: and reducing each frame of image corresponding to the target area to a first resolution in the video to be processed to obtain the target area video.
Specifically, in order to reduce the amount of computation in the subsequent detection of the target object, the first resolution may be smaller than the original resolution of the video to be processed. Each frame of image corresponding to the target area is scaled down to the first resolution, obtaining the target area video, which contains only the area the person wants to view at a resolution lower than the original, thereby improving the efficiency of video concentration.
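A minimal OpenCV sketch of steps 101 to 103 is given below; the region-of-interest coordinates and the first resolution used here are illustrative assumptions, not values prescribed by the application.

```python
import cv2

def extract_target_area_video(video_path, roi, first_resolution=(640, 360)):
    """Crop each frame of the video to be processed to the preset effective area
    (roi = (x, y, w, h)) and scale the crop down to the first resolution, yielding
    the frames of the target area video."""
    x, y, w, h = roi
    capture = cv2.VideoCapture(video_path)
    target_frames = []
    while True:
        ok, frame = capture.read()
        if not ok:                                    # end of the video to be processed
            break
        target_area = frame[y:y + h, x:x + w]         # keep only the target area
        target_frames.append(cv2.resize(target_area, first_resolution))
    capture.release()
    return target_frames
```

Cropping first and scaling second keeps motion outside the area of interest out of the detector while also shrinking the per-frame detection workload.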
Step 20: and detecting the target object in the target area video based on the deep learning target detection model and the frame skip interval number corresponding to the video to be processed, so as to obtain at least one video frame image containing the target object.
The deep learning target detection model is a target detection technique implemented with a deep learning algorithm that can automatically detect and locate objects, their positions and their categories in an image or a video. In the embodiment of the application, the deep learning target detection model may be a YOLOv5 model. YOLOv5 is a target detection algorithm: a video can be input into the YOLOv5 model for detection, and the model processes each frame, extracts features and performs target detection.
In the embodiment of the application, an initial deep learning target detection model is built in advance, a video sample is input into the initial deep learning target detection model for training, the loss value of the model during training is calculated based on a loss function, and training is stopped when the loss value falls below the loss threshold, yielding a trained deep learning target detection model. This trained model is then used for target object detection, so that at least one video frame image containing a target object can be obtained.
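As an illustration of running such a detector on a single target-area frame, the sketch below loads the public ultralytics/yolov5 hub model (in practice the model trained above would be loaded instead) and checks whether any detection belongs to a class of interest; the class names chosen here are an assumption for the example.

```python
import cv2
import torch

# Load a YOLOv5 detector; the trained model from the steps above could be loaded here instead.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def frame_contains_target(frame_bgr, target_classes=("person", "car")):
    """Run the detector on one target-area frame and report whether any detection
    belongs to the classes of interest (e.g. pedestrians, vehicles)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # YOLOv5 expects RGB input
    detections = model(rgb).pandas().xyxy[0]           # boxes, confidences, class names
    return any(name in target_classes for name in detections["name"])
```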
Referring to fig. 2, which is a schematic flow chart of detecting a target object provided in an embodiment of the present application, as an optional embodiment, the step of detecting a target object in the target area video based on the deep learning target detection model and the frame skip interval number corresponding to the video to be processed, so as to obtain at least one video frame image containing the target object, may further include steps 201 to 205:
step 201: and reading the video frame image in the target area video.
Specifically, the target area video is composed of a series of continuous image frames; each frame is a still image composed of pixels, each pixel contains information such as color and brightness, and the resolution of the video refers to the number of pixels contained in each frame image. The target area video is input into the deep learning target detection model and the video frame images in it are read; here, a video frame image refers to the currently read frame image.
Step 202: and based on the deep learning target detection model, storing the video frame image containing the target object which is read currently, and reading the next video frame image according to the frame skip interval number.
In the embodiment of the present application, the target object may be a moving object to be retrieved, such as a pedestrian or a vehicle, or may be another kind of object, which is not limited herein. The frame skip interval number refers to the number of frames skipped when reading video frames. For example, if the frame skip interval number is 20, target detection is performed on every 20th frame, which speeds up video processing and reduces the amount of computation.
Further, the frame skip interval number can be set as required. Specifically, the shooting time corresponding to the video to be processed, that is, the time period during which the target area video was recorded, is obtained, and the frame skip interval number corresponding to the target area is determined according to the time period to which the shooting time belongs. For example, the frame skip interval number is set to a first value in the daytime period and to a second value in the evening period; since fewer target objects appear in the evening than in the daytime, the second value can be larger than the first.
In another possible embodiment, the frame skip interval number may also be set according to the historical foot traffic of the area captured by the video to be processed. For example, the foot traffic at each time of day in the area is counted and a corresponding frame skip interval number is set for each time: a small frame skip interval number at times of high traffic and a large one at times of low traffic, which can further improve the efficiency of concentrating the video to be processed.
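The time-based rule can be as simple as the sketch below, where the daytime hours and the two interval values are illustrative assumptions rather than values fixed by the application.

```python
from datetime import datetime

def frame_skip_interval(shooting_time: datetime, day_interval: int = 10, night_interval: int = 25) -> int:
    """Choose the frame skip interval number from the time period to which the
    shooting time belongs: a smaller (first) interval by day and a larger (second)
    interval at night, since fewer target objects appear at night."""
    if 7 <= shooting_time.hour < 19:      # daytime period
        return day_interval
    return night_interval                 # evening / night period
```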
Specifically, the target area video is input into the deep learning target detection model and the video frame images in it are read. Each currently read video frame image containing the target object is stored, the video frame lying the frame skip interval number of frames after the current one is taken as the next video frame, and that next video frame is then read, so that the whole target area video can be read.
On the basis of the above embodiment, as an alternative embodiment, the step of storing the currently read video frame image containing the target object based on the deep learning target detection model and reading the next video frame image according to the number of frame skip intervals may further include steps 2021 to 2023.
Step 2021: based on the deep learning target detection model, whether a target object exists in the video frame image read currently is judged.
Specifically, the deep learning target detection model reads the first video frame image of the target area video and performs target detection on it to obtain information such as the position and size of any target object that may be present, calculates the feature vector of the target object in that frame, and stores it as a reference vector. It then obtains the next video frame image, located the frame skip interval number of frames after the first one, performs target detection on it in the same way, and calculates the feature vector of the target object in it. This feature vector is compared with the reference vector of the first video frame image; if the distance between the two is smaller than a set threshold, it is judged that the target object exists in the next video frame image.
Further, following the same procedure, the next video frame image is in turn treated as the first video frame image and the frame lying the frame skip interval after it is read. Target detection is performed on that frame to obtain information such as the position and size of any target object, its feature vector is calculated and compared with the reference vector, and if the distance between them is smaller than the set threshold it is judged that the target object exists in that frame. This continues until the target area video has been completely read, so that at least one video frame image containing the target object is obtained.
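The comparison itself can be sketched as follows; how the feature vectors are extracted (for example from the detector backbone or a separate re-identification network) and the value of the distance threshold are left open here as assumptions.

```python
import numpy as np

def is_same_target(feature_vec, reference_vec, distance_threshold=0.5):
    """Judge that the currently read frame contains the target object when the
    distance between its feature vector and the saved reference vector is
    smaller than the set threshold."""
    distance = np.linalg.norm(np.asarray(feature_vec) - np.asarray(reference_vec))
    return distance < distance_threshold
```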
Step 2022: if the video frame image read by the current frame has the target object, the video frame image read by the current frame is taken as an effective frame, and the video frame image corresponding to the effective frame and the video frame image corresponding to the tailing frame of the effective frame are stored.
Specifically, if the currently read video frame image contains a target object, the currently read video frame image is taken as an effective frame, and the video frame image corresponding to the effective frame together with the video frame images corresponding to the tailing frames of the effective frame are stored. The tailing frames are a preset number of frames after the effective frame, and they are saved regardless of whether the target object is still present in them, which ensures the continuity of the picture in the concentrated video after a dynamic target object disappears.
For example, assuming the tailing frames number 80, when a target object appears in the current frame, the currently read video frame is taken as an effective frame, and the video frame image corresponding to the effective frame and the video frame images corresponding to the 80 frames after it are stored.
Step 2023: if the currently read video frame image does not have the target object, the currently read video frame image is not saved, and the step of reading the next video frame image according to the frame skip interval number is executed.
Specifically, if the currently read video frame image does not have the target object, the currently read video frame image is not saved, and the step of reading the next video frame image according to the frame skip interval number is performed until the target area video is completely read, and at least one video frame image containing the target object can be obtained.
Step 203: and judging whether the next video frame image is successfully read.
Specifically, after the video frame image including the target object that is read currently is saved and the next video frame image is read according to the number of frame skip intervals, whether the next video frame image is read successfully or not needs to be judged, namely whether the next video frame image can be accessed or not is judged.
Step 204: and if the next video frame image is successfully read, taking the next video frame image as the currently read video frame image, and executing the step of storing the currently read video frame image containing the target object until the target area video is completely read, and storing at least one video frame image containing the target object.
Step 205: if the next video frame image reading fails, ending the detection of the target area video.
Specifically, if the next video frame image is successfully read, it is taken as the currently read video frame image and the step of storing the currently read video frame image containing the target object is executed, until the target area video has been completely read and at least one video frame image containing the target object has been stored. If reading the next video frame image fails, the end of the target area video has been reached, and the reading of the target area video ends.
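Putting steps 201 to 205 and 2021 to 2023 together, a simplified sketch of the reading loop over an in-memory list of target-area frames might look as follows; `contains_target` stands in for the deep learning target detection model, and the default of 80 tailing frames mirrors the example above.

```python
def detect_and_save_frames(frames, contains_target, skip_interval, trailing_frames=80):
    """Walk through the target area video with the given frame skip interval; whenever
    the currently read frame contains the target object (an effective frame), save it
    together with its tailing frames, and end detection once the next read fails."""
    saved = []
    index = 0
    while index < len(frames):                          # a failed read ends the detection
        frame = frames[index]
        if contains_target(frame):                      # effective frame
            tail_end = min(index + trailing_frames + 1, len(frames))
            saved.extend(frames[index:tail_end])        # effective frame plus its tailing frames
        index += skip_interval                          # read the next frame after the skip interval
    return saved
```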
Step 30: and generating a concentrated video according to each video frame image.
After the at least one obtained video frame image containing the target object has been stored, the concentrated video is generated according to the storage time of the video frame images of each target object. The concentrated video compresses the data volume of the original video to a much smaller size, thereby improving the efficiency of video processing.
On the basis of the above embodiment, as an optional embodiment, the step of generating the condensed video from each video frame image may further include steps 301 to 302:
step 301: and counting and numbering each video frame image according to the preservation sequence.
Specifically, after a video frame image containing the target object is read, the frame corresponding to that video frame image is counted and numbered; each time another video frame image containing the target object is read, counting continues from the previous frame image's number, so that the concentrated video can be generated according to the numbers of the video frame images and the playback order of the concentrated video is ensured.
Step 302: and converting the video frame image corresponding to the first resolution into the second resolution, and generating the concentrated video according to the number.
Specifically, after the video frame images containing the target object have been read, counted and numbered, the video frame image corresponding to the first resolution is converted to the second resolution. The second resolution can be the original resolution of the video to be processed, which ensures the picture quality of the concentrated video. In the process of generating the concentrated video, the attributes of each video frame image containing the target object, such as its position and time in the video to be processed, are also saved, so that a person can conveniently jump to the corresponding point in the video to be processed according to these attributes.
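A minimal sketch of steps 301 and 302 with OpenCV is shown below; the codec, frame rate and second resolution used here are illustrative assumptions.

```python
import cv2

def write_condensed_video(saved_frames, output_path, second_resolution=(1920, 1080), fps=25.0):
    """Number the saved frames in their storage order, scale each one from the first
    resolution up to the second resolution, and write them out as the concentrated video."""
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, second_resolution)
    for number, frame in enumerate(saved_frames):        # numbering follows the save order
        upscaled = cv2.resize(frame, second_resolution)  # first resolution -> second resolution
        writer.write(upscaled)                           # frames are written in numbered order
    writer.release()
```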
Referring to fig. 3, an example diagram of video concentration provided in an embodiment of the present application: with a moving pedestrian as the target object, the video frame images of the original surveillance video that contain the target object are saved and the frame skip interval number is set, which speeds up the processing of the original surveillance video, reduces the data volume of the concentrated video, and shortens its duration relative to the original surveillance video.
Referring to fig. 4, a schematic block diagram of a video concentrating system according to an embodiment of the present application is provided, where the video concentrating system may include: a target area video determining module 1, a video frame image acquiring module 2 and a concentrated video generating module 3, wherein:
the target area video determining module 1 is used for acquiring a video to be processed and determining a target area video to be detected in the video to be processed;
the video frame image acquisition module 2 is used for detecting a target object in the target area video based on a deep learning target detection model and the frame skip interval number corresponding to the video to be processed to obtain at least one video frame image containing the target object;
a concentrated video generating module 3, configured to generate a concentrated video according to each video frame image.
It should be noted that: in the device provided in the above embodiment, when implementing the functions thereof, only the division of the above functional modules is used as an example, in practical application, the above functional allocation may be implemented by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the embodiments of the apparatus and the method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the embodiments of the method are detailed in the method embodiments, which are not repeated herein.
The application also discloses electronic equipment. Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to the disclosure in an embodiment of the present application. The electronic device 500 may include: at least one processor 501, at least one network interface 504, a user interface 503, a memory 505, at least one communication bus 502.
Wherein a communication bus 502 is used to enable connected communications between these components.
The user interface 503 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 503 may further include a standard wired interface and a standard wireless interface.
The network interface 504 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 501 may include one or more processing cores. The processor 501 connects various parts throughout the server using various interfaces and lines, and performs various functions of the server and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 505 and invoking data stored in the memory 505. Alternatively, the processor 501 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 501 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 501 and may be implemented by a single chip.
The Memory 505 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 505 comprises a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 505 may be used to store instructions, programs, code sets, or instruction sets. The memory 505 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described various method embodiments, etc.; the storage data area may store data or the like involved in the above respective method embodiments. The memory 505 may also optionally be at least one storage device located remotely from the processor 501. Referring to fig. 5, an operating system, a network communication module, a user interface module, and an application program of a video condensing method may be included in the memory 505 as a computer storage medium.
In the electronic device 500 shown in fig. 5, the user interface 503 is mainly used for providing an input interface for a user and acquiring data input by the user; and the processor 501 may be configured to invoke the application program of the video concentration method stored in the memory 505, which, when executed by the one or more processors 501, causes the electronic device 500 to perform the method as described in one or more of the embodiments above. It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in another order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided herein, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is merely a division of logical functions, and there may be other divisions in actual implementation, such as multiple units or components being combined or integrated into another system, or some features being omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed between components may be an indirect coupling or communication connection through some service interfaces, devices or units, which may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned memory includes: various media capable of storing program codes, such as a U disk, a mobile hard disk, a magnetic disk or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure.
This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (6)

1. A method of video concentration, the method comprising:
constructing an initial deep learning target detection model;
inputting a video sample into the initial deep learning target detection model for training;
calculating a loss value in the training process based on the loss function, and stopping training when the loss value is lower than a loss threshold value to obtain a deep learning target detection model after training is completed;
acquiring a video to be processed sent by a monitoring terminal;
determining a target area to be detected of each frame of image in the video to be processed according to a preset effective area;
reducing each frame of image corresponding to the target area to a first resolution in the video to be processed to obtain a target area video; reading a video frame image in the target area video;
based on a deep learning target detection model, storing a currently read video frame image containing a target object, and reading a next video frame image according to the number of frame skipping intervals, wherein the deep learning target detection model is a YOLOv5 model;
judging whether the next video frame image is successfully read or not;
if the next video frame image is successfully read, taking the next video frame image as the currently read video frame image, and executing the step of saving the currently read video frame image containing the target object until the target area video is completely read, and saving at least one video frame image containing the target object;
if the next video frame image is not read, ending the detection of the target area video;
counting and numbering each video frame image according to a storage sequence;
and converting the video frame image corresponding to the first resolution into a second resolution, and generating a concentrated video according to the number, wherein the second resolution is larger than the first resolution.
2. The video concentration method according to claim 1, wherein before the video frame image containing the target object, which is currently read, is saved based on the deep learning target detection model, the video concentration method further comprises:
acquiring shooting time corresponding to the video to be processed;
and determining the corresponding frame skip interval number of the target area according to the time period of the shooting time.
3. The video concentration method according to claim 1, wherein the storing the video frame image including the target object read by the current frame based on the deep learning target detection model includes:
based on the deep learning target detection model, judging whether a target object exists in the video frame image read currently;
if the video frame image read by the current frame has a target object, taking the video frame image read by the current frame as an effective frame, and storing the video frame image corresponding to the effective frame and the video frame image corresponding to the tailing frame of the effective frame, wherein the tailing frame is a preset number of frames after the effective frame;
if the currently read video frame image does not have the target object, not storing the currently read video frame image, and executing the step of reading the next video frame image according to the frame skip interval number.
4. A video concentration system, the system comprising:
the target area video determining module (1) is used for acquiring a video to be processed sent by the monitoring terminal; determining a target area to be detected of each frame of image in the video to be processed according to a preset effective area; reducing each frame of image corresponding to the target area to a first resolution in the video to be processed to obtain a target area video;
the video frame image acquisition module (2) is used for constructing an initial deep learning target detection model; inputting a video sample into the initial deep learning target detection model for training; calculating a loss value in the training process based on the loss function, and stopping training when the loss value is lower than a loss threshold value to obtain a deep learning target detection model after training is completed; based on a deep learning target detection model, storing a currently read video frame image containing a target object, and reading a next video frame image according to the number of frame skipping intervals, wherein the deep learning target detection model is a YOLOv5 model; judging whether the next video frame image is successfully read or not; if the next video frame image is successfully read, taking the next video frame image as the currently read video frame image, and executing the step of saving the currently read video frame image containing the target object until the target area video is completely read, and saving at least one video frame image containing the target object; if the next video frame image is not read, ending the detection of the target area video;
a concentrated video generation module (3) for numbering each video frame image according to a storage sequence; and converting the video frame image corresponding to the first resolution into a second resolution, and generating a concentrated video according to the number, wherein the second resolution is larger than the first resolution.
5. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method of any one of claims 1 to 3.
6. An electronic device comprising a processor, a memory and a transceiver, the memory for storing instructions, the transceiver for communicating with other devices, the processor for executing the instructions stored in the memory to cause the electronic device to perform the method of any one of claims 1-3.
CN202310542297.3A 2023-05-13 2023-05-13 Video concentration method, system, storage medium and electronic equipment Active CN116680438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310542297.3A CN116680438B (en) 2023-05-13 2023-05-13 Video concentration method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310542297.3A CN116680438B (en) 2023-05-13 2023-05-13 Video concentration method, system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116680438A CN116680438A (en) 2023-09-01
CN116680438B true CN116680438B (en) 2024-02-27

Family

ID=87782750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310542297.3A Active CN116680438B (en) 2023-05-13 2023-05-13 Video concentration method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116680438B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107426631A (en) * 2016-05-23 2017-12-01 安讯士有限公司 Summarized video sequence is generated from source video sequence
CN110147722A (en) * 2019-04-11 2019-08-20 平安科技(深圳)有限公司 A kind of method for processing video frequency, video process apparatus and terminal device
US20220261960A1 (en) * 2020-05-21 2022-08-18 Tencent Technology (Shenzhen) Company Limited Super-resolution reconstruction method and related apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107426631A (en) * 2016-05-23 2017-12-01 安讯士有限公司 Summarized video sequence is generated from source video sequence
CN110147722A (en) * 2019-04-11 2019-08-20 平安科技(深圳)有限公司 A kind of method for processing video frequency, video process apparatus and terminal device
US20220261960A1 (en) * 2020-05-21 2022-08-18 Tencent Technology (Shenzhen) Company Limited Super-resolution reconstruction method and related apparatus

Also Published As

Publication number Publication date
CN116680438A (en) 2023-09-01

Similar Documents

Publication Publication Date Title
EP3445044B1 (en) Video recording method, server, system, and storage medium
CN108063914B (en) Method and device for generating and playing monitoring video file and terminal equipment
CN111191576A (en) Personnel behavior target detection model construction method, intelligent analysis method and system
CN110659391A (en) Video detection method and device
US20220301317A1 (en) Method and device for constructing object motion trajectory, and computer storage medium
CN113052147B (en) Behavior recognition method and device
US20160210759A1 (en) System and method of detecting moving objects
CN112422909B (en) Video behavior analysis management system based on artificial intelligence
US20220100795A1 (en) Systems and methods for image retrieval
CN108921150B (en) Face recognition system based on network hard disk video recorder
CN113869115A (en) Method and system for processing face image
CN116680438B (en) Video concentration method, system, storage medium and electronic equipment
WO2018210039A1 (en) Data processing method, data processing device, and storage medium
CN116912517B (en) Method and device for detecting camera view field boundary
CN114913470B (en) Event detection method and device
CN116363366A (en) Transmission line mountain fire monitoring method and device based on semantic segmentation and storage medium
US20220084314A1 (en) Method for obtaining multi-dimensional information by picture-based integration and related device
US9798932B2 (en) Video extraction method and device
CN113468913B (en) Data processing method, motion recognition method, model training method, device and storage medium
CN113128294B (en) Road event evidence obtaining method and device, electronic equipment and storage medium
CN112488076A (en) Face image acquisition method, system and equipment
CN111598053B (en) Image data processing method and device, medium and system thereof
US20230103735A1 (en) Method, system and computer program product for reducing learning time for a newly installed camera
CN116994338B (en) Site paperless auditing management system based on behavior recognition
CN117370602B (en) Video processing method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A video compression method, system, storage medium, and electronic device

Granted publication date: 20240227

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: Panoramic Zhilian (Wuhan) Technology Co.,Ltd.

Registration number: Y2024980008995