CN115314713A - Method, system and device for extracting target segment in real time based on accelerated video

Method, system and device for extracting target segment in real time based on accelerated video

Info

Publication number
CN115314713A
CN115314713A (application number CN202210945035.7A)
Authority
CN
China
Prior art keywords
video
definition
pictures
frame
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210945035.7A
Other languages
Chinese (zh)
Inventor
彭杰
董天意
赵阳
晋晶
宋绍方
孔德润
董兰芳
吴艾久
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhongna Medical Instrument Co ltd
Original Assignee
Hefei Zhongna Medical Instrument Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Zhongna Medical Instrument Co ltd filed Critical Hefei Zhongna Medical Instrument Co ltd
Priority to CN202210945035.7A priority Critical patent/CN115314713A/en
Publication of CN115314713A publication Critical patent/CN115314713A/en
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field

Abstract

The invention relates to a method, a system and a device for extracting a target segment in real time from an accelerated video. The real-time extraction method comprises the following steps. S1: copy the original video into two videos to be processed; split one of them into frames to obtain a plurality of high-definition pictures, and compress the other to obtain a compressed video. S2: split the compressed video into frames to obtain a plurality of low-definition pictures. S3: recognize the low-definition pictures and record the time axis information, within the compressed video, of the pictures that are successfully recognized. S4: extract the high-definition pictures at the corresponding moments according to the time axis information. S5: combine the extracted high-definition pictures to obtain the recognition video. By recognizing the compressed video frame by frame instead of the high-definition frames of the original video, the invention lowers the hardware requirements on the recognition tool and thereby improves the efficiency of image recognition at low cost.

Description

Method, system and device for extracting target segments in real time based on accelerated videos
Technical Field
The invention relates to the technical field of video processing, and in particular to a method, a system and a device for extracting target segments in real time from accelerated videos.
Background
For real-time recognition and extraction of target segments from a video, the video is generally split into frames, each frame is fed into a deep learning model for recognition, and the successfully recognized frames are then combined into a recognition video. When a high-definition, high-frame-rate accelerated video is recognized in real time, however, the graphics card must recognize every single frame before the recognized frames can be synthesized into a video and displayed in real time.
At present, high-definition, high-frame-rate accelerated videos are usually processed with multiple graphics cards running multiple threads in parallel, which places very high demands on hardware and increases the equipment cost of target segment extraction. If single-threaded processing on a single graphics card is used instead, the efficiency of video recognition drops and it becomes difficult to extract the target segment in time.
Disclosure of Invention
In view of the above, it is necessary to provide a method, a system and an apparatus for extracting target segments in real time from accelerated videos, aimed at the difficulty of extracting target segments from conventional accelerated videos both efficiently and at low cost.
A real-time target segment extraction method based on accelerated videos comprises the following steps:
S1: copy the original video to be processed into two videos to be processed; split one of them into frames to obtain an image set containing a plurality of continuous high-definition pictures; compress the other to obtain a low-definition, low-frame-rate compressed video.
S2: split the compressed video into frames to obtain a plurality of low-definition pictures.
S3: input the low-definition pictures into an image recognition model for recognition; mark the pictures that are successfully recognized and record their time axis information in the compressed video. The image recognition model is used for recognizing images with the target features.
S4: extract the high-definition pictures at the corresponding moments from the image set according to the marked time axis information.
S5: combine the extracted high-definition pictures to obtain the high-definition recognition video.
By compressing the original video, splitting both the compressed video and the original video into frames, and recognizing the frames of the compressed video one by one instead of the frames of the original video, the invention lowers the hardware requirements on the recognition tool and improves image recognition efficiency. Through the mapping relation between the compressed video and the original video, the corresponding high-definition pictures are marked and extracted according to the recognized low-definition pictures, so recognition accuracy is maintained.
In one embodiment, the method for splitting the video to be processed into frames comprises the following steps:
A1: acquire the feature information of the original video, including its frame rate and duration.
A2: extract the original video frame by frame according to the frame rate and duration to obtain the frame-by-frame images of the original video.
A3: mark the frame-by-frame images with their time information and obtain an image set containing all of them; each image in the set carries its corresponding time mark.
In one embodiment, the method for compressing the video to be processed comprises the following steps:
B1: acquire the code rate of the original video.
B2: calculate the compression ratio from a preset code rate.
B3: compress the original video into a low-definition, low-frame-rate compressed video according to the compression ratio.
In one embodiment, the image recognition model is established as follows:
S31: obtain an existing initial image recognition model.
S32: collect a plurality of feature pictures and divide them into a training set and a test set according to a preset proportion; a feature picture is a picture carrying the target features.
S33: input the feature pictures in the training set into the initial image recognition model in sequence to train it, iteratively updating the model parameters during training.
S34: after training, input the feature pictures in the test set into the optimized image recognition model for testing, and keep the parameters of the model that meets the recognition accuracy requirement to obtain the final image recognition model.
In one embodiment, the method for extracting the high-definition pictures comprises the following steps:
S41: acquire the time axis information of the marked low-definition pictures in the compressed video.
S42: calculate the sequence number of the corresponding high-definition picture in the image set from the ratio of the moment of the low-definition picture in the compressed video to the total duration of the compressed video.
S43: extract the corresponding high-definition picture according to the sequence number.
The invention also provides a real-time target segment extraction system based on accelerated videos, comprising an acquisition module, a compression module, a framing module, a recognition module, an extraction module and a frame-merging module.
The acquisition module is used for acquiring the characteristic information of the input original video, including the frame rate, the duration and the code rate of the video.
The compression module is used for compressing the input video to be processed into a low-definition low-frame-rate compressed video.
The framing module is used for framing the original video to be processed to obtain an image set containing a plurality of continuous high-definition pictures. The framing module is also used for framing the compressed video to obtain a plurality of low-definition pictures.
The identification module is used for extracting a picture with target characteristics from a plurality of low-definition pictures as a picture to be marked.
The extraction module is used for marking each picture to be marked and extracting a corresponding high-definition picture in the image set according to the time axis information of the marked low-definition picture.
And the frame combining module is used for combining the plurality of marked pictures according to the sequence to obtain a video clip containing the target characteristics.
In one embodiment, the compression module employs a video encoder. The video encoder converts the format of the video, and the definition and frame rate of the video change with the format conversion.
In one embodiment, the recognition module recognizes the low-definition pictures using an image recognition model. The image recognition model is obtained as follows:
Obtain an existing initial image recognition model. Collect a plurality of feature pictures and divide them into a training set and a test set according to a preset proportion. Input the pictures in the training set into the initial image recognition model to train it, iteratively updating the model parameters during training. Input the pictures in the test set into the optimized image recognition model for testing, and keep the parameters of the model that meets the recognition accuracy requirement to obtain the final image recognition model.
In one embodiment, the extraction module finds the high-definition picture to be marked through a pre-stored mapping relation, expressed as:
T_i / T_A = O_j / O_s
where T_i is the moment of the i-th picture to be marked in the low-definition, low-frame-rate video, T_A is the total duration of the low-definition, low-frame-rate video, O_j is the sequence number of the corresponding high-definition picture in the image set, and O_s is the total number of pictures in the image set.
The invention also provides a real-time target segment extracting device based on the accelerated video, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor.
The functional modules of the real-time target segment extraction device are deployed as in the extraction system described above, and when the processor executes the computer program it carries out the steps of the real-time target segment extraction method, thereby extracting video segments containing the target features from a high-definition, high-frame-rate accelerated video.
Compared with the prior art, the invention has the following beneficial effects:
1. By compressing the original video, splitting both the compressed video and the original video into frames, and recognizing the frames of the compressed video one by one instead of the frames of the original video, the hardware requirements on the recognition tool are lowered and image recognition efficiency is improved.
2. By computing the mapping relation between the compressed video and the original video, the corresponding high-definition pictures are marked and extracted according to the recognized low-definition pictures, so recognition accuracy is maintained.
3. An existing image recognition model is trained to strengthen the target features, and the optimized model is verified on a test set to obtain a final model that meets the recognition accuracy requirement, improving the accuracy of target recognition.
Drawings
Fig. 1 is a step diagram of a method for extracting a target segment based on an accelerated video in real time according to embodiment 1 of the present invention;
FIG. 2 is a flow chart of the real-time extraction method of the target segment based on the accelerated video in FIG. 1;
Fig. 3 is a schematic structural diagram of an accelerated-video-based real-time target segment extraction system using the real-time target segment extraction method in fig. 1.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "mounted on" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. When an element is referred to as being "secured to" another element, it can be directly secured to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "or/and" includes any and all combinations of one or more of the associated listed items.
Example 1
Referring to fig. 1 and fig. 2, fig. 1 is a step diagram of the real-time target segment extraction method based on accelerated video according to this embodiment, and fig. 2 is a flowchart of that method. The real-time target segment extraction method based on accelerated video comprises the following steps:
S1: copy the original video to be processed into two videos to be processed; split one of them into frames to obtain an image set containing a plurality of continuous high-definition pictures; compress the other to obtain a low-definition, low-frame-rate compressed video.
The original video is a high-definition, high-frame-rate accelerated video. Because of the high frame rate, such a video contains a large number of video frames, and because each frame is a high-definition image, it occupies considerable storage space. If the original video were recognized frame by frame directly, the hardware requirements on the recognition tool would be very high and the recognition rate relatively low.
The method for splitting the video to be processed into frames comprises the following steps (a minimal sketch follows the list):
A1: acquire the feature information of the original video, including its frame rate and duration. When a video is first captured, its initial feature information can be read from the specification of the capture tool. After the video has been accelerated, the feature information of the accelerated video can be computed from the acceleration factor and the acceleration method used. The frame rate, duration and code rate of the video are all key information required for video processing.
A2: extract the original video frame by frame according to the frame rate and duration to obtain the frame-by-frame images of the original video. The frame rate of a video is the number of frames displayed per second. The total number of frames is computed from the frame rate and the total duration. Moments on the video's time axis are marked according to the frame rate or total frame count, and the video frame at each marked moment is exported, yielding the frame-by-frame images of the original video.
A3: mark the frame-by-frame images with their time information and obtain an image set containing all of them; each image in the set carries its corresponding time mark. During extraction, the time mark of each video frame is retained and all frames are ordered by their time information, producing an image set whose order matches the display order of the original video.
The method for compressing the video to be processed comprises the following steps (a sketch follows the list):
B1: acquire the code rate of the original video. The code rate is another item of video feature information: it is the number of data bits transferred per unit time, also called the sampling rate. The higher the sampling rate per unit time, the higher the precision and the code rate, and the clearer the video; a lower code rate gives a coarser picture. The original video has high definition, i.e. a relatively high code rate.
B2: calculate the compression ratio from a preset code rate. To make recognition easier and lower the hardware requirements, this embodiment converts the high-definition video into a low-definition video by reducing the code rate. To ensure that the video frames used for recognition still meet the requirements of the recognition process, a code rate threshold is preset so that the storage size of each frame satisfies the minimum hardware requirement. The ratio of the original video's code rate to this threshold is the actual compression ratio.
B3: compress the original video into a low-definition, low-frame-rate compressed video according to the compression ratio. Video compression generally adjusts the frame rate and the resolution. Because the original video is accelerated and its frame rate is high, the frame rate can be reduced to a preset value. To keep the frames of the compressed video usable for recognition, the resolution of the original video also needs to be adjusted, i.e. its definition is reduced.
S2: and performing frame division processing on the compressed video to obtain a plurality of low-definition pictures. The frame dividing method of the compressed video is consistent with the frame dividing method of the original video, namely, the frame rate, the code rate and the time length of the compressed video are calculated firstly, then the compressed video is extracted frame by frame to obtain a plurality of low-definition images, and the time axis information of each low-definition image in the compressed video is recorded.
S3: and inputting a plurality of low-definition pictures into an image recognition model for recognition. And marking the pictures which are successfully identified, and recording the time axis information of the pictures in the compressed video. Wherein the image recognition model is used for recognizing the image with the target characteristic. The low-definition pictures have relatively low storage capacity, and the requirement on the hardware quality of the low-definition picture identification is low. The adoption of multi-graphics card and multi-thread processing can improve the recognition rate.
The image recognition model is established as follows (a sketch of the split and test phases follows the list):
S31: obtain an existing initial image recognition model. A suitable initial model is selected according to the target features, for example a CRNN-based OCR character recognition model, a face recognition model based on the face_recognition library, or an object recognition model based on YOLO v5.
S32: collect a plurality of feature pictures and divide them into a training set and a test set according to a preset proportion. A feature picture is a picture carrying the target features; it can be obtained by photographing the target object from multiple angles. If a human face is the target feature, the face can be photographed from multiple angles, in different poses and with different expressions to obtain many pictures with face features. These pictures are then split proportionally, for example 80% for training and the remaining 20% for testing. In other embodiments, other proportions may of course be used.
S33: input the feature pictures in the training set into the initial image recognition model in sequence to train it, iteratively updating the model parameters during training. The initial model only captures general features and cannot be used directly to recognize the target features. Feeding it many pictures with the target features lets it extract more accurate features of the target object, and its parameters are updated iteratively in the process.
S34: after training, input the feature pictures in the test set into the optimized image recognition model for testing, and keep the parameters of the model that meets the recognition accuracy requirement to obtain the final image recognition model. Because the target features in the individual training pictures differ and push the model in different directions, training is not a monotonically improving process. After training, the feature pictures of the test set are recognized with the optimized models, and the parameters with the best recognition accuracy are kept as the final model. The model may also be tested while it is being trained; whenever the test result meets the accuracy requirement, the corresponding parameters are retained, yielding a model that satisfies the requirement.
The image recognition model compares the features of each low-definition picture, marks the low-definition pictures that carry the target features, and records their time axis information.
S4: extract the high-definition pictures at the corresponding moments from the image set according to the marked time axis information. Because the compressed video is produced by compressing the original video at a fixed ratio, there is a one-to-one mapping between the low-definition pictures obtained by framing the compressed video and the high-definition pictures obtained by framing the original video. Each recognized low-definition picture can therefore be mapped to a high-definition picture, and the high-definition pictures carrying the target features can be extracted.
The method for extracting the high-definition pictures comprises the following steps (a sketch follows the list):
S41: acquire the time axis information of the marked low-definition pictures in the compressed video. The time axis information of every low-definition picture is recorded while the compressed video is framed, and the information of the successfully recognized pictures is retained.
S42: calculate the sequence number of the corresponding high-definition picture in the image set from the ratio of the moment of the low-definition picture in the compressed video to the total duration of the compressed video. The high-definition picture to be marked is found from the mapping relation between the low-definition picture and the high-definition picture, expressed as:
T_i / T_A = O_j / O_s
where T_i is the moment of the i-th picture to be marked in the low-definition, low-frame-rate video, T_A is the total duration of the low-definition, low-frame-rate video, O_j is the sequence number of the corresponding high-definition picture in the image set, and O_s is the total number of pictures in the image set.
S43: extract the corresponding high-definition picture according to the sequence number.
S5: and combining the extracted high-definition pictures to obtain the high-definition identification video. And sequencing the extracted high-definition pictures according to the sequence number, then setting the frame rate of the synthesized video, and synthesizing the high-definition pictures into the identification video. The frame rate of the identified video may be the same as the frame rate of the original video, or the frame rate of the compressed video, or may be set according to the requirement. And identifying that each video frame image in the video contains the target feature.
In this embodiment, the original video is compressed, the compressed video and the original video are each split into frames, and the frames of the compressed video are recognized one by one instead of the frames of the original video, which lowers the hardware requirements on the recognition tool and improves image recognition efficiency. The corresponding high-definition pictures are then marked and extracted from the recognized low-definition pictures through the mapping relation between the compressed video and the original video, so recognition accuracy is maintained.
Please refer to fig. 3, which is a schematic structural diagram of the accelerated-video-based real-time target segment extraction system that uses the extraction method of fig. 1. To implement the method, this embodiment also provides a real-time target segment extraction system based on accelerated videos, comprising an acquisition module, a compression module, a framing module, a recognition module, an extraction module and a frame-merging module.
The acquisition module acquires the feature information of the input original video, including its frame rate, duration and code rate. When a video is first captured, the initial feature information can be read from the specification of the capture tool; after acceleration, the feature information of the accelerated video can be computed from the acceleration factor and the acceleration method used. For an original video of unknown origin, the media file can be opened in binary mode and the feature information read from the file structure, or it can be obtained with a video inspection tool such as pymediainfo (a sketch follows).
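A minimal sketch of the acquisition module, assuming the pymediainfo package and that the container exposes frame_rate, duration and bit_rate on its video track:

```python
from pymediainfo import MediaInfo

def acquire_features(video_path):
    """Read frame rate, duration and code rate from the video's metadata."""
    info = MediaInfo.parse(video_path)
    for track in info.tracks:
        if track.track_type == "Video":
            return {
                "frame_rate": float(track.frame_rate),       # frames per second
                "duration_s": float(track.duration) / 1000,  # MediaInfo reports milliseconds
                "bit_rate": track.bit_rate,                   # code rate in bit/s
            }
    return None
```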
The compression module compresses the input video to be processed into a low-definition, low-frame-rate compressed video. The compression module may employ a video encoder, which converts the format of the video; the definition and frame rate of the video change during the format conversion. Video compression is a prerequisite for computer processing of video: a digitized video signal has a very high data bandwidth, usually above 20 MB/s, and is therefore difficult for a computer to store and process. Compression typically reduces the bandwidth to 1-10 MB/s, so that the video signal can be stored in a computer and processed accordingly. Common video codecs include the MPEG series (MPEG-1, MPEG-2, MPEG-4 AVC, etc.) and the H.26x series (H.261, H.262, H.263++, H.264, etc.).
The framing module splits the original video to be processed into frames to obtain an image set containing a plurality of continuous high-definition pictures; it also splits the compressed video into frames to obtain a plurality of low-definition pictures. When continuous images change at more than 24 frames per second, the human eye, by the principle of persistence of vision, cannot distinguish a single static image and perceives a smooth, continuous picture, which is called a video. Video framing extracts the continuous pictures that make up a video one by one, producing a sequence of video frame images. Framing can be performed with video tools such as Python with OpenCV, or Premiere.
The recognition module extracts the pictures with the target features from the low-definition pictures as pictures to be marked. Image recognition is the technique of processing, analysing and understanding images with a computer in order to recognize targets and objects in various patterns; it generally covers character recognition, digital image processing and recognition, and object recognition. In this embodiment the video frames are recognized by digital image processing and recognition: the recognition module uses an image recognition model, i.e. a recognition approach based on an artificial neural network, such as a CRNN-based OCR character recognition model, a face recognition model based on the face_recognition library, or an object recognition model based on YOLO v5 (a face-recognition sketch follows the model description below).
The image recognition model is obtained as follows:
Obtain an existing initial image recognition model. Collect a plurality of feature pictures and divide them into a training set and a test set according to a preset proportion. Input the pictures in the training set into the initial image recognition model to train it, iteratively updating the model parameters during training. Input the pictures in the test set into the optimized image recognition model for testing, and keep the parameters of the model that meets the recognition accuracy requirement to obtain the final image recognition model.
The extraction module marks each picture to be marked and extracts the corresponding high-definition picture from the image set according to the time axis information of the marked low-definition picture. Because the compressed video is obtained by compressing the original video at a fixed ratio, each frame of the compressed video maps one-to-one onto a frame of the original video. The extraction module finds the high-definition picture to be marked through a pre-stored mapping relation, expressed as:
T_i / T_A = O_j / O_s
where T_i is the moment of the i-th picture to be marked in the low-definition, low-frame-rate video, T_A is the total duration of the low-definition, low-frame-rate video, O_j is the sequence number of the corresponding high-definition picture in the image set, and O_s is the total number of pictures in the image set.
The frame-merging module combines the marked pictures in order to obtain a video segment containing the target features. Frame merging is the inverse of framing: a sequence of continuous images is merged into a video. The synthesized video may keep the frame rate of the original video or of the compressed video, or the frame rate may be set as required. In practice, frame merging can be done with video tools such as Python with OpenCV, or Premiere.
Through the cooperation of these modules, the high-definition, high-frame-rate original video is compressed and framed, the low-definition pictures are obtained by framing the compressed low-definition, low-frame-rate video, and the recognition of high-definition frames is replaced by the recognition of low-definition frames, lowering the hardware requirements of the detection tool and the cost of the recognition device. The corresponding high-definition pictures are then extracted according to the successfully recognized low-definition pictures and merged into a high-definition recognition video, which can be output and displayed directly or stored on a suitable storage medium.
To make operation convenient for the user, the real-time target segment extraction system based on accelerated videos is deployed on computer equipment, giving a real-time target segment extraction device based on accelerated videos. The device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. The functional modules of the device are deployed as in the extraction system described above, and when the processor executes the computer program it carries out the steps of the extraction method, thereby extracting video segments containing the target features from a high-definition, high-frame-rate accelerated video.
The computer device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of a plurality of servers) capable of executing programs, and the like. The computer device of the embodiment at least includes but is not limited to: a memory and a processor communicatively coupled to each other via a system bus.
In this embodiment, the memory (i.e., the readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the memory may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device. Of course, the memory may also include both internal and external storage devices for the computer device. In this embodiment, the memory is generally used for storing an operating system, various types of application software, and the like installed in the computer device. In addition, the memory may also be used to temporarily store various types of data that have been output or are to be output.
The processor may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to run a program code stored in the memory or process data, so as to implement the above step of the method for extracting the target segment based on the accelerated video in real time, and further extract the video segment including the target feature from the accelerated video with high definition and high frame rate.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these changes and modifications are all within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A real-time extraction method of a target segment based on an accelerated video is used for extracting a video segment containing target features from an accelerated video with high definition and high frame rate; the real-time extraction method is characterized by comprising the following steps:
s1: copying an original video to be processed into two videos to be processed; performing frame processing on one of the videos to be processed to obtain an image set containing a plurality of continuous high-definition pictures; compressing the other video to be processed to obtain a low-definition low-frame-rate compressed video;
s2: performing frame division processing on the compressed video to obtain a plurality of low-definition pictures;
s3: inputting a plurality of low-definition pictures into an image recognition model for recognition; marking the pictures which are successfully identified, and recording the time axis information of the pictures in the compressed video; wherein the image recognition model is used for recognizing an image with target characteristics;
s4: extracting high-definition pictures at corresponding moments from the image set according to the marked time axis information;
s5: and combining the extracted high-definition pictures to obtain the high-definition identification video.
2. The method for extracting the target segment based on the accelerated video in real time as claimed in claim 1, wherein in S1, the method for performing the framing processing on the video to be processed is as follows:
a1: acquiring characteristic information of the original video, wherein the characteristic information comprises the frame rate and the duration of the original video;
a2: extracting the original video frame by frame according to the frame rate and the duration information to obtain a frame by frame image of the original video;
a3: marking the frame-by-frame images according to the time information, and obtaining an image set containing all the frame-by-frame images; each image in the set of images contains a corresponding time stamp.
3. The method for extracting target segments from accelerated videos in real time as claimed in claim 1, wherein in S1, the method for compressing the videos to be processed is as follows:
b1: acquiring the code rate of the original video;
b2: calculating a compression ratio according to a preset code rate;
b3: and compressing the original video into a compressed video with low definition and low frame rate according to a compression ratio.
4. The method for extracting the target segment based on the accelerated video in real time as claimed in claim 1, wherein in S3, the image recognition model is established as follows:
s31: acquiring an existing initial image recognition model;
s32: collecting a plurality of characteristic pictures; dividing the characteristic pictures into a training set and a test set according to a preset proportion; the characteristic picture is a picture with target characteristics;
s33: sequentially inputting the characteristic pictures in the training set into the initial image recognition model to train the initial image recognition model, and iteratively updating the parameters of the image recognition model through the training process;
s34: and after the training is finished, inputting the characteristic pictures in the test set into the optimized image recognition model for testing, and reserving the parameters of the image recognition model meeting the recognition precision requirement to obtain the final image recognition model.
5. The method for extracting target segments from accelerated videos in real time according to claim 1, wherein the method for extracting high-definition pictures comprises the following steps:
s41: acquiring time axis information of the marked low-definition pictures in the compressed video;
s42: calculating the sequence number of the corresponding high-definition picture in the image set according to the ratio of the time of the low-definition picture in the time axis information in the compressed video to the total time of the compressed video;
s43: and extracting a corresponding high-definition picture according to the sequence number.
6. A real-time extraction system of target segments based on accelerated videos, which employs the recognition processing method according to any one of claims 1 to 5, and is characterized by comprising:
the acquisition module is used for acquiring the characteristic information of an input original video, including the frame rate, the duration and the code rate of the video;
the compression module is used for compressing the input video to be processed into a low-definition low-frame-rate compressed video;
the frame dividing module is used for dividing the original video to be processed into frames to obtain an image set containing a plurality of continuous high-definition pictures; the framing module is also used for framing the compressed video to obtain a plurality of low-definition pictures;
the identification module is used for extracting a picture with target characteristics from a plurality of low-definition pictures as a picture to be marked;
the extraction module is used for marking each picture to be marked and extracting a corresponding high-definition picture in the image set according to the time axis information of the marked low-definition picture;
and the frame combining module is used for combining the plurality of marked pictures according to the sequence to obtain a video clip containing the target characteristics.
7. The system according to claim 6, wherein the compression module employs a video encoder; the video encoder is used for converting the format of the video, and the definition and the frame rate of the video are changed along with the format conversion.
8. The system of claim 6, wherein the recognition module uses an image recognition model to recognize low-resolution images; the image recognition model is obtained by the following method:
acquiring an existing initial image recognition model; collecting a plurality of characteristic pictures; dividing a plurality of characteristic pictures into a training set and a test set according to a preset proportion; inputting pictures in a training set into an initial image recognition model to train the initial image recognition model, and iteratively updating parameters of the image recognition model through a training process; and inputting the pictures in the test set into the optimized image recognition model for testing, and reserving parameters of the image recognition model meeting the recognition precision requirement to obtain the final image recognition model.
9. The system according to claim 6, wherein the extraction module finds a high-definition picture for marking according to a pre-stored mapping relationship; wherein the mapping relation is expressed as:
T_i / T_A = O_j / O_s
wherein T_i is the moment of the i-th picture to be marked in the low-definition, low-frame-rate video; T_A is the total duration of the low-definition, low-frame-rate video; O_j is the sequence number of the corresponding high-definition picture in the image set; and O_s is the total number of pictures in the image set.
10. An accelerated video-based target segment real-time extraction device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein each functional module in the accelerated video-based target segment real-time extraction device is deployed in the manner of the identification processing system according to any one of claims 6 to 9, and when the processor executes the computer program, the processor implements the steps of the accelerated video-based target segment real-time extraction method according to any one of claims 1 to 5, so as to extract a video segment containing target features from an accelerated video with a high definition and a high frame rate.
CN202210945035.7A 2022-08-08 2022-08-08 Method, system and device for extracting target segment in real time based on accelerated video Pending CN115314713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210945035.7A CN115314713A (en) 2022-08-08 2022-08-08 Method, system and device for extracting target segment in real time based on accelerated video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210945035.7A CN115314713A (en) 2022-08-08 2022-08-08 Method, system and device for extracting target segment in real time based on accelerated video

Publications (1)

Publication Number Publication Date
CN115314713A true CN115314713A (en) 2022-11-08

Family

ID=83860623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210945035.7A Pending CN115314713A (en) 2022-08-08 2022-08-08 Method, system and device for extracting target segment in real time based on accelerated video

Country Status (1)

Country Link
CN (1) CN115314713A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115604437A (en) * 2022-12-15 2023-01-13 合肥岭雁科技有限公司(Cn) Gateway data processing method, system, equipment and storage medium
CN116761035A (en) * 2023-05-26 2023-09-15 武汉星巡智能科技有限公司 Video intelligent editing method, device and equipment based on maternal and infant feeding behavior recognition
CN116761035B (en) * 2023-05-26 2024-05-07 武汉星巡智能科技有限公司 Video intelligent editing method, device and equipment based on maternal and infant feeding behavior recognition


Similar Documents

Publication Publication Date Title
CN111370020B (en) Method, system, device and storage medium for converting voice into lip shape
KR102082816B1 (en) Method for improving the resolution of streaming files
Singh et al. Muhavi: A multicamera human action video dataset for the evaluation of action recognition methods
US9710695B2 (en) Characterizing pathology images with statistical analysis of local neural network responses
CN108156519B (en) Image classification method, television device and computer-readable storage medium
CN102007499B (en) Detecting facial expressions in digital images
CN110675433A (en) Video processing method and device, electronic equipment and storage medium
CN111339811B (en) Image processing method, device, equipment and storage medium
CN110096945B (en) Indoor monitoring video key frame real-time extraction method based on machine learning
US20140270432A1 (en) Combining information of different levels for content-based retrieval of digital pathology images
CN106960196B (en) Industrial video small number recognition method based on template matching and SVM
CN110858277A (en) Method and device for obtaining attitude classification model
Heng et al. How to assess the quality of compressed surveillance videos using face recognition
CN115314713A (en) Method, system and device for extracting target segment in real time based on accelerated video
CN109660762A (en) Size figure correlating method and device in intelligent candid device
CN108876672A (en) A kind of long-distance education teacher automatic identification image optimization tracking and system
CN108345847B (en) System and method for generating label data of face image
CN113159146A (en) Sample generation method, target detection model training method, target detection method and device
CN115278297B (en) Data processing method, device, equipment and storage medium based on drive video
KR20030049804A (en) Method and apparatus for estimating camera motion
CN110956093A (en) Big data-based model identification method, device, equipment and medium
CN112532938B (en) Video monitoring system based on big data technology
CN111768729A (en) VR scene automatic explanation method, system and storage medium
CN114222181B (en) Image processing method, device, equipment and medium
CN111209863B (en) Living model training and human face living body detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination