CN112235598A - Video structured processing method and device and terminal equipment - Google Patents

Video structured processing method and device and terminal equipment

Info

Publication number
CN112235598A
CN112235598A (application CN202011038705.4A)
Authority
CN
China
Prior art keywords
image
target
cache
frame
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011038705.4A
Other languages
Chinese (zh)
Other versions
CN112235598B (en)
Inventor
刘海军
顾鹏
苏岚
王成波
刘毛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202011038705.4A
Publication of CN112235598A
Application granted
Publication of CN112235598B
Active legal status
Anticipated expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/23106 Content storage operation involving caching operations
    • H04N21/23113 Content storage operation involving housekeeping operations for stored content, e.g. prioritizing content for deletion because of storage space restrictions
    • H04N21/2312 Data placement on disk arrays
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4335 Housekeeping operations, e.g. prioritizing content for deletion because of storage space restrictions
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The application is applicable to the technical field of video processing and provides a video structured processing method, a device, a terminal device, and a storage medium. The method comprises the following steps: caching each frame of the first N frames of a video stream into its own cache block, and acquiring the cache address of each frame of image in a first cache region; sequentially reading the images from the first cache region in a preset frame-number order according to their cache addresses and performing structured processing; detecting, according to the progress of the structured processing, whether the image cached in each cache block is invalid; and, when an invalid image is detected, clearing the invalid image and caching a new frame of image in the corresponding cache block. Because an invalid image is cleared as soon as it is detected, the cache is released in time and a new image can be cached after the release; this improves cache utilization, makes images less likely to be lost, and improves the efficiency of structured analysis of the video.

Description

Video structured processing method and device and terminal equipment
Technical Field
The application belongs to the technical field of video processing, and particularly relates to a video structured processing method and device and terminal equipment.
Background
Video structuring is increasingly widely used in video surveillance. It analyzes and processes video to generate semantic, structured description information of the video content. Compared with the original video, structured video information greatly reduces the storage requirement and transmission bandwidth when it is stored, transmitted, and applied, and it enriches the ways the video information can be used.
Current video structuring approaches use the cache inefficiently and easily lose images, so the efficiency of structured video analysis is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a video structured processing method, a video structured processing device, and a terminal device, to solve the prior-art problems of low efficiency in structured video analysis and low cache utilization.
In a first aspect, an embodiment of the present application provides a video structured processing method, including:
dividing a first cache region into N cache blocks according to the resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2;
caching each frame of image in the first N frames of images of the video stream to one cache block respectively, and acquiring a cache address of each frame of image in the first cache area;
according to the cache address of each frame of image in the first cache region, sequentially reading the images from the first cache region according to a preset frame number sequence and carrying out structured processing;
respectively detecting whether the images cached in each cache block are invalid or not according to the structured processing progress;
when a failed image is detected, clearing the failed image, and caching a new frame of image in a cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
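The steps above can be sketched as a minimal in-memory model; all class, function, and variable names below are illustrative assumptions, not identifiers from the application:

```python
# A minimal sketch of the claimed scheme: N one-frame cache blocks, frames
# read in frame-number order, each block reused as soon as its image is
# invalid. Names here are assumptions, not from the patent.
from collections import deque

class FrameCache:
    """First cache region, divided into N blocks of one frame each (N >= 2)."""
    def __init__(self, n_blocks):
        assert n_blocks >= 2
        self.blocks = [None] * n_blocks   # each block holds (frame_no, image)
        self.addr_of = {}                 # frame number -> cache address (index)

    def cache_frame(self, frame_no, image):
        for i, blk in enumerate(self.blocks):
            if blk is None:               # first free block
                self.blocks[i] = (frame_no, image)
                self.addr_of[frame_no] = i
                return i
        raise RuntimeError("no free cache block")

    def invalidate(self, frame_no):
        """Clear an invalid (already processed) image, freeing its block."""
        self.blocks[self.addr_of.pop(frame_no)] = None

def structured_process(cache, stream):
    """Cache the first N frames, read them in frame-number order, and reuse
    each block for a new frame as soon as its image becomes invalid."""
    stream = iter(stream)
    pending = deque()
    for _ in range(len(cache.blocks)):    # cache the first N frames
        frame_no, image = next(stream)
        cache.cache_frame(frame_no, image)
        pending.append(frame_no)
    processed = []
    while pending:
        frame_no = pending.popleft()      # preset frame-number order
        _, image = cache.blocks[cache.addr_of[frame_no]]
        processed.append(frame_no)        # stand-in for structured processing
        cache.invalidate(frame_no)        # clear the now-invalid image...
        nxt = next(stream, None)          # ...then cache a new frame, if any
        if nxt is not None:
            cache.cache_frame(*nxt)
            pending.append(nxt[0])
    return processed
```

With two blocks and four frames, each block is used twice, which is the cache-reuse behaviour the claim describes.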
In an embodiment, the sequentially reading the images from the first buffer area according to the buffer address of each frame of image in the first buffer area and the preset frame number sequence and performing the structuring process includes:
according to the cache address of each frame of image in the first cache region, sequentially reading images from the first cache region in the order from small to large of the frame number for target identification, and obtaining all first images containing a first target;
screening out a second image meeting a preset condition from all the first images;
and carrying out structural description on the first target in the second image, and obtaining and storing structural description information of the first target.
In an embodiment, the sequentially reading images from the first buffer area for object identification according to the buffer addresses of each frame of image in the first buffer area in the order from small to large of the frame numbers to obtain all first images including a first object includes:
and sequentially reading images from the first cache region according to the cache address of each frame of image in the first cache region from small to large, and performing target identification through a pre-trained network model to obtain all first images containing a first target.
In one embodiment, the screening out the second image satisfying the preset condition from all the first images includes:
sequentially carrying out quality comprehensive scoring on the first target in each frame of the first image according to the sequence of the frame numbers from small to large;
and screening out second images meeting preset conditions from all the first images according to the quality comprehensive scoring results of the first targets in each frame of first image.
In one embodiment, the performing quality comprehensive scoring on the first target in each frame of the first image sequentially in the order from small to large of the frame number includes:
sequentially scoring the first target in each frame of the first image from J dimensions according to the sequence of the frame numbers from small to large; wherein the J dimensions include one or more of an image quality dimension, a size dimension, a target integrity dimension, and a target pose dimension of the first target;
and respectively obtaining a quality comprehensive scoring result of the first target in each frame of first image according to the scoring values of the first target in each frame of first image in the J dimensions and a preset weight factor.
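As a hedged sketch of this weighted combination (the dimension scores and weight factors below are illustrative assumptions, not values from the application):

```python
# Combine the J per-dimension scores with preset weight factors to get the
# quality comprehensive score. The weights are an illustrative assumption
# for the four dimensions: image quality, size, target integrity, target pose.
def quality_score(dim_scores, weights):
    assert len(dim_scores) == len(weights)
    return sum(s * w for s, w in zip(dim_scores, weights))

weights = [0.4, 0.2, 0.2, 0.2]                     # preset weight factors
print(quality_score([80, 90, 100, 70], weights))   # -> 84.0
```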
In one embodiment, the performing quality comprehensive scoring on the first target in each frame of the first image sequentially in the order from small to large of the frame number includes:
performing quality comprehensive scoring on a first target in the first sub-image to obtain a first score; the first sub-image is the first image with the smallest frame number;
updating the historical score to the first score when the first score is larger than a preset threshold;
when the historical score is updated to the first score, intercepting a first target image in the first sub-image, encoding the first sub-image, and caching the first target image and the encoded first sub-image to a second cache region;
performing quality comprehensive scoring on the first target in the Kth sub-image to obtain a Kth score; the Kth sub-image is the first image with the Kth smallest frame number, where K = 2, 3, 4, …, M, and M is the total number of first images;
when the Kth score is larger than or equal to the sum of the historical score and a preset increase step value, updating the historical score to the Kth score;
when the historical score is updated to the Kth score, intercepting a first target image in the Kth sub-image, encoding the Kth sub-image, clearing the first target image cached in the second cache region and the encoded first image, and caching the first target image in the Kth sub-image and the encoded Kth sub-image to the second cache region;
the step of screening out a second image meeting a preset condition from all the first images according to the quality comprehensive scoring result of the first target in each frame of first image comprises the following steps:
and when the historical scores are not updated any more, acquiring a second image corresponding to the score which is updated last in the historical scores in all the first images.
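The update rule in this embodiment can be sketched as follows; the threshold and increase step values are illustrative assumptions:

```python
# History-score update rule: the first qualifying sub-image must beat a
# preset threshold; each later sub-image must beat the history score by a
# preset increase step. The last update wins. Values here are assumptions.
def select_best_frame(scores, threshold, step):
    """Return (index, score) of the last frame that updated the history score,
    or None if the history score was never set."""
    best = None
    history = None
    for k, s in enumerate(scores):
        if history is None:            # first sub-image: compare to threshold
            if s > threshold:
                history, best = s, (k, s)
        elif s >= history + step:      # Kth sub-image: require a real improvement
            history, best = s, (k, s)
    return best

# Small gains below `step` (e.g. 72 -> 74) do not trigger re-encoding and
# re-cropping, which is the point of the increase step.
print(select_best_frame([60, 72, 74, 90], threshold=65, step=5))   # -> (3, 90)
```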
In one embodiment, the performing the structural description on the first object in the second image, obtaining and storing structural description information of the first object includes:
acquiring a first target image and a coded second image corresponding to the second image from the second cache region;
detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes a first target image corresponding to the second image, the encoded second image, and attribute information of the first target image corresponding to the second image.
In one embodiment, the performing the structural description on the first object in the second image, obtaining and storing structural description information of the first object includes:
acquiring a first target image and a coded second image corresponding to the second image from the second cache region;
coding the first target image to obtain a coded first target image;
detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes the first target image corresponding to the second image after being encoded, and the attribute information of the first target image corresponding to the second image.
In one embodiment, each cache block has a predetermined tag stored therein;
the detecting whether the image cached in each cache block is invalid according to the structured processing progress comprises:
setting a preset mark in a cache block corresponding to an image which is not subjected to structured processing as a valid cache mark;
setting a preset mark in a cache block corresponding to the image with the structuralized processing as an invalid cache mark;
judging the image cached in a cache block in which the valid cache mark is detected to be valid;
and judging the image cached in a cache block in which the invalid cache mark is detected to be invalid.
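A minimal sketch of this preset-mark scheme; the flag constants and names are assumptions for illustration:

```python
# Each cache block stores a preset mark: valid while the image still awaits
# structured processing, invalid once processing has finished.
VALID_MARK, INVALID_MARK = 1, 0

class CacheBlock:
    def __init__(self, frame_no, image):
        self.frame_no, self.image = frame_no, image
        self.mark = VALID_MARK            # not yet structured-processed

def mark_structured(block):
    """Structured processing finished: flag the cached image as invalid."""
    block.mark = INVALID_MARK

def image_is_invalid(block):
    return block.mark == INVALID_MARK
```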
In one embodiment, before dividing the first buffer area into N buffer blocks according to the resolution of the video stream, the method includes:
detecting scene information of a video stream, and acquiring a capacity ratio of a first cache region and a second cache region corresponding to the scene information;
and setting the capacities of the first buffer area and the second buffer area according to the capacity ratio.
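As an illustrative sketch of this scene-dependent sizing (the scene names and ratios are assumptions, not values from the application):

```python
# Scene-dependent sizing of the two cache regions: the capacity ratio is
# looked up per scene, then the total cache is split accordingly.
SCENE_RATIOS = {
    "crowded_street": (2, 3),   # many targets: favour the second (result) cache
    "empty_corridor": (4, 1),   # few targets: favour the first (frame) cache
}

def split_cache(total_bytes, scene):
    a, b = SCENE_RATIOS[scene]          # first : second cache region
    first = total_bytes * a // (a + b)
    return first, total_bytes - first

print(split_cache(100, "crowded_street"))   # -> (40, 60)
```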
In one embodiment, the detecting scene information of the video stream includes: and carrying out feature detection on the images in the video stream, and determining corresponding scene information according to the detected feature information.
In one embodiment, the dividing the first buffer area into N buffer blocks according to the resolution of the video stream includes: determining the storage capacity for storing one frame of image according to the resolution of the video stream;
and dividing the first cache region into N cache blocks according to the storage capacity of the frame of image.
In a second aspect, an embodiment of the present application provides a video structured processing apparatus, including:
the buffer dividing module is used for dividing the first buffer area into N buffer blocks according to the resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2;
the acquisition module is used for respectively caching each frame of image in the first N frames of images of the video stream to one cache block and acquiring the cache address of each frame of image in the first cache area;
the processing module is used for sequentially reading the images from the first cache region according to the cache address of each frame of image in the first cache region and a preset frame number sequence and carrying out structuralized processing;
the detection module is used for respectively detecting whether the image cached in each cache block is invalid or not according to the structured processing progress;
the clearing module is used for clearing the failed image when the failed image is detected, and caching a new frame of image in a cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the video structured processing method according to any one of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for processing a video structure according to any one of the first aspect is implemented.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the video structuring processing method according to any one of the above first aspects.
Compared with the prior art, the first aspect of the embodiment of the application has the following beneficial effects:
according to the embodiment of the application, whether the cached image in each cache block is invalid or not can be detected according to the structured processing progress, the invalid image is removed when the invalid image is detected, the cache can be released in time, and a frame of new image can be cached after the release, so that the use efficiency of the cache is improved, the image is not easy to lose, and the efficiency of video structured analysis and processing is improved.
According to the method and the device, among all the first images containing the first target, the first image with the smallest frame number is encoded and its first target image cropped out only when its quality comprehensive score exceeds the preset threshold; the first image with the Kth smallest frame number is encoded and cropped only when its quality comprehensive score is greater than or equal to the sum of the historical score and the preset increase step value. This ensures that the cropped first target image is of better quality while avoiding the encoding and cropping operations that frequent marginal improvements would otherwise trigger, further improving the efficiency of structured video analysis.
According to the embodiment of the application, because the probability that targets appear differs between scenes, the capacity ratio of the first cache region to the second cache region is set according to the scene, and the two regions are sized accordingly. The capacities of the first and second cache regions can thus be set reasonably for different scenes, improving cache utilization and the efficiency of structured video analysis.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario of a video structuring processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a video structuring processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a video structuring processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a video structuring processing method according to an embodiment of the present application;
FIG. 5 is a diagram illustrating the corresponding scores of target A in the 1st to 4th frames of images according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a video structuring processing method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a video structured processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The video structured processing method provided by the embodiments of the application can be applied to terminal devices such as desktop computers, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, televisions, robots, mobile phones, and the like. The embodiments of the application do not limit the specific type of the terminal device.
For example, consider an application scenario requiring video monitoring. Because the video data produced by monitoring is very large in volume and hard to store and transmit, the required target information can be obtained by extracting targets and their related information from the video data. In the video monitoring of smart traffic, smart city, or smart security applications, targets need to be examined (a target object may be a person, a vehicle, another object, and so on). At present the video can be checked manually, but the data content that best expresses a target cannot be extracted, and the video data cannot be effectively managed or efficiently retrieved. In such a scenario, the video structured processing method of the present application structures the video data, performs the structured analysis efficiently, and extracts the data content that best expresses the target, so that the video data can be managed and retrieved efficiently. As shown in fig. 1, which is a schematic view of an application scene of video structuring processing, a video capture device 10 captures video data, and a terminal device 20 obtains the captured video data and structures it to obtain structured description information of the targets in the video content. The structured description information corresponds to the original video data but its data volume is very small. It is stored in a storage device 30, which can be local storage or an external device the information is transmitted to; target information in the video can then be conveniently retrieved and browsed from the stored structured description information, and only the relevant part of the original video is called up when necessary.
In order to explain the technical means described in the present application, specific embodiments are described below.
Fig. 2 is a schematic flowchart of a video structuring processing method according to an embodiment of the present application, and as shown in fig. 2, the video structuring processing method includes:
step S1, dividing the first buffer area into N buffer blocks according to the resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2.
Specifically, a video stream is obtained. The video stream may be the original video data acquired by a video capture device, video data obtained by RTSP decoding of the original video data, or video data read from local storage or received from an external device. The video capture device may be, for example, a camera with a MIPI interface. The first cache region is preset for caching the video stream. The larger the resolution of the images in the video stream, the more cache each frame occupies, so the first cache region is divided into N cache blocks according to the resolution of the video stream, and the capacity of each cache block may be the size of one frame of image in the video stream.
In one embodiment, the dividing the first buffer area into N buffer blocks according to the resolution of the video stream includes: determining the storage capacity for storing one frame of image according to the resolution of the video stream; and dividing the first cache region into N cache blocks according to that storage capacity. If the storage space of the first cache region is T, and the storage capacity for storing each frame of image, determined according to the resolution of the video stream images, is Q, the cache region is divided into N = T/Q cache blocks, where N is the result of T/Q rounded down to an integer.
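The arithmetic can be sketched as follows; the bytes-per-pixel figure assumes an NV12-style frame layout, which the application does not specify:

```python
# N = floor(T / Q): T is the first cache region's size, Q the storage needed
# for one frame at the stream's resolution. 1.5 bytes/pixel (NV12) is an
# assumption used only for illustration.
def num_cache_blocks(region_bytes, width, height, bytes_per_pixel=1.5):
    q = int(width * height * bytes_per_pixel)   # storage for one frame, Q
    return region_bytes // q                    # N = floor(T / Q)

# e.g. a 16 MiB region holding 1080p NV12 frames
print(num_cache_blocks(16 * 1024 * 1024, 1920, 1080))   # -> 5
```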
Step S2, respectively caching each frame of image in the first N frames of images of the video stream into one cache block, and obtaining a cache address of each frame of image in the first cache area.
Specifically, after the first cache region is divided into N cache blocks, each cache block can store one frame of image, and each of the first N frames of the video stream can be stored in its own cache block. Acquiring the cache address of each frame of image in the first cache region may mean acquiring the address of the cache block in which each frame of image is stored.
And step S3, reading images from the first buffer area in sequence according to the buffer address of each frame of image in the first buffer area and performing structuring processing.
Specifically, according to the address of the cache block in which each image is stored, images are sequentially read from the corresponding cache blocks in a preset frame-number order and subjected to structuring processing.
In one embodiment, as shown in fig. 3, the step S3 specifically includes steps S31 to S33:
step S31, sequentially reading images from the first cache region in order of frame number from small to large according to the cache address of each frame of image in the first cache region, and performing target identification to obtain all first images containing a first target.
Specifically, among the images stored in the first cache region, the images are read sequentially in ascending order of their frame numbers in the video stream and subjected to target identification, which may be performed by a target identification algorithm. When a plurality of images containing the same target are identified, they may be a plurality of consecutively numbered images in the video stream containing the same target, or all images in the video stream containing that target. This same target is referred to as the first target, and an image containing the first target is referred to as a first image. The video stream is composed of a sequence of frames of images, and the frame number can be understood as the sequence number of an image within the video stream.
In one embodiment, the step of sequentially reading images from the first cache region in order of frame number from small to large according to the cache address of each frame of image, and performing target identification to obtain all first images containing a first target, includes: sequentially reading images from the first cache region in order of frame number from small to large according to the cache address of each frame of image in the first cache region, and performing target identification through a pre-trained network model to obtain all first images containing the first target.
Specifically, the pre-trained network model may be a trained neural network model. A neural network model for target identification is constructed in advance, for example from a lightweight network architecture, and the constructed model is then trained so that the trained neural network model can perform target identification. Training may proceed as follows: a large number of pictures containing the various types of targets to be recognized are prepared in advance, the targets in the pictures are labeled, and the neural network model is trained on these labeled pictures until its preset loss function converges, at which point the model is judged to be a trained neural network model.
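Step S31's grouping of frames by target can be sketched as below; the `detect` callable returning a set of target ids is a hypothetical stand-in for the pre-trained network model, not a real model API:

```python
def collect_first_images(frames: dict, detect) -> dict:
    """Read frames in ascending frame-number order, run the detector on
    each, and collect, per target id, the frame numbers of all images
    containing that target (that target's 'first images')."""
    first_images = {}  # target id -> list of frame numbers
    for frame_no in sorted(frames):
        for target_id in detect(frames[frame_no]):
            first_images.setdefault(target_id, []).append(frame_no)
    return first_images

# Usage with a fake detector: each 'frame' is already a set of target ids
fake_detect = lambda frame: frame
groups = collect_first_images({1: {"A"}, 2: {"A", "B"}, 3: {"B"}}, fake_detect)
```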
And step S32, screening out a second image meeting a preset condition from all the first images.
Specifically, a first image satisfying a preset condition may be selected from first images including the same object, and may be referred to as a second image.
In one embodiment, the screening out the second image satisfying the preset condition from all the first images includes: sequentially carrying out quality comprehensive scoring on the first target in each frame of the first image according to the sequence of the frame numbers from small to large; and screening out second images meeting preset conditions from all the first images according to the quality comprehensive scoring results of the first targets in each frame of first image.
Specifically, the quality of the first targets in the first images may be comprehensively scored, and the first images meeting the preset condition may be screened out according to the comprehensive quality scoring result of the first targets in each first image, which is called as second images.
In one embodiment, the performing quality comprehensive scoring on the first target in each frame of the first image sequentially in the order from small to large of the frame number includes: sequentially scoring the first target in each frame of the first image from J dimensions according to the sequence of the frame numbers from small to large; wherein the J dimensions include one or more of an image quality dimension, a size dimension, a target integrity dimension, and a target pose dimension of the first target. And respectively obtaining a quality comprehensive scoring result of the first target in each frame of first image according to the scoring values of the first target in each frame of first image in the J dimensions and a preset weight factor.
Specifically, the quality comprehensive scoring of the target in the image may be: the method comprises the steps of firstly scoring a target in an image from J dimensions, multiplying the score value of each dimension by a corresponding preset weight factor to obtain a quality comprehensive score value of each dimension, and adding the quality comprehensive score values of each dimension to obtain a quality comprehensive score value of the target.
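The weighted J-dimension scoring just described can be sketched as follows; the dimension names and weight values are illustrative assumptions, not values from the disclosure:

```python
def quality_score(dim_scores: dict, weights: dict) -> float:
    """Composite quality score: each dimension's score is multiplied by
    its preset weight factor and the weighted values are summed."""
    return sum(dim_scores[d] * weights[d] for d in weights)

# Assumed J = 4 dimensions matching those listed above
weights = {"image_quality": 0.4, "size": 0.2, "completeness": 0.2, "pose": 0.2}
score = quality_score(
    {"image_quality": 80, "size": 70, "completeness": 90, "pose": 60}, weights
)
```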
In one embodiment, as shown in fig. 4, step S32 specifically includes steps S321 to S327:
step S321, performing quality comprehensive scoring on the first target in the first sub-image to obtain a first score; wherein the first sub-image is a first image with a minimum frame number.
Specifically, when the first target is detected for the first time, the image in which it is first detected is taken as the first image with the minimum frame number, referred to as the first sub-image. Quality comprehensive scoring is performed on the first target in the first sub-image, and the resulting quality comprehensive score value is the first score.
Step S322, when the first score is greater than a preset threshold, updating the historical score to the first score.
Specifically, when the first score of the first target in the first sub-image is greater than the preset threshold, it indicates that the image quality of the first target in the first sub-image meets the requirement, the historical score of the first target is updated to the first score, and the initial value of the historical score corresponding to each target is zero.
Step S323, when the historical score is updated to the first score, intercepting a first target image in the first sub-image, encoding the first sub-image, and caching the first target image and the encoded first sub-image in a second cache region.
Specifically, when the first score of the first target in the first sub-image is greater than the preset threshold, the historical score is updated to the first score; that is, when the historical score is updated to the first score, the first target in the first sub-image meets the requirement. The first target image in the first sub-image is then intercepted, the first sub-image is encoded (the encoding may be compression encoding), and the first target image and the encoded first sub-image are cached in the second cache region. At this point the first target in the first sub-image meets the requirement only provisionally: if a later sub-image containing the first target is detected whose image quality better meets the requirement, the first target image intercepted from the first sub-image and the encoded first sub-image in the second cache region are replaced.
Step S324, performing quality comprehensive scoring on the first target in the Kth sub-image to obtain a Kth score; the Kth sub-image is the first image with the Kth smallest frame number, where K = 2, 3, 4, …, M, and M is the total number of first images.
Specifically, when the first target is detected for the Kth time, the image in which it is detected is taken as the first image with the Kth smallest frame number, referred to as the Kth sub-image. Quality comprehensive scoring is performed on the first target in the Kth sub-image, and the resulting quality comprehensive score value is the Kth score.
Step S325, when the kth score is greater than or equal to the sum of the historical score and a preset increase step value, updating the historical score to the kth score.
Specifically, the preset increase step value is a preset score increment. When the Kth score of the first target in the Kth sub-image is greater than or equal to the sum of the historical score and the preset increase step value, the image quality of the first target in the Kth sub-image better meets the requirement than the first target images from the previous K-1 detections, and the historical score is updated to the Kth score.
Step S326, when the historical score is updated to the kth score, intercepting the first target image in the kth sub-image, encoding the kth sub-image, clearing the first target image and the encoded first image cached in the second cache region, and caching the first target image and the encoded kth sub-image in the second cache region.
Specifically, when the Kth score is greater than or equal to the sum of the historical score and the preset increase step value, the historical score is updated to the Kth score; that is, when the historical score is updated to the Kth score, the first target in the Kth sub-image better meets the requirement than in the first K-1 sub-images. At this point, the first target image in the Kth sub-image is intercepted and the Kth sub-image is encoded; the first target image and the encoded first image currently cached in the second cache region (the latter being one of the first K-1 sub-images) are cleared, and the first target image from the Kth sub-image and the encoded Kth sub-image are cached in the second cache region.
Step S327, when the historical score is not updated any more, acquiring a second image corresponding to the score which is updated last in the historical score in all the first images.
Specifically, the historical score of the first target is no longer updated once quality comprehensive scoring has been completed for the first target in all of the first images. When the check of whether to update the historical score has been performed for the scores of all first images, the first image corresponding to the last score to which the historical score was updated is referred to as the second image.
When the quality comprehensive score of the first target in the first image with the minimum frame number is greater than the preset threshold, that first image is encoded and the first target image is intercepted; when the quality comprehensive score of the first target in the first image with the Kth smallest frame number is greater than or equal to the sum of the historical score and the preset increase step value, that Kth first image is encoded and the first target image is intercepted. In this way, an image of better quality containing the first target can be captured while avoiding the encoding and screenshot operations that would result from the condition being met too frequently, further improving the efficiency of the video structured analysis processing.
For example, taking an application scenario: if a target A is identified in the 1st frame image (target A can be understood as the first target), the 1st frame image is taken as the first sub-image of target A (i.e., the first image with the smallest frame number corresponding to target A). Quality comprehensive scoring is performed on target A in the 1st frame image to obtain a first score; when target A in the first sub-image meets the condition (i.e., the first score is greater than the preset threshold), the historical score of target A is updated to the first score, the first sub-image is encoded, the target A image is intercepted from it, and the encoded first sub-image and intercepted target A image are cached in the second cache region. Target identification is then performed on the 2nd frame image of the video stream. If target A is also identified in the 2nd frame image, the 2nd frame image is taken as the 2nd sub-image of target A (i.e., the first image with the 2nd smallest frame number corresponding to target A), and quality comprehensive scoring is performed on target A in it to obtain a second score. When target A in the second sub-image meets the condition (i.e., the second score is greater than or equal to the sum of the current historical score and the preset increase step value), the historical score of target A is updated to the second score, the second sub-image is encoded and the target A image intercepted from it, the target A image and encoded first sub-image currently cached in the second cache region are cleared, and the target A image from the second sub-image and the encoded second sub-image are cached in the second cache region. Next, target identification is performed on the 3rd frame image of the video stream. If target A is also identified in the 3rd frame image, the 3rd frame image is taken as the 3rd sub-image of target A (i.e., the first image with the 3rd smallest frame number corresponding to target A), and quality comprehensive scoring is performed on target A in it to obtain a third score. When target A in the third sub-image does not meet the condition (i.e., the third score is less than the sum of the current historical score and the preset increase step value), the 3rd frame image is discarded, ending its processing. Then, target identification is performed on the 4th frame image of the video stream. If target A is also identified in the 4th frame image, the 4th frame image is taken as the 4th sub-image of target A (i.e., the first image with the 4th smallest frame number corresponding to target A), and quality comprehensive scoring is performed on target A in it to obtain a fourth score. When target A in the fourth sub-image meets the condition (i.e., the fourth score is greater than or equal to the sum of the current historical score and the preset increase step value), the historical score of target A is updated to the fourth score, the fourth sub-image is encoded and the target A image intercepted, the target A image and encoded second sub-image currently cached in the second cache region are cleared, and the target A image from the fourth sub-image and the encoded fourth sub-image are cached in the second cache region. As shown in fig. 5, fig. 5 is a schematic diagram of the scores of target A in the 1st, 2nd, 3rd, and 4th frame images, where th denotes the preset threshold and step denotes the preset increase step value.
Following the processing of the 1st through 4th frame images, subsequent frames of the video stream are processed in the same way, so that the target A image and the correspondingly encoded sub-image recorded in the second cache region can be replaced according to the quality comprehensive scoring results for target A. This ensures that a better image of target A is obtained, while avoiding the encoding and screenshot operations, and the added complexity to the whole structuring process, that would result from the condition being met too frequently. The example is given solely to aid understanding and is not intended to limit the relevant disclosure.
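The history-score update loop of steps S321 to S327 (and the target A walkthrough above) can be condensed into the following sketch; it assumes the history score is seeded by the first frame whose score exceeds the threshold, as in the fig. 5 example:

```python
def select_best_frame(scores, threshold, step):
    """Return the index of the frame whose score last updated the history
    score, or None if no frame ever qualified.  The first qualifying frame
    must exceed `threshold` (step S322); a later frame replaces it only
    when its score >= history + step (step S325)."""
    history = 0.0
    best = None
    for i, s in enumerate(scores):
        if best is None:
            if s > threshold:           # first frame clears the threshold
                history, best = s, i
        elif s >= history + step:       # beats history by at least `step`
            history, best = s, i        # replace cached best frame
    return best

# The fig. 5 walkthrough: frame 3 is discarded, frame 4 wins (index 3)
best = select_best_frame([60, 75, 80, 90], threshold=50, step=10)
```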
And step S33, performing structural description on the first object in the second image, obtaining and storing structural description information of the first object.
Specifically, the structural description of the first target in the second image may be to determine one or more of: the first target image corresponding to the second image, the encoded second image, attribute information of the first target image corresponding to the second image, and the frame number of the second image in the video stream. For example, when the target is a person, the attribute information may include, but is not limited to, one or more of gender, age, hat, glasses, mask, and hair length; when the target is a vehicle, the attribute information may include, but is not limited to, one or more of vehicle color, vehicle type, and vehicle brand.
In one embodiment, the performing the structural description on the first object in the second image, obtaining and storing structural description information of the first object includes: acquiring a first target image and a coded second image corresponding to the second image from the second cache region; detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes a first target image corresponding to the second image, the encoded second image, and attribute information of the first target image corresponding to the second image.
Specifically, the first target image corresponding to the second image is the first target image obtained by performing the interception operation on the first target in the second image. Since the first target image and the encoded second image corresponding to the second image are currently cached in the second cache region, they can be obtained directly from the second cache region; preset attribute feature detection is then performed on the first target image corresponding to the second image, the attribute information of that first target image is determined, and the structured description information of the first target in the second image is obtained.
In another embodiment, the performing the structural description on the first object in the second image, obtaining and storing structural description information of the first object, includes: acquiring a first target image and a coded second image corresponding to the second image from the second cache region; coding the first target image to obtain a coded first target image; detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes the first target image corresponding to the second image after being encoded, and the attribute information of the first target image corresponding to the second image.
Specifically, in order to save more storage space, the first target image is encoded to obtain an encoded first target image, the encoding is compression encoding, and the structural description information includes the encoded first target image, the encoded second image, and attribute information of the first target image corresponding to the second image.
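A sketch of assembling the structured description record, following the space-saving variant that also encodes the target image; `zlib` is a stand-in for the unspecified compression codec (a real deployment would more likely use a JPEG or H.264 encoder):

```python
import zlib

def build_description(target_crop: bytes, encoded_frame: bytes,
                      attributes: dict, frame_no: int) -> dict:
    """Assemble the structured description of a first target: the encoded
    target crop, the already-encoded second image, the detected attribute
    info, and the frame number of the second image in the video stream."""
    return {
        "target_image": zlib.compress(target_crop),  # encoded first target image
        "frame_image": encoded_frame,                # encoded second image
        "attributes": attributes,                    # e.g. gender, hat, ...
        "frame_no": frame_no,
    }

desc = build_description(b"x" * 100, b"<encoded frame>",
                         {"gender": "female", "hat": False}, 7)
```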
Step S4, respectively detecting whether the image cached in each cache block is invalid according to the structured processing progress.
Specifically, whether the image cached in each cache block is invalid is detected according to the structuring processing progress of the image cached in that cache block.
In one embodiment, each cache block has a predetermined tag stored therein; according to the structured processing progress, whether the image cached in each cache block is invalid or not is respectively detected, and the method comprises the following steps: setting a preset mark in a cache block corresponding to an image which is not subjected to structured processing as a valid cache mark; setting a preset mark in a cache block corresponding to the image with the structuralized processing as an invalid cache mark; judging the image cached in the cache block in which the valid cache mark is detected to be valid; and judging the image cached in the cache block in which the cache invalidation mark is detected as invalidation.
Specifically, while the image cached in a cache block has not yet been structured or is currently being structured, the preset flag in the corresponding cache block is set as the valid cache flag; the preset flag in a cache block whose image has finished structuring processing is set as the invalid cache flag. Whether structuring processing has ended can be judged via the preset flag, which may be a reference count reference_cnt: the reference count is incremented by 1 when structuring processing of the cached image begins, and decremented by 1 when structuring processing ends; when the reference count returns to 0, structuring processing has ended.
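The reference-count flag described above can be sketched as follows; the separate `started` flag, used to distinguish "not yet processed" from "finished" when both have a count of 0, is an implementation assumption:

```python
class CacheBlock:
    """reference_cnt is incremented when structuring of the cached image
    begins and decremented when it ends; once processing has started and
    the count is back at 0, the block's image is invalid and the block
    can be reused for a new frame (step S5)."""

    def __init__(self):
        self.ref_cnt = 0
        self.started = False  # assumption: marks that processing began

    def begin_structuring(self):
        self.ref_cnt += 1
        self.started = True

    def end_structuring(self):
        self.ref_cnt -= 1

    def is_invalid(self) -> bool:
        return self.started and self.ref_cnt == 0

block = CacheBlock()
states = [block.is_invalid()]        # not yet processed: still valid
block.begin_structuring()
states.append(block.is_invalid())    # being processed: still valid
block.end_structuring()
states.append(block.is_invalid())    # finished: invalid, block reusable
```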
Step S5, when a failed image is detected, the failed image is cleared, and a new frame of image is cached in the cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
Specifically, when detecting that an image in a cache block is invalid, the invalid image in the cache block is cleared, and a new image is cached in the cache block corresponding to the invalid image, so that the new image is conveniently structured, and the new image is an image after the first N frames of images of the video stream.
In one embodiment, as shown in fig. 6, before the dividing the first buffer area into N buffer blocks according to the resolution of the video stream, the method includes steps S21 to S22:
step S21, detecting scene information of the video stream, and obtaining a capacity ratio of the first buffer area to the second buffer area corresponding to the scene information.
Specifically, to perform structuring processing on a video stream, the images in the video stream need to be structured by the structuring processing method described above, and during that processing the related data can be obtained directly from the cache. The available cache space is divided into a first cache region and a second cache region: the first cache region is used for caching the images in the video stream, and the second cache region is used for caching data that needs to be cached during structuring processing, such as encoding results or intercepted target images. A mapping table between different scene information and the corresponding capacity ratios of the first and second cache regions is stored in advance; for example, a traffic scene (e.g., a video stream shot by a camera at an intersection) is pre-associated with a first capacity ratio, an indoor scene (e.g., a supermarket or shopping mall) with a second capacity ratio, and so on. The scene information of the video stream is detected, and the capacity ratio of the first and second cache regions corresponding to that scene information is obtained from the preset relational mapping table.
In one embodiment, the detecting scene information of the video stream includes: and carrying out feature detection on the images in the video stream, and determining corresponding scene information according to the detected feature information.
Specifically, in order to reasonably determine the allocation between the first and second cache regions, preparation is performed in advance: video streams of a plurality of different scenes are collected, the feature information characterizing the scene information in each is extracted, and the feature information associated with each scene is stored in advance. Feature detection can then be performed on images in the video stream by a feature detection algorithm, and the corresponding scene information determined from the detected feature information. For example, the feature information corresponding to a traffic scene includes, but is not limited to, the contours and colors of objects such as intersections, traffic lights, and vehicles; when the contour and color features of one or more such objects are detected in images of the video stream, the scene corresponding to the video is determined to be a traffic scene.
In one embodiment, the scene information may also be directly input by the local terminal of the user or sent by the external terminal, and when the scene information is acquired or received, the scene information is used as the scene information of the video stream.
Step S22, setting the capacities of the first buffer area and the second buffer area according to the capacity ratio.
Specifically, the capacities of the first cache region and the second cache region are set according to the capacity ratio corresponding to the scene information, so that the capacities of the two cache regions can be set reasonably for different scenes, improving the use efficiency of the cache and thus the efficiency of the structured analysis and processing of the video.
Because the probability of targets appearing differs between scenes, setting the capacity ratio of the first and second cache regions according to the scene, and then setting the capacities of the two regions accordingly, allows the cache to be allocated reasonably for different scenes, improving the use efficiency of the cache and thereby the efficiency of video structured analysis and processing.
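The scene-to-ratio mapping table and capacity setting can be sketched as below; the scene names and ratio values are made-up placeholders, not values from the disclosure:

```python
# Preset relational mapping table: scene -> (first : second) capacity ratio
SCENE_RATIOS = {"traffic": (3, 1), "indoor": (2, 1)}

def set_region_capacities(total_bytes: int, scene: str) -> tuple[int, int]:
    """Split the available cache space into first and second cache regions
    according to the ratio associated with the detected scene."""
    a, b = SCENE_RATIOS.get(scene, (1, 1))  # even split for unknown scenes
    first = total_bytes * a // (a + b)
    return first, total_bytes - first

caps = set_region_capacities(400, "traffic")
```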
The above video structuring processing method has been explained for the case where an image contains one first target. When the video contains a plurality of first targets, e.g., a first target A, a first target B, and a first target C, each first target can likewise be processed by the video structuring processing method of the above embodiment. When target images of a plurality of first targets need to be intercepted from the same frame of original image and that original image needs to be encoded for each of them, the original image does not need to be encoded repeatedly: it is encoded only once, the storage address of the encoding result is recorded, and the plurality of first targets share the encoded image.
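The encode-once sharing for multiple first targets in the same frame can be sketched as follows; `encode` is a hypothetical stand-in codec:

```python
class EncodedFrameCache:
    """When several first targets need the same original frame encoded,
    the frame is encoded once; later requests reuse the stored result
    by its recorded key (standing in for the storage address)."""

    def __init__(self, encode):
        self.encode = encode
        self.store = {}        # frame number -> encoded bytes
        self.encode_calls = 0  # counts actual encoder invocations

    def get_encoded(self, frame_no: int, frame: bytes) -> bytes:
        if frame_no not in self.store:
            self.store[frame_no] = self.encode(frame)
            self.encode_calls += 1
        return self.store[frame_no]

cache = EncodedFrameCache(lambda b: b[::-1])  # toy "codec"
cache.get_encoded(1, b"abc")  # target A triggers the encode
cache.get_encoded(1, b"abc")  # target B in the same frame reuses it
```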
According to the embodiment of the application, whether the cached image in each cache block is invalid or not can be detected according to the structured processing progress, the invalid image is removed when the invalid image is detected, the cache can be released in time, and a frame of new image can be cached after the release, so that the use efficiency of the cache is improved, the image is not easy to lose, and the efficiency of video structured analysis and processing is improved.
The embodiment of the present application further provides a video structuring processing device, which is configured to execute the steps in the foregoing video structuring processing method embodiment. The video structuring processing device may be a virtual device in the terminal equipment, executed by a processor of the terminal equipment, or may be the terminal equipment itself.
As shown in fig. 7, a video structuring processing device 700 provided in the embodiment of the present application includes:
a buffer dividing module 701, configured to divide the first buffer area into N buffer blocks according to a resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2;
an obtaining module 702, configured to respectively cache each frame of image in the first N frames of images of the video stream to one cache block, and obtain a cache address of each frame of image in the first cache area;
the processing module 703 is configured to sequentially read images from the first buffer area according to a preset frame number sequence and perform structuring processing according to the buffer address of each frame of image in the first buffer area;
a detecting module 704, configured to respectively detect whether the image cached in each cache block is invalid according to the structured processing progress;
a clearing module 705, configured to clear a failed image when the failed image is detected, and cache a new frame of image in a cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
In one embodiment, the processing module 703 specifically includes:
the identification unit is used for sequentially reading images from the first cache region in order of frame number from small to large according to the cache address of each frame of image in the first cache region and performing target identification to obtain all first images containing a first target;
the screening unit is used for screening out a second image meeting a preset condition from all the first images;
and the obtaining unit is used for carrying out structural description on the first target in the second image, obtaining and storing structural description information of the first target.
In one embodiment, the identification unit is specifically configured to:
and sequentially reading images from the first cache region in order of frame number from small to large according to the cache address of each frame of image in the first cache region, and performing target identification through a pre-trained network model to obtain all first images containing the first target.
In one embodiment, the screening unit includes:
the scoring subunit is used for sequentially carrying out quality comprehensive scoring on the first target in each frame of the first image according to the sequence of the frame numbers from small to large;
and the screening subunit is used for screening out a second image meeting a preset condition from all the first images according to the quality comprehensive scoring result of the first target in each frame of first image.
In one embodiment, the scoring subunit is specifically configured to:
sequentially scoring the first target in each frame of the first image from J dimensions according to the sequence of the frame numbers from small to large; wherein the J dimensions include one or more of an image quality dimension, a size dimension, a target integrity dimension, and a target pose dimension of the first target.
And respectively obtaining a quality comprehensive scoring result of the first target in each frame of first image according to the scoring values of the first target in each frame of first image in the J dimensions and a preset weight factor.
In one embodiment, the scoring subunit is specifically configured to: performing quality comprehensive scoring on a first target in the first sub-image to obtain a first score; the first sub-image is a first image with the minimum frame number;
updating the historical score to the first score when the first score is larger than a preset threshold;
when the historical score is updated to the first score, intercepting a first target image in the first sub-image, encoding the first sub-image, and caching the first target image and the encoded first sub-image to a second cache region;
performing quality comprehensive scoring on the first target in the Kth sub-image to obtain a Kth score; the Kth sub-image is the first image with the Kth smallest frame number, where K = 2, 3, 4, …, M, and M is the total number of all first images;
when the Kth score is larger than or equal to the sum of the historical score and a preset increase step value, updating the historical score to the Kth score;
when the historical score is updated to the Kth score, intercepting a first target image in the Kth sub-image, encoding the Kth sub-image, clearing the first target image and the encoded first image previously cached in the second cache region, and caching the first target image in the Kth sub-image and the encoded Kth sub-image to the second cache region;
the screening subunit is specifically configured to: and when the historical scores are not updated any more, acquiring a second image corresponding to the score which is updated last in the historical scores in all the first images.
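The threshold-then-step selection described above can be sketched as follows (illustrative only; `select_best_frame` and its signature are assumptions, returning the index of the frame whose images would be the ones remaining in the second cache region):

```python
def select_best_frame(scores, threshold, step):
    """Scan comprehensive scores in frame-number order. The first score
    above `threshold` becomes the historical score; after that, a frame
    replaces the cached one only if its score is at least the historical
    score plus the preset increase step. Returns the index of the last
    adopted frame, or None if no score ever exceeded the threshold."""
    history = None
    best_index = None
    for k, score in enumerate(scores):
        if history is None:
            if score > threshold:           # rule for the first sub-image
                history, best_index = score, k
        elif score >= history + step:       # rule for the K-th sub-image
            history, best_index = score, k
    return best_index
```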
In one embodiment, the obtaining unit specifically includes:
the first obtaining subunit is configured to obtain, from the second cache area, a first target image and a coded second image that correspond to the second image;
the first determining subunit is configured to perform preset attribute feature detection on a first target image corresponding to the second image, determine attribute information of the first target image corresponding to the second image, and obtain structural description information of a first target in the second image; the structured description information of the first target in the second image includes a first target image corresponding to the second image, the encoded second image, and attribute information of the first target image corresponding to the second image.
In another embodiment, the obtaining unit specifically includes:
the second obtaining subunit is configured to obtain, from the second cache area, a first target image and a coded second image that correspond to the second image;
the coding subunit is used for coding the first target image to obtain a coded first target image;
the second determining subunit is configured to perform preset attribute feature detection on the first target image corresponding to the second image, determine attribute information of the first target image corresponding to the second image, and obtain structural description information of the first target in the second image; the structured description information of the first target in the second image includes the first target image corresponding to the second image after being encoded, and the attribute information of the first target image corresponding to the second image.
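In either variant, the stored structured description bundles the same three pieces; a minimal record sketch (the field names are assumptions for illustration, not defined in the patent):

```python
def build_structured_description(encoded_target_image, encoded_frame, attributes):
    """Bundle the structured description of the first target: the encoded
    target snapshot, the encoded full frame, and the detected attribute
    information."""
    return {
        "target_image": encoded_target_image,  # cropped first target, encoded
        "frame": encoded_frame,                # encoded second image
        "attributes": attributes,              # detected attribute features
    }
```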
In one embodiment, each cache block has a preset mark stored therein; the detection module specifically includes:
the first setting unit is used for setting the preset mark in a cache block corresponding to an image for which the structured processing is not finished as a valid cache mark;
the second setting unit is used for setting the preset mark in a cache block corresponding to an image for which the structured processing is finished as an invalid cache mark;
the first cache unit is used for judging the image cached in a cache block in which a valid cache mark is detected to be valid;
and the second cache unit is used for judging the image cached in a cache block in which an invalid cache mark is detected to be invalid.
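The mark-driven reclamation can be sketched like this (an illustration of the mechanism, not the patent's code; frames are modelled as plain Python objects):

```python
VALID, INVALID = "valid", "invalid"

class CacheBlock:
    """One cache block: a one-frame slot plus its stored preset mark."""
    def __init__(self, frame):
        self.frame = frame
        self.mark = VALID            # structured processing not yet finished

    def finish_processing(self):
        self.mark = INVALID          # structured processing is finished

def reclaim_blocks(blocks, pending_frames):
    """Clear every block whose mark is invalid and refill it with the next
    frame following the first N frames of the stream."""
    for block in blocks:
        if block.mark == INVALID and pending_frames:
            block.frame = pending_frames.pop(0)
            block.mark = VALID
```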
In one embodiment, the video structuring processing device 700 further comprises:
the determining module is used for detecting scene information of the video stream and acquiring a capacity ratio of the first cache region to the second cache region corresponding to the scene information;
and the capacity setting module is used for setting the capacities of the first cache region and the second cache region according to the capacity ratio.
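Splitting one memory budget by a scene-dependent ratio might be sketched as follows (illustrative only; the patent does not specify concrete ratios per scene):

```python
def set_region_capacities(total_bytes, ratio_first, ratio_second):
    """Split the available cache memory between the first and second cache
    regions according to the capacity ratio chosen for the scene."""
    first = total_bytes * ratio_first // (ratio_first + ratio_second)
    return first, total_bytes - first
```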
In one embodiment, the detecting scene information of the video stream includes: and carrying out feature detection on the images in the video stream, and determining corresponding scene information according to the detected feature information.
In one embodiment, the cache partitioning module is specifically configured to:
determining the storage capacity for storing one frame of image according to the resolution of the video stream;
and dividing the first cache region into N cache blocks according to the storage capacity of one frame of image.
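A rough sketch of this division (the 1.5 bytes-per-pixel figure assumes YUV 4:2:0 frames and is an assumption of this sketch, not stated in the patent):

```python
def divide_first_cache(region_bytes, width, height, bytes_per_pixel=1.5):
    """Derive the storage needed for one frame from the stream resolution
    and split the first cache region into N one-frame cache blocks."""
    frame_bytes = int(width * height * bytes_per_pixel)
    n_blocks = region_bytes // frame_bytes   # N whole one-frame blocks
    return frame_bytes, n_blocks
```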
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
As shown in fig. 8, an embodiment of the present invention further provides a terminal device 800 including: a processor 801, a memory 802 and a computer program 803, such as a video structuring process program, stored in the memory 802 and executable on the processor 801. The processor 801, when executing the computer program 803, implements the steps in the various video structuring method embodiments described above. The processor 801, when executing the computer program 803, implements the functions of the modules in the device embodiments described above, such as the functions of the modules 701 to 705 described above.
Illustratively, the computer program 803 may be partitioned into one or more modules that are stored in the memory 802 and executed by the processor 801 to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 803 in the terminal device 800. For example, the computer program 803 may be divided into a cache dividing module, an obtaining module, a processing module, a detecting module, and a clearing module, and specific functions of the modules are described in the foregoing embodiments, and are not described herein again.
The terminal device 800 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 801, a memory 802. Those skilled in the art will appreciate that fig. 8 is merely an example of a terminal device 800 and does not constitute a limitation of terminal device 800 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 801 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 802 may be an internal storage unit of the terminal device 800, such as a hard disk or memory of the terminal device 800. The memory 802 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 800. Further, the memory 802 may include both an internal storage unit and an external storage device of the terminal device 800. The memory 802 is used for storing the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (15)

1. A video structuring method, comprising:
dividing a first cache region into N cache blocks according to the resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2;
caching each frame of image in the first N frames of images of the video stream to one cache block respectively, and acquiring a cache address of each frame of image in the first cache area;
according to the cache address of each frame of image in the first cache region, sequentially reading the images from the first cache region according to a preset frame number sequence and carrying out structured processing;
respectively detecting whether the images cached in each cache block are invalid or not according to the structured processing progress;
when a failed image is detected, clearing the failed image, and caching a new frame of image in a cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
2. The video structuring processing method according to claim 1, wherein said reading the images from the first buffer area in sequence according to the buffer address of each frame of image in the first buffer area and according to the preset frame number order and performing structuring processing comprises:
according to the cache address of each frame of image in the first cache region, sequentially reading images from the first cache region in ascending order of frame number for target identification, and obtaining all first images containing a first target;
screening out a second image meeting a preset condition from all the first images;
and carrying out structural description on the first target in the second image, and obtaining and storing structural description information of the first target.
3. The video structured processing method according to claim 2, wherein said reading images from the first cache region in ascending order of frame number for target identification according to the cache address of each frame of image in the first cache region, to obtain all first images containing a first target, comprises:
and sequentially reading images from the first cache region in ascending order of the cache address of each frame of image in the first cache region, and performing target identification through a pre-trained network model to obtain all first images containing a first target.
4. The video structuring processing method according to claim 2, wherein said screening out the second image satisfying the preset condition from all the first images comprises:
sequentially performing quality comprehensive scoring on the first target in each frame of the first image in ascending order of frame number;
and screening out a second image meeting a preset condition from all the first images according to the quality comprehensive scoring result of the first target in each frame of the first image.
5. The video structured processing method according to claim 4, wherein said sequentially performing quality comprehensive scoring on the first target in each frame of the first image in ascending order of frame number comprises:
sequentially scoring the first target in each frame of the first image from J dimensions in ascending order of frame number; wherein the J dimensions include one or more of an image quality dimension, a size dimension, a target integrity dimension, and a target pose dimension of the first target;
and respectively obtaining a quality comprehensive scoring result of the first target in each frame of the first image according to the scoring values of the first target in the J dimensions and the preset weight factors.
6. The video structured processing method according to claim 4, wherein said sequentially performing quality comprehensive scoring on the first target in each frame of the first image in ascending order of frame number comprises:
performing quality comprehensive scoring on a first target in the first sub-image to obtain a first score; the first sub-image is a first image with the minimum frame number;
updating the historical score to the first score when the first score is larger than a preset threshold;
when the historical score is updated to the first score, intercepting a first target image in the first sub-image, encoding the first sub-image, and caching the first target image and the encoded first sub-image to a second cache region;
performing quality comprehensive scoring on the first target in the Kth sub-image to obtain a Kth score; the Kth sub-image is the first image with the Kth smallest frame number, where K = 2, 3, 4, …, M, and M is the total number of all first images;
when the Kth score is larger than or equal to the sum of the historical score and a preset increase step value, updating the historical score to the Kth score;
when the historical score is updated to the Kth score, intercepting a first target image in the Kth sub-image, encoding the Kth sub-image, clearing the first target image and the encoded first image previously cached in the second cache region, and caching the first target image in the Kth sub-image and the encoded Kth sub-image to the second cache region;
the step of screening out a second image meeting a preset condition from all the first images according to the quality comprehensive scoring result of the first target in each frame of first image comprises the following steps:
and when the historical scores are not updated any more, acquiring a second image corresponding to the score which is updated last in the historical scores in all the first images.
7. The video structural processing method according to claim 6, wherein said structurally describing a first object in the second image, obtaining and storing structural description information of the first object comprises:
acquiring a first target image and a coded second image corresponding to the second image from the second cache region;
detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes a first target image corresponding to the second image, the encoded second image, and attribute information of the first target image corresponding to the second image.
8. The video structural processing method according to claim 6, wherein said structurally describing a first object in the second image, obtaining and storing structural description information of the first object comprises:
acquiring a first target image and a coded second image corresponding to the second image from the second cache region;
coding the first target image to obtain a coded first target image;
detecting a preset attribute feature of a first target image corresponding to the second image, determining attribute information of the first target image corresponding to the second image, and obtaining structural description information of a first target in the second image; the structured description information of the first target in the second image includes the first target image corresponding to the second image after being encoded, and the attribute information of the first target image corresponding to the second image.
9. The method of claim 1, wherein each cache block has a preset mark stored therein;
the detecting whether the image cached in each cache block is invalid according to the structured processing progress comprises:
setting the preset mark in a cache block corresponding to an image for which the structured processing is not finished as a valid cache mark;
setting the preset mark in a cache block corresponding to an image for which the structured processing is finished as an invalid cache mark;
judging the image cached in the cache block in which the valid cache mark is detected to be valid;
and judging the image cached in a cache block in which an invalid cache mark is detected to be invalid.
10. The video structuring processing method according to any one of claims 5 to 9, wherein said dividing the first buffer area into N buffer blocks according to the resolution of the video stream comprises:
detecting scene information of a video stream, and acquiring a capacity ratio of a first cache region and a second cache region corresponding to the scene information;
and setting the capacities of the first buffer area and the second buffer area according to the capacity ratio.
11. The method according to claim 10, wherein the detecting scene information of the video stream comprises:
and carrying out feature detection on the images in the video stream, and determining corresponding scene information according to the detected feature information.
12. The video structuring processing method according to any one of claims 1 to 9, wherein said dividing the first buffer area into N buffer blocks according to the resolution of the video stream comprises:
determining the storage capacity for storing one frame of image according to the resolution of the video stream;
and dividing the first cache region into N cache blocks according to the storage capacity of one frame of image.
13. A video structuring processing device, comprising:
the buffer dividing module is used for dividing the first buffer area into N buffer blocks according to the resolution of the video stream; each cache block is used for storing a frame of image, the video stream comprises at least N frames of images, each frame of image corresponds to a frame number, and N is greater than or equal to 2;
the acquisition module is used for respectively caching each frame of image in the first N frames of images of the video stream to one cache block and acquiring the cache address of each frame of image in the first cache area;
the processing module is used for sequentially reading the images from the first cache region according to the cache address of each frame of image in the first cache region and a preset frame number sequence and carrying out structuralized processing;
the detection module is used for respectively detecting whether the image cached in each cache block is invalid or not according to the structured processing progress;
the clearing module is used for clearing the failed image when the failed image is detected, and caching a new frame of image in a cache block corresponding to the failed image; the new image is an image after the first N frames of images of the video stream.
14. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 12 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 12.
CN202011038705.4A 2020-09-27 2020-09-27 Video structured processing method and device and terminal equipment Active CN112235598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011038705.4A CN112235598B (en) 2020-09-27 2020-09-27 Video structured processing method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN112235598A true CN112235598A (en) 2021-01-15
CN112235598B CN112235598B (en) 2022-09-20


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704387A (en) * 2023-08-04 2023-09-05 众芯汉创(江苏)科技有限公司 Power line channel inspection system and method based on video structuring
CN116887009A (en) * 2023-09-06 2023-10-13 湖南智警公共安全技术研究院有限公司 End cloud integrated video structuring method and system based on 5G network

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635846A (en) * 2008-07-21 2010-01-27 华为技术有限公司 Method, system and device for evaluating video quality
CN101998101A (en) * 2009-08-31 2011-03-30 中兴通讯股份有限公司 Video data receiving and transmitting systems and video data processing method for video telephone
CN103269421A (en) * 2013-05-23 2013-08-28 广东威创视讯科技股份有限公司 Frame dropping method and system of video image
CN103530604A (en) * 2013-09-27 2014-01-22 中国人民解放军空军工程大学 Robustness visual tracking method based on transductive effect
CN103685427A (en) * 2012-09-25 2014-03-26 新奥特(北京)视频技术有限公司 A network caching method based on a cloud platform
CN105933343A (en) * 2016-06-29 2016-09-07 深圳市优象计算技术有限公司 Stream cache mechanism used for network play of 720-degree panoramic video
CN105975910A (en) * 2016-04-27 2016-09-28 梧州市自动化技术研究开发院 Technology processing method used for carrying out video identification on moving object and system thereof
CN106034216A (en) * 2015-03-10 2016-10-19 北京同步科技有限公司 Camera image positioning system and camera image positioning method based on image identification
CN106375793A (en) * 2016-08-29 2017-02-01 东方网力科技股份有限公司 Superposition method and superposition system of video structured information, and user terminal
US20180150993A1 (en) * 2016-11-29 2018-05-31 Echostar Technologies L.L.C. Apparatus, systems and methods for generating 3d model data from a media content event
CN108491822A (en) * 2018-04-02 2018-09-04 杭州高创电子科技有限公司 A kind of Face datection De-weight method based on the limited caching of embedded device
CN108540816A (en) * 2018-03-28 2018-09-14 腾讯科技(深圳)有限公司 A kind of live video acquisition methods, device and storage medium
CN108875517A (en) * 2017-12-15 2018-11-23 北京旷视科技有限公司 Method for processing video frequency, device and system and storage medium
CN110362458A (en) * 2019-05-28 2019-10-22 北京奇艺世纪科技有限公司 Application evaluation reminding method, device, electronic equipment and readable storage medium storing program for executing
CN111027347A (en) * 2018-10-09 2020-04-17 杭州海康威视数字技术股份有限公司 Video identification method and device and computer equipment
CN111683285A (en) * 2020-08-11 2020-09-18 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635846A (en) * 2008-07-21 2010-01-27 华为技术有限公司 Method, system and device for evaluating video quality
CN101998101A (en) * 2009-08-31 2011-03-30 中兴通讯股份有限公司 Video data receiving and transmitting systems and video data processing method for video telephone
CN103685427A (en) * 2012-09-25 2014-03-26 新奥特(北京)视频技术有限公司 A network caching method based on a cloud platform
CN103269421A (en) * 2013-05-23 2013-08-28 广东威创视讯科技股份有限公司 Frame dropping method and system of video image
CN103530604A (en) * 2013-09-27 2014-01-22 中国人民解放军空军工程大学 Robustness visual tracking method based on transductive effect
CN106034216A (en) * 2015-03-10 2016-10-19 北京同步科技有限公司 Camera image positioning system and camera image positioning method based on image identification
CN105975910A (en) * 2016-04-27 2016-09-28 梧州市自动化技术研究开发院 Technology processing method used for carrying out video identification on moving object and system thereof
CN105933343A (en) * 2016-06-29 2016-09-07 深圳市优象计算技术有限公司 Stream cache mechanism used for network play of 720-degree panoramic video
CN106375793A (en) * 2016-08-29 2017-02-01 东方网力科技股份有限公司 Superposition method and superposition system of video structured information, and user terminal
US20180150993A1 (en) * 2016-11-29 2018-05-31 Echostar Technologies L.L.C. Apparatus, systems and methods for generating 3d model data from a media content event
CN108875517A (en) * 2017-12-15 2018-11-23 北京旷视科技有限公司 Method for processing video frequency, device and system and storage medium
CN108540816A (en) * 2018-03-28 2018-09-14 腾讯科技(深圳)有限公司 A kind of live video acquisition methods, device and storage medium
CN108491822A (en) * 2018-04-02 2018-09-04 杭州高创电子科技有限公司 A kind of Face datection De-weight method based on the limited caching of embedded device
CN111027347A (en) * 2018-10-09 2020-04-17 杭州海康威视数字技术股份有限公司 Video identification method and device and computer equipment
CN110362458A (en) * 2019-05-28 2019-10-22 北京奇艺世纪科技有限公司 Application evaluation reminder method and device, electronic equipment, and readable storage medium
CN111683285A (en) * 2020-08-11 2020-09-18 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHENGQIANG YU: "SoC processor for real-time object labeling in live camera streams with low line level latency", 2014 IEEE International Symposium on Circuits and Systems *
ZHAO WEI: "Content-based video structuring and extraction of visually sensitive regions", China Master's Theses Full-text Database *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704387A (en) * 2023-08-04 2023-09-05 众芯汉创(江苏)科技有限公司 Power line channel inspection system and method based on video structuring
CN116704387B (en) * 2023-08-04 2023-10-13 众芯汉创(江苏)科技有限公司 Power line channel inspection system and method based on video structuring
CN116887009A (en) * 2023-09-06 2023-10-13 湖南智警公共安全技术研究院有限公司 Device-cloud integrated video structuring method and system based on 5G network
CN116887009B (en) * 2023-09-06 2023-12-12 湖南智警公共安全技术研究院有限公司 Device-cloud integrated video structuring method and system based on 5G network

Also Published As

Publication number Publication date
CN112235598B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN108228696B (en) Face image retrieval method and system, shooting device and computer storage medium
US20200167314A1 (en) System and method for concepts caching using a deep-content-classification (dcc) system
CN110020122B (en) Video recommendation method, system and computer readable storage medium
CN111950723A (en) Neural network model training method, image processing method, device and terminal equipment
CN112235598B (en) Video structured processing method and device and terminal equipment
CN109726678B (en) License plate recognition method and related device
CN110349161B (en) Image segmentation method, image segmentation device, electronic equipment and storage medium
CN114169381A (en) Image annotation method and device, terminal equipment and storage medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN114140346A (en) Image processing method and device
CN114550070A (en) Video clip identification method, device, equipment and storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN110909817B (en) Distributed clustering method and system, processor, electronic device and storage medium
CN110198473B (en) Video processing method and device, electronic equipment and computer readable storage medium
US20130191368A1 (en) System and method for using multimedia content as search queries
CN111401206A (en) Panorama sharing method, system, device and medium
CN110991298A (en) Image processing method and device, storage medium and electronic device
CN111833285A (en) Image processing method, image processing device and terminal equipment
CN112287945A (en) Screen fragmentation determination method and device, computer equipment and computer readable storage medium
CN110781223A (en) Data processing method and device, processor, electronic equipment and storage medium
CN112214639A (en) Video screening method, video screening device and terminal equipment
CN111127310B (en) Image processing method and device, electronic equipment and storage medium
CN112364683A (en) Case evidence fixing method and device
CN107749065A (en) ViBe background modeling method based on CUDA
CN113727050A (en) Video super-resolution processing method and device for mobile equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant