CN111708635A

CN111708635A - Video intelligent grading processing system and method

Info

Publication number: CN111708635A
Application number: CN202010547328.0A
Authority: CN
Inventors: 宋博然; 段立新; 何宜兵; 张神力; 蔡忠鹏
Original assignee: Shenzhen Tianhai Chenguang Technology Co ltd
Current assignee: Shenzhen Tianhai Chenguang Technology Co ltd
Priority date: 2020-06-16
Filing date: 2020-06-16
Publication date: 2020-09-25

Abstract

The invention relates to an intelligent hierarchical video processing system and method. The system comprises a device end, an edge end and a display terminal. In the method, the device end performs preliminary identification on a collected picture or video and gives a preliminary identification result; the edge end performs further refined identification on the picture or video and produces more refined structured data; and the display terminal displays the structured data. Because the identification result is preliminarily judged at the device end, the computational load on the edge end is reduced, which lowers the operating cost of the whole system.

Description

Video intelligent grading processing system and method

Technical Field

The invention relates to the field of machine vision, in particular to an intelligent video grading processing system and method.

Background

In the field of machine vision, applications such as face recognition and vehicle recognition are already widely deployed. In practice there are two implementation approaches. In the first, the camera product itself has complete artificial-intelligence machine vision recognition capability, which makes the camera very expensive. In the second, an ordinary camera transmits the captured video to an edge server or a central server for processing, which places a heavy computational load on that server and therefore leads to high server deployment cost.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide an intelligent hierarchical video processing system and method that distribute the computational load of intelligent video processing and perform a preliminary judgment of the identification result at the device end, so that the computational load on the edge end is reduced and the operating cost of the whole system is lowered.

The technical scheme provided by the invention is as follows:

A video intelligent hierarchical processing system, wherein the system comprises:

a device end, used for acquiring a picture or video, performing preliminary identification on the acquired picture or video, and giving a preliminary identification result;

an edge end, used for performing further refined identification on the picture or video and producing more refined structured data; and

a display terminal, used for displaying the picture or video and the structured data.

The video intelligent hierarchical processing system, wherein the device end specifically includes:

an image acquisition module, used for acquiring picture or video data;

a frame extracting module, used for extracting frames from the acquired video data to obtain picture data;

an initial identification module, used for performing preliminary target identification on the acquired picture, or on the picture obtained by frame extraction, and giving a preliminary identification result; and

a communication module, used for sending the acquired picture or video data and the preliminary identification result data to the edge end.

The video intelligent hierarchical processing system, wherein the edge end specifically includes:

a communication module, used for receiving the picture or video data and the preliminary identification result data, and for sending the refined identification result data and the picture or video to the display terminal;

a frame extracting module, used for extracting frames from the received video data to obtain picture data;

a refined identification module, used for examining the received picture or video together with its preliminary identification result data, leaving unprocessed any picture or video segment without a preliminary identification result, and performing refined identification on a picture with a preliminary identification result, or on a picture obtained by frame extraction from such a video segment, to obtain a structured data description of the identification result; and

an identification result module, used for encapsulating the structured description of the refined identification result into a data frame and sending the data frame to the display terminal through the communication module.

The video intelligent hierarchical processing system, wherein the display terminal specifically includes:

a data receiving module, used for receiving the picture or video data and the refined identification result data;

a decoding and synchronizing module, used for decoding the picture or video and synchronizing the structured identification result data with the video; and

a display module, used for displaying the picture or video and the structured data.

Further, the invention also discloses a video intelligent hierarchical processing method, wherein the method comprises the following steps:

A. The device end collects a picture or video, performs preliminary identification on it, and then sends the picture or video together with the preliminary identification result data.

B. The edge end performs refined identification on the received picture or video, and sends the structured data of the identification result together with the picture or video.

C. The display terminal decodes and displays the picture or video and the structured data.

The video intelligent hierarchical processing method, wherein the step A specifically includes:

A1. The image acquisition module of the device end acquires picture or video data.

A2. The frame extracting module of the device end extracts frames from the acquired video data to obtain picture data.

A3. The initial identification module of the device end performs preliminary target identification on the acquired picture, or on the picture obtained by frame extraction, and gives a preliminary identification result.

A4. The communication module of the device end sends the acquired picture or video data and the preliminary identification result data to the edge end.

The video intelligent hierarchical processing method, wherein the step B specifically includes:

B1. The communication module of the edge end receives the picture or video data and the preliminary identification result data, and sends the refined identification result data and the picture or video to the display terminal.

B2. The frame extracting module of the edge end extracts frames from the received video data to obtain picture data.

B3. The refined identification module of the edge end examines the received picture or video together with its preliminary identification result data, and does not process any picture or video segment without a preliminary identification result; it performs refined identification on a picture with a preliminary identification result, or on a picture obtained by frame extraction from such a video segment, to obtain a structured data description of the identification result.

B4. The identification result module of the edge end encapsulates the structured description of the refined identification result into a data frame, and the communication module sends the data frame to the display terminal.

The video intelligent hierarchical processing method, wherein the step C specifically includes:

C1. The data receiving module of the display terminal receives the picture or video data and the refined identification result data.

C2. The decoding and synchronizing module of the display terminal decodes the picture or video and synchronizes the structured identification result data with the video.

C3. The display module of the display terminal displays the picture or video and the structured data.

With the video intelligent hierarchical processing system and method provided by the invention, the computational load of intelligent video processing is distributed across the system and the identification result is preliminarily judged at the device end, so that the computational load on the edge end is reduced and the operating cost of the whole system is lowered.
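The load reduction claimed above can be illustrated with a rough estimate. The following is a minimal sketch under stated assumptions: the function name `edge_invocations`, the picture count and the 15% target fraction are hypothetical values chosen for illustration and do not come from the invention; the sketch also assumes that the refined model runs exactly once per picture that carries a preliminary identification result.

```python
def edge_invocations(total_pictures: int, target_fraction: float) -> int:
    """Number of pictures the edge end must refine when the device end pre-filters.

    target_fraction is the (assumed) share of pictures in which the
    device end finds a target object.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    return round(total_pictures * target_fraction)


# Hypothetical numbers, for illustration only.
without_filter = 10_000                      # every picture is analysed at the edge
with_filter = edge_invocations(10_000, 0.15)  # only pictures with a preliminary result
saved = without_filter - with_filter
```

Under these assumed numbers the edge end skips most pictures; the actual saving depends entirely on how often target objects appear in a given deployment.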

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a system architecture block diagram of a video intelligent hierarchical processing system of the present invention.

Fig. 2 is a functional structure block diagram of the device side in the system architecture of the video intelligent hierarchical processing system according to the present invention.

Fig. 3 is a functional structure block diagram of an edge end in the system architecture of the video intelligent hierarchical processing system according to the present invention.

Fig. 4 is a functional structure block diagram of a display terminal in the system architecture of the video intelligent hierarchical processing system according to the present invention.

Fig. 5 is a flow chart of a video intelligent hierarchical processing method according to a first preferred embodiment of the present invention.

Fig. 6 is a flow chart of a second preferred embodiment of the video intelligent hierarchical processing method according to the present invention.

Fig. 7 is a flow chart of a third preferred embodiment of the video intelligent hierarchical processing method according to the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

A system architecture block diagram of the video intelligent hierarchical processing system provided by the invention is shown in Fig. 1. The system specifically comprises:

a device end 100, an edge end 200 and a display terminal 300.

The device end 100 is configured to collect a picture or video, perform preliminary identification on the collected picture or video, and give a preliminary identification result. Fig. 1 shows only one application example; in practical applications the device end 100 generally comprises a plurality of devices, and each device is identified by a unique device ID.

The edge end 200 is configured to perform further refined identification on the picture or video and produce more refined structured data. The edge end 200 is generally an edge algorithm server and is generally connected to a plurality of device ends 100 as inputs.

The display terminal 300 is configured to display a picture or a video and structured data.

A functional structure block diagram of the device end in the system architecture of the video intelligent hierarchical processing system is shown in Fig. 2. The device end specifically comprises:

an image acquisition module 101, a frame extracting module 102, an initial identification module 103 and a communication module 104.

The image acquisition module 101 is used for acquiring picture or video data. In practical applications, the image acquisition module 101 may be an ordinary camera or a checkpoint camera. The encoding format of captured pictures includes, but is not limited to: JPEG, JPEG2000, BMP; the encoding format of captured video data includes, but is not limited to: H.264, H.265. Each picture is identified by a unique picture ID, and each video is identified by a unique video ID.

The frame extracting module 102 is configured to extract frames from the acquired video data to obtain picture data. The frame extracting module 102 decodes independently decodable picture frames and/or non-independently decodable picture frames from the video data sequence and re-encodes the decoded frames as pictures; the encoding format of the re-encoded picture data includes, but is not limited to: JPEG, JPEG2000, BMP. The picture data is identified by a unique picture ID.
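The frame selection described above can be sketched as follows. This is a minimal sketch, assuming each video frame is represented by a hypothetical `Frame` record with an `is_keyframe` flag that is true for independently decodable frames; the names `Frame` and `extract_frames` and the `every_nth` sampling parameter are illustrative assumptions, and the actual decoding and re-encoding to JPEG, JPEG2000 or BMP is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    index: int          # position of the frame in the video sequence
    is_keyframe: bool   # True if the frame is independently decodable


def extract_frames(frames: List[Frame], every_nth: int = 0) -> List[Frame]:
    """Select the frames to hand to the preliminary identification step.

    Independently decodable frames are always kept. If every_nth > 0,
    every n-th frame is kept as well, even when it is not independently
    decodable.
    """
    selected = []
    for frame in frames:
        sampled = every_nth > 0 and frame.index % every_nth == 0
        if frame.is_keyframe or sampled:
            selected.append(frame)
    return selected


# Hypothetical 10-frame sequence with a key frame every 5 frames.
video = [Frame(i, i % 5 == 0) for i in range(10)]
keyframes_only = [f.index for f in extract_frames(video)]
with_sampling = [f.index for f in extract_frames(video, every_nth=3)]
```

Keeping only independently decodable frames avoids decoding their neighbours, while the optional sampling trades extra decoding work for denser coverage of the video.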

The initial identification module 103 is configured to perform preliminary target identification on the picture data and give a preliminary identification result. The picture data includes pictures acquired by the image acquisition module 101 and pictures obtained by frame extraction in the frame extracting module 102; the encoding format of the pictures includes, but is not limited to: JPEG, JPEG2000, BMP. The initial identification module 103 performs target detection and identification on the picture data, that is, it detects whether a target object exists in the picture. The target objects depend on the actual application scenario and include, but are not limited to: persons, automobiles, non-motorized vehicles; a given scenario may have one or more target objects. When the initial identification module 103 detects a target object in the picture, it generates a basic structured description of the target object; the structured description includes the type of the target object (for example person, automobile or non-motorized vehicle) and is associated with the specific picture. When no target object exists in the picture, no structured description is generated.
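The behaviour of the preliminary identification step can be sketched as follows. This is a minimal sketch, not the detector of the invention: the `detect` callable stands in for whatever on-device model is used, and the names `preliminary_identify`, `TARGET_TYPES` and `stub_detect` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Target types named in the description; the actual set is application-specific.
TARGET_TYPES = {"person", "automobile", "non_motorized_vehicle"}


def preliminary_identify(
    picture_id: str,
    picture_bytes: bytes,
    detect: Callable[[bytes], List[str]],
) -> Optional[Dict[str, object]]:
    """Return a basic structured description, or None if no target is found.

    `detect` is a placeholder for the on-device detector and returns the
    labels it found in the picture.
    """
    found = sorted(set(detect(picture_bytes)) & TARGET_TYPES)
    if not found:
        return None  # no target object, so no structured description
    return {"picture_id": picture_id, "target_types": found}


# Stub detector for illustration only: treats the payload as a label list.
def stub_detect(data: bytes) -> List[str]:
    return data.decode().split(",") if data else []


hit = preliminary_identify("pic-1", b"person,tree", stub_detect)
miss = preliminary_identify("pic-2", b"tree", stub_detect)
```

Returning `None` rather than an empty description mirrors the text: a picture without a target object simply has no structured description attached.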

The communication module 104 is configured to send the acquired picture or video data and the preliminary identification result data to the edge end. The sent picture data comprises picture coding data and picture description data; the picture description data comprises a device ID and a picture ID, and when the picture contains a target object, the picture data also comprises the structured description information of the picture. The video data comprises video coding data and video description data; the video description data comprises a device ID and a video ID, and when the video contains a target object, the video data also comprises the structured description information of the video.
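One possible shape for the picture description data listed above is sketched here. This is an assumption for illustration: the invention does not specify a serialization format, so the class name `PictureMessage`, its field names and the use of JSON are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PictureMessage:
    device_id: str      # unique ID of the capturing device
    picture_id: str     # unique ID of the picture
    encoding: str       # e.g. "JPEG", "JPEG2000", "BMP"
    payload_size: int   # size of the encoded picture in bytes
    # Present only when the preliminary step found a target object.
    structured_description: Optional[List[str]] = None

    def has_preliminary_result(self) -> bool:
        return bool(self.structured_description)

    def description_json(self) -> str:
        """Serialize the description data (not the picture bytes) as JSON."""
        return json.dumps(asdict(self), sort_keys=True)


msg = PictureMessage("cam-01", "pic-0001", "JPEG", 20480, ["person"])
empty = PictureMessage("cam-01", "pic-0002", "JPEG", 19876)
```

The optional `structured_description` field lets the edge end tell, from the message alone, whether the device end found a target object.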

A functional structure block diagram of the edge end in the system architecture of the video intelligent hierarchical processing system is shown in Fig. 3. The edge end specifically comprises:

a communication module 201, a frame extracting module 202, a refinement identification module 203 and an identification result module 204.

The communication module 201 is configured to receive the picture or video data and the preliminary identification result data, and to send the refined identification result data and the picture or video to the display terminal. The contents of the picture data and the video data have already been described in detail for the communication module 104 of the device end 100 and are not repeated here. The refined identification result data is the refined description data obtained when the refinement identification module 203 of the edge end 200 performs refined identification on the picture or video.

The frame extracting module 202 is configured to extract frames from the received video data to obtain picture data. The frame extracting module 202 decodes independently decodable picture frames and/or non-independently decodable picture frames from the received video data sequence and re-encodes the decoded frames as pictures; the encoding format of the re-encoded picture data includes, but is not limited to: JPEG, JPEG2000, BMP. The picture data is identified by a unique picture ID.

The refinement identification module 203 is configured to examine the received picture or video together with its preliminary identification result data. A picture or video segment without a preliminary identification result is not processed; a picture with a preliminary identification result, or a picture obtained by frame extraction from such a video segment, undergoes refined identification to obtain a structured data description of the identification result. A preliminary identification result, that is, a basic structured description, exists only when a target object is present in the picture or video, so a picture or video segment without a target object carries no basic structured description. Refined identification means extracting and describing more feature points of the target object identified by the initial identification module 103 of the device end 100. The machine vision algorithm of the object detection and segmentation model used for refined identification is based on a deep convolutional neural network; the model is feature-labeled and trained for the different target objects. The structured description of the refined identification result relates to a specific target object.
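The gating rule above, under which the edge end runs the refined model only on items that carry a preliminary identification result, can be sketched as follows. This is a minimal sketch: the dictionary keys, the function name `refine_batch` and the stub model `stub_refine` are hypothetical, and the stub stands in for the deep convolutional neural network, which is not reproduced here.

```python
from typing import Callable, Dict, Iterable, List

Item = Dict[str, object]


def refine_batch(
    items: Iterable[Item],
    refine: Callable[[Item], Dict[str, object]],
) -> List[Item]:
    """Run refined identification only on items with a preliminary result."""
    results = []
    for item in items:
        if not item.get("preliminary"):
            continue  # no preliminary result: the edge end does not process it
        results.append({"picture_id": item["picture_id"], "refined": refine(item)})
    return results


calls = []


def stub_refine(item: Item) -> Dict[str, object]:
    calls.append(item["picture_id"])  # record how often the model runs
    return {"type": item["preliminary"], "attributes": {}}


received = [
    {"picture_id": "p1", "preliminary": "person"},
    {"picture_id": "p2", "preliminary": None},
    {"picture_id": "p3", "preliminary": "automobile"},
]
refined = refine_batch(received, stub_refine)
```

In this example the stub model runs twice for three received pictures, which is the source of the load reduction at the edge end.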

The identification result module 204 is configured to encapsulate the structured description of the refined identification result into a data frame and send the data frame to the display terminal through the communication module. The data frame contains not only the structured description information of the refined identification result but also its association with the picture or video and time synchronization information. The data sent to the display terminal by the communication module also includes the picture or video information.
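One way to encapsulate such a data frame is sketched below. The wire format is an assumption made for illustration, since the invention does not define one: a 4-byte big-endian length prefix followed by a UTF-8 JSON body that carries the media ID (the association with the picture or video), a timestamp in milliseconds (the time synchronization information) and the structured description. The function and field names are hypothetical.

```python
import json
import struct
from typing import Dict, Tuple


def pack_result_frame(media_id: str, timestamp_ms: int, description: Dict) -> bytes:
    """Encode one refined result as: 4-byte length prefix + JSON body."""
    body = json.dumps(
        {"media_id": media_id, "timestamp_ms": timestamp_ms, "description": description},
        sort_keys=True,
    ).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def unpack_result_frame(data: bytes) -> Tuple[Dict, bytes]:
    """Decode the first frame in `data`; return it and the remaining bytes."""
    (length,) = struct.unpack(">I", data[:4])
    body = data[4 : 4 + length]
    return json.loads(body.decode("utf-8")), data[4 + length :]


frame = pack_result_frame("video-7", 1200, {"type": "person", "gender": "unknown"})
decoded, rest = unpack_result_frame(frame + b"extra")
```

The length prefix lets a receiver split a byte stream into individual frames without scanning for delimiters.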

A functional structure block diagram of the display terminal in the system architecture of the video intelligent hierarchical processing system is shown in Fig. 4. The display terminal specifically comprises:

a data receiving module 301, a decoding and synchronizing module 302 and a display module 303.

The data receiving module 301 is configured to receive picture or video data and refined identification result data; the picture or video data is associated with refinement identification result data; the refined identification result data of the video data further includes time synchronization information.

The decoding and synchronizing module 302 is configured to decode the picture or video and synchronize the structured identification result data with the video. Pictures are decoded according to their coding standard, which includes, but is not limited to: JPEG, JPEG2000, BMP. Video is decoded according to its video coding standard, which includes, but is not limited to: H.264, H.265. The structured data is also decoded; structured data belonging to a video is decoded at the time marked by its decoding time stamp.

The display module 303 is configured to display the picture or video and the structured data, that is, the picture or video and the structured data decoded by the decoding and synchronizing module 302. Structured data belonging to a video is displayed at the time marked by its display time stamp.
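Synchronizing structured data with video by display time stamp can be sketched as a lookup of the most recent record at or before the frame being shown. This is a minimal sketch under assumptions: timestamps are in milliseconds, records are sorted by timestamp, and the `max_age_ms` cut-off (after which a stale description is no longer shown) is a hypothetical parameter not mentioned in the invention.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (display timestamp in ms, structured description), sorted by timestamp
Record = Tuple[int, str]


def description_for_frame(
    records: List[Record], frame_pts_ms: int, max_age_ms: int = 500
) -> Optional[str]:
    """Return the latest description at or before the frame's timestamp."""
    timestamps = [pts for pts, _ in records]
    pos = bisect_right(timestamps, frame_pts_ms)
    if pos == 0:
        return None  # no record has been reached yet
    pts, description = records[pos - 1]
    if frame_pts_ms - pts > max_age_ms:
        return None  # the latest record is too old to display
    return description


records = [(0, "person"), (1000, "person, glasses"), (2000, "automobile")]
at_1040 = description_for_frame(records, 1040)
at_1600 = description_for_frame(records, 1600)
at_2000 = description_for_frame(records, 2000)
```

A binary search keeps the lookup cheap even when many structured records accompany a long video.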

Furthermore, a flow chart of a first preferred embodiment of the video intelligent hierarchical processing method is shown in Fig. 5. The specific steps are as follows:

Step S101: the device end collects a picture or video, performs preliminary identification on it, and then sends the picture or video together with the preliminary identification result data.

The image acquisition module 101 of the device side 100 acquires picture or video data; the specific implementation of the image acquisition module is already described in detail in the functional structure block diagram part of the device side in fig. 2, and is not described herein again.

The frame extracting module 102 of the device side 100 extracts frames from the acquired video data to obtain picture data; the specific implementation process of the frame extraction is already described in detail in the functional structure block diagram part of the device side in fig. 2, and is not described herein again.

The initial identification module 103 of the device end 100 performs preliminary target identification on the picture data and gives a preliminary identification result. When the initial identification module 103 detects a target object in the picture, it generates a basic structured description of the target object; when no target object exists in the picture, no structured description is generated. The more specific functions of the initial identification module 103 have already been described in detail with reference to the functional structure block diagram of the device end in Fig. 2 and are not repeated here.

The communication module 104 of the device end 100 sends the acquired picture or video data and the preliminary identification result data to the edge end. The more specific functions of the communication module 104 have already been described in detail with reference to the functional structure block diagram of the device end in Fig. 2 and are not repeated here.

Step S102: the edge end performs refined identification on the received picture or video, and sends the structured data of the identification result together with the picture or video.

The communication module 201 of the edge terminal 200 receives the picture or video data and the preliminary identification result data, and sends the refined identification result data and the picture or video to the display terminal; the more specific functions of the communication module 201 are already described in detail in the functional block diagram portion of the edge end in fig. 3, and are not described herein again.

The frame extracting module 202 of the edge end 200 extracts frames from the received video data to obtain picture data. The specific implementation of frame extraction by the frame extracting module 202 has already been described in detail with reference to the functional structure block diagram of the edge end in Fig. 3 and is not repeated here.

The refinement identification module 203 of the edge end 200 examines the received picture or video together with its preliminary identification result data, and does not process any picture or video segment without a preliminary identification result. It performs refined identification on a picture with a preliminary identification result, or on a picture obtained by frame extraction from such a video segment, to obtain a structured data description of the identification result. The detailed implementation of refined identification by the refinement identification module 203 has already been described with reference to the functional structure block diagram of the edge end in Fig. 3 and is not repeated here.

The identification result module 204 of the edge end 200 encapsulates the structured description of the refined identification result into a data frame and sends the data frame to the display terminal through the communication module. The structured description and the encapsulation into a data frame by the identification result module 204 have already been described in detail with reference to the functional structure block diagram of the edge end in Fig. 3 and are not repeated here.

Step S103: the display terminal decodes and displays the picture or video and the structured data.

The data receiving module 301 of the display terminal 300 receives picture or video data and refined identification result data; the specific implementation of the data receiving module 301 for receiving data is already described in detail in the functional block diagram portion of the display terminal in fig. 4, and is not described herein again.

The decoding and synchronizing module 302 of the display terminal 300 is configured to decode a picture or a video and synchronize the identification result structured data with the video; the specific implementation of the decoding and synchronization technology of the decoding and synchronization module 302 is already described in detail in the functional block diagram portion of the display terminal in fig. 4, and is not described herein again.

The display module 303 of the display terminal 300 displays pictures or videos, and structured data; the displaying refers to displaying the picture or video decoded by the decoding and synchronizing module 302 and the structured data; and displaying the structured data of the video by the time marked by the display time stamp.

Preferably, a flow chart of a second preferred embodiment of the video intelligent hierarchical processing method is shown in Fig. 6. The specific steps are as follows:

Step S201: the device end collects a picture or video, performs preliminary identification on it, generates a structured description for any picture or video in which a person appears, and then sends the picture or video together with the preliminary identification result data.

The image acquisition module 101 of the device side 100 acquires picture or video data.

The frame extracting module 102 of the device side 100 extracts frames from the acquired video data to obtain picture data.

The initial identification module 103 of the device end 100 integrates a machine vision program for person identification, which can identify whether a person is present in the picture or video. If a person is present, a structured description of the person in the picture is given; if not, no structured description is generated for the picture.

The communication module 104 of the device side 100 is configured to send the acquired picture or video data and the preliminary identification result data to the edge side.

Step S202: the edge end examines the received picture or video together with its preliminary identification result data, performs refined identification on any picture or video in which a person appears, and sends the structured data of the refined identification result together with the picture or video.

The communication module 201 of the edge terminal 200 receives the picture or video data and the preliminary identification result data, and sends the refined identification result data and the picture or video to the display terminal.

The frame extracting module 202 of the edge terminal 200 extracts frames from the received video data to obtain picture data.

The refinement identification module 203 of the edge end 200 examines the received picture or video together with its preliminary identification result data, that is, it checks whether the data describes a person as present in the picture or video. If not, the picture or video segment is not processed; if so, refined identification is performed on the picture or video. The refinement identification module 203 has undergone extensive deep learning training for person recognition, yielding a person detection and recognition model; the machine vision algorithm of this model is based on a deep convolutional neural network. Refined identification of a person includes, but is not limited to: gender, eye color, skin color, special facial features, accessories and expression. Gender comprises unknown, male and female. Eye colors include, but are not limited to: black, blue, brown, gray, green. Skin colors include, but are not limited to: yellow, white, black, brown. Special facial features include, but are not limited to: a handlebar mustache, a goatee, a full beard, teeth visible in the natural state, blindness in one eye, no eyebrows, a mole, an obvious scar, baldness and facial deformity. Accessories include, but are not limited to: glasses, sunglasses, eye patches (for example over the left eye), nose rings, earrings, masks and caps. Expressions include, but are not limited to: neutral, smiling, open-mouthed smile, raised eyebrows, gaze away from the camera, squinting, frowning. The refined identification result is the detailed description of the person's gender, eye color, skin color, special facial features, accessories, expression and the like.
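The attributes enumerated above suggest a simple record type for the refined person description. The following is a minimal sketch; the class and field names, the `"unknown"` defaults for eye color and skin color, and the dictionary output are assumptions for illustration, since the invention does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Gender(Enum):
    UNKNOWN = "unknown"
    MALE = "male"
    FEMALE = "female"


@dataclass
class PersonDescription:
    picture_id: str
    gender: Gender = Gender.UNKNOWN
    eye_color: str = "unknown"   # e.g. black, blue, brown, gray, green
    skin_color: str = "unknown"
    facial_features: List[str] = field(default_factory=list)
    accessories: List[str] = field(default_factory=list)
    expression: str = "neutral"

    def to_dict(self) -> Dict[str, object]:
        """Flatten the record into plain values suitable for serialization."""
        return {
            "picture_id": self.picture_id,
            "gender": self.gender.value,
            "eye_color": self.eye_color,
            "skin_color": self.skin_color,
            "facial_features": list(self.facial_features),
            "accessories": list(self.accessories),
            "expression": self.expression,
        }


person = PersonDescription(
    "pic-0001",
    gender=Gender.MALE,
    eye_color="brown",
    accessories=["glasses", "cap"],
    expression="smiling",
)
record = person.to_dict()
```

Using an enumeration for gender reflects the closed set named in the text, while the open-ended lists stay as plain strings because the text marks them as not limited to the listed values.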

The recognition result module 204 of the edge terminal 200 encapsulates the structural description of the refined recognition result into a data frame, and sends the data frame to the display terminal through the communication module.

Step S203: the display terminal decodes and displays the picture or video and the structured data; the structured data specifically includes the following characteristics of the person: gender, eye color, skin color, special facial features, accessories, expression, etc.

Preferably, a flow chart of a third preferred embodiment of the video intelligent hierarchical processing method is shown in Fig. 7. The specific steps are as follows:

Step S301: the device end collects a picture or video, performs preliminary identification on it, generates a structured description for any picture or video in which a vehicle appears, and then sends the picture or video together with the preliminary identification result data.

The initial identification module 103 of the device end 100 integrates a machine vision program for vehicle identification, which can identify whether a vehicle is present in the picture or video. If a vehicle is present, a structured description of the vehicle in the picture is given; if not, no structured description is generated for the picture.

Step S302: the edge end examines the received picture or video together with its preliminary identification result data, performs refined identification on any picture or video in which a vehicle appears, and sends the structured data of the refined identification result together with the picture or video.

The refinement identification module 203 of the edge end 200 examines the received picture or video together with its preliminary identification result data, that is, it checks whether the data describes a vehicle as present in the picture or video. If not, the picture or video segment is not processed; if so, refined identification is performed on the picture or video. The refinement identification module 203 has undergone extensive deep learning training for vehicle recognition, yielding a vehicle detection and recognition model; the machine vision algorithm of this model is based on a deep convolutional neural network. Refined identification of a vehicle includes, but is not limited to: vehicle type, license plate number, vehicle color, vehicle brand and vehicle sub-brand. Vehicle types include, but are not limited to: passenger cars, large trucks, sedans, vans, minivans, SUV/MPV, medium buses, and two-wheelers/tricycles. The license plate number is the specific license plate number of the vehicle. Vehicle colors include, but are not limited to: red, yellow, green, cyan, blue, violet, pink, brown, white, gray, black. Vehicle brand means recognition of the automobile brand; recognition of at least 160 automobile brands is generally supported. Vehicle sub-brand means the sub-brand and model year of the vehicle; recognition of at least 2000 sub-brands and model years is generally supported. The refined identification result is the detailed description of the vehicle's type, license plate number, color, brand, sub-brand and the like.
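A structured vehicle description built from the fields above might be validated against the listed vocabularies before it is sent on. This is a minimal sketch under assumptions: the function name `build_vehicle_description`, the dictionary keys, the strict rejection of values outside the listed sets (the text itself says the lists are not exhaustive) and the sample plate, brand and sub-brand strings are all hypothetical.

```python
from typing import Dict

# Vocabularies taken from the description; a real deployment may extend them.
VEHICLE_TYPES = {
    "passenger car", "large truck", "sedan", "van", "minivan",
    "SUV/MPV", "medium bus", "two-wheeler/tricycle",
}
VEHICLE_COLORS = {
    "red", "yellow", "green", "cyan", "blue", "violet",
    "pink", "brown", "white", "gray", "black",
}


def build_vehicle_description(
    vehicle_type: str, plate_number: str, color: str, brand: str, sub_brand: str
) -> Dict[str, str]:
    """Assemble a refined vehicle description, rejecting unknown type or color."""
    if vehicle_type not in VEHICLE_TYPES:
        raise ValueError(f"unknown vehicle type: {vehicle_type}")
    if color not in VEHICLE_COLORS:
        raise ValueError(f"unknown vehicle color: {color}")
    return {
        "vehicle_type": vehicle_type,
        "plate_number": plate_number,
        "color": color,
        "brand": brand,
        "sub_brand": sub_brand,  # sub-brand and model year
    }


# Placeholder values, for illustration only.
vehicle = build_vehicle_description(
    "sedan", "TEST-0001", "white", "ExampleBrand", "ExampleModel 2019"
)
```

Validating at the point of assembly keeps malformed model output from reaching the display terminal; whether unknown values should be rejected or passed through is a design choice the text leaves open.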

Step S303: the display terminal decodes and displays the picture or video and the structured data; the structured data specifically includes the following characteristics of the vehicle: vehicle type, license plate number, vehicle color, vehicle brand, vehicle sub-brand, etc.
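One way the display terminal might pair the structured data with decoded frames is sketched below in Python. The text does not state how the two are matched, so the use of timestamps, the `max_age` tolerance, and the function names `latest_result_for_frame` and `format_overlay` are assumptions made purely for illustration.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def latest_result_for_frame(
    frame_ts: float,
    result_ts: Sequence[float],
    results: Sequence[dict],
    max_age: float = 1.0,
) -> Optional[dict]:
    """Return the most recent result at or before frame_ts, or None if absent or stale.

    result_ts must be sorted in ascending order and aligned index-for-index with results.
    """
    i = bisect_right(result_ts, frame_ts) - 1
    if i < 0:
        return None  # no result has been produced yet for this point in the video
    if frame_ts - result_ts[i] > max_age:
        return None  # the nearest result is too old to display with this frame
    return results[i]


def format_overlay(record: dict) -> str:
    """Render the vehicle attributes as a single caption line, skipping missing fields."""
    fields = ("vehicle_type", "license_plate", "color", "brand", "sub_brand")
    return " | ".join(f"{k}={record[k]}" for k in fields if k in record)
```

The binary search keeps the lookup cheap for long videos, and the staleness check avoids showing a vehicle description on frames recorded long after the vehicle was identified.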

It should be understood that the invention is not limited to the embodiments described above, but that modifications and variations can be made by one skilled in the art in light of the above teachings, and all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

1. A video intelligent hierarchical processing system, comprising:

a device end, configured to acquire pictures or videos, perform preliminary identification on the acquired pictures or videos, and give a preliminary identification result;

an edge end, configured to perform further refined identification processing on the picture or video and to identify more refined structured data;

2. The video intelligent hierarchical processing system according to claim 1, wherein the device end specifically comprises:

an image acquisition module, configured to acquire picture or video data;

a frame extracting module, configured to extract frames from the acquired video data to obtain picture data;

an initial identification module, configured to perform preliminary target identification on the acquired picture, or on the picture obtained by frame extraction, and to give a preliminary identification result;

3. The video intelligent hierarchical processing system according to claim 1, wherein the edge end specifically comprises:

a communication module, configured to receive the picture or video data and the preliminary identification result data, and to send the refined identification result data and the picture or video to the display terminal;

a frame extracting module, configured to extract frames from the received video data to obtain picture data;

a refined identification module, configured to evaluate the received picture or video together with its preliminary identification result data, to leave unprocessed any picture or video segment that has no preliminary identification result, and to perform refined identification on each picture that has a preliminary identification result, or on pictures obtained by frame extraction from such a video segment, to obtain a structured data description of the identification result;

4. The video intelligent hierarchical processing system according to claim 1, wherein the display terminal specifically comprises:

a data receiving module, configured to receive the picture or video data and the refined identification result data;

a decoding and synchronizing module, configured to decode the picture or video and to synchronize the structured identification result data with the video;

5. A method for intelligent hierarchical processing of video, the method comprising the steps of:

A. the device end collects a picture or video, performs preliminary identification on the picture or video, and then sends the picture or video together with the preliminary identification result data;

B. the edge end performs refined identification on the received picture or video, and sends the structured data of the identification result together with the picture or video;

6. The method for intelligent hierarchical processing of video according to claim 5, wherein step A specifically comprises:

A1. the image acquisition module of the device end acquires picture or video data;

A2. the frame extracting module of the device end extracts frames from the acquired video data to obtain picture data;

A3. the initial identification module of the device end performs preliminary target identification on the acquired picture, or on the picture obtained by frame extraction, and gives a preliminary identification result;

7. The method for intelligent hierarchical processing of video according to claim 5, wherein step B specifically comprises:

B1. the communication module of the edge end receives the picture or video data and the preliminary identification result data, and sends the refined identification result data and the picture or video to the display terminal;

B2. the frame extracting module of the edge end extracts frames from the received video data to obtain picture data;

B3. the refined identification module of the edge end evaluates the received picture or video together with its preliminary identification result data, and does not process any picture or video segment that has no preliminary identification result; it performs refined identification on each picture that has a preliminary identification result, or on pictures obtained by frame extraction from such a video segment, to obtain a structured data description of the identification result;

8. The method for intelligent hierarchical processing of video according to claim 5, wherein step C specifically comprises:

C1. the data receiving module of the display terminal receives the picture or video data and the refined identification result data;

C2. the decoding and synchronizing module of the display terminal decodes the picture or video and synchronizes the structured identification result data with the video;