CN116347125A - Method for displaying image frames in a marked manner and related product - Google Patents


Info

Publication number
CN116347125A
CN116347125A (application CN202111580221.7A)
Authority
CN
China
Prior art keywords
identification information
image frame
identification
video stream
displaying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111580221.7A
Other languages
Chinese (zh)
Inventor
贺晓腾
林昊
刘海斌
王亚夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yixin Industry Co ltd
Original Assignee
Shanghai Yixin Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yixin Industry Co ltd filed Critical Shanghai Yixin Industry Co ltd
Priority to CN202111580221.7A priority Critical patent/CN116347125A/en
Publication of CN116347125A publication Critical patent/CN116347125A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23608Remultiplexing multiplex streams, e.g. involving modifying time stamps or remapping the packet identifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4344Remultiplexing of multiplex streams, e.g. by modifying time stamps or remapping the packet identifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Abstract

The invention provides a method for displaying image frames in an identified manner, and related products. The method comprises: receiving, at a video server, video data transmitted from an image acquisition device; at a computing server, acquiring the video stream and analyzing it with an image frame identification method to obtain a plurality of pairs of first timestamps and identification information; and at a client device, receiving the first timestamps and identification information from the computing server, receiving the video stream from the video server, acquiring a second timestamp for each image frame in the video stream, determining the identification information corresponding to each image frame from the first and second timestamps, marking the target area on the corresponding image frame according to the identification information, and displaying the identified image frame. The scheme of the invention can display the analysis results for the image frames of a video stream in real time while reducing bandwidth consumption.

Description

Method for displaying image frames in a marked manner and related product
Technical Field
The present invention relates generally to the field of identified (annotated) display. More particularly, the present invention relates to a method and system for displaying image frames in an identified manner.
Background
Various AI algorithms need to display some of their results in real time during computation. For example, a face recognition algorithm needs to show the region box around each person's head on every frame of a video stream, together with information such as the person's facial expression. Existing display schemes fall into two categories. In the first, the algorithm outputs annotated pictures and pushes them to the front end for direct display. In the second, the video stream is converted into pictures, the pictures are annotated by the algorithm, the annotated pictures are re-encoded into a video stream, and that stream is pushed to the front end for display. The first scheme places a high computing demand on the algorithm, and because it pushes uncompressed frame-level pictures, its network bandwidth consumption is often hundreds of times that of playing the video stream directly, making it impractical for real client playback scenarios. The second scheme spends too long converting the video stream to pictures and back, so the display is not truly real-time. There is therefore a need in the art for a real-time identified-display solution that satisfies both the network bandwidth and real-time synchronization requirements.
Disclosure of Invention
In order to solve at least one of the technical problems in the background art, the invention provides a method for displaying image frames in an identified manner, and related products. To this end, the present invention provides solutions in the following aspects.
According to a first aspect of the present invention, there is provided a first method for displaying image frames in an identified manner, comprising: receiving a plurality of pairs of first timestamps and identification information, wherein the identification information in each pair is used to identify a target area in the image frame corresponding to that pair's first timestamp; receiving a video stream and acquiring a second timestamp for each image frame in the video stream; comparing the second timestamp with the first timestamp to obtain a comparison result; and performing a corresponding operation according to the comparison result, so as to identify the target area on the corresponding image frame according to the identification information and display the identified image frame.
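The comparison step of this first method can be illustrated with a minimal Python sketch. The `Identification` record and its fields are assumptions for illustration, not the patent's data format:

```python
from dataclasses import dataclass

@dataclass
class Identification:
    # Hypothetical structure: a first timestamp paired with its
    # identification information (here a bounding box for the target area).
    timestamp: int
    box: tuple  # (x, y, w, h)

def compare_and_dispatch(frame_ts: int, ident: Identification) -> str:
    """Compare a frame's second timestamp with a first timestamp and
    return which of the method's operations applies."""
    if frame_ts == ident.timestamp:
        return "identify_and_display"   # timestamps match: draw the mark
    if frame_ts < ident.timestamp:
        return "frame_early"            # this frame precedes the info's frame
    return "info_early"                 # the info's frame has not arrived yet
```

The three return values correspond to the matched, frame-early, and info-early cases elaborated in the third through seventh methods below.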
According to the first method of the first aspect of the present invention, there is provided a second method of the first aspect, the method further comprising: initiating an identification display request to a server for identifying the image frames; and receiving the first timestamps and identification information from the server in response to the identification display request.
According to the first or second method for displaying the image frames in the first aspect of the present invention, there is provided a third method for displaying the image frames in the first aspect of the present invention, wherein the performing the corresponding operation according to the comparison result includes: and based on the comparison of the second time stamp and the first time stamp, determining that the second time stamp is matched with the first time stamp, marking the target area on the image frame corresponding to the second time stamp according to the identification information, and displaying the marked image frame.
The method for displaying an image frame according to any one of the first to third aspects of the present invention provides a fourth method for displaying an image frame according to the first aspect of the present invention, wherein the performing a corresponding operation according to the comparison result includes: based on the comparison of the second time stamp and the first time stamp, if the identification information of the first image frame is received earlier than the first image frame, the first time stamp and the identification information corresponding to the first image frame are cached until the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
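A minimal sketch of this fourth method's caching behavior, in Python; the class and method names are hypothetical:

```python
class IdentificationCache:
    """When identification info arrives before its image frame, cache the
    (first timestamp, identification info) pair until the frame with a
    matching second timestamp is received."""

    def __init__(self):
        self._pending = {}  # first timestamp -> identification info

    def on_info(self, first_ts, info):
        # Identification info arrived early: hold it for the frame.
        self._pending[first_ts] = info

    def on_frame(self, second_ts):
        # Return the cached info if this frame matches a pending pair, so
        # the caller can identify the target area and display the frame;
        # return None if no info is waiting for this timestamp.
        return self._pending.pop(second_ts, None)
```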
A fifth method for displaying an image frame according to the first aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to fourth aspects of the present invention, wherein the performing a corresponding operation according to the comparison result includes:
determining, based on the comparison of the second timestamp and the first timestamp, that the identification information of a first image frame has been received earlier than the first image frame itself, and waiting to receive the first image frame within a predetermined first time period; and, in response to the first image frame not being received within the predetermined first time period, stopping the identification display of the first image frame and performing the identification display operation on a second image frame, wherein the second image frame is the next image frame after the first image frame.
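The bounded wait of this fifth method can be sketched as follows. The `receive_frame` callable and the timeout values are assumptions; a real client would receive frames from its video-stream decoder:

```python
import time

def wait_for_frame(receive_frame, first_ts, timeout_s=0.5, poll_s=0.01):
    """Wait up to a predetermined first time period for the frame whose
    second timestamp equals first_ts; if it never arrives, give up so the
    client can move on to the next frame's identification display."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        frame = receive_frame()          # hypothetical non-blocking receive
        if frame is not None and frame["ts"] == first_ts:
            return frame                 # frame arrived in time: identify it
        time.sleep(poll_s)
    return None                          # timed out: skip this frame
```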
A sixth method for displaying an image frame according to the first aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to fifth aspects of the present invention, wherein the performing a corresponding operation according to the comparison result includes: determining that the first image frame is received earlier than the corresponding identification information based on the comparison of the second timestamp and the first timestamp, and suspending the identification display of the first image frame until the identification information corresponding to the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
A seventh method for displaying an image frame according to the first aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to sixth aspects of the present invention, wherein the performing the corresponding operation according to the comparison result includes: determining that the first image frame is received earlier than the corresponding identification information based on the comparison of the second timestamp and the first timestamp, and waiting for receiving the identification information of the first image frame in a preset second time period; and in response to the identification information of the first image frame not being received within a predetermined second time period, not identifying the first image frame, directly displaying the first image frame or discarding the first image frame.
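The fallback of this seventh method, when the second time period expires without identification information, can be sketched as a small policy function; the policy names are hypothetical:

```python
def handle_frame(frame, info, policy="display_plain"):
    """Decide what to do with a frame once its wait for identification
    info has ended: identify it if info arrived, otherwise either display
    it without identification or discard it."""
    if info is not None:
        return ("display_identified", frame)  # info arrived: mark and show
    if policy == "display_plain":
        return ("display_plain", frame)       # show it unidentified
    return ("discard", None)                  # drop the frame entirely
```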
A method for displaying an image frame according to any one of the first to seventh aspects of the present invention provides the eighth method for displaying an image frame according to the first aspect of the present invention, wherein the identification information includes at least one identification information obtained by one or more image frame identification methods, wherein the plurality of image frame identification methods includes a plurality of image frame identification methods of the same kind and/or a plurality of image frame identification methods of different kinds.
A method for displaying an image frame according to any one of the first to eighth aspects of the present invention provides the ninth method for displaying an image frame according to the first aspect of the present invention, the at least one identification information including information for identifying a face position, a face contour, a specific item, a facial expression and/or a key point of a human body in the image frame.
A tenth method for displaying an image frame according to the first aspect of the present invention is provided according to any one of the first to ninth aspects of the present invention, wherein the identification display request includes selection request information regarding selection identification information for requesting the server to transmit part or all of the at least one identification information.
The method for displaying an image frame according to any one of the first to tenth methods of the first aspect of the present invention provides an eleventh method of the first aspect, wherein the at least one piece of identification information further comprises a confidence level associated with the image frame identification method.
A twelfth method for displaying an image frame according to the first aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to eleventh aspects of the present invention, the method further comprising: selecting first identification information from the at least one piece of identification information according to preset selection logic, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame; or selecting the first identification information from the at least one piece of identification information according to the confidence, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame.
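The confidence-based branch of this twelfth method reduces to picking the highest-confidence candidate. A minimal sketch, assuming each candidate is a hypothetical (confidence, info) pair:

```python
def select_identification(candidates):
    """Several image frame identification methods produced info for the
    same frame; select the one with the highest associated confidence."""
    if not candidates:
        return None                            # nothing to display
    return max(candidates, key=lambda c: c[0])[1]
```

The "preset selection logic" branch would simply replace the `max` key with whatever ordering the client is configured to prefer.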
A thirteenth method of the first aspect of the present invention is provided according to the method of any one of the first to twelfth methods of the first aspect, wherein receiving a plurality of pairs of first timestamps and identification information comprises: establishing a WebSocket-based connection with a server to continuously receive pairs of first timestamps and identification information from the server.
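A sketch of the message handling for such a connection. The JSON field names are assumptions, not the patent's wire format, and the commented-out receive loop assumes the third-party `websockets` package and an example URL:

```python
import json

def parse_pair(message: str):
    """Parse one message from the compute server into a
    (first timestamp, identification info) pair."""
    data = json.loads(message)
    return data["timestamp"], data["info"]

# Hedged sketch of the continuous receive loop:
#
#   import websockets
#   async def receive_pairs(url="ws://compute-server/identify"):
#       async with websockets.connect(url) as ws:
#           async for message in ws:        # pairs keep arriving
#               yield parse_pair(message)
```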
A method for displaying an image frame according to any one of the first to thirteenth methods of the first aspect of the present invention provides a fourteenth method of the first aspect, wherein receiving a video stream includes: acquiring a video stream transmitted in real time through Web Real-Time Communication (WebRTC) technology; and decoding the acquired video stream to obtain the image frames in the video stream.
According to a second aspect of the present invention, there is provided a first client device for identifying and displaying image frames according to the second aspect of the present invention, characterized by comprising: a processor; and a memory storing program instructions for identifying and displaying image frames, which when executed by the processor, implement the method according to the first aspect.
According to a third aspect of the present invention, there is provided a first method for identifying and displaying image frames according to the third aspect of the present invention, comprising: acquiring a first timestamp of each image frame in a video stream; performing a corresponding image frame identification method analysis on the video stream according to an image frame identification method to obtain respective identification information for one or more image frames in the video stream, wherein the identification information is used to identify a target region within an image frame; and sending a first timestamp and identification information to the client device in response to an identification display request from the client device for identification display of the image frame.
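The server side of this third aspect, producing (first timestamp, identification info) pairs, can be sketched as below. The `detect` callable stands in for whatever image frame identification method is configured and is purely hypothetical:

```python
def analyze_stream(frames, detect):
    """Run an image frame identification method over each frame of the
    video stream and pair each result with that frame's first timestamp.
    `frames` is an iterable of (timestamp, frame) tuples."""
    pairs = []
    for ts, frame in frames:
        info = detect(frame)          # e.g. a face-region bounding box
        if info is not None:
            pairs.append((ts, info))  # (first timestamp, identification info)
    return pairs
```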
According to a first method for displaying an image frame in a third aspect of the present invention, there is provided a second method for displaying an image frame in a third aspect of the present invention, wherein the identification display request includes selection request information indicating selection of one or more image frame identification methods, the method further comprising: and performing corresponding one or more image frame identification method analysis on the video stream according to the selected one or more image frame identification methods to obtain corresponding one or more identification information.
According to the first or second method of the third aspect of the present invention, there is provided a third method of the third aspect, wherein each of the one or more pieces of identification information further comprises an associated confidence level.
According to a fourth aspect of the present invention, there is provided a first server for identification display of image frames according to the fourth aspect of the present invention, characterized by comprising: a processor; and a memory storing program instructions for identifying and displaying image frames, which when executed by the processor, implement the method according to the third aspect.
According to a fifth aspect of the present invention, there is provided a first system for displaying image frames in an identified manner, comprising: a video server configured to generate a video stream from video data acquired by an image acquisition device; a computing server configured to: acquire the video stream, analyze it with an image frame identification method to obtain a plurality of pairs of first timestamps and identification information, and send the first timestamps and identification information to the client device in response to an identification display request from the client device, wherein the identification information in each pair is used to identify a target area in the image frame corresponding to that pair's first timestamp; and a client device configured to: receive the first timestamps and identification information from the computing server; receive the video stream from the video server and obtain a second timestamp for each image frame in the video stream; and determine the identification information corresponding to each image frame from the first and second timestamps, identify the target area on the corresponding image frame according to the identification information, and display the identified image frame.
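On the client side of this system, pairing the video server's frames with the compute server's identification information by timestamp could look like the following sketch; in practice both inputs would be live streams rather than lists:

```python
def render_loop(frames, infos):
    """Pair each frame from the video server with identification info
    from the computing server by timestamp, then 'display' each frame.
    Both arguments are lists of (timestamp, payload) tuples."""
    by_ts = dict(infos)                  # first timestamp -> info
    displayed = []
    for ts, frame in frames:
        info = by_ts.get(ts)
        if info is not None:
            displayed.append((frame, info))   # identified display
        else:
            displayed.append((frame, None))   # displayed without a mark
    return displayed
```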
According to the first system of the fifth aspect of the present invention, there is provided a second system of the fifth aspect, wherein the client device is further configured to initiate an identification display request to the computing server for identifying the image frames, and the computing server is further configured to send the first timestamps and identification information to the client device in response to receiving the identification display request.
According to the first or second system of the fifth aspect of the present invention, there is provided a third system of the fifth aspect, wherein the client device is further configured to generate the identification display request in response to receiving a real-time play request sent by a user, and to send the identification display request to the computing server.
A system according to any one of the first to third systems of the fifth aspect of the present invention provides a fourth system of the fifth aspect, wherein the computing server is further configured to perform image frame identification method analysis on the video stream using a plurality of image frame identification methods, respectively, to obtain a plurality of pieces of identification information.
A system for displaying an image frame according to any one of the first to fourth aspects of the present invention provides the system for displaying an image frame according to the fifth aspect of the present invention, wherein the client device is further configured to generate the identification information selection request in response to receiving a real-time play request transmitted by a user, and transmit the identification information selection request to a computing server as the identification display request or a part thereof; and the computing server is further configured to select to send the first timestamp and some or all of the plurality of identification information to the client device in accordance with the identification information selection request.
A system for displaying an image frame according to any one of the first to fifth aspects of the present invention provides a sixth system for displaying an image frame according to the fifth aspect of the present invention, the client device further configured to: selecting first identification information from part or all of the received multiple identification information according to a preset selection logic, identifying the target area on an image frame according to the first identification information, and displaying the identified image frame; or selecting the first identification information from part or all of the received multiple identification information according to the confidence degree associated with the identification information, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame.
A system for displaying an image frame according to any one of the first to sixth aspects of the present invention provides a seventh system for displaying an image frame according to the fifth aspect of the present invention, the client device further configured to: and based on the comparison of the second time stamp and the first time stamp, determining that the second time stamp is matched with the first time stamp, marking the target area on the image frame corresponding to the second time stamp according to the identification information, and displaying the marked image frame.
A system for displaying an image frame according to any one of the first to seventh aspects of the present invention provides the eighth system for displaying an image frame according to the fifth aspect of the present invention, the client device further configured to: based on the comparison of the second time stamp and the first time stamp, if the identification information of the first image frame is received earlier than the image frame, the first time stamp and the identification information corresponding to the first image frame are cached until the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
A system according to any one of the first to eighth systems of the fifth aspect of the present invention provides a ninth system of the fifth aspect, wherein the client device is further configured to: determine, based on the comparison of the second timestamp and the first timestamp, that the identification information of a first image frame has been received earlier than the first image frame itself, and wait to receive the first image frame within a predetermined first time period; and, in response to the first image frame not being received within the predetermined first time period, stop the identification display of the first image frame and perform the identification display operation on a second image frame, wherein the second image frame is the next image frame after the first image frame.
A system for displaying an image frame according to any one of the first to ninth aspects of the present invention provides the system for displaying an image frame according to the tenth aspect of the present invention, the client device further configured to: determining that the first image frame is received earlier than the corresponding identification information based on the comparison of the second timestamp and the first timestamp, and suspending the identification display of the first image frame until the identification information corresponding to the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
A system for displaying an image frame according to any one of the first to tenth aspects of the present invention provides the system for displaying an image frame according to the eleventh aspect of the present invention, the client device further configured to: determining that the first image frame is received earlier than the corresponding identification information based on the comparison of the second timestamp and the first timestamp, and waiting for receiving the identification information of the first image frame in a preset second time period; and in response to not receiving the identification information of the first image frame within a predetermined second time period, directly displaying the first image frame without identifying the first image frame or discarding the first image frame.
According to a sixth aspect of the present invention, there is provided a first method for identifying and displaying image frames according to the sixth aspect of the present invention, comprising: receiving video data transmitted from an image acquisition device by using a video server, and generating a video stream according to the video data; using a computing server to perform: the video stream is obtained, and is analyzed and processed according to an image frame identification method to obtain a plurality of pairs of first time stamps and identification information, wherein the identification information in each pair of first time stamps and the identification information is used for identifying a target area in an image frame corresponding to the first time stamp of the pair of first time stamps and the identification information; performing, using the client device: the first time stamp and the identification information are received from the computing server, the video stream is received from the video server, the second time stamp of each image frame in the video stream is obtained, the identification information corresponding to each image frame is determined according to the first time stamp and the second time stamp, the target area is identified on the corresponding image frame according to the identification information, and the identified image frames are displayed.
According to the first method of the sixth aspect of the present invention, there is provided a second method of the sixth aspect, the method further comprising: initiating, using the client device, an identification display request to the computing server for identifying the image frames; and sending, using the computing server, the first timestamps and identification information to the client device in response to receiving the identification display request.
According to the first or second method of the sixth aspect of the present invention, there is provided a third method of the sixth aspect, wherein the client device performs the steps of: generating the identification display request in response to receiving a real-time play request sent by a user, and sending the identification display request to the computing server.
A fourth method for displaying an image frame according to the sixth aspect of the present invention is provided, wherein the performing, using a computing server: and in response to receiving the video stream sent by the image acquisition device, performing image frame identification method analysis on the video stream by utilizing a plurality of image frame identification methods respectively so as to obtain a plurality of identification information.
A fifth method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to fourth aspects of the present invention, wherein the steps of: generating the identification information selection request in response to receiving a real-time play request sent by a user, and sending the identification information selection request to a computing server as the identification display request or a part thereof; and performing, using the computing server: in response to receiving the identification information selection request, a selection is made to send the first timestamp and some or all of the plurality of identification information to the client device.
A method for displaying an image frame according to any one of the first to fifth aspects of the present invention provides the sixth method for displaying an image frame according to the sixth aspect of the present invention, wherein the steps of: receiving part or all of the plurality of types of identification information, selecting first identification information from the received part or all of the plurality of types of identification information according to preset selection logic, identifying the target area on an image frame according to the first identification information, and displaying the identified image frame; or selecting the first identification information from part or all of the received multiple identification information according to the confidence degree associated with the identification information, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame.
A seventh method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to sixth aspects of the present invention, wherein the client device performs: acquiring the second timestamp, determining that the second timestamp matches the first timestamp based on a comparison of the second timestamp and the first timestamp, marking the target area on the image frame corresponding to the second timestamp according to the identification information, and displaying the marked image frame.
An eighth method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to seventh aspects of the present invention, wherein the client device performs: in response to acquiring the second timestamp, determining that the identification information of a first image frame has been received earlier than the first image frame itself based on a comparison of the second timestamp and the first timestamp, and caching the first timestamp and the identification information corresponding to the first image frame until the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
A ninth method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to eighth aspects of the present invention, wherein the client device performs: in response to acquiring the second timestamp, determining that the identification information of the first image frame has been received earlier than the first image frame itself based on a comparison of the second timestamp and the first timestamp, and waiting to receive the first image frame within a predetermined first time period; and in response to the first image frame not being received within the predetermined first time period, stopping the identification display of the first image frame and performing the identification display operation on a second image frame, wherein the second image frame is the image frame following the first image frame.
A tenth method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to ninth aspects of the present invention, wherein the client device performs: in response to acquiring the second timestamp, determining that the first image frame has been received earlier than its corresponding identification information based on a comparison of the second timestamp and the first timestamp, and suspending the identification display of the first image frame until the identification information corresponding to the first image frame is received; and identifying the target area on the first image frame according to the identification information, and displaying the identified image frame.
An eleventh method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to tenth aspects of the present invention, wherein the client device performs: in response to acquiring the second timestamp, determining that the first image frame has been received earlier than its corresponding identification information based on a comparison of the second timestamp and the first timestamp, and waiting to receive the identification information of the first image frame within a predetermined second time period; and in response to the identification information of the first image frame not being received within the predetermined second time period, directly displaying the first image frame without identification, or discarding the first image frame.
A twelfth method for displaying an image frame according to the sixth aspect of the present invention is provided according to the method for displaying an image frame of any one of the first to eleventh aspects of the present invention. The present invention further discloses a computer program product, characterized by comprising a computer program for displaying an image frame which, when executed by a processor, implements the method according to the first aspect, the method according to the third aspect, or the method according to the sixth aspect.
According to the method, apparatus, system, and computer program product provided in the above aspects of the invention, the second timestamp carried by the video stream can be kept synchronized with the first timestamp carried by the algorithm output, and this timestamp synchronization allows the identification information associated with the first timestamp to be loaded onto the image frame corresponding to the second timestamp, thereby realizing real-time display of the algorithm results on the image frames. Further, by transmitting the identification information and the video stream independently of each other, not only can the consumption of network bandwidth and the computational load at the client device be reduced, but the identification information can also be loaded dynamically and accurately onto the corresponding image frames when the video stream is identified.
Drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the invention are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts, in which:
FIG. 1 is a block diagram illustrating a system for identifying display of image frames according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for identifying display of image frames according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating another method for identifying display of image frames according to an embodiment of the present invention;
FIG. 4 is an exemplary block diagram illustrating a system for identifying display of image frames in accordance with one embodiment of the present invention;
FIG. 5 is an exemplary block diagram illustrating a system for identifying display of image frames in accordance with yet another embodiment of the present invention;
FIG. 6 is an exemplary block diagram illustrating a system for identifying display of image frames in accordance with another embodiment of the present invention;
FIG. 7 is a detailed flowchart illustrating a system for identifying display of image frames according to an embodiment of the present invention; and
fig. 8 is an interaction diagram illustrating a system for identifying display of image frames according to an embodiment of the present invention.
Detailed Description
Embodiments will now be described with reference to the accompanying drawings. It will be appreciated that, for simplicity and clarity of illustration, reference numerals have been repeated among the figures where considered appropriate to indicate corresponding or analogous elements. Furthermore, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the embodiments described herein. Moreover, this description should not be taken as limiting the scope of the embodiments described herein.
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating a system 100 for identifying display of image frames according to an embodiment of the present invention. In one embodiment, the system 100 may include an image capture device (e.g., a camera as shown in the figures) 101, a server 102, and a plurality of client devices 103 (e.g., desktops, notebooks, and mobile digital assistants as shown).
In operation, video information (e.g., of a personnel work area or part of a community streetscape) may first be acquired in real time by the image acquisition device, which sends a video stream carrying the video information to the server. A unique presentation timestamp (Presentation Time Stamp, PTS) can be parsed from each image frame of the video stream. After the server receives the video stream, it may perform the necessary parsing and transcoding of the video stream to obtain the image frames of the video stream and the timestamps corresponding to them (including the first timestamp and the second timestamp in the context of the present invention). Further, the image frames may be passed to an algorithm parsing module of the server 102, and target detection may be performed on the image frames using the algorithm parsing module to obtain identification information for the target detection of each image frame, which is associated with the aforementioned first timestamp.
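As a purely illustrative sketch (the patent does not disclose an implementation), a unique per-frame PTS of the kind described above could be derived from the frame index and frame rate; the function name and parameters are assumptions:

```python
def frame_pts_ms(frame_index: int, fps: float) -> int:
    """Compute a frame's presentation timestamp as the number of
    milliseconds between that frame and the stream's play start."""
    return round(frame_index * 1000.0 / fps)

# At 25 fps, the 100th frame is presented 4000 ms after play start.
pts = frame_pts_ms(100, 25.0)
```

Because the PTS is monotonic and unique per frame, it can serve as the key on which identification information and image frames are later matched.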
In one embodiment, a client device (specifically, its user) may initiate a command request for identification display to the server. In response, the server may transmit to the client device the identification information (the identification information described above) computed for image frames starting from the current time. In addition to the identification information from the server, the client device also needs to receive the video stream from the current time onward, transmitted by the server or by a play agent (not shown). By way of example, the play agent may be integrated with the server or provided separately from it. The client device may then perform the identification operation on the corresponding image frame based on the identification information from the server, e.g., by comparing the first timestamp and the second timestamp to determine the corresponding image frame, and display the identification information on a target area (e.g., a location or region of interest) of that image frame. For example, the identified image frames may be displayed on a liquid crystal display of the client device. In one application scenario, the client device may play the video stream by clicking on a web link, and the identification operation may be displayed on the display screen through that web link. Specifically, as shown at 104, the identification information may, for example, identify a face region in the video stream played on the display screen of the client device.
By way of example, the data interaction between the server and the image acquisition device is independent of whether the server has received a command request for identification display: whether or not such a request is received, the image acquisition device captures the video stream and transmits it to the server. Likewise, after receiving the video stream, the server parses and transcodes it to obtain the image frames and their corresponding timestamps, and performs target detection on the image frames using the algorithm parsing module to obtain the identification information; these two operations are also independent of whether a command request for identification display has been received. That is, as long as the image acquisition device sends the video stream to the server, the server performs the parsing, transcoding, and target detection operations. Further, after performing target detection on the image frames and obtaining the identification information, the server does not push the information directly to the client device; instead, it repeatedly caches and overwrites the information in memory. Only when the server receives a command request for identification display sent by the client does it begin, from the moment the request is received, to send the computed identification information to the client device.
Fig. 2 is a flowchart 200 illustrating a method for identifying display of image frames according to an embodiment of the present invention. It is to be appreciated that the method herein may be performed by a client device (e.g., a computer or mobile terminal). For example, the client device may initiate execution of the method just as the image acquisition device begins to acquire image frames. Of course, the present invention does not limit the specific execution time: execution may also be initiated by the user at any time after the image acquisition device has been acquiring image frames for a period of time, or at any moment during acquisition.
As shown in fig. 2, first, at step S202, an operation is performed of receiving a plurality of paired first timestamps and identification information, each piece of identification information being used to identify a target area within the image frame corresponding to its paired first timestamp. As an example, the client device may establish a WebSocket-based connection with the server to continuously receive pairs of first timestamps and identification information from the server. In one application scenario, the identification information may identify information related to face locations, face contours, specific items, facial expressions, and/or key points of the human body in the image frames. It will be understood that a timestamp can be parsed from each image frame of the video stream; the timestamp attests to the time at which each frame was acquired. For example, the timestamp may be the number of milliseconds between the current video frame and the play start time of the video stream, measured in the local time at which the image acquisition device acquired the image, and it may serve as a unique identifier for synchronously pushing and displaying each image frame of the video stream. In addition, the identification information may be, for example, at least one piece of identification information obtained from one or more image frame identification methods, where the plurality of image frame identification methods may include multiple image frame identification methods of the same class and/or multiple image frame identification methods of different classes.
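The pairs of first timestamps and identification information pushed over the WebSocket connection could, for example, be serialized as JSON messages. The field names and structure below are assumptions for illustration only, not a format disclosed by the patent:

```python
import json

def parse_identification_message(raw: str) -> tuple[int, list[dict]]:
    """Parse one pushed message into a (first timestamp, identification
    information) pair. All field names here are hypothetical."""
    msg = json.loads(raw)
    # first_timestamp: PTS in milliseconds; identifications: one entry
    # per target area detected in the corresponding frame.
    return msg["first_timestamp"], msg["identifications"]

# A hypothetical message identifying one face region in the frame at 4000 ms.
raw = json.dumps({
    "first_timestamp": 4000,
    "identifications": [
        {"method": "face", "bbox": [120, 80, 64, 64], "confidence": 0.9},
    ],
})
first_ts, infos = parse_identification_message(raw)
```

Keeping the timestamp and the identification payload in one message is what lets the client later match the pair against frames arriving on the separate video channel.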
To facilitate understanding, multiple image frame identification methods of the same class and of different classes are briefly discussed below by way of example. In the scheme provided by the invention, the face recognition methods may include, for example, three face recognition methods A, B, and C; methods A, B, and C are multiple image frame identification methods of the same class, all being face recognition methods. In addition, the image frame identification methods may include a face recognition method, a gesture recognition method, and so on; the face recognition method and the gesture recognition method are image frame identification methods of different classes.
In one embodiment, before receiving the plurality of pairs of first timestamps and identification information, the client may initiate to the server an identification display request for identifying image frames, wherein the identification display request includes selection request information for selecting identification information and is used to request the server to send some or all of the at least one piece of identification information. In one application scenario, first identification information may be selected from the at least one piece of identification information according to preset selection logic, the target area identified on the image frame according to the first identification information, and the identified image frame displayed. The preset selection logic may be set according to different requirements; for example, different identification information may be selected for different time periods, such as selecting the identification information obtained by a face recognition method in the daytime and the identification information obtained by a gesture recognition method at night.
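The time-of-day example of preset selection logic above can be sketched as follows; the 06:00–18:00 daytime window and the method names are assumptions for illustration:

```python
from datetime import time as dtime

def select_method_by_time_of_day(now: dtime) -> str:
    """Preset selection logic from the example above: prefer face
    recognition results in the daytime, gesture recognition results
    at night. The daytime window is an illustrative assumption."""
    return "face" if dtime(6, 0) <= now < dtime(18, 0) else "gesture"

select_method_by_time_of_day(dtime(12, 0))  # daytime, so "face" is selected
```

Any other deterministic rule (per-camera, per-user, per-scene) could be substituted; the point is only that the client picks one piece of identification information from those received.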
Further, the at least one piece of identification information may also include a confidence level associated with the image frame identification method that produced it. In one application scenario, first identification information may be selected from the at least one piece of identification information according to the confidence level, the target area identified on the image frame according to the first identification information, and the identified image frame displayed.
For example, when the at least one piece of identification information received by the client device contains identification information obtained by multiple image frame identification methods of the same class, the identification information is selected according to the confidence level, associated with the identification method, that each piece of identification information carries. Suppose the client device receives identification information obtained by face recognition algorithms A, B, and C, with confidence levels of 80%, 70%, and 90%, respectively. The client device may then select identification information according to the confidence levels of the different algorithms, e.g., select the identification information corresponding to face recognition algorithm C, which has the highest confidence.
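The confidence-based choice among same-class methods described above amounts to a maximum over the confidence field. A minimal sketch, with hypothetical field names mirroring the A/B/C example:

```python
def select_by_confidence(identifications: list[dict]) -> dict:
    """Select the piece of identification information carrying the
    highest confidence among same-class identification methods."""
    return max(identifications, key=lambda info: info["confidence"])

# Mirrors the example above: algorithms A (80%), B (70%), C (90%).
candidates = [
    {"method": "face_recognition_A", "confidence": 0.80},
    {"method": "face_recognition_B", "confidence": 0.70},
    {"method": "face_recognition_C", "confidence": 0.90},
]
best = select_by_confidence(candidates)  # the entry for algorithm C
```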
Returning to the flow: after step S202 is executed, the flow proceeds to step S204. In step S204, the video stream is received and a second timestamp is acquired for each image frame in the video stream (the manner of receiving the video stream is described in detail later with reference to figs. 5 and 6). The video stream transmitted in real time may be acquired through Web Real-Time Communication (WebRTC) technology, and the acquired video stream is decoded to obtain its image frames. Next, also in step S204, the second timestamp may be compared with the first timestamp to obtain a comparison result, and the flow advances to step S206. In step S206, a corresponding operation is performed according to the comparison result so as to identify the target area on the corresponding image frame according to the identification information, and the identified image frame is displayed.
In one embodiment, the second timestamp is determined to match the first timestamp based on a comparison of the two. For example, if the identification information for the image frame at second 4 is received from the server and the image frame at second 4 is received from the play agent, the second timestamp matches the first timestamp; the identification information for second 4 from the server can then be rendered on the image frame from the play agent, and the identified image frame displayed.
In one embodiment, the identification information of a first image frame is determined, based on a comparison of the second timestamp and the first timestamp, to have been received earlier than the first image frame itself; the first timestamp and the identification information corresponding to the first image frame are then cached until the first image frame is received, with the client waiting for the first image frame within a predetermined first time period. In response to receiving the first image frame within that period, the target area is identified on the first image frame according to the identification information and the identified image frame is displayed. In response to the first image frame not being received within the predetermined period, the identification display of the first image frame is stopped and the identification display operation is performed on a second image frame, i.e., the image frame following the first image frame. As an example, suppose the identification information for the image frame at second 7 has been received from the server, while the image frame received from the play agent is that of second 5 and the frame for second 7 has not yet arrived. The identification information for second 7 has thus arrived earlier than its image frame, so the identification information and timestamp for second 7 are cached to await the frame's arrival. In one scenario, the client device may never see the frame for second 7 arrive (i.e., a "packet loss" occurs in the video stream), at which point the identification display for that frame may cease and the identification display of the next frame may begin.
In one embodiment, if it is determined, based on a comparison of the second timestamp and the first timestamp, that the first image frame has been received earlier than its corresponding identification information, the identification display of the first image frame is paused until the corresponding identification information is received, with the client waiting for it within a predetermined second time period. In response to receiving the identification information within that period, the target area is identified on the first image frame according to the identification information and the identified image frame is displayed. In response to not receiving the identification information within the predetermined second time period, the first image frame is either displayed directly without identification or discarded. As an example, suppose the identification information for the image frame at second 5 has been received from the server, while the image frame received from the play agent is that of second 7 and the identification information for second 7 has not yet arrived. The image frame for second 7 has thus arrived earlier than its identification information, so the identification display of that frame is suspended to await the information's arrival. In one case, the client never receives the identification information for the frame at second 7; that frame is then left unidentified and displayed directly.
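The two cases above (identification information arriving before its frame, and a frame arriving before its identification information) can be sketched as a small synchronization helper. This is an illustrative sketch only; the class, method names, and timeout value are assumptions, not the patent's design:

```python
import time

class FrameIdentificationSync:
    """Pair identification information (keyed by its first timestamp)
    with image frames (keyed by their second timestamp)."""

    def __init__(self, first_time_period_s: float = 1.0):
        # Cached (identification info, arrival time) entries, keyed by timestamp.
        self.pending_info: dict[int, tuple[dict, float]] = {}
        self.first_time_period_s = first_time_period_s

    def on_identification(self, first_ts: int, info: dict) -> None:
        # Info arrived before its frame: cache it until the frame arrives.
        self.pending_info[first_ts] = (info, time.monotonic())

    def on_frame(self, second_ts: int, frame: object) -> tuple:
        entry = self.pending_info.pop(second_ts, None)
        if entry is not None:
            info, _ = entry
            return ("display_identified", frame, info)  # timestamps match
        # Frame arrived before its info: the caller pauses its display and
        # waits up to the predetermined second time period, after which the
        # frame is displayed unidentified (or discarded).
        return ("wait_for_info", frame, None)

    def expire_stale(self) -> None:
        # Drop cached info whose frame never arrived within the first time
        # period (the video-stream "packet loss" case in the text above).
        now = time.monotonic()
        self.pending_info = {
            ts: (info, t) for ts, (info, t) in self.pending_info.items()
            if now - t < self.first_time_period_s
        }
```

A real client would drive `on_identification` from the WebSocket channel and `on_frame` from the decoded video stream, calling `expire_stale` periodically.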
Alternatively, when the first image frame is determined, based on a comparison of the second timestamp and the first timestamp, to have been received earlier than its corresponding identification information, the first image frame may also simply be displayed directly without being identified.
Fig. 3 is a flowchart 300 illustrating another method for identifying display of image frames according to an embodiment of the present invention. It will be appreciated that the method herein may be performed by a server. In particular, the server, as the executor of the method steps, may cooperate with the client device described in connection with fig. 2, thereby enabling real-time identification display at the client device. In one implementation scenario, the server here may be a computing server dedicated to determining the identification information and the first timestamp.
As shown, at step S302, a first timestamp is acquired for each image frame in a video stream. In one embodiment, the video stream may be a Real-Time Streaming Protocol (RTSP) video stream; with RTSP, the image frames of the video can be controlled, so that applications with high real-time requirements, such as video chat and video monitoring, can be supported. Next, at step S304, the video stream is analyzed according to one or more image frame identification methods to obtain identification information for one or more image frames in the video stream, wherein the identification information is used to identify a target area within the image frames. The image frame identification methods may include detecting faces with a target detection algorithm (CenterNet), classifying face-box images with a deep convolutional neural network (VGG), and/or detecting human body key points with a lightweight human pose estimation algorithm (MobileNet-Pose).
After the identification information described above is obtained, the flow advances to step S306. In step S306, in response to an identification display request from the client device for the identification display of the image frames, the first timestamp and the identification information are transmitted to the client device. In one embodiment, the identification display request may include selection request information indicating a selection of one or more image frame identification methods, and the video stream may be analyzed by the selected method(s) to obtain the corresponding identification information, each piece of which may further include an associated confidence level. In one application scenario, the identification display request may be a "face recognition" request, in response to which the computing server sends "face identification information + timestamp" to the client device. Here, the face identification information may be the position information of an identification box outlining a face in the video frame.
In another application scenario, the identification display request may be a "face recognition + gesture recognition" request, in response to which the computing server sends "face identification information + gesture identification information + timestamp" to the client device, so that the client device obtains identification information for both face recognition and gesture recognition. It follows that the present invention supports flexible switching between multiple algorithms. The scheme of the invention allows a user to select, through the client device, any one or more of N algorithms (N being a positive integer greater than 1), so that the computing server computes the corresponding identification information according to the selected algorithms and sends it to the client device. On this basis, the user of the client device may select appropriate identification information (e.g., that with good confidence) from the various kinds of identification information for the real-time identification display of the image frames.
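An identification display request that selects among the N algorithms, as described above, could be encoded as a simple JSON message. The schema below is an assumption for illustration, not a format specified by the patent:

```python
import json

def build_identification_display_request(selected_methods: list[str]) -> str:
    """Build an identification display request selecting one or more of
    the available image frame identification methods. The message schema
    and field names are illustrative assumptions."""
    return json.dumps({
        "type": "identification_display",
        "methods": selected_methods,  # e.g. ["face"], or ["face", "gesture"]
    })

# A "face recognition + gesture recognition" request, as in the scenario above.
request = build_identification_display_request(["face", "gesture"])
```

The computing server would then analyze the video stream only with the named methods and attach each method's confidence to the returned identification information.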
Fig. 4 is an exemplary block diagram illustrating a system 400 for identifying display of image frames according to one embodiment of the invention. As shown, the system 400 may include a video server 401, a computing server 402, and a client device 403. It will be appreciated that the video server and the computing server here, taken together, correspond to the server 102 shown in fig. 1. Through the cooperation of the video server, the computing server, and the client device, the system can carry out the identification display operation on image frames. The video server and the computing server may also be two independent servers that perform different operations and provide different functions. The operation of the video server, computing server, and client device is described in detail below.
In operation, the video server 401 may receive video data transmitted from an image acquisition device (e.g., image acquisition device 101 of fig. 1) and generate a video stream from the video data, e.g., a video stream conforming to a streaming media transport format. In one application scenario, the video server 401 may transmit and play the video stream through, for example, Web Real-Time Communication (WebRTC) technology. Meanwhile, the computing server 402 may interact with the client device by way of, for example, a long-lived WebSocket connection in order to push multiple pairs of first timestamps and identification information to the client device 403 (as described in the context of the first timestamp and identification information above). The computing server 402 may further acquire the video stream and analyze it according to an image frame identification method to obtain the pairs of first timestamps and identification information. Specifically, the computing server 402 may transcode the acquired video stream via a transcoding service to obtain each image frame of the video stream together with its first timestamp. The image frames are then computationally analyzed by algorithm parsing to obtain identification information identifying the target area. Finally, a Web service transmits the identification information and the first timestamp obtained from the algorithm parsing to the client device 403 in response to an identification display request. The transcoding service, algorithm parsing, and Web service of the computing server are described in detail below in connection with fig. 7.
In one embodiment, the client device 403 may receive the first timestamp and the identification information from the computing server 402, receive the video stream from the video server 401, and obtain the second timestamp of each image frame in the video stream. After receiving the first and second timestamps, the client device 403 may determine the identification information corresponding to each image frame from the two timestamps, and identify the target area on the corresponding image frame accordingly, so as to display the identified image frame. In one application scenario, the client device 403 may respond to a user's real-time display request through a presentation interface, and may receive the first timestamp, the second timestamp, and the identification information through a browser so as to realize the identification display of image frames. The browser and presentation page of the client device are described in detail below in conjunction with fig. 7.
It is to be appreciated that, in implementing the identification display operation for image frames, the system of the present invention may use the client device 403 to initiate an identification display request to the computing server 402, and use the computing server 402 to send the first timestamp and the identification information to the client device 403 in response to receiving that request. In other words, the computing server 402 sends the identification information and timestamp of each frame of the current real-time video stream to the client device 403 only after receiving the identification display request sent by the client device 403. Likewise, the video stream received by the client device 403 is the real-time video stream starting after the client device 403 sends its request to the video server 401, i.e., the client device 403 displays real-time video data. In one application scenario, when no identification display request from the client device 403 has been received, the computing server only computes the identification information and timestamp for each image frame of the received video stream and does not send them to the client device; nor does the client device 403 receive video stream data in this case.
Fig. 5 is an exemplary block diagram illustrating a system 500 for identifying and displaying image frames according to yet another embodiment of the present invention. As shown in fig. 5, the system 500 may further include an image capturing device 501 in addition to the video server 401, the computing server 402, and the client device 403 (i.e., the system for identifying and displaying image frames of the present invention) described above. It will be appreciated that the video server 401, the computing server 402, and the client device 403 have been described in detail above with reference to fig. 4, and therefore the same content will not be repeated below. In one embodiment, the image capturing device 501 may be, for example, a video camera for capturing video data, which is delivered to the video server 401 and the computing server 402 by way of a video stream. In one application scenario, the client device 403 receives the second timestamps carried in the video stream from the image capturing device 501 via the video server 401; that is, the client device 403 may obtain the video stream through a first dedicated channel. In addition, the client device 403 receives the first timestamp and the identification information of each image frame from the computing server 402, i.e., through a second dedicated channel for acquiring the first timestamp and the identification information.
Fig. 6 is an exemplary block diagram illustrating a system 600 for identifying and displaying image frames according to another embodiment of the present invention. As shown in fig. 6, the system 600 may further include a server 601 in addition to the image capturing device 501 and the client device 403 described above. It is to be understood that the image capturing device 501 and the client device 403 have been described in detail above with reference to figs. 4 and 5, and thus the same content will not be repeated below. In one embodiment, the server 601 may be configured to receive a video stream from the image capturing device 501, and, in response to the identification display request, to transmit the identification information produced by the algorithm and its corresponding first timestamp to the client device 403. Meanwhile, the server 601 may also be configured to deliver, in response to the identification display request, the second timestamp associated with each image frame of the video stream; that is, in this embodiment the identification information, the first timestamp, and the second timestamp are delivered over the same channel. It is to be appreciated that the server 601 here may have the same or a similar architecture and functionality as the server 102 shown in fig. 1. Further, the server 601 may also be regarded as a combination of the computing server and the video server shown in fig. 5. Accordingly, the present invention does not restrict the specific implementation of the server used for algorithmic analysis and the server used for real-time video streaming, and any reasonable arrangement will be apparent to those skilled in the art in light of the teachings of the present invention.
To facilitate an understanding of the system for identifying and displaying image frames described above, the system is described in detail below in conjunction with fig. 7. Fig. 7 is a specific flow chart 700 illustrating the operation of a system for identifying and displaying image frames according to an embodiment of the present invention.
As shown in fig. 7, first, at step S702, video data is acquired with an image acquisition device (which may be a camera), and the video data is transmitted to the video server and the computing server, respectively, by way of RTSP video streams. Next, at step S704, the video server may receive the RTSP video stream from the image capturing device and play it on the client device by way of WebRTC playback. Meanwhile, at step S706, the RTSP video stream from the image capturing device may be received through a transcoding service of the computing server (as described above). In particular, the transcoding service may include the functions of receiving and parsing the video stream, decoding the video stream, and transmitting luminance (gray-scale) and chrominance (YUV) data. In one embodiment, when receiving and parsing the video stream, the transcoding service parses each image frame of the RTSP video stream sequentially through the Real-Time Streaming Protocol (RTSP), the Real-time Transport Protocol (RTP), and the RTP Control Protocol (RTCP), so as to classify each image frame as a key frame, a non-key frame, or a parameter frame. Meanwhile, the first timestamp corresponding to each image frame of the RTSP video stream is also obtained during this receiving and parsing operation.
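The patent does not say how key frames, non-key frames, and parameter frames are told apart. For an H.264 RTSP stream, one common approach is to inspect the NAL unit type carried in the low five bits of each NAL header byte; the sketch below assumes this approach and covers only the common cases.

```python
def classify_nal_unit(nal_header_byte: int) -> str:
    """Classify an H.264 NAL unit by the low 5 bits of its header byte.

    This mapping is an assumption for illustration; it is not stated
    in the patent and covers only the common H.264 cases.
    """
    nal_type = nal_header_byte & 0x1F
    if nal_type in (7, 8):   # SPS / PPS carry decoder parameters
        return "parameter"
    if nal_type == 5:        # IDR slice: an independently decodable key frame
        return "key"
    return "non-key"
```

For example, the header byte `0x67` (an SPS) classifies as a parameter frame, while `0x65` (an IDR slice) classifies as a key frame.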
After the video stream has been received and parsed as described above, the transcoding service module decodes the received image frames into a packed format for transmitting the luminance and chrominance data to the flow-scheduling center. In one application scenario, the flow-scheduling center may construct picture queues, with each image frame placed in its corresponding picture queue, where different queues may carry different topic classifications. Further, different services subscribe to different topics, so that image frames are consumed in a passive manner and then pushed to the algorithm enabled by the computing server. This approach avoids the unnecessary performance loss caused by actively pulling image frames, and helps control and reduce the resource overhead on the service side.
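The topic-based queueing and passive consumption described above can be sketched as a small publish/subscribe structure. All class and method names here are illustrative; the patent does not specify the flow-scheduling center's interface.

```python
from collections import defaultdict, deque

class FlowSchedulingCenter:
    """Toy topic-based picture queue; all names are illustrative."""

    def __init__(self):
        self.queues = defaultdict(deque)      # topic -> queued image frames
        self.subscribers = defaultdict(list)  # topic -> consumer callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, frame):
        # Frames are queued per topic; subscribed services consume them
        # passively (pushed to them) instead of actively pulling.
        self.queues[topic].append(frame)
        if self.subscribers[topic]:
            item = self.queues[topic].popleft()
            for cb in self.subscribers[topic]:
                cb(item)
```

A frame published to a topic with no subscribers simply remains queued, which reflects the on-demand consumption the text describes.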
Returning to the flow, after the above step S706 is performed, the flow advances to step S708. In step S708, a computational analysis operation is performed on the input image frames by the algorithm analysis to obtain identification information for identifying the objects in the image frames, and the flow then proceeds to step S710. At step S710, a camera-list query interface and an algorithm-list query interface are provided by the Web backend service, which may also provide a Websocket service for connecting client device pages, and may further forward the identification information and its corresponding first timestamp as needed. Next, in step S712, as each image frame is decoded and played by the browser via WebRTC, the current second timestamp is synchronized against the first timestamp pushed by the algorithm, and when the first timestamp is consistent with the second timestamp, the identification information is drawn onto the presentation page of the browser. In one application scenario, a user may view the real-time computation results from any network environment, so network transmission can be optimized through a long-lived Websocket connection; since Websocket supports two-way communication, the computation results of the algorithm can be pushed to the browser's display page in real time.
Further, after the Websocket connection is established, the browser can select, through the display page (as described above), from the available camera and algorithm lists provided by the background server, and push the camera and algorithm to be displayed to the Web background service module. After receiving the identification display request, the Web background service module filters the identification information of the different cameras and algorithms analyzed and transmitted by the algorithm, and pushes the matching identification information to the browser. In one application scenario, when the time taken for the real-time computation is less than the video playback delay, i.e., the browser receives the identification information of an image frame earlier than the image frame itself, the browser may store the received identification information in its own buffer unit. In another application scenario, when the algorithm identification information arrives later than the video stream, video playback is delayed by a fixed duration, which may be set to 1 second, to meet the requirement of timestamp synchronization.
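The two arrival-order cases above can be sketched as follows. The function and variable names, and the arrival-time parameters, are illustrative; only the buffering behavior and the fixed 1-second delay come from the text.

```python
FIXED_DELAY_S = 1.0  # fixed playback delay suggested in the text

identification_cache = {}  # first_timestamp -> identification info

def on_early_identification(first_ts, info):
    # Case 1: info arrives before its image frame --
    # keep it in the browser-side buffer unit until the frame arrives.
    identification_cache[first_ts] = info

def playback_delay(info_arrival, frame_arrival, fixed_delay=FIXED_DELAY_S):
    # Case 2: info arrives after the video stream --
    # delay playback by a fixed duration so timestamps can be synchronized.
    return fixed_delay if info_arrival > frame_arrival else 0.0
```

In the first case the frame eventually "catches up" with its cached annotation; in the second, delaying playback gives the algorithm results time to arrive before the frame is shown.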
Finally, in step S714, when the WebRTC video stream is played, the second timestamp of each image frame is obtained and compared with the first timestamps associated with the identification information in the cache; when a first timestamp is less than or equal to the second timestamp, the corresponding identification information is taken out of the cache and displayed on the display page of the browser. In one application scenario, the identification information includes the normalized positions, sizes, shapes, colors, label texts, and the like, of the boxes to be drawn on the interface, so that the browser page can draw the identification information onto the video through a canvas drawing tool to achieve the final display effect.
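The cache lookup and the coordinate handling in step S714 can be sketched as below. The function names are illustrative, and the box layout (x, y, width, height) is an assumption; the text only says the positions and sizes are normalized.

```python
def take_due_identifications(cache, second_ts):
    """Pop every cached identification whose first timestamp is less than
    or equal to the current frame's second timestamp."""
    due_keys = [ts for ts in sorted(cache) if ts <= second_ts]
    return [cache.pop(ts) for ts in due_keys]

def denormalize_box(box, frame_w, frame_h):
    # Identification info carries normalized [0, 1] coordinates; scale
    # them to pixels before drawing on the canvas.
    x, y, w, h = box
    return (x * frame_w, y * frame_h, w * frame_w, h * frame_h)
```

Popping entries as they are consumed keeps the cache from growing without bound; any entry whose first timestamp has already been passed is drawn with the current frame rather than discarded.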
To further aid understanding of the system for identifying and displaying image frames described above, fig. 8 is an interaction diagram illustrating a system 800 for identifying and displaying image frames according to an embodiment of the present invention.
As shown in the figure, the present system 800 may include a front-end module 801 (i.e., the client device), a back-end service module 802 (i.e., the computing server and the video server), and a camera 803 (i.e., the image capturing device). First, the server is started at step 808. Next, the transcoding proxy is started at step 809, while start information is sent to the camera. Thereafter, at step 810, the camera may convert the captured video data into a real-time RTSP video stream and transmit it. After step 810 is performed, the flow proceeds to step 811, where the RTSP video stream from the camera is received and algorithmically analyzed to obtain various identification information for identifying the detected objects in the video. Meanwhile, in the front-end module, a start-up operation is first performed at step 804. Next, at step 805, a camera to be played may be selected from the front-end camera selection list, and an algorithm satisfying the target-detection requirement may also be selected from the front-end algorithm list. Thereafter, at step 806, the front-end module and the back-end service module are connected using a WebSocket long connection. Returning to the flow of the back-end service module, after step 811 is performed, the back-end service module proceeds to step 812, where it may filter the identification information matching the camera selection request and the algorithm selection request of the front-end module, so as to obtain the identification information computed in real time by the algorithm, and may then perform the pushing operation for that identification information. Next, at step 813, the WebSocket long connection is used to push the identification information and its corresponding first timestamp to the front-end module in real time.
Thereafter, the identification information and the first timestamp may be cached in a storage unit of the front-end module to wait for the front-end module to receive an image frame corresponding to the identification information.
Further, after the front-end module has performed step 805, in response to the identification display request of the front-end module, the flow proceeds to step 807. At step 807, the back-end service module obtains the video stream from the camera by way of WebRTC video playback and passes it to the front-end module. Then, in step 814, the front-end module may receive each image frame of the video stream from the camera together with its associated second timestamp, and keep the first timestamp synchronized with the second timestamp so as to load the identification information cached at the front-end module onto the corresponding image frames of the video stream. After step 814 is performed, the flow proceeds to step 815, in which the identification display operation is performed by the front-end module. Finally, at step 816, the flow ends.
The method for displaying image frame identification of the present invention has been described in detail above with reference to the accompanying drawings. Based on the above description, those skilled in the art can understand that, by comparing and matching timestamps, the identification information can be reliably loaded onto the corresponding image frames in real time, meeting the requirement of displaying computation results in real time. Meanwhile, this method avoids having the algorithm directly output the identified image frames, which effectively reduces the consumption of network bandwidth.
It should be understood that the possible terms "first" or "second" and the like in the claims, specification and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of the present disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Although the embodiments of the present invention are described above, the descriptions are merely examples for facilitating understanding of the present invention, and are not intended to limit the scope and application of the present invention. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A system for identifying and displaying image frames, comprising:
a video server configured to generate a video stream from video data acquired by the image acquisition device;
a computing server configured to:
the video stream is obtained, analysis processing is carried out on the video stream according to an image frame identification method, so that a plurality of pairs of first time stamps and identification information are obtained, and the first time stamps and the identification information are sent to the client device in response to an identification display request from the client device for carrying out identification display on the image frames, wherein the identification information in each pair of first time stamps and the identification information is used for identifying a target area in the image frames corresponding to the first time stamps of the pair of first time stamps and the identification information;
A client device configured to:
receiving the first timestamp and identification information from the computing server;
receiving the video stream from the video server and obtaining a second timestamp for each image frame in the video stream;
and determining the identification information corresponding to each image frame according to the first timestamp and the second timestamp, identifying the target area on the corresponding image frame according to the identification information, and displaying the identified image frame.
2. The system of claim 1, wherein the client device is further configured to initiate an identification display request to a computing server for identifying the image frame;
the computing server is further configured to send the first timestamp and identification information to the client device in response to receiving the identification display request.
3. The system of claim 1 or 2, wherein the client device is further configured to generate the identification display request in response to receiving a real-time play request sent by a user, and send the identification display request to a computing server.
4. A system according to any of claims 1-3, wherein the computing server is further configured to perform image frame identification method analysis on the video stream using a plurality of image frame identification methods, respectively, to obtain a plurality of identification information.
5. The system of any of claims 1-4, wherein the client device is further configured to generate an identification information selection request in response to receiving a real-time play request sent by a user, and send the identification information selection request to the computing server as the identification display request or as a part thereof; and
The computing server is further configured to select to send the first timestamp and some or all of the plurality of identification information to the client device in accordance with the identification information selection request.
6. The system of any of claims 1-5, wherein the client device is further configured to:
selecting first identification information from part or all of the received plurality of identification information according to a preset selection logic, identifying the target area on an image frame according to the first identification information, and displaying the identified image frame; or
selecting the first identification information from part or all of the received plurality of identification information according to a confidence degree associated with the identification information, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame.
7. The system of any of claims 1-6, wherein the client device is further configured to:
and based on the comparison of the second time stamp and the first time stamp, determining that the second time stamp is matched with the first time stamp, marking the target area on the image frame corresponding to the second time stamp according to the identification information, and displaying the marked image frame.
8. A method for identifying and displaying image frames, comprising:
receiving video data transmitted from an image acquisition device by using a video server, and generating a video stream according to the video data;
using a computing server to perform: the video stream is obtained, and is analyzed and processed according to an image frame identification method to obtain a plurality of pairs of first time stamps and identification information, wherein the identification information in each pair of first time stamps and the identification information is used for identifying a target area in an image frame corresponding to the first time stamp of the pair of first time stamps and the identification information; and
using the client device to perform: the first time stamp and the identification information are received from the computing server, the video stream is received from the video server, the second time stamp of each image frame in the video stream is obtained, the identification information corresponding to each image frame is determined according to the first time stamp and the second time stamp, the target area is identified on the corresponding image frame according to the identification information, and the identified image frames are displayed.
9. The method of claim 8, wherein the performing using the computing server:
and in response to receiving the video stream sent by the image acquisition device, performing image frame identification method analysis on the video stream by utilizing a plurality of image frame identification methods respectively so as to obtain a plurality of identification information.
10. The method of claim 8 or 9, wherein using the client device performs:
receiving part or all of the plurality of identification information, selecting first identification information from the received part or all of the plurality of identification information according to a preset selection logic, identifying the target area on an image frame according to the first identification information, and displaying the identified image frame; or
selecting the first identification information from part or all of the received plurality of identification information according to a confidence degree associated with the identification information, identifying the target area on the image frame according to the first identification information, and displaying the identified image frame.
CN202111580221.7A 2021-12-22 2021-12-22 Method for displaying image frames in a marked manner and related product Pending CN116347125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111580221.7A CN116347125A (en) 2021-12-22 2021-12-22 Method for displaying image frames in a marked manner and related product

Publications (1)

Publication Number Publication Date
CN116347125A true CN116347125A (en) 2023-06-27

Family

ID=86889928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111580221.7A Pending CN116347125A (en) 2021-12-22 2021-12-22 Method for displaying image frames in a marked manner and related product

Country Status (1)

Country Link
CN (1) CN116347125A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination