CN108401190B - Method and equipment for real-time labeling of video frames

Method and equipment for real-time labeling of video frames

Info

Publication number
CN108401190B
CN108401190B (application CN201810409977.7A)
Authority
CN
China
Prior art keywords
frame
video
user equipment
video frame
information
Prior art date
Legal status
Active
Application number
CN201810409977.7A
Other languages
Chinese (zh)
Other versions
CN108401190A (en
Inventor
张晓恬
胡军
潘思霁
徐健钢
尉苗苗
Current Assignee
Liangfengtai Shanghai Information Technology Co ltd
Original Assignee
Liangfengtai Shanghai Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Liangfengtai Shanghai Information Technology Co ltd filed Critical Liangfengtai Shanghai Information Technology Co ltd
Publication of CN108401190A
Priority to PCT/CN2018/121730 (published as WO2019134499A1)
Application granted
Publication of CN108401190B
Legal status: Active

Classifications

    • H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/4788 Supplemental services communicating with other users, e.g. chatting
    • H04N 21/4316 Generation of visual interfaces for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/47214 End-user interface for content reservation or setting reminders; for requesting event notification, e.g. of sport results or stock market
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone

Abstract

The present application provides a method and a device for real-time annotation of video frames. The method specifically includes: sending a video stream to a second user equipment; receiving second frame related information of a second video frame captured by the second user equipment from the video stream; determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame; receiving annotation operation information of the second user equipment on the second video frame; and presenting the corresponding annotation operation on the first video frame in real time according to the annotation operation information. With this scheme, the annotation information is superimposed directly on a video frame image at the video sender that has never been encoded or decoded, so the annotated frame undergoes no codec processing and its definition stays high. Furthermore, the scheme displays annotations in real time, is practical and highly interactive, and improves both user experience and bandwidth utilization.

Description

Method and equipment for real-time labeling of video frames
Technical Field
The present application relates to the field of computers, and more particularly, to a technique for real-time annotation of video frames.
Background
In the prevailing means of video stream transmission, the video stream sender encodes the video according to an encoding protocol and sends it over the network to the video receiver. The receiver decodes the received video, captures a decoded frame, annotates the picture, encodes the annotated image, and sends it back to the video sender over the network, or sends it to a server from which the video sender retrieves the annotated image. The decoded, annotated image the sender receives has lost some definition, and because the annotated image travels from the receiver to the sender over the network, its arrival depends on the current network transmission rate; the resulting delay hampers real-time interaction between the two parties.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for real-time labeling of video frames.
According to an aspect of the present application, there is provided a method for real-time annotation of video frames at a first user equipment, the method comprising:
sending the video stream to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
receiving annotation operation information of the second user equipment on the second video frame;
and presenting the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to another aspect of the present application, there is provided a method for real-time annotation of video frames at a second user equipment, the method comprising:
receiving a video stream sent by a first user equipment;
sending, to the first user equipment, second frame related information of a second video frame captured according to a screenshot operation performed by the user on the video stream;
acquiring annotation operation information of the user on the second video frame;
and sending the annotation operation information to the first user equipment.
According to another aspect of the present application, there is provided a method for real-time annotation of video frames at a third user equipment, the method comprising:
receiving a video stream sent by a first user equipment to a second user equipment and a third user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
receiving annotation operation information of the second user equipment on the second video frame;
and presenting the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to another aspect of the present application, there is provided a method for real-time annotation of video frames at a network device, the method comprising:
receiving a video stream sent by a first user equipment and forwarding it to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
forwarding the second frame related information to the first user equipment;
receiving annotation operation information of the second user equipment on the second video frame;
and forwarding the annotation operation information to the first user equipment.
According to an aspect of the present application, there is provided a method for real-time annotation of video frames, wherein the method comprises:
the first user equipment sends a video stream to the second user equipment;
the second user equipment receives the video stream and, according to a screenshot operation performed by the user on the video stream, sends second frame related information of the captured second video frame to the first user equipment;
the first user equipment receives the second frame related information and determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the first user equipment;
and the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to another aspect of the present application, there is provided a method for real-time annotation of video frames, wherein the method comprises:
the first user equipment sends a video stream to the network device;
the network device receives the video stream and forwards it to the second user equipment;
the second user equipment receives the video stream and, according to a screenshot operation performed by the user on the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment;
the first user equipment receives the second frame related information and determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the network device;
the network device receives the annotation operation information of the second user equipment on the second video frame and forwards it to the first user equipment;
and the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to yet another aspect of the present application, there is provided a method for real-time annotation of video frames, wherein the method comprises:
the first user equipment sends a video stream to the second user equipment and the third user equipment;
the second user equipment, according to a screenshot operation performed by the user on the video stream, sends second frame related information of the captured second video frame to the first user equipment and the third user equipment;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information, determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame, receives the annotation operation information, and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information;
and the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the annotation operation information, determines, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame, and presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to yet another aspect of the present application, there is provided a method for real-time annotation of video frames, wherein the method comprises:
the first user equipment sends a video stream to the network device;
the network device receives the video stream and forwards it to the second user equipment and the third user equipment;
the second user equipment receives the video stream and, according to a screenshot operation performed by the user on the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information and determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the network device;
the network device receives the annotation operation information of the second user equipment on the second video frame and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information;
and the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the annotation operation information, determines, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame, and presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to an aspect of the present application, there is provided a first user equipment for real-time annotation of video frames, the equipment comprising:
the video sending module is used for sending the video stream to the second user equipment;
a frame information receiving module, configured to receive second frame related information of a second video frame captured by the second user equipment from the video stream;
a video frame determining module, configured to determine, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
the annotation receiving module is used for receiving annotation operation information of the second user equipment on the second video frame;
and the annotation presenting module is used for presenting the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to another aspect of the present application, there is provided a second user equipment for real-time annotation of video frames, the second user equipment comprising:
the video receiving module is used for receiving a video stream sent by first user equipment;
a frame information determining module, configured to send, to the first user equipment, second frame related information of a second video frame captured according to a screenshot operation performed by the user on the video stream;
the annotation acquisition module is used for acquiring annotation operation information of the user on the second video frame;
and the annotation sending module is used for sending the annotation operation information to the first user equipment.
According to yet another aspect of the present application, there is provided a third user equipment for real-time annotation of video frames, the equipment comprising:
the third video receiving module is used for receiving video streams sent by the first user equipment to the second user equipment and the third user equipment;
a third frame information receiving module, configured to receive second frame related information of a second video frame captured by the second user equipment from the video stream;
a third video frame determining module, configured to determine, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
a third annotation receiving module, configured to receive annotation operation information of the second user equipment on the second video frame;
and the third presentation module is used for presenting the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to yet another aspect of the present application, there is provided a network device for real-time annotation of video frames, the device comprising:
the video forwarding module is used for receiving and forwarding a video stream sent by the first user equipment to the second user equipment;
a frame information receiving module, configured to receive second frame related information of a second video frame captured by the second user equipment from the video stream;
a frame information forwarding module, configured to forward the second frame related information to the first user equipment;
the annotation receiving module is used for receiving annotation operation information of the second user equipment on the second video frame;
and the annotation forwarding module is used for forwarding the annotation operation information to the first user equipment.
According to an aspect of the present application, there is provided a system for real-time annotation of video frames, the system comprising a first user equipment as described above and a second user equipment as described above.
According to another aspect of the present application, there is also provided a system for real-time annotation of video frames, comprising a first user equipment as described above, a second user equipment as described above, and a network device as described above.
According to an aspect of the present application, there is provided a system for real-time annotation of video frames, the system comprising a first user equipment as described above, a second user equipment as described above and a third user equipment as described above.
According to an aspect of the present application, there is provided a system for real-time annotation of video frames, the system comprising a first user equipment as described above, a second user equipment as described above, a third user equipment as described above, and a network device as described above.
According to an aspect of the application, there is provided a computer-readable medium comprising instructions that, when executed, cause a system to:
sending the video stream to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
receiving annotation operation information of the second user equipment on the second video frame;
and presenting the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to another aspect of the application, there is also provided a computer-readable medium comprising instructions that, when executed, cause a system to:
receiving a video stream sent by a first user equipment;
sending, to the first user equipment, second frame related information of a second video frame captured according to a screenshot operation performed by the user on the video stream;
acquiring annotation operation information of the user on the second video frame;
and sending the annotation operation information to the first user equipment.
According to an aspect of the application, there is provided a computer-readable medium comprising instructions that, when executed, cause a system to:
receiving a video stream sent by a first user equipment to a second user equipment and a third user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
receiving annotation operation information of the second user equipment on the second video frame;
and presenting the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to yet another aspect of the application, there is also provided a computer-readable medium comprising instructions that, when executed, cause a system to:
receiving a video stream sent by a first user equipment and forwarding it to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
forwarding the second frame related information to the first user equipment;
receiving annotation operation information of the second user equipment on the second video frame;
and forwarding the annotation operation information to the first user equipment.
Compared with the prior art, this application caches a window of video frames at the video sender, uses the screenshot taken at the video receiver together with the related information of the corresponding video frame to locate the sender-side video frame image that has never been encoded or decoded, and transmits the receiver's annotation information on that screenshot to the video sender in real time. The annotation is displayed in real time on the corresponding video frame image at the sender, so the sender can watch the receiver's annotating process as it happens; and because the annotated video frame undergoes no encoding, decoding, or similar operations, its definition stays high. Furthermore, the scheme displays annotations in real time, is practical and highly interactive, and improves user experience and bandwidth utilization. Moreover, after determining the video frame that has not been encoded or decoded, the video sender can send that frame to the video receiver, so the receiver can also annotate a high-quality frame, which greatly improves the user experience.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a system topology for real-time annotation of video frames according to one embodiment of the present application;
FIG. 2 illustrates a flow chart of a method at a first user equipment for real-time annotation of video frames according to an aspect of the present application;
FIG. 3 illustrates a flow chart of a method at a second user equipment for real-time annotation of video frames according to another aspect of the present application;
FIG. 4 illustrates a flow chart of a method for real-time annotation of video frames at a third user equipment, in accordance with yet another aspect of the subject application;
FIG. 5 illustrates a flow diagram of a method at a network device for real-time annotation of video frames according to yet another aspect of the subject application;
FIG. 6 illustrates a system methodology for real-time annotation of video frames in accordance with an aspect of the subject application;
FIG. 7 illustrates a system methodology diagram for real-time annotation of video frames in accordance with another aspect of the subject application;
FIG. 8 illustrates a first user equipment diagram for real-time annotation of video frames, in accordance with an aspect of the subject application;
FIG. 9 illustrates a second user equipment schematic diagram for real-time annotation of video frames in accordance with another aspect of the subject application;
FIG. 10 illustrates a third user equipment schematic diagram for real-time annotation of video frames in accordance with another aspect of the subject application;
FIG. 11 is a schematic diagram of a network device for real-time annotation of video frames in accordance with yet another aspect of the subject application;
FIG. 12 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The device referred to in this application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as Android or iOS. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of network servers, or a cloud of servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, a kind of distributed computing in which one virtual supercomputer is formed from a collection of loosely coupled computers. The network includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, the touch terminal, or the network device and the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Fig. 1 shows an exemplary scenario of the present application, in which a first user equipment, while in video communication with a second user equipment and a third user equipment, receives annotation information sent by the second user equipment, locally retrieves the stored video frame that was never encoded or decoded, and presents the annotation information on it in real time. The process can be completed by the first user equipment and the second user equipment interacting alone, or by the first user equipment, the second user equipment, and the network device cooperating, or by the first, second, and third user equipments cooperating, or by the first, second, and third user equipments and the network device cooperating. Here, the first, second, and third user devices are any electronic devices capable of recording and transmitting video, such as smart glasses, a mobile phone, a tablet computer, a notebook computer, a smart watch, and the like. The following embodiments take the first user device to be smart glasses and the second and third user devices to be tablet computers; those skilled in the art will understand that the embodiments apply equally to other user devices such as mobile phones, notebook computers, and smart watches.
Fig. 2 illustrates a method for real-time annotation of video frames at the first user equipment end, according to an aspect of the present application, wherein the method comprises step S11, step S12, step S13, step S14, and step S15. In step S11, the first user equipment sends a video stream to the second user equipment; in step S12, the first user equipment receives second frame related information of a second video frame captured by the second user equipment from the video stream; in step S13, the first user equipment determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame; in step S14, the first user equipment receives annotation operation information of the second user equipment on the second video frame; in step S15, the first user equipment presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
Specifically, in step S11, the first user equipment sends the video stream to the second user equipment. For example, the first user equipment establishes a communication connection with the second user equipment over a wired or wireless network, encodes the video stream during video communication, and sends the encoded video stream to the second user equipment.
In step S12, the first user equipment receives second frame related information of a second video frame captured by the second user equipment from the video stream. For example, the second user equipment determines, based on the second user's screenshot operation, the second frame related information of the video frame corresponding to the screenshot, and the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, where the second frame related information includes, but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encode/decode and transmission duration of the second video frame, and the like.
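By way of illustration only (not part of the original disclosure), the second frame related information could be carried in a structure like the following sketch; all field names are hypothetical, since the application enumerates the kinds of information but prescribes no wire format.

```python
from dataclasses import dataclass

@dataclass
class FrameRelatedInfo:
    """Hypothetical carrier for the second frame related information."""
    frame_id: int           # second video frame identification information
    encode_start_ms: int    # encoding start time of the second video frame
    decode_end_ms: int      # decoding end time of the second video frame
    total_duration_ms: int  # total encode/decode and transmission duration
```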
In step S13, the first user equipment determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame. For example, the first user equipment locally stores the un-encoded video frames it has sent within a recent period of time, or a certain number of them, and determines among these locally stored un-encoded frames the un-encoded first video frame corresponding to the screenshot, according to the second frame related information sent by the second user equipment.
In step S14, the first user equipment receives annotation operation information of the second user equipment on the second video frame. For example, the second user device generates corresponding annotation operation information based on the second user's annotation operations and sends it to the first user device in real time, and the first user equipment receives the annotation operation information.
In step S15, the first user equipment presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information. For example, based on the received annotation operation information, the first user equipment displays the first video frame in a small window of the current interface and then renders the corresponding annotation operation at the matching position of the first video frame, refreshing the presentation, for example, every 50 ms.
For example, user A holds the smart glasses and user B holds tablet computer B; the smart glasses and tablet computer B establish video communication over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to tablet computer B, and cache the video frames of a recent period of time or a certain number of recent frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screen capture based on user B's screenshot operation, and sends the second frame related information of that frame to the smart glasses, where the second frame related information includes, but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encode/decode and transmission duration of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding never-encoded first video frame among the locally stored frames. Tablet computer B generates annotation operation information from user B's annotation operations in real time and sends it to the smart glasses as it is produced; upon receiving it, the smart glasses display the corresponding never-encoded first video frame in a preset area and render the corresponding annotation operations at the matching positions of the first video frame in real time.
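For illustration, the sender-side behavior described above can be summarized in the following sketch; the helper objects (camera, encoder, net, cache, ui) are hypothetical stand-ins and not part of this application.

```python
def first_ue_loop(camera, encoder, net, cache, ui):
    """Sketch of steps S11-S15 at the first user equipment."""
    for raw in camera.frames():
        cache.add(raw.frame_id, raw)          # keep the un-encoded frame
        net.send_video(encoder.encode(raw))   # S11: stream to second UE
        for msg in net.poll():                # non-blocking message check
            if msg.kind == "frame_info":      # S12: second frame related info
                first = cache.find(msg.frame_id)   # S13: locate first frame
                ui.show_window(first)              # present in a small window
            elif msg.kind == "annotation":    # S14: annotation operation info
                ui.overlay(msg)               # S15: render in real time
```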
It should be understood by those skilled in the art that the content of the second frame related information in the above embodiments is only an example, and other content of the second frame related information in the prior art or in the future, if applicable to the present application, shall also fall within the protection scope of the present application, and is therefore included herein by reference.
In some embodiments, the method further comprises step S16 (not shown). In step S16, the first user equipment stores video frames in the video stream; wherein, in step S13, the first user equipment determines the first video frame corresponding to the second video frame from the stored video frames according to the second frame related information. For example, the first user equipment sends a video stream to the second user equipment, and locally stores a period of time or a certain number of video frames which are not coded and decoded, where the period of time or the certain number may be a preset fixed value, or may be a threshold value dynamically adjusted according to a network condition or a transmission rate; subsequently, the first user equipment determines a corresponding first video frame which is not coded and decoded in the locally stored video frames based on second frame related information of a second video frame transmitted by the second user equipment. In other embodiments, the stored video frames satisfy, but are not limited to, at least any one of: the time interval between the sending time of the stored video frame and the current time is less than or equal to the video frame storage duration threshold; the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
For example, the smart glasses send the captured pictures to tablet computer B and locally store the un-encoded video frames of a recent period of time, or a certain number of them. The duration or number may be a fixed value preset by the system or by a person, such as a duration or frame-count threshold obtained through statistical analysis of big data; the duration or number may also be a threshold dynamically adjusted according to network conditions or the transmission rate. The dynamically adjusted threshold may be determined from the total encode/decode and transmission duration of a video frame: for example, compute the total encode/decode and transmission duration of the current video frame, take that duration as a unit duration (or the number of video frames transmittable within it as a unit number), and then set the dynamic duration or count threshold relative to that unit. Here, the predetermined or dynamic video frame storage duration threshold is set to at least one unit duration, and likewise the predetermined or dynamic video frame storage count threshold is set to at least one unit number. The smart glasses then determine the corresponding never-encoded first video frame among the stored frames according to the second frame related information of the second video frame sent by tablet computer B, where the interval between a stored frame's sending time and the current time is less than or equal to the storage duration threshold, or the cumulative number of stored frames is less than or equal to the predetermined storage count threshold.
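A minimal sketch of such a bounded sender-side cache follows, assuming monotonic timestamps and illustrative parameter values; it is not the application's prescribed implementation.

```python
import time
from collections import deque

class FrameCache:
    """Cache of un-encoded frames bounded by age and count thresholds."""
    def __init__(self, max_age_s=2.0, max_count=60):
        self.max_age_s = max_age_s    # storage duration threshold (assumed)
        self.max_count = max_count    # storage count threshold (assumed)
        self._frames = deque()        # (send_time, frame_id, raw_frame)

    def add(self, frame_id, raw_frame):
        self._frames.append((time.monotonic(), frame_id, raw_frame))
        self._evict()

    def find(self, frame_id):
        """Locate the stored un-encoded frame (here simply by identifier)."""
        for _, fid, raw in self._frames:
            if fid == frame_id:
                return raw
        return None

    def _evict(self):
        now = time.monotonic()
        while self._frames and (
            len(self._frames) > self.max_count
            or now - self._frames[0][0] > self.max_age_s
        ):
            self._frames.popleft()
```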
In some embodiments, the method further comprises step S17 (not shown). In step S17, the first user equipment obtains the total encode/decode and transmission duration of the video frames in the video stream and adjusts the video frame storage duration threshold or the video frame storage count threshold accordingly. For example, the first user equipment records the encoding start time of each video frame and sends the encoded frame to the second user equipment, which receives it and records the decoding end time of each frame. The second user equipment then either sends the decoding end time of the video frame back, so that the first user equipment can compute the frame's total encode/decode and transmission duration from the encoding start time and decoding end time, or computes that total duration itself from the two times and sends it to the first user equipment. The first user equipment adjusts the storage duration threshold or the storage count threshold based on this total duration: for example, taking the duration as a unit of time, it sets some multiple of it as the video frame storage duration threshold; or, from this duration and the rate at which it sends video frames, it computes the number of frames that can be sent within the duration, takes that as a unit number, and sets some multiple of it as the video frame storage count threshold.
For example, the smart glasses record the encoding start time of the i-th video frame as Ts_i, encode the frame, and send it to tablet computer B, which receives it and records its decoding end time as Te_i. Tablet computer B then sends the decoding end time Te_i to the smart glasses, and the smart glasses compute the total encode/decode and transmission duration of the i-th video frame from the received decoding end time Te_i and the locally recorded encoding start time Ts_i as T_i = Te_i - Ts_i. Alternatively, the smart glasses may send the encoding start time Ts_i to tablet computer B together with the video frame; tablet computer B then computes the total encode/decode and transmission duration T_i = Te_i - Ts_i based on the decoding end time Te_i and returns the total duration T_i to the smart glasses.
Based on the total encode/decode and transmission duration T_i of the i-th video frame, the smart glasses may, for example, dynamically store the video frames falling within a window of 1.3 * T_i, the multiplier being determined from big-data statistics. Alternatively, the multiplier may be adjusted dynamically according to the network transmission rate, for example setting the buffer duration threshold to (1 + k) * T_i, where k is a value adjusted according to network fluctuation: k is set to 0.5 when network fluctuation is large, 0.2 when it is small, and so on. In another example, from the total encode/decode and transmission duration T_i of the i-th video frame and the frequency f at which the smart glasses currently send video frames, the number of video frames transmitted within the duration T_i is computed as N = T_i * f, rounded up to an integer, and the stored video frame count threshold is then set to 1.3 * N. Further, the smart glasses may dynamically adjust the multiplier according to the current network transmission rate, for example setting the buffer count threshold to (1 + k) * N, where k is a value adjusted according to network fluctuation: k is set to 0.5 when network fluctuation is large, 0.2 when it is small, and so on.
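The threshold arithmetic above can be illustrated with a small sketch; treating network fluctuation as a boolean and the k values of 0.5 and 0.2 follow the example in the text, while the parameter names are assumptions.

```python
import math

def dynamic_thresholds(t_i_s, send_rate_hz, network_fluctuation_large):
    """Compute duration and count thresholds from T_i and the send rate f.

    t_i_s: total encode/decode and transmission duration T_i (seconds).
    send_rate_hz: frequency f at which video frames are sent.
    """
    k = 0.5 if network_fluctuation_large else 0.2
    duration_threshold = (1 + k) * t_i_s      # cache frames at least this long
    n = math.ceil(t_i_s * send_rate_hz)       # N = frames sent within T_i
    count_threshold = math.ceil((1 + k) * n)  # cache at least this many frames
    return duration_threshold, count_threshold
```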
It should be understood by those skilled in the art that the contents of the storage duration threshold and/or the storage quantity threshold in the above embodiments are only examples, and other contents of the storage duration threshold and/or the storage quantity threshold in the prior art or in the future shall also fall within the protection scope of the present application if applicable to the present application, and are hereby incorporated by reference.
In some embodiments, in step S11, the first user equipment sends the video stream, together with the frame identification information of the sent video frames in the video stream, to the second user equipment; and in step S13, the first user equipment determines, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame, where the frame identification information of the first video frame corresponds to the second frame related information. The frame identification information of a video frame may be the encoding/decoding time corresponding to the frame, or a number assigned to the frame. In some embodiments, in step S11, the first user equipment encodes a plurality of video frames to be transmitted and sends the corresponding video stream, along with the frame identification information of the sent video frames, to the second user equipment. For example, the first user equipment encodes the plurality of video frames to be transmitted, obtains their encoding start times, and sends the frames and their encoding start times to the second user equipment. In some embodiments, the frame identification information of a sent video frame in the video stream includes the encoding start time of that frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames together with their encoding start times to tablet computer B. Here the smart glasses may send the encoding start times of sent frames to tablet computer B at certain time intervals or every certain number of frames, or may simply send each video frame and its encoding start time together. Tablet computer B determines the video frame corresponding to the screen capture according to user B's screenshot operation and sends the second frame related information of the corresponding second video frame to the smart glasses, where the second frame related information corresponds to the frame identification information and includes, but is not limited to, at least any one of: the encoding start time of the second video frame, the decoding end time of the second video frame, the total encode/decode and transmission duration of the second video frame, and the number or image corresponding to the second video frame. The smart glasses receive the second frame related information and use it to determine the corresponding stored, never-encoded first video frame: for example, they derive the encoding start time of the first video frame from the second video frame's encoding start time, or from its decoding end time together with its total encode/decode and transmission duration, and locate the frame accordingly; or they directly pick the first video frame bearing the same number as the second video frame; or they match the second video frame's image identification against the stored never-encoded frames.
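A minimal sketch of this lookup follows, reusing the hypothetical FrameRelatedInfo fields sketched earlier and assuming the sender's cache is a dict keyed by encoding start time; the text also allows matching by frame number or image identification, which is omitted here.

```python
def match_first_frame(cache, info):
    """Resolve second frame related information to the stored first frame.

    cache: dict mapping encoding start time (ms) -> raw, never-encoded frame.
    info: a FrameRelatedInfo instance (hypothetical structure).
    """
    raw = cache.get(info.encode_start_ms)  # direct hit on the start time
    if raw is not None:
        return raw
    # Otherwise derive the start time: decoding end time minus the
    # total encode/decode and transmission duration.
    return cache.get(info.decode_end_ms - info.total_duration_ms)
```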
It should be understood by those skilled in the art that the content of the frame id information in the above embodiments is only an example, and other content of the frame id information in the prior art or in the future, if applicable to the present application, shall also fall within the protection scope of the present application, and is therefore included by reference herein.
In some embodiments, the method further comprises step S18 (not shown). In step S18, the first user device presents the first video frame; and in step S15, the first user equipment superimposes the corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment determines the never-encoded first video frame and displays it at a preset position in the current interface, or in a small window; subsequently, as annotation operation information arrives in real time, the first user equipment superimposes the corresponding annotation operation at the matching position of the first video frame.
For example, the smart glasses determine the corresponding never-encoded first video frame according to the second frame related information sent by tablet computer B and display it at a preset position on the smart glasses' interface. The smart glasses then receive the real-time annotation operations sent by tablet computer B, determine each operation's position within the currently displayed first video frame, and present the current operation at that position in real time.
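As an illustrative sketch of the overlay step, under the assumption that each annotation operation carries its position within the frame; the AnnotationOp fields and the draw callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class AnnotationOp:
    x: float        # position within the second video frame
    y: float
    payload: bytes  # opaque drawing data (hypothetical)

def present_annotations(first_frame, ops: Iterable[AnnotationOp],
                        draw: Callable) -> None:
    """Superimpose each incoming annotation operation on the first frame.

    Since the first and second video frames correspond, the coordinates
    reported for the second frame carry over to the first directly."""
    for op in ops:  # operations arrive as a real-time stream
        draw(first_frame, op.x, op.y, op.payload)
```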
In some embodiments, the method further comprises step S19 (not shown). In step S19, the first user equipment sends the first video frame to the second user equipment as a preferred frame for presenting the annotation operation. For example, a first user equipment determines a first video frame which is not coded and sends the first video frame to a second user equipment, so that the second user equipment can present the first video frame with higher quality.
For example, the smart glasses determine the corresponding never-encoded first video frame according to the second frame related information sent by tablet computer B and send it to tablet computer B as a preferred frame, for example via lossless compression, or via low-loss lossy compression whose quality is guaranteed to exceed that of the video frame cached locally at tablet computer B. Tablet computer B receives the first video frame and presents it.
In some embodiments, in step S11, the first user device sends the video stream to the second user device and the third user device. For example, communication connections are established among first user equipment, second user equipment, and third user equipment, where the first user equipment is a current video frame sender, the second user equipment and the third user equipment are current video frame receivers, and the first user equipment sends video streams to the second user equipment and the third user equipment through the communication connections.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C; the smart glasses establish video communication with tablet computers B and C over a wired or wireless network, encode the currently captured pictures, send them to both tablet computers, and cache the video frames of a recent period of time or a certain number of recent frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screen capture based on user B's screenshot operation, and sends the second frame related information of that frame to the smart glasses and tablet computer C, where the second frame related information includes, but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encode/decode and transmission duration of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding never-encoded first video frame among the locally stored frames. Tablet computer B generates annotation operation information from user B's annotation operations in real time and sends it to the smart glasses and tablet computer C; upon receiving it, the smart glasses display the corresponding never-encoded first video frame in a preset area and render the corresponding annotation operations at the matching positions in real time. Similarly, tablet computer C uses the received second frame related information to find the corresponding third video frame in the decoded video frames cached locally at its end, and presents the corresponding annotation operations on that third video frame based on the annotation operation information.
In some embodiments, the method further comprises step S010 (not shown). In step S010, the first user equipment sends the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the corresponding first video frame from the locally cached video frames according to the second frame related information, and sends it to the second user equipment and/or the third user equipment. After receiving the un-encoded first video frame, the second user equipment and/or the third user equipment presents it, and the second user and/or the third user can perform annotation operations based on the first video frame.
For example, after determining the un-encoded first video frame corresponding to the second video frame, the smart glasses send it to tablet computer B and/or tablet computer C in a lossless compression or high-quality compression mode, where tablet computer B and tablet computer C automatically decide whether to acquire the first video frame according to the quality of the current communication network connection, or the sending mode of the first video frame is selected according to that quality, for example a lossless compression mode when the network quality is good, a high-quality lossy compression mode when the network quality is poor, and the like.
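The sending-mode choice described above might be sketched as follows; estimate_bandwidth_mbps and the 10 Mbps cut-off are hypothetical stand-ins for whatever link-quality probe the devices actually use.

```python
def choose_preferred_frame_encoding(estimate_bandwidth_mbps, good_link_mbps=10.0):
    """Pick lossless when the link is good, high-quality lossy otherwise."""
    if estimate_bandwidth_mbps() >= good_link_mbps:
        return {"mode": "lossless"}          # e.g. PNG-style lossless compression
    # high-quality lossy: still clearly better than the receiver's decoded cache
    return {"mode": "lossy", "quality": 95}

# usage: choose_preferred_frame_encoding(lambda: 4.2) -> {"mode": "lossy", "quality": 95}
```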
In some embodiments, in step S010, the first user equipment sends the first video frame and the second frame related information to the second user equipment and/or the third user equipment, wherein the first video frame is used as a preferred frame for presenting the annotation operation in the second user equipment or the third user equipment.
For example, after determining the un-encoded first video frame, the smart glasses send the first video frame and the second frame related information corresponding to it to tablet computer B and/or tablet computer C. In some embodiments, tablet computer B performs multiple screen capture operations in response to operations of user B, and determines which screen capture operation the first video frame corresponds to according to the second frame related information, for example according to the screen capture time of the second frame, and presents the second frame related information on the first video frame. Tablet computer C receives the second frame related information and the first video frame and, while presenting the first video frame, presents the second frame related information in the window presenting the first video frame.
Fig. 3 illustrates a method for real-time annotation of video frames at the second user equipment end according to another aspect of the present application, wherein the method comprises step S21, step S22, step S23 and step S24. In step S21, the second user equipment receives the video stream sent by the first user equipment; in step S22, the second user equipment sends, to the first user equipment, second frame related information of the intercepted second video frame according to a screenshot operation of the user in the video stream; in step S23, the second user equipment obtains the annotation operation information of the user on the second video frame; in step S24, the second user equipment sends the annotation operation information to the first user equipment. For example, the second user equipment receives and presents the video stream transmitted by the first user equipment; the second user equipment determines the second video frame corresponding to the current screen capture picture based on the screen capture operation of the second user, and sends second frame related information of the second video frame to the first user equipment. The second user equipment then generates annotation operation information based on the annotation operations of the second user and sends it to the first user equipment.
For example, user B holds tablet computer B, user A holds the smart glasses, and tablet computer B and the smart glasses are in video communication over a wired or wireless network. Tablet computer B receives and presents the video stream sent by the smart glasses, and determines the second video frame corresponding to the screen capture picture according to the screen capture operation of user B. Tablet computer B then sends the second frame related information corresponding to the second video frame to the smart glasses, which receive it and determine the corresponding first video frame based on it. Tablet computer B generates corresponding annotation operation information according to the annotation operations of user B and sends it to the smart glasses in real time. According to the first video frame and the annotation operation information, the smart glasses present the first video frame at a preset position of the interface and present the corresponding annotation operation in real time at the corresponding position in the first video frame.
In some embodiments, in step S21, the second user equipment receives the video stream sent by the first user equipment, together with frame identification information of the video frames already sent in the video stream; wherein the second frame related information comprises at least any one of: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame. For example, the first user equipment sends the video stream to the second user equipment and also sends frame identification information of the sent video frames in the video stream, and the second user equipment receives both. The second user equipment determines the second video frame corresponding to the current screen capture picture based on the screen capture operation of the second user, and sends second frame related information of the second video frame to the first user equipment, where the second frame related information includes but is not limited to: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
For example, while sending the video stream, the smart glasses send the frame identification information corresponding to the video frames already sent in the video stream to tablet computer B. Tablet computer B detects the screen capture operation of user B, determines that the screen capture picture corresponds to the second video frame based on the current screen capture picture, and sends second frame related information corresponding to the second video frame to the smart glasses, where the second frame related information includes but is not limited to: frame identification information of the second video frame, and frame related information generated based on that frame identification information; the frame identification information of the second video frame may be the encoding start time of the video frame or the number corresponding to the video frame, and the frame related information generated based on it may be the decoding end time of the video frame or the total encoding, decoding and transmission duration information, and the like.
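The second frame related information enumerated above maps naturally onto a small record; a sketch with illustrative field names follows.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SecondFrameInfo:
    # frame identification info: the encoding start time and/or the frame number
    frame_no: Optional[int] = None
    encode_start: Optional[float] = None
    # info derived from the identification info
    decode_end: Optional[float] = None       # decoding end time at the receiver
    total_duration: Optional[float] = None   # encode + transmit + decode, seconds

info = SecondFrameInfo(frame_no=42, encode_start=1.000,
                       decode_end=1.120, total_duration=0.120)
```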
It should be understood by those skilled in the art that the content of the second frame related information in the above embodiments is only an example, and the content of other second frame related information in the prior art or in the future, if applicable to the present application, shall also fall within the protection scope of the present application, and is also included herein by reference.
In some embodiments, the frame identification information comprises encoding start time information of the second video frame. For example, the first user equipment performs encoding processing on video frames and sends the corresponding video stream and the frame identification information of the sent video frames in the video stream to the second user equipment, where the frame identification information of a video frame includes its encoding start time. In some embodiments, the second frame related information comprises decoding end time information and total encoding, decoding and transmission duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end times, determines the corresponding second video frame based on the screen capture operation, and determines the corresponding total encoding, decoding and transmission duration information from the encoding start time and the decoding end time of the second video frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, transmit the video frame together with its encoding start time to tablet computer B. Tablet computer B receives and presents the video frame and records its decoding end time. Tablet computer B determines the corresponding second video frame according to the screen capture operation of user B, and determines the total encoding, decoding and transmission duration information of the second video frame from the encoding start time and the decoding end time corresponding to it. Subsequently, tablet computer B sends second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame, the total encoding, decoding and transmission duration information of the second video frame, and the like.
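A sketch of the receiver-side bookkeeping this paragraph implies: decode end times are recorded per frame and combined with the sender's encoding start times when a screenshot occurs (all names assumed).

```python
class DecodeClock:
    """Receiver-side bookkeeping: decode end times, queried on screenshot."""

    def __init__(self):
        self.encode_start = {}   # frame_no -> encode start time (from the sender)
        self.decode_end = {}     # frame_no -> decode end time (recorded locally)

    def on_frame(self, frame_no, encode_start, now):
        self.encode_start[frame_no] = encode_start
        self.decode_end[frame_no] = now

    def on_screenshot(self, frame_no):
        ts, te = self.encode_start[frame_no], self.decode_end[frame_no]
        return {"frame_no": frame_no, "encode_start": ts,
                "decode_end": te, "total_duration": te - ts}
```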
In some embodiments, in step S23, the second user equipment obtains the annotation operation information of the user on the second video frame in real time; in step S24, the second user equipment sends the annotation operation information to the first user equipment in real time. For example, the second user equipment obtains the corresponding annotation operation information in real time based on the operations of the second user, for example collecting it at fixed time intervals. The second user equipment then sends the collected annotation operation information to the first user equipment in real time.
For example, tablet computer B collects the annotation operations of user B on the screen capture picture, such as circles, arrows, characters, boxes and other marks drawn on the screen. Tablet computer B records the position and path of the annotation brush, for example obtaining the positions of the corresponding annotation points from multiple points on the screen and obtaining the annotated path by connecting those positions. Tablet computer B collects the corresponding annotation operations in real time and sends them to the smart glasses in real time, for example collecting and sending at a rate of one frame every 50 ms.
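The 50 ms sampling loop might look like the following sketch, assuming a hypothetical read_touch_points() that returns the brush points accumulated since the last poll and a send() callable toward the smart glasses.

```python
import time

def stream_annotations(read_touch_points, send, interval=0.05, frames=20):
    """Collect brush points every 50 ms and ship each batch as one annotation frame."""
    path = []                            # connected positions form the drawn path
    for _ in range(frames):
        points = read_touch_points()     # e.g. [(x, y), ...] in screen coordinates
        if points:
            path.extend(points)
            send({"points": points, "path_len": len(path)})
        time.sleep(interval)
```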
In some embodiments, the method further comprises step S25 (not shown). In step S25, the second user equipment receives the first video frame sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation, and loads and presents the first video frame in the display window of the second video frame to replace the second video frame, where the annotation operation is displayed on the first video frame. For example, the second user equipment determines the second video frame corresponding to the current screen capture picture and sends second frame related information of the second video frame to the first user equipment; the first user equipment determines the un-encoded first video frame corresponding to the second video frame based on that information and sends it to the second user equipment, which receives and presents the first video frame and obtains the annotation operation information of the second user on the first video frame.
For example, tablet computer B enters a screen capture mode based on a user operation or the like, determines the second video frame corresponding to the current picture, and sends second frame related information of the second video frame to the smart glasses terminal, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame. The smart glasses determine the corresponding un-encoded first video frame according to the second frame related information sent by tablet computer B and send it to tablet computer B, for example in a lossless compression mode, or through lossy compression with low loss, where the lossy compression process guarantees higher quality than the video frame locally cached at tablet computer B. Tablet computer B receives and presents the first video frame, for example presenting it in a small window beside the current video, or displaying the first video frame full screen and presenting the current video in a small window, and the like. Tablet computer B then obtains the annotation operation information related to the first video frame according to the operations of the second user.
In some embodiments, the method further comprises step S26 (not shown). In step S26, the second user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, determines, according to the second frame related information, that the first video frame is used to replace the second video frame, and loads and presents the first video frame in a display window of the second video frame to replace the second video frame, where the annotation operation is displayed on the first video frame.
For example, tablet computer B receives the un-encoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screen capture time of the second video frame, the video frame number of the second video frame, and the like. In some embodiments, tablet computer B performs multiple screen capture operations in response to operations of user B and determines which screen capture operation the first video frame corresponds to according to the second frame related information, for example according to the screen capture time of the second frame. Tablet computer B determines the currently corresponding screen capture operation according to the second frame related information, presents the first video frame in a small window beside the current video, or displays it full screen and presents the current video in a small window, and the like, and while presenting the first video frame presents the second frame related information on it, such as the screen capture time of the frame or the frame number of the frame within the video stream. Tablet computer B then obtains the annotation operation information related to the first video frame according to the operations of the second user.
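When several screenshots are open at once, the echoed second frame related information picks out the capture a preferred frame belongs to; a sketch under the assumption that open captures are kept in a dict keyed by frame number.

```python
def match_preferred_frame(open_captures, first_frame, info):
    """Route an arriving preferred frame to the screenshot session it replaces.

    open_captures: frame_no -> capture session dict (hypothetical structure).
    info: the second frame related information echoed back by the sender.
    """
    session = open_captures.get(info["frame_no"])
    if session is None:
        return None                      # capture already closed; drop the frame
    session["frame"] = first_frame       # replace the decoded second video frame
    session["overlay_text"] = f"frame {info['frame_no']} @ {info['capture_time']}"
    return session
```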
In some embodiments, in step S21, the second user device receives the video stream sent by the first user device to the second user device and the third user device; in step S24, the second user equipment sends the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses establish video communication with tablet computer B and tablet computer C over a wired or wireless network, encode the currently captured pictures, send the encoded pictures to tablet computer B and tablet computer C, and cache a period of time or a certain number of video frames. The tablet computer B terminal receives, decodes and presents the video stream, determines the second video frame corresponding to the screen capture picture based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses and tablet computer C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, total encoding, decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding un-encoded first video frame among the locally stored video frames. Tablet computer B generates annotation operation information in real time according to the annotation operations of user B and sends it to the smart glasses and tablet computer C in real time.
Fig. 4 illustrates a method for real-time annotation of video frames at the third user equipment end according to yet another aspect of the present application, wherein the method comprises step S31, step S32, step S33, step S34 and step S35. In step S31, the third user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S32, the third user equipment receives second frame related information of a second video frame intercepted by the second user equipment in the video stream; in step S33, the third user equipment determines a third video frame corresponding to the second video frame in the video stream according to the second frame related information; in step S34, the third user equipment receives annotation operation information of the second user equipment on the second video frame; in step S35, the third user equipment presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses establish video communication with tablet computer B and tablet computer C over a wired or wireless network, encode the currently captured pictures, send the encoded pictures to tablet computer B and tablet computer C, and cache a period of time or a certain number of video frames. The tablet computer B terminal receives, decodes and presents the video stream, determines the second video frame corresponding to the screen capture picture based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses and tablet computer C, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, total encoding, decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding un-encoded first video frame among the locally stored video frames. Tablet computer B generates annotation operation information in real time according to the annotation operations of user B and sends it to the smart glasses and tablet computer C in real time; after receiving the annotation operation information, the smart glasses display the corresponding un-encoded first video frame in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame. Similarly, tablet computer C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in the decoded video library locally cached at the tablet computer C terminal according to the second frame related information, and presents the corresponding annotation operation on the third video frame.
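The third device's side of this flow reduces to a cache lookup plus an annotation replay; a sketch with illustrative names.

```python
def present_on_third_device(decoded_cache, info, annotation_events, draw):
    """Find the third video frame in the local decoded cache and replay annotations.

    decoded_cache: frame_no -> decoded frame, kept by the third user equipment.
    draw(frame, event): hypothetical rendering callback.
    """
    third_frame = decoded_cache.get(info["frame_no"])
    if third_frame is None:
        return                           # frame already evicted from the cache
    for event in annotation_events:      # events arrive in real time, drawn in order
        draw(third_frame, event)
```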
In some embodiments, the method further comprises step S36 (not shown). In step S36, the third user equipment receives the first video frame sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation, and loads and presents the first video frame in the display window of the third video frame to replace the third video frame, where the annotation operation is displayed on the first video frame.
For example, tablet computer B enters a screen capture mode based on a user operation or the like, determines the second video frame corresponding to the current picture, and sends second frame related information of the second video frame to the smart glasses terminal, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame. The smart glasses determine the corresponding un-encoded first video frame according to the second frame related information sent by tablet computer B and send it to tablet computer B and tablet computer C, for example in a lossless compression mode, or through lossy compression with low loss, where the lossy compression process guarantees higher quality than the video frames locally cached at tablet computer B and tablet computer C. Tablet computers B and C receive and present the first video frame, for example presenting it in a small window beside the current video, or displaying the first video frame full screen and presenting the current video in a small window, and the like. Tablet computer C then receives the annotation operation information sent by tablet computer B, and the annotation operation is presented on the first video frame.
In some embodiments, the method further comprises step S37 (not shown). In step S37, the third user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, determines, according to the second frame related information, that the first video frame is used to replace the third video frame, and loads and presents the first video frame in a display window of the third video frame to replace the third video frame, where the annotation operation is displayed on the first video frame.
For example, tablet computer C receives the un-encoded first video frame and the second frame related information sent by the smart glasses, where the second frame related information includes the screen capture time of the second video frame, the video frame number of the second video frame, and the like. Tablet computer C receives and presents the first video frame, for example presenting it in a small window beside the current video, or displaying the first video frame full screen and presenting the current video in a small window, and the like; while presenting the first video frame, tablet computer C presents the second frame related information on it, for example the screen capture time of the frame or the frame number of the frame within the video stream. Tablet computer C then receives the annotation operation information sent by tablet computer B, and the annotation operation is presented on the first video frame.
Fig. 5 illustrates a method for real-time annotation of video frames at a network device end according to yet another aspect of the present application, wherein the method includes step S41, step S42, step S43, step S44, and step S45. In step S41, the network device receives and forwards a video stream sent by the first user device to the second user device; in step S42, the network device receives second frame-related information of a second video frame intercepted by the second user device in the video stream; in step S43, the network device forwards the second frame related information to the first user equipment; in step S44, the network device receives annotation operation information of the second user device on the second video frame; in step S45, the network device forwards the tagging operation information to the first user equipment.
For example, user A holds the smart glasses, user B holds tablet computer B, and the smart glasses and tablet computer B are in video communication through a cloud. The smart glasses encode the currently captured pictures and send them to the cloud, and the cloud forwards them to tablet computer B, where the smart glasses cache a period of time or a certain number of video frames while sending the video. The tablet computer B terminal receives, decodes and presents the video stream, determines the second video frame corresponding to the screen capture picture based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the cloud, which forwards it to the smart glasses, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, total encoding, decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding un-encoded first video frame among the locally stored video frames. Tablet computer B generates annotation operation information in real time according to the annotation operations of user B and sends it to the cloud, which forwards it to the smart glasses; after receiving the annotation operation information, the smart glasses display the corresponding un-encoded first video frame in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame.
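The cloud here acts as a pure relay; a minimal sketch of its forwarding loop, with message types routed to destination queues (all names illustrative).

```python
import queue

def relay(inbox, sender_q, receiver_qs):
    """Forward messages between the sender device and the receiver devices."""
    route = {
        "video_frame": receiver_qs,      # sender -> every receiver
        "frame_info": [sender_q],        # screenshot info: receiver -> sender
        "annotation": [sender_q],        # annotation events: receiver -> sender
    }
    while True:
        msg = inbox.get()
        if msg is None:                  # sentinel: shut the relay down
            break
        for q in route.get(msg["type"], []):
            q.put(msg)

# usage sketch:
inbox, glasses_q, tablet_q = queue.Queue(), queue.Queue(), queue.Queue()
inbox.put({"type": "frame_info", "frame_no": 42})
inbox.put(None)
relay(inbox, glasses_q, [tablet_q])
assert glasses_q.get()["frame_no"] == 42
```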
It should be understood by those skilled in the art that the content of the second frame related information in the above embodiments is only an example, and other content of the second frame related information in the prior art or in the future, if applicable to the present application, shall also fall within the protection scope of the present application, and is therefore included herein by reference.
In some embodiments, in step S41, the network device receives and forwards a video stream sent by the first user device to the second user device, and frame identification information of video frames sent in the video stream. For example, a first user equipment encodes a video frame, and sends frame identification information of a video frame that has been sent in a corresponding video stream and the video stream to a network device, and the network device forwards the frame identification information of the video stream and the sent video frame to a second user equipment, where the frame identification information includes an encoding start time of the video frame. In other embodiments, in step S43, the network device determines frame identification information of a video frame corresponding to the second video frame in the video stream according to the second frame-related information, and sends the frame identification information of the video frame corresponding to the second video frame to the first user device.
For example, the cloud receives the video stream sent by the smart glasses together with frame identification information of the video frames already sent in the video stream, such as the encoding start time of each video frame, and forwards the video stream and the frame identification information corresponding to the sent video frames to tablet computer B. Tablet computer B receives and presents the video frames and records their decoding end times. Tablet computer B determines the corresponding second video frame according to the screen capture operation of user B and sends second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame or the video number of the second video frame, and the like. The cloud receives the second frame related information sent by tablet computer B and determines the frame identification information of the corresponding second video frame based on it, for example determining the encoding start time or the video number of the second video frame from its decoding end time or video number.
In some embodiments, in step S41, the network device receives and forwards the video stream sent by the first user device to the second user device and the third user device; in step S43, the network device forwards the second frame related information to the first user equipment and the third user equipment; in step S45, the network device forwards the tagging operation information to the first user device and the third user device.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses, tablet computer B and tablet computer C establish video communication through the network device; the smart glasses encode the currently captured pictures and send them to the network device, caching a period of time or a certain number of video frames, and the network device sends the video stream to tablet computer B and tablet computer C.
The tablet computer B terminal receives, decodes and presents the video stream, determines the second video frame corresponding to the screen capture picture based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the network device, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, total encoding, decoding and transmission duration information of the second video frame, and the like. The network device forwards the second frame related information to the first user equipment and the third user equipment. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding un-encoded first video frame among the locally stored video frames.
Tablet computer B generates annotation operation information in real time according to the annotation operations of user B, and the annotation operation information is transmitted in real time to the smart glasses and tablet computer C through the network device; after receiving it, the smart glasses display the corresponding un-encoded first video frame in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame. Similarly, tablet computer C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in the decoded video library locally cached at the tablet computer C terminal according to the second frame related information, and presents the corresponding annotation operation on the third video frame.
FIG. 6 illustrates a method for real-time annotation of video frames, in accordance with an aspect of the subject application, wherein the method comprises the following steps (a combined sketch of the flow follows the listing):
the first user equipment sends a video stream to the second user equipment;
the second user equipment receives the video stream, and sends second frame related information of the intercepted second video frame to the first user equipment according to screenshot operation of a user in the video stream;
the first user equipment receives the second frame related information, and determines a first video frame corresponding to the second video frame in the video stream according to the second frame related information;
the second user equipment acquires the marking operation information of the user on the second video frame and sends the marking operation information to the first user equipment;
and the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
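Putting the listed steps together, a compact in-process sketch with plain function calls standing in for the network; every name is illustrative.

```python
import time

def run_session(capture, encode, decode, get_annotations, present):
    """One captured frame, one screenshot, streamed annotations."""
    cache = {}                                  # sender side: frame_no -> raw frame
    # the first user equipment sends the video stream
    raw, frame_no, ts = capture(), 1, time.time()
    cache[frame_no] = raw
    encoded = encode(raw)
    # the second user equipment receives/decodes; the user captures this frame
    decode(encoded)
    te = time.time()
    info = {"frame_no": frame_no, "encode_start": ts,
            "decode_end": te, "total_duration": te - ts}
    # the first user equipment resolves the un-encoded first video frame
    first_frame = cache[info["frame_no"]]
    # annotation operations stream back and are presented on the first video frame
    for event in get_annotations():
        present(first_frame, event)
```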
FIG. 7 illustrates a method for real-time annotation of video frames, in accordance with another aspect of the subject application, wherein the method comprises:
the first user equipment sends a video stream to the network equipment;
the network equipment receives the video stream and forwards the video stream to second user equipment;
the second user equipment receives the video stream, and sends second frame related information of the intercepted second video frame to the network equipment according to screenshot operation of a user in the video stream;
the network device receives the second frame related information and forwards the second frame related information to the first user equipment;
the first user equipment receives the second frame related information, and determines a first video frame corresponding to the second video frame in the video stream according to the second frame related information;
the second user equipment acquires the marking operation information of the user on the second video frame and sends the marking operation information to the network equipment;
the network equipment receives the marking operation information of the second user equipment on the second video frame, and forwards the marking operation information to the first user equipment;
and the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
According to yet another aspect of the present application, a method for real-time annotation of video frames, wherein the method comprises:
the first user equipment sends video streams to the second user equipment and the third user equipment;
the second user equipment sends second frame related information of the intercepted second video frame to the first user equipment and the third user equipment according to screenshot operation of a user in the video stream;
the second user equipment acquires the labeling operation information of the user on the second video frame and sends the labeling operation information to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information, determines a first video frame corresponding to the second video frame in the video stream according to the second frame related information, receives the annotation operation information, and presents corresponding annotation operation on the first video frame in real time according to the annotation operation information;
and the third user equipment receives the video stream, receives second frame related information of the second video frame, receives the annotation operation information, determines a third video frame corresponding to the second video frame in the video stream according to the second frame related information, and presents corresponding annotation operation on the third video frame in real time according to the annotation operation information.
According to one aspect of the present application, a method for real-time annotation of video frames is provided, wherein the method comprises:
the first user equipment sends a video stream to the network equipment;
the network equipment receives the video stream and forwards the video stream to second user equipment and third user equipment;
the second user equipment receives the video stream, and sends second frame related information of the intercepted second video frame to the network equipment according to screenshot operation of a user in the video stream;
the network device receives the second frame related information and forwards the second frame related information to the first user device and the third user device;
the first user equipment receives the second frame related information, and determines a first video frame corresponding to the second video frame in the video stream according to the second frame related information;
the second user equipment acquires the marking operation information of the user on the second video frame and sends the marking operation information to the network equipment;
the network equipment receives the marking operation information of the second user equipment on the second video frame, and forwards the marking operation information to the first user equipment and the third user equipment;
the first user equipment receives the annotation operation information and presents corresponding annotation operation on the first video frame in real time according to the annotation operation information;
and the third user equipment receives the video stream, receives second frame related information of the second video frame, receives the annotation operation information, determines a third video frame corresponding to the second video frame in the video stream according to the second frame related information, and presents corresponding annotation operation on the third video frame in real time according to the annotation operation information.
Fig. 8 shows a first user equipment for real-time annotation of video frames according to an aspect of the present application, wherein the first user equipment comprises a video sending module 11, a frame information receiving module 12, a video frame determining module 13, an annotation receiving module 14 and an annotation presenting module 15. The video sending module 11 is configured to send a video stream to the second user equipment; a frame information receiving module 12, configured to receive second frame related information of a second video frame intercepted by the second user equipment in the video stream; a video frame determining module 13, configured to determine, according to the second frame-related information, a first video frame corresponding to the second video frame in the video stream; the annotation receiving module 14 is configured to receive annotation operation information of the second user equipment on the second video frame; and the annotation presenting module 15 is configured to present, in real time, the corresponding annotation operation on the first video frame according to the annotation operation information.
Specifically, the video sending module 11 is configured to send a video stream to the second user equipment. For example, the first user equipment establishes communication connection with the second user equipment through a wired or wireless network, and the first user equipment encodes a video stream in a video communication mode and then sends the encoded video stream to the second user equipment.
A frame information receiving module 12, configured to receive second frame related information of a second video frame intercepted by the second user equipment in the video stream. For example, the second user equipment determines second frame related information of a video frame corresponding to the screenshot based on the screenshot operation of the second user, and then the first user equipment receives the second frame related information of the second video frame sent by the second user equipment, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding starting time, second video frame decoding ending time, second video frame encoding and decoding and total transmission duration information and the like.
A video frame determining module 13, configured to determine, according to the second frame-related information, a first video frame corresponding to the second video frame in the video stream. For example, the first user equipment locally stores a period of time or a certain number of sent uncoded video frames, and the first user equipment determines the uncoded first video frame corresponding to the screenshot in the locally stored uncoded video frames according to the second frame related information sent by the second user equipment.
And an annotation receiving module 14, configured to receive annotation operation information of the second user equipment on the second video frame. For example, the second user equipment generates corresponding annotation operation information based on the annotation operations of the second user and sends it to the first user equipment in real time, and the first user equipment receives it.
And the annotation presenting module 15 is configured to present, in real time, the corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment presents the corresponding annotation operation on the first video frame in real time based on the received annotation operation information, such as displaying the first video frame in a small window in the current interface and then presenting the corresponding annotation operation at the corresponding position of the first video frame at a rate of one frame every 50 ms.
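The five modules compose naturally on one device object; a structural sketch mirroring Fig. 8, with the transport and renderer abstractions assumed.

```python
class FirstUserEquipment:
    """Mirrors Fig. 8: modules 11-15 as methods on one device object."""

    def __init__(self, transport, renderer):
        self.transport = transport        # send abstraction (assumed)
        self.renderer = renderer          # draw abstraction (assumed)
        self.cache = {}                   # raw frames kept for later lookup
        self.first_frame = None

    def send_video(self, frame_no, raw, encoded):        # video sending module 11
        self.cache[frame_no] = raw
        self.transport.send("video", encoded)

    def on_frame_info(self, info):                       # modules 12 and 13
        self.first_frame = self.cache.get(info["frame_no"])

    def on_annotation(self, event):                      # modules 14 and 15
        if self.first_frame is not None:
            self.renderer.draw(self.first_frame, event)  # real-time overlay
```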
For example, user A holds the smart glasses and user B holds tablet computer B. The smart glasses establish video communication with tablet computer B over a wired or wireless network, encode the currently captured pictures, send the encoded pictures to tablet computer B, and cache a period of time or a certain number of video frames. The tablet computer B terminal receives, decodes and presents the video stream, determines the second video frame corresponding to the screen capture picture based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses, where the second frame related information includes but is not limited to: second video frame identification information, second video frame encoding start time, second video frame decoding end time, total encoding, decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding un-encoded first video frame among the locally stored video frames. Tablet computer B generates annotation operation information in real time according to the annotation operations of user B and sends it to the smart glasses in real time; after receiving it, the smart glasses display the corresponding un-encoded first video frame in a preset area of the smart glasses and present the corresponding annotation operation in real time at the position corresponding to the first video frame.
It should be understood by those skilled in the art that the content of the second frame related information in the above embodiments is only an example, and other content of the second frame related information in the prior art or in the future, if applicable to the present application, shall also fall within the protection scope of the present application, and is therefore included herein by reference.
In some embodiments, the apparatus further comprises a storage module 16 (not shown). A storage module 16, configured to store video frames in the video stream; wherein, the video frame determining module 13 is configured to determine, according to the second frame-related information, a first video frame corresponding to the second video frame from the stored video frames. For example, the first user equipment sends a video stream to the second user equipment, and locally stores a period of time or a certain number of video frames which are not coded and decoded, where the period of time or the certain number may be a preset fixed value, or may be a threshold value dynamically adjusted according to a network condition or a transmission rate; subsequently, the first user equipment determines a corresponding first video frame which is not coded and decoded in the locally stored video frames based on second frame related information of a second video frame transmitted by the second user equipment. In other embodiments, the stored video frames satisfy, but are not limited to, at least any one of: the time interval between the sending time of the stored video frame and the current time is less than or equal to the video frame storage duration threshold; the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
For example, the smart glasses send the captured pictures to tablet computer B and locally store a period of time or a certain number of un-encoded video frames, where the period of time or number may be a fixed value preset by the system or by a person, such as a duration or video frame count threshold obtained through big data statistical analysis; the duration or number may also be a video frame threshold dynamically adjusted according to network conditions or transmission rate. The dynamically adjusted duration or count threshold may be determined from the total encoding, decoding and transmission duration information of a video frame: for example, the total encoding, decoding and transmission duration of the current video frame is calculated, and that duration is taken as a unit duration, or the number of video frames that can be transmitted within it is taken as a unit count, and the dynamic duration or count threshold is then set with the current unit duration or unit count as a reference. Here, the predetermined or dynamic video frame storage duration threshold is set to be greater than or equal to one unit duration, and similarly the predetermined or dynamic video frame storage count threshold is set to be greater than or equal to one unit count. The smart glasses then determine the corresponding un-encoded first video frame among the stored video frames according to the second frame related information of the second video frame sent by tablet computer B, where the interval between the sending time of each stored video frame and the current time is less than or equal to the video frame storage duration threshold, or the cumulative number of stored video frames is less than or equal to the predetermined video frame storage count threshold.
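A sketch of the storage policy just described: cached frames are evicted once either the age constraint or the count constraint is violated (the threshold values here are placeholders).

```python
import time
from collections import OrderedDict

class FrameStore:
    """Un-encoded frame cache bounded by storage duration and storage count."""

    def __init__(self, max_age_s=2.0, max_count=60):
        self.max_age_s = max_age_s       # video frame storage duration threshold
        self.max_count = max_count       # video frame storage count threshold
        self.frames = OrderedDict()      # frame_no -> (sent_time, raw_frame)

    def put(self, frame_no, raw_frame):
        self.frames[frame_no] = (time.time(), raw_frame)
        self._evict()

    def get(self, frame_no):
        entry = self.frames.get(frame_no)
        return entry[1] if entry else None

    def _evict(self):
        now = time.time()
        while self.frames:
            _, (sent, _) = next(iter(self.frames.items()))   # oldest entry
            if now - sent > self.max_age_s or len(self.frames) > self.max_count:
                self.frames.popitem(last=False)
            else:
                break
```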
In some embodiments, the apparatus further comprises a threshold adjustment module 17 (not shown). The threshold adjustment module 17 is configured to acquire the total encoding, decoding and transmission duration information of video frames in the video stream and adjust the video frame storage duration threshold or the video frame storage count threshold accordingly. For example, the first user equipment records the encoding start time of each video frame and sends the encoded video frame to the second user equipment, which receives it and records its decoding end time; the second user equipment then either sends the decoding end time back to the first user equipment, which computes the total encoding, decoding and transmission duration of the current video frame from the encoding start time and the decoding end time, or computes that total duration itself from the two times and sends it to the first user equipment. The first user equipment adjusts the video frame storage duration threshold or the video frame storage count threshold based on the total duration information, for example taking the duration as a unit time reference and setting a certain multiple of it as the video frame storage duration threshold; or, from the duration information and the rate at which the first user equipment sends video frames, calculating the number of video frames that can be sent within that duration, taking that number as a unit count, and setting a certain multiple of it as the video frame storage count threshold.
For example, the smart glasses record the encoding start time of the i-th video frame as Ts_i and send the encoded video frame to tablet computer B, which receives it and records the decoding end time of the video frame as Te_i. Tablet computer B then sends the decoding end time Te_i to the smart glasses, and the smart glasses calculate the total encoding, decoding and transmission duration of the video frame as T_i = Te_i - Ts_i from the received decoding end time Te_i and the locally recorded encoding start time Ts_i. Alternatively, the smart glasses may send the encoding start time Ts_i to tablet computer B together with the video frame, and tablet computer B calculates T_i = Te_i - Ts_i from the decoding end time Te_i and returns the total duration T_i to the smart glasses.

The smart glasses then, according to the total encoding, decoding and transmission duration T_i of the i-th video frame and big data statistics, determine that they dynamically store the video frames within a window of 1.3 * T_i. Alternatively, the multiplier is dynamically adjusted according to the network transmission rate, for example setting the cache duration threshold to (1 + k) * T_i, where k is a factor adjusted according to network fluctuation: if the fluctuation is large, k is set to 0.5; if it is small, k is set to 0.2, and the like. In another example, from the total duration T_i of the i-th video frame and the frequency f at which the smart glasses currently send video frames, the number of video frames sent within the duration T_i is calculated as T_i * f, and the stored video frame count threshold is then determined as 1.3 * N, where N is the value obtained by rounding T_i * f up to the next integer. Further, the smart glasses may dynamically adjust the multiplier according to the current network transmission rate, for example setting the cache count threshold to (1 + k) * N, where k is adjusted according to network fluctuation as above.
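The arithmetic above gathered in one place: T_i drives both thresholds, with k chosen from observed network fluctuation; the 0.5/0.2 values are the examples given above, and the fluctuation probe itself is assumed.

```python
import math

def adjust_thresholds(encode_start, decode_end, send_rate_hz, network_fluctuation):
    """Derive duration/count thresholds from the i-th frame's total duration T_i."""
    t_i = decode_end - encode_start                 # T_i = Te_i - Ts_i
    k = 0.5 if network_fluctuation == "large" else 0.2
    duration_threshold = (1 + k) * t_i              # cache-duration threshold
    n = math.ceil(t_i * send_rate_hz)               # frames sent within T_i, rounded up
    count_threshold = math.ceil((1 + k) * n)        # cache-count threshold
    return duration_threshold, count_threshold

# e.g. adjust_thresholds(1.00, 1.12, 30, "small") -> (0.144, 5)
```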
It should be understood by those skilled in the art that the contents of the storage duration threshold and/or the storage quantity threshold in the above embodiments are only examples, and other contents of the storage duration threshold and/or the storage quantity threshold in the prior art or in the future shall also fall within the protection scope of the present application if applicable to the present application, and are hereby incorporated by reference.
In some embodiments, the video sending module 11 is configured to send the video stream and frame identification information of the sent video frames in the video stream to the second user equipment; the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream, where the frame identification information of the first video frame corresponds to the second frame related information. The frame identification information of a video frame may be the encoding or decoding time corresponding to the video frame, or the number corresponding to the video frame. In some embodiments, the video sending module 11 is configured to perform encoding processing on multiple video frames to be transmitted and send the corresponding video stream and the frame identification information of the sent video frames in the video stream to the second user equipment. For example, the first user equipment performs encoding processing on multiple video frames to be transmitted, acquires their encoding start times, and sends the video frames and their encoding start times to the second user equipment. In some embodiments, the frame identification information of a sent video frame in the video stream includes encoding start time information of that video frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frame and the encoding start times of the sent video frames to tablet computer B, where the sent video frames include both the video frames still to be sent and the video frames already sent after the current encoding is completed. Here, the smart glasses may transmit the encoding start times of the sent video frames to tablet computer B at certain time intervals or every certain number of video frames, or may directly transmit each video frame together with its encoding start time. Tablet computer B determines the video frame corresponding to the screen capture picture according to the screen capture operation of user B and sends second frame related information of the corresponding second video frame to the smart glasses, where the second frame related information corresponds to the frame identification information of the second video frame and includes but is not limited to at least any one of: the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding, decoding and transmission duration information of the second video frame, the number or image corresponding to the second video frame, and the like. The smart glasses receive the second frame related information and determine the corresponding stored un-encoded first video frame according to it, for example determining the encoding start time of the un-encoded first video frame corresponding to the second video frame from the encoding start time, the decoding end time, and the total encoding, decoding and transmission duration information of the second video frame, and thereby determining the corresponding first video frame; or directly determining the first video frame with the same number from the number corresponding to the second video frame; or determining the corresponding first video frame among the stored un-encoded video frames from the image identification of the second video frame.
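Matching the returned information back to a cached frame can go through any of the listed keys; a sketch that tries the frame number first and falls back to the encoding start time (the cache layout is assumed).

```python
def find_first_frame(store, info, tolerance_s=0.001):
    """Locate the un-encoded first video frame matching second frame related info.

    store: frame_no -> (encode_start, raw_frame), the sender's local cache.
    """
    if info.get("frame_no") in store:                    # match by frame number
        return store[info["frame_no"]][1]
    ts = info.get("encode_start")
    if ts is None and info.get("decode_end") is not None:
        ts = info["decode_end"] - info["total_duration"] # recover Ts from Te and T
    for encode_start, raw in store.values():             # match by start time
        if ts is not None and abs(encode_start - ts) <= tolerance_s:
            return raw
    return None
```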
It should be understood by those skilled in the art that the above contents of the frame identification information are merely examples; other contents of the frame identification information, existing or developed in the future, shall also fall within the protection scope of the present application if applicable thereto, and are incorporated herein by reference.
In some embodiments, the device further comprises a video frame presentation module 18 (not shown). A video frame presentation module 18, configured to present the first video frame; and the annotation presenting module 15 is configured to superimpose the corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment determines the unencoded first video frame and displays it at a preset position in the current interface, or in a small window; subsequently, the first user equipment superimposes, in real time, the corresponding annotation operation at the corresponding position of the first video frame according to the received annotation operation information.
For example, the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet computer B, and display it at a preset position on the interface of the smart glasses. Subsequently, the smart glasses receive the real-time annotation operation sent by tablet computer B, determine the corresponding position of the annotation operation in the currently displayed first video frame, and present the annotation operation at that position in real time.
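Purely as an illustration (coordinate conventions are assumptions, not from the patent), placing a received annotation point at the corresponding position of the locally displayed first video frame reduces to a coordinate mapping such as:

```python
def map_annotation_point(point, src_size, dst_rect):
    """Map an annotation point from the annotator's view of the frame
    into the window where the first video frame is presented locally.

    point:    (x, y) as reported by the second user equipment
    src_size: (width, height) of the frame as presented remotely
    dst_rect: (left, top, width, height) of the local preview window
    """
    sx, sy = point
    src_w, src_h = src_size
    left, top, dst_w, dst_h = dst_rect
    # Normalize against the remote frame, then scale into the local window.
    return (left + sx / src_w * dst_w, top + sy / src_h * dst_h)

# A point drawn at (400, 300) on a 1280x720 remote view lands inside a
# 320x180 preview window anchored at (20, 20).
print(map_annotation_point((400, 300), (1280, 720), (20, 20, 320, 180)))
```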
In some embodiments, the apparatus further comprises a first preferred frame module 19 (not shown). A first preferred frame module 19, configured to send the first video frame to the second user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the unencoded first video frame and sends it to the second user equipment, so that the second user equipment can present a first video frame of higher quality.
For example, the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet computer B, and send it to tablet computer B as a preferred frame, for example in a lossless compression mode, or through low-loss lossy compression whose output quality remains higher than that of the video frame locally cached at tablet computer B. Tablet computer B receives and presents the first video frame.
In some embodiments, the video sending module 11 is configured to send the video stream to the second user equipment and the third user equipment. For example, communication connections are established among the first user equipment, the second user equipment, and the third user equipment, where the first user equipment is the current video frame sender and the second and third user equipments are the current video frame receivers; the first user equipment sends the video stream to the second and third user equipments through the communication connections.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses establish video communication with tablet computer B and tablet computer C through a wired or wireless network, encode the currently acquired pictures, send the encoded pictures to tablet computer B and tablet computer C, and cache a period of time or a certain number of video frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screenshot based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses and tablet computer C, where the second frame related information includes but is not limited to: the identification information of the second video frame, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and determine, among the locally stored video frames, the corresponding unencoded first video frame based on it. Tablet computer B generates real-time annotation operation information according to the annotation operations of user B and sends it to the smart glasses and tablet computer C in real time; after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area and present the corresponding annotation operation in real time at the corresponding position of the first video frame. Similarly, tablet computer C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in the decoded video library locally cached at tablet computer C, and presents the corresponding annotation operation in the third video frame.
In some embodiments, the apparatus further comprises a second preferred frame module 010 (not shown). A second preferred frame module 010, configured to send the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the corresponding first video frame among the locally cached video frames according to the second frame related information and sends it to the second user equipment and/or the third user equipment. After receiving the unencoded first video frame, the second user equipment and/or the third user equipment presents it, and the second user and/or the third user can perform annotation operations based on it.
For example, after determining the unencoded first video frame corresponding to the second video frame, the smart glasses send it to tablet computer B and/or tablet computer C in a lossless compression or high-quality lossy compression mode, where tablet computer B and tablet computer C automatically decide whether to acquire the first video frame according to the quality of the current communication network connection, or the sending mode of the first video frame is selected according to that quality, for example lossless compression when the network quality is good and high-quality lossy compression when the network quality is poor.
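The network-quality-dependent choice described above might be sketched as follows; the thresholds and return fields are assumptions for illustration only:

```python
def choose_preferred_frame_mode(bandwidth_kbps, rtt_ms):
    """Pick how to ship the preferred (unencoded) first video frame:
    lossless when the network is good, high-quality lossy otherwise."""
    if bandwidth_kbps > 5000 and rtt_ms < 100:
        return {"codec": "png", "lossless": True}
    # Lossy, but tuned well above the quality of the streamed frames.
    return {"codec": "jpeg", "lossless": False, "quality": 95}
```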
In some embodiments, the second preferred frame module 010 is configured to send the first video frame and the second frame related information to the second user equipment and/or the third user equipment, wherein the first video frame is used as a preferred frame for presenting the annotation operation in the second user equipment or the third user equipment.
For example, after determining the unencoded first video frame, the smart glasses send the first video frame and the second frame related information corresponding to it to tablet computer B and/or tablet computer C. In some embodiments, tablet computer B performs multiple screen capture operations in response to the operations of user B, determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example according to the screenshot time it carries, and presents the second frame related information in the first video frame. Tablet computer C receives the second frame related information and the first video frame, and presents the second frame related information in the window presenting the first video frame.
Fig. 9 shows a second user equipment for real-time annotation of video frames according to another aspect of the present application, wherein the second user equipment comprises a video receiving module 21, a frame information determining module 22, an annotation obtaining module 23, and an annotation sending module 24. A video receiving module 21, configured to receive a video stream sent by a first user equipment; a frame information determining module 22, configured to send, to the first user equipment, second frame related information of an intercepted second video frame according to a screenshot operation of a user in the video stream; an annotation obtaining module 23, configured to obtain annotation operation information of the user on the second video frame; and an annotation sending module 24, configured to send the annotation operation information to the first user equipment. For example, the second user equipment receives and presents the video stream transmitted by the first user equipment; it determines the second video frame corresponding to the current screenshot based on the screen capture operation of the second user and sends second frame related information of that frame to the first user equipment; subsequently, it generates annotation operation information based on the annotation operations of the second user and sends this information to the first user equipment.
For example, user B holds tablet computer B, user A holds the smart glasses, and tablet computer B and the smart glasses are in video communication through a wired or wireless network. Tablet computer B receives and presents the video stream sent by the smart glasses and determines the second video frame corresponding to the screenshot according to the screen capture operation of user B. Tablet computer B then sends the second frame related information corresponding to the second video frame to the smart glasses, which receive it and determine the corresponding first video frame based on it. Tablet computer B generates the corresponding annotation operation information according to the annotation operations of user B and sends it to the smart glasses in real time. According to the first video frame and the annotation operation information, the smart glasses present the first video frame at a preset position of the interface and present the corresponding annotation operation in real time at the corresponding position in the first video frame.
In some embodiments, the video receiving module 21 is configured to receive the video stream sent by the first user equipment and the frame identification information of the sent video frames in the video stream, wherein the second frame related information comprises at least any one of: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame. For example, the first user equipment sends the video stream to the second user equipment together with the frame identification information of the sent video frames in the video stream, and the second user equipment receives both. The second user equipment then determines the second video frame corresponding to the current screenshot based on the screen capture operation of the second user and sends second frame related information of the second video frame to the first user equipment, where the second frame related information includes but is not limited to: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
For example, while sending the video stream, the smart glasses send tablet computer B the frame identification information corresponding to the video frames already sent in the video stream. Tablet computer B detects the screen capture operation of user B, determines the second video frame corresponding to the current screenshot, and sends the second frame related information corresponding to the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the frame identification information of the second video frame, and frame related information generated based on that frame identification information. The frame identification information of the second video frame may be the encoding start time of the video frame or the number corresponding to the video frame, and the frame related information generated based on it may be the decoding end time of the video frame or the total encoding/decoding and transmission duration information, and the like.
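For illustration only (the patent defines no wire format, and these field names are hypothetical), the second frame related information enumerated above could be carried in a structure like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SecondFrameInfo:
    """Second frame related information; at least one identifying field
    is set, and the remaining fields may stay None."""
    frame_number: Optional[int] = None       # number of the frame in the stream
    encode_start_ms: Optional[int] = None    # frame identification information
    decode_end_ms: Optional[int] = None      # recorded on the receiving side
    total_duration_ms: Optional[int] = None  # encode + transmit + decode
    screenshot_ms: Optional[int] = None      # when the screenshot was taken
```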
It should be understood by those skilled in the art that the above contents of the second frame related information are merely examples; other contents of the second frame related information, existing or developed in the future, shall also fall within the protection scope of the present application if applicable thereto, and are incorporated herein by reference.
In some embodiments, the frame identification information comprises the encoding start time information of the second video frame. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the sent video frames in the video stream to the second user equipment, where the frame identification information of a video frame includes its encoding start time. In some embodiments, the second frame related information comprises the decoding end time information and the total encoding/decoding and transmission duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end times, determines the corresponding second video frame based on the screen capture operation, and determines the corresponding total encoding/decoding and transmission duration information from the encoding start time and the decoding end time of the second video frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, transmit the video frame together with its encoding start time to tablet computer B. Tablet computer B receives and presents the video frame and records its decoding end time. Tablet computer B determines the corresponding second video frame according to the screen capture operation of user B, and determines the total encoding/decoding and transmission duration information of the second video frame from the encoding start time and the decoding end time corresponding to it. Subsequently, tablet computer B sends second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like.
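A one-line sketch of that duration computation (illustrative names; it also assumes, as a simplification, that the sender and receiver clocks are synchronized):

```python
def total_duration_ms(encode_start_ms, decode_end_ms):
    """Total encoding/decoding and transmission duration of a frame: the
    receiver subtracts the sender-reported encoding start time from its
    own decoding end time."""
    return decode_end_ms - encode_start_ms

# Encoding started at t = 1000 ms; decoding finished at t = 1180 ms.
assert total_duration_ms(1000, 1180) == 180
```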
In some embodiments, the annotation obtaining module 23 is configured to obtain, in real time, the annotation operation information of the user on the second video frame; the annotation sending module 24 is configured to send the annotation operation information to the first user equipment in real time. For example, the second user equipment obtains the corresponding annotation operation information in real time based on the operations of the second user, for example by collecting it at certain time intervals, and then sends the collected annotation operation information to the first user equipment in real time.
For example, tablet computer B collects the annotation operations of user B on the screenshot, such as circles, arrows, characters, boxes, and other marks drawn on the screen. Tablet computer B records the position and path of the annotation brush, for example obtaining the positions of the corresponding annotation points from multiple points on the screen and obtaining the annotated path by connecting those positions. Tablet computer B collects the corresponding annotation operations in real time and sends them to the smart glasses in real time, for example collecting and sending one frame every 50 ms.
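The 50 ms sampling loop described above might look like the following sketch, where touch_source and send are hypothetical stand-ins for the platform's touch API and the device's network channel:

```python
import time

def capture_and_send_annotations(touch_source, send, interval_s=0.05):
    """Sample the annotation brush at a fixed cadence (one frame every
    50 ms in the example above) and stream the accumulated path."""
    path = []
    while touch_source.is_drawing():
        point = touch_source.poll()          # (x, y) of the annotation brush
        if point is not None:
            path.append(point)
            send({"type": "stroke", "points": list(path)})
        time.sleep(interval_s)
    send({"type": "stroke_end", "points": path})
```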
In some embodiments, the apparatus further comprises a first video frame replacing module 25 (not shown). A first video frame replacing module 25, configured to receive a first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, and to load and present the first video frame in the display window of the second video frame to replace the second video frame, where the annotation operation is displayed in the first video frame. For example, the second user equipment determines the second video frame corresponding to the current screenshot and sends second frame related information of the second video frame to the first user equipment; the first user equipment determines the corresponding unencoded first video frame based on that information and sends it to the second user equipment; the second user equipment receives and presents the first video frame and obtains the annotation operation information of the second user on it.
For example, tablet computer B enters a screen capture mode based on a user operation or the like, determines the second video frame corresponding to the current picture, and sends second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame. The smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet computer B, and send it to tablet computer B, for example in a lossless compression mode, or through low-loss lossy compression whose output quality remains higher than that of the video frame locally cached at tablet computer B. Tablet computer B receives and presents the first video frame, for example in a small window beside the current video, or on the large screen with the current video presented in a small window, and the like. Subsequently, tablet computer B obtains the annotation operation information and the like related to the first video frame according to the operations of the second user.
In some embodiments, the apparatus further comprises a first video frame annotation module 26 (not shown). A first video frame annotation module 26, configured to receive a first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, to determine, according to the second frame related information, that the first video frame is used to replace the second video frame, and to load and present the first video frame in the display window of the second video frame to replace the second video frame, where the annotation operation is displayed in the first video frame.
For example, tablet computer B receives an unencoded first video frame and second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the video frame number of the second video frame, and the like. In some embodiments, tablet computer B performs multiple screen capture operations in response to the operations of user B, and determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example according to the screenshot time it carries. Tablet computer B then presents the first video frame, for example in a small window beside the current video, or on the large screen with the current video presented in a small window, and the like; while presenting the first video frame, it presents the second frame related information in it, such as the screenshot time of the frame or the frame number of the frame in the video stream. Subsequently, tablet computer B obtains the annotation operation information and the like related to the first video frame according to the operations of the second user.
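When several screen capture operations are pending, matching by the screenshot time carried in the second frame related information might look like this sketch (tolerance and field names are assumptions):

```python
def match_screenshot(pending_screenshots, second_frame_info, tolerance_ms=50):
    """Pick the pending screenshot whose capture time best matches the
    screenshot time in the second frame related information."""
    if not pending_screenshots:
        return None
    target = second_frame_info["screenshot_ms"]
    best = min(pending_screenshots,
               key=lambda shot: abs(shot["captured_ms"] - target))
    return best if abs(best["captured_ms"] - target) <= tolerance_ms else None
```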
In some embodiments, the video receiving module 21 is configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment; the annotation sending module 24 is configured to send the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses establish video communication with tablet computer B and tablet computer C through a wired or wireless network, encode the currently acquired pictures, send the encoded pictures to tablet computer B and tablet computer C, and cache a period of time or a certain number of video frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screenshot based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses and tablet computer C, where the second frame related information includes but is not limited to: the identification information of the second video frame, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and determine, among the locally stored video frames, the corresponding unencoded first video frame based on it. Tablet computer B generates real-time annotation operation information according to the annotation operations of user B and sends it to the smart glasses and tablet computer C in real time.
Fig. 10 shows an apparatus for real-time annotation of video frames at a third user equipment according to yet another aspect of the present application, wherein the apparatus comprises a third video receiving module 31, a third frame information receiving module 32, a third video frame determining module 33, a third annotation receiving module 34, and a third presenting module 35. A third video receiving module 31, configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment; a third frame information receiving module 32, configured to receive second frame related information of a second video frame intercepted by the second user equipment in the video stream; a third video frame determining module 33, configured to determine, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream; a third annotation receiving module 34, configured to receive annotation operation information of the second user equipment on the second video frame; and a third presenting module 35, configured to present, in real time, the corresponding annotation operation on the third video frame according to the annotation operation information.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses establish video communication with tablet computer B and tablet computer C through a wired or wireless network, encode the currently acquired pictures, send the encoded pictures to tablet computer B and tablet computer C, and cache a period of time or a certain number of video frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screenshot based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the smart glasses and tablet computer C, where the second frame related information includes but is not limited to: the identification information of the second video frame, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and determine, among the locally stored video frames, the corresponding unencoded first video frame based on it. Tablet computer B generates real-time annotation operation information according to the annotation operations of user B and sends it to the smart glasses and tablet computer C in real time; after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area and present the corresponding annotation operation in real time at the corresponding position of the first video frame. Similarly, tablet computer C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in the decoded video library locally cached at tablet computer C, and presents the corresponding annotation operation in the third video frame.
In some embodiments, the apparatus further comprises a preferred frame receiving and presenting module 36 (not shown). A preferred frame receiving and presenting module 36, configured to receive a first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, and to load and present the first video frame in the display window of the third video frame to replace the third video frame, where the annotation operation is displayed in the first video frame.
For example, tablet computer B enters a screen capture mode based on a user operation or the like, determines the second video frame corresponding to the current picture, and sends second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame. The smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet computer B, and send it to tablet computer B and tablet computer C, for example in a lossless compression mode, or through low-loss lossy compression whose output quality remains higher than that of the video frames locally cached at tablet computer B and tablet computer C. Tablet computer C receives and presents the first video frame, for example in a small window beside the current video, or on the large screen with the current video presented in a small window, and the like. Subsequently, tablet computer C receives the annotation operation information sent by tablet computer B, and the annotation operation is presented in the first video frame.
In some embodiments, the apparatus further comprises a preferred frame annotation presenting module 37 (not shown). A preferred frame annotation presenting module 37, configured to receive a first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the annotation operation, to determine, according to the second frame related information, that the first video frame is used to replace the third video frame, and to load and present the first video frame in the display window of the third video frame to replace the third video frame, where the annotation operation is displayed on the first video frame.
For example, tablet computer C receives an unencoded first video frame and second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the video frame number of the second video frame, and the like. Tablet computer C receives and presents the first video frame, for example in a small window beside the current video, or on the large screen with the current video presented in a small window, and the like; while presenting the first video frame, it presents the second frame related information in it, for example the screenshot time of the frame or the frame number of the frame in the video stream. Subsequently, tablet computer C receives the annotation operation information sent by tablet computer B, and the annotation operation is presented in the first video frame.
Fig. 11 illustrates a network device for real-time annotation of video frames according to yet another aspect of the present application, wherein the device includes a video forwarding module 41, a frame information receiving module 42, a frame information forwarding module 43, an annotation receiving module 44, and an annotation forwarding module 45. A video forwarding module 41, configured to receive and forward a video stream sent by a first user equipment to a second user equipment; a frame information receiving module 42, configured to receive second frame related information of a second video frame intercepted by the second user equipment in the video stream; a frame information forwarding module 43, configured to forward the second frame-related information to the first user equipment; an annotation receiving module 44, configured to receive annotation operation information of the second user equipment on the second video frame; and an annotation forwarding module 45, configured to forward the annotation operation information to the first user equipment.
For example, user A holds the smart glasses, user B holds tablet computer B, and the smart glasses and tablet computer B are in video communication through a cloud. The smart glasses encode the currently acquired pictures and send them to the cloud, which forwards them to tablet computer B; when sending the video, the smart glasses cache a period of time or a certain number of video frames. Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screenshot based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the cloud, which forwards it to the smart glasses, where the second frame related information includes but is not limited to: the identification information of the second video frame, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like. The smart glasses receive the second frame related information of the second video frame and determine, among the locally stored video frames, the corresponding unencoded first video frame based on it. Tablet computer B generates real-time annotation operation information according to the annotation operations of user B and sends it to the cloud, which forwards it to the smart glasses; after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area and present the corresponding annotation operation in real time at the corresponding position of the first video frame.
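A toy model of the cloud-side forwarding described above (class and method names are invented for illustration): every message, whether a video frame, second frame related information, or an annotation operation, is fanned out to all participants except its sender:

```python
import asyncio

class CloudRelay:
    """Cloud-side relay with one message queue per participant."""

    def __init__(self):
        self.peers = {}                      # device_id -> asyncio.Queue

    def register(self, device_id):
        queue = asyncio.Queue()
        self.peers[device_id] = queue
        return queue

    async def forward(self, sender_id, message):
        # Video frames, frame related information, and annotation
        # operations all take the same path: everyone but the sender.
        for device_id, queue in self.peers.items():
            if device_id != sender_id:
                await queue.put(message)
```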
It should be understood by those skilled in the art that the above contents of the second frame related information are merely examples; other contents of the second frame related information, existing or developed in the future, shall also fall within the protection scope of the present application if applicable thereto, and are incorporated herein by reference.
In some embodiments, the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment, together with the frame identification information of the sent video frames in the video stream. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the sent video frames in the video stream to the network device, and the network device forwards the video stream and the frame identification information of the sent video frames to the second user equipment, where the frame identification information includes the encoding start times of the video frames. In other embodiments, the frame information forwarding module 43 is configured to determine, according to the second frame related information, the frame identification information of the video frame corresponding to the second video frame in the video stream, and to send that frame identification information to the first user equipment.
For example, the cloud receives the video stream sent by the smart glasses and the frame identification information of the video frames already sent in the video stream, such as the encoding start time of each video frame, and forwards the video stream and the frame identification information corresponding to the sent video frames to tablet computer B. Tablet computer B receives and presents the video frames and records their decoding end times. Tablet computer B determines the corresponding second video frame according to the screen capture operation of user B and sends second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame or the video number of the second video frame, and the like. The cloud receives the second frame related information of the second video frame sent by tablet computer B and determines the corresponding frame identification information based on it, for example determining the encoding start time of the second video frame from its decoding end time, or its video number directly from the video number carried in the second frame related information.
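The cloud-side lookup just described might be sketched as follows (structure and names are assumptions): the cloud resolves the frame identification information either directly from a recorded mapping or by deriving the encoding start time from the decoding end time:

```python
def resolve_frame_id(second_frame_info, sent_frames):
    """Resolve the frame identification information (here, the encoding
    start time) of the second video frame. sent_frames maps each frame
    number to the encoding start time the cloud already forwarded."""
    number = second_frame_info.get("frame_number")
    if number is not None and number in sent_frames:
        return sent_frames[number]
    # Otherwise derive it from the decoding end time and total duration.
    return (second_frame_info["decode_end_ms"]
            - second_frame_info["total_duration_ms"])
```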
In some embodiments, the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment and the third user equipment; the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment and the third user equipment; and the annotation forwarding module 45 is configured to forward the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds the smart glasses, user B holds tablet computer B, and user C holds tablet computer C. The smart glasses, tablet computer B, and tablet computer C establish video communication through the network device; the smart glasses encode the currently acquired pictures, send them to the network device, and cache a period of time or a certain number of video frames, and the network device transmits the video stream to tablet computer B and tablet computer C.
Tablet computer B receives, decodes, and presents the video stream, determines the second video frame corresponding to the screenshot based on the screen capture operation of user B, and sends second frame related information corresponding to the second video frame to the network device, where the second frame related information includes but is not limited to: the identification information of the second video frame, the encoding start time of the second video frame, the decoding end time of the second video frame, the total encoding/decoding and transmission duration information of the second video frame, and the like. The network device forwards the second frame related information to the first user equipment and the third user equipment. The smart glasses receive the second frame related information of the second video frame and determine, among the locally stored video frames, the corresponding unencoded first video frame based on it.
Tablet computer B generates real-time annotation operation information according to the annotation operations of user B and transmits it to the smart glasses and tablet computer C in real time through the network device; after receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area and present the corresponding annotation operation in real time at the corresponding position of the first video frame. Similarly, tablet computer C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in the decoded video library locally cached at tablet computer C, and presents the corresponding annotation operation in the third video frame.
According to an aspect of the present application, there is provided a system for real-time annotation of video frames, wherein the system comprises a first user equipment as described in any of the above embodiments and a second user equipment as described in any of the above embodiments; in other embodiments, the system further comprises a network device as described in any of the above embodiments.
According to an aspect of the present application, there is provided a system for real-time annotation of video frames, wherein the system comprises a first user equipment as described in any of the above embodiments, a second user equipment as described in any of the above embodiments, and a third user equipment as described in any of the above embodiments; in other embodiments, the system further comprises a network device as described in any of the above embodiments.
The present application also provides a computer-readable storage medium having stored thereon computer code which, when executed, performs the method as described in any of the foregoing.
The present application also provides a computer program product which, when executed by a computer device, performs the method as described in any of the foregoing.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the foregoing.
Fig. 12 illustrates an exemplary system that can be used to implement the various embodiments described herein.
In some embodiments, as illustrated in Fig. 12, the system 300 can serve as any of the above-described devices for real-time annotation of video frames. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions, and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules that perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may include a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (37)

1. A method for real-time annotation of video frames at a first user equipment, wherein the method comprises:
sending a video stream and frame identification information of sent video frames in the video stream to a second user equipment, and storing the video frames in the video stream;
receiving second frame related information of a second video frame intercepted by the second user equipment in the video stream, wherein the second frame related information comprises frame identification information of the second video frame;
determining a first video frame corresponding to the second video frame in the video stream from the stored video frames according to the second frame related information, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment and is not encoded;
receiving annotation operation information of the second user equipment on the second video frame;
and presenting the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
2. The method of claim 1, wherein the stored video frame satisfies at least any one of:
the time interval between the sending time of the stored video frame and the current time is less than or equal to the video frame storage duration threshold;
the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
3. The method of claim 2, wherein the method further comprises:
acquiring total encoding, decoding, and transmission duration information of video frames in the video stream;
and adjusting the video frame storage duration threshold or the video frame storage quantity threshold according to the total encoding, decoding, and transmission duration information.
4. The method of claim 1, wherein the sending a video stream and frame identification information of sent video frames in the video stream to a second user equipment comprises:
and coding a plurality of video frames to be transmitted, and sending the corresponding video stream and the frame identification information of the video frames sent in the video stream to second user equipment.
5. The method of claim 4, wherein the frame identification information of the transmitted video frame in the video stream comprises encoding start time information of the transmitted video frame.
6. The method of claim 1, wherein the method further comprises:
presenting the first video frame;
wherein, the real-time presentation of the corresponding annotation operation on the first video frame according to the annotation operation information includes:
and superposing and presenting corresponding annotation operation on the first video frame according to the annotation operation information.
7. The method of any of claims 1-6, wherein the method further comprises:
and sending the first video frame as a preferred frame for presenting the annotation operation to the second user equipment.
8. The method of any of claims 1-6, wherein the transmitting a video stream to a second user equipment comprises:
and sending the video stream to the second user equipment and the third user equipment.
9. The method of claim 8, wherein the method further comprises:
and sending the first video frame as a preferred frame for presenting the annotation operation to the second user equipment and/or the third user equipment.
10. The method of claim 9, wherein the transmitting the first video frame to the second user device and/or the third user device comprises:
and sending the first video frame and the second frame related information to the second user equipment and/or the third user equipment, wherein the first video frame is used as a preferred frame for presenting the annotation operation in the second user equipment or the third user equipment.
11. A method for real-time annotation of video frames at a second user equipment, wherein the method comprises:
receiving a video stream sent by a first user equipment and frame identification information of sent video frames in the video stream, wherein the video frames of the video stream are stored at the first user equipment;
sending second frame related information of the intercepted second video frame to the first user equipment according to a screenshot operation of a user in the video stream, wherein the second frame related information comprises frame identification information of the second video frame, the second frame related information is used for determining a corresponding first video frame from the stored video frames, and the first video frame is stored locally at the first user equipment and is not encoded;
acquiring the marking operation information of the user on the second video frame;
and sending the annotation operation information to the first user equipment, wherein the annotation operation information is presented in the first video frame in real time.
12. The method of claim 11, wherein the frame identification information comprises encoding start time information of the second video frame.
13. The method of claim 12, wherein the second frame related information further comprises decoding end time information of the second video frame and total encoding and transmission duration information.
14. The method of claim 11, wherein the acquiring annotation operation information of the user on the second video frame comprises:
acquiring the annotation operation information of the user on the second video frame in real time;
wherein the sending the annotation operation information to the first user equipment comprises:
sending the annotation operation information to the first user equipment in real time.
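
[Illustrative note, not part of the claims] On the second user equipment, claims 11-14 boil down to two messages: one reporting the captured frame's identification info, and a stream of annotation increments sent as they are drawn. A sketch assuming line-delimited JSON framing and invented field names:

    import json

    def frame_info_message(meta):
        # Claims 11-13: identify the captured second video frame by its
        # encoding start time, optionally adding decode-end time and the
        # total encoding-and-transmission duration.
        return (json.dumps({"type": "frame_info",
                            "encode_start_ms": meta["encode_start_ms"],
                            "decode_end_ms": meta.get("decode_end_ms"),
                            "total_latency_ms": meta.get("total_latency_ms")})
                .encode() + b"\n")

    def annotation_message(points):
        # Claim 14: ship each stroke increment immediately so the first
        # user equipment can present it in real time.
        return json.dumps({"type": "annotate", "points": points}).encode() + b"\n"
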
15. The method of claim 11, wherein the method further comprises:
receiving a first video frame sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the annotation operation;
loading and presenting the first video frame in the display window of the second video frame to replace the second video frame, wherein the annotation operation is displayed on the first video frame.
16. The method of claim 11, wherein the method further comprises:
receiving a first video frame and the second frame related information sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the annotation operation;
determining, according to the second frame related information, that the first video frame is to replace the second video frame;
loading and presenting the first video frame in the display window of the second video frame to replace the second video frame, wherein the annotation operation is displayed on the first video frame.
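
[Illustrative note, not part of the claims] For claims 15-16, the second user equipment swaps the received preferred frame into the capture's display window, after checking (claim 16) that the attached second frame related information matches. A sketch; window.show and the field name are hypothetical:

    def maybe_replace_with_preferred(window, first_frame, captured_info, received_info):
        # Claim 16: swap in the preferred frame only when the second frame
        # related information attached to it matches the frame we captured.
        if received_info["encode_start_ms"] == captured_info["encode_start_ms"]:
            window.show(first_frame)  # replaces the second video frame in place
            return True
        return False
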
17. The method of claim 11, wherein the receiving a video stream sent by a first user equipment comprises:
receiving a video stream sent by the first user equipment to the second user equipment and a third user equipment;
wherein the sending the annotation operation information to the first user equipment comprises:
sending the annotation operation information to the first user equipment and the third user equipment.
18. A method for real-time annotation of video frames at a third user equipment, wherein the method comprises:
receiving a video stream sent by a first user equipment to the third user equipment and frame identification information of sent video frames in the video stream;
receiving second frame related information of a second video frame captured by a second user equipment in the video stream, wherein the second frame related information comprises frame identification information of the second video frame;
determining, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream, wherein the third video frame is contained in a coded and decoded video library locally cached by the third user equipment;
receiving annotation operation information of the second user equipment on the second video frame;
presenting the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
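
[Illustrative note, not part of the claims] One plausible matching rule for claim 18: the third user equipment searches its locally cached, already-decoded frames for the one whose encoding start time is closest to the captured frame's. The tolerance is an assumption:

    def match_third_frame(decoded_cache, frame_info, tolerance_ms=20):
        # decoded_cache: list of dicts with "encode_start_ms" and "frame" keys.
        target = frame_info["encode_start_ms"]
        best = min(decoded_cache,
                   key=lambda f: abs(f["encode_start_ms"] - target),
                   default=None)
        if best and abs(best["encode_start_ms"] - target) <= tolerance_ms:
            return best
        return None
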
19. The method of claim 18, wherein the method further comprises:
receiving a first video frame sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the annotation operation;
loading and presenting the first video frame in the display window of the third video frame to replace the third video frame, wherein the annotation operation is displayed on the first video frame.
20. The method of claim 18, wherein the method further comprises:
receiving a first video frame and the second frame related information sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the annotation operation;
determining, according to the second frame related information, that the first video frame is to replace the third video frame;
loading and presenting the first video frame in the display window of the third video frame to replace the third video frame, wherein the annotation operation is displayed on the first video frame.
21. A method for real-time annotation of video frames at a network device, wherein the method comprises:
receiving and forwarding a video stream sent by a first user equipment to a second user equipment and frame identification information of sent video frames in the video stream;
receiving second frame related information of a second video frame captured by the second user equipment in the video stream;
forwarding the second frame related information to the first user equipment, wherein the second frame related information comprises frame identification information of the second video frame and is used for determining a corresponding first video frame from the stored video frames, the first video frame being stored locally at the first user equipment without being encoded;
receiving annotation operation information of the second user equipment on the second video frame;
forwarding the annotation operation information to the first user equipment, wherein the annotation operation information is presented on the first video frame in real time.
22. The method of claim 21, wherein the forwarding the second frame related information to the first user equipment comprises:
determining frame identification information of a video frame corresponding to the second video frame in the video stream according to the second frame related information;
and sending the frame identification information of the video frame corresponding to the second video frame to the first user equipment.
23. The method of claim 21, wherein the receiving and forwarding a video stream sent by a first user equipment to a second user equipment comprises:
receiving and forwarding a video stream sent by the first user equipment to the second user equipment and a third user equipment;
wherein the forwarding the second frame related information to the first user equipment comprises:
forwarding the second frame related information to the first user equipment and the third user equipment;
wherein the forwarding the annotation operation information to the first user equipment comprises:
forwarding the annotation operation information to the first user equipment and the third user equipment.
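
[Illustrative note, not part of the claims] Claims 21-23 cast the network device as a pure relay: it forwards the video stream, the second frame related information, and the annotation operations without interpreting them. A minimal asyncio sketch; the line-delimited JSON framing, role names, and routing table are all invented for the example:

    import asyncio
    import json

    ROUTES = {
        # message type -> forwarding targets (claim 23's three-party case)
        "video":      ["second_ue", "third_ue"],
        "frame_info": ["first_ue", "third_ue"],
        "annotate":   ["first_ue", "third_ue"],
    }

    async def relay(reader, writers_by_role):
        # Forward every message unchanged; the relay never decodes the
        # stream or renders the annotation itself.
        while True:
            line = await reader.readline()
            if not line:
                break
            msg = json.loads(line)
            for role in ROUTES[msg["type"]]:
                writer = writers_by_role[role]
                writer.write(line)
                await writer.drain()
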
24. A method for real-time annotation of video frames, wherein the method comprises:
a first user equipment sends a video stream and frame identification information of sent video frames in the video stream to a second user equipment, and stores the video frames in the video stream;
the second user equipment receives the video stream and the frame identification information of the sent video frames in the video stream, and sends second frame related information of a captured second video frame to the first user equipment according to a screenshot operation performed by a user on the video stream, wherein the second frame related information comprises the frame identification information of the second video frame;
the first user equipment receives the second frame related information, and determines, from the stored video frames according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment without being encoded;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the first user equipment;
the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
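
[Illustrative note, not part of the claims] Claim 24, seen from the first user equipment, is a two-step handler: match the reported frame against the local store of unencoded frames, then replay annotation events onto it as they arrive. A sketch reusing the hypothetical FrameBuffer above, with an assumed render callback:

    def make_message_handler(frame_buffer, render):
        first_frame = None

        def on_message(msg):
            nonlocal first_frame
            if msg["type"] == "frame_info":
                # Locate the stored, unencoded first video frame whose
                # identification info matches the captured second frame.
                first_frame = frame_buffer.find(msg["encode_start_ms"])
            elif msg["type"] == "annotate" and first_frame is not None:
                # Present the annotation on the first video frame in real time.
                render(first_frame, msg["points"])

        return on_message
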
25. A method for real-time annotation of video frames, wherein the method comprises:
a first user equipment sends a video stream and frame identification information of sent video frames in the video stream to a network device, and stores the video frames in the video stream;
the network device receives the video stream, and forwards the video stream and the frame identification information of the sent video frames in the video stream to a second user equipment;
the second user equipment receives the video stream, and sends second frame related information of a captured second video frame to the network device according to a screenshot operation performed by a user on the video stream;
the network device receives the second frame related information and forwards it to the first user equipment, wherein the second frame related information comprises frame identification information of the second video frame;
the first user equipment receives the second frame related information, and determines, from the stored video frames according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment without being encoded;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the network device;
the network device receives the annotation operation information of the second user equipment on the second video frame, and forwards the annotation operation information to the first user equipment;
the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
26. A method for real-time annotation of video frames, wherein the method comprises:
a first user equipment sends a video stream and frame identification information of sent video frames in the video stream to a second user equipment and a third user equipment, and stores the video frames in the video stream;
the second user equipment sends second frame related information of a captured second video frame to the first user equipment and the third user equipment according to a screenshot operation performed by a user on the video stream, wherein the second frame related information comprises frame identification information of the second video frame;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information, determines, from the stored video frames according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, receives the annotation operation information, and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment without being encoded;
the third user equipment receives the video stream, the second frame related information of the second video frame, and the annotation operation information, determines a third video frame corresponding to the second video frame in the video stream according to the second frame related information, and presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information, wherein the third video frame is contained in a coded and decoded video library locally cached by the third user equipment.
27. A method for real-time annotation of video frames, wherein the method comprises:
a first user equipment sends a video stream and frame identification information of sent video frames in the video stream to a network device, and stores the video frames in the video stream;
the network device receives and forwards the video stream and the frame identification information of the sent video frames in the video stream to a second user equipment and a third user equipment;
the second user equipment receives the video stream, and sends second frame related information of a captured second video frame to the network device according to a screenshot operation performed by a user on the video stream;
the network device receives the second frame related information and forwards it to the first user equipment and the third user equipment, wherein the second frame related information comprises frame identification information of the second video frame;
the first user equipment receives the second frame related information, and determines, from the stored video frames according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment without being encoded;
the second user equipment acquires annotation operation information of the user on the second video frame and sends the annotation operation information to the network device;
the network device receives the annotation operation information of the second user equipment on the second video frame, and forwards the annotation operation information to the first user equipment and the third user equipment;
the first user equipment receives the annotation operation information and presents the corresponding annotation operation on the first video frame in real time according to the annotation operation information;
the third user equipment receives the video stream, the second frame related information of the second video frame, and the annotation operation information, determines a third video frame corresponding to the second video frame in the video stream according to the second frame related information, and presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information, wherein the third video frame is contained in a coded and decoded video library locally cached by the third user equipment.
28. A first user equipment for real-time annotation of video frames, wherein the equipment comprises:
a video sending module, configured to send a video stream and frame identification information of sent video frames in the video stream to a second user equipment, and to store the video frames in the video stream;
a frame information receiving module, configured to receive second frame related information of a second video frame captured by the second user equipment in the video stream, wherein the second frame related information comprises frame identification information of the second video frame;
a video frame determining module, configured to determine, from the stored video frames according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, wherein the frame identification information of the first video frame corresponds to the second frame related information, and the first video frame is stored locally at the first user equipment without being encoded;
an annotation receiving module, configured to receive annotation operation information of the second user equipment on the second video frame;
and an annotation presenting module, configured to present the corresponding annotation operation on the first video frame in real time according to the annotation operation information.
29. A second user equipment for real-time annotation of video frames, wherein the equipment comprises:
a video receiving module, configured to receive a video stream sent by a first user equipment and frame identification information of sent video frames in the video stream, wherein the video frames of the video stream are stored in the first user equipment;
a frame information determining module, configured to send, to the first user equipment, second frame related information of a captured second video frame according to a screenshot operation performed by a user on the video stream, wherein the second frame related information comprises frame identification information of the second video frame and is used for determining a corresponding first video frame from the stored video frames, the first video frame being stored locally at the first user equipment without being encoded;
an annotation acquiring module, configured to acquire annotation operation information of the user on the second video frame;
and an annotation sending module, configured to send the annotation operation information to the first user equipment, wherein the annotation operation information is presented on the first video frame in real time.
30. A third user equipment for real-time annotation of video frames, wherein the equipment comprises:
a third video receiving module, configured to receive a video stream sent by a first user equipment to the third user equipment and frame identification information of sent video frames in the video stream;
a third frame information receiving module, configured to receive second frame related information of a second video frame captured by a second user equipment in the video stream, wherein the second frame related information comprises frame identification information of the second video frame;
a third video frame determining module, configured to determine, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream, wherein the third video frame is contained in a coded and decoded video library locally cached by the third user equipment;
a third annotation receiving module, configured to receive annotation operation information of the second user equipment on the second video frame;
and a third presenting module, configured to present the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
31. A network device for real-time annotation of video frames, wherein the device comprises:
a video forwarding module, configured to receive and forward a video stream sent by a first user equipment to a second user equipment and frame identification information of sent video frames in the video stream;
a frame information receiving module, configured to receive second frame related information of a second video frame captured by the second user equipment in the video stream;
a frame information forwarding module, configured to forward the second frame related information to the first user equipment, wherein the second frame related information comprises frame identification information of the second video frame and is used for determining a corresponding first video frame from the stored video frames, the first video frame being stored locally at the first user equipment without being encoded;
an annotation receiving module, configured to receive annotation operation information of the second user equipment on the second video frame;
and an annotation forwarding module, configured to forward the annotation operation information to the first user equipment, wherein the annotation operation information is presented on the first video frame in real time.
32. A system for real-time annotation of video frames, wherein the system comprises the first user equipment of claim 28 and the second user equipment of claim 29.
33. The system of claim 32, wherein the system further comprises the network device of claim 31.
34. A system for real-time annotation of video frames, wherein the system comprises the first user equipment of claim 28, the second user equipment of claim 29, and the third user equipment of claim 30.
35. The system of claim 34, wherein the system further comprises the network device of claim 31.
36. An apparatus for real-time annotation of video frames, wherein the apparatus comprises:
a processor; and
a memory configured to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1 to 23.
37. A computer-readable medium comprising instructions that, when executed, cause a system to perform the method of any one of claims 1 to 23.
CN201810409977.7A 2018-01-05 2018-05-02 Method and equipment for real-time labeling of video frames Active CN108401190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/121730 WO2019134499A1 (en) 2018-01-05 2018-12-18 Method and device for labeling video frames in real time

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810011908 2018-01-05
CN2018100119080 2018-01-05

Publications (2)

Publication Number Publication Date
CN108401190A (en) 2018-08-14
CN108401190B (en) 2020-09-04

Family

ID=63101425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810409977.7A Active CN108401190B (en) 2018-01-05 2018-05-02 Method and equipment for real-time labeling of video frames

Country Status (2)

Country Link
CN (1) CN108401190B (en)
WO (1) WO2019134499A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108401190B (en) * 2018-01-05 2020-09-04 亮风台(上海)信息科技有限公司 Method and equipment for real-time labeling of video frames
CN112950951B (en) * 2021-01-29 2023-05-02 浙江大华技术股份有限公司 Intelligent information display method, electronic device and storage medium
CN113596517B (en) * 2021-07-13 2022-08-09 北京远舢智能科技有限公司 Image freezing and labeling method and system based on mixed reality
CN114201645A (en) * 2021-12-01 2022-03-18 北京百度网讯科技有限公司 Object labeling method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716586A (en) * 2013-12-12 2014-04-09 中国科学院深圳先进技术研究院 Monitoring video fusion system and monitoring video fusion method based on three-dimension space scene
CN104536661A (en) * 2014-12-17 2015-04-22 深圳市金立通信设备有限公司 Terminal screen shot method
CN104935861A (en) * 2014-03-19 2015-09-23 成都鼎桥通信技术有限公司 Multi-party multimedia communication method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7810121B2 (en) * 2002-05-03 2010-10-05 Time Warner Interactive Video Group, Inc. Technique for delivering network personal video recorder service and broadcast programming service over a communications network
US20060104601A1 (en) * 2004-11-15 2006-05-18 Ati Technologies, Inc. Method and apparatus for programming the storage of video information
CN104954812A (en) * 2014-03-27 2015-09-30 腾讯科技(深圳)有限公司 Video synchronized playing method, device and system
US9516255B2 (en) * 2015-01-21 2016-12-06 Microsoft Technology Licensing, Llc Communication system
CN104883515B (en) * 2015-05-22 2018-11-02 广东威创视讯科技股份有限公司 A kind of video labeling processing method and video labeling processing server
CN106412622A (en) * 2016-11-14 2017-02-15 百度在线网络技术(北京)有限公司 Method and apparatus for displaying barrage information during video content playing process
CN106603537A (en) * 2016-12-19 2017-04-26 广东威创视讯科技股份有限公司 System and method for marking video signal source of mobile intelligent terminal
CN107333087B (en) * 2017-06-27 2020-05-08 京东方科技集团股份有限公司 Information sharing method and device based on video session
CN107277641A (en) * 2017-07-04 2017-10-20 上海全土豆文化传播有限公司 A kind of processing method and client of barrage information
CN108401190B (en) * 2018-01-05 2020-09-04 亮风台(上海)信息科技有限公司 Method and equipment for real-time labeling of video frames

Also Published As

Publication number Publication date
WO2019134499A1 (en) 2019-07-11
CN108401190A (en) 2018-08-14

Similar Documents

Publication Publication Date Title
CN108401190B (en) Method and equipment for real-time labeling of video frames
US9615112B2 (en) Method, system, player and mobile terminal for online video playback
KR102154800B1 (en) Data streaming method of electronic apparatus and electronic apparatus thereof
US9699099B2 (en) Method of transmitting data in a communication system
KR20200019201A (en) Chroma prediction method and device
US9807140B2 (en) Method, terminal, and system for reproducing content
US20150156557A1 (en) Display apparatus, method of displaying image thereof, and computer-readable recording medium
CN108235107A (en) Video recording method, device and electric terminal
KR20140006102A (en) Method for dynamically adapting video image parameters for facilitating subsequent applications
US20210281911A1 (en) Video enhancement control method, device, electronic device, and storage medium
WO2021057697A1 (en) Video encoding and decoding methods and apparatuses, storage medium, and electronic device
WO2009112547A1 (en) Method of transmitting data in a communication system
US20140308017A1 (en) Imaging device, video recording device, video display device, video monitoring device, video monitoring system, and video monitoring method
CN113301355A (en) Video transmission, live broadcast and play method, equipment and storage medium
CN111385576B (en) Video coding method and device, mobile terminal and storage medium
CN104717555A (en) Video stream acquiring method and device
WO2021057686A1 (en) Video decoding method and apparatus, video encoding method and apparatus, storage medium and electronic device
US20200186580A1 (en) Dynamic rotation of streaming protocols
CN107734278B (en) Video playback method and related device
CN110798700B (en) Video processing method, video processing device, storage medium and electronic equipment
WO2016032383A1 (en) Sharing of multimedia content
JP6483850B2 (en) Data processing method and apparatus
CN108377400A (en) A kind of image transmitting optimization method, system and its apparatus
US11336902B1 (en) Systems and methods for optimizing video encoding
US10764578B2 (en) Bit rate optimization system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 201210 7th Floor, No. 1, Lane 5005, Shenjiang Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.

Address before: Room 501 / 503-505, 570 shengxia Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Patentee before: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.
