WO2019134499A1 - Method and device for labeling video frames in real time - Google Patents

Method and device for labeling video frames in real time

Info

Publication number
WO2019134499A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
user equipment
video frame
video
information
Application number
PCT/CN2018/121730
Other languages
French (fr)
Chinese (zh)
Inventor
张晓恬
胡军
潘思霁
徐健钢
尉苗苗
Original Assignee
亮风台(上海)信息科技有限公司
Application filed by 亮风台(上海)信息科技有限公司
Publication of WO2019134499A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47214 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for content reservation or setting reminders; for requesting event notification, e.g. of sport results or stock market
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone

Definitions

  • the present application relates to the field of computers, and in particular, to a technique for real-time annotation of video frames.
  • in conventional video streaming, the sender of the video stream encodes the video according to an encoding protocol and sends it to the video receiver through the network; the receiver receives and decodes the video, takes a screenshot of the decoded video, and marks the image on the screen; the labeled image is then encoded and sent to the video sender through the network, or the labeled image is sent to a server, from which the video sender receives the annotated image.
  • in this approach, the decoded image with the annotation information received by the sender suffers a certain loss of clarity, and because the annotated image is transmitted from the receiver to the video sender through the network, the speed of the process depends on the current network transmission rate; the transmission is therefore subject to delay, which is not conducive to real-time interaction between the two parties.
  • a method for real-time annotation of a video frame on a first user equipment side comprising:
  • a method for real-time annotation of a video frame on a second user equipment side comprising:
  • a method for real-time annotation of a video frame on a third user equipment side comprising:
  • a method for real-time annotation of a video frame on a network device side comprising:
  • a method for real-time annotation of a video frame includes:
  • the first user equipment sends a video stream to the second user equipment
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • a method for real-time annotation of a video frame includes:
  • the first user equipment sends a video stream to the network device
  • the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment;
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • a method for real-time annotation of a video frame includes:
  • the first user equipment sends a video stream to the second user equipment and the third user equipment;
  • a method for real-time annotation of a video frame includes:
  • the first user equipment sends a video stream to the network device
  • the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment and the third user equipment;
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
  • a first user equipment for real-time annotation of a video frame comprising:
  • a video sending module configured to send a video stream to the second user equipment
  • a frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
  • a video frame determining module configured to determine, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
  • An annotation receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
  • an annotation presentation module configured to present a corresponding annotation operation on the first video frame in real time according to the annotation operation information.
  • a second user equipment for real-time annotation of a video frame comprising:
  • a video receiving module configured to receive a video stream sent by the first user equipment
  • a frame information determining module configured to send second frame related information of the intercepted second video frame to the first user equipment according to a screenshot operation of the user in the video stream;
  • An annotation obtaining module configured to acquire the labeling operation information of the second video frame by the user
  • an annotation sending module configured to send the labeling operation information to the first user equipment.
  • a third user equipment for real-time annotation of a video frame comprising:
  • a third video receiving module configured to receive a video stream that is sent by the first user equipment to the second user equipment and the third user equipment
  • a third frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
  • a third video frame determining module configured to determine, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream;
  • a third label receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
  • the third rendering module is configured to present a corresponding labeling operation on the third video frame in real time according to the labeling operation information.
  • a network device for real-time annotation of a video frame comprising:
  • a video forwarding module configured to receive and forward a video stream sent by the first user equipment to the second user equipment
  • a frame information receiving module configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
  • a frame information forwarding module configured to forward the second frame related information to the first user equipment
  • An annotation receiving module configured to receive the labeling operation information of the second video frame by the second user equipment
  • an annotation forwarding module configured to forward the labeling operation information to the first user equipment.
  • a system for real-time annotation of video frames comprising a first user device as described above and a second user device as described above.
  • a system for real-time annotation of video frames including a first user device as described above, a second user device as described above, and a network device as described above.
  • a system for real-time annotation of a video frame comprising a first user device as described above, a second user device as described above, and a third user device as described above.
  • a system for real-time annotation of a video frame comprising a first user device as described above, a second user device as described above, a third user device as described above, and a network device as described above.
  • a computer readable medium comprising instructions that, when executed, cause a system to:
  • a computer readable medium comprising instructions that, when executed, cause a system to:
  • a computer readable medium comprising instructions that, when executed, cause a system to:
  • a computer readable medium comprising instructions that, when executed, cause a system to:
  • compared with the prior art, the present application buffers a number of video frames on the video sender side, so that the sender can determine the corresponding unencoded video frame image from the video receiver's screenshot and its frame related information, while the receiver's annotation information on the screenshot is transmitted to the video sender in real time.
  • the annotation is displayed in real time on the corresponding video frame image at the video sender, so the sender can observe the labeling process of the video receiver as it happens, and because the labeled video frame has not been encoded, its resolution remains high; further, the scheme enables real-time display of annotations, with good practicability, strong interactivity, and improved user experience and bandwidth utilization.
  • in addition, after determining the unencoded video frame, the video sender can send it to the video receiver, so the video receiver can also annotate a high-quality video frame, thereby greatly improving the user experience.
  • FIG. 1 shows a system topology diagram for real-time annotation of video frames in accordance with an embodiment of the present application
  • FIG. 2 shows a flow chart of a method for real-time annotation of video frames at a first user equipment end according to an aspect of the present application
  • FIG. 3 is a flow chart showing a method for real-time annotation of video frames on a second user equipment side according to another aspect of the present application
  • FIG. 4 is a flowchart of a method for real-time annotation of a video frame at a third user equipment end according to still another aspect of the present application;
  • FIG. 5 is a flowchart of a method for real-time annotation of video frames on a network device side according to still another aspect of the present application.
  • FIG. 6 shows a system method diagram for real-time annotation of video frames in accordance with an aspect of the present application
  • FIG. 7 shows a system method diagram for real-time annotation of video frames in accordance with another aspect of the present application.
  • FIG. 8 shows a schematic diagram of a first user equipment for real-time annotation of video frames in accordance with an aspect of the present application
  • FIG. 9 shows a schematic diagram of a second user equipment for real-time annotation of video frames in accordance with another aspect of the present application.
  • FIG. 10 shows a schematic diagram of a third user equipment for real-time annotation of video frames in accordance with another aspect of the present application
  • FIG. 11 shows a schematic diagram of a network device for real-time annotation of video frames in accordance with still another aspect of the present application
  • FIG. 12 illustrates an exemplary system that can be used to implement various embodiments described in this application.
  • the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage,
  • the device referred to in the present application includes but is not limited to a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network.
  • the user equipment includes, but is not limited to, any mobile electronic product that can perform human-computer interaction with the user (for example, through a touchpad), such as a smart phone or a tablet computer, and the mobile electronic product can run any operating system, such as the Android operating system or the iOS operating system.
  • the network device includes an electronic device capable of automatically performing numerical calculation and information processing according to instructions set or stored in advance, whose hardware includes but is not limited to a microprocessor, an application specific integrated circuit (ASIC), a programmable logic device, and the like.
  • the network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing.
  • cloud computing is a kind of distributed computing, a virtual supercomputer composed of a group of loosely coupled computers.
  • the network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network (Ad Hoc network), and the like.
  • the device may also be a program running on the user equipment, on the network device, or on a device formed by integrating the user equipment with the network device, or the network device with a touch terminal, through a network.
  • FIG. 1 shows a typical scenario of the present application.
  • while in video communication with the second user equipment and the third user equipment, the first user equipment receives the annotation information sent by the second user equipment and, on the locally stored unencoded video frame, presents the annotation information in real time.
  • the process may be performed by the first user equipment and the second user equipment; it may also be completed by the first user equipment, the second user equipment, and the network device; by the first user equipment, the second user equipment, and the third user equipment; or by the first user equipment, the second user equipment, the third user equipment, and the network device.
  • the first user equipment, the second user equipment, and the third user equipment may be any electronic device capable of recording and sending video, such as smart glasses, a mobile phone, a tablet, a notebook, or a smart watch. In the following embodiments, the first user equipment is described as smart glasses, and the second user equipment and the third user equipment as tablets; those skilled in the art should understand that the embodiments are likewise applicable to other user equipment such as mobile phones, notebooks, and smart watches.
  • in step S11, the first user equipment sends a video stream to the second user equipment; in step S12, the first user equipment receives second frame related information of the second video frame intercepted by the second user equipment in the video stream;
  • in step S13, the first user equipment determines, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream; in step S14, the first user equipment receives the labeling operation information of the second video frame by the second user equipment; in step S15, the first user equipment presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • in step S11, the first user equipment sends a video stream to the second user equipment.
  • for example, the first user equipment establishes a communication connection with the second user equipment through a wired or wireless network, and sends the encoded video stream to the second user equipment by way of video communication.
  • in step S12, the first user equipment receives second frame related information of the second video frame intercepted by the second user equipment in the video stream. For example, the second user equipment determines, based on the second user's screen capture operation, the second frame related information of the video frame corresponding to the screenshot, and the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, wherein the second frame related information includes, but is not limited to, second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and second video frame codec-and-transmission total duration information.
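As an illustration only, the fields of the second frame related information enumerated above could be grouped into a small record; the names below are hypothetical, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class SecondFrameInfo:
    """Sketch of the 'second frame related information'; names illustrative."""
    frame_id: str                 # second video frame identification information
    encode_start_time: float      # second video frame encoding start time
    decode_end_time: float        # second video frame decoding end time
    codec_transmit_total: float   # codec-and-transmission total duration

    def derived_total(self) -> float:
        # The total duration can also be derived from the two timestamps.
        return self.decode_end_time - self.encode_start_time
```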
  • in step S13, the first user equipment determines, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream. For example, the first user equipment locally stores a period of time or a certain number of transmitted unencoded video frames, and determines, among the locally stored unencoded video frames, the unencoded first video frame corresponding to the screenshot according to the second frame related information sent by the second user equipment.
  • in step S14, the first user equipment receives the labeling operation information of the second video frame by the second user equipment. For example, the second user equipment generates corresponding labeling operation information based on the second user's labeling operation and sends the labeling operation information to the first user equipment in real time, and the first user equipment receives the labeling operation information.
  • in step S15, the first user equipment presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information. For example, based on the received labeling operation information, the first user equipment displays the corresponding labeling operation in real time on the first video frame, such as displaying the first video frame as a small window in the current interface and presenting the corresponding labeling operation at the position corresponding to the first video frame at a rate of one frame every 50 ms.
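Steps S11 through S15 on the sender side can be sketched as a minimal class; the class, its method names, and the use of a frame identifier as cache key are illustrative assumptions, not the patent's implementation:

```python
import time
from collections import OrderedDict

class AnnotationSession:
    """Hypothetical sketch of the first user equipment in steps S11-S15."""

    def __init__(self):
        # Unencoded frames cached locally, keyed by frame identification info.
        self.frame_cache = OrderedDict()

    def on_frame_sent(self, frame_id, pixels):
        # Step S11: while streaming, keep the unencoded frame locally.
        self.frame_cache[frame_id] = pixels

    def on_screenshot_info(self, second_frame_info):
        # Steps S12-S13: the receiver reports which frame it captured;
        # look up the matching unencoded first video frame.
        return self.frame_cache.get(second_frame_info["frame_id"])

    def present_annotations(self, frame, ops, frame_interval_s=0.05):
        # Steps S14-S15: replay the receiver's labeling operations on the
        # cached frame, pacing the presentation (one frame every ~50 ms).
        rendered = list(frame)
        for op in ops:
            rendered.append(op)           # stand-in for drawing the stroke
            time.sleep(frame_interval_s)  # real-time pacing
        return rendered
```

For instance, after `on_frame_sent("t=1000", frame)` the sender can recover that exact unencoded frame when the receiver's screenshot info arrives, then animate the annotation operations on it.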
  • for example, user A holds the smart glasses and user B holds tablet B; the smart glasses and tablet B establish video communication via a wired or wireless network.
  • the smart glasses encode the currently captured picture and send it to tablet B, while caching the unencoded frames locally.
  • the tablet B receives the video stream, and based on the screen capture operation of the user B, determines a second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses.
  • the second frame related information includes, but is not limited to, a second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and a second video frame codec and transmission total duration information.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates the real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses in real time.
  • after receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
  • the method further includes step S16 (not shown).
  • in step S16, the first user equipment stores the video frames in the video stream; wherein, in step S13, the first user equipment determines, from the stored video frames according to the second frame related information, the first video frame corresponding to the second video frame.
  • for example, the first user equipment sends the video stream to the second user equipment and locally stores a period of time or a certain number of unencoded video frames, where the duration or number may be a preset fixed value or a threshold dynamically adjusted according to network conditions or the transmission rate; the first user equipment then determines the corresponding unencoded first video frame among the locally stored video frames based on the second frame related information of the second video frame sent by the second user equipment.
  • in some embodiments, the stored video frames satisfy, but are not limited to, at least one of the following: the time interval between a stored video frame's transmission time and the current time is less than or equal to a video frame storage duration threshold; the cumulative number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
  • for example, the smart glasses send the captured images to tablet B and locally store a period of time or a certain number of unencoded video frames, where the duration or number may be a fixed value preset by the system or manually, such as a duration or number threshold obtained by statistical analysis of big data; the duration or number may also be a threshold dynamically adjusted according to network conditions or the transmission rate.
  • for example, the dynamically adjusted duration or number threshold may be determined according to the codec-and-transmission total duration information of the video frame: the codec-and-transmission total duration of the current video frame is calculated and taken as a unit duration, or the number of video frames transmitted within that duration is taken as a unit number, and the dynamic video frame duration or number threshold is set with reference to the current unit duration or unit number.
  • for example, the predetermined or dynamically set video frame storage duration threshold should be greater than or equal to one unit duration, and the predetermined or dynamically set video frame storage number threshold should be greater than or equal to one unit number. The smart glasses then determine the corresponding unencoded first video frame among the stored video frames according to the second frame related information of the second video frame sent by tablet B, where the interval between a stored video frame's transmission time and the current time is less than or equal to the video frame storage duration threshold, or the cumulative number of stored video frames is less than or equal to the predetermined video frame storage number threshold.
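The retention rule above (keep a frame only while its age is within the duration threshold, and cap the total count at the number threshold) can be sketched as a small eviction helper; the function and its parameters are illustrative assumptions:

```python
from collections import deque
import time

def evict_stale_frames(cache, max_age=None, max_count=None, now=None):
    """Hypothetical sketch of the storage thresholds described above.

    `cache` is a deque of (send_time, frame) pairs, oldest first."""
    if now is None:
        now = time.monotonic()
    if max_age is not None:
        # Duration threshold: drop frames older than the storage duration.
        while cache and now - cache[0][0] > max_age:
            cache.popleft()
    if max_count is not None:
        # Number threshold: keep at most max_count frames.
        while len(cache) > max_count:
            cache.popleft()
    return cache
```

Evicting from the oldest end keeps the most recently sent frames available, which is what the screenshot lookup in step S13 needs.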
  • the method further includes step S17 (not shown).
  • in step S17, the first user equipment acquires the codec-and-transmission total duration information of the video frames in the video stream, and adjusts the video frame storage duration threshold or the video frame storage number threshold according to the codec-and-transmission total duration information.
  • for example, the first user equipment records the encoding start time of each video frame and, after encoding, sends the video frame to the second user equipment; the second user equipment receives the frame and records each video frame's decoding end time. Subsequently, either the second user equipment sends the video frame decoding end time to the first user equipment, and the first user equipment calculates the codec-and-transmission total duration information of the current video frame from the encoding start time and the decoding end time; or the second user equipment calculates the codec-and-transmission total duration information from the encoding start time and the decoding end time and sends it to the first user equipment.
  • for example, the first user equipment adjusts the video frame storage duration threshold or the video frame storage number threshold according to the codec-and-transmission total duration information: taking the duration information as a unit time reference, a certain multiple of that duration is set as the video frame storage duration threshold; or, the number of video frames that can be sent within the duration is calculated from the duration information and the rate at which the first user equipment sends video frames, that number is taken as a unit number, and a certain multiple of it is set as the video frame storage number threshold.
  • for example, the smart glasses record the encoding start time of the i-th video frame as T si and, after encoding, send the video frame to tablet B; tablet B receives it and records the decoding end time as T ei. Subsequently, tablet B sends the decoding end time T ei to the smart glasses, and the smart glasses calculate the codec-and-transmission total duration T i = T ei - T si from the received decoding end time and the locally recorded encoding start time T si; alternatively, tablet B calculates the codec-and-transmission total duration T i and returns it to the smart glasses.
  • for example, according to the codec-and-transmission total duration T i of the i-th video frame, the smart glasses dynamically retain the video frames within a window of 1.3 T i, where the multiple 1.3 is a value determined from big data statistics.
• the multiple may also be adjusted dynamically according to the network transmission rate, for example setting the buffer duration threshold to (1+k)·T_i, where k is an adjustment factor based on network fluctuation: if the network fluctuation is large, k is set to 0.5; when the fluctuation is small, k is set to 0.2, and so on.
• similarly, the smart glasses can dynamically adjust the multiple for the quantity threshold according to the current network transmission rate, for example setting the buffer quantity threshold to (1+k)·N, where k is an adjustment factor based on network fluctuation: if the network fluctuation is large, k is set to 0.5; when the fluctuation is small, k is set to 0.2, and so on.
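The (1+k) adjustment in the two examples above can be sketched as follows (reducing network fluctuation to a single boolean is an illustrative simplification; a real device would derive k from measured jitter or rate):

```python
def adjusted_threshold(base: float, network_fluctuation_large: bool) -> float:
    """Scale a base threshold (a duration T_i or a frame count N) by (1 + k),
    with k = 0.5 under large network fluctuation and k = 0.2 otherwise."""
    k = 0.5 if network_fluctuation_large else 0.2
    return (1 + k) * base
```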
• the first user equipment sends the video stream, together with the frame identification information of the transmitted video frames in the video stream, to the second user equipment; wherein, in step S13, the first user equipment determines, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, where the frame identification information of the first video frame corresponds to the second frame related information.
• the frame identification information of a video frame may be the codec time corresponding to the video frame, or a number corresponding to the video frame.
• the first user equipment performs encoding processing on a plurality of video frames to be transmitted, and sends the corresponding video stream, together with the frame identification information of the transmitted video frames in the video stream, to the second user equipment.
• the first user equipment performs encoding processing on a plurality of video frames to be transmitted, acquires the encoding start times of the plurality of video frames to be transmitted, and sends the multiple video frames and their encoding start times to the second user equipment.
  • the frame identification information of the transmitted video frame in the video stream includes encoding start time information of the transmitted video frame.
• the smart glasses record the encoding start time of each video frame and, after encoding, send the video frame and the encoding start times of the transmitted video frames to tablet B, wherein the transmitted video frames include the video frame to be sent after the current encoding is completed as well as previously transmitted video frames. The encoding start times of the transmitted video frames may be sent to tablet B at a certain time or at a certain video-frame interval, or the encoding start time of a video frame may be sent to tablet B directly together with that video frame.
• tablet B, based on the screen capture operation of user B, determines the video frame corresponding to the screen capture picture, and sends the second frame related information of the corresponding second video frame to the smart glasses, wherein the second frame related information corresponds to the second frame identification information, including but not limited to at least one of the following: the encoding start time of the second video frame, the decoding end time of the second video frame, the codec and total transmission duration information of the second video frame, the number or image corresponding to the second video frame, etc.
• the smart glasses receive the second frame related information and determine the correspondingly stored uncoded first video frame according to it. For example, the encoding start time of the uncoded first video frame corresponding to the second video frame is determined from the encoding start time or decoding end time of the second video frame, and the corresponding first video frame is thereby determined; or the first video frame with the same number is determined directly from the number corresponding to the second video frame; or the corresponding first video frame is determined among the stored uncoded video frames by image recognition of the second video frame.
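A minimal sketch of the lookup just described, assuming the sender keys its cache of uncoded frames by encoding start time (the class, its bounded-eviction policy, and all names are illustrative, not from the application):

```python
from collections import OrderedDict

class UncodedFrameCache:
    """Sender-side cache of uncoded frames, keyed by encoding start time."""

    def __init__(self, max_frames):
        self.max_frames = max_frames
        self._frames = OrderedDict()  # encoding start time -> raw frame data

    def store(self, encode_start_time, frame):
        self._frames[encode_start_time] = frame
        if len(self._frames) > self.max_frames:
            self._frames.popitem(last=False)  # evict the oldest frame

    def match(self, encode_start_time=None, decode_end_time=None,
              total_duration=None):
        # Recover the encoding start time from the receiver's decoding end
        # time and the codec-and-transmission duration when it is not given.
        if encode_start_time is None and decode_end_time is not None \
                and total_duration is not None:
            encode_start_time = decode_end_time - total_duration
        return self._frames.get(encode_start_time)
```

The same cache could equally be keyed by frame number; matching by image recognition, also mentioned above, would need a separate similarity search over the stored frames.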
  • the method further includes step S18 (not shown).
• step S18 the first user equipment presents the first video frame; wherein, in step S15, the first user equipment superimposes a corresponding labeling operation on the first video frame according to the labeling operation information. For example, the first user equipment determines the uncoded first video frame and displays it at a preset position in the current interface or in a small window; subsequently, based on the labeling operation information received in real time, the first user equipment superimposes the corresponding labeling operation at the corresponding position of the first video frame.
  • the smart glasses determine a corresponding uncoded first video frame according to the second frame related information sent by the tablet B, and display the first video frame at a preset position of the interface of the smart glasses. Then, the smart glasses receive the real-time annotation operation sent by the tablet B. The smart glasses determine the corresponding position of the annotation operation in the currently displayed first video frame, and present the current annotation operation in real time at the corresponding location.
  • the method further includes step S19 (not shown).
• step S19 the first user equipment sends the first video frame to the second user equipment as a preferred frame for presenting the labeling operation. For example, the first user equipment determines the uncoded first video frame and sends it to the second user equipment, so that the second user equipment can present this higher-quality first video frame.
• the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B, and send the first video frame as a preferred frame to tablet B, for example through lossless compression.
  • Tablet B receives the first video frame and presents the first video frame.
  • the first user equipment sends a video stream to the second user equipment and the third user equipment.
• a communication connection is established between the first user equipment, the second user equipment, and the third user equipment, where the first user equipment is the current video frame sender, and the second user equipment and the third user equipment are the current video frame receivers.
  • the first user equipment sends a video stream to the second user equipment and the third user equipment through the communication connection.
• for example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses establish video communication with tablet B and tablet C through a wired or wireless network; the smart glasses encode the currently collected picture and send it to tablet B and tablet C, while buffering the video frames for a period of time or up to a certain number of frames.
• tablet B receives and decodes the video stream and, based on the screen capture operation of user B, determines the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and total transmission duration information, etc.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
• after receiving the labeling operation information, the smart glasses display the uncoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the corresponding position of the first video frame.
• tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video library according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
  • the method further includes step S010 (not shown).
  • step S010 the first user equipment sends the first video frame as a preferred frame for presenting the labeling operation to the second user equipment and/or the third user equipment.
  • the first user equipment determines a corresponding first video frame in the locally cached video frame according to the second frame related information, and sends the first video frame to the second user equipment and/or the third user equipment.
• the second user equipment and/or the third user equipment receives the uncoded first video frame and presents it, and the second user and/or the third user may perform an annotation operation based on the first video frame.
• for example, the uncoded first video frame is sent to tablet B and tablet C through lossless compression or high-quality compression, wherein tablet B and tablet C may determine whether to obtain the first video frame according to the quality of the current communication network connection, or select the transmission mode of the first video frame according to that quality: lossless compression is used when the network quality is good, and high-quality lossy compression when the network quality is poor.
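A sketch of that selection, under the assumption that the receiver reduces its connection assessment to a single boolean (a real device would measure bandwidth or loss rate; the mode strings are illustrative):

```python
def preferred_frame_mode(network_quality_good: bool) -> str:
    """Choose the transmission mode for the preferred (uncoded) frame:
    lossless on a good connection, high-quality lossy otherwise."""
    return "lossless" if network_quality_good else "high-quality-lossy"
```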
• the first user equipment sends the first video frame and the second frame related information to the second user equipment and/or the third user equipment, where the first video frame is used as a preferred frame to present the annotation operation in the second user device or the third user device.
• for example, tablet B performs multiple screen capture operations based on the operations of user B; tablet B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example according to the screenshot time in the second frame related information, and presents the second frame related information in the first video frame.
  • the tablet computer C receives the second frame related information and the first video frame, and presents the second frame related information in the window in which the first video frame is presented while presenting the first video frame.
  • FIG. 3 illustrates a method for real-time annotation of a video frame at a second user equipment end according to another aspect of the present application, wherein the method includes step S21, step S22, step S23, and step S24.
• step S21 the second user equipment receives the video stream sent by the first user equipment; in step S22, the second user equipment sends, to the first user equipment, the second frame related information of the second video frame intercepted in the video stream according to the user's screenshot operation.
• step S23 the second user equipment acquires the labeling operation information of the second video frame by the user; in step S24, the second user equipment sends the labeling operation information to the first user equipment.
• the second user equipment receives and presents the video stream sent by the first user equipment; the second user equipment determines, according to the screen capture operation of the second user, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment. Then, the second user equipment generates the labeling operation information based on the labeling operation of the second user, and sends the labeling operation information to the first user equipment.
• for example, user B holds tablet B, user A holds smart glasses, and tablet B and the smart glasses communicate video over a wired or wireless network.
  • the tablet computer B receives and presents the video stream sent by the smart glasses, and determines the second video frame corresponding to the screen capture screen according to the screen capture operation of the user B. Then, the tablet B sends the second frame related information corresponding to the second video frame to the smart glasses, and the smart glasses receive the second frame related information and determine a corresponding first video frame based on the second frame related information.
• tablet B generates the corresponding labeling operation information based on the labeling operation of user B, and sends the labeling operation information to the smart glasses in real time.
• the smart glasses present the first video frame at a preset position of the interface according to the first video frame and the labeling operation information, and present the corresponding labeling operation in real time at the corresponding position in the first video frame.
• the second user equipment receives the video stream sent by the first user equipment, together with the frame identification information of the transmitted video frames in the video stream; wherein the second frame related information includes at least one of the following: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
• the first user equipment sends the video stream to the second user equipment, and also sends the frame identification information of the transmitted video frames in the video stream to the second user equipment, so that the second user equipment receives the video stream together with the frame identification information of the transmitted video frames.
• the second user equipment determines, according to the screen capture operation of the second user, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment, where the second frame related information of the second video frame includes but is not limited to: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
  • the smart glasses send the frame identification information corresponding to the video frames in the already transmitted video stream to the tablet B while transmitting the video stream.
• tablet B detects the screen capture operation of user B, determines, based on the current screen capture picture, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses, wherein the second frame related information includes but is not limited to: frame identification information of the second video frame, and frame related information generated based on that frame identification information. The frame identification information of the second video frame may be the encoding start time of the video frame or the number corresponding to the video frame, and the frame related information generated based on the frame identification information of the second video frame may be the decoding end time of the video frame or the codec and total transmission duration information, etc.
  • the frame identification information includes encoding start time information of the second video frame.
  • the first user equipment performs an encoding process on the video frame, and sends the frame identification information of the corresponding video stream and the transmitted video frame in the video stream to the second user equipment, where the frame identification information of the video frame includes The encoding start time of the video frame.
• the second frame related information includes the decoding end time information and the codec and total transmission duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end time, determines the corresponding second video frame based on the screen capture operation, and determines the corresponding codec and total transmission duration information according to the encoding start time and the decoding end time of the second video frame.
  • the smart glasses record the encoding start time of each video frame, and after encoding, send the encoding start time of the video frame and the transmitted video frame to the tablet B.
  • Tablet B receives and presents the video frame and records the decoding end time of the video frame.
• based on the screen capture operation of user B, tablet B determines the corresponding second video frame, and determines the codec and total transmission duration information of the second video frame according to the encoding start time and the decoding end time corresponding to the second video frame.
  • the tablet B sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to, an encoding start time of the second video frame, a codec of the second video frame, and Transfer total time information, etc.
• the second user equipment acquires the labeling operation information of the second video frame by the user in real time; wherein, in step S24, the second user equipment sends the labeling operation information to the first user equipment in real time.
  • the second user equipment acquires the corresponding labeling operation information in real time based on the operation of the second user, for example, collecting the corresponding labeling operation information at a certain time interval. Then, the second user equipment sends the obtained annotation operation information to the first user equipment in real time.
  • the tablet computer B collects the labeling operation of the user B on the screen capture screen, for example, the user B draws a circle, an arrow, a text, a box and the like on the screen.
• tablet B records the position and path of the labeling brush, for example by sampling multiple points on the screen, obtaining the position corresponding to each point, and taking the path connecting those points as the labeling path.
• tablet B obtains the corresponding labeling operation in real time and sends it to the smart glasses in real time, for example collecting and sending labeling operations once every 50 ms.
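The collect-and-send loop above can be sketched as follows; the 50 ms flush interval comes from the example, while the callback-based design and all names are illustrative assumptions:

```python
import time

class AnnotationStreamer:
    """Buffer brush points and flush them to the peer at a fixed interval."""

    def __init__(self, send, interval_ms=50):
        self.send = send                      # callback delivering a point batch
        self.interval = interval_ms / 1000.0
        self._pending = []
        self._last_flush = time.monotonic()

    def on_touch(self, x, y):
        self._pending.append((x, y))
        if time.monotonic() - self._last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self._pending:
            self.send(list(self._pending))    # one batch per interval
            self._pending.clear()
        self._last_flush = time.monotonic()
```

Sending small batches at a fixed cadence, rather than one message per touch point, keeps the annotation feeling real-time on the sender side while bounding message overhead.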
  • the method further includes step S25 (not shown).
• step S25 the second user equipment receives the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and loads the first video frame in the display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
• for example, the second user equipment determines the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment; the first user equipment determines the corresponding first video frame based on the second frame related information.
• tablet B enters the screen capture mode based on the operation of the user, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
• the smart glasses determine the corresponding uncoded first video frame according to the second frame related information sent by tablet B, and send the first video frame to tablet B, for example through lossless compression, or through lossy compression, where the lossy compression process guarantees a quality higher than that of the video frame locally buffered by tablet B.
  • the tablet B receives and presents the first video frame, such as being presented in the form of a small window next to the current video, or displaying the first video frame on a large screen, presenting the current video in the form of a small window, and the like. Subsequently, the tablet computer B obtains the labeling operation information and the like regarding the first video frame according to the operation of the second user.
  • the method further includes step S26 (not shown).
• step S26 the second user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; the second user equipment determines, according to the second frame related information, that the first video frame is used to replace the second video frame, and loads the first video frame in the display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
• tablet B receives the unencoded first video frame and the second frame related information sent by the smart glasses, wherein the second frame related information includes the screenshot time of the second video frame, the video frame number of the second video frame, etc.
• for example, tablet B performs multiple screen capture operations based on the operations of user B; tablet B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example according to the screenshot time in the second frame related information, and presents the second frame related information in the first video frame.
• tablet B determines the currently corresponding screen capture operation according to the second frame related information, and presents the first video frame in the form of a small window next to the current video, or displays the first video frame on a large screen and presents the current video in the form of a small window, etc.
• tablet B presents the second frame related information in the presented first video frame, such as the screenshot time of the frame or the frame number of the frame in the video stream, etc. Subsequently, tablet B obtains the labeling operation information regarding the first video frame according to the operation of the second user.
• step S21 the second user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; wherein, in step S24, the second user equipment sends the labeling operation information to the first user equipment and the third user equipment.
• for example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses establish video communication with tablet B and tablet C through a wired or wireless network; the smart glasses encode the currently collected picture and send it to tablet B and tablet C, while buffering the video frames for a period of time or up to a certain number of frames.
• tablet B receives and decodes the video stream and, based on the screen capture operation of user B, determines the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and total transmission duration information, etc.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
• tablet B generates real-time labeling operation information according to user B's labeling operation, and sends the labeling operation information to the smart glasses and tablet C in real time.
  • FIG. 4 illustrates a method for real-time annotation of a video frame at a third user equipment end according to still another aspect of the present application, wherein the method includes step S31, step S32, step S33, step S34, and step S35.
• step S31 the third user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S32, the third user equipment receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream; in step S33, the third user equipment determines, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream; in step S34, the third user equipment receives the labeling operation information of the second video frame by the second user equipment; in step S35, the third user equipment presents a corresponding labeling operation in real time on the third video frame according to the labeling operation information.
• for example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C. The smart glasses establish video communication with tablet B and tablet C through a wired or wireless network; the smart glasses encode the currently collected picture and send it to tablet B and tablet C, while buffering the video frames for a period of time or up to a certain number of frames.
• tablet B receives and decodes the video stream and, based on the screen capture operation of user B, determines the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and total transmission duration information, etc.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
• after receiving the labeling operation information, the smart glasses display the uncoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the corresponding position of the first video frame.
• tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video library according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
  • the method further includes step S36 (not shown).
• step S36 the third user equipment receives the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and loads the first video frame in the display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
• tablet B enters the screen capture mode based on the operation of the user, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
• the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by tablet B, and send the first video frame to tablet B and tablet C, for example through lossless compression, or through lossy compression, where the lossy compression process guarantees a quality higher than that of the video frames cached locally by tablet B and tablet C.
  • the tablet computer C receives and presents the first video frame, such as being presented in the form of a small window next to the current video, or displaying the first video frame on a large screen, presenting the current video in the form of a small window, and the like. Subsequently, the tablet computer C receives the labeling operation information sent by the tablet computer B, and presents the labeling operation in the first video frame.
  • the method further includes step S37 (not shown).
• step S37 the third user equipment receives the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; the third user equipment determines, according to the second frame related information, that the first video frame is used to replace the third video frame, and loads the first video frame in the display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
• tablet C receives the unencoded first video frame and the second frame related information sent by the smart glasses, wherein the second frame related information includes the screenshot time of the second video frame, the video frame number of the second video frame, etc.
• tablet C receives and presents the first video frame, for example as a small window next to the current video, or by displaying the first video frame on a large screen and presenting the current video as a small window, etc.; while presenting the first video frame, tablet C presents the second frame related information in it, such as the screenshot time of the frame or the frame number of the frame in the video stream.
  • the tablet computer C receives the labeling operation information sent by the tablet computer B, and presents the labeling operation in the first video frame.
  • FIG. 5 illustrates a method for real-time annotation of a video frame at a network device end according to still another aspect of the present application, wherein the method includes step S41, step S42, step S43, step S44, and step S45.
  • step S41 the network device receives and forwards the video stream sent by the first user equipment to the second user equipment; in step S42, the network device receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream.
  • step S43 the network device forwards the second frame related information to the first user equipment; in step S44, the network device receives the labeling operation information of the second user equipment on the second video frame; in step S45, the network device forwards the labeling operation information to the first user equipment.
  • user A holds smart glasses
  • user B holds tablet B
  • the smart glasses and the tablet B establish video communication through the cloud.
  • the smart glasses encode the currently collected picture and send it to the cloud, and the cloud forwards it to the tablet B.
  • the smart glasses cache a period of time or a certain number of video frames when the video is sent.
  • the tablet B receives and decodes the video stream; based on the screen capture operation of the user B, the tablet B determines a second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the cloud,
  • and the cloud forwards it to the smart glasses, where the second frame related information includes, but is not limited to, the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec and transmission total duration information, etc.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates real-time annotation operation information according to the labeling operation of the user B, and sends the labeling operation information to the cloud, and is sent by the cloud to the smart glasses.
  • after receiving the labeling operation information, the smart glasses display the corresponding uncoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the matching position in the first video frame.
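The frame lookup in the embodiment above — the sender caching recently transmitted uncoded frames and retrieving one by the second frame related information — can be sketched as follows. This is an illustrative Python sketch; the class and field names are assumptions, not part of the application:

```python
from collections import OrderedDict

class FrameCache:
    """Caches recently sent uncoded video frames, keyed by frame number."""
    def __init__(self):
        self._frames = OrderedDict()  # frame_number -> (encode_start_time, pixels)

    def store(self, frame_number, encode_start_time, pixels):
        self._frames[frame_number] = (encode_start_time, pixels)

    def lookup(self, frame_number=None, encode_start_time=None):
        """Find the cached uncoded frame matching the second frame related info."""
        if frame_number is not None and frame_number in self._frames:
            return self._frames[frame_number][1]
        if encode_start_time is not None:
            # fall back to matching by the recorded encoding start time
            for _, (t, pixels) in self._frames.items():
                if t == encode_start_time:
                    return pixels
        return None  # frame was already evicted from the cache

cache = FrameCache()
cache.store(41, 1000.0, "pixels-41")
cache.store(42, 1033.4, "pixels-42")
assert cache.lookup(frame_number=42) == "pixels-42"
assert cache.lookup(encode_start_time=1000.0) == "pixels-41"
```

A real implementation could additionally match by image similarity when only a screenshot image is available, as a later embodiment mentions.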
  • the network device receives and forwards the video stream sent by the first user equipment to the second user equipment, and the frame identification information of the transmitted video frame in the video stream.
  • the first user equipment performs encoding processing on the video frames, and sends the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the network device; the network device forwards the video stream and the frame identification information of the transmitted video frames to the second user equipment, wherein the frame identification information includes the encoding start time of the video frame.
  • step S43 the network device determines, according to the second frame related information, frame identification information of a video frame corresponding to the second video frame in the video stream, and sends the frame identification information of the video frame corresponding to the second video frame to the first user equipment.
  • the cloud receives the video stream sent by the smart glasses and the frame identification information of the transmitted video frames in the video stream, such as the encoding start time of each video frame.
  • the cloud forwards the video stream and the frame identification information corresponding to the transmitted video frame to the tablet B.
  • Tablet B receives and presents the video frame and records the decoding end time of the video frame.
  • based on the screen capture operation of the user B, the tablet B determines the corresponding second video frame, and sends the second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame or the video number of the second video frame, etc.
  • the cloud receives the second frame related information of the second video frame sent by the tablet B, and determines the frame identification information of the corresponding second frame based on the second frame related information, such as determining the encoding start time of the second frame or the video number of the second video frame according to the decoding end time of the second video frame or the video number of the second video frame, and the like.
  • step S41 the network device receives and forwards the video stream sent by the first user equipment to the second user equipment and the third user equipment; wherein, in step S43, the network device forwards the second frame related information to the first user equipment and the third user equipment; wherein, in step S45, the network device forwards the labeling operation information to the first user equipment and the third user equipment.
  • user A holds smart glasses
  • user B holds tablet B
  • user C holds tablet C
  • the smart glasses, the tablet B, and the tablet C establish video communication through the network device; the smart glasses encode the currently collected picture, send it to the network device, and buffer a period of time or a certain number of video frames, and the network device sends the video stream to the tablet B and the tablet C.
  • the tablet B receives and decodes the video stream; based on the screen capture operation of the user B, the tablet B determines a second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the network device.
  • the second frame related information includes, but is not limited to, a second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and a second video frame codec and transmission total duration information.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the network device forwards the second frame related information to the first user equipment and the third user equipment.
  • the tablet computer B generates real-time annotation operation information according to the labeling operation of the user B, and transmits the labeling operation information to the smart glasses and the tablet computer C through the network device in real time; the smart glasses receive the labeling operation information, display the corresponding uncoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
  • the tablet computer C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
  • FIG. 6 illustrates a method for real-time annotation of video frames in accordance with an aspect of the present application, wherein the method includes:
  • the first user equipment sends a video stream to the second user equipment
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • FIG. 7 illustrates a method for real-time annotation of a video frame according to another aspect of the present application, wherein the method includes:
  • the first user equipment sends a video stream to the network device
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • a method for real-time annotation of a video frame according to still another aspect of the present application, wherein the method comprises:
  • the first user equipment sends a video stream to the second user equipment and the third user equipment;
  • a method for real-time annotation of a video frame includes:
  • the first user equipment sends a video stream to the network device
  • the network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment and the third user equipment;
  • the first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
  • FIG. 8 illustrates a first user equipment for real-time annotation of a video frame according to an aspect of the present application, wherein the device includes a video sending module 11, a frame information receiving module 12, a video frame determining module 13, an annotation receiving module 14, and an annotation presentation module 15.
  • the video sending module 11 is configured to send a video stream to the second user equipment, where the frame information receiving module 12 is configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream.
  • a video frame determining module 13 configured to determine, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, and an annotation receiving module 14 configured to receive the second The labeling operation information of the second video frame by the user equipment; the labeling presentation module 15 is configured to present a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  • the video sending module 11 is configured to send a video stream to the second user equipment.
  • the first user equipment establishes a communication connection with the second user equipment through a wired or wireless network, and the first user equipment encodes and sends the video stream to the second user equipment by means of video communication.
  • the frame information receiving module 12 is configured to receive the second frame related information of the second video frame intercepted by the second user equipment in the video stream. For example, the second user equipment determines the second frame related information of the video frame corresponding to the screen capture based on the screen capture operation of the second user, and the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, wherein the second frame related information includes, but is not limited to, second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and second video frame codec and transmission total duration information.
  • the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame in the video stream that corresponds to the second video frame. For example, the first user equipment locally stores a period of time or a certain number of transmitted unencoded video frames, and determines, according to the second frame related information sent by the second user equipment, the uncoded first video frame corresponding to the screen capture among the locally stored uncoded video frames.
  • the label receiving module 14 is configured to receive the labeling operation information of the second video frame by the second user equipment.
  • the second user equipment generates corresponding labeling operation information based on the labeling operation of the second user, and sends the labeling operation information to the first user equipment in real time, and the first user equipment receives the labeling operation information.
  • the annotation presentation module 15 is configured to present a corresponding annotation operation on the first video frame in real time according to the annotation operation information.
  • the first user equipment displays the corresponding labeling operation in real time on the first video frame based on the received labeling operation information, such as displaying the first video frame in a small window in the current interface, and presenting the corresponding annotation operation at the matching position of the first video frame at a rate of one frame every 50 ms.
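Paced playback of annotation operations at a fixed frame interval, as described above, can be sketched as follows. This is an illustrative Python sketch; the function name, operation format, and `render` callback are assumptions, not part of the application:

```python
import time

def replay_annotations(ops, render, frame_interval_ms=50):
    """Render annotation operations one per frame, pacing frames a fixed interval apart."""
    for op in ops:
        start = time.monotonic()
        render(op)
        # sleep out the remainder of the frame budget (e.g. 50 ms per frame)
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, frame_interval_ms / 1000.0 - elapsed))

drawn = []
replay_annotations(
    [{"type": "line", "from": (0, 0), "to": (10, 10)},
     {"type": "circle", "center": (5, 5), "radius": 3}],
    drawn.append,
    frame_interval_ms=1)
assert len(drawn) == 2
```

In practice `render` would draw the stroke at the position mapped onto the displayed first video frame rather than append to a list.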
  • user A holds smart glasses
  • user B holds tablet B
  • smart glasses and tablet B establish video communication via wired or wireless network.
  • the smart glasses encode the currently collected picture, send it to the tablet B, and cache the transmitted video frames locally.
  • the tablet B receives the video stream, and based on the screen capture operation of the user B, determines a second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses.
  • the second frame related information includes, but is not limited to, a second video frame identification information, a second video frame encoding start time, a second video frame decoding end time, and a second video frame codec and transmission total duration information.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates the real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses in real time.
  • after receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
  • the device also includes a storage module 16 (not shown), configured to store video frames in the video stream, where the video frame determining module 13 is configured to determine, from the stored video frames, the first video frame corresponding to the second video frame according to the second frame related information.
  • the first user equipment sends the video stream to the second user equipment, and locally stores a period of time or a certain number of uncoded video frames, where the period of time or the number may be a preset fixed value, or the threshold may be dynamically adjusted according to the network condition or the transmission rate; then, the first user equipment determines, according to the second frame related information of the second video frame sent by the second user equipment, the corresponding uncoded first video frame among the locally stored video frames.
  • the stored video frame meets, but is not limited to, at least one of the following: a time interval between a transmission time of the stored video frame and a current time is less than or equal to a video frame storage duration threshold; the stored The cumulative number of video frames is less than or equal to a predetermined number of video frame storage thresholds.
  • the smart glasses send the collected images to the tablet B and store them for a period of time or a certain number of uncoded video frames locally, wherein the duration or the number may be a fixed value of the system or manually preset, such as A certain duration or a certain number of video frame thresholds obtained by statistical analysis of big data; a period of time or a certain number of video frames may also be a video frame threshold dynamically adjusted according to network conditions or transmission rates.
  • the duration or number threshold of the dynamic adjustment may be determined according to the codec and transmission total duration information of the video frame, such as calculating the codec and transmission total duration of the current video frame and using that duration as a unit duration, or taking the number of video frames transmitted within that duration as a unit number, and setting the dynamic video frame duration or number threshold with reference to the current unit duration or unit number.
  • the set predetermined or dynamic video frame storage duration threshold should be greater than or equal to one unit duration.
  • the set predetermined or dynamic video frame storage number threshold should be greater than or equal to one unit number. Then, the smart glasses determine the corresponding uncoded first video frame in the stored video frames according to the second frame related information of the second video frame sent by the tablet B, wherein the interval between the sending time of each stored video frame and the current time is less than or equal to the video frame storage duration threshold, or the accumulated number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
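The retention policy above — keep a sent frame only while it is younger than the storage duration threshold and the buffer holds no more than the storage number threshold — can be sketched as follows. This is an illustrative Python sketch; the class name and parameters are assumptions, not part of the application:

```python
from collections import deque

class RollingFrameBuffer:
    """Keeps sent uncoded frames no older than max_age_s and no more than max_count."""
    def __init__(self, max_age_s=2.0, max_count=60):
        self.max_age_s = max_age_s
        self.max_count = max_count
        self._buf = deque()  # (send_time, frame), oldest first

    def push(self, frame, now):
        self._buf.append((now, frame))
        # drop frames exceeding the count threshold
        while len(self._buf) > self.max_count:
            self._buf.popleft()
        # drop frames exceeding the age threshold
        while self._buf and now - self._buf[0][0] > self.max_age_s:
            self._buf.popleft()

    def frames(self):
        return [f for _, f in self._buf]

buf = RollingFrameBuffer(max_age_s=1.0, max_count=3)
for i in range(5):
    buf.push(f"frame-{i}", now=i * 0.1)
assert buf.frames() == ["frame-2", "frame-3", "frame-4"]
```

Either threshold alone suffices per the text; combining both bounds memory use on the sender while keeping enough history to satisfy late-arriving screenshot requests.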
  • the device further includes a threshold adjustment module 17 (not shown).
  • the threshold adjustment module 17 is configured to acquire the codec and transmission total duration information of the video frames in the video stream, and adjust the video frame storage duration threshold or the video frame storage number threshold according to the codec and transmission total duration information.
  • the first user equipment records the encoding start time of each video frame and, after encoding, sends the video frame to the second user equipment; the second user equipment receives and records the decoding end time of each video frame; subsequently, the second user equipment sends the video frame decoding end time to the first user equipment, and the first user equipment calculates the codec and transmission total duration information of the current video frame based on the encoding start time and the decoding end time, or the second user equipment calculates the codec and transmission total duration information of the current video frame based on the encoding start time and the decoding end time and sends it to the first user equipment.
  • the first user equipment adjusts the video frame storage duration threshold or the video frame storage number threshold according to the codec and transmission total duration information, for example, using the duration information as a unit time reference and setting a certain multiple of that unit duration as the video frame storage duration threshold; or calculating, according to the duration information and the rate at which the first user equipment sends video frames, the number of video frames that can be sent within that duration, using this number as a unit number, and setting a certain multiple of it as the video frame storage number threshold.
  • the smart glasses record the i-th video frame encoding start time as T_si and, after encoding, send the video frame to the tablet B; the tablet B receives it and records the video frame decoding end time as T_ei. Subsequently, the tablet B sends the video frame decoding end time T_ei to the smart glasses, and the smart glasses calculate the codec and transmission total duration T_i = T_ei - T_si from the received decoding end time T_ei of the i-th video frame and the locally recorded encoding start time T_si; alternatively, the tablet B calculates the codec and transmission total duration T_i and returns it to the smart glasses.
  • for example, based on the codec and transmission total duration T_i of the i-th video frame, the smart glasses may set the duration for which video frames are dynamically saved to 1.3 T_i according to big data statistics, or dynamically adjust the magnification according to the network transmission rate, such as setting the buffer duration threshold to (1+k)T_i, where k is a coefficient adjusted according to network fluctuation: when the network fluctuation is large, k is set to 0.5; when the network fluctuation is small, k is set to 0.2, and so on.
  • similarly, the smart glasses can dynamically adjust the magnification according to the current network transmission rate, such as setting the buffer number threshold to (1+k)N, where k is a coefficient adjusted according to network fluctuation: when the network fluctuation is large, k is set to 0.5; when the network fluctuation is small, k is set to 0.2, and so on.
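The threshold arithmetic above reduces to two small formulas: T_i = T_ei - T_si for the per-frame codec-and-transmission duration, and (1+k)·T_i (or (1+k)·N) for the buffer threshold. A minimal sketch, with the k values taken from the example in the text and the function names being assumptions:

```python
def transmission_total_duration(encode_start, decode_end):
    """T_i = T_ei - T_si: codec plus transmission duration of frame i."""
    return decode_end - encode_start

def buffer_duration_threshold(t_i, network_fluctuation_high):
    """Buffer duration threshold (1+k)*T_i; k = 0.5 under large network
    fluctuation, k = 0.2 under small fluctuation, per the example values."""
    k = 0.5 if network_fluctuation_high else 0.2
    return (1 + k) * t_i

t_i = transmission_total_duration(encode_start=10.0, decode_end=10.4)
assert abs(t_i - 0.4) < 1e-9
assert abs(buffer_duration_threshold(t_i, True) - 0.6) < 1e-9   # (1+0.5)*0.4
assert abs(buffer_duration_threshold(t_i, False) - 0.48) < 1e-9  # (1+0.2)*0.4
```

The count threshold follows the same pattern with the unit number N in place of T_i.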
  • the video sending module 11 is configured to send, to the second user equipment, a video stream and frame identification information of the transmitted video frame in the video stream, where the video frame determining module 13 is configured to The second frame related information determines a first video frame corresponding to the second video frame in the video stream, where frame identification information of the first video frame corresponds to the second frame related information.
  • the frame identification information of the video frame may be the codec time corresponding to the video frame, or may be a number corresponding to the video frame.
  • the video sending module 11 is configured to perform encoding processing on multiple video frames to be transmitted, and send the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the second user equipment.
  • the first user equipment performs encoding processing on a plurality of video frames to be transmitted, acquires the encoding start times of the plurality of video frames to be transmitted, and sends the multiple video frames and their encoding start times to the second user equipment, where the frame identification information of the transmitted video frames in the video stream includes the encoding start time information of the transmitted video frames.
  • the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames and the encoding start times of the transmitted video frames to the tablet computer B, wherein the transmitted video frames include the video frames to be sent after the current encoding is completed and the video frames already sent; the encoding start times of the transmitted video frames may be sent to the tablet B at a certain time or at a certain video frame interval, or the encoding start time of each video frame may be sent to the tablet B directly together with that video frame.
  • based on the screen capture operation of the user B, the tablet computer B determines the video frame corresponding to the screen capture, and sends the second frame related information of the corresponding second video frame to the smart glasses, wherein the second frame related information corresponds to the frame identification information, including but not limited to at least one of the following: an encoding start time of the second video frame, a second video frame decoding end time, second video frame codec and transmission total duration information, a number or image corresponding to the second video frame, etc.
  • the smart glasses receive the second frame related information, and determine the correspondingly stored uncoded first video frame according to the second frame related information, such as determining the encoding start time of the uncoded first video frame corresponding to the second video frame according to the encoding start time of the second video frame or the second video frame decoding end time, and thereby determining the corresponding first video frame; or directly determining the first video frame with the same number according to the number corresponding to the second video frame; or determining the corresponding first video frame among the stored uncoded video frames by image recognition of the second video frame.
  • the device also includes a video frame rendering module 18 (not shown).
  • the video frame presentation module 18 is configured to present the first video frame.
  • the annotation presentation module 15 is configured to superimpose a corresponding annotation operation on the first video frame according to the annotation operation information.
  • the first user equipment determines the first video frame that has not been coded, and displays the first video frame at a preset position in the current interface or in a small window; subsequently, the first user equipment superimposes the corresponding labeling operation at the corresponding position of the first video frame according to the annotation operation information received in real time.
  • the smart glasses determine a corresponding uncoded first video frame according to the second frame related information sent by the tablet B, and display the first video frame at a preset position of the interface of the smart glasses. Then, the smart glasses receive the real-time annotation operation sent by the tablet B. The smart glasses determine the corresponding position of the annotation operation in the currently displayed first video frame, and present the current annotation operation in real time at the corresponding location.
  • the device also includes a first preferred frame module 19 (not shown).
  • the first preferred frame module 19 is configured to send the first video frame to the second user equipment as a preferred frame that presents the labeling operation. For example, the first user equipment determines the first video frame that has not been coded, and sends the first video frame to the second user equipment for the second user equipment to present the first video frame of higher quality.
  • the smart glasses determine a corresponding unencoded first video frame according to the second frame related information sent by the tablet B, and send the first video frame as a preferred frame to the tablet B, such as by lossless compression.
  • Tablet B receives the first video frame and presents the first video frame.
  • the video sending module 11 is configured to send a video stream to the second user equipment and the third user equipment.
  • a communication connection is established between the first user equipment, the second user equipment, and the third user equipment, where the first user equipment is the current video frame sender, and the second user equipment and the third user equipment are the current video frame receivers.
  • the first user equipment sends a video stream to the second user equipment and the third user equipment through the communication connection.
  • user A holds smart glasses
  • user B holds tablet B
  • user C holds tablet C
  • the smart glasses, the tablet B, and the tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to the tablet B and the tablet C, and buffer a period of time or a certain number of video frames.
  • the tablet B receives and decodes the video stream; based on the screen capture operation of the user B, the tablet B determines a second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and the tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec and transmission total duration information, etc.
  • the smart glasses receive the second frame related information of the second video frame, and determine a corresponding uncoded first video frame in the locally stored video frame based on the second frame related information.
  • the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
  • after receiving the labeling operation information, the smart glasses display the corresponding uncoded first video frame in a preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
  • the tablet computer C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
  • the device further includes a second preferred frame module 010 (not shown).
  • the second preferred frame module 010 is configured to send the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the labeling operation.
  • the first user equipment determines a corresponding first video frame in the locally cached video frame according to the second frame related information, and sends the first video frame to the second user equipment and/or the third user equipment.
  • the second user equipment and/or the third user equipment receives the uncoded first video frame, the first video frame is presented, and the second user and/or the third user may perform an annotation operation based on the first video frame.
  • for example, the uncoded first video frame is sent to the tablet B and the tablet C through lossless compression or high-quality compression, wherein the tablet B and the tablet C determine whether to obtain the first video frame according to the quality of the current communication network connection, or select the transmission mode of the first video frame according to the quality of the current communication network connection: lossless compression is used when the network quality is good, and high-quality (lossy) compression is used when the network quality is poor.
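Selecting the transmission mode of the preferred frame from the current link quality, as described above, can be sketched as follows. This is an illustrative Python sketch; the function name, metrics, and thresholds are assumptions, since the application does not specify how "good" network quality is measured:

```python
def choose_transfer_mode(bandwidth_mbps, rtt_ms, good_bw=10.0, good_rtt=100.0):
    """Pick lossless compression on a good link, lossy high-quality otherwise."""
    link_is_good = bandwidth_mbps >= good_bw and rtt_ms <= good_rtt
    return "lossless" if link_is_good else "high-quality-lossy"

# a fast, low-latency link gets the lossless preferred frame
assert choose_transfer_mode(50.0, 20.0) == "lossless"
# a slow or high-latency link falls back to high-quality lossy compression
assert choose_transfer_mode(2.0, 300.0) == "high-quality-lossy"
```

The receiving tablets could equally use such a check to decide whether to request the preferred frame at all, per the embodiment.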
  • the second preferred frame module 010 is configured to send the first video frame and the second frame related information to the second user equipment and/or the third user equipment, where the first video frame is used as a preferred frame to present the annotation operation in the second user equipment or the third user equipment.
  • for example, the tablet computer B performs a plurality of screen capture operations based on the operations of the user B; the tablet computer B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example according to the screenshot time in the second frame related information, and presents the second frame related information in the first video frame.
  • the tablet computer C receives the second frame related information and the first video frame, and presents the second frame related information in the window in which the first video frame is presented while presenting the first video frame.
  • FIG. 9 illustrates a second user equipment for real-time annotation of a video frame according to another aspect of the present application, wherein the apparatus includes a video receiving module 21, a frame information determining module 22, an annotation acquiring module 23, and an annotation transmitting module 24.
• the video receiving module 21 is configured to receive the video stream sent by the first user equipment; the frame information determining module 22 is configured to send, to the first user equipment according to a screenshot operation of the user in the video stream, second frame related information of the intercepted second video frame; the annotation obtaining module 23 is configured to acquire the labeling operation information of the second video frame by the user; and the annotation sending module 24 is configured to send the labeling operation information to the first user equipment.
• the second user equipment receives and presents the video stream sent by the first user equipment; the second user equipment determines, according to the screen capture operation of the second user, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment. Then, the second user equipment generates the labeling operation information based on the labeling operation of the second user, and sends the labeling operation information to the first user equipment.
  • user B holds tablet B
  • user A holds smart glasses
  • tablet B and smart glasses communicate video over wired or wireless networks.
  • the tablet computer B receives and presents the video stream sent by the smart glasses, and determines the second video frame corresponding to the screen capture screen according to the screen capture operation of the user B. Then, the tablet B sends the second frame related information corresponding to the second video frame to the smart glasses, and the smart glasses receive the second frame related information and determine a corresponding first video frame based on the second frame related information.
• the tablet B generates the corresponding labeling operation information based on the labeling operation of the user B, and sends the labeling operation information to the smart glasses in real time.
• the smart glasses present the first video frame at a preset position of the interface according to the first video frame and the labeling operation information, and present the corresponding labeling operation in real time at the corresponding position in the first video frame.
• the video receiving module 21 is configured to receive a video stream sent by the first user equipment, and frame identification information of the transmitted video frames in the video stream, where the second frame related information includes at least one of the following: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
• the first user equipment sends the video stream to the second user equipment, and also sends the frame identification information of the transmitted video frames in the video stream to the second user equipment; the second user equipment receives the video stream and the frame identification information of the transmitted video frames in the video stream.
• the second user equipment determines, according to the screen capture operation of the second user, the second video frame corresponding to the current screen capture, and sends the second frame related information of the second video frame to the first user equipment, where the second frame related information of the second video frame includes but is not limited to: frame identification information of the second video frame; frame related information generated based on the frame identification information of the second video frame.
  • the smart glasses send the frame identification information corresponding to the video frames in the already transmitted video stream to the tablet B while transmitting the video stream.
• the tablet B detects the screen capture operation of the user B, determines, based on the currently captured screen, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the smart glasses, wherein the second frame related information includes, but is not limited to: frame identification information of the second video frame, and frame related information generated based on the frame identification information of the second video frame. The frame identification information of the second video frame may be the encoding start time of the video frame or the number corresponding to the video frame, and the frame related information generated based on the frame identification information of the second video frame may be the decoding end time of the video frame or the codec and transmission total duration information, etc.
  • the frame identification information includes encoding start time information of the second video frame.
• the first user equipment performs an encoding process on the video frames, and sends the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the second user equipment, where the frame identification information of a video frame includes the encoding start time of the video frame.
• the second frame related information includes the decoding end time information and the codec and transmission total duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end time, determines the corresponding second video frame based on the screen capture operation, and determines the corresponding codec and transmission total duration information according to the encoding start time and the decoding end time of the second video frame.
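• Under the timing scheme described above, the codec and transmission total duration can be derived by subtracting the encoding start time (recorded by the first user equipment) from the decoding end time (recorded by the second user equipment). A minimal sketch follows; the field and class names are illustrative assumptions, not terms from the original disclosure.

```python
from dataclasses import dataclass

@dataclass
class SecondFrameRelatedInfo:
    """Illustrative structure for the second frame related information."""
    encoding_start_time_ms: float  # recorded by the first user equipment
    decoding_end_time_ms: float    # recorded by the second user equipment

    @property
    def codec_and_transmission_total_duration_ms(self) -> float:
        # Total time spent on encoding, transmission, and decoding.
        return self.decoding_end_time_ms - self.encoding_start_time_ms

# Example: a frame whose encoding started at t=1000 ms and whose decoding
# finished at t=1180 ms took 180 ms end to end.
info = SecondFrameRelatedInfo(encoding_start_time_ms=1000.0,
                              decoding_end_time_ms=1180.0)
```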
• the smart glasses record the encoding start time of each video frame, and after encoding, send the video frame and its encoding start time to the tablet B.
  • Tablet B receives and presents the video frame and records the decoding end time of the video frame.
• the tablet B determines the corresponding second video frame based on the screen capture operation of the user B, and determines the codec and transmission total duration information of the second video frame according to the encoding start time and the decoding end time corresponding to the second video frame.
• the tablet B sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to, the encoding start time of the second video frame, the codec and transmission total duration information of the second video frame, etc.
• the annotation obtaining module 23 is configured to acquire the labeling operation information of the second video frame by the user in real time, and the annotation sending module 24 is configured to send the labeling operation information to the first user equipment in real time. For example, the second user equipment acquires the corresponding labeling operation information in real time based on the operation of the second user, for example, collecting the corresponding labeling operation information at a certain time interval; then, the second user equipment sends the acquired labeling operation information to the first user equipment in real time.
  • the tablet computer B collects the labeling operation of the user B on the screen capture screen, for example, the user B draws a circle, an arrow, a text, a box and the like on the screen.
• Tablet B records the position and path of the labeling brush; for example, the positions corresponding to multiple points touched on the screen are obtained, and the path connecting the multiple points is taken as the label.
• Tablet B obtains the corresponding labeling operation in real time and sends it to the smart glasses in real time, for example, collecting and sending labeling operations every 50 ms.
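• The periodic collection of brush positions described above (e.g. every 50 ms) might be sketched as below; the point format, the `read_touch_point` callback, and the blocking-sleep loop are assumptions for illustration — a real client would use a UI timer and a network send per sample.

```python
import time

SAMPLE_INTERVAL_S = 0.05  # collect the labeling operation every 50 ms

def sample_brush_path(read_touch_point, duration_s, interval_s=SAMPLE_INTERVAL_S):
    """Collect (x, y) brush positions at a fixed interval; the connected
    points form the labeled path that is sent to the first user equipment."""
    path = []
    elapsed = 0.0
    while elapsed < duration_s:
        path.append(read_touch_point())  # hypothetical touch-screen callback
        time.sleep(interval_s)           # real code would use a UI timer
        elapsed += interval_s
    return path
```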
  • the device also includes a first video frame replacement module 25 (not shown).
• the first video frame replacement module 25 is configured to receive the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and to load the first video frame in a display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
• the second user equipment determines the second video frame corresponding to the current screenshot and transmits the second frame related information of the second video frame to the first user equipment; the first user equipment determines, based on the second frame related information, the unencoded first video frame corresponding to the second video frame, and sends the first video frame to the second user equipment; the second user equipment receives and presents the first video frame, and obtains the labeling operation information of the second user on the first video frame.
• the tablet B enters the screen capture mode based on the operation of the user B, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
• the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by the tablet B, and send the first video frame to the tablet B, for example, by sending the first video frame to the tablet B through lossless compression, or by sending the first video frame to the tablet B through high-quality lossy compression, where the lossy compression process guarantees a quality higher than that of the video frame locally cached by the tablet B.
  • the tablet B receives and presents the first video frame, such as being presented in the form of a small window next to the current video, or displaying the first video frame on a large screen, presenting the current video in the form of a small window, and the like. Subsequently, the tablet computer B obtains the labeling operation information and the like regarding the first video frame according to the operation of the second user.
  • the device further includes a first video frame labeling module 26 (not shown).
• the first video frame labeling module 26 is configured to receive the first video frame and the second frame related information that are sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; determine, according to the second frame related information, that the first video frame is used to replace the second video frame; and load the first video frame in a display window of the second video frame to replace the second video frame, wherein the labeling operation is displayed on the first video frame.
• the tablet B receives the unencoded first video frame and the second frame related information sent by the smart glasses, wherein the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, etc.
• the tablet B performs a plurality of screen capture operations based on the operations of the user B, and the tablet B determines the screen capture operation corresponding to the first video frame according to the second frame related information, for example, determines the screen capture operation corresponding to the first video frame according to the screenshot time in the second frame related information, and presents the second frame related information in the first video frame.
• the tablet B determines the current corresponding screen capture operation according to the second frame related information, and presents the first video frame in the form of a small window next to the current video, or displays the first video frame on a large screen and presents the current video in the form of a small window, etc.
• the tablet B presents the second frame related information in the presented first video frame, such as the screenshot time of the frame or the frame number of the frame in the video stream, etc. Subsequently, the tablet B obtains the labeling operation information regarding the first video frame according to the operation of the second user.
• the video receiving module 21 is configured to receive the video stream that is sent by the first user equipment to the second user equipment and the third user equipment, and the annotation sending module 24 is configured to send the labeling operation information to the first user equipment and the third user equipment.
  • user A holds smart glasses
  • user B holds tablet B
  • user C holds tablet C
• the smart glasses, the tablet B, and the tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to the tablet B and the tablet C, and cache a period of time or a certain number of video frames.
• after receiving and decoding, the tablet B presents the video stream, determines, based on the screen capture operation of the user B, the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and the tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and transmission total duration information, etc.
• the smart glasses receive the second frame related information of the second video frame, and determine the corresponding unencoded first video frame in the locally stored video frames based on the second frame related information.
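• The sender-side lookup described above — caching recently sent unencoded frames and retrieving one by its frame identification information — might look like this sketch. The ring-buffer capacity, the dictionary-based key, and the method names are illustrative assumptions.

```python
from collections import OrderedDict

class FrameCache:
    """Caches recently sent unencoded frames, keyed by frame number
    (the encoding start time could serve as the key instead)."""

    def __init__(self, capacity=300):  # e.g. roughly 10 s of video at 30 fps
        self.capacity = capacity
        self._frames = OrderedDict()

    def store(self, frame_number, raw_frame):
        self._frames[frame_number] = raw_frame
        if len(self._frames) > self.capacity:
            self._frames.popitem(last=False)  # evict the oldest cached frame

    def find_first_video_frame(self, second_frame_info):
        """Return the unencoded frame matching the second frame related
        information, or None if it has already been evicted."""
        return self._frames.get(second_frame_info["frame_number"])
```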
  • the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
• FIG. 10 illustrates an apparatus for real-time labeling of a video frame at a third user equipment end according to still another aspect of the present application, wherein the apparatus includes a third video receiving module 31, a third frame information receiving module 32, a third video frame determining module 33, a third annotation receiving module 34, and a third rendering module 35. The third video receiving module 31 is configured to receive the video stream that is sent by the first user equipment to the second user equipment and the third user equipment; the third frame information receiving module 32 is configured to receive the second frame related information of the second video frame intercepted by the second user equipment in the video stream; the third video frame determining module 33 is configured to determine, according to the second frame related information, the third video frame corresponding to the second video frame in the video stream; the third annotation receiving module 34 is configured to receive the labeling operation information of the second video frame by the second user equipment; and the third rendering module 35 is configured to present the corresponding labeling operation in real time on the third video frame.
  • user A holds smart glasses
  • user B holds tablet B
  • user C holds tablet C
• the smart glasses, the tablet B, and the tablet C establish video communication through a wired or wireless network; the smart glasses encode the currently collected picture, send it to the tablet B and the tablet C, and cache a period of time or a certain number of video frames.
• after receiving and decoding, the tablet B presents the video stream, determines, based on the screen capture operation of the user B, the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the smart glasses and the tablet C, wherein the second frame related information includes but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and transmission total duration information, etc.
• the smart glasses receive the second frame related information of the second video frame, and determine the corresponding unencoded first video frame in the locally stored video frames based on the second frame related information.
  • the tablet computer B generates real-time labeling operation information according to the labeling operation of the user B, and sends the labeling operation information to the smart glasses and the tablet computer C in real time.
• after receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in the preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
• the tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
  • the device also includes a preferred frame reception presentation module 36 (not shown).
• the preferred frame receiving presentation module 36 is configured to receive the first video frame sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation, and to load the first video frame in a display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
• the tablet B enters the screen capture mode based on the operation of the user B, determines the second video frame corresponding to the current picture, and sends the second frame related information of the second video frame to the smart glasses, where the second frame related information includes, but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame.
• the smart glasses determine the corresponding unencoded first video frame according to the second frame related information sent by the tablet B, and send the first video frame to the tablet B and the tablet C, for example, by sending the first video frame to the tablet B and the tablet C through lossless compression, or through high-quality lossy compression, where the lossy compression process guarantees a quality higher than that of the video frames locally cached by the tablet B and the tablet C.
  • the tablet computer C receives and presents the first video frame, such as being presented in the form of a small window next to the current video, or displaying the first video frame on a large screen, presenting the current video in the form of a small window, and the like. Subsequently, the tablet computer C receives the labeling operation information sent by the tablet computer B, and presents the labeling operation in the first video frame.
  • the device further includes a preferred frame annotation presentation module 37 (not shown).
• the preferred frame annotation presentation module 37 is configured to receive the first video frame and the second frame related information sent by the first user equipment, where the first video frame is used as a preferred frame for presenting the labeling operation; determine, according to the second frame related information, that the first video frame is used to replace the third video frame; and load the first video frame in a display window of the third video frame to replace the third video frame, wherein the labeling operation is displayed on the first video frame.
• the tablet C receives the unencoded first video frame and the second frame related information sent by the smart glasses, wherein the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, etc.
• the tablet C receives and presents the first video frame, such as being presented as a small window next to the current video, or displaying the first video frame on a large screen and presenting the current video as a small window, etc.; while presenting the first video frame, the tablet C presents the second frame related information in the presented first video frame, such as the screenshot time of the frame or the frame number of the frame in the video stream.
  • the tablet computer C receives the labeling operation information sent by the tablet computer B, and presents the labeling operation in the first video frame.
• FIG. 11 illustrates a network device for real-time annotation of a video frame according to still another aspect of the present application, wherein the device includes a video forwarding module 41, a frame information receiving module 42, a frame information forwarding module 43, an annotation receiving module 44, and an annotation forwarding module 45.
• the video forwarding module 41 is configured to receive and forward the video stream that is sent by the first user equipment to the second user equipment; the frame information receiving module 42 is configured to receive the second frame related information of the second video frame that is intercepted by the second user equipment in the video stream; the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment; the annotation receiving module 44 is configured to receive the labeling operation information of the second video frame by the second user equipment; and the annotation forwarding module 45 is configured to forward the labeling operation information to the first user equipment.
  • user A holds smart glasses
  • user B holds tablet B
  • smart glasses and tablet B communicate video through the cloud.
  • the smart glasses encode the currently collected picture and send it to the cloud, and the cloud forwards it to the tablet B.
  • the smart glasses cache a period of time or a certain number of video frames when the video is sent.
• after receiving and decoding, the tablet B presents the video stream, determines, based on the screen capture operation of the user B, the second video frame corresponding to the screen capture image, and sends the second frame related information corresponding to the second video frame to the cloud, and the cloud forwards it to the smart glasses, where the second frame related information includes, but is not limited to, the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, the second video frame codec and transmission total duration information, etc.
• the smart glasses receive the second frame related information of the second video frame, and determine the corresponding unencoded first video frame in the locally stored video frames based on the second frame related information.
  • the tablet computer B generates real-time annotation operation information according to the labeling operation of the user B, and sends the labeling operation information to the cloud, and is sent by the cloud to the smart glasses.
• after receiving the labeling operation information, the smart glasses display the corresponding unencoded first video frame in the preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
  • the video forwarding module 41 is configured to receive and forward a video stream sent by the first user equipment to the second user equipment, and frame identification information of the transmitted video frame in the video stream.
• the first user equipment performs encoding processing on the video frames, and sends the corresponding video stream and the frame identification information of the transmitted video frames in the video stream to the network device; the network device forwards the video stream and the frame identification information of the transmitted video frames to the second user equipment, wherein the frame identification information includes the encoding start time of the video frame.
• the frame information forwarding module 43 is configured to determine, according to the second frame related information, the frame identification information of the video frame corresponding to the second video frame in the video stream, and send the frame identification information of the video frame corresponding to the second video frame to the first user equipment.
  • the cloud receives the video stream sent by the smart glasses and the frame identification information of the transmitted video frames in the video stream, such as the encoding start time of each video frame.
  • the cloud forwards the video stream and the frame identification information corresponding to the transmitted video frame to the tablet B.
  • Tablet B receives and presents the video frame and records the decoding end time of the video frame.
• the tablet B determines the corresponding second video frame based on the screen capture operation of the user B, and sends the second frame related information of the second video frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame or the video number of the second video frame, etc.
• the cloud receives the second frame related information of the second video frame sent by the tablet B, and determines the frame identification information of the corresponding second video frame based on the second frame related information, for example, determines the encoding start time of the second video frame or the video number of the second video frame according to the decoding end time of the second video frame or the video number of the second video frame, etc.
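• The cloud-side mapping just described — resolving the second frame related information (a video number or a decoding end time) back to the frame identification information of a forwarded frame — could be sketched as below; the record format, the function name, and the fallback matching rule are assumptions for illustration.

```python
def resolve_frame_identification(sent_frames, second_frame_info):
    """sent_frames: list of dicts with 'frame_number' and
    'encoding_start_time_ms', recorded while forwarding the stream.
    When the second user equipment reports a video number, match it
    directly; otherwise fall back to matching on timing."""
    if "frame_number" in second_frame_info:
        for rec in sent_frames:
            if rec["frame_number"] == second_frame_info["frame_number"]:
                return rec
        return None
    # Fallback: pick the latest frame whose encoding started no later
    # than the reported decoding end time.
    target = second_frame_info["decoding_end_time_ms"]
    candidates = [r for r in sent_frames
                  if r["encoding_start_time_ms"] <= target]
    return max(candidates,
               key=lambda r: r["encoding_start_time_ms"], default=None)
```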
• the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment and the third user equipment, and the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment and the third user equipment.
  • the label forwarding module 45 is configured to forward the labeling operation information to the first user equipment and the third user equipment.
  • user A holds smart glasses
  • user B holds tablet B
  • user C holds tablet C
• the smart glasses, the tablet B, and the tablet C establish video communication through the network device; the smart glasses encode the currently collected picture, send it to the network device, and cache a period of time or a certain number of video frames, and the network device sends the video stream to the tablet B and the tablet C.
• after receiving and decoding, the tablet B presents the video stream, determines, based on the screen capture operation of the user B, the second video frame corresponding to the screen capture, and sends the second frame related information corresponding to the second video frame to the network device, where the second frame related information includes, but is not limited to, the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the second video frame codec and transmission total duration information.
• the smart glasses receive the second frame related information of the second video frame, and determine the corresponding unencoded first video frame in the locally stored video frames based on the second frame related information.
• the network device forwards the second frame related information to the first user equipment and the third user equipment.
• the tablet B generates real-time labeling operation information according to the labeling operation of the user B, and transmits the labeling operation information to the smart glasses and the tablet C through the network device in real time; the smart glasses receive the labeling operation information, display the corresponding unencoded first video frame in the preset area of the smart glasses, and present the corresponding labeling operation in real time at the position corresponding to the first video frame.
• the tablet C receives the second frame related information and the labeling operation information, finds the corresponding third video frame in the locally cached decoded video frames according to the second frame related information, and presents the corresponding labeling operation in the third video frame based on the third video frame and the labeling operation information.
• a system for real-time annotation of a video frame, comprising: the first user equipment according to any of the above embodiments, and the second user equipment according to any of the above embodiments; in other embodiments, the system further comprises the network device according to any of the above embodiments.
• a system for real-time annotation of a video frame, comprising: the first user equipment according to any of the above embodiments, the second user equipment according to any of the above embodiments, and the third user equipment according to any of the above embodiments; in other embodiments, the system further comprises the network device according to any of the above embodiments.
• the present application also provides a computer readable storage medium storing computer code, wherein the method of any of the preceding embodiments is performed when the computer code is executed.
• the present application also provides a computer program product, wherein the method of any of the preceding embodiments is performed when the computer program product is executed by a computer device.
• the application also provides a computer device, the computer device comprising: one or more processors; and a memory for storing one or more computer programs, wherein when the one or more computer programs are executed by the one or more processors, the one or more processors are caused to implement the method of any of the preceding embodiments.
  • Figure 12 illustrates an exemplary system that can be used to implement various embodiments described in this application.
  • system 300 can serve as any of the above-described devices for real-time annotation of video frames.
  • system 300 can include one or more computer readable media (eg, system memory or NVM/storage device 320) having instructions, and one or more processors (eg, processor(s) 305) coupled to the one or more computer readable media and configured to execute the instructions to implement the modules that perform the actions described in this application.
  • system control module 310 can include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or to any suitable device or component in communication with system control module 310.
  • System control module 310 can include a memory controller module 330 to provide an interface to system memory 315.
  • the memory controller module 330 can be a hardware module, a software module, and/or a firmware module.
  • System memory 315 can be used, for example, to load and store data and/or instructions for system 300.
  • system memory 315 can include any suitable volatile memory, such as a suitable DRAM.
  • system memory 315 can include double data rate type quad synchronous dynamic random access memory (DDR4 SDRAM).
  • system control module 310 can include one or more input/output (I/O) controllers to provide an interface to NVM/storage device 320 and communication interface(s) 325.
  • NVM/storage device 320 can be used to store data and/or instructions.
  • NVM/storage device 320 may comprise any suitable non-volatile memory (eg, flash memory) and/or may include any suitable non-volatile storage device(s) (eg, one or more hard disk drives (HDD), one or more compact disc (CD) drives and/or one or more digital versatile disc (DVD) drives).
  • the NVM/storage device 320 can include storage resources that are physically part of the device on which the system 300 is installed, or that can be accessed by the device without having to be part of the device.
  • NVM/storage device 320 can be accessed over a network via communication interface(s) 325.
  • the communication interface(s) 325 can provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device.
  • System 300 can wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
  • At least one of the processor(s) 305 can be packaged together with the logic of one or more controllers of the system control module 310 (eg, the memory controller module 330). For one embodiment, at least one of the processor(s) 305 can be packaged together with the logic of one or more controllers of the system control module 310 to form a system in package (SiP). For one embodiment, at least one of the processor(s) 305 can be integrated on the same die as the logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 can be integrated on the same die with the logic of one or more controllers of the system control module 310 to form a system on a chip (SoC).
  • system 300 can be, but is not limited to, a server, workstation, desktop computing device, or mobile computing device (eg, a laptop computing device, a handheld computing device, a tablet, a netbook, etc.).
  • system 300 can have more or fewer components and/or different architectures.
  • system 300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit (ASIC), and speakers.
  • the present application can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
  • the software program of the present application can be executed by a processor to implement the steps or functions described above.
  • the software programs (including related data structures) of the present application can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
  • some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
  • a portion of the present application can be embodied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide methods and/or technical solutions in accordance with the present application.
  • the computer program instructions in a computer readable medium may take forms including, but not limited to, source files, executable files, installation package files, and the like; accordingly, the manner in which the computer program instructions are executed by the computer includes, but is not limited to: the computer executing the instructions directly, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program.
  • the computer readable medium can be any available computer readable storage medium or communication medium that can be accessed by a computer.
  • Communication media include media by which information such as computer readable instructions, data structures, program modules or other data can be transferred from one system to another in the form of communication signals.
  • Communication media can include conductive transmission media such as cables and wires (eg, fiber optics, coaxial, etc.) and wireless (unguided transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared.
  • Computer readable instructions, data structures, program modules or other data may be embodied, for example, as modulated data signals in a wireless medium, such as a carrier wave or a similar mechanism (for example, as part of a spread spectrum technique).
  • a modulated data signal refers to a signal in which one or more of its characteristics are altered or set in such a manner as to encode information in the signal. The modulation may be an analog, digital, or hybrid modulation technique.
  • the computer readable storage medium may comprise, by way of example and not limitation, volatile and non-volatile media implemented in any method or technology for the storage of information such as computer readable instructions, data structures, program modules or other data.
  • computer readable storage media include, but are not limited to: volatile memory, such as random access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory and various read-only memories (ROM, PROM, EPROM, EEPROM); magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disks, tapes, CDs, DVDs); and other media, currently known or later developed, capable of storing computer readable information/data for use in computer systems.
  • an embodiment in accordance with the present application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to operate based on the aforementioned methods and/or technical solutions in accordance with various embodiments of the present application.

Abstract

The present application aims to provide a method and device for labeling video frames in real time. The method specifically comprises: sending a video stream to a second user equipment; receiving second frame related information of a second video frame captured by the second user equipment from the video stream; determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame; receiving labeling operation information applied by the second user equipment to the second video frame; and presenting the corresponding labeling operation on the first video frame in real time according to the labeling operation information. According to the present application, labeling information is overlaid directly on a video frame image at the video sender that has not been encoded or decoded; because the labeled video frame undergoes no encoding or decoding operations, its definition remains high. Furthermore, this solution displays the label in real time, so it is practical and highly interactive, and improves both the user experience and bandwidth utilization.

Description

Method and device for real-time annotation of video frames
This application claims priority to CN 201810011908.0 and CN 201810409977.7.
Technical Field
The present application relates to the field of computers, and in particular to a technique for real-time annotation of video frames.
Background
In current, widely used video streaming schemes, the sender of a video stream encodes the video according to an encoding protocol and sends it over the network to the video receiver. The receiver receives and decodes the video, takes a screenshot of the decoded video, annotates the image, and then either encodes the annotated image and sends it back over the network to the video sender, or sends the annotated image to a server from which the video sender retrieves it. The decoded annotated image received by the sender suffers some loss of definition, and the annotated image must travel from the receiver to the video sender over the network, a step whose speed depends on the current network transmission rate. Transmission is therefore slowed and subject to delay, which hinders real-time interaction between the two parties.
Summary of the Invention
An object of the present application is to provide a method and device for real-time annotation of video frames.
According to one aspect of the present application, a method for real-time annotation of video frames at a first user equipment is provided, the method comprising:
sending a video stream to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
receiving labeling operation information applied by the second user equipment to the second video frame;
presenting, according to the labeling operation information, the corresponding labeling operation on the first video frame in real time.
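By way of a non-limiting illustration, the first-user-equipment flow above can be sketched as follows: the sender caches each raw (unencoded) frame before streaming it, then resolves the receiver's reported frame ID to the cached copy and overlays the annotation on it. All class, field, and message names here are assumptions of this sketch, not part of the application.

```python
from collections import OrderedDict

class FirstUserEquipment:
    def __init__(self, cache_size=100):
        # Ring buffer of recently sent, unencoded frames keyed by frame ID.
        self.frame_cache = OrderedDict()
        self.cache_size = cache_size

    def send_frame(self, frame_id, frame_pixels):
        """Cache the raw frame before it is encoded and streamed out."""
        self.frame_cache[frame_id] = frame_pixels
        if len(self.frame_cache) > self.cache_size:
            self.frame_cache.popitem(last=False)  # evict the oldest frame
        # ... encoding and network transmission of frame_pixels omitted ...

    def on_second_frame_info(self, second_frame_info):
        """Resolve the receiver's screenshot to the locally cached raw frame."""
        return self.frame_cache.get(second_frame_info["frame_id"])

    def on_annotation(self, second_frame_info, annotation_ops):
        """Render annotation operations on the matching unencoded frame."""
        frame = self.on_second_frame_info(second_frame_info)
        if frame is None:
            return None  # frame already evicted from the cache
        # Each op could be e.g. {"type": "line", "points": [...]};
        # actual on-screen rendering is display-specific and omitted.
        return {"frame": frame, "annotations": annotation_ops}
```

Because only frame-related information and annotation operations cross the network, the annotated image itself is never re-encoded or re-transmitted.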
According to another aspect of the present application, a method for real-time annotation of video frames at a second user equipment is provided, the method comprising:
receiving a video stream sent by a first user equipment;
in response to the user's screenshot operation in the video stream, sending second frame related information of the captured second video frame to the first user equipment;
obtaining the user's labeling operation information for the second video frame;
sending the labeling operation information to the first user equipment.
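As a non-limiting sketch of the receiver side described above: on a screenshot the second user equipment records only the identifying information of the captured frame and sends it, then streams each annotation operation to the sender as it happens. The message format and the `send` transport callable are assumptions made for illustration only.

```python
class SecondUserEquipment:
    def __init__(self, send):
        self.send = send          # callable that transmits a message dict
        self.current_frame_id = None

    def on_frame_received(self, frame_id, frame_pixels):
        # Track the ID of the frame currently being displayed.
        self.current_frame_id = frame_id

    def on_screenshot(self):
        # Send only the small frame-related info, not the image itself.
        info = {"frame_id": self.current_frame_id}
        self.send({"type": "second_frame_info", "info": info})
        return info

    def on_annotation_op(self, op):
        # Forward each stroke immediately so the sender can render it live.
        self.send({"type": "annotation", "op": op,
                   "frame_id": self.current_frame_id})
```

Sending frame-related information instead of the screenshot image is what keeps the return path small and fast.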
According to yet another aspect of the present application, a method for real-time annotation of video frames at a third user equipment is provided, the method comprising:
receiving a video stream sent by a first user equipment to a second user equipment and the third user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
determining, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
receiving labeling operation information applied by the second user equipment to the second video frame;
presenting, according to the labeling operation information, the corresponding labeling operation on the third video frame in real time.
According to yet another aspect of the present application, a method for real-time annotation of video frames at a network device is provided, the method comprising:
receiving and forwarding a video stream sent by a first user equipment to a second user equipment;
receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
forwarding the second frame related information to the first user equipment;
receiving labeling operation information applied by the second user equipment to the second video frame;
forwarding the labeling operation information to the first user equipment.
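A minimal, non-limiting sketch of the network-device role above: it relays the video stream, the frame-related information, and the annotation operations between the user equipments without decoding or re-encoding anything. Routing by a device ID is an assumption made for this sketch.

```python
class RelayServer:
    def __init__(self):
        self.devices = {}  # device_id -> delivery callable

    def register(self, device_id, deliver):
        """Register a connected user equipment and its delivery channel."""
        self.devices[device_id] = deliver

    def forward(self, src_id, dst_id, message):
        """Pass the message through unchanged; the relay never re-encodes."""
        handler = self.devices.get(dst_id)
        if handler is not None:
            handler({"from": src_id, **message})
```

The same `forward` path serves the video stream, the second frame related information, and the labeling operation information alike.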
According to one aspect of the present application, a method for real-time annotation of video frames is provided, wherein the method comprises:
the first user equipment sends a video stream to the second user equipment;
the second user equipment receives the video stream and, in response to the user's screenshot operation in the video stream, sends second frame related information of the captured second video frame to the first user equipment;
the first user equipment receives the second frame related information and determines, according to it, a first video frame in the video stream corresponding to the second video frame;
the second user equipment obtains the user's labeling operation information for the second video frame and sends the labeling operation information to the first user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information.
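The two-party flow just described can be traced end to end in a self-contained, non-limiting simulation using plain dictionaries as messages; frame IDs, message shapes, and the in-memory "network" are assumptions of the sketch only.

```python
def run_two_party_flow(frames, screenshot_frame_id, annotation_ops):
    sender_cache = {}    # first UE: frame_id -> raw (unencoded) frame
    receiver_shown = {}  # second UE: frame_id -> decoded frame on screen
    events = []

    # 1. The first UE streams frames, caching the raw copy of each.
    for fid, pixels in frames:
        sender_cache[fid] = pixels
        receiver_shown[fid] = pixels  # the decoded view at the receiver
        events.append(("stream", fid))

    # 2. The second UE takes a screenshot and sends only frame-related info.
    events.append(("second_frame_info", screenshot_frame_id))

    # 3. The first UE resolves its own cached, unencoded first video frame.
    first_video_frame = sender_cache[screenshot_frame_id]

    # 4. The second UE sends each annotation op; the first UE renders it
    #    in real time on the resolved frame.
    for op in annotation_ops:
        events.append(("annotation", op))

    return {"first_video_frame": first_video_frame, "events": events}
```

Note that the annotated image itself never travels back over the network; only the small info and annotation messages do.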
According to another aspect of the present application, a method for real-time annotation of video frames is provided, wherein the method comprises:
the first user equipment sends a video stream to a network device;
the network device receives the video stream and forwards it to the second user equipment;
the second user equipment receives the video stream and, in response to the user's screenshot operation in the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment;
the first user equipment receives the second frame related information and determines, according to it, a first video frame in the video stream corresponding to the second video frame;
the second user equipment obtains the user's labeling operation information for the second video frame and sends the labeling operation information to the network device;
the network device receives the labeling operation information of the second video frame from the second user equipment and forwards it to the first user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information.
According to yet another aspect of the present application, a method for real-time annotation of video frames is provided, wherein the method comprises:
the first user equipment sends a video stream to the second user equipment and a third user equipment;
the second user equipment, in response to the user's screenshot operation in the video stream, sends second frame related information of the captured second video frame to the first user equipment and the third user equipment;
the second user equipment obtains the user's labeling operation information for the second video frame and sends the labeling operation information to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information, determines, according to it, a first video frame in the video stream corresponding to the second video frame, receives the labeling operation information, and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information;
the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the labeling operation information, determines, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame, and presents the corresponding labeling operation on the third video frame in real time according to the labeling operation information.
According to yet another aspect of the present application, a method for real-time annotation of video frames is provided, wherein the method comprises:
the first user equipment sends a video stream to a network device;
the network device receives the video stream and forwards it to the second user equipment and a third user equipment;
the second user equipment receives the video stream and, in response to the user's screenshot operation in the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information and determines, according to it, a first video frame in the video stream corresponding to the second video frame;
the second user equipment obtains the user's labeling operation information for the second video frame and sends the labeling operation information to the network device;
the network device receives the labeling operation information of the second video frame from the second user equipment and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information;
the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the labeling operation information, determines, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame, and presents the corresponding labeling operation on the third video frame in real time according to the labeling operation information.
According to one aspect of the present application, a first user equipment for real-time annotation of video frames is provided, the equipment comprising:
a video sending module for sending a video stream to a second user equipment;
a frame information receiving module for receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
a video frame determining module for determining, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
an annotation receiving module for receiving labeling operation information applied by the second user equipment to the second video frame;
an annotation presenting module for presenting, according to the labeling operation information, the corresponding labeling operation on the first video frame in real time.
According to another aspect of the present application, a second user equipment for real-time annotation of video frames is provided, the equipment comprising:
a video receiving module for receiving a video stream sent by a first user equipment;
a frame information determining module for sending, in response to the user's screenshot operation in the video stream, second frame related information of the captured second video frame to the first user equipment;
an annotation obtaining module for obtaining the user's labeling operation information for the second video frame;
an annotation sending module for sending the labeling operation information to the first user equipment.
According to yet another aspect of the present application, a third user equipment for real-time annotation of video frames is provided, the equipment comprising:
a third video receiving module for receiving a video stream sent by a first user equipment to a second user equipment and the third user equipment;
a third frame information receiving module for receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
a third video frame determining module for determining, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
a third annotation receiving module for receiving labeling operation information applied by the second user equipment to the second video frame;
a third presenting module for presenting, according to the labeling operation information, the corresponding labeling operation on the third video frame in real time.
According to yet another aspect of the present application, a network device for real-time annotation of video frames is provided, the device comprising:
a video forwarding module for receiving and forwarding a video stream sent by a first user equipment to a second user equipment;
a frame information receiving module for receiving second frame related information of a second video frame captured by the second user equipment from the video stream;
a frame information forwarding module for forwarding the second frame related information to the first user equipment;
an annotation receiving module for receiving labeling operation information applied by the second user equipment to the second video frame;
an annotation forwarding module for forwarding the labeling operation information to the first user equipment.
According to one aspect of the present application, a system for real-time annotation of video frames is provided, the system comprising the first user equipment as described above and the second user equipment as described above.
According to another aspect of the present application, a system for real-time annotation of video frames is also provided, comprising the first user equipment as described above, the second user equipment as described above, and the network device as described above.
According to one aspect of the present application, a system for real-time annotation of video frames is provided, the system comprising the first user equipment as described above, the second user equipment as described above, and the third user equipment as described above.
According to one aspect of the present application, a system for real-time annotation of video frames is provided, the system comprising the first user equipment as described above, the second user equipment as described above, the third user equipment as described above, and the network device as described above.
According to one aspect of the present application, a computer readable medium comprising instructions is provided, the instructions, when executed, causing a system to:
send a video stream to a second user equipment;
receive second frame related information of a second video frame captured by the second user equipment from the video stream;
determine, according to the second frame related information, a first video frame in the video stream corresponding to the second video frame;
receive labeling operation information applied by the second user equipment to the second video frame;
present, according to the labeling operation information, the corresponding labeling operation on the first video frame in real time.
According to another aspect of the present application, a computer readable medium comprising instructions is also provided, the instructions, when executed, causing a system to:
receive a video stream sent by a first user equipment;
in response to the user's screenshot operation in the video stream, send second frame related information of the captured second video frame to the first user equipment;
obtain the user's labeling operation information for the second video frame;
send the labeling operation information to the first user equipment.
According to one aspect of the present application, a computer readable medium comprising instructions is provided, the instructions, when executed, causing a system to:
receive a video stream sent by a first user equipment to a second user equipment and a third user equipment;
receive second frame related information of a second video frame captured by the second user equipment from the video stream;
determine, according to the second frame related information, a third video frame in the video stream corresponding to the second video frame;
receive labeling operation information applied by the second user equipment to the second video frame;
present, according to the labeling operation information, the corresponding labeling operation on the third video frame in real time.
According to yet another aspect of the present application, a computer readable medium comprising instructions is also provided, the instructions, when executed, causing a system to:
receive and forward a video stream sent by a first user equipment to a second user equipment;
receive second frame related information of a second video frame captured by the second user equipment from the video stream;
forward the second frame related information to the first user equipment;
receive labeling operation information applied by the second user equipment to the second video frame;
forward the labeling operation information to the first user equipment.
与现有技术相比，本申请通过在视频发送方缓存一定的视频帧，根据视频接收方的截屏与对应视频帧相关信息，确定视频发送方的未经编解码的视频帧图像，并将视频接收方在截图上的标注信息实时传送给视频发送方。视频发送方对应的视频帧图像上实时显示该标注，因而发送方可以实时观察视频接收方的标注过程，由于所标注的视频帧未经编解码等操作，因而清晰度高；进一步地，本方案还能实现标注的实时展示，实用性好、互动性强、提高了用户体验和宽带利用率。而且，视频发送方可以在确定未经编解码的视频帧后将该视频帧发送至视频接收方，视频接收方也可以通过高质量的视频帧进行标注，大大提升了用户的使用体验。Compared with the prior art, the present application buffers a certain number of video frames at the video sender, determines the sender's uncoded video frame image from the receiver's screenshot and the related information of the corresponding video frame, and transmits the receiver's annotation on the screenshot to the sender in real time. The annotation is displayed in real time on the corresponding video frame image at the sender, so the sender can observe the receiver's annotation process as it happens; because the annotated video frame has not undergone encoding, decoding, or similar operations, its definition is high. Further, the scheme enables real-time display of annotations, with good practicability, strong interactivity, and improved user experience and bandwidth utilization. Moreover, after determining the uncoded video frame, the sender may send that frame to the receiver, so the receiver can also annotate a high-quality frame, greatly improving the user experience.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
通过阅读参照以下附图所作的对非限制性实施例所作的详细描述，本申请的其它特征、目的和优点将会变得更明显：Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments made with reference to the accompanying drawings:
图1示出根据本申请一个实施例的一种用于对视频帧进行实时标注的系统拓扑图;1 shows a system topology diagram for real-time annotation of video frames in accordance with an embodiment of the present application;
图2示出根据本申请一个方面的一种在第一用户设备端用于对视频帧进行实时标注的方法流程图;2 shows a flow chart of a method for real-time annotation of video frames at a first user equipment end according to an aspect of the present application;
图3示出根据本申请另一个方面的一种在第二用户设备端用于对视频帧进行实时标注的方法流程图;3 is a flow chart showing a method for real-time annotation of video frames on a second user equipment side according to another aspect of the present application;
图4示出根据本申请又一个方面的一种在第三用户设备端用于对视频帧进行实时标注的方法流程图;4 is a flowchart of a method for real-time annotation of a video frame at a third user equipment end according to still another aspect of the present application;
图5示出根据本申请又一个方面的一种在网络设备端用于对视频帧进行实时标注的方法流程图；FIG. 5 is a flowchart of a method for real-time annotation of video frames on a network device side according to still another aspect of the present application;
图6示出根据本申请一个方面的一种用于对视频帧进行实时标注的系统方法图;6 shows a system method diagram for real-time annotation of video frames in accordance with an aspect of the present application;
图7示出根据本申请另一个方面的一种用于对视频帧进行实时标注的系统方法图;7 shows a system method diagram for real-time annotation of video frames in accordance with another aspect of the present application;
图8示出根据本申请的一个方面的一种用于对视频帧进行实时标注的第一用户设备示意图；FIG. 8 shows a schematic diagram of a first user equipment for real-time annotation of video frames in accordance with an aspect of the present application;
图9示出根据本申请的另一个方面的一种用于对视频帧进行实时标注的第二用户设备示意图;9 shows a schematic diagram of a second user equipment for real-time annotation of video frames in accordance with another aspect of the present application;
图10示出根据本申请的另一个方面的一种用于对视频帧进行实时标注的第三用户设备示意图;10 shows a schematic diagram of a third user equipment for real-time annotation of video frames in accordance with another aspect of the present application;
图11示出根据本申请的又一个方面的一种用于对视频帧进行实时标注的网络设备示意图;11 shows a schematic diagram of a network device for real-time annotation of video frames in accordance with still another aspect of the present application;
图12示出可被用于实施本申请中所述的各个实施例的示例性系统。FIG. 12 illustrates an exemplary system that can be used to implement various embodiments described in this application.
附图中相同或相似的附图标记代表相同或相似的部件。The same or similar reference numerals in the drawings denote the same or similar components.
具体实施方式DETAILED DESCRIPTION
下面结合附图对本申请作进一步详细描述。The present application is further described in detail below with reference to the accompanying drawings.
在本申请一个典型的配置中,终端、服务网络的设备和可信方均包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体，可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。Computer readable media include permanent and non-permanent, removable and non-removable media; information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape/disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
本申请所指设备包括但不限于用户设备、网络设备、或用户设备与网络设备通过网络相集成所构成的设备。所述用户设备包括但不限于任何一种可与用户进行人机交互(例如通过触摸板进行人机交互)的移动电子产品，例如智能手机、平板电脑等，所述移动电子产品可以采用任意操作系统，如android操作系统、iOS操作系统等。其中，所述网络设备包括一种能够按照事先设定或存储的指令，自动进行数值计算和信息处理的电子设备，其硬件包括但不限于微处理器、专用集成电路(ASIC)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、数字信号处理器(DSP)、嵌入式设备等。所述网络设备包括但不限于计算机、网络主机、单个网络服务器、多个网络服务器集或多个服务器构成的云；在此，云由基于云计算(Cloud Computing)的大量计算机或网络服务器构成，其中，云计算是分布式计算的一种，由一群松散耦合的计算机集组成的一个虚拟超级计算机。所述网络包括但不限于互联网、广域网、城域网、局域网、VPN网络、无线自组织网络(Ad Hoc网络)等。优选地，所述设备还可以是运行于所述用户设备、网络设备、或用户设备与网络设备、网络设备、触摸终端或网络设备与触摸终端通过网络相集成所构成的设备上的程序。The device referred to in the present application includes, but is not limited to, a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example, via a touchpad), such as a smart phone or a tablet computer; the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network (Ad Hoc network), and the like. Preferably, the device may also be a program running on the user equipment, the network device, or a device formed by integrating the user equipment with the network device, the touch terminal, or the network device with the touch terminal through a network.
当然，本领域技术人员应能理解上述设备仅为举例，其他现有的或今后可能出现的设备如可适用于本申请，也应包含在本申请保护范围以内，并在此以引用方式包含于此。Of course, those skilled in the art should understand that the above devices are merely examples; other existing or future devices, if applicable to the present application, are also included within the scope of protection of the present application and are hereby incorporated herein by reference.
在本申请的描述中,“多个”的含义是两个或者更多,除非另有明确具体的限定。In the description of the present application, the meaning of "plurality" is two or more unless specifically defined otherwise.
图1示出了本申请的一个典型场景，第一用户设备在与第二用户设备以及第三用户设备进行视频通讯的同时，接收关于第二用户设备发送的标注信息，并在本地调取存储的未经编解码的视频帧，实时呈现该标注信息。其中，该过程可以由第一用户设备与第二用户设备交互完成，也可以由第一用户设备、第二用户设备和网络设备配合完成，还可以由第一用户设备、第二用户设备以及第三用户设备完成，还可以由第一用户设备、第二用户设备、第三用户设备以及网络设备配合完成。此处，第一用户设备、第二用户设备以及第三用户设备为可以录制和发送视频的任何电子设备，如智能眼镜、手机、平板电脑、笔记本、智能手表等，此处以第一用户设备为智能眼镜，第二用户设备、第三用户设备为平板电脑为例阐述以下实施例，本领域技术人员应能理解，该等实施例同样适用手机、笔记本、智能手表等其它用户设备。FIG. 1 shows a typical scenario of the present application: while in video communication with the second user equipment and the third user equipment, the first user equipment receives annotation information sent by the second user equipment, retrieves the locally stored uncoded video frame, and presents the annotation information on it in real time. The process may be completed by the first user equipment interacting with the second user equipment; by the first user equipment, the second user equipment, and the network device in cooperation; by the first, second, and third user equipments; or by the first user equipment, the second user equipment, the third user equipment, and the network device in cooperation. Here, the first, second, and third user equipments may be any electronic devices capable of recording and sending video, such as smart glasses, mobile phones, tablet computers, notebooks, and smart watches. The following embodiments are described taking the first user equipment as smart glasses and the second and third user equipments as tablet computers; those skilled in the art should understand that the embodiments are equally applicable to other user equipments such as mobile phones, notebooks, and smart watches.
图2示出了根据本申请一个方面的一种在第一用户设备端对视频帧进行实时标注的方法，其中，该方法包括步骤S11、步骤S12、步骤S13、步骤S14和步骤S15。在步骤S11中，第一用户设备端向第二用户设备发送视频流；在步骤S12中，第一用户设备接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息；在步骤S13中，第一用户设备根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧；在步骤S14中，第一用户设备接收所述第二用户设备对所述第二视频帧的标注操作信息；在步骤S15中，第一用户设备根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。FIG. 2 illustrates a method for real-time annotation of video frames at the first user equipment end according to an aspect of the present application, where the method includes steps S11, S12, S13, S14, and S15. In step S11, the first user equipment sends a video stream to the second user equipment; in step S12, the first user equipment receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream; in step S13, the first user equipment determines, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream; in step S14, the first user equipment receives the annotation operation information of the second user equipment on the second video frame; in step S15, the first user equipment presents the corresponding annotation operation in real time on the first video frame according to the annotation operation information.
具体而言,在步骤S11中,第一用户设备端向第二用户设备发送视频流。例如,第一用户设备通过有线或无线网络与第二用户设备建立通讯连接,第一用户设备通过视频通讯方式将视频流编码后发送至第二用户设备。Specifically, in step S11, the first user equipment end sends a video stream to the second user equipment. For example, the first user equipment establishes a communication connection with the second user equipment through a wired or wireless network, and the first user equipment encodes the video stream to the second user equipment by using a video communication manner.
在步骤S12中，第一用户设备接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息。例如，第二用户设备基于第二用户的截屏操作，确定截屏画面对应的视频帧的第二帧相关信息，随后，第一用户设备接收第二用户设备发送的第二视频帧的第二帧相关信息，其中，第二帧相关信息包括但不限于：第二视频帧标识信息、第二视频帧编码起始时刻、第二视频帧解码结束时刻以及第二视频帧编解码及传输总时长信息等。In step S12, the first user equipment receives the second frame related information of the second video frame intercepted by the second user equipment in the video stream. For example, based on the second user's screenshot operation, the second user equipment determines the second frame related information of the video frame corresponding to the screenshot; the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, where the second frame related information includes, but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the total codec-and-transmission duration of the second video frame.
在步骤S13中，第一用户设备根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧。例如，第一用户设备在本地存储了一段时长或一定数量的已发送的未经编码的视频帧，第一用户设备根据第二用户设备发送的第二帧相关信息在本地存储的未经编码的视频帧中，确定截屏画面对应的未经编码的第一视频帧。In step S13, the first user equipment determines, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream. For example, the first user equipment locally stores a period of time or a certain number of sent, unencoded video frames; according to the second frame related information sent by the second user equipment, the first user equipment determines, among the locally stored unencoded video frames, the unencoded first video frame corresponding to the screenshot.
在步骤S14中，第一用户设备接收所述第二用户设备对所述第二视频帧的标注操作信息。例如，第二用户设备基于第二用户的标注操作生成对应的标注操作信息，并将该标注操作信息实时发送至第一用户设备，第一用户设备接收该标注操作信息。In step S14, the first user equipment receives the annotation operation information of the second user equipment on the second video frame. For example, the second user equipment generates corresponding annotation operation information based on the second user's annotation operation and sends it to the first user equipment in real time, and the first user equipment receives the annotation operation information.
在步骤S15中，第一用户设备根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。例如，第一用户设备基于接收到的标注操作信息，在第一视频帧上实时呈现对应的标注操作，如在当前界面中以小窗的形式显示第一视频帧，再在第一视频帧对应的位置以每50ms一帧的速率呈现对应的标注操作。In step S15, the first user equipment presents the corresponding annotation operation in real time on the first video frame according to the annotation operation information. For example, based on the received annotation operation information, the first user equipment presents the corresponding annotation operation on the first video frame in real time, e.g. displaying the first video frame in a small window in the current interface and then rendering the corresponding annotation operation at the corresponding position on the first video frame at a rate of one frame every 50 ms.
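Steps S11 through S15 can be sketched from the sender's side as follows. This is a minimal, non-limiting illustration: the class, field, and message names (`AnnotationSender`, `frame_id`, `annotations`, etc.) are assumptions for exposition, not part of the disclosed method.

```python
from collections import OrderedDict

class AnnotationSender:
    """Hypothetical sketch of the first-user-equipment side: cache raw
    (unencoded) frames on send (S11), look one up when the receiver
    reports its screenshot (S12/S13), then replay annotation operations
    on it in real time (S14/S15)."""

    def __init__(self):
        self.cache = OrderedDict()  # frame_id -> raw (unencoded) frame
        self.first_frame = None     # the "first video frame" under annotation

    def send_frame(self, frame_id, raw_frame):
        # S11: encoding/transmission happens elsewhere; keep the raw copy
        self.cache[frame_id] = raw_frame

    def on_frame_info(self, frame_info):
        # S12/S13: match the receiver's second-frame info to a cached frame
        self.first_frame = self.cache.get(frame_info["frame_id"])
        return self.first_frame

    def on_annotation(self, op):
        # S14/S15: apply the received annotation operation immediately
        if self.first_frame is not None:
            self.first_frame["annotations"].append(op)
        return self.first_frame
```

In use, the sender would call `send_frame` for every transmitted frame, `on_frame_info` once when the receiver's screenshot message arrives, and `on_annotation` for each incremental annotation event.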
例如，用户甲持有智能眼镜，用户乙持有平板电脑乙，智能眼镜与平板电脑乙通过有线或无线网络建立了视频通讯，智能眼镜将当前采集的画面编码后发送至平板电脑乙，并缓存一段时长或一定数量的视频帧。平板电脑乙端接收并解码后呈现该视频流，基于用户乙的截屏操作，确定该截屏画面对应的第二视频帧，并将该第二视频帧对应的第二帧相关信息发送至智能眼镜，其中，第二帧相关信息包括但不限于：第二视频帧标识信息、第二视频帧编码起始时刻、第二视频帧解码结束时刻以及第二视频帧编解码及传输总时长信息等。智能眼镜接收该第二视频帧的第二帧相关信息，并基于该第二帧相关信息，在本地存储的视频帧中确定对应的未经编解码的第一视频帧。平板电脑乙根据用户乙的标注操作生成实时的标注操作信息，并将标注操作信息实时发送至智能眼镜，智能眼镜接收该标注操作信息后，在智能眼镜的预设区域显示对应的未经编解码的第一视频帧，并在该第一视频帧对应的位置实时呈现对应的标注操作。For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B establish video communication over a wired or wireless network. The smart glasses encode the currently captured images, send them to tablet B, and buffer a period of time or a certain number of video frames. Tablet B receives, decodes, and presents the video stream; based on user B's screenshot operation, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the smart glasses, where the second frame related information includes, but is not limited to: the second video frame identification information, the second video frame encoding start time, the second video frame decoding end time, and the total codec-and-transmission duration of the second video frame. The smart glasses receive the second frame related information and, based on it, determine the corresponding uncoded first video frame among the locally stored video frames. Tablet B generates real-time annotation operation information according to user B's annotation operation and sends it to the smart glasses in real time; after receiving the annotation operation information, the smart glasses display the corresponding uncoded first video frame in a preset area and present the corresponding annotation operation in real time at the corresponding position on that frame.
本领域技术人员应能理解，上述实施例中第二帧相关信息的内容仅为举例，现有技术中或未来出现的其他第二帧相关信息的内容，如适用于本申请，则也应属于本申请的保护范围，故在此以引用的方式包含于此。Those skilled in the art should understand that the content of the second frame related information in the above embodiment is merely an example; the content of other second frame related information, existing or future, if applicable to the present application, also falls within the scope of protection of the present application and is hereby incorporated by reference.
在一些实施例中，该方法还包括步骤S16（未示出）。在步骤S16中，第一用户设备存储所述视频流中的视频帧；其中，在步骤S13中，第一用户设备根据所述第二帧相关信息从所存储的视频帧中确定与所述第二视频帧相对应的第一视频帧。例如，第一用户设备将视频流发送至第二用户设备，并在本地存储一段时长或一定数量的未经编解码的视频帧，其中，一段时长或一定数量可以是预设的固定值，也可以是根据网络状况或传输速率进行动态调整的阈值；随后，第一用户设备基于第二用户设备发送的第二视频帧的第二帧相关信息在本地存储的视频帧中确定对应的未经编解码的第一视频帧。在另一些实施例中，所述所存储的视频帧满足但不限于以下至少任一项：所存储的视频帧的发送时间与当前时间的时间间隔小于或等于视频帧存储时长阈值；所存储的视频帧的累计数量小于或等于预定的视频帧存储数量阈值。In some embodiments, the method further includes step S16 (not shown). In step S16, the first user equipment stores the video frames in the video stream; in step S13, the first user equipment determines, from the stored video frames, the first video frame corresponding to the second video frame according to the second frame related information. For example, the first user equipment sends the video stream to the second user equipment and locally stores a period of time or a certain number of uncoded video frames, where the duration or number may be a preset fixed value or a threshold dynamically adjusted according to network conditions or the transmission rate; the first user equipment then determines, among the locally stored video frames, the corresponding uncoded first video frame based on the second frame related information of the second video frame sent by the second user equipment. In other embodiments, the stored video frames satisfy, but are not limited to, at least one of the following: the interval between a stored frame's sending time and the current time is less than or equal to the video frame storage duration threshold; the accumulated number of stored frames is less than or equal to a predetermined video frame storage count threshold.
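Such a bounded sender-side frame store, constrained by both a storage duration threshold and a storage count threshold, can be sketched as follows. The class and parameter names are illustrative assumptions; the disclosure does not prescribe a particular data structure.

```python
from collections import deque

class FrameBuffer:
    """Minimal sketch of the sender-side store of unencoded frames:
    keeps only frames whose age is within max_age seconds (the storage
    duration threshold) and whose total count is within max_count (the
    storage count threshold)."""

    def __init__(self, max_age, max_count):
        self.max_age = max_age
        self.max_count = max_count
        self.frames = deque()  # (send_time, frame_id, raw_frame)

    def store(self, now, frame_id, raw_frame):
        self.frames.append((now, frame_id, raw_frame))
        self.evict(now)

    def evict(self, now):
        # drop frames older than the duration threshold
        while self.frames and now - self.frames[0][0] > self.max_age:
            self.frames.popleft()
        # drop oldest frames beyond the count threshold
        while len(self.frames) > self.max_count:
            self.frames.popleft()

    def lookup(self, frame_id):
        # find the cached raw frame matching the receiver's screenshot
        for _, fid, raw in self.frames:
            if fid == frame_id:
                return raw
        return None
```

Eviction on every insertion keeps memory bounded while guaranteeing that any screenshot taken within the threshold window can still be matched to its raw frame.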
智能眼镜将采集到的画面发送至平板电脑乙，并在本地存储一段时长或一定数量的未经编解码的视频帧，其中，一段时长或一定数量可以是系统或人工预设的固定值，如通过大数据统计分析得到的一定时长或一定数量的视频帧阈值；一段时长或一定数量的视频帧也可以是根据网络状况或传输速率进行动态调整的视频帧阈值。其中，动态调整的时长或数量阈值可以根据该视频帧的编解码及传输总时长信息确定，如计算当前视频帧的编解码及传输总时长，将该时长作为一个单位时长或者将该时长内可传输的视频帧数量作为一个单位数量，再以该当前单位时长或单位数量为参考设定动态的视频帧时长或数量阈值。此处，设定的预定或动态的视频帧存储时长阈值应大于或等于一个单位时长，同理，设定的预定或动态的视频帧存储数量阈值应大于或等于一个单位数量。随后，智能眼镜根据平板电脑乙发送的第二视频帧的第二帧相关信息在存储的视频中确定对应的未经编解码的第一视频帧，其中，所存储的视频帧的发送时间与当前时间的间隔小于或等于视频帧存储时长阈值，或所存储的视频帧的累计数量小于或等于预定的视频帧存储数量阈值。The smart glasses send the captured images to tablet B and locally store a period of time or a certain number of uncoded video frames, where the duration or number may be a fixed value preset by the system or manually, such as a duration or frame-count threshold obtained by statistical analysis of big data; the duration or number may also be a threshold dynamically adjusted according to network conditions or the transmission rate. The dynamically adjusted duration or count threshold may be determined from the total codec-and-transmission duration of the video frame: for example, compute the total codec-and-transmission time of the current frame, take that time as one unit duration, or take the number of frames transmittable within it as one unit count, and then set the dynamic duration or count threshold with reference to that unit. Here, the predetermined or dynamic video frame storage duration threshold should be greater than or equal to one unit duration; likewise, the predetermined or dynamic video frame storage count threshold should be greater than or equal to one unit count. The smart glasses then determine, among the stored video frames, the corresponding uncoded first video frame according to the second frame related information of the second video frame sent by tablet B, where the interval between a stored frame's sending time and the current time is less than or equal to the video frame storage duration threshold, or the accumulated number of stored frames is less than or equal to the predetermined video frame storage count threshold.
在一些实施例中，该方法还包括步骤S17（未示出）。在步骤S17中，第一用户设备获取所述视频流中视频帧的编解码及传输总时长信息，并根据所述编解码及传输总时长信息调整所述视频帧存储时长阈值或所述视频帧存储数量阈值。例如，第一用户设备记录各视频帧的编码起始时刻，编码后将视频帧发送至第二用户设备，第二用户设备接收并记录各视频帧解码结束时刻；随后，第二用户设备将该视频帧解码结束时刻发送至第一用户设备，第一用户设备基于编码起始时刻与解码结束时刻计算当前视频帧的编解码及传输总时长信息，或者，第二用户设备基于编码起始时刻与解码结束时刻计算当前视频帧的编解码及传输总时长信息，并将该编解码及传输总时长信息发送至第一用户设备。第一用户设备基于该编解码及传输总时长信息调整所述视频帧存储时长阈值或所述视频帧存储数量阈值，如将该时长信息作为一个单位时间参考，设定一定倍数的视频帧时长为视频帧存储时长阈值；又如，根据该时长信息及第一用户设备发送视频帧的速率计算该时长信息内能够发送的视频帧的数量，将该数量作为单位数量，设定一定倍数的视频帧数量作为视频帧存储数量阈值。In some embodiments, the method further includes step S17 (not shown). In step S17, the first user equipment acquires the total codec-and-transmission duration of video frames in the video stream, and adjusts the video frame storage duration threshold or the video frame storage count threshold according to that information. For example, the first user equipment records the encoding start time of each video frame and sends the encoded frame to the second user equipment, which receives the frame and records its decoding end time; the second user equipment then sends the decoding end time back, and the first user equipment computes the total codec-and-transmission duration of the current frame from the encoding start time and the decoding end time; alternatively, the second user equipment computes that duration and sends it to the first user equipment. The first user equipment adjusts the storage duration threshold or storage count threshold based on this duration: for example, taking the duration as one unit of time, it sets a certain multiple of it as the storage duration threshold; or, from the duration and the rate at which the first user equipment sends video frames, it computes the number of frames that can be sent within the duration, takes that number as one unit, and sets a certain multiple of it as the storage count threshold.
例如，智能眼镜记录第i个视频帧编码起始时刻为T_si，编码后将该视频帧发送至平板电脑乙，平板电脑乙接收并记录该视频帧解码结束时刻为T_ei。随后，平板电脑乙将该视频帧解码结束时刻T_ei发送至智能眼镜，智能眼镜根据接收到的第i个视频帧的解码结束时刻T_ei及在本地记录的编码起始时刻T_si，计算该视频帧的编解码及传输总时长T_i=T_ei-T_si；或者，智能眼镜可以将编码起始时刻T_si随视频帧发送至平板电脑乙，平板电脑乙基于解码结束时刻T_ei计算该视频帧的编解码及传输总时长T_i=T_ei-T_si，并将该编解码及传输总时长T_i返回至智能眼镜。For example, the smart glasses record the encoding start time of the i-th video frame as T_si and send the encoded frame to tablet B, which receives the frame and records its decoding end time as T_ei. Tablet B then sends the decoding end time T_ei to the smart glasses, and the smart glasses compute the frame's total codec-and-transmission duration T_i = T_ei - T_si from the received decoding end time T_ei and the locally recorded encoding start time T_si. Alternatively, the smart glasses may send the encoding start time T_si together with the video frame to tablet B; tablet B computes the total codec-and-transmission duration T_i = T_ei - T_si based on the decoding end time T_ei and returns T_i to the smart glasses.
智能眼镜根据第i个视频帧的编解码及传输总时长T_i，根据大数据统计确定智能眼镜动态保存1.3T_i时间内的视频帧。或者，根据网络传输速率动态调整倍率，如设定缓存时长阈值为(1+k)T_i，其中，k为根据网络波动调整的阈值，如网络波动较大时，将k设置为0.5，网络波动较小时，将k设置为0.2等。又如，智能眼镜根据第i个视频帧的编解码及传输总时长T_i，并根据当前智能眼镜发送视频帧的频率f，计算一个时长T_i内传输的视频帧的数量N=T_i*f，并进一步确定保存的视频帧数量阈值为1.3N，其中，N为采用进一法取整得到的数值。进一步地，智能眼镜可以根据当前网络传输速率动态调整倍率，如设定缓存数量阈值为(1+k)N，其中，k为根据网络波动调整的阈值，如网络波动较大时，将k设置为0.5，网络波动较小时，将k设置为0.2等。Based on the i-th frame's total codec-and-transmission duration T_i, the smart glasses may, according to big data statistics, dynamically retain the video frames falling within a window of 1.3T_i. Alternatively, the multiplier may be adjusted dynamically according to the network transmission rate, e.g. the buffering duration threshold is set to (1+k)T_i, where k is a factor adjusted according to network fluctuation: k is set to 0.5 when fluctuation is large and to 0.2 when it is small. As another example, from the i-th frame's total codec-and-transmission duration T_i and the current frequency f at which the smart glasses send video frames, the smart glasses compute the number of frames transmitted within one duration T_i as N = T_i * f, and further set the threshold on the number of stored frames to 1.3N, where N is rounded up to an integer. Further, the smart glasses may dynamically adjust the multiplier according to the current network transmission rate, e.g. set the buffer count threshold to (1+k)N, with k again set to 0.5 under large network fluctuation and 0.2 under small fluctuation.
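The threshold computation described above can be written out concretely as follows. The fluctuation factor k and sending frequency f come from the text; the function name and signature are assumptions for illustration.

```python
import math

def dynamic_thresholds(t_start, t_end, frame_rate, k):
    """Compute the dynamic buffering thresholds: T_i = T_ei - T_si is
    the measured codec-and-transmission time of frame i; the storage
    duration threshold is (1 + k) * T_i; the storage count threshold is
    (1 + k) * N, where N = ceil(T_i * frame_rate) is the number of
    frames sendable within one T_i (rounded up, as in the text)."""
    t_i = t_end - t_start                    # T_i = T_ei - T_si
    duration_threshold = (1 + k) * t_i
    n = math.ceil(t_i * frame_rate)          # frames sendable within T_i
    count_threshold = math.ceil((1 + k) * n)
    return duration_threshold, count_threshold
```

For instance, with T_i = 2 s, f = 25 frames/s, and k = 0.5 (large network fluctuation), this yields a 3 s duration threshold and a 75-frame count threshold.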
本领域技术人员应能理解，上述实施例中存储时长阈值和/或存储数量阈值的内容仅为举例，现有技术中或未来出现的其他存储时长阈值和/或存储数量阈值的内容，如适用于本申请，则也应属于本申请的保护范围，故在此以引用的方式包含于此。Those skilled in the art should understand that the storage duration threshold and/or storage count threshold in the above embodiments are merely examples; other such thresholds, existing or future, if applicable to the present application, also fall within the scope of protection of the present application and are hereby incorporated by reference.
在一些实施例中，在步骤S11中，第一用户设备向第二用户设备发送视频流及所述视频流中已发送视频帧的帧标识信息；其中，在步骤S13中，第一用户设备根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧，其中所述第一视频帧的帧标识信息与所述第二帧相关信息相对应。其中，视频帧的帧标识信息可以是视频帧对应的编解码时间，还可以是该视频帧对应的编号等。在一些实施例中，在步骤S11中，第一用户设备对多个待传输的视频帧进行编码处理，并将对应视频流及所述视频流中已发送视频帧的帧标识信息发送至第二用户设备。例如，第一用户设备对多个待传输的视频帧进行编码处理，并获取该多个待传输的视频帧的编码起始时刻，将该多个视频帧及其编码起始时刻发送至第二用户设备。在一些实施例中，所述视频流中已发送视频帧的帧标识信息包括该已发送视频帧的编码起始时刻信息。In some embodiments, in step S11, the first user equipment sends to the second user equipment the video stream and the frame identification information of the video frames already sent in the video stream; in step S13, the first user equipment determines, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream, where the frame identification information of the first video frame corresponds to the second frame related information. The frame identification information of a video frame may be the codec time corresponding to the frame, or a sequence number corresponding to the frame, etc. In some embodiments, in step S11, the first user equipment encodes a plurality of video frames to be transmitted and sends the corresponding video stream and the frame identification information of the sent video frames to the second user equipment. For example, the first user equipment encodes a plurality of video frames to be transmitted, obtains their encoding start times, and sends the frames together with their encoding start times to the second user equipment. In some embodiments, the frame identification information of a sent video frame includes the encoding start time of that frame.
例如，智能眼镜记录各视频帧的编码起始时刻，编码后将视频帧和已发送的视频帧的编码起始时刻发送至平板电脑乙，其中，已发送视频帧包括当前编码完成即将发送的视频帧和已发送的视频帧。此处，智能眼镜可以是间隔一定时间或者间隔一定视频帧发送数量，将已发送的视频帧的编码起始时刻发送至平板电脑乙，也可以直接将第一视频帧的编码起始时刻与该视频帧同时发送至平板电脑乙。平板电脑乙基于用户乙的截屏操作，确定截屏画面对应的视频帧，并将对应的第二视频帧的第二帧相关信息发送至智能眼镜，其中，第二帧相关信息与第二帧标识信息相对应，包括但不限于以下至少任一项：第二视频帧的编码起始时刻、第二视频帧解码结束时刻、第二视频帧编解码及传输总时长信息、第二视频帧对应编号或图像等。智能眼镜接收该第二帧相关信息，并根据第二帧相关信息确定对应存储的未经编码的第一视频帧，如根据第二视频帧的编码起始时刻、第二视频帧解码结束时刻、第二视频帧编解码及传输总时长信息等确定第二视频帧对应的未经编码的第一视频帧的编码起始时刻进而确定对应的第一视频帧，又如通过第二视频帧对应的编号直接确定相同编号的第一视频帧，还如通过对第二视频帧的图像识别在存储的未经编码视频帧中确定对应的第一视频帧。For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames together with the encoding start times of the sent frames to tablet B, where the sent frames include the frame just encoded and about to be sent as well as previously sent frames. Here, the smart glasses may send the encoding start times of sent frames to tablet B at a fixed time interval or after a fixed number of sent frames, or may directly send each frame's encoding start time together with that frame. Based on user B's screenshot operation, tablet B determines the video frame corresponding to the screenshot and sends the second frame related information of the corresponding second video frame to the smart glasses, where the second frame related information corresponds to the frame identification information and includes, but is not limited to, at least one of: the encoding start time of the second video frame, the decoding end time of the second video frame, the total codec-and-transmission duration of the second video frame, the sequence number of the second video frame, or its image. The smart glasses receive the second frame related information and determine the correspondingly stored uncoded first video frame according to it: for example, determining from the second frame's encoding start time, decoding end time, or total codec-and-transmission duration the encoding start time of the uncoded first video frame corresponding to the second video frame, and thereby locating the first video frame; or directly locating the first video frame bearing the same number via the second video frame's sequence number; or determining the corresponding first video frame among the stored uncoded frames through image recognition on the second video frame.
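The matching step can be sketched as a lookup keyed by the identifier options listed above. The dictionary layout and field names (`encode_start`, `frame_number`) are assumptions chosen for illustration, not part of the disclosure.

```python
def find_first_frame(stored, frame_info):
    """Sketch of matching the receiver's screenshot to a locally stored
    unencoded frame: try the encoding-start timestamp as the primary
    identifier, then fall back to a frame sequence number."""
    key = frame_info.get("encode_start")
    if key is not None and key in stored:
        return stored[key]
    number = frame_info.get("frame_number")
    if number is not None:
        for raw in stored.values():
            if raw.get("frame_number") == number:
                return raw
    return None  # no stored frame matches (e.g. evicted already)
```

The image-recognition fallback mentioned in the text is omitted here; it would compare the received screenshot against each stored raw frame instead of using an identifier.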
Those skilled in the art should understand that the content of the frame identification information in the foregoing embodiments is merely exemplary; the content of other frame identification information existing in the prior art or appearing in the future, if applicable to the present application, also falls within the scope of protection of the present application and is hereby incorporated by reference.
In some embodiments, the method further includes step S18 (not shown). In step S18, the first user equipment presents the first video frame; in step S15, the first user equipment superimposes the corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment determines the first video frame that has not been encoded or decoded, and displays it at a preset position in the current interface or in the form of a small window; subsequently, according to the annotation operation information received in real time, the first user equipment superimposes the corresponding annotation operation at the corresponding position on the first video frame.
For example, the smart glasses determine the corresponding unencoded first video frame according to the second-frame-related information sent by tablet B, and display the first video frame at a preset position in the interface of the smart glasses. Subsequently, upon receiving the real-time annotation operation sent by tablet B, the smart glasses determine the position in the currently displayed first video frame that corresponds to the annotation operation, and present the annotation operation at that position in real time.
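Presenting the annotation "at the corresponding position" implies mapping the annotator's screen coordinates into the locally stored first video frame. The application does not specify this mapping; a minimal sketch, assuming simple proportional scaling between the two display resolutions, is:

```python
from typing import Tuple

def map_annotation_point(x: float, y: float,
                         src_size: Tuple[int, int],
                         dst_size: Tuple[int, int]) -> Tuple[float, float]:
    """Map a point annotated on the receiver's displayed frame (src_size)
    to pixel coordinates in the sender's stored first video frame (dst_size)
    by proportional scaling. Sizes are (width, height) pairs."""
    sw, sh = src_size
    dw, dh = dst_size
    return (x * dw / sw, y * dh / sh)
```

A point drawn at (100, 50) on a 1280x720 display would land at (150, 75) in a 1920x1080 stored frame under this assumed scaling.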
In some embodiments, the method further includes step S19 (not shown). In step S19, the first user equipment sends the first video frame to the second user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the first video frame that has not been encoded or decoded and sends it to the second user equipment, so that the second user equipment can present a higher-quality first video frame.
For example, the smart glasses determine the corresponding unencoded first video frame according to the second-frame-related information sent by tablet B, and send the first video frame to tablet B as the preferred frame, for instance via lossless compression, or via low-loss lossy compression whose quality only needs to exceed that of the video frame cached locally on tablet B. Tablet B receives the first video frame and presents it.
In some embodiments, in step S11, the first user equipment sends the video stream to the second user equipment and the third user equipment. For example, communication connections are established among the first, second, and third user equipment, where the first user equipment is the sender of the current video frames and the second and third user equipment are the receivers; the first user equipment sends the video stream to the second and third user equipment over these connections.
For example, user A holds the smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses have established video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to each tablet, and cache a period of time or a certain number of video frames. Tablet B receives, decodes, and presents the video stream; based on user B's screen-capture operation, it determines the second video frame corresponding to the captured picture and sends the second-frame-related information of that frame to the smart glasses and tablet C, where the second-frame-related information includes but is not limited to: the identification information of the second video frame, its encoding start time, its decoding end time, and its total codec-and-transmission duration. The smart glasses receive the second-frame-related information and, based on it, determine the corresponding unencoded first video frame among the locally stored video frames. Tablet B generates real-time annotation operation information according to user B's annotation operations and sends it to the smart glasses and tablet C in real time. Upon receiving the annotation operation information, the smart glasses display the corresponding unencoded first video frame in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame. Similarly, based on the received second-frame-related information and annotation operation information, tablet C finds the corresponding third video frame in its locally cached library of decoded video frames according to the second-frame-related information, and presents the corresponding annotation operation in the third video frame based on the third video frame and the annotation operation information.
In some embodiments, the method further includes step S010 (not shown). In step S010, the first user equipment sends the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the corresponding first video frame among its locally cached video frames according to the second-frame-related information and sends it to the second user equipment and/or the third user equipment. After receiving the unencoded first video frame, the second user equipment and/or the third user equipment presents it, and the second user and/or the third user may perform annotation operations based on it.
For example, after the smart glasses determine the unencoded first video frame corresponding to the second video frame, they send it to tablet B and/or tablet C via lossless compression or high-quality lossy compression, where tablet B and tablet C each decide whether to fetch the first video frame according to the quality of the current network connection, or select the transmission mode of the first video frame according to that quality, e.g., lossless compression when the network quality is good and high-quality lossy compression when it is poor.
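The network-quality-driven choice between the two transmission modes might be sketched as below. The bandwidth threshold is an invented illustrative value; the only constraint stated above is that the chosen mode must yield higher quality than the frame already cached on the receiving tablet.

```python
def choose_compression(bandwidth_mbps: float) -> str:
    """Select how the sender transmits the preferred (unencoded) first frame.

    The 10 Mbps cutoff is a hypothetical assumption for illustration; the
    application only requires that the result beat the quality of the video
    frame cached locally on the receiver."""
    GOOD_NETWORK_MBPS = 10.0  # assumed threshold for a "good" connection
    if bandwidth_mbps >= GOOD_NETWORK_MBPS:
        return "lossless"
    # Poor network: fall back to high-quality (low-loss) lossy compression.
    return "lossy-high-quality"
```

Each receiving tablet could run this check independently against its own measured connection quality before requesting the preferred frame.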
In some embodiments, in step S010, the first user equipment sends the first video frame and the second-frame-related information to the second user equipment and/or the third user equipment, where the first video frame is used as the preferred frame for presenting the annotation operation in the second user equipment or the third user equipment.
For example, after determining the unencoded first video frame, the smart glasses send the first video frame and the corresponding second-frame-related information to tablet B and/or tablet C. In some embodiments, tablet B has performed multiple screen-capture operations based on user B's operations; tablet B determines the screen-capture operation corresponding to the first video frame according to the second-frame-related information, e.g., according to the screenshot time of the second frame, and presents the second-frame-related information in the first video frame. Tablet C receives the second-frame-related information and the first video frame and, while presenting the first video frame, presents the second-frame-related information in the window in which the first video frame is presented.
FIG. 3 shows a method for annotating video frames in real time at a second user equipment according to another aspect of the present application, where the method includes step S21, step S22, step S23, and step S24. In step S21, the second user equipment receives the video stream sent by the first user equipment; in step S22, according to the user's screenshot operation in the video stream, the second user equipment sends the second-frame-related information of the captured second video frame to the first user equipment; in step S23, the second user equipment acquires the user's annotation operation information for the second video frame; in step S24, the second user equipment sends the annotation operation information to the first user equipment. For example, the second user equipment receives and presents the video stream sent by the first user equipment; based on the second user's screen-capture operation, it determines the second video frame corresponding to the current captured picture and sends that frame's second-frame-related information to the first user equipment. Subsequently, the second user equipment generates annotation operation information based on the second user's annotation operations and sends it to the first user equipment.
For example, user B holds tablet B and user A holds smart glasses; tablet B and the smart glasses conduct video communication over a wired or wireless network. Tablet B receives and presents the video stream sent by the smart glasses and, according to user B's screen-capture operation, determines the second video frame corresponding to the captured picture. Tablet B then sends the second-frame-related information of the second video frame to the smart glasses, which receive it and determine the corresponding first video frame based on it. Tablet B generates the corresponding annotation operation information based on user B's annotation operations and sends it to the smart glasses in real time. According to the first video frame and the annotation operation information, the smart glasses present the first video frame at a preset position in the interface and present the corresponding annotation operation in real time at the corresponding position in the first video frame.
In some embodiments, in step S21, the second user equipment receives the video stream sent by the first user equipment together with the frame identification information of the video frames already sent in the video stream, where the second-frame-related information includes at least any one of the following: the frame identification information of the second video frame; frame-related information generated based on the frame identification information of the second video frame. For example, while sending the video stream to the second user equipment, the first user equipment also sends the frame identification information of the transmitted video frames in the stream; the second user equipment receives the video stream and this frame identification information. Based on the second user's screen-capture operation, the second user equipment determines the second video frame corresponding to the current captured picture and sends that frame's second-frame-related information to the first user equipment, where the second-frame-related information includes but is not limited to: the frame identification information of the second video frame; frame-related information generated based on that frame identification information.
For example, while sending the video stream, the smart glasses send the frame identification information corresponding to the video frames already sent in the stream to tablet B. Tablet B detects user B's screen-capture operation, determines from the currently captured picture that it corresponds to the second video frame, and sends the corresponding second-frame-related information to the smart glasses, where the second-frame-related information includes but is not limited to: the frame identification information of the second video frame and frame-related information generated based on it. Here, the frame identification information of the second video frame may be the encoding start time of the frame or the number corresponding to the frame, and the frame-related information generated based on it may be the decoding end time of the frame or its total codec-and-transmission duration.
Those skilled in the art should understand that the content of the second-frame-related information in the foregoing embodiments is merely exemplary; the content of other second-frame-related information existing in the prior art or appearing in the future, if applicable to the present application, also falls within the scope of protection of the present application and is hereby incorporated by reference.
In some embodiments, the frame identification information includes the encoding start time information of the second video frame. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the transmitted video frames in the stream to the second user equipment, where the frame identification information of a video frame includes its encoding start time. In some embodiments, the second-frame-related information includes the decoding end time information and the total codec-and-transmission duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end times, determines the corresponding second video frame based on the screen-capture operation, and determines the corresponding total codec-and-transmission duration according to the encoding start time and the decoding end time of the second video frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frames and the encoding start times of the transmitted frames to tablet B. Tablet B receives and presents each video frame and records its decoding end time. Based on user B's screen-capture operation, tablet B determines the corresponding second video frame and determines its total codec-and-transmission duration according to the encoding start time and the decoding end time of the second video frame. Tablet B then sends the second-frame-related information of the second video frame to the smart glasses, where the second-frame-related information includes but is not limited to: the encoding start time of the second video frame, its total codec-and-transmission duration, and so on.
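The receiver-side assembly of this second-frame-related information from the two recorded timestamps can be sketched as follows. This is a minimal illustration assuming synchronized sender and receiver clocks in milliseconds (the application does not address clock skew), with invented field names:

```python
def build_second_frame_info(encode_start_ms: int, decode_end_ms: int) -> dict:
    """Assemble the receiver-side 'second frame related information'.

    The total codec-and-transmission duration is the span from the start of
    encoding on the sender to the end of decoding on the receiver."""
    return {
        "encode_start_ms": encode_start_ms,
        "decode_end_ms": decode_end_ms,
        "total_latency_ms": decode_end_ms - encode_start_ms,
    }
```

The sender can then recover the encoding start time of the matching cached frame as `decode_end_ms - total_latency_ms`, which is how the duration information identifies the first video frame.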
In some embodiments, in step S23, the second user equipment acquires the user's annotation operation information for the second video frame in real time; in step S24, the second user equipment sends the annotation operation information to the first user equipment in real time. For example, based on the second user's operations, the second user equipment acquires the corresponding annotation operation information in real time, e.g., collecting it at fixed time intervals. The second user equipment then sends the acquired annotation operation information to the first user equipment in real time.
For example, tablet B captures user B's annotation operations on the captured picture, such as user B drawing circles, arrows, text, or boxes on the screen. Tablet B records the position and path of the annotation brush, e.g., obtaining the positions of the annotated points from multiple points on the screen and connecting those positions to obtain the annotated path. Tablet B acquires the corresponding annotation operations in real time and sends them to the smart glasses in real time, e.g., collecting and sending the annotation operations at a rate of one sample every 50 ms.
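Sampling the brush path at a fixed interval (50 ms in the example above) and streaming each point as it is captured might look like the sketch below. The callbacks `pen_is_down`, `get_pen_point`, and `send` are hypothetical stand-ins for the tablet's input and network layers; they are not part of the application.

```python
import time
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def capture_stroke(get_pen_point: Callable[[], Point],
                   pen_is_down: Callable[[], bool],
                   send: Callable[[List[Point]], None],
                   interval_s: float = 0.05) -> List[Point]:
    """Sample the annotation brush position every `interval_s` seconds while
    the pen is down, sending each new point immediately so the far end can
    draw the path in real time; the full path is also kept locally so the
    sampled points can be connected into the annotated stroke."""
    path: List[Point] = []
    while pen_is_down():
        point = get_pen_point()
        path.append(point)
        send([point])            # stream the newest point to the peer
        time.sleep(interval_s)   # 50 ms sampling cadence by default
    return path
```

On the receiving side, connecting the streamed points in arrival order reproduces the brush path, matching the point-connection description above.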
In some embodiments, the method further includes step S25 (not shown). In step S25, the second user equipment receives the first video frame sent by the first user equipment, where the first video frame is used as the preferred frame for presenting the annotation operation, and loads the first video frame in the display window of the second video frame to replace the second video frame, where the annotation operation is displayed on the first video frame. For example, the second user equipment determines the second video frame corresponding to the current captured picture and sends its second-frame-related information to the first user equipment; based on this information, the first user equipment determines the corresponding unencoded first video frame and sends it to the second user equipment; the second user equipment receives and presents the first video frame and acquires the second user's annotation operation information for it.
For example, tablet B enters screen-capture mode based on the user's operations, determines the second video frame corresponding to the current picture, and sends its second-frame-related information to the smart glasses, where the second-frame-related information includes but is not limited to the encoding start time of the second video frame or the number corresponding to the frame. The smart glasses determine the corresponding unencoded first video frame according to the second-frame-related information sent by tablet B and send it to tablet B, e.g., via lossless compression, or via low-loss lossy compression whose quality only needs to exceed that of the video frame cached locally on tablet B. Tablet B receives and presents the first video frame, e.g., in the form of a small window next to the current video, or on the full screen with the current video shown in a small window. Subsequently, tablet B obtains the annotation operation information for the first video frame according to the second user's operations.
In some embodiments, the method further includes step S26 (not shown). In step S26, the second user equipment receives the first video frame and the second-frame-related information sent by the first user equipment, where the first video frame is used as the preferred frame for presenting the annotation operation; the second user equipment determines, according to the second-frame-related information, that the first video frame is to replace the second video frame, and loads the first video frame in the display window of the second video frame to replace it, where the annotation operation is displayed on the first video frame.
For example, tablet B receives the unencoded first video frame and the second-frame-related information sent by the smart glasses, where the second-frame-related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like. In some embodiments, tablet B has performed multiple screen-capture operations based on user B's operations; tablet B determines the screen-capture operation corresponding to the first video frame according to the second-frame-related information, e.g., according to the screenshot time of the second frame, and presents the second-frame-related information in the first video frame. Tablet B determines the current corresponding screen-capture operation according to the second-frame-related information and presents the first video frame in the form of a small window next to the current video, or on the full screen with the current video shown in a small window; while presenting the first video frame, tablet B presents the second-frame-related information in it, e.g., the screenshot time of the frame or the frame's number within the video stream. Subsequently, tablet B obtains the annotation operation information for the first video frame according to the second user's operations.
In some embodiments, in step S21, the second user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S24, the second user equipment sends the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds the smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses have established video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to each tablet, and cache a period of time or a certain number of video frames. Tablet B receives, decodes, and presents the video stream; based on user B's screen-capture operation, it determines the second video frame corresponding to the captured picture and sends its second-frame-related information to the smart glasses and tablet C, where the second-frame-related information includes but is not limited to: the identification information of the second video frame, its encoding start time, its decoding end time, and its total codec-and-transmission duration. The smart glasses receive the second-frame-related information and, based on it, determine the corresponding unencoded first video frame among the locally stored video frames. Tablet B generates real-time annotation operation information according to user B's annotation operations and sends it to the smart glasses and tablet C in real time.
FIG. 4 shows a method for annotating video frames in real time at a third user equipment according to yet another aspect of the present application, where the method includes step S31, step S32, step S33, step S34, and step S35. In step S31, the third user equipment receives the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S32, the third user equipment receives the second-frame-related information of the second video frame captured by the second user equipment in the video stream; in step S33, the third user equipment determines, according to the second-frame-related information, the third video frame in the video stream corresponding to the second video frame; in step S34, the third user equipment receives the second user equipment's annotation operation information for the second video frame; in step S35, the third user equipment presents the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses establish video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send the resulting video stream to each tablet, and buffer the video frames for a period of time or up to a certain number of frames. Tablet B receives, decodes and presents the video stream; based on a screenshot operation by user B, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the smart glasses and tablet C, wherein the second frame related information includes but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, and the total codec-and-transmission duration of the second video frame. The smart glasses receive the second frame related information and, based on it, determine the corresponding un-coded first video frame among the locally stored video frames. Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it in real time to the smart glasses and tablet C. After receiving the labeling operation information, the smart glasses display the corresponding un-coded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the corresponding position of that first video frame. Similarly, based on the received second frame related information and labeling operation information, tablet C finds the corresponding third video frame in its locally cached library of decoded video frames according to the second frame related information, and presents the corresponding labeling operation on the third video frame.
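The frame-matching step in the example above can be sketched in a few lines. This is a minimal illustration only; the `CachedFrame` record and its fields (frame number, encoding start time) are invented names standing in for the second frame related information described in the text:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedFrame:
    frame_no: int            # sequence number assigned at capture (assumed)
    encode_start_ms: int     # encoding start time of this frame
    pixels: bytes            # raw or decoded frame data

def find_matching_frame(cache, frame_no=None,
                        encode_start_ms=None) -> Optional[CachedFrame]:
    """Locate the cached frame corresponding to the captured second video
    frame, using whichever identifier the second frame related
    information carries."""
    for f in cache:
        if frame_no is not None and f.frame_no == frame_no:
            return f
        if encode_start_ms is not None and f.encode_start_ms == encode_start_ms:
            return f
    return None

# usage: the smart glasses (or tablet C) look up the frame named in the
# received second frame related information
cache = [CachedFrame(i, 1000 + 40 * i, b"") for i in range(5)]
match = find_matching_frame(cache, frame_no=3)
```

The same lookup serves both the first user equipment (raw-frame buffer) and the third user equipment (decoded-frame cache); only the stored pixel data differs.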
In some embodiments, the method further includes step S36 (not shown). In step S36, the third user equipment receives the first video frame sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the labeling operation; the first video frame is loaded and presented in the display window of the third video frame to replace the third video frame, and the labeling operation is displayed on the first video frame.
For example, tablet B enters screenshot mode based on a user operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of that frame to the smart glasses, wherein the second frame related information includes but is not limited to: the encoding start time of the second video frame or the number corresponding to the video frame. The smart glasses determine the corresponding un-coded first video frame according to the second frame related information sent by tablet B and send it to tablet B and tablet C, for example by lossless compression, or by low-loss lossy compression, where it suffices that the lossy compression yields a frame of higher quality than the video frames cached locally by tablet B and tablet C. Tablet C receives and presents the first video frame, for example in a small window next to the current video, or on the full screen with the current video shown in a small window. Subsequently, tablet C receives the labeling operation information sent by tablet B and presents the labeling operation on the first video frame.
In some embodiments, the method further includes step S37 (not shown). In step S37, the third user equipment receives the first video frame and the second frame related information sent by the first user equipment, wherein the first video frame serves as a preferred frame for presenting the labeling operation; the third user equipment determines, according to the second frame related information, that the first video frame is to replace the third video frame, and loads and presents the first video frame in the display window of the third video frame to replace it, the labeling operation being displayed on the first video frame.
For example, tablet C receives the un-coded first video frame and the second frame related information sent by the smart glasses, wherein the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like. Tablet C receives and presents the first video frame, for example in a small window next to the current video, or on the full screen with the current video shown in a small window; while presenting the first video frame, tablet C also presents the second frame related information on it, such as the screenshot time of the frame or its frame number within the video stream. Subsequently, tablet C receives the labeling operation information sent by tablet B and presents the labeling operation on the first video frame.
FIG. 5 illustrates a method for labeling video frames in real time at a network device according to yet another aspect of the present application, wherein the method includes step S41, step S42, step S43, step S44 and step S45. In step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment; in step S42, the network device receives second frame related information of a second video frame captured by the second user equipment from the video stream; in step S43, the network device forwards the second frame related information to the first user equipment; in step S44, the network device receives labeling operation information of the second user equipment on the second video frame; in step S45, the network device forwards the labeling operation information to the first user equipment.
For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B communicate video through the cloud. The smart glasses encode the currently captured pictures and send them to the cloud, which forwards them to tablet B; the smart glasses buffer the video frames for a period of time or up to a certain number of frames while sending. Tablet B receives, decodes and presents the video stream; based on a screenshot operation by user B, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the cloud, which forwards it to the smart glasses, wherein the second frame related information includes but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, and the total codec-and-transmission duration of the second video frame. The smart glasses receive the second frame related information and, based on it, determine the corresponding un-coded first video frame among the locally stored video frames. Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the cloud, which forwards it to the smart glasses; after receiving the labeling operation information, the smart glasses display the corresponding un-coded first video frame in a preset area and present the corresponding labeling operation in real time at the corresponding position of that first video frame.
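In steps S41 through S45 the network device acts purely as a relay that forwards messages unchanged. A minimal sketch of that role follows; the routing table and message kinds are illustrative assumptions, not part of the application:

```python
# Minimal sketch of the cloud relay of steps S41-S45. Device names and
# message kinds are invented for illustration.
class CloudRelay:
    def __init__(self):
        self.outboxes = {}   # device name -> list of delivered messages

    def register(self, device):
        self.outboxes[device] = []

    def forward(self, sender, recipients, kind, payload):
        """Forward a message unchanged to each recipient."""
        for r in recipients:
            self.outboxes[r].append((sender, kind, payload))

relay = CloudRelay()
for d in ("glasses", "tablet_b"):
    relay.register(d)

# S41: video stream from the first user equipment to the second
relay.forward("glasses", ["tablet_b"], "video_frame", {"frame_no": 7})
# S42/S43: second frame related information back to the first
relay.forward("tablet_b", ["glasses"], "frame_info", {"frame_no": 7})
# S44/S45: labeling operation information back to the first
relay.forward("tablet_b", ["glasses"], "labeling_op", {"stroke": [(1, 2), (3, 4)]})
```

In the multi-party embodiments described later, the recipient list simply grows to include the third user equipment.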
Those skilled in the art should understand that the content of the second frame related information in the above embodiments is merely an example; the content of other second frame related information, existing or appearing in the future, shall also fall within the scope of protection of this application if applicable thereto, and is hereby incorporated by reference.
In some embodiments, in step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment, together with frame identification information of the video frames already sent in the video stream. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the sent video frames to the network device, and the network device forwards both to the second user equipment, wherein the frame identification information includes the encoding start time of each video frame. In other embodiments, in step S43, the network device determines, according to the second frame related information, the frame identification information of the video frame in the video stream corresponding to the second video frame, and sends that frame identification information to the first user equipment.
For example, the cloud receives the video stream sent by the smart glasses and the frame identification information of the sent video frames, such as the encoding start time of each video frame, and forwards the video stream and the corresponding frame identification information to tablet B. Tablet B receives and presents each video frame and records its decoding end time. Based on a screenshot operation by user B, tablet B determines the corresponding second video frame and sends its second frame related information to the cloud, wherein the second frame related information includes the decoding end time of the second video frame, the frame number of the second video frame, or the like. The cloud receives the second frame related information of the second video frame sent by tablet B and determines the frame identification information of the corresponding frame based on it, for example determining the encoding start time or the frame number of the second video frame from its decoding end time or frame number.
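The mapping the cloud performs here (from a reported decoding end time or frame number back to frame identification information) might be sketched as follows. The time-based fallback, which subtracts an assumed total codec-and-transmission duration from the decoding end time, is an illustrative assumption rather than the claimed mechanism:

```python
# Illustrative index of frame identification information kept by the
# cloud. The fixed total_duration_ms used in the fallback is assumed.
class FrameIndex:
    def __init__(self, total_duration_ms):
        self.total_duration_ms = total_duration_ms
        self.encode_start_by_no = {}   # frame number -> encoding start time

    def record(self, frame_no, encode_start_ms):
        self.encode_start_by_no[frame_no] = encode_start_ms

    def resolve(self, frame_no=None, decode_end_ms=None):
        """Return (frame number, encoding start time) of the captured frame."""
        if frame_no is not None:
            return frame_no, self.encode_start_by_no[frame_no]
        # fallback: encoding start ~= decoding end - total duration
        target = decode_end_ms - self.total_duration_ms
        return min(self.encode_start_by_no.items(),
                   key=lambda kv: abs(kv[1] - target))

idx = FrameIndex(total_duration_ms=120)
for n in range(5):
    idx.record(n, 1000 + 40 * n)   # frames encoded 40 ms apart
```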
In some embodiments, in step S41, the network device receives and forwards the video stream sent by the first user equipment to the second user equipment and the third user equipment; in step S43, the network device forwards the second frame related information to the first user equipment and the third user equipment; and in step S45, the network device forwards the labeling operation information to the first user equipment and the third user equipment.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses, tablet B and tablet C establish video communication through a network device. The smart glasses encode the currently captured pictures and send them to the network device, while buffering the video frames for a period of time or up to a certain number of frames; the network device sends the video stream to tablet B and tablet C.
Tablet B receives, decodes and presents the video stream; based on a screenshot operation by user B, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the network device, wherein the second frame related information includes but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, and the total codec-and-transmission duration of the second video frame. The network device forwards the second frame related information to the first user equipment and the third user equipment. The smart glasses receive the second frame related information and, based on it, determine the corresponding un-coded first video frame among the locally stored video frames.
Tablet B generates real-time labeling operation information according to user B's labeling operation, and the labeling operation information is forwarded in real time through the network device to the smart glasses and tablet C. After receiving the labeling operation information, the smart glasses display the corresponding un-coded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the corresponding position of that first video frame. Similarly, based on the received second frame related information and labeling operation information, tablet C finds the corresponding third video frame in its locally cached library of decoded video frames according to the second frame related information, and presents the corresponding labeling operation on the third video frame.
FIG. 6 illustrates a method for labeling video frames in real time according to an aspect of the present application, wherein the method includes:
the first user equipment sends a video stream to the second user equipment;
the second user equipment receives the video stream and, according to a screenshot operation of a user in the video stream, sends second frame related information of the captured second video frame to the first user equipment;
the first user equipment receives the second frame related information and determines, according to it, the first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires the user's labeling operation information on the second video frame and sends the labeling operation information to the first user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information.
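The steps above can be walked through end to end in a few lines. This is a toy simulation; every structure in it is invented for illustration and is not the claimed implementation:

```python
# Toy end-to-end walk-through of the FIG. 6 flow.
sent_frames = {}   # first user equipment's local buffer of unencoded frames

# the first user equipment sends the video stream while buffering frames
for no in range(3):
    sent_frames[no] = f"raw-frame-{no}"   # (encoding/transport omitted)

# the second user equipment captures a screenshot of frame 1 and sends
# the second frame related information
second_frame_info = {"frame_no": 1}

# the first user equipment determines the corresponding first video frame
first_video_frame = sent_frames[second_frame_info["frame_no"]]

# the second user equipment sends labeling operation information, which
# the first user equipment presents on the first video frame
labeling_op = {"type": "circle", "center": (120, 80), "radius": 30}
rendered = (first_video_frame, labeling_op)
```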
FIG. 7 illustrates a method for labeling video frames in real time according to another aspect of the present application, wherein the method includes:
the first user equipment sends a video stream to the network device;
the network device receives the video stream and forwards it to the second user equipment;
the second user equipment receives the video stream and, according to a screenshot operation of a user in the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment;
the first user equipment receives the second frame related information and determines, according to it, the first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires the user's labeling operation information on the second video frame and sends the labeling operation information to the network device;
the network device receives the labeling operation information of the second user equipment on the second video frame and forwards it to the first user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information.
According to yet another aspect of the present application, a method for labeling video frames in real time is provided, wherein the method includes:
the first user equipment sends a video stream to the second user equipment and the third user equipment;
the second user equipment, according to a screenshot operation of a user in the video stream, sends second frame related information of the captured second video frame to the first user equipment and the third user equipment;
the second user equipment acquires the user's labeling operation information on the second video frame and sends the labeling operation information to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information, determines, according to it, the first video frame in the video stream corresponding to the second video frame, receives the labeling operation information, and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information;
the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the labeling operation information, determines, according to the second frame related information, the third video frame in the video stream corresponding to the second video frame, and presents the corresponding labeling operation on the third video frame in real time according to the labeling operation information.
According to an aspect of the present application, a method for labeling video frames in real time is provided, wherein the method includes:
the first user equipment sends a video stream to the network device;
the network device receives the video stream and forwards it to the second user equipment and the third user equipment;
the second user equipment receives the video stream and, according to a screenshot operation of a user in the video stream, sends second frame related information of the captured second video frame to the network device;
the network device receives the second frame related information and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the second frame related information and determines, according to it, the first video frame in the video stream corresponding to the second video frame;
the second user equipment acquires the user's labeling operation information on the second video frame and sends the labeling operation information to the network device;
the network device receives the labeling operation information of the second user equipment on the second video frame and forwards it to the first user equipment and the third user equipment;
the first user equipment receives the labeling operation information and presents the corresponding labeling operation on the first video frame in real time according to the labeling operation information;
the third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the labeling operation information, determines, according to the second frame related information, the third video frame in the video stream corresponding to the second video frame, and presents the corresponding labeling operation on the third video frame in real time according to the labeling operation information.
FIG. 8 illustrates a first user equipment for labeling video frames in real time according to an aspect of the present application, wherein the equipment includes a video sending module 11, a frame information receiving module 12, a video frame determining module 13, a label receiving module 14 and a label presenting module 15. The video sending module 11 is configured to send a video stream to the second user equipment; the frame information receiving module 12 is configured to receive second frame related information of a second video frame captured by the second user equipment from the video stream; the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame in the video stream corresponding to the second video frame; the label receiving module 14 is configured to receive labeling operation information of the second user equipment on the second video frame; and the label presenting module 15 is configured to present the corresponding labeling operation on the first video frame in real time according to the labeling operation information.
Specifically, the video sending module 11 is configured to send a video stream to the second user equipment. For example, the first user equipment establishes a communication connection with the second user equipment over a wired or wireless network, encodes the video stream, and sends it to the second user equipment by way of video communication.
The frame information receiving module 12 is configured to receive second frame related information of the second video frame captured by the second user equipment from the video stream. For example, the second user equipment determines, based on a screenshot operation of the second user, the second frame related information of the video frame corresponding to the screenshot; the first user equipment then receives the second frame related information of the second video frame sent by the second user equipment, wherein the second frame related information includes but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, and the total codec-and-transmission duration of the second video frame.
The video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame in the video stream corresponding to the second video frame. For example, the first user equipment locally stores a period of time or a certain number of sent, unencoded video frames, and determines among them, according to the second frame related information sent by the second user equipment, the unencoded first video frame corresponding to the screenshot.
The label receiving module 14 is configured to receive labeling operation information of the second user equipment on the second video frame. For example, the second user equipment generates corresponding labeling operation information based on the labeling operation of the second user and sends it to the first user equipment in real time, and the first user equipment receives the labeling operation information.
The label presenting module 15 is configured to present the corresponding labeling operation on the first video frame in real time according to the labeling operation information. For example, the first user equipment presents the corresponding labeling operation on the first video frame in real time based on the received labeling operation information, such as displaying the first video frame in a small window in the current interface and then presenting the corresponding labeling operation at the corresponding position of the first video frame at a rate of one frame every 50 ms.
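Taken together, modules 11 through 15 can be sketched as one class. The method names, the transport callback, and the encoded-frame format below are invented for illustration; only the module responsibilities come from the text:

```python
# Illustrative sketch of the first user equipment of FIG. 8; module
# numbering follows the text, method bodies and transport are stubs.
class FirstUserEquipment:
    def __init__(self):
        self.frame_buffer = {}      # storage for sent, unencoded frames
        self.first_video_frame = None

    # video sending module 11: send encoded frames, buffer raw ones
    def send_video(self, frames, send):
        for no, frame in enumerate(frames):
            self.frame_buffer[no] = frame
            send(no, f"encoded:{frame}")

    # frame information receiving module 12 + video frame determining
    # module 13: resolve the buffered frame named by the frame info
    def on_frame_info(self, info):
        self.first_video_frame = self.frame_buffer[info["frame_no"]]

    # label receiving module 14 + label presenting module 15: pair the
    # frame with the labeling operation for the renderer
    def on_labeling_op(self, op):
        return (self.first_video_frame, op)

ue = FirstUserEquipment()
sent = []
ue.send_video(["f0", "f1", "f2"], lambda no, data: sent.append(data))
ue.on_frame_info({"frame_no": 2})
shown = ue.on_labeling_op({"stroke": [(0, 0), (5, 5)]})
```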
For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B establish video communication over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to tablet B, and buffer the video frames for a period of time or up to a certain number of frames. Tablet B receives, decodes and presents the video stream; based on a screenshot operation by user B, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the smart glasses, wherein the second frame related information includes but is not limited to: second video frame identification information, the encoding start time of the second video frame, the decoding end time of the second video frame, and the total codec-and-transmission duration of the second video frame. The smart glasses receive the second frame related information and, based on it, determine the corresponding un-coded first video frame among the locally stored video frames. Tablet B generates real-time labeling operation information according to user B's labeling operation and sends it to the smart glasses in real time; after receiving the labeling operation information, the smart glasses display the corresponding un-coded first video frame in a preset area of the smart glasses and present the corresponding labeling operation in real time at the corresponding position of that first video frame.
Those skilled in the art should understand that the content of the second frame related information in the above embodiments is merely an example; the content of other second frame related information, existing or appearing in the future, shall also fall within the scope of protection of this application if applicable thereto, and is hereby incorporated by reference.
In some embodiments, the equipment further includes a storage module 16 (not shown). The storage module 16 is configured to store video frames of the video stream, and the video frame determining module 13 is configured to determine, according to the second frame related information, the first video frame corresponding to the second video frame from the stored video frames. For example, the first user equipment sends the video stream to the second user equipment and locally stores a period of time or a certain number of un-coded video frames, where the duration or number may be a preset fixed value or a threshold dynamically adjusted according to network conditions or the transmission rate; subsequently, the first user equipment determines, based on the second frame related information of the second video frame sent by the second user equipment, the corresponding un-coded first video frame among the locally stored video frames. In other embodiments, the stored video frames satisfy, but are not limited to, at least one of the following: the time interval between the sending time of a stored video frame and the current time is less than or equal to a video frame storage duration threshold; the accumulated number of stored video frames is less than or equal to a predetermined video frame storage number threshold.
For example, the smart glasses send the captured pictures to tablet B and locally store a certain duration's worth, or a certain number, of raw (un-encoded) video frames. The duration or number may be a fixed value preset by the system or by hand, such as a duration or frame-count threshold obtained through big-data statistical analysis; it may also be a threshold dynamically adjusted according to network conditions or the transmission rate. A dynamically adjusted duration or quantity threshold may be determined from a frame's total codec-and-transmission duration: the total codec-and-transmission duration of the current frame is computed and taken as one unit duration, or the number of frames transmittable within that duration is taken as one unit quantity, and a dynamic duration or quantity threshold is then set with reference to that current unit duration or unit quantity. Here, the preset or dynamic video-frame storage duration threshold should be greater than or equal to one unit duration, and likewise the preset or dynamic video-frame storage quantity threshold should be greater than or equal to one unit quantity. The smart glasses then determine, among the stored frames and according to the second-frame-related information of the second video frame sent by tablet B, the corresponding raw first video frame, where the interval between a stored frame's transmission time and the current time is less than or equal to the video-frame storage duration threshold, or the cumulative number of stored frames is less than or equal to the predetermined video-frame storage quantity threshold.
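The two eviction conditions above, a storage-duration threshold and a storage-quantity threshold, could be combined in a single local buffer on the sending device roughly as follows. This is an illustrative sketch, not the claimed implementation; the class and parameter names (`FrameStore`, `max_age_s`, `max_count`) are hypothetical.

```python
import collections
import time

class FrameStore:
    """Local buffer of raw (pre-codec) frames on the sending device.

    A frame is evicted once the buffer exceeds the storage-quantity
    threshold, or once the frame's age exceeds the storage-duration
    threshold, mirroring the two conditions of the embodiment.
    """

    def __init__(self, max_age_s=2.0, max_count=60):
        self.max_age_s = max_age_s    # video-frame storage duration threshold
        self.max_count = max_count    # video-frame storage quantity threshold
        self._frames = collections.OrderedDict()  # frame_id -> (sent_at, frame)

    def store(self, frame_id, frame, sent_at=None):
        """Record a frame at the moment it is sent, then evict."""
        sent_at = time.time() if sent_at is None else sent_at
        self._frames[frame_id] = (sent_at, frame)
        self._evict(now=sent_at)

    def _evict(self, now):
        # quantity threshold: keep at most max_count frames (oldest out first)
        while len(self._frames) > self.max_count:
            self._frames.popitem(last=False)
        # duration threshold: drop frames older than max_age_s
        stale = [fid for fid, (sent_at, _) in self._frames.items()
                 if now - sent_at > self.max_age_s]
        for fid in stale:
            del self._frames[fid]

    def lookup(self, frame_id):
        """Return the stored raw frame for frame_id, or None if evicted."""
        entry = self._frames.get(frame_id)
        return entry[1] if entry is not None else None
```

On a screenshot, the sender would call `lookup` with the identifier carried in the second-frame-related information; a `None` result means the frame has already aged out of the buffer.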
In some embodiments, the device further includes a threshold adjustment module 17 (not shown). The threshold adjustment module 17 is configured to acquire total codec-and-transmission duration information of video frames of the video stream, and to adjust the video-frame storage duration threshold or the video-frame storage quantity threshold according to that information. For example, the first user equipment records the encoding start time of each video frame and, after encoding, sends the frame to the second user equipment; the second user equipment receives each frame and records its decoding end time. The second user equipment then sends the decoding end time to the first user equipment, and the first user equipment computes the current frame's total codec-and-transmission duration from the encoding start time and the decoding end time; alternatively, the second user equipment computes that total duration from the two times and sends it to the first user equipment. The first user equipment adjusts the video-frame storage duration threshold or the video-frame storage quantity threshold based on the total duration: for instance, taking the total duration as one unit of time, it sets a certain multiple of it as the storage duration threshold; or, from the total duration and the rate at which the first user equipment sends video frames, it computes the number of frames that can be sent within that duration, takes that number as a unit quantity, and sets a certain multiple of it as the storage quantity threshold.
For example, the smart glasses record the encoding start time of the i-th video frame as T_si and, after encoding, send the frame to tablet B; tablet B receives the frame and records its decoding end time as T_ei. Tablet B then sends the decoding end time T_ei to the smart glasses, and the smart glasses compute the frame's total codec-and-transmission duration T_i = T_ei - T_si from the received T_ei and the locally recorded encoding start time T_si. Alternatively, the smart glasses may send the encoding start time T_si to tablet B along with the frame; at the decoding end time T_ei, tablet B computes T_i = T_ei - T_si and returns the total duration T_i to the smart glasses.
Based on the total codec-and-transmission duration T_i of the i-th video frame, the smart glasses may, following big-data statistics, dynamically retain the video frames falling within a window of 1.3·T_i. Alternatively, the multiplier may be adjusted dynamically according to the network transmission rate, e.g., the buffer duration threshold is set to (1+k)·T_i, where k is a value adjusted according to network fluctuation: k is set to 0.5 when the network fluctuates strongly and to 0.2 when it fluctuates little. As another example, from T_i and the frequency f at which the smart glasses currently send video frames, the smart glasses compute the number of frames transmitted within one duration T_i as N = T_i·f, and further set the threshold on the number of retained frames to 1.3·N, where N is rounded up to an integer. Further, the smart glasses may dynamically adjust the multiplier according to the current network transmission rate, e.g., set the buffer quantity threshold to (1+k)·N, with k again set to 0.5 under strong network fluctuation and to 0.2 under weak fluctuation.
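The threshold arithmetic in this example (T_i = T_ei - T_si, the (1+k) multiplier, and N obtained by rounding T_i·f up) can be written out directly. A minimal sketch: the function names are invented here, and only the k values 0.5/0.2 and the 1.3 multiplier come from the example above.

```python
import math

def codec_transmit_duration(t_si, t_ei):
    """Total codec-and-transmission duration T_i = T_ei - T_si (seconds)."""
    return t_ei - t_si

def fluctuation_k(network_fluctuating_strongly):
    """k from the example: 0.5 under strong network fluctuation, else 0.2."""
    return 0.5 if network_fluctuating_strongly else 0.2

def duration_threshold(t_i, k):
    """Dynamic storage-duration threshold (1 + k) * T_i."""
    return (1 + k) * t_i

def quantity_threshold(t_i, f, k):
    """N = ceil(T_i * f) frames fit in one unit duration; retain
    (1 + k) * N frames, rounded up to a whole frame count."""
    n = math.ceil(t_i * f)
    return math.ceil((1 + k) * n)
```

With T_i = 0.2 s, a send rate of f = 30 frames/s, and k = 0.3, one unit quantity is N = 6 frames and the quantity threshold is 8 frames.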
Those skilled in the art should understand that the storage duration threshold and/or storage quantity threshold in the foregoing embodiments are merely examples; other storage duration thresholds and/or storage quantity thresholds, whether existing in the prior art or emerging in the future, shall also fall within the scope of protection of the present application if applicable thereto, and are hereby incorporated by reference.
In some embodiments, the video sending module 11 is configured to send, to the second user equipment, the video stream and frame identification information of the video frames already sent in the video stream; the video frame determining module 13 is configured to determine, according to the second-frame-related information, the first video frame of the video stream corresponding to the second video frame, where the frame identification information of the first video frame corresponds to the second-frame-related information. The frame identification information of a video frame may be the codec time corresponding to the frame, a sequence number corresponding to the frame, and so on. In some embodiments, the video sending module 11 is configured to encode a plurality of video frames to be transmitted, and to send the corresponding video stream and the frame identification information of the sent frames to the second user equipment. For example, the first user equipment encodes a plurality of video frames to be transmitted, acquires their encoding start times, and sends the frames together with their encoding start times to the second user equipment. In some embodiments, the frame identification information of a sent video frame includes the encoding start time information of that frame.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the video frame together with the encoding start times of the sent frames to tablet B, where the sent frames include the frame just encoded and about to be sent as well as frames sent earlier. Here, the smart glasses may send the encoding start times of the sent frames to tablet B at fixed time intervals or after a fixed number of frames have been sent, or may send the encoding start time of the first video frame directly along with that frame. Based on user B's screenshot operation, tablet B determines the video frame corresponding to the captured picture and sends the second-frame-related information of the corresponding second video frame to the smart glasses, where the second-frame-related information corresponds to the second frame identification information and includes, but is not limited to, at least one of the following: the encoding start time of the second video frame, the decoding end time of the second video frame, the total codec-and-transmission duration of the second video frame, the sequence number or image corresponding to the second video frame, and so on. The smart glasses receive the second-frame-related information and determine the correspondingly stored raw first video frame according to it: for instance, the encoding start time of the raw first video frame corresponding to the second video frame is determined from the second video frame's encoding start time, decoding end time, or total codec-and-transmission duration, and the corresponding first video frame is thereby identified; or the first video frame with the same sequence number is determined directly from the second video frame's sequence number; or the corresponding first video frame is found among the stored raw frames through image recognition on the second video frame.
Those skilled in the art should understand that the frame identification information in the foregoing embodiments is merely an example; other frame identification information, whether existing in the prior art or emerging in the future, shall also fall within the scope of protection of the present application if applicable thereto, and is hereby incorporated by reference.
In some embodiments, the device further includes a video frame presentation module 18 (not shown). The video frame presentation module 18 is configured to present the first video frame, and the annotation presentation module 15 is configured to superimpose the corresponding annotation operation on the first video frame according to the annotation operation information. For example, the first user equipment determines the raw first video frame and displays it at a preset position in the current interface or in a small window; the first user equipment then superimposes the corresponding annotation operation at the corresponding position of the first video frame according to the annotation operation information received in real time.
For example, the smart glasses determine the corresponding raw first video frame according to the second-frame-related information sent by tablet B and display it at a preset position of the smart glasses' interface. On receiving the real-time annotation operation sent by tablet B, the smart glasses determine the position of that annotation operation within the currently displayed first video frame and present the current annotation operation there in real time.
In some embodiments, the device further includes a first preferred frame module 19 (not shown). The first preferred frame module 19 is configured to send the first video frame to the second user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the raw first video frame and sends it to the second user equipment, so that the second user equipment can present a higher-quality first video frame.
For example, the smart glasses determine the corresponding raw first video frame according to the second-frame-related information sent by tablet B and send it to tablet B as the preferred frame, either through lossless compression, or through low-loss lossy compression whose result need only exceed the quality of the frame locally buffered on tablet B's side. Tablet B receives the first video frame and presents it.
In some embodiments, the video sending module 11 is configured to send the video stream to the second user equipment and to a third user equipment. For example, communication connections are established among the first, second, and third user equipments, where the first user equipment is the sender of the current video frames and the second and third user equipments are their receivers; the first user equipment sends the video stream to the second and third user equipments over the communication connections.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses have established video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to each tablet, and buffer a certain duration's worth or a certain number of video frames. Tablet B receives, decodes, and presents the video stream; based on user B's screenshot operation, it determines the second video frame corresponding to the captured picture and sends the second-frame-related information of that frame to the smart glasses and to tablet C, where the second-frame-related information includes, but is not limited to: the second video frame's identification information, encoding start time, decoding end time, total codec-and-transmission duration, and so on. The smart glasses receive the second-frame-related information of the second video frame and, based on it, determine the corresponding raw first video frame among the locally stored frames. Tablet B generates real-time annotation operation information from user B's annotation operations and sends it in real time to the smart glasses and to tablet C. On receiving the annotation operation information, the smart glasses display the corresponding raw first video frame in a preset area of the smart glasses and present the corresponding annotation operations in real time at the corresponding positions of that frame. Likewise, based on the received second-frame-related information and annotation operation information, tablet C finds the corresponding third video frame in its locally buffered library of decoded video frames according to the second-frame-related information, and presents the corresponding annotation operations in the third video frame based on the third video frame and the annotation operation information.
In some embodiments, the device further includes a second preferred frame module 010 (not shown). The second preferred frame module 010 is configured to send the first video frame to the second user equipment and/or the third user equipment as a preferred frame for presenting the annotation operation. For example, the first user equipment determines the corresponding first video frame among the locally buffered frames according to the second-frame-related information and sends it to the second user equipment and/or the third user equipment. After receiving the raw first video frame, the second and/or third user equipment presents it, and the second and/or third user can perform annotation operations based on that frame.
For example, after the smart glasses have determined the raw first video frame corresponding to the second video frame, they send it to tablet B and/or tablet C through lossless compression or a high-quality compression scheme, where tablet B and tablet C decide for themselves whether to fetch the first video frame according to the quality of the current communication network connection, or select the transmission scheme for the first video frame according to that quality, e.g., lossless compression when network quality is good and high-quality lossy compression when network quality is poor.
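The sender-side choice in this example could be expressed as a small policy function. Everything here is an illustrative assumption (the 0.8 network-quality cutoff and the 0-to-1 quality scale are invented); the embodiment only requires that a lossy re-send exceed the quality of the frame the receiver already has buffered.

```python
def choose_compression(network_quality, receiver_cached_quality):
    """Pick how to send the preferred first frame.

    network_quality: 0.0 (poor) .. 1.0 (good).
    receiver_cached_quality: quality of the decoded frame the receiver
    already holds locally.  Returns a (mode, quality) pair.
    """
    if network_quality >= 0.8:
        # good network: send losslessly
        return ("lossless", 1.0)
    # poor network: high-quality lossy, but still strictly better than
    # what the receiver already has, otherwise re-sending is pointless
    quality = max(0.9, min(1.0, receiver_cached_quality + 0.1))
    return ("lossy", quality)
```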
In some embodiments, the second preferred frame module 010 is configured to send the first video frame and the second-frame-related information to the second user equipment and/or the third user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation on the second user equipment or the third user equipment.
For example, after determining the raw first video frame, the smart glasses send the first video frame and the second-frame-related information corresponding to it to tablet B and/or tablet C. In some embodiments, tablet B has performed several screenshot operations based on user B's input; tablet B then determines from the second-frame-related information which screenshot operation the first video frame corresponds to, e.g., according to the screenshot time of the second frame, and presents the second-frame-related information in that first video frame. Tablet C receives the second-frame-related information and the first video frame and, while presenting the first video frame, presents the second-frame-related information in the window in which the first video frame is presented.
Fig. 9 shows a second user equipment for annotating video frames in real time according to another aspect of the present application, where the equipment includes a video receiving module 21, a frame information determining module 22, an annotation acquiring module 23, and an annotation sending module 24. The video receiving module 21 is configured to receive the video stream sent by the first user equipment; the frame information determining module 22 is configured to send, to the first user equipment according to the user's screenshot operation in the video stream, the second-frame-related information of the captured second video frame; the annotation acquiring module 23 is configured to acquire the user's annotation operation information on the second video frame; and the annotation sending module 24 is configured to send the annotation operation information to the first user equipment. For example, the second user equipment receives and presents the video stream sent by the first user equipment; based on the second user's screenshot operation, the second user equipment determines the second video frame corresponding to the current captured picture and sends that frame's second-frame-related information to the first user equipment. The second user equipment then generates annotation operation information based on the second user's annotation operations and sends it to the first user equipment.
For example, user B holds tablet B and user A holds smart glasses; tablet B and the smart glasses conduct video communication over a wired or wireless network. Tablet B receives and presents the video stream sent by the smart glasses and, according to user B's screenshot operation, determines the second video frame corresponding to the captured picture. Tablet B then sends the second-frame-related information of that frame to the smart glasses, which receive it and determine the corresponding first video frame based on it. Tablet B generates the corresponding annotation operation information from user B's annotation operations and sends it to the smart glasses in real time. Based on the first video frame and the annotation operation information, the smart glasses present the first video frame at a preset position of the interface and present the corresponding annotation operations in real time at the corresponding positions within the first video frame.
In some embodiments, the video receiving module 21 is configured to receive the video stream sent by the first user equipment together with the frame identification information of the sent video frames of the video stream, where the second-frame-related information includes at least one of the following: the frame identification information of the second video frame; frame-related information generated based on the frame identification information of the second video frame. For example, while sending the video stream to the second user equipment, the first user equipment also sends the frame identification information of the sent frames of that stream; the second user equipment receives the video stream together with that frame identification information. Based on the second user's screenshot operation, the second user equipment determines the second video frame corresponding to the current captured picture and sends its second-frame-related information to the first user equipment, where the second-frame-related information includes, but is not limited to: the frame identification information of the second video frame; frame-related information generated based on that frame identification information.
For example, while sending the video stream, the smart glasses send to tablet B the frame identification information corresponding to the video frames already sent in the stream. Tablet B detects user B's screenshot operation, determines from the currently captured picture that the screenshot corresponds to the second video frame, and sends the second-frame-related information of the second video frame to the smart glasses, where that information includes, but is not limited to: the frame identification information of the second video frame, and frame-related information generated based on it. The frame identification information of the second video frame may be the frame's encoding start time or a sequence number corresponding to the frame; the frame-related information generated based on it may be the frame's decoding end time or its total codec-and-transmission duration, and so on.
Those skilled in the art should understand that the content of the second-frame-related information in the foregoing embodiments is merely an example; other content of second-frame-related information, whether existing in the prior art or emerging in the future, shall also fall within the scope of protection of the present application if applicable thereto, and is hereby incorporated by reference.
In some embodiments, the frame identification information includes the encoding start time information of the second video frame. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the sent frames to the second user equipment, where the frame identification information of a frame includes its encoding start time. In some embodiments, the second-frame-related information includes the decoding end time information and the total codec-and-transmission duration information of the second video frame. The second user equipment receives and presents the video stream, records the corresponding decoding end times, determines the corresponding second video frame based on the screenshot operation, and determines the corresponding total codec-and-transmission duration from the second video frame's encoding start time and decoding end time.
For example, the smart glasses record the encoding start time of each video frame and, after encoding, send the frames together with the encoding start times of the sent frames to tablet B. Tablet B receives and presents each frame and records its decoding end time. Based on user B's screenshot operation, tablet B determines the corresponding second video frame and, from that frame's encoding start time and decoding end time, determines its total codec-and-transmission duration. Tablet B then sends the second-frame-related information of the second video frame to the smart glasses, where that information includes, but is not limited to: the second video frame's encoding start time, its total codec-and-transmission duration, and so on.
In some embodiments, the annotation obtaining module 23 is configured to acquire, in real time, the user's annotation operation information on the second video frame; and the annotation sending module 24 is configured to send the annotation operation information to the first user equipment in real time. For example, the second user equipment acquires the corresponding annotation operation information in real time based on the second user's operations, e.g., sampling the annotation operation information at a fixed time interval, and then sends the acquired annotation operation information to the first user equipment in real time.
For example, tablet B captures user B's annotation operations on the screenshot, such as circles, arrows, text, and boxes drawn on the screen. Tablet B records the position and path of the annotation brush, e.g., obtaining the positions of multiple points on the screen and connecting those points to form the annotated path. Tablet B acquires the corresponding annotation operations in real time and sends them to the smart glasses in real time, e.g., sampling and sending the annotation operations at a rate of one frame per 50 ms.
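The 50 ms sampling of brush positions can be sketched as follows (hypothetical names throughout; a real client would receive touch events from its UI toolkit rather than from a prepared list):

```python
# Illustrative sketch: group the timestamped points of an annotation stroke
# into fixed 50 ms batches and emit each batch as it would be sent to the
# peer device in real time. Names and the send() stand-in are hypothetical.
from typing import Callable, List, Tuple

Point = Tuple[int, int]
SAMPLE_INTERVAL_MS = 50

def sample_stroke(points_with_time: List[Tuple[int, Point]],
                  send: Callable[[List[Point]], None]) -> None:
    """Group (timestamp_ms, point) pairs into 50 ms windows and send each window."""
    batch: List[Point] = []
    window_start = None
    for t, p in points_with_time:
        if window_start is None:
            window_start = t
        if t - window_start >= SAMPLE_INTERVAL_MS:
            send(batch)            # one "annotation frame" every 50 ms
            batch = []
            window_start = t
        batch.append(p)
    if batch:
        send(batch)                # flush the final partial window

sent: List[List[Point]] = []
stroke = [(0, (10, 10)), (20, (12, 14)), (60, (18, 25)),
          (90, (22, 31)), (120, (25, 40))]
sample_stroke(stroke, sent.append)
print(sent)  # [[(10, 10), (12, 14)], [(18, 25), (22, 31)], [(25, 40)]]
```

The receiver can reconstruct the path simply by connecting the points of successive batches in order, which matches the "connect multiple points to obtain the annotated path" behavior described above.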
In some embodiments, the device further includes a first video frame replacement module 25 (not shown). The first video frame replacement module 25 is configured to receive the first video frame sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation, and to load and present the first video frame in the display window of the second video frame so as to replace the second video frame, where the annotation operation is displayed on the first video frame. For example, the second user equipment determines the second video frame corresponding to the current screenshot and sends the second frame related information of that frame to the first user equipment; the first user equipment determines, based on the second frame related information, the corresponding first video frame that has not undergone encoding and decoding, and sends that first video frame to the second user equipment; the second user equipment receives and presents the first video frame and acquires the second user's annotation operation information on the first video frame.
For example, tablet B enters screen capture mode based on a user operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of that frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame, or the number corresponding to that video frame, and the like. The smart glasses determine, from the second frame related information sent by tablet B, the corresponding first video frame that has not undergone encoding and decoding, and send that first video frame to tablet B, e.g., via lossless compression, or via low-loss lossy compression whose quality is guaranteed to be higher than that of the video frames locally cached on tablet B. Tablet B receives and presents the first video frame, e.g., in a small window next to the current video, or full-screen with the current video shown in a small window. Tablet B then obtains annotation operation information on the first video frame based on the second user's operations.
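The lossless transfer of the preferred frame can be sketched with a generic byte-level compressor; zlib below stands in for any lossless image codec such as PNG, and the raw frame bytes are fabricated for illustration:

```python
# Illustrative sketch: package the preferred (not-yet-transcoded) frame for
# lossless transmission and recover it bit-exactly on the receiving tablet.
# zlib is a stand-in for any lossless codec; the raw data is fabricated.
import zlib

def pack_preferred_frame(raw_frame: bytes) -> bytes:
    """Losslessly compress the raw frame for transmission."""
    return zlib.compress(raw_frame, 9)

def unpack_preferred_frame(payload: bytes) -> bytes:
    """Recover the exact original bytes, with no quality loss."""
    return zlib.decompress(payload)

raw = bytes(range(256)) * 64               # stand-in for raw pixel data
payload = pack_preferred_frame(raw)
assert unpack_preferred_frame(payload) == raw   # bit-exact round trip
print(len(raw), len(payload))
```

A bit-exact round trip is what distinguishes this path from the normal video stream: the annotation is drawn on a frame of higher quality than the locally cached, lossy-decoded copy.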
In some embodiments, the device further includes a first video frame annotation module 26 (not shown). The first video frame annotation module 26 is configured to receive the first video frame and the second frame related information sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation; to determine, from the second frame related information, that the first video frame is to replace the second video frame; and to load and present the first video frame in the display window of the second video frame so as to replace the second video frame, where the annotation operation is displayed on the first video frame.
For example, tablet B receives the first video frame, which has not undergone encoding and decoding, and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like. In some embodiments, tablet B has performed multiple screen capture operations based on user B's operations; tablet B determines, from the second frame related information, which screen capture operation the first video frame corresponds to, e.g., from the screenshot time, and presents the second frame related information in the first video frame. Tablet B presents the first video frame in a small window next to the current video, or full-screen with the current video shown in a small window; while presenting the first video frame, tablet B presents the second frame related information in it, e.g., the screenshot time of the frame or the frame's number within the video stream. Tablet B then obtains annotation operation information on the first video frame based on the second user's operations.
In some embodiments, the video receiving module 21 is configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment; and the annotation sending module 24 is configured to send the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses have established video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to the tablets, and cache the video frames for a certain duration or up to a certain number. Tablet B receives, decodes, and presents the video stream; based on user B's screen capture operation, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: the second video frame's identification information, encoding start time, decoding end time, total codec and transmission duration information, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding first video frame, which has not undergone encoding and decoding, among the locally stored video frames. Tablet B generates real-time annotation operation information from user B's annotation operations and sends the annotation operation information to the smart glasses and tablet C in real time.
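Caching "a certain duration or a certain number" of sent frames on the sender, then resolving an incoming second-frame-related message to the matching stored frame, can be sketched as a bounded, order-preserving cache (all names are hypothetical; the frame number stands in for whatever frame identification information is used):

```python
# Illustrative sketch: the sending device keeps a bounded cache of recently
# sent raw frames keyed by frame identification info (here, a frame number),
# and looks up the not-yet-transcoded first frame when the annotating side
# reports which frame it captured.
from collections import OrderedDict
from typing import Optional

class SentFrameCache:
    def __init__(self, max_frames: int = 300):   # e.g. roughly 10 s at 30 fps
        self.max_frames = max_frames
        self._frames: "OrderedDict[int, bytes]" = OrderedDict()

    def remember(self, frame_no: int, raw_frame: bytes) -> None:
        self._frames[frame_no] = raw_frame
        while len(self._frames) > self.max_frames:
            self._frames.popitem(last=False)     # evict the oldest frame

    def lookup(self, frame_no: int) -> Optional[bytes]:
        """Return the cached raw frame, or None if it was already evicted."""
        return self._frames.get(frame_no)

cache = SentFrameCache(max_frames=3)
for n in range(5):
    cache.remember(n, b"frame-%d" % n)

print(cache.lookup(4))   # b'frame-4'
print(cache.lookup(0))   # None: already evicted
```

The cache bound is what makes the screenshot round trip time-sensitive: if the second frame related information arrives after the corresponding frame has been evicted, the sender can no longer supply the preferred frame.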
FIG. 10 shows a device for real-time annotation of video frames at a third user equipment according to yet another aspect of the present application, where the device includes a third video receiving module 31, a third frame information receiving module 32, a third video frame determining module 33, a third annotation receiving module 34, and a third presenting module 35. The third video receiving module 31 is configured to receive the video stream sent by the first user equipment to the second user equipment and the third user equipment; the third frame information receiving module 32 is configured to receive the second frame related information of the second video frame captured by the second user equipment from the video stream; the third video frame determining module 33 is configured to determine, from the second frame related information, the third video frame in the video stream corresponding to the second video frame; the third annotation receiving module 34 is configured to receive the second user equipment's annotation operation information on the second video frame; and the third presenting module 35 is configured to present the corresponding annotation operation on the third video frame in real time according to the annotation operation information.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses have established video communication with tablet B and tablet C over a wired or wireless network. The smart glasses encode the currently captured pictures, send them to the tablets, and cache the video frames for a certain duration or up to a certain number. Tablet B receives, decodes, and presents the video stream; based on user B's screen capture operation, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the smart glasses and tablet C, where the second frame related information includes but is not limited to: the second video frame's identification information, encoding start time, decoding end time, total codec and transmission duration information, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding first video frame, which has not undergone encoding and decoding, among the locally stored video frames. Tablet B generates real-time annotation operation information from user B's annotation operations and sends it to the smart glasses and tablet C in real time. After receiving the annotation operation information, the smart glasses display the corresponding first video frame, which has not undergone encoding and decoding, in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame. Similarly, tablet C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in its locally cached library of decoded video frames according to the second frame related information, and presents the corresponding annotation operation on the third video frame based on the third video frame and the annotation operation information.
In some embodiments, the device further includes a preferred frame receiving and presenting module 36 (not shown). The preferred frame receiving and presenting module 36 is configured to receive the first video frame sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation, and to load and present the first video frame in the display window of the third video frame so as to replace the third video frame, where the annotation operation is displayed on the first video frame.
For example, tablet B enters screen capture mode based on a user operation, determines the second video frame corresponding to the current picture, and sends the second frame related information of that frame to the smart glasses, where the second frame related information includes but is not limited to: the encoding start time of the second video frame, or the number corresponding to that video frame, and the like. The smart glasses determine, from the second frame related information sent by tablet B, the corresponding first video frame that has not undergone encoding and decoding, and send that first video frame to tablet B and tablet C, e.g., via lossless compression, or via low-loss lossy compression whose quality is guaranteed to be higher than that of the video frames locally cached on tablet B and tablet C. Tablet C receives and presents the first video frame, e.g., in a small window next to the current video, or full-screen with the current video shown in a small window. Tablet C then receives the annotation operation information sent by tablet B and presents the annotation operation on the first video frame.
In some embodiments, the device further includes a preferred frame annotation presenting module 37 (not shown). The preferred frame annotation presenting module 37 is configured to receive the first video frame and the second frame related information sent by the first user equipment, where the first video frame serves as the preferred frame for presenting the annotation operation; to determine, from the second frame related information, that the first video frame is to replace the third video frame; and to load and present the first video frame in the display window of the third video frame so as to replace the third video frame, where the annotation operation is displayed on the first video frame.
For example, tablet C receives the first video frame, which has not undergone encoding and decoding, and the second frame related information sent by the smart glasses, where the second frame related information includes the screenshot time of the second video frame, the frame number of the second video frame, and the like. Tablet C receives and presents the first video frame, e.g., in a small window next to the current video, or full-screen with the current video shown in a small window; while presenting the first video frame, tablet C presents the second frame related information in it, e.g., the screenshot time of the frame or the frame's number within the video stream. Tablet C then receives the annotation operation information sent by tablet B and presents the annotation operation on the first video frame.
FIG. 11 shows a network device for real-time annotation of video frames according to yet another aspect of the present application, where the device includes a video forwarding module 41, a frame information receiving module 42, a frame information forwarding module 43, an annotation receiving module 44, and an annotation forwarding module 45. The video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment; the frame information receiving module 42 is configured to receive the second frame related information of the second video frame captured by the second user equipment from the video stream; the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment; the annotation receiving module 44 is configured to receive the second user equipment's annotation operation information on the second video frame; and the annotation forwarding module 45 is configured to forward the annotation operation information to the first user equipment.
For example, user A holds smart glasses and user B holds tablet B; the smart glasses and tablet B conduct video communication through the cloud. The smart glasses encode the currently captured pictures and send them to the cloud, which forwards them to tablet B; the smart glasses cache the video frames for a certain duration or up to a certain number while sending the video. Tablet B receives, decodes, and presents the video stream; based on user B's screen capture operation, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the cloud, which forwards it to the smart glasses, where the second frame related information includes but is not limited to: the second video frame's identification information, encoding start time, decoding end time, total codec and transmission duration information, and the like. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding first video frame, which has not undergone encoding and decoding, among the locally stored video frames. Tablet B generates real-time annotation operation information from user B's annotation operations and sends it to the cloud, which forwards it to the smart glasses. After receiving the annotation operation information, the smart glasses display the corresponding first video frame, which has not undergone encoding and decoding, in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame.
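At its core, the cloud relay described above is a message router: frame-related information flows from the annotating side back to the sender, and annotation operations flow to every other participant. A minimal in-process sketch (hypothetical names; a real relay would sit on network sockets rather than in-memory inboxes):

```python
# Illustrative sketch of the relay role played by the network device:
# second-frame-related info is forwarded to the video sender, and
# annotation operations are forwarded to every other participant.
from collections import defaultdict

class AnnotationRelay:
    def __init__(self):
        self.inbox = defaultdict(list)      # participant id -> queued messages

    def forward_frame_info(self, sender: str, frame_info: dict, recipients) -> None:
        for r in recipients:
            self.inbox[r].append(("frame_info", sender, frame_info))

    def forward_annotation(self, sender: str, annotation: dict, recipients) -> None:
        for r in recipients:
            self.inbox[r].append(("annotation", sender, annotation))

relay = AnnotationRelay()
# Tablet B reports which frame it captured; only the glasses need this.
relay.forward_frame_info("tablet_b", {"frame_no": 42, "decode_end_ms": 1170},
                         ["glasses"])
# Annotation strokes go to every other participant.
relay.forward_annotation("tablet_b", {"shape": "circle", "points": [(3, 4)]},
                         ["glasses", "tablet_c"])

print([kind for kind, _, _ in relay.inbox["glasses"]])   # ['frame_info', 'annotation']
print(len(relay.inbox["tablet_c"]))                      # 1
```

The same routing table covers both the two-party case of this paragraph and the three-party case described later, by varying the recipient lists.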
Those skilled in the art should understand that the content of the second frame related information in the foregoing embodiments is merely an example; other content of the second frame related information, whether existing or appearing in the future, shall, if applicable to the present application, also fall within the scope of protection of the present application and is hereby incorporated by reference.
In some embodiments, the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment, together with the frame identification information of the video frames already sent in the stream. For example, the first user equipment encodes the video frames and sends the corresponding video stream and the frame identification information of the already-sent frames to the network device; the network device forwards the video stream and the frame identification information to the second user equipment, where the frame identification information includes the encoding start times of the video frames. In other embodiments, the frame information forwarding module 43 is configured to determine, from the second frame related information, the frame identification information of the video frame in the video stream corresponding to the second video frame, and to send that frame identification information to the first user equipment.
For example, the cloud receives the video stream sent by the smart glasses and the frame identification information of the already-sent video frames in the stream, such as the encoding start time of each frame. The cloud forwards the video stream and the frame identification information of the sent frames to tablet B. Tablet B receives and presents the video frames and records the decoding end time of each frame. Based on user B's screen capture operation, tablet B determines the corresponding second video frame and sends the second frame related information of that frame to the cloud, where the second frame related information includes the decoding end time corresponding to the second video frame, or the video number of the second video frame, and the like. The cloud receives the second frame related information of the second video frame sent by tablet B and, based on it, determines the frame identification information of the corresponding frame, e.g., determining the encoding start time or the video number of that frame from the decoding end time or the video number of the second video frame.
In some embodiments, the video forwarding module 41 is configured to receive and forward the video stream sent by the first user equipment to the second user equipment and the third user equipment; the frame information forwarding module 43 is configured to forward the second frame related information to the first user equipment and the third user equipment; and the annotation forwarding module 45 is configured to forward the annotation operation information to the first user equipment and the third user equipment.
For example, user A holds smart glasses, user B holds tablet B, and user C holds tablet C; the smart glasses, tablet B, and tablet C have established video communication through a network device. The smart glasses encode the currently captured pictures, send them to the network device, and cache the video frames for a certain duration or up to a certain number; the network device sends the video stream to tablet B and tablet C.
Tablet B receives, decodes, and presents the video stream; based on user B's screen capture operation, it determines the second video frame corresponding to the screenshot and sends the second frame related information of that frame to the network device, where the second frame related information includes but is not limited to: the second video frame's identification information, encoding start time, decoding end time, total codec and transmission duration information, and the like. The network device forwards the second frame related information to the first user equipment and the third user equipment. The smart glasses receive the second frame related information of the second video frame and, based on it, determine the corresponding first video frame, which has not undergone encoding and decoding, among the locally stored video frames.
Tablet B generates real-time annotation operation information from user B's annotation operations and forwards it through the network device to the smart glasses and tablet C in real time. After receiving the annotation operation information, the smart glasses display the corresponding first video frame, which has not undergone encoding and decoding, in a preset area of the smart glasses and present the corresponding annotation operation in real time at the corresponding position on the first video frame. Similarly, tablet C, based on the received second frame related information and annotation operation information, finds the corresponding third video frame in its locally cached library of decoded video frames according to the second frame related information, and presents the corresponding annotation operation on the third video frame based on the third video frame and the annotation operation information.
According to one aspect of the present application, a system for real-time annotation of video frames is provided, where the system includes: the first user equipment according to any of the foregoing embodiments and the second user equipment according to any of the foregoing embodiments; in other embodiments, the system further includes: the network device according to any of the foregoing embodiments.
According to one aspect of the present application, a system for real-time annotation of video frames is provided, where the system includes: the first user equipment according to any of the foregoing embodiments, the second user equipment according to any of the foregoing embodiments, and the third user equipment according to any of the foregoing embodiments; in other embodiments, the system further includes: the network device according to any of the foregoing embodiments.
The present application further provides a computer-readable storage medium storing computer code which, when executed, performs the method according to any of the preceding items.
The present application further provides a computer program product which, when executed by a computer device, performs the method according to any of the preceding items.
The present application further provides a computer device, the computer device including:
one or more processors;
a memory for storing one or more computer programs;
wherein, when the one or more computer programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any of the preceding items.
FIG. 12 shows an exemplary system that can be used to implement the various embodiments described in the present application.
As shown in FIG. 12, in some embodiments, the system 300 can serve as any of the devices for real-time annotation of video frames in the described embodiments. In some embodiments, the system 300 may include one or more computer-readable media (e.g., the system memory or the NVM/storage device 320) having instructions, and one or more processors (e.g., the processor(s) 305) coupled to the one or more computer-readable media and configured to execute the instructions to implement the modules and thereby perform the actions described in the present application.
对于一个实施例，系统控制模块310可包括任意适当的接口控制器，以向(一个或多个)处理器305中的至少一个和/或与系统控制模块310通信的任意适当的设备或组件提供任意适当的接口。For one embodiment, system control module 310 can include any suitable interface controller to provide any suitable interface to at least one of processor(s) 305 and/or to any suitable device or component in communication with system control module 310.
系统控制模块310可包括存储器控制器模块330,以向系统存储器315提供接口。存储器控制器模块330可以是硬件模块、软件模块和/或固件模块。 System control module 310 can include a memory controller module 330 to provide an interface to system memory 315. The memory controller module 330 can be a hardware module, a software module, and/or a firmware module.
系统存储器315可被用于例如为系统300加载和存储数据和/或指令。对于一个实施例,系统存储器315可包括任意适当的易失性存储器,例如,适当的DRAM。在一些实施例中,系统存储器315可包括双倍数据速率类型四同步动态随机存取存储器(DDR4SDRAM)。 System memory 315 can be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 can include any suitable volatile memory, such as a suitable DRAM. In some embodiments, system memory 315 can include double data rate type quad synchronous dynamic random access memory (DDR4 SDRAM).
对于一个实施例,系统控制模块310可包括一个或多个输入/输出(I/O)控制器,以向NVM/存储设备320及(一个或多个)通信接口325提供接口。For one embodiment, system control module 310 can include one or more input/output (I/O) controllers to provide an interface to NVM/storage device 320 and communication interface(s) 325.
例如，NVM/存储设备320可被用于存储数据和/或指令。NVM/存储设备320可包括任意适当的非易失性存储器(例如，闪存)和/或可包括任意适当的(一个或多个)非易失性存储设备(例如，一个或多个硬盘驱动器(HDD)、一个或多个光盘(CD)驱动器和/或一个或多个数字通用光盘(DVD)驱动器)。For example, NVM/storage device 320 can be used to store data and/or instructions. NVM/storage device 320 can include any suitable non-volatile memory (e.g., flash memory) and/or can include any suitable non-volatile storage device(s) (e.g., one or more hard disk drives (HDD), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives).
NVM/存储设备320可包括在物理上作为系统300被安装在其上的设备的一部分的存储资源，或者其可被该设备访问而不必作为该设备的一部分。例如，NVM/存储设备320可通过网络经由(一个或多个)通信接口325进行访问。NVM/storage device 320 can include storage resources that are physically part of the device on which system 300 is installed, or storage resources that are accessible by the device without necessarily being part of the device. For example, NVM/storage device 320 can be accessed over a network via communication interface(s) 325.
(一个或多个)通信接口325可为系统300提供接口以通过一个或多个网络和/或与任意其他适当的设备通信。系统300可根据一个或多个无线网络标准和/或协议中的任意标准和/或协议来与无线网络的一个或多个组件进行无线通信。The communication interface(s) 325 can provide an interface to the system 300 to communicate over one or more networks and/or with any other suitable device. System 300 can wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
对于一个实施例，(一个或多个)处理器305中的至少一个可与系统控制模块310的一个或多个控制器(例如，存储器控制器模块330)的逻辑封装在一起。对于一个实施例，(一个或多个)处理器305中的至少一个可与系统控制模块310的一个或多个控制器的逻辑封装在一起以形成系统级封装(SiP)。对于一个实施例，(一个或多个)处理器305中的至少一个可与系统控制模块310的一个或多个控制器的逻辑集成在同一模具上。对于一个实施例，(一个或多个)处理器305中的至少一个可与系统控制模块310的一个或多个控制器的逻辑集成在同一模具上以形成片上系统(SoC)。For one embodiment, at least one of processor(s) 305 can be packaged together with logic of one or more controllers of system control module 310 (e.g., memory controller module 330). For one embodiment, at least one of processor(s) 305 can be packaged together with logic of one or more controllers of system control module 310 to form a System in Package (SiP). For one embodiment, at least one of processor(s) 305 can be integrated on the same die with logic of one or more controllers of system control module 310. For one embodiment, at least one of processor(s) 305 can be integrated on the same die with logic of one or more controllers of system control module 310 to form a System on Chip (SoC).
在各个实施例中,系统300可以但不限于是:服务器、工作站、台式计算设备或移动计算设备(例如,膝上型计算设备、手持计算设备、平板电脑、上网本等)。在各个实施例中,系统300可具有更多或更少的组件和/或不同的架构。例如,在一些实施例中,系统300包括一个或多个摄像机、键盘、液晶显示器(LCD)屏幕(包括触屏显示器)、非易失性存储器端口、多个天线、图形芯片、专用集成电路(ASIC)和扬声器。In various embodiments, system 300 can be, but is not limited to, a server, workstation, desktop computing device, or mobile computing device (eg, a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 can have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit ( ASIC) and speakers.
需要注意的是,本申请可在软件和/或软件与硬件的组合体中被实施,例如,可采用专用集成电路(ASIC)、通用目的计算机或任何其他类似硬件设备来实现。在一个实施例中,本申请的软件程序可以通过处理器执行以实现上文所述步骤或功能。同样地,本申请的软件程序(包括相关的数据结构)可以被存储到计算机可读记录介质中,例如,RAM存储器,磁或光驱动器或软磁盘及类似设备。另外,本申请的一些步骤或功能可采用硬件来实现,例如,作为与处理器配合从而执行各个步骤或功能的电路。It should be noted that the present application can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application can be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including related data structures) of the present application can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like. In addition, some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
另外，本申请的一部分可被应用为计算机程序产品，例如计算机程序指令，当其被计算机执行时，通过该计算机的操作，可以调用或提供根据本申请的方法和/或技术方案。本领域技术人员应能理解，计算机程序指令在计算机可读介质中的存在形式包括但不限于源文件、可执行文件、安装包文件等，相应地，计算机程序指令被计算机执行的方式包括但不限于：该计算机直接执行该指令，或者该计算机编译该指令后再执行对应的编译后程序，或者该计算机读取并执行该指令，或者该计算机读取并安装该指令后再执行对应的安装后程序。在此，计算机可读介质可以是可供计算机访问的任意可用的计算机可读存储介质或通信介质。In addition, a portion of the present application can be embodied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of the computer. Those skilled in the art should understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, and the like; accordingly, the manners in which the computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, the computer compiling the instructions and then executing the corresponding compiled program, the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.
通信介质包括藉此包含例如计算机可读指令、数据结构、程序模块或其他数据的通信信号被从一个系统传送到另一系统的介质。通信介质可包括有导的传输介质(诸如电缆和线(例如，光纤、同轴等))和能传播能量波的无线(未有导的传输)介质，诸如声音、电磁、RF、微波和红外。计算机可读指令、数据结构、程序模块或其他数据可被体现为例如无线介质(诸如载波或诸如被体现为扩展频谱技术的一部分的类似机制)中的已调制数据信号。术语“已调制数据信号”指的是其一个或多个特征以在信号中编码信息的方式被更改或设定的信号。调制可以是模拟的、数字的或混合调制技术。Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules, or other data are transmitted from one system to another. Communication media can include guided transmission media, such as cables and wires (e.g., optical fiber, coaxial, etc.), and wireless (unguided transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer-readable instructions, data structures, program modules, or other data can be embodied, for example, as a modulated data signal in a wireless medium such as a carrier wave or a similar mechanism embodied as part of a spread-spectrum technique. The term "modulated data signal" refers to a signal one or more of whose characteristics are altered or set in such a manner as to encode information in the signal. The modulation can be an analog, digital, or hybrid modulation technique.
作为示例而非限制，计算机可读存储介质可包括以用于存储诸如计算机可读指令、数据结构、程序模块或其它数据的信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动的介质。例如，计算机可读存储介质包括，但不限于，易失性存储器，诸如随机存储器(RAM,DRAM,SRAM)；以及非易失性存储器，诸如闪存、各种只读存储器(ROM,PROM,EPROM,EEPROM)、磁性和铁磁/铁电存储器(MRAM,FeRAM)；以及磁性和光学存储设备(硬盘、磁带、CD、DVD)；或其它现在已知的介质或今后开发的能够存储供计算机系统使用的计算机可读信息/数据。By way of example and not limitation, computer-readable storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to, volatile memory, such as random access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disks, magnetic tapes, CDs, DVDs); or other media now known or later developed that are capable of storing computer-readable information/data for use by a computer system.
在此，根据本申请的一个实施例包括一个装置，该装置包括用于存储计算机程序指令的存储器和用于执行程序指令的处理器，其中，当该计算机程序指令被该处理器执行时，触发该装置运行基于前述根据本申请的多个实施例的方法和/或技术方案。Here, an embodiment according to the present application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to operate based on the aforementioned methods and/or technical solutions according to the various embodiments of the present application.
对于本领域技术人员而言，显然本申请不限于上述示范性实施例的细节，而且在不背离本申请的精神或基本特征的情况下，能够以其他的具体形式实现本申请。因此，无论从哪一点来看，均应将实施例看作是示范性的，而且是非限制性的，本申请的范围由所附权利要求而不是上述说明限定，因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外，显然“包括”一词不排除其他单元或步骤，单数不排除复数。装置权利要求中陈述的多个单元或装置也可以由一个单元或装置通过软件或者硬件来实现。第一，第二等词语用来表示名称，而并不表示任何特定的顺序。It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Therefore, the embodiments are to be considered in all respects as illustrative and not restrictive; the scope of the present application is defined by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced within the present application. Any reference signs in the claims should not be construed as limiting the claims concerned. In addition, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in the device claims may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (42)

  1. 一种在第一用户设备端用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame on a first user equipment side, wherein the method includes:
    向第二用户设备发送视频流;Sending a video stream to the second user equipment;
    接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;Receiving second frame related information of the second video frame intercepted by the second user equipment in the video stream;
    根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧;Determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
    接收所述第二用户设备对所述第二视频帧的标注操作信息;Receiving, by the second user equipment, labeling operation information of the second video frame;
    根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。Presenting a corresponding labeling operation on the first video frame in real time according to the labeling operation information.
  2. 根据权利要求1所述的方法,其中,所述方法还包括:The method of claim 1 wherein the method further comprises:
    存储所述视频流中的视频帧;Storing video frames in the video stream;
    其中,所述根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧,包括:The determining, by the second frame related information, the first video frame corresponding to the second video frame in the video stream, including:
    根据所述第二帧相关信息从所存储的视频帧中确定与所述第二视频帧相对应的第一视频帧。Determining, from the stored video frames, a first video frame corresponding to the second video frame according to the second frame related information.
  3. 根据权利要求2所述的方法,其中,所述所存储的视频帧满足以下至少任一项:The method of claim 2 wherein said stored video frames satisfy at least one of:
    所存储的视频帧的发送时间与当前时间的时间间隔小于或等于视频帧存储时长阈值;The time interval between the sending time of the stored video frame and the current time is less than or equal to the video frame storage time threshold;
    所存储的视频帧的累计数量小于或等于预定的视频帧存储数量阈值。The cumulative number of stored video frames is less than or equal to a predetermined video frame storage quantity threshold.
  4. 根据权利要求3所述的方法,其中,所述方法还包括:The method of claim 3, wherein the method further comprises:
    获取所述视频流中视频帧的编解码及传输总时长信息;Obtaining codec and total transmission duration information of the video frame in the video stream;
    根据所述编解码及传输总时长信息调整所述视频帧存储时长阈值或所述视频帧存储数量阈值。And adjusting the video frame storage duration threshold or the video frame storage threshold according to the codec and transmission total duration information.
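权利要求2至4所描述的机制（存储已发送视频帧、按存储时长阈值和存储数量阈值限界、并根据编解码及传输总时长调整阈值）可以用如下的示意性实现来说明。A minimal sketch of the mechanism in claims 2-4: the first user equipment keeps a bounded buffer of sent frames so that a later screenshot at the second user equipment can still be matched. All class and parameter names here are illustrative assumptions; the claims do not prescribe any particular data structure.

```python
import time
from collections import OrderedDict

class SentFrameBuffer:
    """Buffer of sent video frames, bounded by a storage-duration threshold
    and a storage-quantity threshold (hypothetical implementation)."""

    def __init__(self, max_age_s=5.0, max_frames=150):
        self.max_age_s = max_age_s      # video frame storage duration threshold
        self.max_frames = max_frames    # video frame storage quantity threshold
        self._frames = OrderedDict()    # frame_id -> (send_time, frame payload)

    def store(self, frame_id, frame_bytes, send_time=None):
        self._frames[frame_id] = (send_time or time.time(), frame_bytes)
        self._evict()

    def _evict(self):
        now = time.time()
        # drop frames whose time since sending exceeds the duration threshold
        while self._frames:
            fid, (sent, _) = next(iter(self._frames.items()))
            if now - sent > self.max_age_s:
                self._frames.popitem(last=False)
            else:
                break
        # drop oldest frames while the count exceeds the quantity threshold
        while len(self._frames) > self.max_frames:
            self._frames.popitem(last=False)

    def lookup(self, frame_id):
        """Resolve the first video frame matching second-frame related info."""
        entry = self._frames.get(frame_id)
        return entry[1] if entry else None

    def adjust_threshold(self, total_codec_and_transit_s, margin=2.0):
        """Claim 4: resize the storage window from the measured total
        encode/decode-and-transmission duration (margin is an assumption)."""
        self.max_age_s = total_codec_and_transit_s * margin
```

这样，存储窗口始终略大于端到端往返所需的时间，既保证截图帧可被找到，又避免无限制地占用内存。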
  5. 根据权利要求1所述的方法,其中,所述向第二用户设备发送视频流,包括:The method of claim 1, wherein the transmitting the video stream to the second user equipment comprises:
    向第二用户设备发送视频流及所述视频流中已发送视频帧的帧标识信息;Transmitting, to the second user equipment, a video stream and frame identification information of the transmitted video frame in the video stream;
    其中,所述根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧,包括:The determining, by the second frame related information, the first video frame corresponding to the second video frame in the video stream, including:
    根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧,其中所述第一视频帧的帧标识信息与所述第二帧相关信息相对应。Determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, where frame identification information of the first video frame corresponds to the second frame related information .
  6. 根据权利要求5所述的方法,其中,所述向第二用户设备发送视频流及所述视频流中已发送视频帧的帧标识信息,包括:The method of claim 5, wherein the transmitting the video stream and the frame identification information of the transmitted video frame in the video stream to the second user equipment comprises:
    对多个待传输的视频帧进行编码处理,并将对应视频流及所述视频流中已发送视频帧的帧标识信息发送至第二用户设备。And encoding the plurality of video frames to be transmitted, and transmitting the frame identification information of the corresponding video stream and the transmitted video frame in the video stream to the second user equipment.
  7. 根据权利要求6所述的方法,其中,所述视频流中已发送视频帧的帧标识信息包括该已发送视频帧的编码起始时间信息。The method of claim 6, wherein the frame identification information of the transmitted video frame in the video stream includes encoding start time information of the transmitted video frame.
  8. 根据权利要求1所述的方法,其中,所述方法还包括:The method of claim 1 wherein the method further comprises:
    呈现所述第一视频帧;Presenting the first video frame;
    其中,所述根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作,包括:The performing the corresponding labeling operation on the first video frame in real time according to the labeling operation information, including:
    根据所述标注操作信息在所述第一视频帧上叠加呈现对应的标注操作。Superimposing and presenting a corresponding labeling operation on the first video frame according to the labeling operation information.
  9. 根据权利要求1至8中任一项所述的方法,其中,所述方法还包括:The method of any of claims 1 to 8, wherein the method further comprises:
    将所述第一视频帧作为呈现所述标注操作的优选帧发送至所述第二用户设备。The first video frame is sent to the second user equipment as a preferred frame that presents the labeling operation.
  10. 根据权利要求1至8中任一项所述的方法,其中,所述向第二用户设备发送视频流,包括:The method according to any one of claims 1 to 8, wherein the transmitting the video stream to the second user equipment comprises:
    向第二用户设备及第三用户设备发送视频流。The video stream is sent to the second user equipment and the third user equipment.
  11. 根据权利要求10所述的方法,其中,所述方法还包括:The method of claim 10, wherein the method further comprises:
    将所述第一视频帧作为呈现所述标注操作的优选帧发送至所述第二用户设备和/或所述第三用户设备。Transmitting the first video frame to the second user equipment and/or the third user equipment as a preferred frame presenting the labeling operation.
  12. 根据权利要求11所述的方法,其中,所述将所述第一视频帧发送至所述第二用户设备和/或所述第三用户设备,包括:The method of claim 11, wherein the transmitting the first video frame to the second user equipment and/or the third user equipment comprises:
    将所述第一视频帧及所述第二帧相关信息发送至所述第二用户设备和/或所述第三用户设备，其中，所述第一视频帧用作在所述第二用户设备或所述第三用户设备中呈现所述标注操作的优选帧。Transmitting the first video frame and the second frame related information to the second user equipment and/or the third user equipment, wherein the first video frame is used as a preferred frame for presenting the labeling operation in the second user equipment or the third user equipment.
  13. 一种在第二用户设备端用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame on a second user equipment side, wherein the method includes:
    接收第一用户设备所发送的视频流;Receiving a video stream sent by the first user equipment;
    根据用户在所述视频流中的截图操作,向所述第一用户设备发送被截取的第二视频帧的第二帧相关信息;Transmitting second frame related information of the intercepted second video frame to the first user equipment according to a screenshot operation of the user in the video stream;
    获取所述用户对所述第二视频帧的标注操作信息;Obtaining, by the user, the labeling operation information of the second video frame;
    向所述第一用户设备发送所述标注操作信息。Sending the labeling operation information to the first user equipment.
  14. 根据权利要求13所述的方法,其中,所述接收第一用户设备所发送的视频流,包括:The method of claim 13, wherein the receiving the video stream sent by the first user equipment comprises:
    接收第一用户设备所发送的视频流,及所述视频流中已发送视频帧的帧标识信息;Receiving a video stream sent by the first user equipment, and frame identification information of the transmitted video frame in the video stream;
    其中,所述第二帧相关信息包括以下至少任一项:The second frame related information includes at least one of the following:
    所述第二视频帧的帧标识信息;Frame identification information of the second video frame;
    基于所述第二视频帧的帧标识信息生成的帧相关信息。Frame related information generated based on frame identification information of the second video frame.
  15. 根据权利要求14所述的方法,其中,所述帧标识信息包括所述第二视频帧的编码起始时间信息。The method of claim 14, wherein the frame identification information comprises encoding start time information of the second video frame.
  16. 根据权利要求15所述的方法,其中,所述第二帧相关信息包括所述第二视频帧的解码结束时间信息与编解码及传输总时长信息。The method of claim 15, wherein the second frame related information comprises decoding end time information and codec and transmission total duration information of the second video frame.
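权利要求15、16暗示的帧匹配方式可以示意如下：第二用户设备上报解码结束时间与编解码及传输总时长，第一用户设备用两者之差估计该帧的编码起始时间（即权利要求7的帧标识信息），再在已存储的帧中取最接近者。A sketch of the matching implied by claims 15-16: the second user equipment reports its decode-end time and the total codec-and-transmission duration, and the first user equipment estimates the frame's encoding start time by subtraction, then picks the closest stored frame. Function names and the tolerance are illustrative assumptions.

```python
def estimate_encoding_start(decode_end_time, total_codec_and_transit):
    """Approximate the frame's encoding start time at the first user
    equipment from the second-frame related information (claims 15-16)."""
    return decode_end_time - total_codec_and_transit

def match_frame(stored, decode_end_time, total_codec_and_transit, tolerance=0.05):
    """Pick the stored frame whose encoding-start timestamp (its frame
    identification information, claim 7) is closest to the estimate.
    `stored` maps encoding_start_time -> frame payload (hypothetical)."""
    target = estimate_encoding_start(decode_end_time, total_codec_and_transit)
    best = min(stored, key=lambda t: abs(t - target), default=None)
    if best is not None and abs(best - target) <= tolerance:
        return stored[best]
    return None
```

若没有帧落在容差范围内，返回空，表示该截图帧已被淘汰或时间信息不一致。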
  17. 根据权利要求14所述的方法,其中,所述获取所述用户对所述第二视频帧的标注操作信息,包括:The method according to claim 14, wherein the obtaining the labeling operation information of the second video frame by the user comprises:
    实时获取所述用户对所述第二视频帧的标注操作信息;Obtaining, in real time, the labeling operation information of the second video frame by the user;
    其中,所述向所述第一用户设备发送所述标注操作信息,包括:The sending the labeling operation information to the first user equipment includes:
    向所述第一用户设备实时发送所述标注操作信息。Transmitting the annotation operation information to the first user equipment in real time.
  18. 根据权利要求13所述的方法,其中,所述方法还包括:The method of claim 13 wherein the method further comprises:
    接收所述第一用户设备所发送的第一视频帧,其中,所述第一视频帧用作呈现所述标注操作的优选帧;Receiving, by the first user equipment, a first video frame, where the first video frame is used as a preferred frame for presenting the labeling operation;
    在所述第二视频帧的显示窗口中加载呈现所述第一视频帧以替换所述第二视频帧,其中,所述标注操作显示于所述第一视频帧。Loading the first video frame to replace the second video frame in a display window of the second video frame, wherein the labeling operation is displayed on the first video frame.
  19. 根据权利要求13所述的方法,其中,所述方法还包括:The method of claim 13 wherein the method further comprises:
    接收所述第一用户设备所发送的第一视频帧及所述第二帧相关信息,其中,所述第一视频帧用作呈现所述标注操作的优选帧;Receiving, by the first user equipment, the first video frame and the second frame related information, where the first video frame is used as a preferred frame for presenting the labeling operation;
    根据所述第二帧相关信息确定所述第一视频帧用于替换所述第二视频帧;Determining, according to the second frame related information, that the first video frame is used to replace the second video frame;
    在所述第二视频帧的显示窗口中加载呈现所述第一视频帧以替换所述第二视频帧,其中,所述标注操作显示于所述第一视频帧。Loading the first video frame to replace the second video frame in a display window of the second video frame, wherein the labeling operation is displayed on the first video frame.
  20. 根据权利要求13所述的方法,其中,所述接收第一用户设备所发送的视频流,包括:The method of claim 13, wherein the receiving the video stream sent by the first user equipment comprises:
    接收第一用户设备所发送至第二用户设备及第三用户设备的视频流;Receiving a video stream that is sent by the first user equipment to the second user equipment and the third user equipment;
    其中,所述向所述第一用户设备发送所述标注操作信息,包括:The sending the labeling operation information to the first user equipment includes:
    向所述第一用户设备及所述第三用户设备发送所述标注操作信息。And sending the labeling operation information to the first user equipment and the third user equipment.
  21. 一种在第三用户设备端用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame on a third user equipment side, wherein the method includes:
    接收第一用户设备所发送至第二用户设备及第三用户设备的视频流;Receiving a video stream that is sent by the first user equipment to the second user equipment and the third user equipment;
    接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;Receiving second frame related information of the second video frame intercepted by the second user equipment in the video stream;
    根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第三视频帧;Determining, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream;
    接收所述第二用户设备对所述第二视频帧的标注操作信息;Receiving, by the second user equipment, labeling operation information of the second video frame;
    根据所述标注操作信息在所述第三视频帧上实时呈现对应的标注操作。And correspondingly presenting a corresponding labeling operation on the third video frame according to the labeling operation information.
  22. 根据权利要求21所述的方法,其中,所述方法还包括:The method of claim 21, wherein the method further comprises:
    接收所述第一用户设备所发送的第一视频帧，其中，所述第一视频帧用作呈现所述标注操作的优选帧；Receiving the first video frame sent by the first user equipment, wherein the first video frame is used as a preferred frame for presenting the labeling operation;
    在所述第三视频帧的显示窗口中加载呈现所述第一视频帧以替换所述第三视频帧,其中,所述标注操作显示于所述第一视频帧。Loading the first video frame to replace the third video frame in a display window of the third video frame, wherein the labeling operation is displayed on the first video frame.
  23. 根据权利要求21所述的方法,其中,所述方法还包括:The method of claim 21, wherein the method further comprises:
    接收所述第一用户设备所发送的第一视频帧及所述第二帧相关信息,其中,所述第一视频帧用作呈现所述标注操作的优选帧;Receiving, by the first user equipment, the first video frame and the second frame related information, where the first video frame is used as a preferred frame for presenting the labeling operation;
    根据所述第二帧相关信息确定所述第一视频帧用于替换所述第三视频帧;Determining, according to the second frame related information, that the first video frame is used to replace the third video frame;
    在所述第三视频帧的显示窗口中加载呈现所述第一视频帧以替换所述第三视频帧,其中,所述标注操作显示于所述第一视频帧。Loading the first video frame to replace the third video frame in a display window of the third video frame, wherein the labeling operation is displayed on the first video frame.
  24. 一种在网络设备端用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame on a network device side, wherein the method includes:
    接收并转发第一用户设备发给第二用户设备的视频流;Receiving and forwarding a video stream sent by the first user equipment to the second user equipment;
    接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;Receiving second frame related information of the second video frame intercepted by the second user equipment in the video stream;
    将所述第二帧相关信息转发至所述第一用户设备;Forwarding the second frame related information to the first user equipment;
    接收所述第二用户设备对所述第二视频帧的标注操作信息;Receiving, by the second user equipment, labeling operation information of the second video frame;
    将所述标注操作信息转发至所述第一用户设备。Forwarding the labeling operation information to the first user equipment.
  25. 根据权利要求24所述的方法,其中,所述接收并转发第一用户设备发给第二用户设备的视频流包括:The method of claim 24, wherein the receiving and forwarding the video stream sent by the first user equipment to the second user equipment comprises:
    接收并转发第一用户设备发给第二用户设备的视频流,及所述视频流中已发送视频帧的帧标识信息。Receiving and forwarding a video stream sent by the first user equipment to the second user equipment, and frame identification information of the transmitted video frame in the video stream.
  26. 根据权利要求25所述的方法,其中,所述将所述第二帧相关信息转发至所述第一用户设备包括:The method of claim 25, wherein the forwarding the second frame related information to the first user equipment comprises:
    根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的视频帧的帧标识信息;Determining, according to the second frame related information, frame identification information of a video frame corresponding to the second video frame in the video stream;
    将所述第二视频帧相对应的视频帧的帧标识信息发送至所述第一用户设备。Sending frame identification information of the video frame corresponding to the second video frame to the first user equipment.
  27. 根据权利要求24所述的方法,其中,所述接收并转发第一用户设备发给第二用户设备的视频流,包括:The method of claim 24, wherein the receiving and forwarding the video stream sent by the first user equipment to the second user equipment comprises:
    接收并转发第一用户设备发给第二用户设备及第三用户设备的视频流;Receiving and forwarding a video stream sent by the first user equipment to the second user equipment and the third user equipment;
    其中,所述将所述第二帧相关信息转发至所述第一用户设备,包括:The forwarding the second frame related information to the first user equipment includes:
    将所述第二帧相关信息转发至所述第一用户设备及所述第三用户设备;Forwarding the second frame related information to the first user equipment and the third user equipment;
    其中,所述将所述标注操作信息转发至所述第一用户设备,包括:The forwarding the labeling operation information to the first user equipment includes:
    将所述标注操作信息转发至所述第一用户设备及所述第三用户设备。And forwarding the labeling operation information to the first user equipment and the third user equipment.
  28. 一种用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame, wherein the method comprises:
    第一用户设备向第二用户设备发送视频流;The first user equipment sends a video stream to the second user equipment;
    所述第二用户设备接收所述视频流,根据用户在所述视频流中的截图操作,向所述第一用户设备发送被截取的第二视频帧的第二帧相关信息;Receiving, by the second user equipment, the video stream, and sending, according to a screenshot operation of the user in the video stream, second frame related information of the intercepted second video frame to the first user equipment;
    所述第一用户设备接收所述第二帧相关信息,根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧;Receiving, by the first user equipment, the second frame related information, and determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
    所述第二用户设备获取所述用户对所述第二视频帧的标注操作信息,向所述第一用户设备发送所述标注操作信息;Obtaining, by the second user equipment, the labeling operation information of the second video frame by the user, and sending the labeling operation information to the first user equipment;
    所述第一用户设备接收所述标注操作信息,并根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。The first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
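权利要求28所述的交互流程可用如下的进程内模拟来示意：第一用户设备发送并留存视频帧，第二用户设备截图后回传帧相关信息与标注操作，第一用户设备据此在对应帧上实时呈现标注。A minimal in-process simulation of the claim-28 exchange between the first user equipment (sender) and the second user equipment (annotator); a real deployment would use a streaming transport, and all class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class FirstUE:
    sent: dict = field(default_factory=dict)    # frame_id -> frame payload
    canvas: list = field(default_factory=list)  # annotation ops rendered so far

    def send_stream(self, frames, second_ue):
        for fid, payload in frames:
            self.sent[fid] = payload               # store each sent frame
            second_ue.receive_frame(fid, payload)  # "send the video stream"

    def on_frame_info(self, frame_id):
        # resolve the first video frame matching the screenshot
        self.current_frame = self.sent[frame_id]

    def on_annotation(self, op):
        # present the annotation on the first video frame in real time
        self.canvas.append(op)

@dataclass
class SecondUE:
    received: dict = field(default_factory=dict)

    def receive_frame(self, fid, payload):
        self.received[fid] = payload

    def screenshot_and_annotate(self, fid, ops, first_ue):
        first_ue.on_frame_info(fid)       # second-frame related information
        for op in ops:
            first_ue.on_annotation(op)    # annotation operation information
```

这一模拟省略了编解码、网络设备转发与第三用户设备，仅展示两端之间的消息次序。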
  29. 一种用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame, wherein the method comprises:
    第一用户设备向网络设备发送视频流;The first user equipment sends a video stream to the network device;
    所述网络设备接收所述视频流,并向第二用户设备转发所述视频流;Receiving, by the network device, the video stream, and forwarding the video stream to a second user equipment;
    所述第二用户设备接收所述视频流,根据用户在所述视频流中的截图操作,向所述网络设备发送被截取的第二视频帧的第二帧相关信息;Receiving, by the second user equipment, the video stream, and sending, according to a screenshot operation of the user in the video stream, second frame related information of the intercepted second video frame to the network device;
    所述网络设备接收所述第二帧相关信息,并将所述第二帧相关信息转发至所述第一用户设备;Receiving, by the network device, the second frame related information, and forwarding the second frame related information to the first user equipment;
    所述第一用户设备接收所述第二帧相关信息,根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧;Receiving, by the first user equipment, the second frame related information, and determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
    所述第二用户设备获取所述用户对所述第二视频帧的标注操作信息,向所述网络设备发送所述标注操作信息;Obtaining, by the second user equipment, the labeling operation information of the second video frame by the user, and sending the labeling operation information to the network device;
    所述网络设备接收所述第二用户设备对所述第二视频帧的标注操作信息,将所述标注操作信息转发至所述第一用户设备;The network device receives the labeling operation information of the second video frame by the second user equipment, and forwards the labeling operation information to the first user equipment;
    所述第一用户设备接收所述标注操作信息,并根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。The first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  30. 一种用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame, wherein the method comprises:
    第一用户设备向第二用户设备及第三用户设备发送视频流;The first user equipment sends a video stream to the second user equipment and the third user equipment;
    所述第二用户设备根据用户在所述视频流中的截图操作,向所述第一用户设备及所述第三用户设备发送被截取的第二视频帧的第二帧相关信息;Transmitting, by the second user equipment, the second frame related information of the intercepted second video frame to the first user equipment and the third user equipment according to a screenshot operation of the user in the video stream;
    所述第二用户设备获取所述用户对所述第二视频帧的标注操作信息,向所述第一用户设备及所述第三用户设备发送所述标注操作信息;And acquiring, by the second user equipment, the labeling operation information of the second video frame by the user, and sending the labeling operation information to the first user equipment and the third user equipment;
    所述第一用户设备接收所述第二帧相关信息，根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧，接收所述标注操作信息，并根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作；The first user equipment receives the second frame related information, determines, according to the second frame related information, the first video frame corresponding to the second video frame in the video stream, receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
    所述第三用户设备接收所述视频流，接收所述第二视频帧的第二帧相关信息，接收所述标注操作信息，根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第三视频帧，根据所述标注操作信息在所述第三视频帧上实时呈现对应的标注操作。The third user equipment receives the video stream, receives the second frame related information of the second video frame, receives the labeling operation information, determines, according to the second frame related information, the third video frame corresponding to the second video frame in the video stream, and presents a corresponding labeling operation in real time on the third video frame according to the labeling operation information.
  31. 一种用于对视频帧进行实时标注的方法,其中,该方法包括:A method for real-time annotation of a video frame, wherein the method comprises:
    第一用户设备向网络设备发送视频流;The first user equipment sends a video stream to the network device;
    所述网络设备接收所述视频流,并向第二用户设备及第三用户设备转发所述视频流;Receiving, by the network device, the video stream, and forwarding the video stream to the second user equipment and the third user equipment;
    所述第二用户设备接收所述视频流,根据用户在所述视频流中的截图操作,向所述网络设备发送被截取的第二视频帧的第二帧相关信息;Receiving, by the second user equipment, the video stream, and sending, according to a screenshot operation of the user in the video stream, second frame related information of the intercepted second video frame to the network device;
    所述网络设备接收所述第二帧相关信息,并将所述第二帧相关信息转发至所述第一用户设备及所述第三用户设备;Receiving, by the network device, the second frame related information, and forwarding the second frame related information to the first user equipment and the third user equipment;
    所述第一用户设备接收所述第二帧相关信息,根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧;Receiving, by the first user equipment, the second frame related information, and determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
    所述第二用户设备获取所述用户对所述第二视频帧的标注操作信息，向所述网络设备发送所述标注操作信息；Obtaining, by the second user equipment, the user's labeling operation information for the second video frame, and sending the labeling operation information to the network device;
    所述网络设备接收所述第二用户设备对所述第二视频帧的标注操作信息，将所述标注操作信息转发至所述第一用户设备及第三用户设备；The network device receives the labeling operation information applied by the second user equipment to the second video frame, and forwards the labeling operation information to the first user equipment and the third user equipment;
    所述第一用户设备接收所述标注操作信息,并根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作;The first user equipment receives the labeling operation information, and presents a corresponding labeling operation in real time on the first video frame according to the labeling operation information;
    所述第三用户设备接收所述视频流，接收所述第二视频帧的第二帧相关信息，接收所述标注操作信息，根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第三视频帧，根据所述标注操作信息在所述第三视频帧上实时呈现对应的标注操作。Receiving, by the third user equipment, the video stream, receiving the second frame related information of the second video frame, receiving the labeling operation information, determining, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream, and presenting a corresponding labeling operation in real time on the third video frame according to the labeling operation information.
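The frame correspondence that claims 30, 31, and 33 rely on — recovering, from "second frame related information", the locally held frame that a remote viewer captured — can be illustrated with a short sketch. This is an illustrative sketch only, not the claimed implementation; all names (`FrameBuffer`, `frame_id`, the dictionary payloads) are hypothetical. The idea is that a sending device keeps a bounded cache of recently transmitted frames keyed by frame identification information, so that a matching "first video frame" can be looked up when the remote viewer's frame information arrives:

```python
from collections import OrderedDict

class FrameBuffer:
    """Hypothetical bounded cache of sent video frames, keyed by frame
    identification information. Lets a sending device recover the local
    frame (the "first video frame") that corresponds to the frame a remote
    viewer captured and annotated (the "second video frame")."""

    def __init__(self, capacity=300):  # e.g. roughly 10 s of video at 30 fps
        self.capacity = capacity
        self._frames = OrderedDict()   # frame_id -> frame payload

    def remember(self, frame_id, frame):
        """Record a frame as it is sent; evict the oldest when full."""
        self._frames[frame_id] = frame
        if len(self._frames) > self.capacity:
            self._frames.popitem(last=False)  # drop oldest entry

    def match(self, second_frame_info):
        """Return the buffered frame whose id matches the received
        second-frame related information, or None if already evicted."""
        return self._frames.get(second_frame_info.get("frame_id"))

# Usage: send five frames through a capacity-3 buffer, then match by id.
buf = FrameBuffer(capacity=3)
for i in range(5):                     # frames 0..4 sent; 0 and 1 evicted
    buf.remember(i, {"pixels": f"frame-{i}"})

assert buf.match({"frame_id": 4}) == {"pixels": "frame-4"}
assert buf.match({"frame_id": 0}) is None  # too old, already evicted
```

A bounded buffer of this kind is one plausible reading of how frame identification information lets the annotation land on the exact frame the viewer saw, despite the network delay between capture and annotation.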
  32. 一种用于对视频帧进行实时标注的第一用户设备，其中，该设备包括：A first user equipment for real-time annotation of a video frame, wherein the device comprises:
    视频发送模块,用于向第二用户设备发送视频流;a video sending module, configured to send a video stream to the second user equipment;
    帧信息接收模块,用于接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;a frame information receiving module, configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
    视频帧确定模块,用于根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧;a video frame determining module, configured to determine, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream;
    标注接收模块，用于接收所述第二用户设备对所述第二视频帧的标注操作信息；an annotation receiving module, configured to receive the labeling operation information applied by the second user equipment to the second video frame;
    标注呈现模块，用于根据所述标注操作信息在所述第一视频帧上实时呈现对应的标注操作。an annotation presentation module, configured to present a corresponding labeling operation in real time on the first video frame according to the labeling operation information.
  33. 根据权利要求32所述的设备,其中,所述视频发送模块用于:The device of claim 32, wherein the video transmitting module is configured to:
    向第二用户设备发送视频流及所述视频流中已发送视频帧的帧标识信息;Transmitting, to the second user equipment, a video stream and frame identification information of the transmitted video frame in the video stream;
    其中,所述视频帧确定模块用于:The video frame determining module is configured to:
    根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第一视频帧,其中所述第一视频帧的帧标识信息与所述第二帧相关信息相对应。Determining, according to the second frame related information, a first video frame corresponding to the second video frame in the video stream, where frame identification information of the first video frame corresponds to the second frame related information .
  34. 一种用于对视频帧进行实时标注的第二用户设备,其中,该设备包括:A second user equipment for real-time annotation of a video frame, wherein the device includes:
    视频接收模块,用于接收第一用户设备所发送的视频流;a video receiving module, configured to receive a video stream sent by the first user equipment;
    帧信息确定模块,用于根据用户在所述视频流中的截图操作,向所述第一用户设备发送被截取的第二视频帧的第二帧相关信息;a frame information determining module, configured to send second frame related information of the intercepted second video frame to the first user equipment according to a screenshot operation of the user in the video stream;
    标注获取模块,用于获取所述用户对所述第二视频帧的标注操作信息;An annotation obtaining module, configured to acquire the labeling operation information of the second video frame by the user;
    标注发送模块，用于向所述第一用户设备发送所述标注操作信息。an annotation sending module, configured to send the labeling operation information to the first user equipment.
  35. 一种用于对视频帧进行实时标注的第三用户设备,其中,该设备包括:A third user equipment for real-time annotation of a video frame, wherein the device comprises:
    第三视频接收模块,用于接收第一用户设备所发送至第二用户设备及第三用户设备的视频流;a third video receiving module, configured to receive a video stream that is sent by the first user equipment to the second user equipment and the third user equipment;
    第三帧信息接收模块,用于接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;a third frame information receiving module, configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
    第三视频帧确定模块,用于根据所述第二帧相关信息确定所述视频流中与所述第二视频帧相对应的第三视频帧;a third video frame determining module, configured to determine, according to the second frame related information, a third video frame corresponding to the second video frame in the video stream;
    第三标注接收模块，用于接收所述第二用户设备对所述第二视频帧的标注操作信息；a third annotation receiving module, configured to receive the labeling operation information applied by the second user equipment to the second video frame;
    第三呈现模块，用于根据所述标注操作信息在所述第三视频帧上实时呈现对应的标注操作。a third presentation module, configured to present a corresponding labeling operation in real time on the third video frame according to the labeling operation information.
  36. 一种用于对视频帧进行实时标注的网络设备,其中,该设备包括:A network device for real-time annotation of a video frame, wherein the device includes:
    视频转发模块,用于接收并转发第一用户设备发给第二用户设备的视频流;a video forwarding module, configured to receive and forward a video stream sent by the first user equipment to the second user equipment;
    帧信息接收模块,用于接收所述第二用户设备在所述视频流中所截取的第二视频帧的第二帧相关信息;a frame information receiving module, configured to receive second frame related information of the second video frame that is intercepted by the second user equipment in the video stream;
    帧信息转发模块,用于将所述第二帧相关信息转发至所述第一用户设备;a frame information forwarding module, configured to forward the second frame related information to the first user equipment;
    标注接收模块，用于接收所述第二用户设备对所述第二视频帧的标注操作信息；an annotation receiving module, configured to receive the labeling operation information applied by the second user equipment to the second video frame;
    标注转发模块，用于将所述标注操作信息转发至所述第一用户设备。an annotation forwarding module, configured to forward the labeling operation information to the first user equipment.
  37. 一种用于对视频帧进行实时标注的系统，其中，所述系统包括：如权利要求32或33中任一项所述的第一用户设备及如权利要求34所述的第二用户设备。A system for real-time annotation of video frames, wherein the system comprises: the first user equipment according to claim 32 or 33 and the second user equipment according to claim 34.
  38. 根据权利要求37所述的系统,其中,该系统还包括:如权利要求36所述的网络设备。The system of claim 37, wherein the system further comprises: the network device of claim 36.
  39. 一种用于对视频帧进行实时标注的系统，其中，所述系统包括：如权利要求32或33中任一项所述的第一用户设备、如权利要求34所述的第二用户设备以及如权利要求35所述的第三用户设备。A system for real-time annotation of video frames, wherein the system comprises: the first user equipment according to claim 32 or 33, the second user equipment according to claim 34, and the third user equipment according to claim 35.
  40. 根据权利要求39所述的系统,其中,该系统还包括:如权利要求36所述的网络设备。The system of claim 39, wherein the system further comprises: the network device of claim 36.
  41. 一种用于对视频帧进行实时标注的设备,其中,该设备包括:A device for real-time annotation of a video frame, wherein the device includes:
    处理器；以及a processor; and
    被安排成存储计算机可执行指令的存储器，所述可执行指令在被执行时使所述处理器执行如权利要求1至27中任一项所述方法的操作。a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the operations of the method according to any one of claims 1 to 27.
  42. 一种存储指令的计算机可读介质,所述指令在被执行时使得系统进行如权利要求1至27中任一项所述方法的操作。A computer readable medium storing instructions which, when executed, cause a system to perform the operations of the method of any one of claims 1 to 27.
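The forwarding role described for the network device (claim 36) — relaying the video stream, the second frame related information, and the labeling operation information among the first, second, and third user equipment — can be sketched as a simple fan-out relay. This is a hedged illustration, not the claimed implementation; the class and message names (`RelayServer`, `"frame_info"`, etc.) are hypothetical:

```python
class RelayServer:
    """Hypothetical sketch of the network device's forwarding role: every
    message (video frame, frame info, or annotation) received from one
    participant is delivered to all other participants."""

    def __init__(self):
        self.participants = {}          # name -> inbox (list of messages)

    def join(self, name):
        """Register a participant with an empty inbox."""
        self.participants[name] = []

    def forward(self, sender, kind, payload):
        """Relay a message to every participant except its sender."""
        for name, inbox in self.participants.items():
            if name != sender:
                inbox.append((kind, payload))

# Usage: mirror the message flow of claim 31 through the relay.
relay = RelayServer()
for device in ("first_ue", "second_ue", "third_ue"):
    relay.join(device)

relay.forward("first_ue", "video_frame", {"frame_id": 42})
relay.forward("second_ue", "frame_info", {"frame_id": 42})
relay.forward("second_ue", "annotation",
              {"frame_id": 42, "stroke": [(1, 2), (3, 4)]})

# The streaming device gets the frame info and annotation back;
# the observing third device receives all three messages.
assert relay.participants["first_ue"][0] == ("frame_info", {"frame_id": 42})
assert len(relay.participants["third_ue"]) == 3
```

Under this reading, the third user equipment needs no special handling: it simply receives the same forwarded frame information and annotation messages as the first user equipment, and resolves its own "third video frame" locally.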
PCT/CN2018/121730 2018-01-05 2018-12-18 Method and device for labeling video frames in real time WO2019134499A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810011908 2018-01-05
CN201810011908.0 2018-01-05
CN201810409977.7 2018-05-02
CN201810409977.7A CN108401190B (en) 2018-01-05 2018-05-02 Method and equipment for real-time labeling of video frames

Publications (1)

Publication Number Publication Date
WO2019134499A1 true WO2019134499A1 (en) 2019-07-11

Family

ID=63101425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/121730 WO2019134499A1 (en) 2018-01-05 2018-12-18 Method and device for labeling video frames in real time

Country Status (2)

Country Link
CN (1) CN108401190B (en)
WO (1) WO2019134499A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108401190B (en) * 2018-01-05 2020-09-04 亮风台(上海)信息科技有限公司 Method and equipment for real-time labeling of video frames
CN112950951B (en) * 2021-01-29 2023-05-02 浙江大华技术股份有限公司 Intelligent information display method, electronic device and storage medium
CN113596517B (en) * 2021-07-13 2022-08-09 北京远舢智能科技有限公司 Image freezing and labeling method and system based on mixed reality
CN114201645A (en) * 2021-12-01 2022-03-18 北京百度网讯科技有限公司 Object labeling method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412622A (en) * 2016-11-14 2017-02-15 百度在线网络技术(北京)有限公司 Method and apparatus for displaying barrage information during video content playing process
CN106603537A (en) * 2016-12-19 2017-04-26 广东威创视讯科技股份有限公司 System and method for marking video signal source of mobile intelligent terminal
CN107277641A (en) * 2017-07-04 2017-10-20 上海全土豆文化传播有限公司 A kind of processing method and client of barrage information
CN107333087A (en) * 2017-06-27 2017-11-07 京东方科技集团股份有限公司 A kind of information sharing method and device based on video session
CN108401190A (en) * 2018-01-05 2018-08-14 亮风台(上海)信息科技有限公司 A kind of method and apparatus for being marked in real time to video frame

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7810121B2 (en) * 2002-05-03 2010-10-05 Time Warner Interactive Video Group, Inc. Technique for delivering network personal video recorder service and broadcast programming service over a communications network
US20060104601A1 (en) * 2004-11-15 2006-05-18 Ati Technologies, Inc. Method and apparatus for programming the storage of video information
CN103716586A (en) * 2013-12-12 2014-04-09 中国科学院深圳先进技术研究院 Monitoring video fusion system and monitoring video fusion method based on three-dimension space scene
CN104935861B (en) * 2014-03-19 2019-04-19 成都鼎桥通信技术有限公司 A kind of Multiparty Multimedia communication means
CN104954812A (en) * 2014-03-27 2015-09-30 腾讯科技(深圳)有限公司 Video synchronized playing method, device and system
CN104536661A (en) * 2014-12-17 2015-04-22 深圳市金立通信设备有限公司 Terminal screen shot method
US9516255B2 (en) * 2015-01-21 2016-12-06 Microsoft Technology Licensing, Llc Communication system
CN104883515B (en) * 2015-05-22 2018-11-02 广东威创视讯科技股份有限公司 A kind of video labeling processing method and video labeling processing server

Also Published As

Publication number Publication date
CN108401190A (en) 2018-08-14
CN108401190B (en) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2019134499A1 (en) Method and device for labeling video frames in real time
US9445150B2 (en) Asynchronously streaming video of a live event from a handheld device
CN111135569A (en) Cloud game processing method and device, storage medium and electronic equipment
CN105830047A (en) Uploading and transcoding media files
US20200241835A1 (en) Method and apparatus of audio/video switching
CN104685873B (en) Encoding controller and coding control method
CN113938470B (en) Method and device for playing RTSP data source by browser and streaming media server
US10819951B2 (en) Recording video from a bitstream
WO2021057697A1 (en) Video encoding and decoding methods and apparatuses, storage medium, and electronic device
WO2019149066A1 (en) Video playback method, terminal apparatus, and storage medium
EP4009648A1 (en) Cloud desktop video playback method, server, terminal, and storage medium
EP3800896A1 (en) Uploading and transcoding media files
CN111385576B (en) Video coding method and device, mobile terminal and storage medium
WO2024051823A1 (en) Method for managing reception information and back-end device
JP2014075735A (en) Image processor and image processing method
WO2021120124A1 (en) Method and apparatus for video display, and computer storage medium
US9872060B1 (en) Write confirmation of a digital video record channel
CN112866745B (en) Streaming video data processing method, device, computer equipment and storage medium
CN110855645B (en) Streaming media data playing method and device
CN110798700B (en) Video processing method, video processing device, storage medium and electronic equipment
KR100899666B1 (en) Dispersed multistreaming transmission apparatus
WO2018161790A1 (en) Video transmission method and device
WO2024046124A1 (en) Video processing method and apparatus, and server
US9350968B1 (en) Enhanced digital video recording using video transcoding
TWI688268B (en) Multimedia file management method, terminal device, server device and file management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18898158

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18898158

Country of ref document: EP

Kind code of ref document: A1