CN112702641A - Video processing method, camera, recording and playing host, system and storage medium

Info

Publication number
CN112702641A
Authority
CN
China
Prior art keywords
image, target, position information, acquiring, scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011532654.0A
Other languages
Chinese (zh)
Inventor
袁延金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-23
Filing date
2020-12-23
Publication date
2021-04-23
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011532654.0A
Publication of CN112702641A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/02Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Circuits (AREA)

Abstract

The application provides a video processing method, a camera, a recording and playing host, a system and a storage medium. The video processing method comprises the following steps: acquiring a first image of a first target scene; determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier; outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; acquiring a second image of a second target scene; and outputting the second image to the device, so that the device fuses the target image into the second image.

Description

Video processing method, camera, recording and playing host, system and storage medium
Technical Field
The present application relates to the field of video technologies, and in particular, to a video processing method, a video camera, a recording and playing host, a system, and a storage medium.
Background
In application scenarios such as remote teaching, a video source generally provides at least two scene pictures to a playing end, for example a teaching picture of the teacher and a courseware picture. The playing end usually needs multiple displays to present the multiple scene pictures: for example, a first display presents the teaching picture while a second display presents the courseware picture. Presenting pictures on multiple displays tends to scatter the viewer's attention. Where multiple scene pictures have to be presented on a single display, the display area of each scene picture is small because of the limitations of the display.
In view of this, how to fuse scene pictures is a technical problem that needs to be solved.
Disclosure of Invention
The application provides a video processing method, a camera, a recording and broadcasting host, a system and a storage medium, which can fuse scene pictures.
According to an aspect of the present application, there is provided a video processing method, including:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
In some embodiments, the acquiring a first image of a first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene comprises: acquiring a second image of a courseware playing scene, wherein the second image is the courseware picture currently being played;
the determining position information of the target object in the first image comprises: determining a mask map of the target object in the first image, and using the mask map as the position information.
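By way of illustration only, the following sketch shows one way such a mask map could be produced and compactly encoded. Background subtraction stands in for whatever segmentation method an implementation actually uses (the application does not specify one), and the function name is an assumption.

```python
# Illustrative sketch: derive a binary mask map for the moving target (e.g. the
# teacher) and encode it losslessly. Background subtraction is only a stand-in
# for a real person-segmentation method; the application does not mandate one.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def mask_map(frame) -> bytes:
    """Return the mask map (position information) for one first-image frame."""
    mask = subtractor.apply(frame)        # uint8: 0 = background, 255 = target
    mask = cv2.medianBlur(mask, 5)        # light cleanup of speckle noise
    ok, png = cv2.imencode(".png", mask)  # a binary mask compresses well, far
    return png.tobytes() if ok else b""   # smaller than the target pixels
```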
In some embodiments, before the acquiring a first image of a first target scene, the method further comprises:
acquiring network bandwidth information, wherein the network bandwidth information characterizes the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters for the first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format;
the acquiring a first image of a first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
According to an aspect of the present application, there is provided a video processing method, including:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from a first image whose synchronization identifier is the same as that of the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
In some embodiments, before the fusing the target image into the second image to obtain a third image and displaying the third image, the method further comprises:
acquiring a user instruction, wherein the user instruction indicates at least one of the position, transparency and size of the target image relative to the second image;
the fusing the target image into the second image to obtain a third image comprises:
generating the third image according to the user instruction.
According to an aspect of the present application, there is provided a camera including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting host, including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a display device including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera for: acquiring a first image of a first target scene; and determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
a computing device to play a second image of a second target scene;
a first recording and broadcasting host for: acquiring, from the camera, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier; acquiring the second image; and outputting the first image, the position information and the second image;
a second recording and broadcasting host for: receiving the first image, the position information and the second image from the first recording and broadcasting host;
a display device in a play scene, for: acquiring the first image, the position information and the second image from the second recording and broadcasting host; extracting, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; fusing the target image into the second image to obtain a third image; and displaying the third image.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera according to the present application;
a computing device to play a second image of a second target scene;
a display device according to the present application.
According to an aspect of the present application, there is provided a storage medium storing a program including instructions that, when executed by a computing device, cause the computing device to perform a video processing method.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Drawings
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application;
FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application;
FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application;
FIG. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application;
FIG. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application;
FIG. 10 illustrates a schematic diagram of a target object according to some embodiments of the present application;
FIG. 11 illustrates a schematic diagram of a fused picture according to some embodiments of the present application;
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments.
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 1, an application scenario may include a camera 101, a computing device 102, a recording and broadcasting host 103, a service platform 105, a recording and broadcasting host 106, and a display device 107. A recording and broadcasting host may also be called a recording and broadcasting all-in-one machine.
The camera 101 may acquire a first code stream of a first target scene. Any image frame in the first code stream may be denoted as a first image. The first target scene is an image capture scene such as a lecture scene, and the first image is, for example, a picture of a teacher teaching. For example, the camera 101 may capture the picture of a teacher giving a lesson in the main classroom, i.e., the classroom in which the teacher teaches. The computing device 102 is operable to play a second image of a second target scene. The second target scene is, for example, a courseware playing scene, and the second image is, for example, a courseware picture.
The camera 101 may also determine position information of the target object in the first image. The target object is, for example, a human body object, but is not limited thereto. The position information may be represented, for example, as a mask map of the target object. Representing the target object by a mask map avoids the cost of transferring the target object itself and thus reduces the amount of data transferred. In addition, the camera 101 assigns the same synchronization identifier to the first image and to the position information; the synchronization identifier is used to associate the first image with the position information. The synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
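As a minimal sketch of this synchronization, the camera side might tag each encoded frame and its mask map with one shared, incrementing sequence number. The container type and field names below are illustrative assumptions, not the application's actual wire format.

```python
# Illustrative pairing of a first-image frame and its mask map under one
# synchronization identifier (here simply an incrementing frame number, in the
# spirit of an RTP frame number). Type and field names are assumptions.
from dataclasses import dataclass
from itertools import count

@dataclass
class TaggedPayload:
    sync_id: int   # shared synchronization identifier
    kind: str      # "first_image" or "position_info"
    data: bytes    # encoded frame or encoded mask map

_seq = count()

def tag_frame_and_mask(encoded_frame: bytes, encoded_mask: bytes):
    sync_id = next(_seq)  # same identifier for both outputs
    return (TaggedPayload(sync_id, "first_image", encoded_frame),
            TaggedPayload(sync_id, "position_info", encoded_mask))
```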
The recording and broadcasting host 103 is configured to acquire, from the camera 101, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier. The recording and broadcasting host 103 may also acquire the second image and output the first image, the position information and the second image, for example to the service platform 105 through the network 104. It should be noted that the operations performed by the camera 101 may instead be performed by the recording and broadcasting host 103; this application does not limit this.
The service platform 105 may forward the received data. For example, service platform 105 may transmit the first image, the location information, and the second image to videocasting host 106.
The recording and broadcasting host 106 may receive the first image, the position information and the second image from the recording and broadcasting host 103.
the display device 107 is in a play scene. Here, the playback scene is, for example, from a classroom. The slave classroom refers to the classroom in which the student is located. The display device is, for example, an Open Plug Specification (OPS) display. Although fig. 1 shows one display device 107, in practice more display devices 107 may be arranged for an application scenario. Display device 107 may obtain the first image, the location information, and the second image from recording host 106. Also, the display device 107 may extract the target image from the first image whose synchronous identification is the same as it is based on the position information. The display device 107 may also fuse the target image in the second image to obtain a third image, and display the third image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 2, an application scenario may include a camera 101, a computing device 102, and a display device 107. The camera 101 and the computing device 102 are in a first target scene, for example a main classroom. The camera 101 may acquire a first code stream of the first target scene, and any image frame in the first code stream may be denoted as a first image. The first target scene is an image capture scene such as a lecture scene, and the first image is, for example, a picture of a teacher teaching. The computing device 102 is operable to play a second image of a second target scene; the second target scene is, for example, a courseware playing scene, and the second image is, for example, a courseware picture. The camera 101 may also determine position information of the target object in the first image. The target object is, for example, a human body object, but is not limited thereto. The position information may be, for example, a mask map of the target object. In addition, the camera 101 assigns the same synchronization identifier to the first image and to the position information; the synchronization identifier is used to associate the first image with the position information. The synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
The display device 107 is in a play scene, for example a slave classroom. In this scenario, the display device 107 may acquire the first image and the position information from the camera 101, and the second image from the computing device 102. According to the position information, the display device 107 may extract the target image from the first image whose synchronization identifier is the same as that of the position information. The display device 107 may fuse the target image into the second image to obtain a third image, and display the third image.
Fig. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application. The method 300 may be performed, for example, in the camera 101 or the recording and broadcasting host 103.
As shown in fig. 3, in step S301, a first image of a first target scene is acquired. For example, step S301 acquires a first image of a lecture scene; the first image is a teaching picture. FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application; fig. 4 is, for example, a picture of a teacher teaching, captured in the main classroom.
In step S302, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier, which is used to associate the first image with the position information. For example, step S302 may determine a mask map of the target object in the first image and use the mask map as the position information. FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application; fig. 5A is the mask map of the human body object of fig. 4.
In step S303, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information.
In step S304, a second image of a second target scene is acquired. For example, step S304 may acquire a second image of a courseware playing scene; the second image is the courseware picture currently being played by the computing device 102. FIG. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application.
In step S305, the second image is output to the device that received the first image, so that the device fuses the target image into the second image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application.
As shown in fig. 6, in step S601, a first image of a first target scene is acquired. Here, step S601 may be executed by the video camera 101, for example. Step S601 may acquire, for example, a first image of a lecture scene.
In step S602, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier, which is used to associate the first image with the position information. Step S602 may be executed, for example, by the camera 101 or the recording and broadcasting host 103.
In step S603, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information. Step S603 may be performed, for example, by the recording and broadcasting host 103.
In step S604, a second image of a second target scene is acquired. Step S604 may be performed, for example, by the recording and broadcasting host 103. For example, step S604 may acquire a second image of a courseware playing scene; the second image is the courseware picture currently being played by the computing device 102.
In step S605, the second image is output to the device that received the first image, so that the device fuses the target image into the second image. Step S605 may be performed, for example, by the recording and broadcasting host 103.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application. The method 700 may be performed, for example, in the camera 101.
As shown in fig. 7, in step S701, network bandwidth information is acquired. The network bandwidth information characterizes the transmission bandwidth of the network 104 that transmits the first image and the position information.
In step S702, image acquisition parameters for the first target scene are determined according to the network bandwidth information. The image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format. The encoding format is, for example, H264 or H265.
In step S703, the first image generated according to the image acquisition parameters is acquired. Because the first image is generated according to image acquisition parameters adapted to the network bandwidth, real-time data transmission can be ensured.
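A minimal sketch of how step S702 might map the measured bandwidth to acquisition parameters is given below; the thresholds and concrete values are invented for illustration, since the application only lists which parameters may be adjusted.

```python
# Illustrative bandwidth-adaptive choice of acquisition parameters (resolution,
# code rate, frame rate, encoding format). All thresholds/values are assumed.
from typing import NamedTuple

class CaptureParams(NamedTuple):
    width: int
    height: int
    bitrate_kbps: int
    fps: int
    codec: str

def params_for_bandwidth(bandwidth_kbps: float) -> CaptureParams:
    # Higher bandwidth: spend it on resolution and frame rate.
    if bandwidth_kbps >= 8000:
        return CaptureParams(1920, 1080, 6000, 30, "H265")
    if bandwidth_kbps >= 4000:
        return CaptureParams(1280, 720, 2500, 30, "H264")
    if bandwidth_kbps >= 1500:
        return CaptureParams(960, 540, 1000, 25, "H264")
    # Constrained network: sacrifice quality to keep transmission real-time.
    return CaptureParams(640, 360, 500, 15, "H264")
```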
In step S704, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier.
In step S705, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information.
In step S706, a second image of a second target scene is acquired.
In step S707, the second image is output to the device that received the first image, so that the device fuses the target image into the second image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved. In addition, the video processing scheme adjusts the code rate of the image according to the network bandwidth, which improves the real-time performance of image transmission and thus further improves the video playing effect.
Fig. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application. The method 800 may be performed, for example, in the display device 107.
As shown in fig. 8, in step S801, a first image of a first target scene is received. The first image includes a synchronization identifier.
In step S802, position information describing the position of a target object in an image is received. The position information includes a synchronization identifier.
In step S803, a second image of a second target scene is received.
In step S804, the target image is extracted, according to the position information, from the first image whose synchronization identifier is the same as that of the position information.
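A minimal sketch of this matching step: buffer incoming first images and mask maps by synchronization identifier and, once both halves of a pair are present, cut the target out of the matching frame. The buffering scheme and all names are assumptions, not the application's specified design.

```python
# Illustrative receiver-side matching of first images and position information
# (mask maps) by synchronization identifier, then target extraction.
import numpy as np

frames: dict[int, np.ndarray] = {}  # sync_id -> decoded first image (H, W, 3)
masks: dict[int, np.ndarray] = {}   # sync_id -> decoded mask map (H, W)

def on_payload(sync_id: int, kind: str, image: np.ndarray):
    """Store one payload; return the target image once its pair is complete.
    The second image is assumed to arrive on a separate path."""
    (frames if kind == "first_image" else masks)[sync_id] = image
    if sync_id in frames and sync_id in masks:
        first, mask = frames.pop(sync_id), masks.pop(sync_id)
        return extract_target(first, mask)
    return None

def extract_target(first: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only target pixels; background pixels are zeroed out."""
    return np.where(mask[..., None] > 0, first, np.uint8(0))
```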
In step S805, the target image is fused into the second image to obtain a third image, and the third image is displayed.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application. The method 900 may be performed, for example, in the display device 107.
As shown in fig. 9, in step S901, a first image of a first target scene is received. The first image includes a synchronization identifier.
In step S902, position information describing the position of the target object in the image is received. The position information includes a synchronization identifier.
In step S903, a second image of a second target scene is received.
In step S904, the target image is extracted, according to the position information, from the first image whose synchronization identifier is the same as that of the position information. For example, the first image received by the display device 107 is shown in fig. 4 and the received position information is shown in fig. 5A; the target image extracted in step S904 is shown in fig. 10.
In step S905, a user instruction is acquired. The user instruction indicates at least one of the position, transparency and size of the target image relative to the second image.
In step S906, the target image is fused into the second image to obtain a third image, and the third image is displayed. For example, step S906 may generate the third image according to the user instruction. The third image (i.e., the picture obtained by fusing the target image with the second image) is shown in fig. 11.
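The fusion of step S906 could be realized as a straightforward alpha blend driven by the user instruction. The sketch below assumes the target image arrives together with its mask and that the scaled target fits entirely inside the second image; the function name and placement arithmetic are illustrative, not specified by the application.

```python
# Illustrative fusion of the target image into the second image according to a
# user instruction (position, transparency, size). Names/semantics assumed.
import cv2
import numpy as np

def fuse(second: np.ndarray, target: np.ndarray, mask: np.ndarray,
         x: int, y: int, alpha: float, scale: float) -> np.ndarray:
    """Blend `target` (with binary `mask`) onto `second` at (x, y).
    Assumes the scaled target lies fully inside `second`."""
    h, w = int(target.shape[0] * scale), int(target.shape[1] * scale)
    target = cv2.resize(target, (w, h))
    mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    third = second.copy()
    roi = third[y:y + h, x:x + w]
    weight = (mask[..., None] > 0) * alpha  # per-pixel blend weight, 0 or alpha
    roi[:] = (weight * target + (1 - weight) * roi).astype(np.uint8)
    return third
```

Under these assumptions, calling fuse(courseware, target, mask, x=40, y=60, alpha=0.85, scale=0.5) would overlay a half-size, slightly transparent figure near the top-left of the courseware picture.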
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved. In addition, by allowing a user instruction to adjust the position, transparency, size, and the like of the target image relative to the second image, the video processing scheme improves the flexibility of the picture layout.
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application. Here, as shown in FIG. 12, the computing device includes one or more processors (CPUs) 1202, a communications module 1204, a memory 1206, a user interface 1210, and a communications bus 1208 interconnecting these components.
The processor 1202 can receive and transmit data via the communication module 1204 to enable network communication and/or local communication.
The user interface 1210 includes one or more output devices 1212, including one or more speakers and/or one or more visual displays. The user interface 1210 also includes one or more input devices 1214. The user interface 1210 may receive, for example, instructions from a remote controller, but is not limited thereto.
The memory 1206 may be a high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1206 stores a set of instructions executable by the processor 1202, including:
an operating system 1216 including programs for handling various basic system services and for performing hardware related tasks;
applications 1218, including various programs for implementing the video processing schemes described above. Such programs can implement the processing flows of the examples above and may include, for example, a video processing method according to the present application.
In addition, each embodiment of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. Clearly, such a data processing program constitutes the present invention. Furthermore, a data processing program is generally stored in a storage medium and is executed either by reading it directly out of the storage medium or by installing or copying it into a storage device (such as a hard disk and/or memory) of the data processing apparatus. Such a storage medium therefore also constitutes the present invention. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., a flexible disk, a hard disk, a flash memory), an optical storage medium (e.g., a CD-ROM), or a magneto-optical storage medium (e.g., an MO).
The present application thus also discloses a non-volatile storage medium in which a program is stored. The program comprises instructions which, when executed by a processor, cause a computing device to perform a video processing method according to the present application.
In addition to data processing programs, the method steps described in this application may be implemented by hardware, for example logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like. Hardware capable of implementing the methods described herein may thus also constitute the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (11)

1. A video processing method, comprising:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
2. The video processing method of claim 1,
the acquiring a first image of a first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene comprises: acquiring a second image of a courseware playing scene, wherein the second image is the courseware picture currently being played;
the determining position information of the target object in the first image comprises: determining a mask map of the target object in the first image, and using the mask map as the position information.
3. The video processing method of claim 1, wherein before the acquiring a first image of a first target scene, the method further comprises:
acquiring network bandwidth information, wherein the network bandwidth information characterizes the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters for the first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format;
the acquiring a first image of a first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
4. A video processing method, comprising:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from a first image whose synchronization identifier is the same as that of the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
5. The video processing method of claim 4, wherein before the fusing the target image into the second image to obtain a third image and displaying the third image, the method further comprises:
acquiring a user instruction, wherein the user instruction indicates at least one of the position, transparency and size of the target image relative to the second image;
the fusing the target image into the second image to obtain a third image comprises:
generating the third image according to the user instruction.
6. A camera, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-5.
7. A recording and broadcasting host, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-3.
8. A display device, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of claim 4 or 5.
9. A recording and broadcasting system, comprising:
a camera for: acquiring a first image of a first target scene; and determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
a computing device to play a second image of a second target scene;
a first recording and broadcasting host for: acquiring, from the camera, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier; acquiring the second image; and outputting the first image, the position information and the second image;
a second recording and broadcasting host for: receiving the first image, the position information and the second image from the first recording and broadcasting host;
a display device in a play scene, for: acquiring the first image, the position information and the second image from the second recording and broadcasting host; extracting, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; fusing the target image into the second image to obtain a third image; and displaying the third image.
10. A recording and broadcasting system, comprising:
the camera of claim 6;
a computing device to play a second image of a second target scene;
the display device of claim 8.
11. A storage medium storing a program comprising instructions that, when executed by a computing device, cause the computing device to perform the video processing method of any of claims 1-5.
CN202011532654.0A (priority date 2020-12-23, filing date 2020-12-23): Video processing method, camera, recording and playing host, system and storage medium. Status: Pending. Publication: CN112702641A.

Priority Applications (1)

Application Number: CN202011532654.0A; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Title: Video processing method, camera, recording and playing host, system and storage medium

Applications Claiming Priority (1)

Application Number: CN202011532654.0A; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Title: Video processing method, camera, recording and playing host, system and storage medium

Publications (1)

Publication Number: CN112702641A; Publication Date: 2021-04-23

Family

ID=75510716

Family Applications (1)

Application Number: CN202011532654.0A; Title: Video processing method, camera, recording and playing host, system and storage medium; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Status: Pending

Country Status (1)

Country Link
CN (1) CN112702641A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5789729B1 (en) * 2015-03-30 2015-10-07 株式会社高田工業所 Perspective image teaching type precision cutting device
CN106572385A (en) * 2015-10-10 2017-04-19 北京佳讯飞鸿电气股份有限公司 Image overlaying method for remote training video presentation
CN105376547A (en) * 2015-11-17 2016-03-02 广州市英途信息技术有限公司 Micro video course recording system and method based on 3D virtual synthesis technology
CN108932519A (en) * 2017-05-23 2018-12-04 中兴通讯股份有限公司 A kind of meeting-place data processing, display methods and device and intelligent glasses
CN109587556A (en) * 2019-01-03 2019-04-05 腾讯科技(深圳)有限公司 Method for processing video frequency, video broadcasting method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023283894A1 (en) * 2021-07-15 2023-01-19 京东方科技集团股份有限公司 Image processing method and device
WO2023005427A1 (en) * 2021-07-29 2023-02-02 International Business Machines Corporation Context based adaptive resolution modulation countering network latency fluctuation
US11653047B2 (en) 2021-07-29 2023-05-16 International Business Machines Corporation Context based adaptive resolution modulation countering network latency fluctuation

Similar Documents

Publication Publication Date Title
KR102050865B1 (en) Method and device for synchronizing display of images
US9485493B2 (en) Method and system for displaying multi-viewpoint images and non-transitory computer readable storage medium thereof
US9723223B1 (en) Apparatus and method for panoramic video hosting with directional audio
CN108600773A (en) Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium
CN101150704B (en) Image processing system, image processing method
US20090092311A1 (en) Method and apparatus for receiving multiview camera parameters for stereoscopic image, and method and apparatus for transmitting multiview camera parameters for stereoscopic image
CN109074678A (en) A kind of processing method and processing device of information
CN112702641A (en) Video processing method, camera, recording and playing host, system and storage medium
KR100576544B1 (en) Apparatus and Method for Processing of 3D Video using MPEG-4 Object Descriptor Information
CN112492357A (en) Method, device, medium and electronic equipment for processing multiple video streams
CN113891117B (en) Immersion medium data processing method, device, equipment and readable storage medium
KR100901111B1 (en) Live-Image Providing System Using Contents of 3D Virtual Space
JP6581241B2 (en) Hardware system for 3D video input on flat panel
KR20150117165A (en) Internet based educational information providing system of surgical techniques and skills, and providing Method thereof
CN103533215A (en) Recording and playing system
CN102474634A (en) Modifying images for a 3-dimensional display mode
KR20120102996A (en) System and method for displaying 3d contents of 3d moving picture
CN114846808A (en) Content distribution system, content distribution method, and content distribution program
CN103888808A (en) Video display method, display device, auxiliary device and system
CN112017264A (en) Display control method and device for virtual studio, storage medium and electronic equipment
US20210377514A1 (en) User Interface Module For Converting A Standard 2D Display Device Into An Interactive 3D Display Device
US20220303518A1 (en) Code stream processing method and device, first terminal, second terminal and storage medium
CN109727315B (en) One-to-many cluster rendering method, device, equipment and storage medium
CN113099212A (en) 3D display method, device, computer equipment and storage medium
KR102392908B1 (en) Method, Apparatus and System for Providing of Free Viewpoint Video Service

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-23)