CN112702641A - Video processing method, camera, recording and playing host, system and storage medium - Google Patents
- Publication number: CN112702641A (application CN202011532654.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- position information
- acquiring
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N21/4334: Recording operations (under H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/40 Client devices; H04N21/43 Processing of content or additional data; H04N21/433 Content storage operation)
- H04N21/4307: Synchronising the rendering of multiple content streams or additional data on devices (under H04N21/4302 Content synchronisation processes)
- H04N21/44016: Splicing one content stream with another, e.g. for substituting a video clip (under H04N21/44 Processing of video elementary streams)
- H04N21/440218: Reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 (under H04N21/4402 Reformatting operations)
- H04N21/440263: Reformatting by altering the spatial resolution, e.g. for displaying on a connected PDA (under H04N21/4402)
- H04N21/440281: Reformatting by altering the temporal resolution, e.g. by frame skipping (under H04N21/4402)
- H04N21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally (under H04N21/47 End-user applications; H04N21/472)
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction (under G06T5/00 Image enhancement or restoration)
- G06T7/70: Determining position or orientation of objects or cameras (under G06T7/00 Image analysis)
- G09B5/02: Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip (under G09B5/00)
- G06T2207/10016: Video; image sequence (indexing scheme, under G06T2207/10 Image acquisition modality)
Abstract
The application provides a video processing method, a camera, a recording and playing host, a system and a storage medium. The video processing method comprises: acquiring a first image of a first target scene; determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier; outputting the first image and the position information, each carrying the synchronization identifier, so that the device receiving them can extract, according to the position information, a target image from the first image whose synchronization identifier matches that of the position information; acquiring a second image of a second target scene; and outputting the second image to the device, so that the device fuses the target image into the second image.
Description
Technical Field
The present application relates to the field of video technologies, and in particular, to a video processing method, a video camera, a recording and playing host, a system, and a storage medium.
Background
In application scenarios such as remote teaching, a video source typically provides at least two scene pictures to the playing end, for example a teacher's lecture picture and a courseware picture. The playing end usually needs multiple displays to present multiple scene pictures: a first display presents the lecture picture and a second display presents the courseware picture. Presenting pictures across multiple displays tends to scatter the viewer's attention. Conversely, when multiple scene pictures must share a single display, the display area of each individual picture is small owing to the limits of the display.
In view of this, how to perform scene fusion is a technical problem to be solved.
Disclosure of Invention
The application provides a video processing method, a video camera, a recording and broadcasting host, a system and a storage medium, which can realize the fusion of scene pictures.
According to an aspect of the present application, there is provided a video processing method, including:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that the device receiving the first image and the position information can extract, according to the position information, a target image from the first image whose synchronization identifier matches that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
In some embodiments, said obtaining a first image of a first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene includes: acquiring a second image of a courseware playing scene, wherein the second image is a courseware picture played currently;
the determining position information of a target object in the first image comprises: determining a mask map of the target object in the first image and using the mask map as the position information.
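The mask-map idea above can be sketched with numpy. This is a simplified stand-in: the thresholding "segmenter" below is purely illustrative (a real camera would run a person-segmentation model), and all function names are assumptions, not from the patent.

```python
import numpy as np

def make_mask_position_info(frame, lower, upper):
    """Build a binary mask map marking where the target object lies.

    Illustrative segmentation: the 'target' is every pixel whose mean
    intensity falls in [lower, upper].
    """
    gray = frame.mean(axis=2)  # H x W luminance
    return ((gray >= lower) & (gray <= upper)).astype(np.uint8)

def extract_target(frame, mask):
    """Keep only the target pixels; background pixels become zero."""
    return frame * mask[:, :, None]  # broadcast the mask over the channels

# Toy 2x2 RGB frame: one bright 'target' pixel, three dark background pixels.
frame = np.array([[[200, 200, 200], [10, 10, 10]],
                  [[5, 5, 5],       [8, 8, 8]]], dtype=np.uint8)
mask = make_mask_position_info(frame, lower=100, upper=255)
target = extract_target(frame, mask)
```

Because the mask is one value per pixel rather than a full image copy, sending it as the position information costs far less bandwidth than sending the extracted target itself.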
In some embodiments, the method further comprises, before acquiring the first image of the first target scene:
acquiring network bandwidth information, wherein the network bandwidth information is used for representing the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters of a first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and coding format;
the acquiring a first image of a first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
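The bandwidth-adaptive selection of acquisition parameters can be sketched as a simple tiering function. The tiers below are illustrative assumptions, not values from the patent; the point is only that resolution, bitrate, frame rate and coding format degrade gracefully as the measured transmission bandwidth shrinks.

```python
def select_capture_parameters(bandwidth_kbps):
    """Map available transmission bandwidth to image-acquisition parameters
    (resolution, bitrate, frame rate, coding format)."""
    if bandwidth_kbps >= 8000:
        return {"resolution": (1920, 1080), "bitrate_kbps": 6000,
                "fps": 30, "codec": "h264"}
    if bandwidth_kbps >= 3000:
        return {"resolution": (1280, 720), "bitrate_kbps": 2500,
                "fps": 25, "codec": "h264"}
    # Constrained networks: small frames, low frame rate.
    return {"resolution": (640, 360), "bitrate_kbps": 800,
            "fps": 15, "codec": "h264"}
```

The camera (or recording host) would call this once before capture starts, and could re-evaluate it whenever the measured bandwidth changes.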
According to an aspect of the present application, there is provided a video processing method, including:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from the first image that carries the same synchronization identifier as the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
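The receiving-side steps can be sketched as follows, with dict keys standing in for synchronization identifiers and small numpy arrays standing in for frames. Function names and the buffering scheme are illustrative assumptions.

```python
import numpy as np

def pair_by_sync_id(first_images, position_infos):
    """Match each received first image with the position information that
    carries the same synchronization identifier; frames whose counterpart
    has not arrived yet simply stay buffered."""
    return {sid: (first_images[sid], position_infos[sid])
            for sid in first_images.keys() & position_infos.keys()}

def fuse(second_image, first_image, mask):
    """Copy the masked target pixels of the first image over the second
    image, leaving the rest of the second image (the courseware) intact."""
    out = second_image.copy()
    m = mask.astype(bool)
    out[m] = first_image[m]
    return out

# Frame 7 of the lecture stream and its mask have both arrived; frame 8's
# mask has not, so only frame 7 can be processed this round.
lecture = {7: np.full((2, 2), 9), 8: np.full((2, 2), 9)}
masks = {7: np.array([[1, 0], [0, 0]])}
ready = pair_by_sync_id(lecture, masks)
```

Matching strictly on the identifier is what prevents a mask computed for frame N from being applied to frame N+1 when the two streams arrive with different delays.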
In some embodiments, before fusing the target image in the second image to obtain a third image and displaying the third image, the method further includes:
acquiring a user instruction, wherein the user instruction is used for indicating at least one of the position, the transparency and the size of the target image relative to the second image;
the fusing the target image in the second image to obtain a third image includes:
generating the third image according to the user instruction.
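Honoring such a user instruction during fusion can be sketched in numpy as below. Only position and transparency are shown (rescaling is omitted for brevity), and the parameter names are illustrative assumptions.

```python
import numpy as np

def fuse_with_instruction(second, target, mask, x=0, y=0, alpha=1.0):
    """Blend the extracted target into the second image at offset (x, y)
    with the requested transparency; alpha=1.0 means fully opaque."""
    out = second.astype(float)          # astype copies, so `second` is untouched
    h, w = mask.shape
    region = out[y:y + h, x:x + w]      # view into `out` at the user's position
    m = mask.astype(bool)
    region[m] = alpha * target.astype(float)[m] + (1 - alpha) * region[m]
    return out

second = np.zeros((4, 4))               # stand-in courseware frame
target = np.full((2, 2), 100.0)         # stand-in extracted teacher image
mask = np.ones((2, 2), dtype=np.uint8)
third = fuse_with_instruction(second, target, mask, x=1, y=1, alpha=0.5)
```

With alpha below 1 the courseware stays legible behind the teacher, which is one plausible reason the patent exposes transparency as a user-adjustable setting.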
According to an aspect of the present application, there is provided a camera including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting host, including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a display device including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera configured to: acquire a first image of a first target scene; and determine position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
a computing device configured to play a second image of a second target scene;
a first recording and broadcasting host configured to: acquire, from the camera, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier; acquire a second image; and output the first image, the position information and the second image;
a second recording and broadcasting host configured to: receive the first image, the position information and the second image from the first recording and broadcasting host;
a display device in a play scene, configured to: acquire the first image, the position information and the second image from the second recording and broadcasting host; extract, according to the position information, a target image from the first image carrying the same synchronization identifier as the position information; fuse the target image into the second image to obtain a third image; and display the third image.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera according to the present application;
a computing device to play a second image of a second target scene;
a display device according to the present application.
According to an aspect of the present application, there is provided a storage medium storing a program including instructions that, when executed by a computing device, cause the computing device to perform a video processing method.
In summary, the video processing scheme of the embodiments of the application makes it convenient to extract the target object from the first image of one scene and to fuse it with the second image of another scene. Because only the target object is extracted, the scheme avoids carrying the background of the first image (i.e., everything in the first image other than the target object) into the fused image, which improves the video playing effect. In particular, when the scheme is applied to remote teaching, the human-body image in the lecture picture can be conveniently fused with the courseware picture, an unnecessary background is kept out of the displayed picture, and the remote teaching effect is improved.
Drawings
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application;
FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application;
FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application;
FIG. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application;
FIG. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application;
FIG. 10 illustrates a schematic diagram of a target object according to some embodiments of the present application;
FIG. 11 illustrates a schematic diagram of a merged picture according to some embodiments of the present application;
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 1, an application scenario may include a camera 101, a computing device 102, a recording host 103, a service platform 105, a recording host 106, and a display device 107. The recording and broadcasting host may also be called a recording and broadcasting all-in-one machine.
The camera 101 may acquire a first codestream of a first target scene. Any image frame in the first codestream may be represented as a first image, for example. The first target scene is an image capture scene such as a lecture scene. The first image is, for example, a teacher teaching screen. For example, the camera 101 may acquire a picture of a teacher's lecture in a main classroom. The main classroom refers to the classroom in which the teacher gives lessons. The computing device 102 is operable to play a second image of a second target scene. The second target scene is, for example, a courseware playing scene. The second image is, for example, a courseware screen.
The camera 101 may also determine position information of the target object in the first image. Here, the target object is, for example, a human body object, but is not limited thereto. The position information may be represented, for example, as a mask map of the target object. Representing the target object by a mask map avoids transferring the target pixels themselves and thus reduces the amount of data transferred. In addition, the camera 101 assigns the same synchronization identifier to both the first image and the position information; the identifier associates the first image with the position information. Here, the synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
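The frame-number pairing described above can be sketched as follows. The payload container and the monotonically increasing counter are illustrative stand-ins (the patent mentions RTP frame numbers as one concrete choice of identifier).

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class TaggedPayload:
    sync_id: int   # the shared synchronization identifier (a frame number)
    kind: str      # "first_image" or "position_info"
    data: bytes

_frame_numbers = count()

def tag_pair(image_bytes, mask_bytes):
    """Assign one captured frame and its mask map the same identifier, so
    the receiver can re-associate them even if they travel separately."""
    sid = next(_frame_numbers)
    return (TaggedPayload(sid, "first_image", image_bytes),
            TaggedPayload(sid, "position_info", mask_bytes))

img0, pos0 = tag_pair(b"frame-0", b"mask-0")
img1, pos1 = tag_pair(b"frame-1", b"mask-1")
```

The two payloads of a pair can then be sent over different channels or with different latencies without the receiver losing track of which mask belongs to which frame.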
The recording and broadcasting host 103 is used for: a first image containing a synchronization mark and position information containing the synchronization mark are acquired from the camera 101. In addition, the recording and broadcasting host 103 can also acquire the second image and output the first image, the position information and the second image. For example, the recording host 103 may output the first image, the location information, and the second image to the service platform 105 through the network 104. It should be noted that the operations performed by the video camera 101 may also be performed by the recording and broadcasting host 103, which is not limited in this application.
The service platform 105 may forward the received data. For example, service platform 105 may transmit the first image, the location information, and the second image to videocasting host 106.
Recording host 106 may receive the first image, the location information, and the second image from recording host 103.
the display device 107 is in a play scene. Here, the playback scene is, for example, from a classroom. The slave classroom refers to the classroom in which the student is located. The display device is, for example, an Open Plug Specification (OPS) display. Although fig. 1 shows one display device 107, in practice more display devices 107 may be arranged for an application scenario. Display device 107 may obtain the first image, the location information, and the second image from recording host 106. Also, the display device 107 may extract the target image from the first image whose synchronous identification is the same as it is based on the position information. The display device 107 may also fuse the target image in the second image to obtain a third image, and display the third image.
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 2, an application scenario may include a camera 101, a computing device 102, and a display device 107. The camera 101 and the computing device 102 are in a first target scene. The first object scene is for example a main classroom. Here, the camera 101 may acquire a first code stream of a first target scene. Any image frame in the first codestream may be represented as a first image, for example. The first target scene is an image capture scene such as a lecture scene. The first image is, for example, a teacher teaching screen. The computing device 102 is operable to play a second image of a second target scene. The second target scene is, for example, a courseware playing scene. The second image is, for example, a courseware screen. The camera 101 may also determine position information of the target object in the first image. Here, the target object is, for example, a human body object, but is not limited thereto. The location information may be, for example, a mask map of the target object. In addition, the camera 101 configures the same synchronization mark for the first image and the position information, respectively. The synchronization identifier is used to associate the first image with the location information. Here, the synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
The display device 107 is in a play scene. The play scene is, for example, a slave classroom, that is, the classroom in which the students are located. Display device 107 may obtain the first image, the position information, and the second image. According to the position information, the display device 107 may extract the target image from the first image that carries the same synchronization identifier as the position information. The display device 107 may fuse the target image into the second image to obtain a third image, and display the third image.
Fig. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application. The method 300 may be performed, for example, in the camera 101 or the recording and broadcasting host 103.
As shown in fig. 3, in step S301, a first image of a first target scene is acquired. For example, step S301 acquires a first image of a lecture scene. The first image is a teaching picture. For example, FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application. Fig. 4 is a teacher lecture screen taken from the main classroom, for example.
In step S302, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier, which associates them with each other. For example, step S302 may determine a mask map of the target object in the first image and use the mask map as the position information. FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application. Fig. 5A is a mask map of the human subject of fig. 4.
In step S303, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that the device receiving them can extract, according to the position information, the target image from the first image whose synchronization identifier matches that of the position information.
In step S304, a second image of a second target scene is acquired. For example, step S304 may acquire a second image of a courseware playing scene. The second image is a courseware frame currently being played by the computing device 102. For example, fig. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application.
In step S305, the second image is output to the device that received the first image, so that the device fuses the target image into the second image.
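Steps S301 to S305 can be sketched as one iteration of a sender loop, with the camera, the segmenter and the network abstracted as callables. All names and payload shapes here are illustrative stand-ins, not interfaces defined by the patent.

```python
def sender_pipeline(capture_first, segment, capture_second, send):
    """One iteration of the method-300 flow on the sending side.

    S301: capture first image      S302: compute position info (shares the
    frame's sync id)               S303: output image and position info
    S304: capture second image     S305: output the second image
    """
    frame, sync_id = capture_first()                 # S301
    position_info = segment(frame)                   # S302
    send(("first_image", sync_id, frame))            # S303
    send(("position_info", sync_id, position_info))  # S303
    second = capture_second()                        # S304
    send(("second_image", second))                   # S305

sent = []
sender_pipeline(
    capture_first=lambda: ("lecture-frame", 42),
    segment=lambda f: "mask-of-" + f,
    capture_second=lambda: "courseware-frame",
    send=sent.append,
)
```

Note that the second image needs no synchronization identifier in this sketch: only the first image and its position information must be re-associated frame-accurately on the receiving side.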
Fig. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application.
As shown in fig. 6, in step S601, a first image of a first target scene is acquired. Here, step S601 may be executed by the video camera 101, for example. Step S601 may acquire, for example, a first image of a lecture scene.
In step S602, position information of the target object in the first image is determined. The position information and the first image carry the same synchronization identifier, which associates the first image with the position information. Here, step S602 may be executed by the video camera 101 or the recording and broadcasting host 103, for example.
In step S603, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that the apparatus that receives them can, based on the position information, extract the target image from the first image whose synchronization identifier matches that of the position information. Step S603 may be performed by the recording and broadcasting host 103, for example.
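One plausible receiver-side use of the synchronization identifier (a design sketch assumed for illustration, not dictated by the application) is a small buffer that holds each first image or piece of position information until its counterpart with the same identifier arrives:

```python
class SyncBuffer:
    """Pair first images with position information by synchronization identifier.

    Each item is held until its counterpart with the same identifier arrives;
    only then is the (image, position) pair released for extraction.
    """

    def __init__(self):
        self.images = {}     # sync_id -> first image awaiting its position info
        self.positions = {}  # sync_id -> position info awaiting its image

    def put_image(self, sync_id, image):
        self.images[sync_id] = image
        return self._match(sync_id)

    def put_position(self, sync_id, position):
        self.positions[sync_id] = position
        return self._match(sync_id)

    def _match(self, sync_id):
        # Release the pair only when both halves have arrived.
        if sync_id in self.images and sync_id in self.positions:
            return self.images.pop(sync_id), self.positions.pop(sync_id)
        return None
```

Because the image stream and the position stream may arrive in either order, matching on the identifier rather than on arrival order is what keeps each mask aligned with the exact frame it describes.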
In step S604, a second image of a second target scene is acquired. Step S604 may be performed by the recording and broadcasting host 103, for example. For example, step S604 may acquire a second image of a courseware playing scene, the second image being the courseware picture currently played by the computing device 102.
In step S605, the second image is output to the apparatus that received the first image, so that the apparatus fuses the target image into the second image. Step S605 may be performed by the recording and broadcasting host 103, for example.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and to fuse it with the second image of another scene. Because only the target object is extracted, the background of the first image (i.e., everything in the first image other than the target object) is kept out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, so that the unwanted background no longer interferes with the displayed picture and the remote teaching effect is improved.
Fig. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application. The method 700 may be performed, for example, in the camera 101.
As shown in fig. 7, in step S701, network bandwidth information is acquired. The network bandwidth information is used to characterize the transmission bandwidth of the network 104 that transmits the first image and the location information.
In step S702, image acquisition parameters for the first target scene are determined according to the network bandwidth information. The image acquisition parameters include at least one of resolution, code rate (bit rate), frame rate, and encoding format; the encoding format is, for example, H.264 or H.265.
In step S703, a first image generated according to the image acquisition parameters is acquired. Because the image acquisition parameters are derived from the network bandwidth, the first image adapts to the available bandwidth, which helps ensure the real-time performance of data transmission.
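The bandwidth-to-parameter mapping of steps S701–S703 can be sketched as a simple tiering function. The thresholds and values below are illustrative assumptions, not figures from the application; the principle is only that a narrower pipe selects a lighter stream:

```python
def choose_capture_params(bandwidth_kbps: int) -> dict:
    """Map measured network bandwidth to image acquisition parameters.

    The tiers and numbers are illustrative assumptions; lower bandwidth
    selects lower resolution, code rate, and frame rate.
    """
    if bandwidth_kbps >= 8000:
        return {"resolution": (1920, 1080), "code_rate_kbps": 6000,
                "frame_rate": 30, "codec": "H265"}
    if bandwidth_kbps >= 3000:
        return {"resolution": (1280, 720), "code_rate_kbps": 2500,
                "frame_rate": 25, "codec": "H265"}
    return {"resolution": (640, 360), "code_rate_kbps": 800,
            "frame_rate": 15, "codec": "H264"}
```

In practice the bandwidth estimate would be refreshed periodically, so the camera can step the stream down (or back up) as network conditions change.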
In step S704, position information of the target object in the first image is determined. The position information and the first image carry the same synchronization identifier.
In step S705, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that the apparatus that receives them can, based on the position information, extract the target image from the first image whose synchronization identifier matches that of the position information.
In step S706, a second image of a second target scene is acquired.
In step S707, the second image is output to the apparatus that received the first image, so that the apparatus fuses the target image into the second image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and to fuse it with the second image of another scene. Because only the target object is extracted, the background of the first image (i.e., everything in the first image other than the target object) is kept out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, so that the unwanted background no longer interferes with the displayed picture and the remote teaching effect is improved. In addition, the scheme adjusts the code rate of the image according to the network bandwidth, which improves the real-time performance of image transmission and thus further improves the video playing effect.
Fig. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application. The method 800 may be performed, for example, in the display device 107.
As shown in fig. 8, in step S801, a first image of a first target scene is received. The first image carries a synchronization identifier.
In step S802, position information describing the position of the target object in the first image is received. The position information carries a synchronization identifier.
In step S803, a second image of a second target scene is received.
In step S804, the target image is extracted, based on the position information, from the first image whose synchronization identifier matches that of the position information.
In step S805, the target image is fused into the second image to obtain a third image, and the third image is displayed.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and to fuse it with the second image of another scene. Because only the target object is extracted, the background of the first image (i.e., everything in the first image other than the target object) is kept out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, so that the unwanted background no longer interferes with the displayed picture and the remote teaching effect is improved.
Fig. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application. The method 900 may be performed, for example, in the display device 107.
As shown in fig. 9, in step S901, a first image of a first target scene is received. The first image carries a synchronization identifier.
In step S902, position information describing the position of the target object in the first image is received. The position information carries a synchronization identifier.
In step S903, a second image of a second target scene is received.
In step S904, the target image is extracted, based on the position information, from the first image whose synchronization identifier matches that of the position information. For example, the first image received by the display device 107 is shown in fig. 4, and the received position information is shown in fig. 5A; the target image extracted in step S904 is shown in fig. 10.
In step S905, a user instruction is acquired. The user instruction indicates at least one of the position, transparency, and size of the target image relative to the second image.
In step S906, the target image is fused into the second image to obtain a third image, and the third image is displayed. For example, step S906 may generate the third image according to the user instruction. The third image (i.e., the picture in which the target image and the second image are fused) is shown in fig. 11.
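The user-controlled fusion of steps S905–S906 can be sketched as an alpha blend whose position, transparency, and size follow the instruction. The nearest-neighbour scaling and the requirement that the scaled target fit inside the second image at the given position are simplifying assumptions of this sketch:

```python
import numpy as np

def fuse_with_instruction(target, mask, second_image, *,
                          pos=(0, 0), alpha=1.0, scale=1.0):
    """Blend the masked target into the second image per a user instruction.

    pos, alpha, and scale correspond to the position, transparency, and size
    controls of step S905 (alpha = 1.0 is fully opaque). The target, its
    mask, and the second image are assumed to be numpy arrays with the
    target fitting inside the second image at pos after scaling.
    """
    if scale != 1.0:
        # Nearest-neighbour resize of both the target and its mask.
        h, w = target.shape[:2]
        ys = (np.arange(int(h * scale)) / scale).astype(int)
        xs = (np.arange(int(w * scale)) / scale).astype(int)
        target, mask = target[ys][:, xs], mask[ys][:, xs]
    third = second_image.astype(np.float64)
    x0, y0 = pos
    h, w = target.shape[:2]
    region = third[y0:y0 + h, x0:x0 + w]
    weight = mask[..., None].astype(np.float64) * alpha  # per-pixel blend weight
    region[:] = weight * target + (1.0 - weight) * region
    return third.astype(second_image.dtype)
```

With alpha below 1.0 the courseware remains visible through the presenter, which is one way a viewer might keep slide text readable behind the human body image.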
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and to fuse it with the second image of another scene. Because only the target object is extracted, the background of the first image (i.e., everything in the first image other than the target object) is kept out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, so that the unwanted background no longer interferes with the displayed picture and the remote teaching effect is improved. In addition, because a user instruction can adjust the position, transparency, size, and the like of the target image relative to the second image, the scheme makes the screen layout more flexible.
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application. Here, as shown in FIG. 12, the computing device includes one or more processors (CPUs) 1202, a communications module 1204, a memory 1206, a user interface 1210, and a communications bus 1208 interconnecting these components.
The processor 1202 can receive and transmit data via the communication module 1204 to enable network communication and/or local communication.
The user interface 1210 includes one or more output devices 1212 including one or more speakers and/or one or more visual displays. The user interface 1210 also includes one or more input devices 1214. The user interface 1210 may receive, for example, an instruction of a remote controller, but is not limited thereto.
The memory 1206 may be a high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1206 stores a set of instructions executable by the processor 1202, including:
an operating system 1216 including programs for handling various basic system services and for performing hardware related tasks;
applications 1218, including various programs for implementing the video processing schemes described above. Such programs can implement the processing flows of the examples above, including, for example, the video processing methods.
In addition, each of the embodiments of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer; such a data processing program constitutes part of the invention. The data processing program is generally stored in a storage medium and is executed either by reading it directly out of the storage medium or by installing or copying it to a storage device (such as a hard disk and/or memory) of the data processing apparatus; such a storage medium therefore also constitutes part of the invention. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., a flexible disk, a hard disk, or a flash memory), an optical storage medium (e.g., a CD-ROM), or a magneto-optical storage medium (e.g., an MO disc).
The present application thus also discloses a non-volatile storage medium in which a program is stored. The program comprises instructions which, when executed by a processor, cause a computing device to perform a video processing method according to the present application.
In addition, the method steps described in this application may be implemented by hardware, for example, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like, in addition to data processing programs. Such hardware capable of implementing the methods described herein may also constitute the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of the present application.
Claims (11)
1. A video processing method, comprising:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are configured with the same synchronization identifier;
outputting the first image containing the synchronization identifier and the position information containing the synchronization identifier, so that a device that receives the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
2. The video processing method of claim 1, wherein:
the acquiring of the first image of the first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene includes: acquiring a second image of a courseware playing scene, wherein the second image is a courseware picture played currently;
the determining of the position information of the target object in the first image comprises: determining a mask image of the target object in the first image, and using the mask image as the position information.
3. The video processing method of claim 1, wherein prior to acquiring the first image of the first target scene, further comprising:
acquiring network bandwidth information, wherein the network bandwidth information is used for representing the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters of a first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and coding format;
the acquiring of the first image of the first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
4. A video processing method, comprising:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from a first image whose synchronization identifier is the same as that of the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
5. The video processing method of claim 4, wherein before fusing the target image in the second image to obtain a third image and displaying the third image, further comprising:
acquiring a user instruction, wherein the user instruction is used for indicating at least one of the position, the transparency and the size of the target image relative to the second image;
the fusing the target image in the second image to obtain a third image includes:
generating the third image according to the user instruction.
6. A camera, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-5.
7. A recording and broadcasting host, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-3.
8. A display device, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of claim 4 or 5.
9. A recording and broadcasting system, comprising:
a camera for: acquiring a first image of a first target scene; determining position information of a target object in the first image, wherein the position information and the first image are configured with the same synchronous identification;
a computing device to play a second image of a second target scene;
the first recording and broadcasting host is used for: acquiring the first image containing the synchronous identification and position information containing the synchronous identification from the camera, acquiring a second image, and outputting the first image, the position information and the second image;
the second recording and broadcasting host is used for: receiving the first image, the position information and the second image from a first recording and broadcasting host;
a display device in a play scene, configured to: acquire the first image, the position information, and the second image from the second recording and broadcasting host; extract, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; and fuse the target image into the second image to obtain a third image, and display the third image.
10. A recording and broadcasting system, comprising:
the camera of claim 6;
a computing device to play a second image of a second target scene;
the display device of claim 8.
11. A storage medium storing a program comprising instructions that, when executed by a computing device, cause the computing device to perform the video processing method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011532654.0A CN112702641A (en) | 2020-12-23 | 2020-12-23 | Video processing method, camera, recording and playing host, system and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112702641A true CN112702641A (en) | 2021-04-23 |
Family
ID=75510716
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011532654.0A Pending CN112702641A (en) | 2020-12-23 | 2020-12-23 | Video processing method, camera, recording and playing host, system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112702641A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023283894A1 (en) * | 2021-07-15 | 2023-01-19 | 京东方科技集团股份有限公司 | Image processing method and device |
WO2023005427A1 (en) * | 2021-07-29 | 2023-02-02 | International Business Machines Corporation | Context based adaptive resolution modulation countering network latency fluctuation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5789729B1 (en) * | 2015-03-30 | 2015-10-07 | 株式会社高田工業所 | Perspective image teaching type precision cutting device |
CN105376547A (en) * | 2015-11-17 | 2016-03-02 | 广州市英途信息技术有限公司 | Micro video course recording system and method based on 3D virtual synthesis technology |
CN106572385A (en) * | 2015-10-10 | 2017-04-19 | 北京佳讯飞鸿电气股份有限公司 | Image overlaying method for remote training video presentation |
CN108932519A (en) * | 2017-05-23 | 2018-12-04 | 中兴通讯股份有限公司 | A kind of meeting-place data processing, display methods and device and intelligent glasses |
CN109587556A (en) * | 2019-01-03 | 2019-04-05 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, video broadcasting method, device, equipment and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023283894A1 (en) * | 2021-07-15 | 2023-01-19 | 京东方科技集团股份有限公司 | Image processing method and device |
WO2023005427A1 (en) * | 2021-07-29 | 2023-02-02 | International Business Machines Corporation | Context based adaptive resolution modulation countering network latency fluctuation |
US11653047B2 (en) | 2021-07-29 | 2023-05-16 | International Business Machines Corporation | Context based adaptive resolution modulation countering network latency fluctuation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210423