CN112702641A - Video processing method, camera, recording and playing host, system and storage medium

Info

Publication number
CN112702641A
Authority
CN
China
Prior art keywords
image, target, position information, acquiring, scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011532654.0A
Other languages
Chinese (zh)
Inventor
袁延金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-23
Filing date
2020-12-23
Publication date
2021-04-23
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011532654.0A
Publication of CN112702641A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/02Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Circuits (AREA)

Abstract

The application provides a video processing method, a camera, a recording and playing host, a system and a storage medium. The video processing method comprises the following steps: acquiring a first image of a first target scene; determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier; outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; acquiring a second image of a second target scene; and outputting the second image to the device, so that the device fuses the target image into the second image.

Description

Video processing method, camera, recording and playing host, system and storage medium
Technical Field
The present application relates to the field of video technologies, and in particular, to a video processing method, a video camera, a recording and playing host, a system, and a storage medium.
Background
In application scenarios such as remote teaching, a video source generally provides at least two scene pictures to a playing end, for example a teaching picture of the teacher and a courseware picture. The playing end usually needs multiple displays to present the multiple scene pictures: for example, a first display presents the teaching picture while a second display presents the courseware picture. Presenting pictures on multiple displays tends to scatter the viewer's attention. Where multiple scene pictures have to be presented on a single display, the display area of each scene picture is small because of the limitations of the display.
In view of this, how to fuse scene pictures is a technical problem that needs to be solved.
Disclosure of Invention
The application provides a video processing method, a camera, a recording and broadcasting host, a system and a storage medium, which can fuse scene pictures.
According to an aspect of the present application, there is provided a video processing method, including:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
In some embodiments, the acquiring a first image of a first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene comprises: acquiring a second image of a courseware playing scene, wherein the second image is the courseware picture currently being played;
the determining position information of the target object in the first image comprises: determining a mask map of the target object in the first image, and using the mask map as the position information.
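By way of illustration only, the following sketch shows one way such a mask map could be produced and compactly encoded. Background subtraction stands in for whatever segmentation method an implementation actually uses (the application does not specify one), and the function name is an assumption.

```python
# Illustrative sketch: derive a binary mask map for the moving target (e.g. the
# teacher) and encode it losslessly. Background subtraction is only a stand-in
# for a real person-segmentation method; the application does not mandate one.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def mask_map(frame) -> bytes:
    """Return the mask map (position information) for one first-image frame."""
    mask = subtractor.apply(frame)        # uint8: 0 = background, 255 = target
    mask = cv2.medianBlur(mask, 5)        # light cleanup of speckle noise
    ok, png = cv2.imencode(".png", mask)  # a binary mask compresses well, far
    return png.tobytes() if ok else b""   # smaller than the target pixels
```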
In some embodiments, before the acquiring a first image of a first target scene, the method further comprises:
acquiring network bandwidth information, wherein the network bandwidth information characterizes the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters for the first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format;
the acquiring a first image of a first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
According to an aspect of the present application, there is provided a video processing method, including:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from a first image whose synchronization identifier is the same as that of the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
In some embodiments, before the fusing the target image into the second image to obtain a third image and displaying the third image, the method further comprises:
acquiring a user instruction, wherein the user instruction indicates at least one of the position, transparency and size of the target image relative to the second image;
the fusing the target image into the second image to obtain a third image comprises:
generating the third image according to the user instruction.
According to an aspect of the present application, there is provided a camera including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting host, including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a display device including:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing a video processing method according to the present application.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera for: acquiring a first image of a first target scene; and determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
a computing device to play a second image of a second target scene;
a first recording and broadcasting host for: acquiring, from the camera, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier; acquiring the second image; and outputting the first image, the position information and the second image;
a second recording and broadcasting host for: receiving the first image, the position information and the second image from the first recording and broadcasting host;
a display device in a play scene, for: acquiring the first image, the position information and the second image from the second recording and broadcasting host; extracting, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; fusing the target image into the second image to obtain a third image; and displaying the third image.
According to an aspect of the present application, there is provided a recording and broadcasting system, including:
a camera according to the present application;
a computing device to play a second image of a second target scene;
a display device according to the present application.
According to an aspect of the present application, there is provided a storage medium storing a program including instructions that, when executed by a computing device, cause the computing device to perform a video processing method.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Drawings
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application;
FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application;
FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application;
FIG. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application;
FIG. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application;
FIG. 10 illustrates a schematic diagram of a target object according to some embodiments of the present application;
FIG. 11 illustrates a schematic diagram of a fused picture according to some embodiments of the present application;
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments.
FIG. 1 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 1, an application scenario may include a camera 101, a computing device 102, a recording and broadcasting host 103, a service platform 105, a recording and broadcasting host 106, and a display device 107. A recording and broadcasting host may also be called a recording and broadcasting all-in-one machine.
The camera 101 may acquire a first code stream of a first target scene. Any image frame in the first code stream may be denoted as a first image. The first target scene is an image capture scene such as a lecture scene, and the first image is, for example, a picture of a teacher teaching. For example, the camera 101 may capture the picture of a teacher giving a lesson in the main classroom, i.e., the classroom in which the teacher teaches. The computing device 102 is operable to play a second image of a second target scene. The second target scene is, for example, a courseware playing scene, and the second image is, for example, a courseware picture.
The camera 101 may also determine position information of the target object in the first image. The target object is, for example, a human body object, but is not limited thereto. The position information may be represented, for example, as a mask map of the target object. Representing the target object by a mask map avoids the cost of transferring the target object itself and thus reduces the amount of data transferred. In addition, the camera 101 assigns the same synchronization identifier to the first image and to the position information; the synchronization identifier is used to associate the first image with the position information. The synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
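As a minimal sketch of this synchronization, the camera side might tag each encoded frame and its mask map with one shared, incrementing sequence number. The container type and field names below are illustrative assumptions, not the application's actual wire format.

```python
# Illustrative pairing of a first-image frame and its mask map under one
# synchronization identifier (here simply an incrementing frame number, in the
# spirit of an RTP frame number). Type and field names are assumptions.
from dataclasses import dataclass
from itertools import count

@dataclass
class TaggedPayload:
    sync_id: int   # shared synchronization identifier
    kind: str      # "first_image" or "position_info"
    data: bytes    # encoded frame or encoded mask map

_seq = count()

def tag_frame_and_mask(encoded_frame: bytes, encoded_mask: bytes):
    sync_id = next(_seq)  # same identifier for both outputs
    return (TaggedPayload(sync_id, "first_image", encoded_frame),
            TaggedPayload(sync_id, "position_info", encoded_mask))
```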
The recording and broadcasting host 103 is configured to acquire, from the camera 101, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier. The recording and broadcasting host 103 may also acquire the second image and output the first image, the position information and the second image, for example to the service platform 105 through the network 104. It should be noted that the operations performed by the camera 101 may instead be performed by the recording and broadcasting host 103; this application does not limit this.
The service platform 105 may forward the received data. For example, service platform 105 may transmit the first image, the location information, and the second image to videocasting host 106.
The recording and broadcasting host 106 may receive the first image, the position information and the second image from the recording and broadcasting host 103.
the display device 107 is in a play scene. Here, the playback scene is, for example, from a classroom. The slave classroom refers to the classroom in which the student is located. The display device is, for example, an Open Plug Specification (OPS) display. Although fig. 1 shows one display device 107, in practice more display devices 107 may be arranged for an application scenario. Display device 107 may obtain the first image, the location information, and the second image from recording host 106. Also, the display device 107 may extract the target image from the first image whose synchronous identification is the same as it is based on the position information. The display device 107 may also fuse the target image in the second image to obtain a third image, and display the third image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
FIG. 2 illustrates a schematic diagram of an application scenario in accordance with some embodiments of the present application.
As shown in fig. 2, an application scenario may include a camera 101, a computing device 102, and a display device 107. The camera 101 and the computing device 102 are in a first target scene, for example a main classroom. The camera 101 may acquire a first code stream of the first target scene, and any image frame in the first code stream may be denoted as a first image. The first target scene is an image capture scene such as a lecture scene, and the first image is, for example, a picture of a teacher teaching. The computing device 102 is operable to play a second image of a second target scene; the second target scene is, for example, a courseware playing scene, and the second image is, for example, a courseware picture. The camera 101 may also determine position information of the target object in the first image. The target object is, for example, a human body object, but is not limited thereto. The position information may be, for example, a mask map of the target object. In addition, the camera 101 assigns the same synchronization identifier to the first image and to the position information; the synchronization identifier is used to associate the first image with the position information. The synchronization identifier is, for example, a frame number (e.g., an RTP frame number), but is not limited thereto.
The display device 107 is in a play scene, for example a slave classroom. In this scenario, the display device 107 may acquire the first image and the position information from the camera 101, and the second image from the computing device 102. According to the position information, the display device 107 may extract the target image from the first image whose synchronization identifier is the same as that of the position information. The display device 107 may fuse the target image into the second image to obtain a third image, and display the third image.
Fig. 3 illustrates a flow diagram of a video processing method 300 according to some embodiments of the present application. The method 300 may be performed, for example, in the camera 101 or the recording and broadcasting host 103.
As shown in fig. 3, in step S301, a first image of a first target scene is acquired. For example, step S301 acquires a first image of a lecture scene; the first image is a teaching picture. FIG. 4 illustrates a schematic view of a lecture screen according to some embodiments of the present application; fig. 4 is, for example, a picture of a teacher teaching, captured in the main classroom.
In step S302, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier, which is used to associate the first image with the position information. For example, step S302 may determine a mask map of the target object in the first image and use the mask map as the position information. FIG. 5A illustrates a mask map of a target object according to some embodiments of the present application; fig. 5A is the mask map of the human body object of fig. 4.
In step S303, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information.
In step S304, a second image of a second target scene is acquired. For example, step S304 may acquire a second image of a courseware playing scene; the second image is the courseware picture currently being played by the computing device 102. FIG. 5B illustrates a schematic diagram of a courseware screen according to some embodiments of the present application.
In step S305, the second image is output to the device that received the first image, so that the device fuses the target image into the second image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 6 illustrates a flow diagram of a video processing method 600 according to some embodiments of the present application.
As shown in fig. 6, in step S601, a first image of a first target scene is acquired. Here, step S601 may be executed by the video camera 101, for example. Step S601 may acquire, for example, a first image of a lecture scene.
In step S602, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier, which is used to associate the first image with the position information. Step S602 may be executed, for example, by the camera 101 or the recording and broadcasting host 103.
In step S603, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information. Step S603 may be performed, for example, by the recording and broadcasting host 103.
In step S604, a second image of a second target scene is acquired. Step S604 may be performed, for example, by the recording and broadcasting host 103. For example, step S604 may acquire a second image of a courseware playing scene; the second image is the courseware picture currently being played by the computing device 102.
In step S605, the second image is output to the device that received the first image, so that the device fuses the target image into the second image. Step S605 may be performed, for example, by the recording and broadcasting host 103.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 7 illustrates a flow diagram of a video processing method 700 according to some embodiments of the present application. The method 700 may be performed, for example, in the camera 101.
As shown in fig. 7, in step S701, network bandwidth information is acquired. The network bandwidth information characterizes the transmission bandwidth of the network 104 that transmits the first image and the position information.
In step S702, image acquisition parameters for the first target scene are determined according to the network bandwidth information. The image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format. The encoding format is, for example, H264 or H265.
In step S703, the first image generated according to the image acquisition parameters is acquired. Because the first image is generated according to image acquisition parameters adapted to the network bandwidth, real-time data transmission can be ensured.
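A minimal sketch of how step S702 might map the measured bandwidth to acquisition parameters is given below; the thresholds and concrete values are invented for illustration, since the application only lists which parameters may be adjusted.

```python
# Illustrative bandwidth-adaptive choice of acquisition parameters (resolution,
# code rate, frame rate, encoding format). All thresholds/values are assumed.
from typing import NamedTuple

class CaptureParams(NamedTuple):
    width: int
    height: int
    bitrate_kbps: int
    fps: int
    codec: str

def params_for_bandwidth(bandwidth_kbps: float) -> CaptureParams:
    # Higher bandwidth: spend it on resolution and frame rate.
    if bandwidth_kbps >= 8000:
        return CaptureParams(1920, 1080, 6000, 30, "H265")
    if bandwidth_kbps >= 4000:
        return CaptureParams(1280, 720, 2500, 30, "H264")
    if bandwidth_kbps >= 1500:
        return CaptureParams(960, 540, 1000, 25, "H264")
    # Constrained network: sacrifice quality to keep transmission real-time.
    return CaptureParams(640, 360, 500, 15, "H264")
```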
In step S704, position information of the target object in the first image is determined. The position information and the first image are assigned the same synchronization identifier.
In step S705, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier are output, so that a device receiving the first image and the position information can, according to the position information, extract the target image from the first image whose synchronization identifier is the same as that of the position information.
In step S706, a second image of a second target scene is acquired.
In step S707, the second image is output to the device that received the first image, so that the device fuses the target image into the second image.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved. In addition, the video processing scheme adjusts the code rate of the image according to the network bandwidth, which improves the real-time performance of image transmission and thus further improves the video playing effect.
Fig. 8 illustrates a flow diagram of a video processing method 800 according to some embodiments of the present application. The method 800 may be performed, for example, in the display device 107.
As shown in fig. 8, in step S801, a first image of a first target scene is received. The first image includes a synchronization identifier.
In step S802, position information describing the position of a target object in an image is received. The position information includes a synchronization identifier.
In step S803, a second image of a second target scene is received.
In step S804, the target image is extracted, according to the position information, from the first image whose synchronization identifier is the same as that of the position information.
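A minimal sketch of this matching step: buffer incoming first images and mask maps by synchronization identifier and, once both halves of a pair are present, cut the target out of the matching frame. The buffering scheme and all names are assumptions, not the application's specified design.

```python
# Illustrative receiver-side matching of first images and position information
# (mask maps) by synchronization identifier, then target extraction.
import numpy as np

frames: dict[int, np.ndarray] = {}  # sync_id -> decoded first image (H, W, 3)
masks: dict[int, np.ndarray] = {}   # sync_id -> decoded mask map (H, W)

def on_payload(sync_id: int, kind: str, image: np.ndarray):
    """Store one payload; return the target image once its pair is complete.
    The second image is assumed to arrive on a separate path."""
    (frames if kind == "first_image" else masks)[sync_id] = image
    if sync_id in frames and sync_id in masks:
        first, mask = frames.pop(sync_id), masks.pop(sync_id)
        return extract_target(first, mask)
    return None

def extract_target(first: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only target pixels; background pixels are zeroed out."""
    return np.where(mask[..., None] > 0, first, np.uint8(0))
```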
In step S805, the target image is fused into the second image to obtain a third image, and the third image is displayed.
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved.
Fig. 9 illustrates a flow diagram of a video processing method 900 according to some embodiments of the present application. The method 900 may be performed, for example, in the display device 107.
As shown in fig. 9, in step S901, a first image of a first target scene is received. The first image includes a synchronization identifier.
In step S902, position information describing the position of the target object in the image is received. The position information includes a synchronization identifier.
In step S903, a second image of a second target scene is received.
In step S904, the target image is extracted, according to the position information, from the first image whose synchronization identifier is the same as that of the position information. For example, the first image received by the display device 107 is shown in fig. 4 and the received position information is shown in fig. 5A; the target image extracted in step S904 is shown in fig. 10.
In step S905, a user instruction is acquired. The user instruction indicates at least one of the position, transparency and size of the target image relative to the second image.
In step S906, the target image is fused into the second image to obtain a third image, and the third image is displayed. For example, step S906 may generate the third image according to the user instruction. The third image (i.e., the picture obtained by fusing the target image with the second image) is shown in fig. 11.
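The fusion of step S906 could be realized as a straightforward alpha blend driven by the user instruction. The sketch below assumes the target image arrives together with its mask and that the scaled target fits entirely inside the second image; the function name and placement arithmetic are illustrative, not specified by the application.

```python
# Illustrative fusion of the target image into the second image according to a
# user instruction (position, transparency, size). Names/semantics assumed.
import cv2
import numpy as np

def fuse(second: np.ndarray, target: np.ndarray, mask: np.ndarray,
         x: int, y: int, alpha: float, scale: float) -> np.ndarray:
    """Blend `target` (with binary `mask`) onto `second` at (x, y).
    Assumes the scaled target lies fully inside `second`."""
    h, w = int(target.shape[0] * scale), int(target.shape[1] * scale)
    target = cv2.resize(target, (w, h))
    mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    third = second.copy()
    roi = third[y:y + h, x:x + w]
    weight = (mask[..., None] > 0) * alpha  # per-pixel blend weight, 0 or alpha
    roi[:] = (weight * target + (1 - weight) * roi).astype(np.uint8)
    return third
```

Under these assumptions, calling fuse(courseware, target, mask, x=40, y=60, alpha=0.85, scale=0.5) would overlay a half-size, slightly transparent figure near the top-left of the courseware picture.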
In summary, the video processing scheme of the embodiments of the present application makes it convenient to extract the target object from the first image of one scene and then fuse the target object with the second image of another scene. By extracting the target object, the scheme keeps the background of the first image (i.e., the part of the first image other than the target object) out of the fused image, which improves the video playing effect. In particular, when the scheme is applied to a remote teaching scene, the human body image in the teaching picture can be conveniently fused with the courseware picture, the displayed picture is not disturbed by an unnecessary background, and the remote teaching effect is improved. In addition, by allowing a user instruction to adjust the position, transparency, size, and the like of the target image relative to the second image, the video processing scheme improves the flexibility of the picture layout.
FIG. 12 illustrates a schematic diagram of a computing device according to some embodiments of the present application. Here, as shown in FIG. 12, the computing device includes one or more processors (CPUs) 1202, a communications module 1204, a memory 1206, a user interface 1210, and a communications bus 1208 interconnecting these components.
The processor 1202 can receive and transmit data via the communication module 1204 to enable network communication and/or local communication.
The user interface 1210 includes one or more output devices 1212, including one or more speakers and/or one or more visual displays. The user interface 1210 also includes one or more input devices 1214. The user interface 1210 may receive, for example, instructions from a remote controller, but is not limited thereto.
The memory 1206 may be a high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 1206 stores a set of instructions executable by the processor 1202, including:
an operating system 1216 including programs for handling various basic system services and for performing hardware related tasks;
applications 1218, including various programs for implementing the video processing schemes described above. Such programs can implement the processing flows of the examples above and may include, for example, a video processing method according to the present application.
In addition, each embodiment of the present application can be realized by a data processing program executed by a data processing apparatus such as a computer. Clearly, such a data processing program constitutes the present invention. Furthermore, a data processing program is generally stored in a storage medium and is executed either by reading it directly out of the storage medium or by installing or copying it into a storage device (such as a hard disk and/or memory) of the data processing apparatus. Such a storage medium therefore also constitutes the present invention. The storage medium may use any type of recording means, such as a paper storage medium (e.g., paper tape), a magnetic storage medium (e.g., a flexible disk, a hard disk, a flash memory), an optical storage medium (e.g., a CD-ROM), or a magneto-optical storage medium (e.g., an MO).
The present application thus also discloses a non-volatile storage medium in which a program is stored. The program comprises instructions which, when executed by a processor, cause a computing device to perform a video processing method according to the present application.
In addition to data processing programs, the method steps described in this application may be implemented by hardware, for example logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like. Hardware capable of implementing the methods described herein may thus also constitute the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (11)

1. A video processing method, comprising:
acquiring a first image of a first target scene;
determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
outputting the first image carrying the synchronization identifier and the position information carrying the synchronization identifier, so that a device receiving the first image and the position information extracts, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information;
acquiring a second image of a second target scene;
outputting the second image to the device, so that the device fuses the target image into the second image.
2. The video processing method of claim 1,
the acquiring a first image of a first target scene comprises: acquiring a first image of a teaching scene, wherein the first image is a teaching picture;
the acquiring a second image of a second target scene comprises: acquiring a second image of a courseware playing scene, wherein the second image is the courseware picture currently being played;
the determining position information of the target object in the first image comprises: determining a mask map of the target object in the first image, and using the mask map as the position information.
3. The video processing method of claim 1, wherein before the acquiring a first image of a first target scene, the method further comprises:
acquiring network bandwidth information, wherein the network bandwidth information characterizes the transmission bandwidth of a network for transmitting the first image and the position information;
determining image acquisition parameters for the first target scene according to the network bandwidth information, wherein the image acquisition parameters comprise at least one of resolution, code rate, frame rate and encoding format;
the acquiring a first image of a first target scene comprises:
acquiring the first image generated according to the image acquisition parameters.
4. A video processing method, comprising:
receiving a first image of a first target scene, the first image including a synchronization identifier;
receiving location information describing a location of a target object in an image, the location information including a synchronization identifier;
receiving a second image of a second target scene;
extracting, according to the position information, a target image from a first image whose synchronization identifier is the same as that of the position information;
fusing the target image into the second image to obtain a third image, and displaying the third image.
5. The video processing method of claim 4, wherein before the fusing the target image into the second image to obtain a third image and displaying the third image, the method further comprises:
acquiring a user instruction, wherein the user instruction indicates at least one of the position, transparency and size of the target image relative to the second image;
the fusing the target image into the second image to obtain a third image comprises:
generating the third image according to the user instruction.
6. A camera, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-5.
7. A recording and broadcasting host, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of any of claims 1-3.
8. A display device, comprising:
a memory;
a processor;
a program stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the video processing method of claim 4 or 5.
9. A recording and broadcasting system, comprising:
a camera for: acquiring a first image of a first target scene; and determining position information of a target object in the first image, wherein the position information and the first image are assigned the same synchronization identifier;
a computing device to play a second image of a second target scene;
a first recording and broadcasting host for: acquiring, from the camera, the first image carrying the synchronization identifier and the position information carrying the synchronization identifier; acquiring the second image; and outputting the first image, the position information and the second image;
a second recording and broadcasting host for: receiving the first image, the position information and the second image from the first recording and broadcasting host;
a display device in a play scene, for: acquiring the first image, the position information and the second image from the second recording and broadcasting host; extracting, according to the position information, a target image from the first image whose synchronization identifier is the same as that of the position information; fusing the target image into the second image to obtain a third image; and displaying the third image.
10. A recording and broadcasting system, comprising:
the camera of claim 6;
a computing device to play a second image of a second target scene;
the display device of claim 8.
11. A storage medium storing a program comprising instructions that, when executed by a computing device, cause the computing device to perform the video processing method of any of claims 1-5.
CN202011532654.0A (priority date 2020-12-23, filing date 2020-12-23): Video processing method, camera, recording and playing host, system and storage medium. Status: Pending. Publication: CN112702641A.

Priority Applications (1)

Application Number: CN202011532654.0A; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Title: Video processing method, camera, recording and playing host, system and storage medium

Applications Claiming Priority (1)

Application Number: CN202011532654.0A; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Title: Video processing method, camera, recording and playing host, system and storage medium

Publications (1)

Publication Number: CN112702641A; Publication Date: 2021-04-23

Family

ID=75510716

Family Applications (1)

Application Number: CN202011532654.0A; Title: Video processing method, camera, recording and playing host, system and storage medium; Priority Date: 2020-12-23; Filing Date: 2020-12-23; Status: Pending

Country Status (1)

Country Link
CN (1) CN112702641A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5789729B1 (en) * 2015-03-30 2015-10-07 株式会社高田工業所 Perspective image teaching type precision cutting device
CN106572385A (en) * 2015-10-10 2017-04-19 北京佳讯飞鸿电气股份有限公司 Image overlaying method for remote training video presentation
CN105376547A (en) * 2015-11-17 2016-03-02 广州市英途信息技术有限公司 Micro video course recording system and method based on 3D virtual synthesis technology
CN108932519A (en) * 2017-05-23 2018-12-04 中兴通讯股份有限公司 A kind of meeting-place data processing, display methods and device and intelligent glasses
CN109587556A (en) * 2019-01-03 2019-04-05 腾讯科技(深圳)有限公司 Method for processing video frequency, video broadcasting method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023283894A1 (en) * 2021-07-15 2023-01-19 京东方科技集团股份有限公司 Image processing method and device
WO2023005427A1 (en) * 2021-07-29 2023-02-02 International Business Machines Corporation Context based adaptive resolution modulation countering network latency fluctuation
US11653047B2 (en) 2021-07-29 2023-05-16 International Business Machines Corporation Context based adaptive resolution modulation countering network latency fluctuation

Similar Documents

Publication Publication Date Title
KR102050865B1 (en) Method and device for synchronizing display of images
US9485493B2 (en) Method and system for displaying multi-viewpoint images and non-transitory computer readable storage medium thereof
US9723223B1 (en) Apparatus and method for panoramic video hosting with directional audio
CN108600773A (en) Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium
CN101150704B (en) Image processing system, image processing method
US20090092311A1 (en) Method and apparatus for receiving multiview camera parameters for stereoscopic image, and method and apparatus for transmitting multiview camera parameters for stereoscopic image
CN109074678A (en) A kind of processing method and processing device of information
CN112702641A (en) Video processing method, camera, recording and playing host, system and storage medium
KR100576544B1 (en) Apparatus and Method for Processing of 3D Video using MPEG-4 Object Descriptor Information
CN112492357A (en) Method, device, medium and electronic equipment for processing multiple video streams
CN113891117B (en) Immersion medium data processing method, device, equipment and readable storage medium
KR100901111B1 (en) Live-Image Providing System Using Contents of 3D Virtual Space
JP6581241B2 (en) Hardware system for 3D video input on flat panel
KR20150117165A (en) Internet based educational information providing system of surgical techniques and skills, and providing Method thereof
CN103533215A (en) Recording and playing system
CN102474634A (en) Modifying images for a 3-dimensional display mode
KR20120102996A (en) System and method for displaying 3d contents of 3d moving picture
CN114846808A (en) Content distribution system, content distribution method, and content distribution program
CN103888808A (en) Video display method, display device, auxiliary device and system
CN112017264A (en) Display control method and device for virtual studio, storage medium and electronic equipment
US20210377514A1 (en) User Interface Module For Converting A Standard 2D Display Device Into An Interactive 3D Display Device
US20220303518A1 (en) Code stream processing method and device, first terminal, second terminal and storage medium
CN109727315B (en) One-to-many cluster rendering method, device, equipment and storage medium
CN113099212A (en) 3D display method, device, computer equipment and storage medium
KR102392908B1 (en) Method, Apparatus and System for Providing of Free Viewpoint Video Service

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-23)