CN114915798A - Real-time video generation method, multi-camera live broadcast method and device


Info

Publication number: CN114915798A
Application number: CN202110179211.6A
Authority: CN (China)
Prior art keywords: image, real-time, live, processing
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Other languages: Chinese (zh)
Inventor: 林刚 (Lin Gang)
Current assignee: Alibaba Group Holding Ltd (the listed assignees may be inaccurate; no legal analysis has been performed)
Original assignee: Alibaba Group Holding Ltd
Application filed by: Alibaba Group Holding Ltd
Priority to: CN202110179211.6A

Classifications

All of the following codes fall under H (Electricity), H04 (Electric communication technique), H04N (Pictorial communication, e.g. television), H04N21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):

    • H04N21/2187: Live feed (under H04N21/218, source of audio or video content, e.g. local disk arrays)
    • H04N21/23412: Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/42653: Internal components of the client for processing graphics
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44012: Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/4788: Supplemental services communicating with other users, e.g. chatting
    • H04N21/8146: Monomedia components involving graphical data, e.g. 3D object, 2D graphics

Abstract

The disclosure relates to a real-time video generation method, a multi-camera live broadcast method, and a corresponding device. The real-time video generation method with a plurality of camera sources includes: acquiring real-time images corresponding to a plurality of different viewing angles of the same scene; selecting one real-time image from the acquired real-time images as a main image according to a first preset rule; processing at least one acquired real-time image according to a second preset rule to generate a target image; synthesizing the main image and the target image to generate a real-time output image; and transmitting the real-time output image.

Description

Real-time video generation method, multi-camera live broadcast method and device
Technical Field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a real-time video generation method, a multi-camera live broadcast method, and a multi-camera live broadcast device.
Background
Online live broadcasting is a common means of leisure and interaction in the Internet era. A network anchor can broadcast live online through live broadcast equipment.
Currently, the real-time images watched by viewers are usually taken from a single camera source, such as the camera of a live broadcast device, which results in a poor viewing experience.
Disclosure of Invention
It is an object of embodiments of the present disclosure to provide a new solution for generating real-time video with multiple camera sources.
According to a first aspect of the present disclosure, there is provided a real-time video generation method having a plurality of camera sources, comprising: acquiring real-time images corresponding to a plurality of different viewing angles of the same scene; selecting one real-time image from the acquired real-time images as a main image according to a first preset rule; processing at least one acquired real-time image according to a second preset rule to generate a target image; synthesizing the main image and the target image to generate a real-time output image; and transmitting the real-time output image.
Optionally, the first preset rule is a rule set according to at least one of an image viewing angle, a face feature of a person in the image, a gesture feature of a person in the image, and user input information.
Optionally, the second preset rule is a rule set according to at least one of an image view angle, a face feature of a person in the image, a facial motion feature of a person in the image, a gesture feature of a person in the image, a limb motion feature of a person in the image, an object feature in the image, and user input information.
Optionally, the at least one real-time image does not include the main image.
Optionally, synthesizing the main image and the target image includes: superimposing the target image on a preset picture area of the main image, with the target image as the background and the person in the main image as the foreground.
Optionally, processing the at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring a virtual image corresponding to the person in the main image; identifying motion features of the person in the main image, wherein the motion features comprise at least one of facial motion features and limb motion features; and controlling the virtual image to act in synchrony with the person according to the motion features, so as to generate the target image.
Optionally, processing the at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring action features of the person in the main image, wherein the action features comprise a first gesture feature; selecting a corresponding real-time image according to the first gesture feature; and cropping a partial picture area of the corresponding real-time image as the target image.
Optionally, processing the at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring action features of the person in the main image, wherein the action features comprise a second gesture feature; and acquiring an image corresponding to the second gesture feature as the target image.
Optionally, processing the at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring face features of the person in the main image; and acquiring an image corresponding to the face features as the target image.
Optionally, selecting one real-time image from the acquired real-time images as a main image according to a first preset rule includes: selecting, from the plurality of real-time images, a real-time image containing preset image features as the main image, wherein the preset image features comprise at least one of face features, gesture features, facial motion features, and limb motion features.
Optionally, processing the at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring feature information of the preset image features in the main image; and rendering any other real-time image of the plurality of real-time images that differs from the main image according to the feature information, to obtain the target image.
Optionally, acquiring real-time images corresponding to a plurality of different viewing angles of the same scene includes: acquiring a real-time image of at least one viewing angle received through at least one expansion interface of a live broadcast device, and a real-time image of at least one viewing angle collected through at least one built-in camera of the live broadcast device.
According to a second aspect of the present disclosure, there is provided a multi-camera live broadcasting method implemented by a live broadcast device provided with at least one expansion interface for connecting an external camera, the method comprising: acquiring, upon detecting that the expansion interface is connected with an external camera, a real-time image collected by the external camera; and executing a setting operation for the acquired at least two real-time images, so that live broadcast processing corresponding to the at least two real-time images is executed by a server, wherein the at least two real-time images include the real-time image collected by the external camera, and the live broadcast processing includes distributing a real-time output image obtained by processing the at least two real-time images according to a set live broadcast processing rule.
Optionally, the live device is provided with at least one built-in camera, and the method further includes: acquiring a real-time image acquired by the built-in camera; wherein the at least two real-time images further comprise real-time images acquired by the built-in camera.
Optionally, executing the setting operation includes: processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output image, and sending the real-time output image to the server. Alternatively, executing the setting operation includes: sending the at least two real-time images to the server, in which case the live broadcast processing further includes processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output image.
Optionally, the live broadcast processing rule includes a picture merging rule, and processing the at least two real-time images according to the set live broadcast processing rule includes: merging the at least two real-time images according to the picture merging rule, and using the merged real-time image as the real-time output image.
Optionally, executing the setting operation includes: detecting whether the number of the at least two real-time images is greater than a set threshold; determining, when the number is not greater than the set threshold, that the live broadcast device executes the step of processing the at least two real-time images according to the set live broadcast processing rule; and determining, when the number is greater than the set threshold, that the server executes that step. A sketch of this dispatch decision is given below.
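A minimal sketch of this threshold-based dispatch, in Python; merge_fn, send_output_fn, and send_streams_fn are hypothetical callables standing in for the merging step and the two transmission paths, which the patent does not specify at this level of detail.

```python
# Hedged sketch: merge locally when the stream count is small enough,
# otherwise hand the raw streams to the server for processing.
def execute_setting_operation(live_images, threshold, merge_fn,
                              send_output_fn, send_streams_fn):
    if len(live_images) <= threshold:
        # The live broadcast device can afford the merge itself.
        send_output_fn(merge_fn(live_images))   # send the real-time output image
    else:
        # Too many streams: the server performs the live processing.
        send_streams_fn(live_images)            # send the raw real-time images
```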
Optionally, the method further comprises: responding to the operation of configuring the live broadcast processing rule, and providing a configuration interface; acquiring configuration information input through the configuration interface; and setting the live broadcast processing rule according to the configuration information.
According to a third aspect of the present disclosure, there is also provided a real-time video generating apparatus having a plurality of camera sources, including: the image acquisition module is used for acquiring real-time images corresponding to a plurality of different visual angles of the same scene; the first processing module is used for selecting one real-time image from the acquired real-time images as a main image according to a first preset rule; the second processing module is used for processing the acquired at least one real-time image according to a second preset rule to generate a target image; a synthesis module for synthesizing the main image and the target image to generate a real-time output image; and the transmission module is used for transmitting the real-time output image.
According to a fourth aspect of the present disclosure, there is also provided a real-time video generating apparatus having a plurality of camera sources, comprising a memory for storing a computer program and a processor for executing the computer program to implement the method according to any one of the first aspect.
According to a fifth aspect of the present disclosure, there is also provided a live broadcast device, including: at least one expansion interface for connecting an external camera; an acquisition module for acquiring a real-time image collected by the external camera when it is detected that the expansion interface is connected with the external camera; and a processing module for executing a setting operation for the acquired at least two real-time images, so that live broadcast processing corresponding to the at least two real-time images is executed by a server, wherein the at least two real-time images include the real-time image collected by the external camera, and the live broadcast processing includes distributing a real-time output image obtained by processing the at least two real-time images according to a set live broadcast processing rule.
According to a sixth aspect of the present disclosure, there is also provided a live broadcast device, including a memory, a processor, and at least one expansion interface for connecting an external camera, where the expansion interface is connected to the processor, and the memory is used for storing a computer program; the processor is configured to execute the computer program to implement the method according to any of the second aspects.
According to a seventh aspect of the present disclosure, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of the first or second aspects.
One beneficial effect of the disclosed embodiments is as follows: real-time images corresponding to a plurality of different viewing angles of the same scene are acquired; one real-time image is selected from the acquired real-time images as a main image according to a first preset rule; at least one acquired real-time image is processed according to a second preset rule to generate a target image; the main image and the target image are synthesized to generate a real-time output image; and the real-time output image is transmitted. On this basis, the real-time images watched by viewers are taken from multiple camera sources, so the viewing experience is better.
Other features of embodiments of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which is to be read in connection with the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the embodiments of the disclosure.
FIG. 1 is a schematic diagram of an implementation environment in which embodiments of the disclosed method can be applied and a system component architecture in which embodiments of the method can be implemented;
FIG. 2 is a flow diagram of a method for real-time video generation with multiple camera sources, according to one embodiment;
FIG. 3 is a schematic diagram of a process for generating real-time video with multiple camera sources, according to one embodiment;
FIG. 4 is a schematic illustration of a process for generating real-time video with multiple camera sources according to another embodiment;
FIG. 5 is a diagram of a user terminal displaying real-time video with multiple camera sources, according to one embodiment;
FIG. 6 is a schematic flow diagram of a method for real-time video generation with multiple camera sources according to another embodiment;
FIG. 7 is a flow diagram of a multi-camera live method according to one embodiment;
FIG. 8 is a diagram of a user terminal displaying a multi-camera live view according to one embodiment;
FIG. 9 is a flow diagram of a multi-camera live method according to another embodiment;
FIG. 10 is a block schematic diagram of a real-time video generation apparatus having multiple camera sources according to one embodiment;
FIG. 11 is a schematic diagram of a hardware configuration of a real-time video generating apparatus having a plurality of camera sources according to an embodiment;
FIG. 12 is a block schematic diagram of a live device according to one embodiment;
FIG. 13 is a hardware architecture diagram of a live device according to an embodiment.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
An application scenario of the embodiments of the present disclosure is a user broadcasting live through live broadcast equipment. During implementation, the inventor found that when live broadcasting is performed with live broadcast equipment, the equipment can use only one camera, so the real-time images watched by viewers are taken from a single camera source, and the viewing experience is poor.
In view of the above technical problem, the inventor proposes a real-time video generation method with multiple camera sources, which acquires real-time images corresponding to a plurality of different viewing angles of the same scene; selects one real-time image from the acquired real-time images as a main image according to a first preset rule; processes at least one acquired real-time image according to a second preset rule to generate a target image; synthesizes the main image and the target image to generate a real-time output image; and transmits the real-time output image.
Fig. 1 is a schematic structural diagram of a multi-camera live broadcast system 10 to which an embodiment of the disclosed method can be applied. As shown in fig. 1, the multi-camera live broadcast system 10 includes a live broadcast device 1000, an external camera 2000, a server 3000, a user terminal 4000 and a network 5000, and the multi-camera live broadcast system 10 can be applied to a multi-camera live broadcast scene.
As shown in fig. 1, the live device 1000 may include a processor 1100, a memory 1200, an interface apparatus 1300, a communication apparatus 1400, a display apparatus 1500, an input apparatus 1600, and the like. The interface device 1300 may include an expansion interface for connecting the external camera 2000.
Processor 1100 is used to execute computer programs, which may be written in instruction sets of architectures such as x86, Arm, RISC, MIPS, SSE, and the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication; for example, the communication device 1400 may include at least one short-range communication module, such as any module for performing short-range wireless communication based on short-range wireless communication protocols such as the Hilink protocol, WiFi (IEEE 802.11 protocol), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, LiFi, and the like, and the communication device 1400 may also include a long-range communication module, such as any module for performing WLAN, GPRS, 2G/3G/4G/5G long-range communication. The display device 1500 is, for example, a liquid crystal display, an LED display, or a touch display. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.
The live broadcasting device 1000 may be connected to an external speaker through the interface device 1300 to output an audio signal, and may be connected to an external microphone through the interface device 1300 to collect an audio signal.
In this embodiment of the present disclosure, the expansion interface of the interface device 1300 of the live broadcast device 1000 is configured to connect to the external camera 2000, and the memory 1200 of the live broadcast device 1000 is configured to store a computer program, where the computer program is configured to control the processor 1100 of the live broadcast device 1000 to operate so as to support the implementation of any method embodiment of the present disclosure. A skilled person can design the computer program according to the solutions of the embodiments of the present disclosure. How a computer program controls a processor to operate is well known in the art and will not be described in detail here.
Although a plurality of devices of the live broadcast apparatus 1000 are shown in fig. 1, the present invention may only relate to some of the devices, for example, the live broadcast apparatus 1000 only relates to the processor 1100, the memory 1200 and the expansion interface for connecting the external camera 2000.
The external camera 2000 has a connector that mates with the expansion interface of the live broadcast device 1000; the connection between the external camera 2000 and the live broadcast device 1000 is made by inserting the connector into the expansion interface.
The server 3000 is a service point that provides processing, database, and communication facilities. The server 3000 may be an integral server, a distributed server spanning multiple computers, a computer data center, a cloud server, or a server cluster deployed in the cloud. Servers may be of various types, such as, but not limited to, a web server, a news server, a mail server, a message server, an advertisement server, a file server, an application server, an interaction server, a database server, or a proxy server. In some embodiments, each server may include hardware, software, or embedded logic components, or a combination of two or more such components, for performing the appropriate functions supported or implemented by the server. For example, the server may be a blade server or a cloud server, or may be a server group comprising one or more of the above server types.
In one embodiment, the server 3000 may be as shown in fig. 1, including a processor 3100, a memory 3200, an interface device 3300, a communication device 3400.
Processor 3100 is configured to execute computer programs, which may be written in an instruction set using architectures such as x86, Arm, RISC, MIPS, SSE, and the like. The memory 3200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 3300 includes, for example, various bus interfaces such as a serial bus interface (including a USB interface), a parallel bus interface, and the like. The communication device 3400 can perform wired or wireless communication, for example.
In this embodiment, memory 3200 of server 3000 is used to store program instructions that control processor 3100 to operate in support of the implementation of any of the method embodiments of the present disclosure. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail here.
Although a plurality of devices of the server 3000 are shown in fig. 1, the present invention may relate only to some of the devices, for example, the server 3000 relates only to the memory 3200 and the processor 3100.
As applied to the disclosed embodiments, the memory 3200 of the server 3000 is configured to store a computer program that controls the operation of the processor 3100 of the server 3000 to support the implementation of any of the method embodiments of the present disclosure. A skilled person can design a computer program according to the solution of the embodiments of the present disclosure. How the computer program controls the processor to operate is well known in the art and will not be described in detail here.
In this embodiment, the user terminal 4000 is, for example, a mobile phone, a portable computer, a tablet computer, or a palmtop computer. The user terminal 4000 has a live broadcast application client installed, so that the purpose of watching live broadcasts is achieved by operating the live broadcast application client.
As shown in fig. 1, the user terminal 4000 may include a processor 4100, a memory 4200, an interface device 4300, a communication device 4400, a display device 4500, an input device 4600, a speaker 4700, a microphone 4800, and the like.
The processor 4100 is used to execute a computer program, which may be written in an instruction set of architectures such as x86, Arm, RISC, MIPS, SSE, and the like. The memory 4200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 4300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 4400 is capable of wired or wireless communication; for example, the communication device 4400 may include at least one short-range communication module, such as any module for performing short-range wireless communication based on short-range wireless communication protocols such as the Hilink protocol, WiFi (IEEE 802.11 protocol), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, LiFi, and the like, and the communication device 4400 may also include a long-range communication module, such as any module for performing WLAN, GPRS, 2G/3G/4G/5G long-range communication. The display device 4500 is, for example, a liquid crystal display, an LED display, or a touch panel. The input device 4600 may include, for example, a touch screen, a keyboard, and the like. The user terminal 4000 may output audio signals through the speaker 4700 and collect audio signals through the microphone 4800.
The network 5000 may be a wireless communication network, a wired communication network, a local area network, or a wide area network. In the multi-camera live broadcast system 10 shown in fig. 1, the live broadcast device 1000 and the server 3000, as well as the user terminal 4000 and the server 3000, can communicate with each other through the network 5000. The network 5000 between the live broadcast device 1000 and the server 3000 may be the same as or different from the network 5000 between the user terminal 4000 and the server 3000.
It should be understood that although fig. 1 shows only one external camera 2000, server 3000 and user terminal 4000, the number of each is not meant to be limited, and the multi-camera live broadcast system 10 may include a plurality of external cameras 2000, a plurality of servers 3000 and a plurality of user terminals 4000.
The multi-camera live broadcast system 10 shown in fig. 1 is illustrative only and is in no way intended to limit the invention, its application, or uses.
Various embodiments and examples according to the present invention are described below with reference to the accompanying drawings.
Fig. 2 is a flow diagram of a method for real-time video generation with multiple camera sources, according to one embodiment. The main implementation body of this embodiment is, for example, the live device 1000 or the server 3000 in fig. 1.
As shown in fig. 2, the method for generating a real-time video with a plurality of camera sources according to the present embodiment may include the following steps S201 to S205:
in step S201, real-time images corresponding to a plurality of different viewing angles of the same scene are obtained.
In this step, the live broadcast device 1000 may obtain real-time images directly from the multiple cameras connected to it, or the server 3000 may obtain the real-time images collected by those cameras via the live broadcast device 1000.
In detail, the live broadcast device 1000 may be provided with at least one expansion interface through which the external camera 2000 can be connected, so as to obtain the real-time image collected by the external camera 2000. Meanwhile, the live broadcast device 1000 may also be provided with a built-in camera, so that the real-time image collected by the built-in camera is obtained.
Based on this, in an embodiment of the present disclosure, the step S201 of acquiring real-time images corresponding to a plurality of different viewing angles of the same scene includes: acquiring a real-time image of at least one viewing angle received through at least one expansion interface of the live broadcast device, and a real-time image of at least one viewing angle collected through at least one built-in camera of the live broadcast device.
In this embodiment, the same scene may be a live scene where the live device 1000 is located. Each camera in the same scene can have different camera angles, so that real-time images of a plurality of different camera angles in the same scene can be acquired.
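As a concrete illustration of step S201, the following is a minimal sketch in Python, assuming OpenCV and that the built-in camera and the expansion-interface cameras enumerate as ordinary OS video device indices; the index layout is an assumption, not something the patent specifies.

```python
import cv2

def open_cameras(indices):
    """Open every camera index that responds; index 0 is assumed to be built-in."""
    caps = []
    for i in indices:
        cap = cv2.VideoCapture(i)
        if cap.isOpened():
            caps.append(cap)
        else:
            cap.release()
    return caps

def grab_frames(caps):
    """Grab one frame per connected camera; each frame is one viewing angle."""
    frames = []
    for cap in caps:
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    return frames

cameras = open_cameras(range(4))      # probe up to four camera sources
live_images = grab_frames(cameras)    # real-time images of the same scene
```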
Step S202, selecting one real-time image from the acquired real-time images as a main image according to a first preset rule.
In this embodiment, one real-time image may be selected as the main image from the plurality of real-time images as needed. The selection rule may be based on a corresponding preset rule combined with on-demand settings of the live broadcast user, for example setting information the live broadcast user inputs by operating the live broadcast device 1000.
In an embodiment of the present disclosure, the first predetermined rule is a rule set according to at least one of an image viewing angle, a facial feature of a person in the image, a gesture feature of the person in the image, and user input information.
For example, for the image viewing angle, when different viewing angles of the same product are displayed, a real-time image of a specific viewing angle can be designated as the main image. For face features, a real-time image having a specific face feature may be designated as the main image. For gesture features, a real-time image with a specific gesture feature may be designated as the main image. For user input information, a real-time image temporarily designated by the live broadcast user may be used as the main image. A selection sketch follows.
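A hedged sketch of such a first preset rule in Python, prioritizing user input, then a detected face, then a preferred viewing angle; the per-frame face flags and the priority order are illustrative assumptions.

```python
# Hedged sketch of a first preset rule: user input wins, then any frame with
# a detected face, then a fixed fallback viewing angle.
def select_main_image(frames, face_flags, user_choice=None, preferred_angle=0):
    if user_choice is not None:            # live user temporarily designates one
        return frames[user_choice]
    for frame, has_face in zip(frames, face_flags):
        if has_face:                       # face feature rule
            return frame
    return frames[preferred_angle]         # image viewing angle rule
```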
Step S203, processing at least one acquired real-time image according to a second preset rule to generate a target image.
In this embodiment, at least one real-time image may be selected from the plurality of real-time images as needed and processed to obtain the target image. The selection rule may be based on a corresponding preset rule combined with on-demand settings of the live broadcast user, for example setting information the live broadcast user inputs by operating the live broadcast device 1000.
In an embodiment of the present disclosure, the second predetermined rule is a rule set according to at least one of an image view angle, a face feature of a person in the image, a facial motion feature of a person in the image, a gesture feature of a person in the image, a body motion feature of a person in the image, an object feature in the image, and user input information.
For example, for the image viewing angle, when different viewing angles of the same product are displayed, a real-time image of a specific viewing angle can be processed to generate the target image. For face features, the face features in the real-time image can be recognized and the target image generated accordingly. For gesture features, the gesture features in the real-time image can be recognized and the target image generated accordingly. For facial motion features, the facial motion features in the real-time image can be identified and the target image generated accordingly. For limb motion features, the limb motion features in the real-time image can be identified and the target image generated accordingly. For object features, the object features in the real-time image can be identified and the target image generated accordingly. For user input information, a real-time image temporarily designated by the live broadcast user may be processed to generate the target image.
It should be noted that, based on different application scenarios, the at least one live image in step S203 may include the main image or may not include the main image. Namely, the following two cases may exist:
Case 1: the at least one real-time image in step S203 includes the main image, and the target image is generated from the main image;
Case 2: the at least one real-time image in step S203 does not include the main image, and the target image is generated from the other real-time images.
In detail, for the above case 1:
in one embodiment of the present disclosure, the at least one live image includes the primary image.
In this case 1, the embodiments of the present disclosure are applicable at least to any one of, or a combination of, a directional beauty scene, a facial expression re-enactment scene, a limb movement re-enactment scene, and a gesture-assisted control scene.
In detail, for a directional beauty scene:
in an embodiment of the present disclosure, the step S203 of processing the acquired at least one real-time image according to a second preset rule to generate a target image includes: acquiring the human face characteristics of the figures in the main image; and acquiring an image corresponding to the human face characteristics as the target image.
In this embodiment, after the main image is selected in step S202, the main image may be identified based on a face recognition algorithm to identify the face position of the person in the main image, so as to obtain the face features of the person, for example, the face features may be features of facial features of eyebrows, eyes, nose, mouth, and the like of a user who is a main player in the main image. In addition, the characteristics of the person such as gender, age and the like can be obtained.
Based on the facial features and the character features obtained through recognition, the face position can be beautified and rendered through a filter to obtain a target image, and meanwhile, other positions in the main image are not rendered. Therefore, when the live user introduces commodities in a live scene, the effect of ensuring that introduced commodities are not distorted can be achieved by only rendering the faces of the characters.
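A minimal sketch of the directional-beauty case in Python, assuming OpenCV's bundled Haar cascade stands in for the unspecified face recognition algorithm and a bilateral filter stands in for the beauty filter; only the detected face region is rendered.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def beautify_face_only(main_image):
    """Render (smooth) only the face region; leave the rest of the frame intact."""
    gray = cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY)
    out = main_image.copy()
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        roi = out[y:y+h, x:x+w]
        out[y:y+h, x:x+w] = cv2.bilateralFilter(roi, 9, 75, 75)  # skin smoothing
    return out  # target image: face rendered, introduced goods undistorted
```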
After the target image is obtained, the following step S204 may be executed to synthesize the main image and the target image, yielding an image in which the face of the person in the main image has been rendered; this image is used as the real-time output image.
In detail, for a facial expression re-enactment scene and/or a limb movement re-enactment scene:
in an embodiment of the present disclosure, the step S203 of processing the acquired at least one real-time image according to a second preset rule to generate a target image includes: acquiring a virtual image corresponding to a character in the main image; identifying motion characteristics of a person in the main image, wherein the motion characteristics comprise at least one of facial motion characteristics and limb motion characteristics; and controlling the virtual image and the character to perform synchronous action according to the action characteristics so as to generate a target image.
In this embodiment, after the main image is selected in step S202, the virtual image corresponding to the person in the main image is obtained. Such as gender, shape, etc. of the character in the primary image may be the same or similar to the corresponding character in the virtual image.
In addition, the motion characteristics of the person in the primary image, which may be facial motion characteristics and/or body motion characteristics in general, may be identified using a correspondence algorithm.
For the facial motion characteristics, the change of key characteristic points of the human face can be captured to perform the facial motion of the virtual human figure, and then a face model is reconstructed on the virtual image through three-dimensional modeling, so that the facial motion of the virtual anchor and the facial motion of the real anchor are synchronized.
For the body action characteristics, the change of key characteristic points of human body can be captured to repeatedly carve the body action through the virtual character, and then the body model is reconstructed on the virtual image through three-dimensional modeling, so that the body action of the virtual anchor and the body action of the real anchor are synchronous, and the real anchor can synchronously deduce the virtual character.
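A sketch of the keypoint-capture step in Python, assuming MediaPipe Pose for body landmarks; AvatarRenderer is a hypothetical stand-in for the three-dimensional modeling and virtual-image rendering stage, which the patent does not tie to any particular library.

```python
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose()  # body keypoint capture

def drive_avatar(main_image, avatar_renderer):
    """Capture the anchor's body keypoints and replay them on the virtual image."""
    rgb = cv2.cvtColor(main_image, cv2.COLOR_BGR2RGB)
    result = pose.process(rgb)
    if result.pose_landmarks is None:
        return None  # no person detected, no target image this frame
    # Hypothetical renderer: rebuilds the avatar's body model from the landmarks
    # so the virtual anchor moves in sync with the real anchor.
    return avatar_renderer.render(result.pose_landmarks.landmark)
```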
After the target image is obtained, the following step S204 may be executed to synthesize the main image and the target image, yielding an image in which the live anchor and the corresponding virtual anchor appear in the same picture; this image is used as the real-time output image.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a process of generating a target image. The real-time image at the upper-left position in fig. 3 may be the main image, the real-time image at the lower-left position may be the target image, and the real-time image at the right position may be the real-time output image. In this way, the live anchor and the corresponding virtual anchor can broadcast in the same real-time image at the same time, improving the user's viewing experience.
In detail, for the gesture-assisted control scenario:
In an embodiment of the present disclosure, the step S203 of processing at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring action features of the person in the main image, wherein the action features comprise a second gesture feature; and acquiring an image corresponding to the second gesture feature as the target image.
In this embodiment, the current action intention of the anchor in the main image can be recognized through a gesture recognition algorithm, and a target image can be obtained by performing a response operation according to the intention.
In this embodiment, the second gesture feature may be a gesture feature used to process the main image, such as the gesture feature of a heart (love) gesture.
For example, a live broadcast user may make a heart gesture during the live broadcast; this user generally corresponds to the person in the main image. If the heart gesture of the person in the main image is recognized as the second gesture feature, an image for automatically issuing a goods coupon can be generated as the target image.
After the target image is obtained, the following step S204 may be executed to synthesize the main image and the target image, obtaining, as the real-time output image, an image in which the coupon is automatically issued while the anchor makes the heart gesture. A sketch follows.
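A hedged sketch of this case in Python; detect_heart_gesture is a hypothetical classifier standing in for the patent's unspecified gesture recognition algorithm, and the coupon graphic is assumed to be pre-rendered.

```python
def coupon_target_image(main_image, coupon_image, detect_heart_gesture):
    """Return the coupon graphic as the target image when the heart gesture
    (the second gesture feature) is recognized in the main image."""
    if detect_heart_gesture(main_image):
        return coupon_image   # synthesized with the main image in step S204
    return None               # no gesture recognized, no target image
```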
In detail, for case 2 above:
in one embodiment of the present disclosure, the at least one live image does not include the primary image.
In this case 2, the embodiments of the present disclosure are applicable at least to any one of, or a combination of, a facial expression re-enactment scene, a limb movement re-enactment scene, a gesture-assisted control scene, a virtual background scene, a multi-angle multi-detail display scene, and an image feature rendering scene.
In detail, for a facial expression re-enactment scene and/or a limb movement re-enactment scene:
Unlike the corresponding scenes in case 1 described above, in the present embodiment the motion features of the person may be recognized not from the main image but from another real-time image different from the main image. The person and their actions in that other real-time image may be consistent with the person and their actions in the main image.
For example, taking the content shown in fig. 3 as an example, the live broadcast device 1000 may be connected with two cameras: one camera broadcasts the real scene live to obtain the main image, the other captures the person's motion so that virtual character modeling can be performed by an algorithm to obtain the target image, and finally the two images are merged into the real-time output image for live broadcasting.
For the specific implementation process of generating the target image in the scene, reference may be made to the content in the corresponding scene, which is not described herein again.
In detail, for the gesture-assisted control scenario:
In an embodiment of the present disclosure, the step S203 of processing at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring action features of the person in the main image, wherein the action features comprise a first gesture feature; selecting a corresponding real-time image according to the first gesture feature; and cropping a partial picture area of the corresponding real-time image as the target image.
In this embodiment, the current action intention of the anchor in the main image can be recognized through a gesture recognition algorithm, and a response operation is performed according to the intention to obtain the target image.
In this embodiment, the first gesture feature may be a gesture feature used to process other real-time images different from the main image, such as the gesture feature of a zoom-in (magnifying) gesture.
In particular, a live broadcast user may make a zoom-in gesture during the live broadcast; this user generally corresponds to the person in the main image. If the zoom-in gesture of the person in the main image is recognized as the first gesture feature, the corresponding real-time image can be selected according to the first gesture feature and the object features in the images, and a partial picture area of that real-time image is then cropped to serve as the target image.
After the target image is obtained, the following step S204 may be executed to synthesize the main image and the target image into a real-time output image containing both pictures, the main image and the target image, so that when the person in the main image makes the zoom-in gesture, the target image displays the magnified object.
For example, the live broadcast device 1000 may be connected with two cameras: one camera obtains the main image corresponding to the live broadcast user, the other obtains the target image corresponding to the article being introduced, and finally the two images are merged into the real-time output image for live broadcasting. A cropping sketch is given below.
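A sketch of this zoom-in case in Python, assuming OpenCV for the crop and resize; detect_zoom_gesture and find_object_box are hypothetical helpers standing in for the gesture recognition and object-feature detection steps.

```python
import cv2

def zoomed_target_image(main_image, other_images,
                        detect_zoom_gesture, find_object_box):
    """On a zoom-in gesture, crop an enlarged view of the introduced object
    from the corresponding real-time image."""
    if not detect_zoom_gesture(main_image):   # first gesture feature
        return None
    for frame in other_images:                # select the corresponding image
        box = find_object_box(frame)          # object feature in the image
        if box is not None:
            x, y, w, h = box
            crop = frame[y:y+h, x:x+w]        # partial picture area
            return cv2.resize(crop, (frame.shape[1], frame.shape[0]))
    return None
```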
In detail, for a virtual background scene:
in one embodiment of the present disclosure, the synthesizing the primary image and the target image includes: and superposing the target image in a preset picture area of the main image, wherein the target image is used as a background, and the person in the main image is used as a foreground. After the superposition processing, other picture areas of the main image are different from the picture area of the target image.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a process of generating a target image. The real-time image at the left position in fig. 4 may be the main image, the real-time image at the middle position may be used to generate the target image, and the real-time image at the right position may be the real-time output image. In this way, when the live broadcast user broadcasts against a green-screen background, the real-time images collected by other cameras can serve as the background of the real-time image collected by the current camera, improving both the viewer's watching experience and the live broadcast user's experience.
For example, the live broadcast device 1000 may be connected with two cameras: one camera obtains the main image corresponding to the live broadcast user against the green-screen background, the other obtains the target image used as the background that replaces the green screen, and finally the two images are merged into the real-time output image for live broadcasting. A chroma-key sketch follows.
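A minimal chroma-key sketch of this virtual-background case in Python with OpenCV, assuming the main image has a roughly uniform green-screen background; the HSV thresholds are illustrative assumptions.

```python
import cv2

def composite_green_screen(main_image, target_image):
    """Keep the person as foreground and fill the green-screen pixels with the
    target image from another camera, producing the real-time output image."""
    h, w = main_image.shape[:2]
    bg = cv2.resize(target_image, (w, h))
    hsv = cv2.cvtColor(main_image, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))   # green-screen mask
    person = cv2.bitwise_and(main_image, main_image,
                             mask=cv2.bitwise_not(green))    # foreground person
    backdrop = cv2.bitwise_and(bg, bg, mask=green)           # new background
    return cv2.add(person, backdrop)
```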
In detail, for a multi-angle multi-detail presentation scenario:
in this embodiment, live equipment 1000 can be connected with a plurality of cameras, and different cameras can be corresponding to the different angles of same article to can realize multi-angle bandwagon effect, or can be corresponding to the different positions of same article, thereby can realize many details bandwagon effect.
For example, to multi-angle show, the clothing anchor can use a plurality of cameras to live from different angles simultaneously when live for the spectator who watches live can watch the dress effect under the different angles. In addition, the audience can select the concerned angle to watch, and the audience can freely select resources at different angles to watch.
For example, for multi-detail display, a main broadcast such as jewelry or a watch can use two cameras to carry out live broadcast simultaneously in the live broadcast process, one main broadcast for displaying a long-range view wears the overall image of the jewelry or the watch, and the other main broadcast is used for displaying the details of the jewelry or the watch in a short-range view.
In detail, for image feature rendering scenes:
In an embodiment of the present disclosure, the step S202 of selecting one real-time image from the acquired real-time images as the main image according to a first preset rule includes: selecting, from the plurality of real-time images, a real-time image containing preset image features as the main image, wherein the preset image features comprise at least one of face features, gesture features, facial motion features, and limb motion features.
In this embodiment, each acquired real-time image may be examined to detect whether it contains the preset image features, such as face features, gesture features, facial motion features, or limb motion features. If a real-time image contains the preset image features, that real-time image is used as the main image.
Based on this, in an embodiment of the present disclosure, the step S203 of processing at least one acquired real-time image according to a second preset rule to generate a target image includes: acquiring feature information of the preset image features in the main image; and rendering any other real-time image of the plurality of real-time images that differs from the main image according to the feature information, to obtain the target image.
In this embodiment, a corresponding rendering mode can be derived from the feature information of the preset image features in the main image, and the other real-time images can then be rendered in that mode to obtain the target image. A rendering sketch follows.
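A hedged sketch of this image-feature-rendering case in Python; feature_info_of and render_with are hypothetical helpers, since the patent does not specify how feature information maps to a rendering mode.

```python
def render_other_images(main_image, other_images, feature_info_of, render_with):
    """Derive a rendering mode from the main image's preset image features and
    apply it to every other real-time image to produce target images."""
    info = feature_info_of(main_image)   # e.g. detected face feature information
    return [render_with(frame, info) for frame in other_images]
```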
Referring to fig. 5, fig. 5 shows a schematic diagram of a user terminal displaying real-time video with multiple camera sources. The real-time image at the left position in fig. 5 may be the main image, the real-time image at the right position may be the target image, and the two images together may constitute the real-time output image.
Based on the above, after the target image is obtained in step S203, the following step S204 may be executed to generate the real-time output image.
In step S204, the main image and the target image are synthesized to generate a real-time output image.
Step S205, transmitting the real-time output image.
In this step, the real-time output image generated in step S204 is transmitted to realize real-time live broadcasting based on multiple camera sources. For example, the live broadcast device 1000 transmits the real-time output image to the server 3000, and the server 3000 distributes it to the user terminals of the users watching the live broadcast. Alternatively, the server 3000 generates the real-time output image and transmits it to the user terminals. A transmission sketch follows.
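A sketch of step S205 in Python, assuming an ffmpeg binary on the PATH and an RTMP ingest URL on the server; the URL, resolution, and encoder settings are illustrative assumptions.

```python
import subprocess

def open_rtmp_push(width, height, fps, url="rtmp://example.com/live/stream"):
    """Start an ffmpeg process that encodes raw BGR frames piped to stdin
    and pushes them to the server as an FLV/H.264 stream."""
    cmd = ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "bgr24",
           "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
           "-c:v", "libx264", "-preset", "veryfast", "-f", "flv", url]
    return subprocess.Popen(cmd, stdin=subprocess.PIPE)

# Usage: write each real-time output image as one raw frame.
# proc = open_rtmp_push(1280, 720, 25)
# proc.stdin.write(real_time_output_image.tobytes())
```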
As can be seen from the above, the embodiment of the present disclosure obtains real-time images corresponding to a plurality of different viewing angles of the same scene; selecting one real-time image from the acquired real-time images as a main image according to a first preset rule; processing the acquired at least one real-time image according to a second preset rule to generate a target image; synthesizing the main image and the target image to generate a real-time output image; and transmitting the real-time output image. Based on the above, the real-time images watched by the audience are obtained from a plurality of camera sources, so that the watching experience is better.
Fig. 6 is a schematic flowchart of a method for generating real-time video with multiple camera sources according to an embodiment. Taking the multi-camera live broadcast system 10 shown in fig. 1 as an example, the method of this embodiment is now described for the facial expression re-enactment scene and/or limb movement re-enactment scene mentioned above. As shown in fig. 6, the method of this embodiment may include steps S601 to S607 as follows:
step S601, acquiring a real-time image of at least one view angle received through at least one expansion interface of a live broadcast device and a real-time image of at least one view angle acquired through at least one built-in camera of the live broadcast device.
The main execution body of this embodiment may be the server 3000, and the live device may be the live device 1000.
Step S602, selecting a real-time image including preset image features from the acquired real-time images as a main image, wherein the preset image features include at least one of a face feature, a gesture feature, a facial motion feature, and a limb motion feature.
In step S603, a virtual image corresponding to the person in the main image is acquired.
Step S604, identifying motion features of the person in the main image, where the motion features include facial motion features and limb motion features.
Step S605, controlling the virtual image and the character to perform a synchronous motion according to the motion characteristics to generate a target image.
In step S606, the main image and the target image are synthesized to generate a real-time output image.
Step S607, transmitting the real-time output image.
The real-time video generation method with multiple camera sources provided by this embodiment of the present disclosure enables a live anchor and a corresponding virtual anchor to appear in the same real-time image at the same time, improving the user's viewing experience.
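A minimal sketch of the core of steps S604 to S605 follows; the `MotionFeatures`, `Avatar`, and helper names are hypothetical stand-ins for an actual motion-capture and avatar-rendering pipeline, not the method itself:

```kotlin
import android.graphics.Bitmap

// Sketch of steps S604-S605: mirror the person's recognized motion features onto
// the virtual image. All types and members here are illustrative assumptions.
class MotionFeatures(
    val facial: FloatArray, // e.g. blendshape weights describing facial motion
    val limbs: FloatArray   // e.g. joint rotations describing limb motion
)

interface Avatar {
    fun setFacialPose(weights: FloatArray)
    fun setBodyPose(rotations: FloatArray)
    fun renderFrame(): Bitmap // produces the avatar frame used as the target image
}

// Called once per frame of the main image: the avatar repeats the person's
// motion, and the rendered avatar frame becomes the target image for that frame.
fun syncAvatar(avatar: Avatar, features: MotionFeatures): Bitmap {
    avatar.setFacialPose(features.facial)
    avatar.setBodyPose(features.limbs)
    return avatar.renderFrame()
}
```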
Fig. 7 is a flow diagram of a multi-camera live method according to one embodiment. As shown in fig. 7, the multi-camera live broadcasting method of this embodiment is implemented by a live broadcasting device, where the live broadcasting device is provided with at least one expansion interface for connecting an external camera, and the method may include the following steps S701 to S702:
Step S701, acquiring a real-time image collected by an external camera when it is detected that the expansion interface is connected with the external camera.
The live device of the present embodiment may be the live device 1000 shown in fig. 1. The external camera of this embodiment may be the external camera 2000 shown in fig. 1.
To realize multi-camera live broadcasting, the live broadcast device is provided with expansion interfaces for connecting external cameras. The number of expansion interfaces may be 1, or may be more, for example 2, 3, or even more.

A device user who needs to perform multi-camera live broadcasting through the live broadcast device can connect a corresponding number of external cameras to the device in advance and start the live broadcast program. After the live broadcast program is started, the processor of the live broadcast device can detect the camera connection status of each expansion interface. For any expansion interface, when it is detected that an external camera is connected to the interface, the real-time image collected by that camera can be acquired. When external cameras are connected to a plurality of expansion interfaces, a corresponding number of real-time images can be acquired.

It should be noted that the live broadcast device may be an all-in-one live broadcast machine including a built-in camera. After the live broadcast program is started, the processor of the live broadcast device can also acquire the real-time image collected by the built-in camera. Thus, when the live broadcast device is connected with N external cameras, it can acquire N+1 real-time images, including the image collected by the built-in camera, where N is a positive integer.
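On Android, the connection detection described above could be built on the platform's standard USB attach/detach broadcasts. The following is a sketch under that assumption; `startCapture` and `stopCapture` are hypothetical hooks into the device's capture pipeline:

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.content.IntentFilter
import android.hardware.usb.UsbDevice
import android.hardware.usb.UsbManager

// Sketch: monitor the expansion interfaces for camera hot-plug events via the
// standard Android USB broadcasts and start/stop capture accordingly.
fun registerCameraHotplugReceiver(
    context: Context,
    startCapture: (UsbDevice?) -> Unit, // hypothetical hook: begin acquiring frames
    stopCapture: (UsbDevice?) -> Unit   // hypothetical hook: release the camera
) {
    val filter = IntentFilter().apply {
        addAction(UsbManager.ACTION_USB_DEVICE_ATTACHED)
        addAction(UsbManager.ACTION_USB_DEVICE_DETACHED)
    }
    context.registerReceiver(object : BroadcastReceiver() {
        override fun onReceive(ctx: Context, intent: Intent) {
            val device: UsbDevice? = intent.getParcelableExtra(UsbManager.EXTRA_DEVICE)
            when (intent.action) {
                UsbManager.ACTION_USB_DEVICE_ATTACHED -> startCapture(device)
                UsbManager.ACTION_USB_DEVICE_DETACHED -> stopCapture(device)
            }
        }
    }, filter)
}
```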
Based on the above, in one embodiment of the present disclosure, the live device is provided with at least one built-in camera. Based on this, the method further comprises: and acquiring a real-time image acquired by the built-in camera. Wherein the at least two live images further comprise live images collected by the built-in camera.
After at least two real-time images are acquired, the following step S702 can be executed to cooperate with the server to realize live broadcasting with multiple cameras. The server may be the server 3000 shown in fig. 1.
Step S702, performing a setting operation for the acquired at least two real-time images, so that a server performs live broadcast processing corresponding to the at least two real-time images, where the at least two real-time images include the real-time image collected by the external camera, and the live broadcast processing includes distributing a real-time output image obtained by processing the at least two real-time images according to a set live broadcast processing rule.

In this embodiment, after acquiring at least two real-time images, the live broadcast device cooperates with the server to realize multi-camera live broadcasting. Specifically, the live broadcast device performs a setting operation so that the server performs the live broadcast processing corresponding to the at least two real-time images; in particular, the server distributes the real-time output video. In general, the server distributes the real-time output video to the user terminals held by the users watching the live broadcast. The user terminal may be the user terminal 4000 shown in fig. 1.

In detail, the real-time output video distributed by the server is obtained by processing the at least two real-time images, specifically according to the set live broadcast processing rule.
It should be noted that the execution body that processes the at least two real-time images to obtain the real-time output image may be either the live broadcast device or the server.

For example, case 1: when the hardware configuration of the live broadcast device can support the live broadcast processing operation, whether the operation is executed by the live broadcast device or by the server can be set as needed.

As another example, case 2: when the hardware configuration of the live broadcast device can support processing only a smaller number of real-time videos and is not sufficient for a larger number, the live broadcast processing operation may be handled directly by the server; alternatively, the operation is executed by the live broadcast device when the number of real-time videos is small, and by the server otherwise.
Corresponding to case 2, in an embodiment of the present disclosure, in step S702, performing the setting operation includes: detecting whether the number of the at least two real-time images is not greater than a set threshold; determining that the live broadcast device executes the step of processing the at least two real-time images according to the set live broadcast processing rule when the number is not greater than the set threshold; and determining that the server executes this step when the number is greater than the set threshold. For example, the set threshold may be a positive integer such as 3 or 4.
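A minimal sketch of this decision (in Kotlin, with an example threshold and illustrative names) might look like:

```kotlin
// Sketch of the threshold check in step S702. The value 3 is only an example of
// the set threshold; all names are illustrative.
const val STREAM_THRESHOLD = 3

enum class ProcessingSite { LIVE_DEVICE, SERVER }

fun chooseProcessingSite(realTimeImageCount: Int): ProcessingSite =
    if (realTimeImageCount <= STREAM_THRESHOLD) ProcessingSite.LIVE_DEVICE
    else ProcessingSite.SERVER
```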
Based on the above, for the case where live processing is performed by a live device:
in an embodiment of the present disclosure, in step S702, performing the setting operation includes: processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output image; and sending the real-time output image to the server.

In this embodiment, the live broadcast processing rule is pre-stored in the live broadcast device. When the live broadcast device has acquired the at least two real-time images, it processes them according to the live broadcast processing rule to obtain the real-time output image and sends it to the server, so that the server distributes the received real-time output image to the user terminals held by the viewers watching the live broadcast. The user terminal then plays the received real-time output image to display the corresponding live broadcast content to the user.
Based on the above, for the case where live broadcast processing is performed by the server:
in an embodiment of the present disclosure, in step S702, performing the setting operation includes: sending the at least two real-time images to the server. The live broadcast processing further includes processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output image.

In this embodiment, the live broadcast processing rule is pre-stored in the server. When the live broadcast device has acquired the at least two real-time images, it sends them to the server, so that the server processes them according to the live broadcast processing rule to obtain the real-time output image and distributes it to the user terminals held by the viewers watching the live broadcast. The user terminal then plays the received real-time output image to display the corresponding live broadcast content to the user.
In this embodiment, the processing that turns a plurality of real-time images into the real-time output image may freely include operations such as inserting advertisements, switching shots, overlaying subtitles, applying various rendering effects, and pushing multi-picture streams simultaneously; based on an integrated AI algorithm, operations such as face recognition, gesture recognition, and limb motion recognition may also be performed on the real-time images. The set live broadcast processing rule may accordingly be used to perform any one or more of these operations.

For example, the live broadcast device may be provided with a function module for rendering and merging real-time images, and an AI algorithm module for face recognition, gesture recognition, and limb motion recognition on the real-time images.
In one embodiment of the present disclosure, to illustrate one possible implementation of processing a plurality of real-time videos to obtain the real-time output video, the live broadcast processing rule includes a picture merging rule. Based on this, processing the at least two real-time videos according to the set live broadcast processing rule includes: merging the at least two real-time images according to the picture merging rule, and taking the merged real-time image as the real-time output image.
In detail, when the live broadcast processing rule includes only the picture merging rule, merging the at least two real-time videos may simply mean directly merging the real-time images acquired by the live broadcast device. As shown in fig. 8, fig. 8 is a schematic diagram of a user terminal displaying a multi-camera live view, where the view may be a video image obtained by directly merging the real-time images acquired by the live broadcast device.

In addition, when the live broadcast processing rule includes not only the picture merging rule but also other processing rules (such as rendering rules and feature recognition rules), merging the at least two real-time videos may mean merging the real-time images obtained after feature recognition, rendering, and the like.

In detail, one real-time video consists of video images with different timestamps, and merging multiple real-time videos means merging, for each timestamp, the video images that share that timestamp; the merged real-time video then consists of the merged video images at the successive timestamps. Referring again to fig. 8, the multi-camera live view shown there can be a merged video image at one timestamp.
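As a non-limiting sketch of one merge layout (horizontal concatenation of two frames sharing a timestamp), using the Android Bitmap/Canvas API:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas

// Sketch: merge two frames that share a timestamp side by side. The layout is
// one example; a picture merging rule could equally specify grids or overlays.
fun mergeFrames(left: Bitmap, right: Bitmap): Bitmap {
    val out = Bitmap.createBitmap(
        left.width + right.width,
        maxOf(left.height, right.height),
        Bitmap.Config.ARGB_8888
    )
    Canvas(out).apply {
        drawBitmap(left, 0f, 0f, null)
        drawBitmap(right, left.width.toFloat(), 0f, null)
    }
    return out
}
```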
In one embodiment of the present disclosure, to illustrate another possible implementation of processing a plurality of real-time videos to obtain the real-time output video, the live broadcast processing rule includes a feature recognition rule. Based on this, in step S702, processing the at least two real-time videos according to the set live broadcast processing rule to obtain the real-time output video includes: identifying, according to the feature recognition rule, whether the at least two real-time images include a real-time image with a preset image feature; and, when such a real-time image is included, processing the at least two real-time images according to the feature recognition rule to obtain the real-time output image.
In one embodiment of the present disclosure, the preset image feature may be a face feature, a gesture feature, a facial motion feature, a limb motion feature, or the like.

Taking the face feature as an example, the preset image feature may be a face feature, and the real-time image with the preset image feature may be a real-time image containing a face. Suppose such a real-time image is recognized and the recognition result of the face feature is: gender female, age 5. A subtitle may then be overlaid on that real-time image, such as the subtitle "Call me little princess!", and the subtitled real-time image may be merged with the other real-time images to obtain the real-time output image.

Taking the gesture feature as an example, the preset image feature may be a gesture feature, and the real-time image with the preset image feature may be a real-time image containing a gesture. Suppose such a real-time image is recognized and the recognition result of the gesture feature is: a heart gesture. Image processing may then be performed on the other real-time images, such as adding a string of heart special effects, and the processed real-time images may be merged with the real-time image containing the heart gesture to obtain the real-time output image. As shown in fig. 5, fig. 5 is a schematic diagram of a user terminal displaying a multi-camera live view; the currently displayed view is the image at one timestamp in the real-time output video.
Based on the foregoing, in an embodiment of the present disclosure, processing the at least two real-time images according to the feature recognition rule to obtain the real-time output image includes: determining a recognition result corresponding to the preset image feature according to the feature recognition rule; rendering, according to the recognition result, any real-time image among the at least two real-time images other than the first real-time image to obtain a second real-time image, where the first real-time image is the real-time image with the preset image feature; and merging the first real-time image and each second real-time image to obtain the real-time output image.

In this embodiment, a recognition result corresponding to the preset image feature is determined according to the feature recognition rule, the other real-time images are rendered according to the recognition result, and the rendered real-time images are then merged with the real-time image carrying the image feature to obtain the real-time output image. A minimal sketch of this flow is given below.
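In the sketch, `recognize`, `render`, and `merge` are hypothetical hooks standing in for the AI algorithm module and the rendering/merging function module mentioned above:

```kotlin
import android.graphics.Bitmap

// Sketch of the recognize-render-merge flow: keep the first frame carrying the
// preset feature as-is, render the others according to the recognition result,
// then merge everything into one output frame.
fun processFrames(
    frames: List<Bitmap>,
    recognize: (Bitmap) -> String?,     // recognition result for a frame, or null
    render: (Bitmap, String) -> Bitmap, // applies an effect chosen by the result
    merge: (List<Bitmap>) -> Bitmap     // merges frames sharing a timestamp
): Bitmap {
    val results = frames.map(recognize)
    val firstIdx = results.indexOfFirst { it != null }
    if (firstIdx < 0) return merge(frames) // no preset feature found: plain merge
    val result = results[firstIdx]!!       // result for the first real-time image
    val rendered = frames.mapIndexed { i, frame ->
        if (i == firstIdx) frame else render(frame, result)
    }
    return merge(rendered)
}
```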
In an embodiment of the present disclosure, to illustrate one possible way for the live broadcast device to acquire the real-time image of an external camera, the expansion interface is a USB interface, and the external camera is a camera using a USB interface.
In an embodiment of the present disclosure, taking the Android system as an example, before acquiring the real-time image collected by the external camera, the method further includes: adapting the Hardware Abstraction Layer (HAL) of the camera subsystem of the live broadcast device's Android system to the UVC (USB Video Class) protocol.

Acquiring the real-time image collected by the external camera then includes: acquiring, by using the Camera2 API of the Android system, the real-time image collected by the external camera based on the UVC protocol.

In detail, the hardware abstraction layer is an interface layer between the operating system kernel and the hardware circuitry, and its purpose is to abstract the hardware. The UVC protocol is a protocol standard defined for USB video capture devices.

In this embodiment, the live broadcast device may adopt the Android platform and adapt the UVC protocol at the HAL layer of the Android camera subsystem; after the adaptation, the real-time image collected by the connected external USB camera can be acquired by using the standard Android Camera2 API.
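Under that assumption, an external USB camera should appear in the Camera2 camera ID list with an external lens facing. A minimal sketch of locating such a camera (camera permission and the usual open/session callbacks are assumed to be handled elsewhere):

```kotlin
import android.content.Context
import android.hardware.camera2.CameraCharacteristics
import android.hardware.camera2.CameraManager
import android.hardware.camera2.CameraMetadata

// Sketch: find the ID of an external (e.g. UVC) camera via the Camera2 API.
// Returns null if no external camera is currently enumerated by the HAL.
fun findExternalCameraId(context: Context): String? {
    val manager = context.getSystemService(Context.CAMERA_SERVICE) as CameraManager
    return manager.cameraIdList.firstOrNull { id ->
        manager.getCameraCharacteristics(id)
            .get(CameraCharacteristics.LENS_FACING) == CameraMetadata.LENS_FACING_EXTERNAL
    }
}
```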
For the case where live broadcast processing is performed by the live broadcast device, in an embodiment of the present disclosure, the method further includes: providing a configuration interface in response to an operation of configuring the live broadcast processing rule; acquiring configuration information input through the configuration interface; and setting the live broadcast processing rule according to the configuration information.

When the live broadcast device executes the live broadcast processing, the device user can set the live broadcast processing rule on the device as needed. The set live broadcast processing rule may include rules for operations such as inserting advertisements, switching shots, overlaying subtitles, applying various rendering effects, pushing multi-picture streams simultaneously, face recognition, gesture recognition, and limb motion recognition on the real-time images.
In detail, the live broadcast device may provide the above configuration interface through its display screen.
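One possible shape for the configuration information, shown only as an assumption (the field set is illustrative and could be serialized, for example, as JSON):

```kotlin
// Sketch: a possible structure for the configuration information collected through
// the configuration interface. The fields mirror the operations listed above and
// are illustrative assumptions, not part of this disclosure.
data class LiveProcessingRuleConfig(
    val mergePictures: Boolean = true,
    val insertAds: Boolean = false,
    val overlaySubtitles: Boolean = false,
    val faceRecognition: Boolean = false,
    val gestureRecognition: Boolean = false,
    val limbMotionRecognition: Boolean = false
)
```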
To sum up, when a live broadcast user needs multi-camera live broadcasting, the user only needs to plug each external camera into a corresponding expansion interface of the live broadcast device. The operation is simple and involves neither professional equipment nor professionals; for example, the user does not need to perform professional operations such as merging multiple real-time videos personally. The solution can therefore be popularized on a large scale and is suited to the multi-camera live broadcasting needs of ordinary users.
Taking the Android system as an example, fig. 9 is a schematic flowchart of a multi-camera live broadcasting method according to an embodiment. Taking the multi-camera live broadcasting system 10 shown in fig. 1 as an example, the method of this embodiment is now described for the case where live broadcast processing is executed by the live broadcast device. As shown in fig. 9, the method of this embodiment may include the following steps S901 to S907:
step S901, adapting a UVC protocol to a hardware abstraction layer of a camera subsystem of an Android system of the live device.
The live broadcast device may be an all-in-one live broadcast machine.
Step S902, inserting a plurality of external cameras with USB interfaces into the corresponding USB expansion interfaces that are provided on the live broadcast device for connecting external cameras.

Step S903, for any USB expansion interface, when the live broadcast device detects that an external camera is connected to the interface, acquiring, by using the Camera2 API of the Android system, the real-time image collected by that camera based on the UVC protocol.
Step S904, the live broadcast device acquires the real-time image collected by its built-in camera.
Step S905, for the acquired at least two real-time images, the live broadcast device processes them according to the set live broadcast processing rule to obtain the real-time output image.
The at least two real-time images include a real-time image collected by the connected external camera and a real-time image collected by the internal camera.
Step S906, the live broadcast equipment sends the real-time output image to a server.
Step S907, the server distributes the real-time output image to the user terminal.
In this step, the cloud server distributes the live video content to the users watching the live broadcast.
In this embodiment, the live broadcast device can open a plurality of cameras simultaneously, import the real-time images of the plurality of cameras, and push the streams to the platform at the same time, that is, realize multi-camera live broadcasting. This embodiment solves the problem that existing live broadcast devices can use only one camera, which makes operations such as switching camera positions and multi-picture stream pushing impossible during live broadcasting.
Fig. 10 is a functional block diagram of a real-time video generation apparatus 100 having multiple camera sources according to one embodiment. As shown in fig. 10, the real-time video generation apparatus 100 having a plurality of camera sources includes: an image acquisition module 1001, a first processing module 1002, a second processing module 1003, a composition module 1004, and a transmission module 1005. The real-time video generation apparatus 100 having a plurality of camera sources may be the live broadcast device 1000 or the server 3000 in fig. 1.
The image acquisition module 1001 acquires real-time images corresponding to a plurality of different viewing angles of the same scene. The first processing module 1002 selects one real-time image from the acquired real-time images as the main image according to a first preset rule. The second processing module 1003 processes the acquired at least one real-time image according to a second preset rule to generate a target image. The composition module 1004 synthesizes the main image and the target image to generate a real-time output image. The transmission module 1005 transmits the real-time output image.
In an embodiment of the present disclosure, the first predetermined rule is a rule set according to at least one of an image viewing angle, a facial feature of a person in the image, a gesture feature of the person in the image, and user input information.
In an embodiment of the present disclosure, the second preset rule is a rule set according to at least one of an image view angle, a face feature of a person in the image, a facial motion feature of a person in the image, a gesture feature of a person in the image, a limb motion feature of a person in the image, an object feature in the image, and user input information.
In one embodiment of the present disclosure, the at least one live image does not include the primary image.
In an embodiment of the present disclosure, the composition module 1004 superimposes the target image on a preset frame region of the main image, wherein the target image is used as a background, and a person in the main image is used as a foreground.
In an embodiment of the present disclosure, the second processing module 1003 acquires a virtual image corresponding to a character in the main image; identifying motion characteristics of a person in the main image, wherein the motion characteristics comprise at least one of facial motion characteristics and limb motion characteristics; and controlling the virtual image and the character to perform synchronous action according to the action characteristics so as to generate a target image.
In an embodiment of the present disclosure, the second processing module 1003 acquires motion features of the person in the main image, where the motion features include a first gesture feature; selects a corresponding real-time image according to the first gesture feature; and crops a partial picture region of the corresponding real-time image as the target image.
In an embodiment of the present disclosure, the second processing module 1003 obtains motion features of a person in the main image, where the motion features include a second gesture feature; and acquiring an image corresponding to the second gesture feature as the target image.
In an embodiment of the present disclosure, the second processing module 1003 obtains a face feature of a person in the main image; and acquiring an image corresponding to the human face characteristics as the target image.
In an embodiment of the present disclosure, the first processing module 1002 selects a live image including a preset image feature as a main image from the plurality of live images, where the preset image feature includes at least one of a face feature, a gesture feature, a facial motion feature and a limb motion feature.
In an embodiment of the present disclosure, the second processing module 1003 obtains feature information of the preset image feature in the main image; and rendering any other real-time image which is different from the main image in the plurality of real-time images according to the characteristic information to obtain the target image.
In an embodiment of the present disclosure, the image acquisition module 1001 acquires a real-time image of at least one viewing angle received through at least one expansion interface of the live broadcast device and a real-time image of at least one viewing angle collected through at least one built-in camera of the live broadcast device.
Fig. 11 is a hardware configuration diagram of a real-time video generation apparatus 110 having a plurality of camera sources according to another embodiment. As shown in fig. 11, the real-time video generation apparatus 110 with multiple camera sources includes a processor 1101 and a memory 1102; the memory 1102 is configured to store an executable computer program, and the processor 1101 is configured to execute, under the control of the computer program, the method according to the foregoing real-time video generation method embodiments.
The real-time video generating apparatus 110 with multiple camera sources may be the live device 1000 or the server 3000 in fig. 1.
The modules of the real-time video generation apparatus 110 having a plurality of image capturing sources may be realized by the processor 1101 in the present embodiment executing a computer program stored in the memory 1102, or may be realized by another circuit configuration, which is not limited herein.
Fig. 12 is a functional block diagram of a live device 120 according to one embodiment. As shown in fig. 12, the live device 120 may include at least one expansion interface 1201 for connecting an external camera, an obtaining module 1202, and a processing module 1203. The live device 120 may be the live device 1000 of fig. 1.
The obtaining module 1202 obtains a real-time image collected by an external camera when detecting that the expansion interface 1201 is connected with the external camera. The processing module 1203 executes a setting operation for at least two acquired real-time images, so that a server executes live broadcast processing corresponding to the at least two real-time images, where the at least two real-time images include real-time images acquired by the external camera, and the live broadcast processing includes distributing real-time output images obtained by processing the at least two real-time images according to a set live broadcast processing rule.
In one embodiment of the present disclosure, the live device 120 is provided with at least one built-in camera. The live broadcast device 120 further includes a module for acquiring real-time images collected by the built-in camera. Wherein the at least two live images further comprise live images collected by the built-in camera.
In an embodiment of the present disclosure, the processing module 1203 processes the at least two live videos according to the live broadcast processing rule to obtain the live output video; and sending the real-time output image to the server.
In an embodiment of the present disclosure, the processing module 1203 sends the at least two live images to the server; and the live broadcast processing further comprises processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output images.
In one embodiment of the present disclosure, the live broadcast processing rule includes a picture merging rule. Processing the at least two real-time images according to the set live broadcast processing rule includes: merging the at least two real-time images according to the picture merging rule, and taking the merged real-time image as the real-time output image.
In an embodiment of the present disclosure, the processing module 1203 detects whether the number of the at least two real-time images is not greater than a set threshold; determining that the step of processing the at least two live videos according to the set live processing rule is executed by the live device 120 if the number is not greater than the set threshold; and determining that the server executes the step of processing the at least two live videos according to the set live broadcast processing rule under the condition that the number is larger than the set threshold value.
In one embodiment of the present disclosure, the live device 120 further includes a module, which provides a configuration interface in response to an operation of configuring the live processing rule; acquiring configuration information input through the configuration interface; and setting the live broadcast processing rule according to the configuration information.
Fig. 13 is a hardware configuration diagram of a live device 130 according to another embodiment. As shown in fig. 13, the live device 130 includes a processor 1301, a memory 1302 and at least one extension interface 1303 for connecting an external camera, where the extension interface 1303 is connected to the processor 1301, the memory 1302 is used for storing an executable computer program, and the processor 1301 is used for executing the method according to any of the above method embodiments according to the control of the computer program.
The live device 130 may be the live device 1000 of fig. 1.
The modules of the live device 130 may be implemented by the processor 1301 in the present embodiment executing a computer program stored in the memory 1302, or may be implemented by other circuit structures, which is not limited herein.
Furthermore, an embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the method according to any one of the embodiments of the present disclosure.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (23)

1. A method of real-time video generation having a plurality of camera sources, comprising:
acquiring real-time images corresponding to a plurality of different visual angles of the same scene;
selecting one real-time image from the acquired real-time images as a main image according to a first preset rule;
processing the acquired at least one real-time image according to a second preset rule to generate a target image;
synthesizing the main image and the target image to generate a real-time output image;
and transmitting the real-time output image.
2. The method of claim 1, wherein the first predetermined rule is a rule set according to at least one of an image viewing angle, a facial feature of a person in the image, a gesture feature of a person in the image, and user input information.
3. The method of claim 1, wherein the second predetermined rule is a rule set according to at least one of an image view angle, a face feature of a person in the image, a facial motion feature of a person in the image, a gesture feature of a person in the image, a body motion feature of a person in the image, an object feature in the image, and user input information.
4. The method of claim 1, wherein the at least one live image does not include the primary image.
5. The method of claim 1, wherein said compositing said primary image and said target image comprises:
superimposing the target image on a preset image area of the main image, wherein the target image serves as a background, and the person in the main image serves as a foreground.
6. The method according to claim 1, wherein the processing the acquired at least one live image according to the second preset rule to generate the target image comprises:
acquiring a virtual image corresponding to a character in the main image;
identifying motion characteristics of a person in the main image, wherein the motion characteristics comprise at least one of facial motion characteristics and limb motion characteristics;
and controlling the virtual image and the character to perform synchronous action according to the action characteristics so as to generate a target image.
7. The method according to claim 1, wherein the processing the acquired at least one live image according to the second preset rule to generate the target image comprises:
acquiring action characteristics of a person in the main image, wherein the action characteristics comprise first gesture characteristics;
selecting a corresponding real-time image according to the first gesture feature;
and cropping a partial picture region of the corresponding real-time image as the target image.
8. The method according to claim 1, wherein the processing the acquired at least one live image according to the second preset rule to generate the target image comprises:
acquiring action characteristics of a person in the main image, wherein the action characteristics comprise second gesture characteristics;
and acquiring an image corresponding to the second gesture feature as the target image.
9. The method according to claim 1, wherein the processing the acquired at least one live image according to the second preset rule to generate the target image comprises:
acquiring the human face characteristics of the figures in the main image;
and acquiring an image corresponding to the human face characteristics as the target image.
10. The method according to claim 1, wherein the selecting one live image from the plurality of acquired live images as a main image according to a first preset rule includes:
selecting a real-time image containing preset image features from the plurality of real-time images as the main image, wherein the preset image features include at least one of a face feature, a gesture feature, a facial motion feature, and a limb motion feature.
11. The method according to claim 10, wherein the processing the acquired at least one live image according to the second preset rule to generate the target image comprises:
acquiring feature information of the preset image features in the main image;
and for any other real-time image which is different from the main image in the plurality of real-time images, rendering the other real-time images according to the characteristic information to obtain the target image.
12. The method of claim 1, wherein said acquiring live imagery corresponding to a plurality of different perspectives of a same scene comprises:
the method comprises the steps of obtaining real-time images of at least one visual angle received through at least one expansion interface of live broadcast equipment and real-time images of at least one visual angle collected through at least one built-in camera of the live broadcast equipment.
13. A multi-camera live broadcasting method is implemented by live broadcasting equipment, the live broadcasting equipment is provided with at least one expansion interface for connecting an external camera, and the method comprises the following steps:
under the condition that the expansion interface is detected to be connected with an external camera, acquiring a real-time image acquired by the external camera;
performing a setting operation for the acquired at least two real-time images, so that a server performs live broadcast processing corresponding to the at least two real-time images, wherein the at least two real-time images comprise the real-time image collected by the external camera, and the live broadcast processing comprises distributing a real-time output image obtained by processing the at least two real-time images according to a set live broadcast processing rule.
14. The method of claim 13, wherein the live device is provided with at least one built-in camera, the method further comprising:
acquiring a real-time image acquired by the built-in camera;
wherein the at least two real-time images further comprise real-time images acquired by the built-in camera.
15. The method of claim 13, wherein the performing a setting operation comprises:
processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output images;
sending the real-time output image to the server; or,
the executing the setting operation includes:
sending the at least two real-time images to the server; and the live broadcast processing further comprises processing the at least two real-time images according to the live broadcast processing rule to obtain the real-time output images.
16. The method of claim 13, wherein the live broadcast processing rule comprises a picture merging rule;

the processing the at least two real-time images according to the set live broadcast processing rule comprises the following steps:

merging the at least two real-time images according to the picture merging rule, and taking the merged real-time image as the real-time output image.
17. The method of claim 13, wherein the performing a setting operation comprises:
detecting whether the number of the at least two real-time images is not greater than a set threshold value;
determining that the live broadcast device executes the step of processing the at least two real-time images according to the set live broadcast processing rule when the number is not greater than the set threshold;

and determining that the server executes the step of processing the at least two real-time images according to the set live broadcast processing rule when the number is greater than the set threshold.
18. The method of any of claims 13 to 17, wherein the method further comprises:
providing a configuration interface in response to an operation of configuring the live broadcast processing rule;
acquiring configuration information input through the configuration interface;
and setting the live broadcast processing rule according to the configuration information.
19. A real-time video generation apparatus having a plurality of camera sources, comprising:
the image acquisition module is used for acquiring real-time images corresponding to a plurality of different visual angles of the same scene;
the first processing module is used for selecting one real-time image from the acquired real-time images as a main image according to a first preset rule;
the second processing module is used for processing the acquired at least one real-time image according to a second preset rule to generate a target image;
a synthesis module for synthesizing the main image and the target image to generate a real-time output image; and
and the transmission module is used for transmitting the real-time output image.
20. A real-time video generation apparatus having a plurality of camera sources, comprising a memory for storing a computer program and a processor for executing the computer program to implement the method according to any one of claims 1 to 12.
21. A live device, comprising:
the expansion interface is used for connecting the external camera;
the acquisition module is used for acquiring a real-time image acquired by the external camera under the condition that the expansion interface is detected to be connected with the external camera;
the processing module is used for executing setting operation aiming at the acquired at least two real-time images so as to execute live broadcast processing corresponding to the at least two real-time images by the server, wherein the at least two real-time images comprise the real-time images acquired by the external camera, and the live broadcast processing comprises the step of distributing real-time output images obtained by processing the at least two real-time images according to set live broadcast processing rules.
22. A live broadcast device comprises a memory, a processor and at least one expansion interface used for connecting an external camera, wherein the expansion interface is connected with the processor, and the memory is used for storing a computer program; the processor is adapted to execute the computer program to implement the method according to any of claims 13-18.
23. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-18.