CN113315940A - Video call method, device and computer readable storage medium - Google Patents

Video call method, device and computer readable storage medium

Info

Publication number
CN113315940A
Authority
CN
China
Prior art keywords: video, camera, data stream, module, video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110310982.4A
Other languages
Chinese (zh)
Inventor
岳晓峰
吴魁
钟文亮
杨春晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Shilian Communication Technology Co ltd
Original Assignee
Hainan Shilian Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Shilian Communication Technology Co ltd
Priority to CN202110310982.4A
Publication of CN113315940A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/142: Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H04N 7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44004: Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer
    • H04N 21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/47: End-user applications
    • H04N 21/478: Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N 21/4788: Supplemental services communicating with other users, e.g. chatting

Abstract

The embodiment of the invention provides a video call method, a video call device and a computer-readable storage medium. The method comprises the following steps: after the video call connection is successfully established, starting a first camera and a second camera of the mobile terminal; calling the first camera to collect a first video data stream and, at the same time, calling the second camera to collect a second video data stream; caching the first video data stream in a first cache region and the second video data stream in a second cache region; extracting video images from the first cache region and the second cache region respectively and carrying out video image synthesis, so that the first video data stream and the second video data stream are combined into one video data stream; and after the synthesized video data stream is encoded, sending it through the video network to a video networking server, which forwards it to the video networking opposite-end equipment. In this way more visual information can be sent to the opposite-end equipment, and the video call experience is improved.

Description

Video call method, device and computer readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a video call method, an apparatus, and a computer-readable storage medium.
Background
In the current video networking videophone function, both parties to a call turn on their local cameras, transmit the pictures into the video network, and the pictures reach the opposite terminal equipment after being relayed by a video networking server. A dedicated fixed video networking terminal is generally equipped with only one high-definition camera and outputs a single video picture during a video call. A video networking mobile terminal, in contrast, is generally provided with a front camera and a rear camera; at present, when such a terminal is used for a videophone call, the front camera is started by default and the video picture is collected through the front camera, and the user can manually switch from the front camera to the rear camera during the call so that the video picture is collected through the rear camera.
In the existing videophone scheme, a video networking mobile terminal can therefore collect only one video picture at a time and send it to the opposite terminal equipment, so the amount of visual information received by the user of the opposite terminal equipment is small.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a video call method and a corresponding video call apparatus that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a video call method, wherein the video call method is applied to a video network mobile terminal, and the method includes:
after the video call connection is successfully established, respectively starting a first camera and a second camera of the mobile terminal;
simultaneously calling the first camera to acquire a first video data stream, and calling the second camera to acquire a second video data stream;
respectively caching the first video data stream into a first cache region, and caching the second video data stream into a second cache region;
extracting video images from the first cache region and the second cache region respectively, and carrying out video image synthesis, so that the first video data stream and the second video data stream are combined into one video data stream;
and encoding the synthesized video data stream, and then sending it to a video networking server through the video network, so that the video networking server sends it to the video networking opposite-end device.
Optionally, before the step of separately turning on the first camera and the second camera of the mobile terminal, the method further includes:
after the video call connection is successfully established, determining whether to start a multi-channel video mode according to the historical video call record;
and under the condition that the multi-channel video mode is determined to be started, executing the step of respectively starting the first camera and the second camera of the mobile terminal.
Optionally, before the step of separately turning on the first camera and the second camera of the mobile terminal, the method further includes:
after the video call connection is successfully established, outputting a selection prompt box, wherein the selection prompt box is used for prompting a user to start a multi-channel video mode;
receiving a first input of a user to the selection prompt box, wherein the first input is used for starting a multi-channel video mode;
and responding to the first input, and executing the step of respectively starting the first camera and the second camera of the mobile terminal.
Optionally, the step of extracting video images from the first buffer area and the second buffer area respectively to perform video image synthesis includes:
extracting a first video image with the shortest storage time from the first cache region;
extracting a second video image with the shortest storage time from the second cache region;
and zooming and splicing the first video image and the second video image to generate a composite video image with a preset resolution.
Optionally, before the step of scaling and splicing the first video image and the second video image to generate a composite video image with a preset resolution, the method further includes:
outputting a plurality of composite video image patterns;
and receiving the selection operation of the target synthesized video image style by the user.
Optionally, the step of sending the synthesized video data stream to a video networking server through a video networking after encoding the synthesized video data stream includes:
reading a synthesized video image according to a preset frame rate;
and coding the read synthesized video image according to a preset coding mode, and then sending the coded video image to a video network server through a video network.
In order to solve the above problem, an embodiment of the present invention discloses a video call device, where the video call device is applied to a video networking mobile terminal, and the device includes:
the starting module is used for respectively starting a first camera and a second camera of the mobile terminal after the video call connection is successfully established;
the first calling module is used for calling the first camera to acquire a first video data stream and calling the second camera to acquire a second video data stream;
the storage module is used for respectively caching the first video data stream into a first cache region and caching the second video data stream into a second cache region;
the synthesis module is used for extracting video images from the first cache region and the second cache region respectively and synthesizing the video images so as to combine the first video data stream and the second video data stream into a video data stream;
and the coding module is used for coding the synthesized video data stream, and then sending the coded video data stream to a video networking server through the video networking so as to send the coded video data stream to video networking opposite-end equipment through the video networking server.
Optionally, the apparatus further comprises:
the determining module is used for determining whether to start a multi-channel video mode according to historical video call records before the starting module respectively starts the first camera and the second camera of the mobile terminal after the video call connection is successfully established;
and the second calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal under the condition that the determining module determines to start the multi-channel video mode.
Optionally, the apparatus further comprises:
the output module is used for outputting a selection prompt box after the video call connection is successfully established and before the starting module respectively starts the first camera and the second camera of the mobile terminal, wherein the selection prompt box is used for prompting a user to start a multi-channel video mode;
the receiving module is used for receiving a first input of the user for the selection prompt box, wherein the first input is used for starting a multi-channel video mode;
and the third calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal in response to the first input.
Optionally, the synthesis module comprises:
the first submodule is used for extracting the first video image with the shortest storage time from the first cache region;
the second submodule is used for extracting a second video image with the shortest storage time from the second cache region;
and the synthesis submodule is used for zooming and splicing the first video image and the second video image to generate a synthesized video image with preset resolution.
Optionally, the synthesis module further comprises:
the pattern recommendation submodule is used for outputting a plurality of synthesized video image patterns before the step of zooming and splicing the first video image and the second video image by the synthesis submodule to generate a synthesized video image with a preset resolution;
and the operation receiving submodule is used for receiving the selection operation of the user on the target synthesized video image style.
Optionally, the encoding module comprises:
the reading submodule is used for reading the synthesized video image according to a preset frame rate;
and the sending submodule is used for coding the read synthesized video image according to a preset coding mode and then sending the coded synthesized video image to a video network server through a video network.
In order to solve the above problem, an embodiment of the present invention discloses a video call device, including:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform any of the video telephony methods described above.
In order to solve the above problem, an embodiment of the present invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform any one of the video call methods described above.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, two cameras of the mobile terminal are started and two video data streams are collected and combined into one video data stream; compared with the existing video call method which acquires only a single video data stream, more visual information can be sent to the opposite terminal device, and the video call experience is improved. In addition, in the embodiment of the invention, the video data streams collected by the cameras are cached in the cache regions and the video images are extracted from the cache regions for synthesis, so that the two video data streams collected by the two cameras can be synthesized synchronously.
Drawings
Fig. 1 is a block diagram of a video networking mobile terminal according to the present invention;
Fig. 2 is a flow chart of the steps of an embodiment of a video call method of the present invention;
Fig. 3 is a block diagram of a video call device according to an embodiment of the present invention;
Fig. 4 is a networking schematic of a video network of the present invention;
Fig. 5 is a schematic diagram of a hardware architecture of a node server according to the present invention;
Fig. 6 is a schematic diagram of a hardware structure of an access switch of the present invention;
Fig. 7 is a schematic diagram of a hardware structure of an Ethernet protocol conversion gateway according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Video networking conference system: a real-time high-definition conference system built on top of high-definition audio and video transmission over the video network, consisting of the corresponding management software and clients and supporting access by various dedicated terminals and mobile terminals. Its main functions include creating conferences, making video calls, publishing live broadcasts, watching live broadcasts and so on. Related applications include the Pamier conference control client, the conference scheduling server, the conference management web background and so on. Supported hardware terminals include the Aurora/Lighting series hardware terminals, handheld mobile terminals, the Pamier mobile conference tablet and so on.
Pamier: the client of the video networking conference scheduling system. It is client software running on a personal computer platform and used for conference reservation, conference process management and control, starting and stopping conferences, switching the speaker and switching the split-screen mode. As the main operation module at the front end of the conference system, it can control the whole course of a conference.
Pamier mobile terminal: a Pamier mobile conference terminal with a partially simplified feature set that runs on a mobile platform; the hardware platform is usually an Android tablet or mobile phone. It can connect to the management side of the video networking conference system through a wireless network to perform conference control and participate in conference operations, and it also provides functions such as live broadcast and videophone. Unless otherwise stated, the video networking tablet or mobile device described below refers to the Pamier mobile terminal.
Video networking videophone: a function of the video networking conference system. The video networking conference system can include various types of video networking terminals, including the Aurora and Kindergarten series of multi-channel high-definition dedicated video terminals independently developed by the company, as well as mobile terminals such as mobile phones and tablets. With the videophone function, any terminal can dial, by video networking number, the videophone service of any other single video networking terminal and conduct a point-to-point high-definition video call.
The invention is used in the videophone service of the Pamier mobile conference terminal and is started by its videophone service module.
The video networking mobile terminal, namely the Pamier mobile conference terminal, comprises at least two cameras: a front camera and a rear camera. An exemplary video networking mobile terminal is shown in fig. 1.
As shown in fig. 1, the video networking mobile terminal includes: a front camera, a rear camera, two video acquisition modules, a picture synthesis module, a video encoding module, a video output module, a videophone control module, a video receiving and decoding module, and a module for locally outputting the peer video.
The video call method provided by the embodiment of the invention runs on the video networking mobile terminal shown in fig. 1 and is controlled by the Pamier mobile terminal software installed on it. The software opens the front camera and the rear camera of the mobile terminal at the same time, calls the two video acquisition modules to acquire the two video pictures captured by the front and rear cameras respectively, synthesizes the two camera pictures into one video picture through the picture synthesis module to output a video data stream, encodes the synthesized video data stream through the video encoding module, and finally outputs the encoded video data stream through the video output module. After receiving the video data stream, the opposite-end device decodes it and can then display the two video pictures simultaneously, enabling a multi-view call. This mode uses an innovative mobile terminal video call technique and can greatly improve the practical effect of video calls on video networking mobile terminals. The video call flow provided by the embodiment of the invention is as follows:
referring to fig. 2, a flowchart illustrating steps of an embodiment of a video call method according to the present invention is shown, where the method may be applied to a video networking mobile terminal, and specifically may include the following steps:
step 201: and after the video call connection is successfully established, respectively starting a first camera and a second camera of the mobile terminal.
The video call may refer to a video call during a video phone call, or may refer to a video call established through a third-party communication application.
In the embodiment of the present invention, the mobile terminal includes a front camera as a first camera, and a rear camera as a second camera. In the actual implementation process, if the mobile terminal comprises a plurality of cameras, three or more cameras can be started to acquire video data streams.
Step 202: meanwhile, a first camera is called to collect a first video data stream, and a second camera is called to collect a second video data stream.
Two paths of video data streams can be collected by the two cameras, and compared with the mode that only one camera is opened to collect one path of video data stream, the visual information collection amount can be increased.
Step 203: and respectively caching the first video data stream into a first cache region and caching the second video data stream into a second cache region.
The first cache region and the second cache region are two independent cache regions, and the sizes of the first cache region and the second cache region may be set by a person skilled in the art according to actual needs, which is not specifically limited in the embodiment of the present invention.
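As an illustration only, such a cache region could be realized as a small bounded buffer per camera that discards its oldest frame when full, so that fresh frames are always available to the synthesis step. The class below is a sketch; the capacity value is an arbitrary assumption rather than something prescribed by the method.

```java
import java.util.ArrayDeque;

// Illustrative per-camera cache region: bounded, oldest frame dropped on overflow.
public class FrameBuffer {
    private final ArrayDeque<byte[]> frames = new ArrayDeque<>();
    private final int capacity;

    public FrameBuffer(int capacity) { this.capacity = capacity; }  // e.g. new FrameBuffer(4)

    // Called by the video acquisition module for every captured frame.
    public synchronized void put(byte[] frame) {
        if (frames.size() == capacity) {
            frames.pollFirst();          // drop the oldest frame to make room
        }
        frames.addLast(frame);
    }

    // Returns the frame with the shortest storage time, i.e. the most recently buffered one.
    public synchronized byte[] latest() {
        return frames.peekLast();
    }
}
```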
Step 204: and respectively extracting the video images from the first buffer area and the second buffer area, and carrying out video image synthesis so as to combine the first video data stream and the second video data stream into a video data stream.
When the two extracted frames of video images are synthesized, they can be combined in any appropriate picture layout, which is not limited in the embodiment of the present invention. For example, the two frames of video images can be scaled and superimposed on the same picture to form a picture-in-picture (large and small picture) layout or a left-right side-by-side layout.
In the embodiment of the invention, the video data streams collected by the cameras are cached in the cache region, and the video images are extracted from the cache region for synthesis, so that two paths of video data streams collected by the two cameras can be synchronously synthesized.
Step 205: and after the synthesized video data stream is coded, the video data stream is sent to a video networking server through the video networking so as to be sent to video networking opposite-end equipment through the video networking server.
The synthesized video data stream is a video data stream, and the encoding mode thereof may be according to any existing encoding mode, and the specific encoding mode in the embodiment of the present invention is not specifically limited.
The opposite terminal equipment is the device participating in the video call with the mobile terminal. After receiving the encoded video data stream sent by the video networking server, the opposite terminal equipment decodes it and displays the decoded video picture on its display screen; the displayed picture contains the two video pictures collected by the first camera and the second camera of the mobile terminal that sent the video data stream.
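Putting steps 201 to 205 together, the following sketch shows one way the flow could be orchestrated. The module interfaces are hypothetical stand-ins named after the modules of fig. 1 (the actual Pamier software is not described at this level of detail), and the frame interval is an example value.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical module interfaces named after the modules of Fig. 1; all names and
// signatures here are illustrative assumptions, not the real Pamier implementation.
interface VideoCaptureModule { void start(FrameSink sink); }                 // one per camera
interface FrameSink { void onFrame(byte[] frame); }                          // receives raw captured frames
interface PictureSynthesisModule { byte[] compose(byte[] first, byte[] second); }
interface VideoEncodingModule { byte[] encode(byte[] composedFrame); }
interface VideoOutputModule { void send(byte[] encodedFrame); }              // hands the stream to the video network

public class DualCameraCallPipeline {
    // Simplified cache regions: each holds the most recently captured frame of its camera.
    private final AtomicReference<byte[]> firstBuffer = new AtomicReference<>();
    private final AtomicReference<byte[]> secondBuffer = new AtomicReference<>();

    public void run(VideoCaptureModule firstCamera, VideoCaptureModule secondCamera,
                    PictureSynthesisModule synth, VideoEncodingModule encoder,
                    VideoOutputModule output) throws InterruptedException {
        // Steps 201-203: start both cameras; each acquisition module fills its own cache region.
        firstCamera.start(firstBuffer::set);
        secondCamera.start(secondBuffer::set);

        long frameIntervalMillis = 40;                       // example: read at 25 frames per second
        while (!Thread.currentThread().isInterrupted()) {
            byte[] a = firstBuffer.get();
            byte[] b = secondBuffer.get();
            if (a != null && b != null) {
                // Step 204: synthesize the two freshest frames into one video image.
                byte[] composed = synth.compose(a, b);
                // Step 205: encode the composite frame and send it towards the video networking server.
                output.send(encoder.encode(composed));
            }
            Thread.sleep(frameIntervalMillis);
        }
    }
}
```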
In the embodiment of the invention, two video data streams are collected by the two cameras and combined into one video data stream; compared with the existing video call method which acquires only a single video data stream, more visual information can be sent to the opposite terminal device, and the video call experience is improved. In addition, in the embodiment of the invention, the video data streams collected by the cameras are cached in the cache regions and the video images are extracted from the cache regions for synthesis, so that the two video data streams collected by the two cameras can be synthesized synchronously.
In an optional embodiment, before the first camera and the second camera of the mobile terminal are respectively turned on, the embodiment of the present invention may further include the following process:
after the video call connection is successfully established, determining whether to start a multi-channel video mode according to the historical video call record; and under the condition that the multi-channel video mode is determined to be started, executing the step of respectively starting a first camera and a second camera of the mobile terminal.
When determining whether to start the multi-channel video mode according to the historical video call records, it can be determined from the records whether the multi-channel video mode was started during the most recent video call; if so, the multi-channel video mode is started, and otherwise it is not. Alternatively, the probability of the multi-channel video mode having been started in recent video calls can be counted from the historical records, and the multi-channel video mode is started when this probability is greater than a preset probability; otherwise it is not started. When it is determined not to start the multi-channel video mode, only the front camera is started by default to collect the video data stream.
In the optional embodiment, whether to start the multi-channel video mode can be determined according to the historical use habit of the user, so that the video call experience of the user can be improved.
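A minimal sketch of the second strategy described above (counting how often the multi-channel video mode was used in recent calls and comparing against a preset probability) is given below; the call-record structure, the window of 10 calls and the 0.5 threshold are assumptions for illustration only.

```java
import java.util.List;

// Hypothetical call record: whether the multi-channel video mode was used in that call.
record CallRecord(long timestampMillis, boolean multiChannelUsed) {}

public class MultiChannelPolicy {
    private static final double PRESET_PROBABILITY = 0.5;   // assumed threshold
    private static final int RECENT_CALLS = 10;             // how many recent calls to consider

    // Decide from the most recent call records whether to enable the multi-channel mode.
    public static boolean shouldEnable(List<CallRecord> historyNewestFirst) {
        List<CallRecord> recent =
                historyNewestFirst.subList(0, Math.min(RECENT_CALLS, historyNewestFirst.size()));
        if (recent.isEmpty()) return false;                  // no history: default to single channel
        long used = recent.stream().filter(CallRecord::multiChannelUsed).count();
        return (double) used / recent.size() > PRESET_PROBABILITY;
    }
}
```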
In an optional embodiment, before the first camera and the second camera of the mobile terminal are respectively turned on, the embodiment of the present invention may further include the following process:
firstly, after the video call connection is successfully established, outputting a selection prompt box;
the selection prompt box is used for prompting a user to start a multi-channel video mode, a first option and a second option can be set in the selection prompt box, the first option is used for prompting the user to start the multi-channel video mode, and the second option is used for prompting the user to carry out video call in a single-channel video mode.
Secondly, receiving a first input of a user for selecting a prompt box;
wherein the first input is used for starting the multi-channel video mode. The first input may be a selection of the first option by the user, such as a long press, a double click or a single click on the first option; the specific form of the first input is not limited in the embodiment of the present invention.
And finally, responding to the first input, and executing the step of respectively starting the first camera and the second camera of the mobile terminal.
If a second input of the user to the selection prompt box is received instead, only the front camera of the mobile terminal is started for the video call.
In the optional embodiment, the user is prompted to open the multi-channel video mode by outputting the selection prompt box, and the user can select whether to open the multi-channel video mode according to actual requirements, so that the video call experience of the user is improved.
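On an Android-based Pamier terminal, the selection prompt box could, for example, be built with the platform AlertDialog as sketched below; the callback interface, button texts and wording are assumptions and not the actual Pamier user interface code.

```java
import android.app.AlertDialog;
import android.content.Context;

public class MultiChannelPrompt {
    // Hypothetical callback invoked with the user's choice.
    public interface Choice { void onResult(boolean multiChannel); }

    public static void show(Context context, Choice choice) {
        new AlertDialog.Builder(context)
                .setTitle("Video call")
                .setMessage("Enable multi-channel video (front and rear camera)?")
                // First option: the first input, which starts the multi-channel video mode.
                .setPositiveButton("Multi-channel", (dialog, which) -> choice.onResult(true))
                // Second option: the second input, which keeps the single-channel (front camera) mode.
                .setNegativeButton("Single-channel", (dialog, which) -> choice.onResult(false))
                .setCancelable(false)
                .show();
    }
}
```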
In an optional embodiment, the video images are extracted from the first buffer area and the second buffer area respectively, and the video image synthesis is performed in the following manner:
firstly, extracting a first video image with shortest storage time from a first cache region; extracting a second video image with the shortest storage time from a second cache region;
the video image with the shortest storage time is extracted from the buffer area, and the timeliness of the extracted video image can be ensured.
And secondly, zooming and splicing the first video image and the second video image to generate a composite video image with preset resolution.
The preset resolution is the resolution of a single frame video image.
When two video images are zoomed and spliced, the zooming proportion and the splicing mode of each frame of video image can be set by a user according to actual requirements, and the resolution of the spliced composite video image can be ensured to be the preset resolution.
Scaling and splicing the two frames of video images into a single composite video image of the preset resolution keeps the traffic consumed during the video call from increasing.
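As an illustration of the scaling-and-splicing step, the sketch below composes two frames side by side into one frame of a preset resolution using Android's Bitmap and Canvas classes; the 1280x720 target and the even left-right split are example choices rather than values mandated by the method.

```java
import android.graphics.Bitmap;
import android.graphics.Canvas;

public class FrameComposer {
    // Preset resolution of the composite frame (example values).
    private static final int OUT_WIDTH = 1280;
    private static final int OUT_HEIGHT = 720;

    // Scale the two input frames to half width each and splice them left/right.
    public static Bitmap composeSideBySide(Bitmap first, Bitmap second) {
        Bitmap out = Bitmap.createBitmap(OUT_WIDTH, OUT_HEIGHT, Bitmap.Config.ARGB_8888);
        Canvas canvas = new Canvas(out);
        Bitmap left  = Bitmap.createScaledBitmap(first,  OUT_WIDTH / 2, OUT_HEIGHT, true);
        Bitmap right = Bitmap.createScaledBitmap(second, OUT_WIDTH / 2, OUT_HEIGHT, true);
        canvas.drawBitmap(left, 0, 0, null);                 // left half: first camera image
        canvas.drawBitmap(right, OUT_WIDTH / 2f, 0, null);   // right half: second camera image
        return out;                                          // composite stays at the preset resolution
    }
}
```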
In an optional embodiment, before the first video image and the second video image are scaled and spliced to generate a composite video image with a preset resolution, the video call method according to the embodiment of the present invention may further include the following steps:
first, outputting a plurality of synthesized video image patterns;
the synthesized video image pattern is used to limit the display mode of the synthesized first video image and the synthesized second video image, and the synthesized video image pattern may include but is not limited to: the mode comprises the modes of left-right evenly distributed display, up-down evenly distributed display, display of the first video image which is reduced and then superposed at a certain preset position of the second video image, display of the second video image which is reduced and then superposed at a certain preset position of the first video image, and the like.
Secondly, receiving the selection operation of the user on the target synthesized video image style.
Each composite video image pattern may correspond to a template, and a user may select its corresponding target composite video image pattern by selecting a target template.
The optional mode provides a plurality of synthesized video image styles for the user to select, and can meet the personalized requirements of the user.
In an optional embodiment, after the synthesized video data stream is encoded, the synthesized video data stream is sent to a video networking server through video networking in the following manner:
firstly, reading a composite video image according to a preset frame rate;
the preset frame rate can be set by a person skilled in the art according to actual requirements, and is not particularly limited in the embodiment of the present invention.
And secondly, coding the read synthesized video image according to a preset coding mode, and then sending the coded synthesized video image to a video network server through a video network.
When the composite video image is encoded, H.264 can be used. H.264 is a highly compressed digital video codec standard developed by the Joint Video Team formed jointly by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The standard is commonly referred to as H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC), a naming that acknowledges both developing organizations.
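On Android, such H.264 (AVC) encoding can be performed with the platform MediaCodec encoder. The sketch below only shows a possible encoder configuration; the bit rate, frame rate and key-frame interval are example assumptions, and frames read from the composite buffer at the preset frame rate would then be fed to this codec before the output is handed to the video output module.

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.IOException;

public class CompositeEncoder {
    // Configure an H.264 (AVC) encoder for the composite video stream.
    public static MediaCodec createH264Encoder(int width, int height) throws IOException {
        MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);           // example: 2 Mbit/s
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 25);                // preset frame rate (example)
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 2);           // key frame every 2 seconds
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);    // frames fed via an input surface
        MediaCodec codec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
        codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        return codec;
    }
}
```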
In the actual implementation process, the coded composite video image can be sent to the video network server through the V2V video network, and forwarded to the video network peer device by the video network server.
Owing to the structural security of the V2V video networking technology, data is not exposed to hidden dangers such as viruses, hackers and illegal insertion. Meanwhile, data packets are not broadcast across the whole network during sending and transmission, and packets are not opened and read on every video networking server or router; an independent channel is established only between the points that need to transmit, and only the video networking server opens and reads the first data packet, so network security is ensured.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 3, a block diagram of a video call device according to an embodiment of the present invention is shown, where the device may be applied to a video network mobile terminal, and specifically may include the following modules:
the starting module 301 is configured to respectively start a first camera and a second camera of the mobile terminal after the video call connection is successfully established;
a first calling module 302, configured to call the first camera to acquire a first video data stream and call the second camera to acquire a second video data stream at the same time;
a storage module 303, configured to cache the first video data stream in a first cache region, and cache the second video data stream in a second cache region;
a synthesizing module 304, configured to extract video images from the first buffer area and the second buffer area, respectively, and perform video image synthesis to combine the first video data stream and the second video data stream into a video data stream;
and an encoding module 305, configured to encode the synthesized video data stream and then send the encoded video data stream to a video networking server through the video network, so that the video networking server sends it to the video networking peer device.
In a preferred embodiment of the present invention, the apparatus further comprises the following modules:
the determining module is used for determining whether to start a multi-channel video mode according to historical video call records before the starting module respectively starts the first camera and the second camera of the mobile terminal after the video call connection is successfully established;
and the second calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal under the condition that the determining module determines to start the multi-channel video mode.
In a preferred embodiment of the present invention, the apparatus further comprises the following modules:
the output module is used for outputting a selection prompt box after the video call connection is successfully established and before the starting module respectively starts the first camera and the second camera of the mobile terminal, wherein the selection prompt box is used for prompting a user to start a multi-channel video mode;
the receiving module is used for receiving a first input of the user for the selection prompt box, wherein the first input is used for starting a multi-channel video mode;
and the third calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal in response to the first input.
In a preferred embodiment of the invention, the synthesis module comprises the following sub-modules:
the first submodule is used for extracting the first video image with the shortest storage time from the first cache region;
the second submodule is used for extracting a second video image with the shortest storage time from the second cache region;
and the synthesis submodule is used for zooming and splicing the first video image and the second video image to generate a synthesized video image with preset resolution.
In a preferred embodiment of the invention, the synthesis module further comprises the following sub-modules:
the pattern recommendation submodule is used for outputting a plurality of synthesized video image patterns before the step of zooming and splicing the first video image and the second video image by the synthesis submodule to generate a synthesized video image with a preset resolution;
and the operation receiving submodule is used for receiving the selection operation of the user on the target synthesized video image style.
In a preferred embodiment of the present invention, the encoding module comprises the following sub-modules:
the reading submodule is used for reading the synthesized video image according to a preset frame rate;
and the sending submodule is used for coding the read synthesized video image according to a preset coding mode and then sending the coded synthesized video image to a video network server through a video network.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The video network is an important milestone in network development. It is a real-time network that can realize real-time transmission of high-definition video, pushing many Internet applications towards high-definition video and high-definition face-to-face communication.
The video network adopts real-time high-definition video switching technology and can integrate, on one network platform, dozens of required services such as video, voice, pictures, text, communication and data, for example high-definition video conferencing, video monitoring, intelligent monitoring analysis, emergency command, digital broadcast television, delayed television, network teaching, live broadcast, VOD on demand, television mail, Personal Video Recorder (PVR), intranet (self-office) channels, intelligent video broadcast control and information distribution, and realizes high-definition-quality video broadcast through a television or a computer.
To better understand the embodiments of the present invention, the video network is described below.
some of the technologies applied in the video networking are as follows:
network Technology (Network Technology)
Network technology innovation in the video network improves on traditional Ethernet to cope with the potentially enormous video traffic on the network. Unlike pure network Packet Switching or network Circuit Switching, the video networking technology adopts Packet Switching in a way that satisfies the demands of Streaming media. The video networking technology has the flexibility, simplicity and low cost of packet switching while also offering the quality and security guarantees of circuit switching, realizing seamless whole-network switched virtual circuits and a seamless data format.
Switching Technology (Switching Technology)
The video network adopts two advantages of asynchronism and packet switching of the Ethernet, eliminates the defects of the Ethernet on the premise of full compatibility, has end-to-end seamless connection of the whole network, is directly communicated with a user terminal, and directly bears an IP data packet. The user data does not require any format conversion across the entire network. The video networking is a higher-level form of the Ethernet, is a real-time exchange platform, can realize the real-time transmission of the whole-network large-scale high-definition video which cannot be realized by the existing Internet, and pushes a plurality of network video applications to high-definition and unification.
Server Technology (Server Technology)
The server technology on the video networking and unified video platform is different from the traditional server, the streaming media transmission of the video networking and unified video platform is established on the basis of connection orientation, the data processing capacity of the video networking and unified video platform is independent of flow and communication time, and a single network layer can contain signaling and data transmission. For voice and video services, the complexity of video networking and unified video platform streaming media processing is much simpler than that of data processing, and the efficiency is greatly improved by more than one hundred times compared with that of a traditional server.
Storage Technology (Storage Technology)
The super-high speed storage technology of the unified video platform adopts the most advanced real-time operating system in order to adapt to the media content with super-large capacity and super-large flow, the program information in the server instruction is mapped to the specific hard disk space, the media content is not passed through the server any more, and is directly sent to the user terminal instantly, and the general waiting time of the user is less than 0.2 second. The optimized sector distribution greatly reduces the mechanical motion of the magnetic head track seeking of the hard disk, the resource consumption only accounts for 20% of that of the IP internet of the same grade, but concurrent flow which is 3 times larger than that of the traditional hard disk array is generated, and the comprehensive efficiency is improved by more than 10 times.
Network Security Technology (Network Security Technology)
The structural design of the video network completely eliminates the network security problem troubling the internet structurally by the modes of independent service permission control each time, complete isolation of equipment and user data and the like, generally does not need antivirus programs and firewalls, avoids the attack of hackers and viruses, and provides a structural carefree security network for users.
Service Innovation Technology (Service Innovation Technology)
The unified video platform integrates services and transmission: whether for a single user, a private-network user or a network aggregate, only one automatic connection is needed. User terminals, set-top boxes or PCs connect directly to the unified video platform and obtain a variety of multimedia video services in various forms. The unified video platform adopts a menu-style configuration table to replace traditional complex application programming, so complex applications can be realized with very little code, enabling unlimited new service innovation.
Networking of the video network is as follows:
the video network is a centralized control network structure, and the network can be a tree network, a star network, a ring network and the like, but on the basis of the centralized control node, the whole network is controlled by the centralized control node in the network.
As shown in fig. 4, the video network is divided into an access network and a metropolitan network.
The devices of the access network part can be mainly classified into 3 types: node server, access switch, terminal (including various set-top boxes, coding boards, memories, etc.). The node server is connected to an access switch, which may be connected to a plurality of terminals and may be connected to an ethernet network.
The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.
Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.
The node server here is the node server of the access network part; that is, the node server belongs to both the access network part and the metropolitan area network part.
The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.
Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.
The access network part can form a unified video platform (the part in the dotted circle), and a plurality of unified video platforms can form a video network; each unified video platform may be interconnected via metropolitan area and wide area video networking.
1. Video networking device classification
1.1 devices in the video network of the embodiment of the present invention can be mainly classified into 3 types: servers, switches (including ethernet gateways), terminals (including various set-top boxes, code boards, memories, etc.). The video network as a whole can be divided into a metropolitan area network (or national network, global network, etc.) and an access network.
1.2 wherein the devices of the access network part can be mainly classified into 3 types: node servers, access switches (including ethernet gateways), terminals (including various set-top boxes, code boards, memories, etc.).
The specific hardware structure of each access network device is as follows:
a node server:
as shown in fig. 5, the system mainly includes a network interface module 501, a switching engine module 502, a CPU module 503, and a disk array module 504;
Packets from the network interface module 501, the CPU module 503 and the disk array module 504 all enter the switching engine module 502. The switching engine module 502 looks up the address table 505 for each incoming packet to obtain the packet's steering information, and stores the packet in the corresponding queue of the packet buffer 506 according to that steering information; if the queue of the packet buffer 506 is nearly full, the packet is discarded. The switching engine module 502 polls all packet buffer queues and forwards from a queue if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 504 mainly implements control over the hard disks, including initialization, reading and writing; the CPU module 503 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), for configuring the address table 505 (including the downlink protocol packet address table, the uplink protocol packet address table and the data packet address table), and for configuring the disk array module 504.
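The queue-polling behaviour described above (forward only when the port send buffer has room and the queue packet counter is greater than zero) can be summarized by the following sketch; the class names and fields are illustrative and do not correspond to actual node-server firmware.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class SwitchingEngineSketch {
    // Simplified view of a per-port packet buffer queue and its send buffer.
    static class PortQueue {
        final Queue<byte[]> packets = new ArrayDeque<>();
        int sendBufferFree;                              // free slots in the port send buffer

        boolean canForward() {
            return sendBufferFree > 0                    // 1) port send buffer is not full
                && !packets.isEmpty();                   // 2) queue packet counter is greater than zero
        }
    }

    // Poll every packet buffer queue and forward from those that satisfy both conditions.
    static void pollAndForward(List<PortQueue> queues) {
        for (PortQueue q : queues) {
            if (q.canForward()) {
                byte[] packet = q.packets.poll();
                q.sendBufferFree--;
                // ... hand 'packet' to the corresponding network interface module ...
            }
        }
    }
}
```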
The access switch:
as shown in fig. 6, the network interface module (downlink network interface module 601, uplink network interface module 602), switching engine module 603, and CPU module 604 are mainly included;
A packet (uplink data) coming from the downlink network interface module 601 enters the packet detection module 605. The packet detection module 605 checks whether the destination address (DA), source address (SA), packet type and packet length meet the requirements; if so, it allocates a corresponding stream identifier (stream-id) and the packet enters the switching engine module 603, otherwise the packet is discarded. A packet (downlink data) coming from the uplink network interface module 602 enters the switching engine module 603 directly, as does a packet coming from the CPU module 604. The switching engine module 603 looks up the address table 606 for each incoming packet to obtain its steering information. If a packet entering the switching engine module 603 goes from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 607 in association with its stream-id; if that queue is nearly full, the packet is discarded. If a packet entering the switching engine module 603 does not go from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 607 according to its steering information; if that queue is nearly full, the packet is discarded.
The switching engine module 603 polls all packet buffer queues, which in this embodiment of the present invention is divided into two cases:
if the queue is from the downlink network interface to the uplink network interface, the following conditions are met for forwarding: 1) the port send buffer is not full; 2) the queued packet counter is greater than zero; 3) obtaining a token generated by a code rate control module;
if the queue is not from the downlink network interface to the uplink network interface, the following conditions are met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.
The rate control module 608 is configured by the CPU module 604 and generates tokens for packet buffer queues going to the upstream network interface from all downstream network interfaces at programmable intervals to control the rate of upstream forwarding.
The CPU module 604 is mainly responsible for protocol processing with the node server, configuration of the address table 606, and configuration of the code rate control module 608.
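The code rate control described above behaves much like a token bucket: tokens are generated at a programmable interval, and a downlink-to-uplink queue may only forward after obtaining a token (condition 3 above). A minimal sketch under that reading, with assumed interval and capacity values:

```java
// Minimal token-based rate limiter in the spirit of the code rate control module 608;
// the interval and capacity values are assumptions for illustration only.
public class RateController {
    private final long tokenIntervalNanos;   // programmable interval between generated tokens
    private final int capacity;              // maximum number of stored tokens
    private double tokens;
    private long lastRefill = System.nanoTime();

    public RateController(long tokenIntervalNanos, int capacity) {
        this.tokenIntervalNanos = tokenIntervalNanos;
        this.capacity = capacity;
        this.tokens = capacity;
    }

    // Condition 3) for downlink-to-uplink queues: a token must be obtained before forwarding.
    public synchronized boolean tryAcquireToken() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) / (double) tokenIntervalNanos);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```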
Ethernet protocol conversion gateway
As shown in fig. 7, the apparatus mainly includes a network interface module (a downlink network interface module 701, an uplink network interface module 702), a switching engine module 703, a CPU module 704, a packet detection module 705, a rate control module 708, an address table 706, a packet buffer 707, a MAC adding module 709, and a MAC deleting module 710.
A data packet coming from the downlink network interface module 701 enters the packet detection module 705. The packet detection module 705 checks whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video networking destination address DA, video networking source address SA, video networking packet type and packet length meet the requirements; if so, it allocates a corresponding stream identifier (stream-id), the MAC deleting module 710 strips the MAC DA, MAC SA and length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise the packet is discarded;
the downlink network interface module 701 monitors the send buffer of its port; if there is a packet to send, it learns the Ethernet MAC DA of the corresponding terminal according to the video networking destination address DA of the packet, prepends the Ethernet MAC DA of the terminal, the MAC SA of the Ethernet protocol gateway and the Ethernet length or frame type, and sends the packet.
The other modules in the ethernet protocol gateway function similarly to the access switch.
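Assuming a standard 14-byte Ethernet header (6-byte MAC DA, 6-byte MAC SA, 2-byte length/frame type), the MAC-deleting and MAC-adding operations described above could be sketched on a raw byte array as follows; this is an illustration, not the gateway's actual implementation.

```java
import java.util.Arrays;

public class EthernetHeaderSketch {
    private static final int ETH_HEADER_LEN = 6 + 6 + 2;   // MAC DA + MAC SA + length/frame type

    // MAC deleting module: strip the Ethernet header, leaving the video networking packet.
    public static byte[] stripMac(byte[] ethernetFrame) {
        return Arrays.copyOfRange(ethernetFrame, ETH_HEADER_LEN, ethernetFrame.length);
    }

    // Downlink direction: prepend the learned terminal MAC DA, the gateway MAC SA and the type field.
    public static byte[] addMac(byte[] packet, byte[] terminalMacDa, byte[] gatewayMacSa, byte[] lengthOrType) {
        byte[] frame = new byte[ETH_HEADER_LEN + packet.length];
        System.arraycopy(terminalMacDa, 0, frame, 0, 6);
        System.arraycopy(gatewayMacSa, 0, frame, 6, 6);
        System.arraycopy(lengthOrType, 0, frame, 12, 2);
        System.arraycopy(packet, 0, frame, 14, packet.length);
        return frame;
    }
}
```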
A terminal:
the system mainly comprises a network interface module, a service processing module and a CPU module; for example, the set-top box mainly comprises a network interface module, a video and audio coding and decoding engine module and a CPU module; the coding board mainly comprises a network interface module, a video and audio coding engine module and a CPU module; the memory mainly comprises a network interface module, a CPU module and a disk array module.
1.3 Devices of the metropolitan area network part can be mainly classified into 3 types: node servers, node switches and metropolitan area servers. The node switch mainly comprises a network interface module, a switching engine module and a CPU module; the metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.
2. Video networking packet definition
2.1 Access network packet definition
The data packet of the access network mainly comprises the following parts: destination Address (DA), Source Address (SA), reserved bytes, payload (pdu), CRC.
As shown in the following table, the data packet of the access network mainly includes the following parts:
DA | SA | Reserved | Payload | CRC
wherein:
the Destination Address (DA) is composed of 8 bytes (byte), the first byte represents the type of the data packet (such as various protocol packets, multicast data packets, unicast data packets, etc.), there are 256 possibilities at most, the second byte to the sixth byte are metropolitan area network addresses, and the seventh byte and the eighth byte are access network addresses;
the Source Address (SA) is also composed of 8 bytes (byte), defined as the same as the Destination Address (DA);
the reserved byte consists of 2 bytes;
the payload has a different length depending on the type of datagram: it is 64 bytes for the various protocol packets and 32+1024 = 1056 bytes for a unicast data packet; of course, the length is not limited to these two cases;
the CRC consists of 4 bytes and is calculated in accordance with the standard ethernet CRC algorithm.
2.2 metropolitan area network packet definition
The topology of a metropolitan area network is a graph, and there may be 2 or even more connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, between a node switch and another node switch, and so on. However, the metropolitan area network address of a metropolitan area network device is unique, so in order to accurately describe the connection relationships between metropolitan area network devices, a parameter is introduced in the embodiment of the present invention: a label, used to uniquely describe a connection to a metropolitan area network device.
In this specification, the definition of the label is similar to that of an MPLS (Multi-Protocol Label Switching) label. Assuming there are two connections between device A and device B, a packet from device A to device B has 2 possible labels, and a packet from device B to device A likewise has 2 labels. Labels are divided into incoming labels and outgoing labels: assuming the label (incoming label) of a packet entering device A is 0x0000, the label (outgoing label) of the packet when it leaves device A may become 0x0001. The network access process of the metropolitan area network is a process under centralized control, that is, both address allocation and label allocation of the metropolitan area network are dominated by the metropolitan area server, and the node switches and node servers execute passively; this differs from MPLS label allocation, which is the result of mutual negotiation between switch and server.
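The incoming-label/outgoing-label behaviour described above amounts to a per-device table that maps the label a packet arrives with to the label it leaves with, configured centrally by the metropolitan area server. A minimal sketch of such a table (the example label values follow the 0x0000 to 0x0001 example above):

```java
import java.util.HashMap;
import java.util.Map;

public class LabelSwitchSketch {
    // Mapping from incoming label to outgoing label, configured by the metropolitan area server
    // (centralized allocation, unlike MPLS label negotiation). Values here are examples only.
    private final Map<Integer, Integer> inToOut = new HashMap<>();

    public void configure(int inLabel, int outLabel) {
        inToOut.put(inLabel, outLabel);              // e.g. configure(0x0000, 0x0001)
    }

    // On forwarding, the device rewrites the packet's label according to the configured table.
    public int swap(int inLabel) {
        Integer out = inToOut.get(inLabel);
        if (out == null) throw new IllegalArgumentException("no label entry configured for " + inLabel);
        return out;
    }
}
```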
As shown in the following table, the data packet of the metro network mainly includes the following parts:
DA | SA | Reserved | Label | Payload | CRC
That is, Destination Address (DA), Source Address (SA), Reserved bytes, Label, Payload (PDU), and CRC. The format of the label may be defined as follows: the label is 32 bits, with the upper 16 bits reserved and only the lower 16 bits used; it is located between the reserved bytes and the payload of the packet.
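Continuing the access-network sketch above, and purely for illustration, a metropolitan-area packet could be packed with the 32-bit label (upper 16 bits reserved) inserted between the reserved bytes and the payload. The byte order and helper names are again assumptions.

```python
import struct
import zlib

def pack_metro_packet(dst_addr: bytes, src_addr: bytes, label: int, payload: bytes) -> bytes:
    """Build DA(8) + SA(8) + Reserved(2) + Label(4) + Payload + CRC(4)."""
    assert len(dst_addr) == 8 and len(src_addr) == 8
    assert 0 <= label <= 0xFFFF, "only the lower 16 bits of the label are used"
    body = dst_addr + src_addr + b"\x00\x00" + struct.pack(">I", label) + payload
    return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

def read_label(packet: bytes) -> int:
    """Extract the 16 usable label bits; the label sits right after DA(8) + SA(8) + Reserved(2)."""
    (label_field,) = struct.unpack(">I", packet[18:22])
    return label_field & 0xFFFF
```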
Based on the above characteristics of the video network, one of the core concepts of the embodiments of the present invention is proposed: in compliance with the protocol of the video network, a video stream synthesized by a video networking mobile terminal is transmitted through a video networking server to the video networking opposite-end device.
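As a non-limiting sketch of this core concept, the synthesized and encoded stream could be handed to the video networking server as below. The chunk size, the pack_packet helper (for example the access-network packing sketch above) and the send callback are assumptions standing in for the video-network transport, which the disclosure does not detail at this level.

```python
def send_composite_stream(encoded_frame: bytes, dst_addr: bytes, src_addr: bytes,
                          pack_packet, send, chunk_size: int = 1024) -> None:
    """Split one encoded composite frame into payload chunks, wrap each chunk as a
    video-network packet via pack_packet, and pass it to the transport callback
    that reaches the video networking server."""
    for offset in range(0, len(encoded_frame), chunk_size):
        chunk = encoded_frame[offset:offset + chunk_size]
        send(pack_packet(dst_addr, src_addr, chunk))
```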
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The video call method and the video call apparatus provided by the present invention have been described in detail above. Specific examples are applied herein to explain the principle and implementation of the present invention, and the description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and to the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A video call method, applied to a video networking mobile terminal, the method comprising the following steps:
after the video call connection is successfully established, respectively starting a first camera and a second camera of the mobile terminal;
simultaneously calling the first camera to acquire a first video data stream, and calling the second camera to acquire a second video data stream;
respectively caching the first video data stream into a first cache region, and caching the second video data stream into a second cache region;
extracting video images from the first cache region and the second cache region respectively, and performing video image synthesis, so that the first video data stream and the second video data stream are combined into one video data stream;
and encoding the synthesized video data stream, and then sending it to a video networking server through the video network, so that the video networking server sends it to the video networking opposite-end device.
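Purely as an illustrative, non-limiting sketch of the steps recited in claim 1: OpenCV is assumed as the capture and encoding backend, the camera indices 0 and 1, the buffer sizes and the JPEG stand-in for the real video encoder are all assumptions, and the transport to the video networking server is not shown.

```python
import queue
import threading

import cv2  # assumed capture / encode backend; the claim does not prescribe one

first_buf = queue.Queue(maxsize=30)   # first cache region
second_buf = queue.Queue(maxsize=30)  # second cache region

def capture(camera_index: int, buf: queue.Queue, frames: int = 150) -> None:
    """Call one camera and cache its video data stream into its own buffer."""
    camera = cv2.VideoCapture(camera_index)
    for _ in range(frames):
        ok, frame = camera.read()
        if ok:
            buf.put(frame)
    camera.release()

def compose_one_frame() -> bytes:
    """Take one image from each buffer, merge them into one frame, and encode it."""
    first, second = first_buf.get(), second_buf.get()
    second = cv2.resize(second, (first.shape[1], first.shape[0]))
    merged = cv2.hconcat([first, second])           # the two streams combined into one image
    ok, encoded = cv2.imencode(".jpg", merged)      # JPEG stands in for the real video encoder
    return encoded.tobytes()                        # ready to be sent towards the video networking server

# The two cameras are read simultaneously, each feeding its own cache region.
threading.Thread(target=capture, args=(0, first_buf), daemon=True).start()
threading.Thread(target=capture, args=(1, second_buf), daemon=True).start()
```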
2. The method of claim 1, wherein prior to the step of respectively starting the first camera and the second camera of the mobile terminal, the method further comprises:
after the video call connection is successfully established, determining whether to start a multi-channel video mode according to the historical video call record;
and under the condition that the multi-channel video mode is determined to be started, executing the step of respectively starting the first camera and the second camera of the mobile terminal.
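A minimal sketch of the decision recited in claim 2, assuming a hypothetical record format and a simple majority rule; neither the record format nor the decision rule is specified by the claim.

```python
def should_enable_multichannel(history: list) -> bool:
    """Decide from historical video call records whether to start the multi-channel video mode.

    `history` is assumed to be a list of dicts such as {"peer": "...", "multichannel": True};
    the claim does not define the record format or the decision rule.
    """
    recent = history[-10:]                                  # look at the most recent calls
    if not recent:
        return False
    used = sum(1 for record in recent if record.get("multichannel"))
    return used / len(recent) >= 0.5                        # enable if used in at least half of them
```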
3. The method of claim 1, wherein prior to the step of respectively starting the first camera and the second camera of the mobile terminal, the method further comprises:
after the video call connection is successfully established, outputting a selection prompt box, wherein the selection prompt box is used for prompting a user to start a multi-channel video mode;
receiving a first input of a user to the selection prompt box, wherein the first input is used for starting a multi-channel video mode;
and responding to the first input, and executing the step of respectively starting the first camera and the second camera of the mobile terminal.
4. The method according to claim 1, wherein the step of extracting the video images from the first cache region and the second cache region respectively for video image synthesis comprises:
extracting a first video image with the shortest storage time from the first cache region;
extracting a second video image with the shortest storage time from the second cache region;
and scaling and stitching the first video image and the second video image to generate a composite video image with a preset resolution.
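An illustrative sketch of the scaling and stitching step of claim 4, assuming OpenCV, a 1280x720 preset resolution and a picture-in-picture layout; the claim itself fixes neither the layout nor the resolution.

```python
import cv2
import numpy as np

PRESET_W, PRESET_H = 1280, 720   # assumed preset resolution

def stitch(first_img: np.ndarray, second_img: np.ndarray) -> np.ndarray:
    """Scale the two extracted images and splice them into one composite frame."""
    composite = cv2.resize(first_img, (PRESET_W, PRESET_H))          # first image fills the frame
    inset = cv2.resize(second_img, (PRESET_W // 4, PRESET_H // 4))   # second image as an inset
    h, w = inset.shape[:2]
    composite[-h - 16:-16, -w - 16:-16] = inset                      # picture-in-picture, bottom-right corner
    return composite
```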
5. The method according to claim 4, wherein before the step of scaling and stitching the first video image and the second video image to generate the composite video image with the preset resolution, the method further comprises:
outputting a plurality of composite video image patterns;
and receiving the selection operation of the target synthesized video image style by the user.
6. The method of claim 1, wherein the step of encoding the synthesized video data stream and then sending it to a video networking server through the video network comprises:
reading a synthesized video image according to a preset frame rate;
and coding the read synthesized video image according to a preset coding mode, and then sending the coded video image to a video network server through a video network.
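A minimal sketch of the paced read-encode-send loop of claim 6, assuming a 25 fps preset frame rate, a JPEG stand-in for the preset coding mode, and hypothetical read_composite and send_to_server callbacks for the synthesis buffer and the video-network transport.

```python
import time

import cv2

FRAME_RATE = 25                      # assumed preset frame rate (frames per second)
FRAME_INTERVAL = 1.0 / FRAME_RATE

def stream_composite(read_composite, send_to_server, frames: int = 250) -> None:
    """Read composite images at the preset frame rate, encode each one, and hand it to the sender."""
    for _ in range(frames):
        start = time.monotonic()
        image = read_composite()                     # one synthesized video image
        ok, encoded = cv2.imencode(".jpg", image)    # stand-in for the preset coding mode
        if ok:
            send_to_server(encoded.tobytes())
        # sleep off whatever time is left in this frame slot to hold the preset frame rate
        time.sleep(max(0.0, FRAME_INTERVAL - (time.monotonic() - start)))
```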
7. A video call device, applied to a video networking mobile terminal, the device comprising:
the starting module is used for respectively starting a first camera and a second camera of the mobile terminal after the video call connection is successfully established;
the first calling module is used for calling the first camera to acquire a first video data stream and calling the second camera to acquire a second video data stream;
the storage module is used for respectively caching the first video data stream into a first cache region and caching the second video data stream into a second cache region;
the synthesis module is used for extracting video images from the first cache region and the second cache region respectively and synthesizing the video images so as to combine the first video data stream and the second video data stream into a video data stream;
and the coding module is used for coding the synthesized video data stream, and then sending the coded video data stream to a video networking server through the video network, so that the video networking server sends the coded video data stream to the video networking opposite-end device.
8. The apparatus of claim 7, further comprising:
the determining module is used for determining whether to start a multi-channel video mode according to historical video call records before the starting module respectively starts the first camera and the second camera of the mobile terminal after the video call connection is successfully established;
and the second calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal under the condition that the determining module determines to start the multi-channel video mode.
9. The apparatus of claim 7, further comprising:
the output module is used for outputting a selection prompt box after the video call connection is successfully established and before the starting module respectively starts the first camera and the second camera of the mobile terminal, wherein the selection prompt box is used for prompting a user to start a multi-channel video mode;
the receiving module is used for receiving a first input of the user for the selection prompt box, wherein the first input is used for starting a multi-channel video mode;
and the third calling module is used for calling the starting module to respectively start the first camera and the second camera of the mobile terminal in response to the first input.
10. The apparatus of claim 7, wherein the synthesis module comprises:
the first submodule is used for extracting the first video image with the shortest storage time from the first cache region;
the second submodule is used for extracting a second video image with the shortest storage time from the second cache region;
and the synthesis submodule is used for scaling and stitching the first video image and the second video image to generate a synthesized video image with a preset resolution.
11. The apparatus of claim 10, wherein the synthesis module further comprises:
the pattern recommendation submodule is used for outputting a plurality of synthesized video image patterns before the synthesis submodule scales and stitches the first video image and the second video image to generate the synthesized video image with the preset resolution;
and the operation receiving submodule is used for receiving the selection operation of the user on the target synthesized video image style.
12. The apparatus of claim 7, wherein the encoding module comprises:
the reading submodule is used for reading the synthesized video image according to a preset frame rate;
and the sending submodule is used for coding the read synthesized video image according to a preset coding mode and then sending the coded synthesized video image to a video network server through a video network.
13. A video call apparatus, comprising:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the video call method of any of claims 1-6.
14. A computer-readable storage medium storing a computer program for causing a processor to execute the video call method according to any one of claims 1 to 6.
CN202110310982.4A 2021-03-23 2021-03-23 Video call method, device and computer readable storage medium Pending CN113315940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310982.4A CN113315940A (en) 2021-03-23 2021-03-23 Video call method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110310982.4A CN113315940A (en) 2021-03-23 2021-03-23 Video call method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113315940A true CN113315940A (en) 2021-08-27

Family

ID=77372044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310982.4A Pending CN113315940A (en) 2021-03-23 2021-03-23 Video call method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113315940A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101938623A (en) * 2010-09-09 2011-01-05 宇龙计算机通信科技(深圳)有限公司 Multipath image transmission method and terminal based on video call
CN103634555A (en) * 2012-08-27 2014-03-12 腾讯科技(深圳)有限公司 Panoramic video communication method and system
CN204465726U (en) * 2015-04-09 2015-07-08 张翼飞 A kind of Novel visual phone
CN107925729A (en) * 2015-08-17 2018-04-17 三星电子株式会社 Filming apparatus and its control method
WO2018082284A1 (en) * 2016-11-01 2018-05-11 深圳市圆周率软件科技有限责任公司 3d panoramic audio and video live broadcast system and audio and video acquisition method
CN106454256A (en) * 2016-11-03 2017-02-22 贵阳朗玛信息技术股份有限公司 Real-time splicing method and apparatus of multiple videos
CN106657977A (en) * 2016-11-29 2017-05-10 广东小天才科技有限公司 Virtual reality equipment with function of panoramic camera shooting, and panoramic video call method
CN107105315A (en) * 2017-05-11 2017-08-29 广州华多网络科技有限公司 Live broadcasting method, the live broadcasting method of main broadcaster's client, main broadcaster's client and equipment
CN108881927A (en) * 2017-11-30 2018-11-23 北京视联动力国际信息技术有限公司 A kind of video data synthetic method and device
CN110719425A (en) * 2018-07-11 2020-01-21 视联动力信息技术股份有限公司 Video data playing method and device
CN108989739A (en) * 2018-07-24 2018-12-11 上海国茂数字技术有限公司 A kind of full view system for live broadcast of video conference and method
CN109672843A (en) * 2018-12-25 2019-04-23 努比亚技术有限公司 Video communication method, terminal and computer readable storage medium
CN111107299A (en) * 2019-12-05 2020-05-05 视联动力信息技术股份有限公司 Method and device for synthesizing multi-channel video

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143487A (en) * 2021-12-15 2022-03-04 深圳市前海手绘科技文化有限公司 Video recording method and device

Similar Documents

Publication Publication Date Title
CN108574688B (en) Method and device for displaying participant information
CN109788314B (en) Method and device for transmitting video stream data
CN109640028B (en) Method and device for carrying out conference combining on multiple video networking terminals and multiple Internet terminals
CN110049271B (en) Video networking conference information display method and device
CN109547728B (en) Recorded broadcast source conference entering and conference recorded broadcast method and system
CN109660816B (en) Information processing method and device
CN110572607A (en) Video conference method, system and device and storage medium
CN113194278A (en) Conference control method and device and computer readable storage medium
CN110049273B (en) Video networking-based conference recording method and transfer server
CN112866725A (en) Live broadcast control method and device
CN111131754A (en) Control split screen method and device of conference management system
CN111131760B (en) Video recording method and device
CN110149305B (en) Video network-based multi-party audio and video playing method and transfer server
CN110830750A (en) Data transmission method and device based on video network
CN110049268B (en) Video telephone connection method and device
CN109905616B (en) Method and device for switching video pictures
CN110769179B (en) Audio and video data stream processing method and system
CN111131743A (en) Video call method and device based on browser, electronic equipment and storage medium
CN109005378B (en) Video conference processing method and system
CN110022286B (en) Method and device for requesting multimedia program
CN110769297A (en) Audio and video data processing method and system
CN110611639A (en) Audio data processing method and device for streaming media conference
CN110661992A (en) Data processing method and device
CN110519543B (en) Video telephone dialing method and device
CN111654659A (en) Conference control method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination