CN110958419B

CN110958419B - Video networking conference processing method and device, electronic equipment and storage medium

Info

Publication number: CN110958419B
Application number: CN201911082979.0A
Authority: CN
Inventors: 钟文亮; 王艳辉; 袁占涛; 赫洁
Original assignee: Visionvera Information Technology Co Ltd
Current assignee: Visionvera Information Technology Co Ltd
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2022-02-18
Anticipated expiration: 2039-11-07
Also published as: CN110958419A

Abstract

The application discloses a video networking conference processing method and device, an electronic device and a storage medium. The method is applied to a video networking terminal that is communicatively connected to a video networking conference management system, which in turn is communicatively connected to a Pamil client. The method comprises: in a video networking conference in which the video networking terminal is currently participating, receiving a first conference mode signaling sent by the video networking conference management system; in response to the first conference mode signaling, recognizing the voice of the participating user corresponding to the video networking terminal from among a plurality of voices within a preset distance range; generating a speaking instruction when the voice of the participating user is recognized; and sending the speaking instruction to the Pamil client through the video networking conference management system, so that the Pamil client switches the video networking terminal to be the speaking party in the video networking conference. The method and device greatly reduce silent periods in video networking conferences and improve the efficiency of the participating terminals.

Description

Video networking conference processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for processing a video networking conference, an electronic device, and a storage medium.

Background

Video networking is an important milestone in network development. As a higher-level form of Ethernet, it enables real-time, network-wide transmission of high-definition video, which the existing Internet cannot achieve, and moves many Internet applications toward high-definition video and high-definition face-to-face communication.

At present, because video networking offers high transmission speed and high-definition video and can support online transmission of large volumes of video, more and more users hold video conferences over the video network. In such a video conference, a chairman party is designated to manage the speakers. When a participant wants to speak, the participant must submit a speaking application through a dedicated remote controller, after which the chairman party can switch that participant's terminal to be the speaker. When the remote controller's signal is poor, a participant often spends a long time submitting the speaking application, which makes the speaking process cumbersome. Because a video networking conference is usually a remote collaborative conference spanning multiple regions, this cumbersome application-and-speaking process reduces interaction efficiency among participants and often produces long silent periods, which degrades the effectiveness of the conference.

Disclosure of Invention

The application provides a video networking conference processing method, a video networking conference processing device, an electronic device and a storage medium to solve the above technical problems.

In a first aspect, an embodiment of the present application provides a video networking conference processing method, where the method is applied to a video networking terminal, the video networking terminal is communicatively connected to a video networking conference management system, the video networking conference management system is communicatively connected to a Pamil client, and the method includes:

receiving, in a video networking conference in which the video networking terminal is currently participating, a first conference mode signaling sent by the video networking conference management system, where the first conference mode signaling is generated by the Pamil client upon detecting that a preset conference mode of the video networking conference has been started, and is sent to the video networking conference management system;

in response to the first conference mode signaling, recognizing the voice of the participating user corresponding to the video networking terminal from among a plurality of voices within a preset distance range;

generating a speaking instruction when the voice of the participating user is recognized;

and sending the speaking instruction to the video networking conference management system, where the video networking conference management system is configured to forward the speaking instruction to the Pamil client, and the Pamil client is configured to switch, in response to the speaking instruction, the video networking terminal to be the speaking party in the video networking conference.
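The terminal-side steps above can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not the patented implementation: the `Signal` names, the `Terminal` class and the `send_to_management` callback are hypothetical, the voice recognizer is reduced to a boolean `is_participant` verdict, and sending at most one speaking instruction per mode session is an illustrative choice. Stopping on the second conference mode signaling follows the optional step described for this aspect.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Signal(Enum):
    """Hypothetical message types; the patent does not define an encoding."""
    FIRST_MODE = auto()      # preset conference mode started
    SECOND_MODE = auto()     # preset conference mode closed
    SPEAK = auto()           # the speaking instruction


@dataclass
class Terminal:
    terminal_id: str
    # Hypothetical stand-in for the link to the video networking conference management system.
    send_to_management: Callable[[Signal, str], None]
    listening: bool = False  # True while voice recognition is active
    requested: bool = False  # assumption: at most one speaking instruction per mode session

    def on_signaling(self, signal: Signal) -> None:
        if signal is Signal.FIRST_MODE:
            self.listening, self.requested = True, False  # start recognizing
        elif signal is Signal.SECOND_MODE:
            self.listening = False                         # stop recognizing

    def on_voice(self, is_participant: bool) -> None:
        """Handle one voice picked up within the preset distance range.

        `is_participant` stands in for the recognizer's verdict on whether the voice
        belongs to this terminal's participating user (method unspecified in the source).
        """
        if self.listening and is_participant and not self.requested:
            self.requested = True
            self.send_to_management(Signal.SPEAK, self.terminal_id)
```

For example, after `on_signaling(Signal.FIRST_MODE)`, a call `on_voice(False)` for another person's voice is ignored, while `on_voice(True)` sends exactly one `Signal.SPEAK` for the terminal.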

Optionally, the video networking conference management system is further communicatively connected to a plurality of conference terminals participating in the video networking conference, and the video networking terminal and the plurality of conference terminals are communicatively connected to a video networking server. While recognizing the voice of the participating user and generating the speaking instruction, the method further comprises:

recording the recognized voice of the participant user into first audio data, and locally storing the first audio data;

after sending the speaking instruction to the video networking conference management system, the method further comprises:

receiving a switching speech reply signaling sent by the video networking conference management system, where the switching speech reply signaling is generated by the Pamil client when the video networking terminal has been switched to the speaking party in the video networking conference, and is sent to the video networking conference management system;

and responding to the switching speech reply signaling, and sending the first audio data stored locally to the video networking server, wherein the video networking server is used for sending the first audio data to the plurality of conference terminals respectively.
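The record-then-send behaviour above can be sketched as follows. This is a minimal sketch assuming audio arrives as byte chunks; `SpeechBuffer`, `record`, `on_switch_reply` and `send_to_server` are hypothetical names standing in for the local storage of the first audio data and the link to the video networking server.

```python
from typing import Callable, List


class SpeechBuffer:
    """Hypothetical local store for the first audio data recorded before the floor is granted."""

    def __init__(self, send_to_server: Callable[[bytes], None]) -> None:
        self._chunks: List[bytes] = []
        self._send_to_server = send_to_server  # stand-in for the video networking server link

    def record(self, chunk: bytes) -> None:
        """Append audio captured while the speaking instruction is pending."""
        self._chunks.append(chunk)

    def on_switch_reply(self) -> None:
        """On the switching speech reply signaling, send the stored audio once and clear it."""
        first_audio = b"".join(self._chunks)
        self._chunks.clear()
        if first_audio:
            self._send_to_server(first_audio)  # the server then fans it out to the conference terminals
```

Clearing the buffer after sending keeps a repeated reply from re-sending the same speech; that guard is a design choice of this sketch.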

Optionally, after the speaking instruction is sent to the video networking conference management system, the method further includes:

receiving a second conference mode signaling sent by the video networking conference management system, where the second conference mode signaling is generated by the Pamil client upon detecting that the preset conference mode has been closed;

in response to the second conference mode signaling, stopping recognizing the speech of the participating user.

Optionally, sending, in response to the switching speech reply signaling, the locally stored first audio data to the video networking server includes:

responding to the switching speech reply signaling, acquiring the first audio data from a local storage, and recording the current voice of the participating user as second audio data;

synthesizing the first audio data and the second audio data into mixed audio data;

and sending the mixed audio data to the video networking server, wherein the video networking server is used for sending the mixed audio data to the plurality of conference terminals respectively.
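The patent does not say how the first and second audio data are synthesized. One plausible reading, sketched below, is splicing the buffered first audio data ahead of the live second audio data so that listeners hear the utterance from its beginning; a sample-wise mix would be a different technique. The sketch assumes both inputs are PCM samples held in Python `array` objects with the same sample rate and channel layout, and `synthesize_audio` is a hypothetical name.

```python
from array import array


def synthesize_audio(first: array, second: array) -> array:
    """Splice buffered first audio data ahead of live second audio data.

    Assumption: both arrays hold PCM samples with the same typecode, sample rate and
    channel layout; the inputs are left unmodified.
    """
    if first.typecode != second.typecode:
        raise ValueError("first and second audio data must share a sample format")
    mixed = array(first.typecode, first)  # copy so the locally stored data is untouched
    mixed.extend(second)
    return mixed
```

For 16-bit PCM, `synthesize_audio(array("h", [1, 2]), array("h", [3]))` yields the samples `[1, 2, 3]`.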

In a second aspect, an embodiment of the present application provides another video networking conference processing method, where the method is applied to a Pamil client, the Pamil client is communicatively connected to a video networking conference management system, the video networking conference management system is communicatively connected to a video networking terminal, and the method includes:

in a current video networking conference, generating a first conference mode signaling when detecting that a preset conference mode of the video networking conference is started;

sending the first conference mode signaling to the video networking conference management system, where the video networking conference management system is configured to send the first conference mode signaling to the video networking terminal, and the video networking terminal is configured to recognize, in response to the first conference mode signaling, the voice of the participating user corresponding to the video networking terminal from among a plurality of voices within a preset distance range;

receiving a speaking instruction sent by the video networking conference management system; the speaking instruction is generated by the video networking terminal when the voice of the participating user is recognized, and is sent to the video networking conference management system;

and responding to the speaking instruction, and switching the video networking terminal to be a speaking party in the video networking conference.

Optionally, the video networking conference management system is further communicatively connected to a plurality of conference terminals participating in the video networking conference, and switching, in response to the speaking instruction, the video networking terminal to be the speaking party in the video networking conference comprises:

when it is determined that the video networking terminal is not the current speaking terminal in the video networking conference, adding the identifier of the video networking terminal to a preset speaking terminal list, where at least one conference terminal identifier is prestored in the preset speaking terminal list, each such identifier identifies a conference terminal among the plurality of conference terminals that has requested to speak, and the identifiers are arranged in the chronological order of the speaking requests;

and after the conference terminals corresponding to the at least one conference terminal identifier have been switched, one by one in the chronological order of their speaking requests, to be the speaking party in the video networking conference, switching the video networking terminal to be the speaking party in the video networking conference.

Optionally, after the video networking terminal is switched to the speaking party in the video networking conference, the method further includes:

when detecting that a preset conference mode of the video networking conference is closed, generating a second conference mode signaling, and clearing the preset speaking terminal list;

and sending the second conference mode signaling to the video networking conference management system, wherein the video networking conference management system is used for sending the second conference mode signaling to the video networking terminal, and the video networking terminal is used for responding to the second conference mode signaling and stopping recognizing the voices of the participating users.
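The preset speaking terminal list described above behaves like a first-in, first-out queue. The following is a minimal sketch assuming terminal identifiers are strings; `SpeakerList`, `advance` and `on_mode_closed` are hypothetical names, skipping duplicate requests is an illustrative assumption, and when `advance` is called (for example, when the current speaker finishes) is left to the caller because the source does not specify it.

```python
from collections import deque
from typing import Deque, List, Optional


class SpeakerList:
    """Hypothetical sketch of the preset speaking terminal list kept by the Pamil client."""

    def __init__(self, current: Optional[str] = None) -> None:
        self.current = current               # terminal that currently has the floor
        self._waiting: Deque[str] = deque()  # identifiers in the order the requests arrived

    def on_speaking_instruction(self, terminal_id: str) -> None:
        # Only terminals that are not the current speaker are added; skipping
        # duplicate requests is an illustrative assumption.
        if terminal_id != self.current and terminal_id not in self._waiting:
            self._waiting.append(terminal_id)

    def advance(self) -> Optional[str]:
        """Switch the earliest requester to speaking party; None if nobody is waiting."""
        if not self._waiting:
            return None
        self.current = self._waiting.popleft()
        return self.current

    def on_mode_closed(self) -> None:
        self._waiting.clear()                # clear the list when the preset mode closes

    def waiting(self) -> List[str]:
        return list(self._waiting)
```

A `deque` gives constant-time removal from the front, which matches serving requests strictly in arrival order.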

In a third aspect, an embodiment of the present application provides a video networking conference processing apparatus for implementing the video networking conference processing method of the first aspect. The apparatus is applied to a video networking terminal, the video networking terminal is communicatively connected to a video networking conference management system, the video networking conference management system is communicatively connected to a Pamil client, and the apparatus includes:

a first conference mode signaling receiving module, configured to receive, in a video networking conference in which the video networking terminal is currently participating, a first conference mode signaling sent by the video networking conference management system, where the first conference mode signaling is generated by the Pamil client upon detecting that a preset conference mode of the video networking conference has been started, and is sent to the video networking conference management system;

a voice recognition module, configured to recognize, in response to the first conference mode signaling, the voice of the participating user corresponding to the video networking terminal from among a plurality of voices within a preset distance range;

a speaking instruction generating module, configured to generate a speaking instruction when the voice of the participating user is recognized;

and a speaking instruction sending module, configured to send the speaking instruction to the video networking conference management system, where the video networking conference management system is configured to forward the speaking instruction to the Pamil client, and the Pamil client is configured to switch, in response to the speaking instruction, the video networking terminal to be the speaking party in the video networking conference.

In a fourth aspect, an embodiment of the present application provides another video networking conference processing apparatus for implementing the video networking conference processing method of the second aspect. The apparatus is applied to a Pamil client, the Pamil client is communicatively connected to a video networking conference management system, the video networking conference management system is communicatively connected to a video networking terminal, and the apparatus includes:

a first conference mode signaling generation module, configured to generate, in the current video networking conference, a first conference mode signaling upon detecting that a preset conference mode of the video networking conference has been started;

a first conference mode signaling sending module, configured to send the first conference mode signaling to the video networking conference management system, where the video networking conference management system is configured to send the first conference mode signaling to the video networking terminal, and the video networking terminal is configured to recognize, in response to the first conference mode signaling, the voice of the participating user corresponding to the video networking terminal from among a plurality of voices within a preset distance range;

a speaking instruction receiving module, configured to receive a speaking instruction sent by the video networking conference management system, where the speaking instruction is generated by the video networking terminal when the voice of the participating user is recognized, and is sent to the video networking conference management system;

and a speaking party switching module, configured to switch, in response to the speaking instruction, the video networking terminal to be the speaking party in the video networking conference.

In a fifth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, performs the video networking conference processing method of the first aspect or the second aspect.

In a sixth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor on a server side, enable the server to perform the video networking conference processing method according to the first aspect or the second aspect.

Compared with the prior art, the embodiment of the application has the following advantages:

In the embodiments of the present application, when the preset conference mode of the video networking conference is started, the voice of the participating user corresponding to the video networking terminal is recognized from among a plurality of voices within a preset distance range, a speaking instruction is automatically generated when that voice is recognized, and the speaking instruction is sent to the Pamil client so that the Pamil client can switch the video networking terminal to be the speaking party in the video networking conference. Because, in the preset conference mode, the video networking terminal automatically generates the speaking instruction upon recognizing the participating user's voice and sends it to the video networking conference management system, a participant can simply start speaking and be switched to the speaking party without a separate application step. This shortens the conference operation process, greatly reduces silent periods in the video networking conference, and improves the efficiency of the participating terminals.

Drawings

FIG. 1 is a networking schematic of a video network of the present application;

FIG. 2 is a schematic diagram of a hardware architecture of a node server according to the present application;

FIG. 3 is a schematic diagram of a hardware architecture of an access switch of the present application;

FIG. 4 is a schematic diagram of a hardware structure of an Ethernet protocol conversion gateway according to the present application;

FIG. 5 is a communication environment diagram of a video networking conference processing method according to an embodiment of the present application;

FIG. 6 is a flowchart illustrating steps of a video networking conference processing method according to an embodiment of the present application;

FIG. 7 is a flowchart illustrating steps of another video networking conference processing method according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a complete flow of the video networking conference processing method in one embodiment of the present application;

FIG. 9 is a block diagram of a video networking conference processing apparatus according to an embodiment of the present application;

FIG. 10 is a block diagram of another video networking conference processing apparatus according to an embodiment of the present application;

FIG. 11 is a schematic frame diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.

Before describing the embodiments of the present application, a detailed description of the video network described in the present application is first provided.

Video networking is an important milestone in network development. It is a real-time network that enables real-time transmission of high-definition video and moves many Internet applications toward high-definition video and high-definition face-to-face communication.

Video networking uses real-time high-definition video switching technology and can integrate dozens of required services, such as video, voice, pictures, text, communication and data, on a single network platform. Examples include high-definition video conferencing, video surveillance, intelligent surveillance analysis, emergency command, digital broadcast television, time-shifted television, online teaching, live broadcasting, video on demand (VOD), television mail, personal video recorder (PVR), intranet (self-run) channels, intelligent video broadcast control and information distribution, with high-definition video delivered through a television or a computer.

To better understand the embodiments of the present application, the video network is introduced below.

Some of the technologies applied in video networking are as follows:

Network Technology

The network technology innovation of video networking improves on traditional Ethernet to cope with the potentially enormous volume of video traffic on the network. Unlike pure packet switching or pure circuit switching, video networking technology uses packet switching while meeting streaming requirements. It retains the flexibility, simplicity and low cost of packet switching while providing the quality and security guarantees of circuit switching, thereby achieving seamless, network-wide switched virtual circuits and a unified data format.

Switching Technology

The video network adopts two advantages of Ethernet, asynchrony and packet switching, and eliminates Ethernet's shortcomings while remaining fully compatible with it. It provides seamless end-to-end connection across the whole network, connects directly to user terminals, and carries IP data packets directly. User data requires no format conversion anywhere in the network. Video networking is a higher-level form of Ethernet and a real-time switching platform; it enables real-time, network-wide transmission of large-scale high-definition video, which the existing Internet cannot achieve, and moves many network video applications toward high definition and unification.

Server Technology

Server technology on the video networking and unified video platform differs from that of traditional servers. Streaming media transmission on the platform is connection-oriented, its data processing capacity is independent of traffic volume and communication time, and a single network layer can carry both signaling and data. For voice and video services, streaming media processing on the video networking and unified video platform is much simpler than general data processing, and its efficiency is more than one hundred times that of a traditional server.

Storage Technology

To handle media content of very large capacity and very high traffic, the ultra-high-speed storage technology of the unified video platform uses an advanced real-time operating system. Program information in a server instruction is mapped to a specific hard disk space, so media content no longer passes through the server but is sent directly and instantly to the user terminal, with a typical user waiting time of less than 0.2 seconds. Optimized sector allocation greatly reduces the mechanical seek motion of the hard disk head; resource consumption is only 20% of that of an IP Internet system of the same grade, yet the concurrent traffic generated is 3 times that of a traditional hard disk array, and overall efficiency is improved by more than 10 times.

Network Security Technology

The structural design of the video network eliminates, at the structural level, the network security problems that trouble the Internet, through measures such as independent permission control for each service and complete isolation of device and user data. It generally requires no antivirus programs or firewalls, avoids attacks by hackers and viruses, and provides users with a structurally secure network.

Service Innovation Technology

The unified video platform integrates services with transmission: whether for a single user, a private network user or an entire network aggregate, a connection is established automatically in a single step. User terminals, set-top boxes or PCs connect directly to the unified video platform to obtain a wide variety of multimedia video services. The unified video platform replaces traditional complex application programming with menu-style configuration tables, so complex applications can be built with very little code, enabling virtually unlimited new service innovation.

The networking of the video network is as follows:

The video network has a centrally controlled network structure. The network may be a tree, star, ring or other topology, but in every case the whole network is controlled by centralized control nodes.

As shown in fig. 1, the video network is divided into an access network and a metropolitan network.

The devices of the access network part can be mainly classified into 3 types: node servers, access switches and terminals (including various set-top boxes, encoding boards, storage devices, etc.). A node server is connected to an access switch, and the access switch may be connected to a plurality of terminals and to an Ethernet network.

The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.

Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.

The node server here is the same node server as in the access network part; that is, the node server belongs to both the access network part and the metropolitan area network part.

The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.

Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.

The access network part can form a unified video platform (the part in the dotted circle), and a plurality of unified video platforms can form a video network; each unified video platform may be interconnected via metropolitan area and wide area video networking.

Video networking device classification

1.1 The devices in the video network of the embodiments of the present application can be mainly classified into 3 types: servers, switches (including Ethernet protocol conversion gateways) and terminals (including various set-top boxes, encoding boards, storage devices, etc.). The video network as a whole can be divided into a metropolitan area network (or a national network, global network, etc.) and an access network.

1.2 The devices of the access network part can be mainly classified into 3 types: node servers, access switches (including Ethernet protocol conversion gateways) and terminals (including various set-top boxes, encoding boards, storage devices, etc.).

The specific hardware structure of each access network device is as follows:

Node server:

As shown in FIG. 2, the node server mainly includes a network interface module 201, a switching engine module 202, a CPU module 203 and a disk array module 204;

Packets from the network interface module 201, the CPU module 203 and the disk array module 204 all enter the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain the packet's direction information, and stores the packet in the queue of the corresponding packet buffer 206 according to that direction information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards from a queue when the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly controls the hard disks, including initialization, reading and writing; the CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), configuring the address table 205 (including the downlink protocol packet address table, the uplink protocol packet address table and the data packet address table), and configuring the disk array module 204.
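The enqueue-and-poll behaviour of switching engine module 202 can be modelled in a few lines. This is a minimal sketch, not the device's implementation: `Port`, `SwitchingEngine`, the integer addresses, the one-queue-per-output-port layout, the numeric "nearly full" threshold and dropping packets with no address-table entry are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Port:
    capacity: int = 8                                  # hypothetical send-buffer size
    send_buffer: List[bytes] = field(default_factory=list)

    def full(self) -> bool:
        return len(self.send_buffer) >= self.capacity


class SwitchingEngine:
    """Hypothetical model of switching engine module 202 with one queue per output port."""

    def __init__(self, address_table: Dict[int, int], ports: Dict[int, Port],
                 nearly_full: int = 4) -> None:
        self.address_table = address_table             # destination address -> output port
        self.ports = ports
        self.nearly_full = nearly_full                 # assumed "nearly full" threshold
        self.queues: Dict[int, Deque[bytes]] = {p: deque() for p in ports}

    def ingress(self, dest: int, packet: bytes) -> bool:
        """Look up the address table and enqueue; return False if the packet is dropped."""
        port_id = self.address_table.get(dest)
        if port_id is None:
            return False                               # assumption: unknown destination is dropped
        queue = self.queues[port_id]
        if len(queue) >= self.nearly_full:
            return False                               # queue nearly full: discard
        queue.append(packet)
        return True

    def poll(self) -> int:
        """Visit every queue once; return how many packets were forwarded."""
        forwarded = 0
        for port_id, queue in self.queues.items():
            # 1) port send buffer not full, 2) queue packet counter greater than zero
            if not self.ports[port_id].full() and len(queue) > 0:
                self.ports[port_id].send_buffer.append(queue.popleft())
                forwarded += 1
        return forwarded
```

Each poll forwards at most one packet per queue, a simple round-robin reading of "polls all packet buffer queues".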

The access switch:

As shown in FIG. 3, the access switch mainly includes network interface modules (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303 and a CPU module 304;

A packet (uplink data) arriving from the downlink network interface module 301 enters the packet detection module 305. The packet detection module 305 checks whether the packet's destination address (DA), source address (SA), packet type and packet length meet the requirements; if they do, it allocates a corresponding stream identifier (stream-id) and passes the packet to the switching engine module 303, and otherwise it discards the packet. A packet (downlink data) arriving from the uplink network interface module 302 enters the switching engine module 303, as do packets from the CPU module 304. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain the packet's direction information. If the packet is going from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 307 in association with its stream-id; if that queue is nearly full, the packet is discarded. If the packet is not going from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 307 according to its direction information; if that queue is nearly full, the packet is discarded.
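The check-then-tag step of packet detection module 305 can be sketched as below. The source names the fields checked (DA, SA, packet type, length) but not the acceptance rules, so the rules here (addresses from a known set, an allowed set of packet types, a maximum length) and the policy of one stream-id per (source, destination) pair are hypothetical, as are the `Packet` and `PacketDetector` names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Packet:
    da: int       # destination address
    sa: int       # source address
    ptype: int    # packet type
    length: int   # packet length in bytes


class PacketDetector:
    """Hypothetical model of packet detection module 305; the acceptance rules are assumptions."""

    def __init__(self, known_addresses: Set[int], allowed_types: Set[int], max_length: int) -> None:
        self.known_addresses = known_addresses
        self.allowed_types = allowed_types
        self.max_length = max_length
        self._stream_ids: Dict[Tuple[int, int], int] = {}

    def check(self, pkt: Packet) -> Optional[int]:
        """Return the stream-id for an acceptable packet, or None if it must be discarded."""
        acceptable = (
            pkt.da in self.known_addresses
            and pkt.sa in self.known_addresses
            and pkt.ptype in self.allowed_types
            and 0 < pkt.length <= self.max_length
        )
        if not acceptable:
            return None
        # Assumption: one stream-id per (source, destination) pair, allocated on first sight.
        key = (pkt.sa, pkt.da)
        if key not in self._stream_ids:
            self._stream_ids[key] = len(self._stream_ids)
        return self._stream_ids[key]
```

Returning `None` models the discard branch; an accepted packet would then be handed to the switching engine together with its stream-id.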

The switching engine module 303 polls all packet buffer queues and may include two cases:

if the queue is from the downlink network interface to the uplink network interface, a packet is forwarded when the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero; 3) a token generated by the rate control module has been obtained;

if the queue is not from the downlink network interface to the uplink network interface, a packet is forwarded when the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.

The rate control module 308 is configured by the CPU module 304 and generates tokens, at programmable intervals, for all packet buffer queues going from downlink network interfaces to uplink network interfaces, thereby controlling the uplink forwarding rate.
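The two forwarding cases and the token generation of rate control module 308 can be combined into one small model. This is a minimal sketch under assumptions: time is counted in abstract ticks, `token_interval` stands for the programmable interval, and `bucket_depth` (a cap on unused tokens, as in a token bucket) is a hypothetical parameter the source does not mention; `RateControlledQueue` is likewise a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional


class RateControlledQueue:
    """Hypothetical model of one packet buffer queue in the access switch."""

    def __init__(self, uplink: bool, token_interval: int = 1, bucket_depth: int = 1) -> None:
        if token_interval <= 0:
            raise ValueError("token_interval must be positive")
        self.uplink = uplink                  # True for downlink-to-uplink queues
        self.token_interval = token_interval  # programmable interval, in ticks (assumed unit)
        self.bucket_depth = bucket_depth      # assumed cap on unused tokens
        self.tokens = 0
        self.packets: Deque[bytes] = deque()

    def on_tick(self, tick: int) -> None:
        """Rate control module: issue a token every `token_interval` ticks to uplink queues only."""
        if self.uplink and tick % self.token_interval == 0:
            self.tokens = min(self.tokens + 1, self.bucket_depth)

    def poll(self, send_buffer_full: bool) -> Optional[bytes]:
        """Return the packet to forward, or None if a forwarding condition is not met."""
        if send_buffer_full or not self.packets:
            return None                       # conditions 1) and 2)
        if self.uplink:
            if self.tokens == 0:
                return None                   # condition 3): no token obtained yet
            self.tokens -= 1                  # each forwarded uplink packet consumes one token
        return self.packets.popleft()
```

With `token_interval=3`, an uplink queue can forward at most one packet every three ticks, whereas a queue created with `uplink=False` forwards whenever its send buffer has room.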

The CPU module 304 is mainly responsible for protocol processing with the node server, configuring the address table 306 and configuring the rate control module 308.

Ethernet protocol conversion gateway:

As shown in fig. 4, the apparatus mainly includes a network interface module (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.

Wherein, a data packet coming from the downlink network interface module 401 enters the packet detection module 405. The packet detection module 405 checks whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video networking destination address DA, video networking source address SA, video networking packet type and packet length of the packet meet the requirements. If so, it allocates a corresponding stream identifier (stream-id), the MAC deleting module 410 then strips the MAC DA, the MAC SA and the length or frame type field (2 bytes), and the packet enters the corresponding receive buffer; otherwise, the packet is discarded.

The downlink network interface module 401 checks the send buffer of the port. If there is a packet, it obtains the Ethernet MAC DA of the corresponding terminal according to the packet's destination address DA, adds the terminal's Ethernet MAC DA, the MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and then sends the packet.

The other modules in the Ethernet protocol conversion gateway function similarly to those in the access switch.
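The MAC stripping and re-adding performed by the gateway can be sketched in Python as follows. This is a minimal illustration, assuming a 14-byte Ethernet header (6-byte MAC DA, 6-byte MAC SA, 2-byte length or frame type) and a hypothetical dictionary that maps a terminal's 8-byte video networking DA to its Ethernet MAC address; the function names are illustrative, and the frame-type value is supplied by the caller because the real value is not given here.

```python
import struct

ETH_HEADER_LEN = 14  # 6-byte MAC DA + 6-byte MAC SA + 2-byte length/frame type


def strip_mac(frame: bytes) -> bytes:
    """Ingress (MAC deleting module): drop the Ethernet header."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    return frame[ETH_HEADER_LEN:]


def add_mac(packet: bytes, mac_table: dict, gateway_mac: bytes, frame_type: int) -> bytes:
    """Egress: prepend terminal MAC DA, gateway MAC SA and length/frame type.

    The terminal MAC is looked up by the packet's 8-byte video networking DA;
    an unknown DA raises KeyError.
    """
    terminal_mac = mac_table[packet[:8]]
    return terminal_mac + gateway_mac + struct.pack("!H", frame_type) + packet
```

Because the video networking DA travels inside the packet, the gateway needs only this DA-to-MAC table to rebuild the Ethernet header on egress.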

A terminal:

A terminal mainly comprises a network interface module, a service processing module and a CPU module. For example, a set-top box mainly comprises a network interface module, a video and audio encoding/decoding engine module and a CPU module; an encoding board mainly comprises a network interface module, a video and audio encoding engine module and a CPU module; and a storage device mainly comprises a network interface module, a CPU module and a disk array module.

1.3 Devices of the metropolitan area network part can be mainly classified into 3 types: node servers, node switches and metropolitan area servers. A node switch mainly comprises a network interface module, a switching engine module and a CPU module; a metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.

2. Video networking packet definition

2.1 Access network packet definition

The data packet of the access network mainly comprises the following parts: Destination Address (DA), Source Address (SA), reserved bytes, payload (PDU) and CRC, laid out as shown in the following table:

DA | SA | Reserved | Payload | CRC

wherein:

the Destination Address (DA) consists of 8 bytes: the first byte indicates the packet type (such as the various protocol packets, multicast data packets, unicast data packets and so on), allowing at most 256 types; the second to sixth bytes are the metropolitan area network address; and the seventh and eighth bytes are the access network address;

the Source Address (SA) also consists of 8 bytes and is defined in the same way as the Destination Address (DA);

the reserved field consists of 2 bytes;

the length of the payload depends on the packet type: it is 64 bytes for the various protocol packets and 32 + 1024 = 1056 bytes for unicast data packets; of course, the length is not limited to these 2 cases;

the CRC consists of 4 bytes and is calculated with the standard Ethernet CRC algorithm.
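The access network packet layout above can be made concrete with a minimal Python sketch that builds and parses such a packet. It assumes that the CRC covers every preceding byte and is stored little-endian, and that the reserved bytes are zero; these choices, the packet type value and the helper names make_address, build_packet and parse_packet are hypothetical.

```python
import struct
import zlib

ADDR_LEN = 8        # DA and SA are 8 bytes each
RESERVED_LEN = 2
CRC_LEN = 4
HEADER_LEN = 2 * ADDR_LEN + RESERVED_LEN


def make_address(pkt_type: int, metro: bytes, access: bytes) -> bytes:
    """Byte 1: packet type; bytes 2-6: metro address; bytes 7-8: access address."""
    if not 0 <= pkt_type <= 0xFF or len(metro) != 5 or len(access) != 2:
        raise ValueError("invalid address fields")
    return bytes([pkt_type]) + metro + access


def build_packet(da: bytes, sa: bytes, payload: bytes) -> bytes:
    body = da + sa + bytes(RESERVED_LEN) + payload
    # zlib.crc32 computes the same CRC-32 as the Ethernet FCS; storing it
    # little-endian is an assumption of this sketch.
    return body + struct.pack("<I", zlib.crc32(body))


def parse_packet(packet: bytes) -> dict:
    if len(packet) < HEADER_LEN + CRC_LEN:
        raise ValueError("packet too short")
    body, crc = packet[:-CRC_LEN], packet[-CRC_LEN:]
    if struct.unpack("<I", crc)[0] != zlib.crc32(body):
        raise ValueError("CRC mismatch")
    da = body[:ADDR_LEN]
    return {
        "type": da[0],
        "metro": da[1:6],
        "access": da[6:8],
        "sa": body[ADDR_LEN:2 * ADDR_LEN],
        "payload": body[HEADER_LEN:],
    }
```

With a 64-byte protocol payload, a packet built this way is 8 + 8 + 2 + 64 + 4 = 86 bytes long.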

2.2 Metropolitan area network packet definition

The topology of a metropolitan area network is a graph, and there may be 2 or even more connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, or between two node switches. However, the metropolitan area network address of each metro device is unique, so, in order to accurately describe the connection relationships between metro devices, the embodiments of the present application introduce a parameter, the label, to uniquely describe a metropolitan area network device.

In this specification, the label is defined similarly to the label of MPLS (Multi-Protocol Label Switching). Assuming there are two connections between device A and device B, a packet from device A to device B has 2 possible labels, and a packet from device B to device A also has 2 possible labels. Labels are divided into incoming labels and outgoing labels: assuming the label of a packet entering device A (its incoming label) is 0x0000, the label of the packet leaving device A (its outgoing label) may become 0x0001. Network access in the metropolitan area network is centrally controlled: address allocation and label allocation are both directed by the metropolitan area server, and the node switches and node servers execute them passively. This differs from MPLS, where label allocation is the result of negotiation between the switch and the server.

As shown in the following table, the data packet of the metro network mainly includes the following parts:

DA | SA | Reserved | Label | Payload | CRC

That is, Destination Address (DA), Source Address (SA), reserved bytes (Reserved), label, payload (PDU) and CRC. The format of the label may be defined as follows: the label is 32 bits, of which the upper 16 bits are reserved and only the lower 16 bits are used, and it is located between the reserved bytes and the payload of the packet.
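The label position and the incoming/outgoing label swap described above can be sketched in Python as follows. This is a minimal illustration, assuming the 32-bit label field is stored big-endian right after DA (8 bytes), SA (8 bytes) and the reserved bytes (2 bytes), and that each device holds a hypothetical table mapping an incoming label to an output port and outgoing label; the function names are illustrative.

```python
import struct

LABEL_OFFSET = 8 + 8 + 2   # label follows DA (8), SA (8) and reserved bytes (2)


def read_label(packet: bytes) -> int:
    """Lower 16 bits of the 32-bit label field; the upper 16 bits are reserved."""
    (field,) = struct.unpack_from("!I", packet, LABEL_OFFSET)
    return field & 0xFFFF


def write_label(packet: bytes, label: int) -> bytes:
    """Copy of the packet with a new label; reserved bits are written as zero."""
    if not 0 <= label <= 0xFFFF:
        raise ValueError("label must fit in 16 bits")
    buf = bytearray(packet)
    struct.pack_into("!I", buf, LABEL_OFFSET, label)
    return bytes(buf)


def forward(packet: bytes, label_table: dict) -> tuple:
    """Swap the incoming label for the outgoing one.

    label_table maps incoming label -> (output port, outgoing label); in the
    metro network these entries would be assigned by the metropolitan area
    server. Recomputing the CRC after the swap is omitted here.
    """
    out_port, out_label = label_table[read_label(packet)]
    return out_port, write_label(packet, out_label)
```

Mirroring the example in the text, a packet entering with label 0x0000 would leave with label 0x0001 if the table holds the entry {0x0000: (port, 0x0001)}.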

Given these characteristics of the video network, one of the core ideas of the embodiments of the present application is as follows: when a preset conference mode of a video networking conference is turned on, the video networking terminal recognizes the voice of its participating user from a plurality of voices within a preset distance range and, when that voice is recognized, automatically generates a speaking instruction and sends it to the Pamir client, so that the Pamir client switches the video networking terminal to the speaking party in the video networking conference.

Referring to fig. 5, a communication environment diagram of an embodiment of a video networking conference processing method according to the present application is shown. As shown in fig. 5, the communication environment includes a video networking conference management system and a Pamir client. The video networking conference management system is in communication connection with the Pamir client and with a video networking terminal, and may also be in communication connection with a plurality of conference terminals; the video networking terminal and the plurality of conference terminals are the participants of a video networking conference.

The Pamir client is a client used in the video network to control a video networking conference, for example to set the chairman party, switch between participants, control the conference progress and control the conference mode. The video networking conference management system is a system used in the video network to manage video networking conferences, for example to create a video networking conference, or to schedule and approve a video networking conference that is to be held.

The video networking terminal may be an aurora series terminal or a kindergarten series terminal developed for the video network. A conference terminal may be a terminal of the same model as the video networking terminal, or another intelligent device such as a smartphone or a computer.

Referring to fig. 6, a flowchart illustrating steps of an embodiment of a video networking conference processing method according to the present application is shown, where the video networking conference processing method may be applied to a video networking terminal, and as shown in fig. 6, the method may specifically include the following steps:

Step S601, receiving a first conference mode signaling sent by the video networking conference management system in the video networking conference in which the video networking terminal is currently participating.

The first conference mode signaling is generated by the Pamir client when it detects that a preset conference mode of the video networking conference has been turned on, and is sent by the Pamir client to the video networking conference management system.

The video networking terminal and the video networking conference management system can communicate through the video networking protocol, so that the video networking terminal receives, over the video network, the first conference mode signaling sent by the video networking conference management system.

In practice, to standardize the management of a video networking conference so that it proceeds in an orderly way, the conference may have several conference modes, such as a free speech mode, a chairman speech mode and a conference learning mode. In the free speech mode, every party in the conference may speak freely, so as to discuss or exchange views on the work at hand; in the chairman speech mode, the other parties do not speak and only the chairman speaks, so that the chairman can arrange and host the conference; in the conference learning mode, no party speaks, so that everyone can study the educational video played in the conference.

In the embodiment of the present application, the preset conference mode may be the free speech mode of the video networking conference, in which all participating parties can speak freely. When the function button of the preset conference mode is switched on, the video networking conference enters the free speech mode, and the Pamir client immediately generates the first conference mode signaling to notify all parties of the video networking conference that the free speech stage has begun.
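The relation between the conference modes and the two conference mode signalings can be sketched in Python as follows. This is a minimal illustration only: the enum, the ModeSignaling payload and the on_mode_button handler are hypothetical names, and the real signaling format is not specified in this description.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ConferenceMode(Enum):
    FREE_SPEECH = auto()          # every party may speak (the preset mode here)
    CHAIRMAN_SPEECH = auto()      # only the chairman speaks
    CONFERENCE_LEARNING = auto()  # nobody speaks; an educational video plays


@dataclass(frozen=True)
class ModeSignaling:
    """Hypothetical payload: enabled=True is the first signaling, False the second."""
    conference_id: str
    mode: ConferenceMode
    enabled: bool


def on_mode_button(conference_id: str, mode: ConferenceMode,
                   enabled: bool) -> Optional[ModeSignaling]:
    """Client-side reaction to toggling a mode button."""
    if mode is not ConferenceMode.FREE_SPEECH:
        return None  # in this sketch only the preset mode triggers the signalings
    return ModeSignaling(conference_id, mode, enabled)
```

Switching the free speech button on yields the first signaling (enabled=True); switching it off yields the second (enabled=False).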

Step S602, in response to the first conference mode signaling, recognizing the voice of the participant user corresponding to the video network terminal from a plurality of voices within a preset distance range.

In practice, the video networking terminal may be equipped with an existing voice monitoring device. In response to the first conference mode signaling, the video networking terminal may parse the signaling, read out the valid data it carries, and start the voice monitoring device according to that data, so as to pick up the voices within the preset distance range and identify among them the voice of the participating user corresponding to the video networking terminal. The participating user corresponding to the video networking terminal may be the user who is currently using the video networking terminal to take part in the conference.

Specifically, the preset distance range is a circular area centered on the video networking terminal whose radius is the preset distance. For example, if the preset distance is 1 meter, the video networking terminal needs to recognize the voices produced within 1 meter of it, which may include voices uttered by other users at the venue where the participating user is located. In practice, the closer a speaker is to the video networking terminal, the higher the decibel level of the voice the terminal picks up, so the voice of the participating user who is using the terminal is likely to be the loudest. The video networking terminal can therefore use existing sound-level (decibel) recognition technology to determine whether any of the voices exceeds a preset decibel value; if such a voice exists, this indicates that the participating user has spoken.
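The decibel threshold check can be sketched in Python as follows. This is a minimal illustration that assumes the captured voices arrive as frames of normalized samples in [-1, 1] and measures each frame's RMS level in dBFS (decibels relative to full scale), a relative measure rather than an absolute sound pressure level; the -30 dBFS threshold and the function names are hypothetical.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of one audio frame in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def participant_is_speaking(frames: Sequence[Sequence[float]],
                            threshold_db: float = -30.0) -> bool:
    """True if any captured voice frame is louder than the preset threshold."""
    return any(level_dbfs(frame) > threshold_db for frame in frames)
```

The threshold would in practice be tuned so that only a voice close to the terminal, typically the participating user's own, exceeds it.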

In an alternative example, the video networking terminal may use existing voice recognition technology to recognize the voice of the participating user from the plurality of voices. For example, a voice recognition system may be configured on the voice monitoring device with the participating user's voice stored in it in advance; the plurality of voices are input into the voice recognition system to verify whether each of them matches the pre-stored voice of the participating user. If one matches, the participating user's voice is recognized; if none matches, it is not.
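The matching step of this alternative can be sketched in Python as follows. The sketch swaps in one common technique, comparing fixed-length voice embeddings by cosine similarity, and assumes such embeddings are produced by some speaker recognition model that is not shown; the 0.8 threshold and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matches_enrolled_user(voice_embeddings: Sequence[Sequence[float]],
                          enrolled: Sequence[float],
                          threshold: float = 0.8) -> bool:
    """True if any captured voice matches the pre-stored voice of the user."""
    return any(cosine_similarity(v, enrolled) >= threshold for v in voice_embeddings)
```

Unlike the decibel check, this approach does not depend on the participating user being the loudest speaker near the terminal.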

Step S603, when recognizing the voice of the participant user, generating a speaking instruction.

In practice, recognizing the participating user's voice can be a continuous process: while the participating user is silent, no voice is recognized, and whenever the participating user's voice is recognized in any time slot while the video networking conference is running in the preset conference mode, a speaking instruction is generated immediately. By contrast, in the prior art a participating user who wants to speak has to send a signal to the video networking terminal with a remote control, and the video networking terminal generates a speaking instruction according to the signal type after receiving the signal.

In this embodiment, the speaking instruction may be an instruction conforming to the video networking protocol, and may include the identifier of the video networking terminal.
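A speaking instruction carrying the terminal identifier can be sketched in Python as follows. The actual video networking protocol encoding is not given in this description, so JSON is used purely for illustration, and the conference_id and recognized_at_ms fields are hypothetical additions beyond the terminal identifier named in the text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeakingInstruction:
    terminal_id: str       # identifier of the video networking terminal (ID or name)
    conference_id: str     # hypothetical field
    recognized_at_ms: int  # hypothetical: when the participant's voice was recognized

    def encode(self) -> bytes:
        """Serialize to bytes (JSON here; the real protocol encoding is not specified)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, raw: bytes) -> "SpeakingInstruction":
        return cls(**json.loads(raw.decode("utf-8")))
```

A recognition timestamp of this kind would also let the receiving side order competing requests chronologically.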

Step S604, sending the speaking instruction to the video networking conference management system.

The video networking conference management system is used to forward the speaking instruction to the Pamir client, and the Pamir client is used to switch the video networking terminal to the speaking party in the video networking conference in response to the speaking instruction.

In practice, the speaking instruction may be sent to the Pamir client via the video networking conference management system, so that the Pamir client switches the video networking terminal to the speaking party according to the speaking instruction.

According to the embodiment of the present application, in the preset conference mode the video networking terminal automatically generates a speaking instruction when it recognizes the participating user's voice, and sends it to the video networking conference management system. A participating user can thus be switched to the speaking party simply by starting to speak in the video networking conference, which reduces conference operations, greatly shortens silent periods in the video networking conference, and improves the efficiency of the participating terminals.

With reference to the foregoing embodiment, in an optional example, as shown in fig. 5, the communication environment to which the video networking conference processing method is applied may further include a video networking server. The video networking terminal and the plurality of conference terminals are in communication connection with the video networking server, which is a core server of the video network and may be used to forward video networking data among the terminals.

Specifically, in addition to generating the speaking instruction upon recognizing the participating user's voice, step S603 may further include the following step:

Step S6031, recording the recognized voice of the participating user as first audio data, and storing the first audio data locally.

In the embodiment of the present application, a participating user requests to become the speaking party simply by speaking, so what the user says before the switch would otherwise be missed by the other participants. To ensure that, once the video networking terminal has been switched to the speaking party, the other participants of the video networking conference also learn what the participating user said during the preset conference mode, so that nothing the user said is omitted and the conference information stays complete, the participating user's voice can be recorded as first audio data and stored locally when it is recognized. The first audio data can then be sent to the other participants later, once the participating user has become the speaking party.

Accordingly, after the speaking instruction is sent to the video networking conference management system in step S604, the method may further include the following steps:

Step S605, receiving a speaker switching reply signaling sent by the video networking conference management system.

The speaker switching reply signaling is generated by the Pamir client when it has switched the video networking terminal to the speaking party in the video networking conference, and is sent to the video networking conference management system.

Step S606, in response to the speaker switching reply signaling, sending the locally stored first audio data to the video networking server, where the video networking server is configured to send the first audio data to each of the plurality of conference terminals.

Receipt of the speaker switching reply signaling indicates that the video networking terminal has become the current speaking party of the video networking conference. The video networking terminal may then send the previously recorded first audio data to the video networking server, which sends it to every conference terminal participating in the video networking conference. Each conference terminal can play the first audio data, so that the other participants hear what the participating user of the video networking terminal said when applying to speak.

Optionally, the step S606 may specifically include the following steps:

Step S6061, in response to the speaker switching reply signaling, acquiring the first audio data from local storage, and recording the current voice of the participating user as second audio data.

In practice, once the speaker switching reply signaling is received, the participating user is the speaking party and can express their views in the video networking conference. Therefore, in addition to acquiring the first audio data from local storage, the video networking terminal can record the participating user's current voice to obtain the second audio data.

Step S6062, the first audio data and the second audio data are synthesized into mixed audio data.

The first audio data and the second audio data are recorded in two successive time periods and are therefore ordered in time, so they can be synthesized into a single audio stream, namely the mixed audio data; the mixed audio data can then be decoded and its parts played back one after the other according to that time order.
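Because the two recordings are time-ordered and are played back one after the other, the synthesis can be read as a chronological concatenation rather than an overlay of two signals. The Python sketch below follows that reading; it assumes each recording carries a start timestamp and a list of PCM samples, and the AudioSegment type and synthesize_sequential function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSegment:
    start_ms: int        # when recording of this segment began
    samples: List[int]   # PCM samples (hypothetical format)


def synthesize_sequential(first: AudioSegment, second: AudioSegment) -> AudioSegment:
    """Join two recordings into one stream in chronological order."""
    ordered = sorted([first, second], key=lambda seg: seg.start_ms)
    merged: List[int] = []
    for seg in ordered:
        merged.extend(seg.samples)
    return AudioSegment(start_ms=ordered[0].start_ms, samples=merged)
```

Sorting by start time means the result is the same whichever recording is passed first.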

Step S6063, sending the mixed audio data to the video networking server.

The video networking server is used to send the mixed audio data to each of the plurality of conference terminals.

When a conference terminal receives the mixed audio data, it can play it, so that its participants hear both what the participating user of the video networking terminal said when applying to speak and what that user is saying now.

With this technical solution, the participating user's voice is recorded while they apply to speak, so nothing the participating user says is omitted and the integrity of the conference information is ensured.

With reference to the foregoing embodiment, in an optional example, after the speaking instruction is sent to the video networking conference management system in step S604, the method may further include the following steps:

Step S607, receiving a second conference mode signaling sent by the video networking conference management system.

The second conference mode signaling is generated by the Pamir client upon detecting that the preset conference mode has been turned off.

Step S608, in response to the second conference mode signaling, stopping recognizing the voice of the participating user.

Once the preset conference mode is turned off, the video networking conference no longer lets participants speak freely. On receiving the second conference mode signaling, the video networking terminal can parse it, read out the valid data it carries and, according to that data, turn off the previously started voice monitoring device, so as to stop recognizing the participating user's voice and keep the video networking conference running normally.
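The terminal-side behaviour in steps S602 to S608 can be summarized as a small state machine. The Python sketch below is a minimal illustration under stated assumptions: the recognizer's verdict for each audio chunk is passed in, messages to the management system and to the video networking server go through injected callbacks, and a terminal sends only one speaking instruction per free speech stage (a design choice of the sketch, not stated in the text). The class, method and message names are hypothetical, and live audio after the switch (the second audio data) is handled outside it.

```python
from typing import Callable, List


class TerminalSpeechAgent:
    """Hypothetical terminal-side controller for steps S602-S608."""

    def __init__(self, terminal_id: str,
                 send_to_management: Callable[[dict], None],
                 send_to_server: Callable[[bytes], None]) -> None:
        self.terminal_id = terminal_id
        self.send_to_management = send_to_management
        self.send_to_server = send_to_server
        self.listening = False      # voice monitoring on/off
        self.is_speaker = False
        self.requested = False      # speaking instruction already sent
        self.first_audio: List[bytes] = []

    def on_first_mode_signaling(self) -> None:
        self.listening = True       # S602: start voice monitoring

    def on_audio(self, chunk: bytes, is_participant_voice: bool) -> None:
        """Feed one captured chunk plus the recognizer's verdict for it."""
        if not self.listening or self.is_speaker or not is_participant_voice:
            return
        self.first_audio.append(chunk)          # S6031: record locally
        if not self.requested:                  # S603/S604: request once
            self.requested = True
            self.send_to_management({"type": "speak", "terminal_id": self.terminal_id})

    def on_switch_reply(self) -> None:
        """S605/S606: now the speaking party; forward the recorded first audio."""
        self.is_speaker = True
        if self.first_audio:
            self.send_to_server(b"".join(self.first_audio))
            self.first_audio.clear()

    def on_second_mode_signaling(self) -> None:
        self.listening = False      # S608: stop recognizing
```

Noise or other voices (a False verdict) are neither recorded nor trigger a request, which matches the standby behaviour described for fig. 8.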

Referring to fig. 7, a flowchart illustrating steps of another video networking conference processing method according to an embodiment of the present application is shown. This method may be applied to a Pamir client, and its communication environment may be as shown in fig. 5. As shown in fig. 7, the method may specifically include the following steps:

Step S701, in the current video networking conference, generating a first conference mode signaling when it is detected that a preset conference mode of the video networking conference has been turned on.

In practice, the Pamir client may be installed on a terminal device such as a smartphone or a computer, so that a user can log in to the Pamir client to operate the conference. Function buttons for controlling the conference mode can be provided on the Pamir client. When the function button of the preset conference mode is switched on, the video networking conference enters the free speech mode, and the Pamir client immediately generates the first conference mode signaling to notify all parties of the video networking conference that the free speech stage has begun.

Step S702, sending the first conference mode signaling to the video networking conference management system.

The video networking conference management system is used for sending the first conference mode signaling to the video networking terminal; and the video networking terminal is used for responding to the first conference mode signaling and identifying the voice of the participant user corresponding to the video networking terminal from a plurality of voices in a preset distance range.

In practice, the video networking conference management system may send the first conference mode signaling to each conference terminal participating in the video networking conference, in addition to sending the first conference mode signaling to the video networking terminals, so that all the terminals participating in the video networking conference can receive the first conference mode signaling.

For the specific process by which the video networking terminal, in response to the first conference mode signaling, recognizes the voice of its participating user from the plurality of voices within the preset distance range, refer to step S602; it is not described again here.

Step S703, receiving a speaking instruction sent by the video networking conference management system.

The speaking instruction is generated by the video networking terminal when it recognizes the voice of the participating user, and is sent to the video networking conference management system.

In practice, the speaking instruction may include the identifier of the video networking terminal, such as its ID number or name. After receiving the speaking instruction, the Pamir client may also display the identifier it contains, so that the operator knows which terminal is applying to speak.

Step S704, in response to the speaking instruction, switching the video networking terminal to a speaking party in the video networking conference.

In practice, the Pamir client may parse the speaking instruction to read the valid data it carries, which may include the identifier of the video networking terminal. Then, when the Pamir client determines from the valid data that the video networking terminal is not the current speaking terminal of the video networking conference, it can switch the video networking terminal to the speaking party.

According to the embodiment of the present application, the first conference mode signaling is sent to the video networking terminal when the preset conference mode is turned on, so that the video networking terminal, in response to it, recognizes the voice of its participating user and sends a speaking instruction once that voice is recognized. The Pamir client then switches the video networking terminal to the speaking party automatically according to the speaking instruction, without manual switching by an operator, which reduces conference operations and improves the efficiency of the participating terminals.

With reference to the foregoing embodiment, in an optional example, since the video networking conference management system is also in communication connection with the plurality of conference terminals participating in the video networking conference, those conference terminals may also send speaking instructions to the Pamir client to apply to speak. Accordingly, step S704 may specifically include the following steps:

Step S7041, when it is determined that the video networking terminal is not the terminal currently speaking in the video networking conference, adding the identifier of the video networking terminal to a preset speaking terminal list.

At least one conference terminal identifier is prestored in the preset speaking terminal list; each is the identifier of a conference terminal, among the plurality of conference terminals, that has requested to speak, and the identifiers are arranged in the chronological order of the requests.

Since the first conference mode signaling can be delivered to every conference terminal in the video networking conference, each conference terminal can also send a speaking instruction to the Pamir client to apply to speak, following steps S601 to S604. Each time the Pamir client receives a speaking instruction from a conference terminal, it adds that conference terminal's identifier to the preset speaking terminal list, in which the identifier of a conference terminal that sent its speaking instruction earlier is placed ahead of the identifier of one that sent it later.

Illustratively, take 4 conference terminals participating in the video networking conference, referred to as participant 2, participant 3, participant 4 and participant 5, with participant 1 being the video networking terminal. If the Pamir client first receives the speaking instruction of participant 3, then that of participant 5, and finally that of the video networking terminal (participant 1), the identifiers of participant 3, participant 5 and participant 1 are arranged in that order in the preset speaking terminal list.

Step S7042, switching the conference terminals corresponding to the at least one conference terminal identifier to the speaking party one after another, in the chronological order of their requests, and then switching the video networking terminal to the speaking party in the video networking conference.

In practice, to ensure the quality of each speech in a video networking conference, only one party speaks at a time, and the next party speaks after the current one finishes. In a specific implementation, a conference terminal identifier placed ahead of the identifier of the video networking terminal means that the corresponding conference terminal applied to speak before the video networking terminal, so the conference terminals whose identifiers are placed ahead are switched to the speaking party in turn, and then the video networking terminal is switched to the speaking party in the video networking conference.
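The preset speaking terminal list behaves like a first-in, first-out queue. The Python sketch below illustrates steps S704, S7041, S7042 and S705 under stated assumptions: a requester is switched in immediately when nobody is speaking (following the fig. 8 description), duplicate requests are ignored (a design choice of the sketch), and the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class SpeakerScheduler:
    """Hypothetical Pamir-client speaking-terminal list (S704, S7041, S7042, S705)."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.pending: Deque[str] = deque()  # preset speaking terminal list, oldest first
        self.history: List[str] = []        # order in which parties became speaker

    def _switch_to(self, terminal_id: str) -> None:
        self.current = terminal_id
        self.history.append(terminal_id)

    def on_speaking_instruction(self, terminal_id: str) -> None:
        if terminal_id == self.current or terminal_id in self.pending:
            return                          # already speaking or already queued
        if self.current is None:
            self._switch_to(terminal_id)    # nobody speaking: switch right away
        else:
            self.pending.append(terminal_id)  # S7041: queue by request time

    def on_speech_finished(self) -> None:
        """S7042: the earliest remaining requester becomes the speaking party."""
        self.current = None
        if self.pending:
            self._switch_to(self.pending.popleft())

    def on_mode_off(self) -> None:
        self.pending.clear()                # S705: clear the list
```

Replaying the example above while another party is speaking, requests from participant 3, participant 5 and participant 1 are queued and later served in exactly that order.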

Accordingly, in an optional example, after the video network terminal is switched to a speaking party in the video network conference, the method may further include the following steps:

Step S705, when it is detected that the preset conference mode of the video networking conference has been turned off, generating a second conference mode signaling and clearing the preset speaking terminal list.

In practice, when detecting that the function button of the preset conference mode has been switched off, the Pamir client may immediately generate the second conference mode signaling to notify the participants that the free speech stage of the video networking conference has ended.

Step S706, sending the second conference mode signaling to the video networking conference management system.

The video networking conference management system is used for sending the second conference mode signaling to the video networking terminal, and the video networking terminal is used for responding to the second conference mode signaling and stopping recognizing the voices of the participating users.

In practice, the video networking conference management system may further forward the second conference mode signaling to each conference terminal, so that each conference terminal also stops voice recognition of its participating user.

With reference to the foregoing embodiments, the process of the video networking conference processing method from the video networking terminal and the pommel client side is fully described, and specifically, the method may include the following steps:

Step a: when the Pamil client detects, in the video networking conference currently in progress, that the preset conference mode of the conference has been turned on, it generates a first conference mode signaling.

Step b: the Pamil client sends the first conference mode signaling to the video networking conference management system, which forwards it to the video networking terminal.

Step c: the video networking terminal receives the first conference mode signaling and, in response, recognizes the voice of its participating user from a plurality of voices within a preset distance range.

Step d: the video networking terminal generates a speaking instruction when it recognizes the voice of the participating user.

Step e: the video networking terminal sends the speaking instruction to the video networking conference management system, which forwards it to the Pamil client.

Step f: the Pamil client receives the speaking instruction from the video networking conference management system and, in response, switches the video networking terminal to be a speaking party in the video networking conference.
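Steps a to f above can be traced end to end with the following sketch. It is a hedged, single-process illustration: the class and method names are hypothetical, the management system is modeled as a plain relay object rather than a networked service, and voice matching is replaced by a placeholder callable `is_own_user`.

```python
from typing import Callable, List, Optional


class ManagementSystem:
    """Relays messages between terminals and the Pamil client (hypothetical model)."""

    def __init__(self) -> None:
        self.terminals: List["VideoTerminal"] = []
        self.client: Optional["PamilClient"] = None

    def relay_mode_signaling(self, signaling: str) -> None:
        # Step b: forward the conference mode signaling to each terminal.
        for terminal in self.terminals:
            terminal.on_mode_signaling(signaling)

    def relay_speaking_instruction(self, terminal_id: str) -> None:
        # Step e: forward the speaking instruction to the Pamil client.
        if self.client is not None:
            self.client.on_speaking_instruction(terminal_id)


class VideoTerminal:
    def __init__(self, terminal_id: str, management: ManagementSystem,
                 is_own_user: Callable[[str], bool]) -> None:
        self.terminal_id = terminal_id
        self.management = management
        self.is_own_user = is_own_user  # placeholder for the voice matching step
        self.listening = False
        management.terminals.append(self)

    def on_mode_signaling(self, signaling: str) -> None:
        # Step c: start recognizing voices once the first signaling arrives.
        self.listening = signaling == "first"

    def hear(self, voice: str) -> None:
        # Steps c to e: only the participating user's own voice produces a
        # speaking instruction, which is sent to the management system.
        if self.listening and self.is_own_user(voice):
            self.management.relay_speaking_instruction(self.terminal_id)


class PamilClient:
    def __init__(self, management: ManagementSystem) -> None:
        self.management = management
        self.speaking_party: Optional[str] = None
        management.client = self

    def open_preset_mode(self) -> None:
        # Step a: generate the first conference mode signaling; step b: send it.
        self.management.relay_mode_signaling("first")

    def on_speaking_instruction(self, terminal_id: str) -> None:
        # Step f: switch the terminal to be the speaking party.
        self.speaking_party = terminal_id


management = ManagementSystem()
client = PamilClient(management)
terminal = VideoTerminal("participant-1", management, lambda voice: voice == "alice")
terminal.hear("alice")        # ignored: the preset mode is not open yet
client.open_preset_mode()
terminal.hear("bob")          # ignored: not this terminal's participant
terminal.hear("alice")
print(client.speaking_party)  # participant-1
```

This sketch switches immediately in step f; queuing behind a terminal that is already speaking is left out here for brevity.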

Referring to fig. 8, a complete flow diagram of the video networking conference processing method in a specific example is shown. In fig. 8, participant 1 is the video networking terminal, participants 2 to n are the conference terminals shown in fig. 5, the conference management system is the video networking conference management system, and PAMIRMobile is a terminal on which the Pamil client is installed. The flow may specifically include the following steps:

First, when the video networking conference enters the free speech mode (the preset conference mode described above), participant 1 recognizes the voice of its participating user from a plurality of voices within a preset distance range. If no speech is recognized, or only noise is recognized, the system stands by; when speech is recognized, participant 1 generates a speaking instruction and sends it to the conference management system. The terminal in fig. 8 is participant 1 (the video networking terminal).

The conference management system then pushes the speaking instruction to PAMIRMobile.

Next, when it determines that participant 1 is not the current speaker (i.e., the speaking party in step S604), PAMIRMobile detects whether a user is currently speaking; if so, it adds participant 1 to a queue (i.e., it adds the identifier of the video networking terminal to the preset speaking terminal list in step S7041).

Finally, after the current speaker finishes speaking, PAMIRMobile switches participant 1 to be the speaking party in the video networking conference.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.

Referring to fig. 9, a block diagram of a video networking conference processing apparatus according to the present application is shown. The apparatus may be applied to a video networking terminal, the video networking terminal may be communicatively connected to a video networking conference management system, and the video networking conference management system may be communicatively connected to a Pamil client. The apparatus may specifically include the following modules:

a first conference mode signaling receiving module 901, configured to receive a first conference mode signaling sent by the video networking conference management system in a video networking conference in which the video networking terminal currently participates, where the first conference mode signaling is generated by the Pamil client when it detects that a preset conference mode of the video networking conference has been turned on, and is sent to the video networking conference management system;

a voice recognition module 902, configured to, in response to the first conference mode signaling, recognize the voice of the participating user corresponding to the video networking terminal from a plurality of voices within a preset distance range;

a speaking instruction generating module 903, configured to generate a speaking instruction when the voice of the participating user is recognized;

a speaking instruction sending module 904, configured to send the speaking instruction to the video networking conference management system, where the video networking conference management system is configured to send the speaking instruction to the Pamil client, and the Pamil client is configured to switch the video networking terminal to be a speaking party in the video networking conference in response to the speaking instruction.
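Modules 902 and 903 can be illustrated with the matching rule from the claims (compare each captured voice against pre-stored audio of the participating user). The patent does not say how the comparison or the distance check is done, so this sketch assumes each voice has already been reduced to a feature vector plus an estimated distance, and compares vectors by cosine similarity. `CapturedVoice`, `VoiceRecognitionModule`, `generate_speaking_instruction`, the 1.5 m range and the 0.8 threshold are all hypothetical; feature extraction and distance estimation are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class CapturedVoice:
    embedding: Sequence[float]  # hypothetical voiceprint feature vector
    distance_m: float           # hypothetical estimated distance from the microphone


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VoiceRecognitionModule:
    """Module 902: pick the participant's own voice out of several captured voices."""

    def __init__(self, enrolled: Sequence[float], max_distance_m: float = 1.5,
                 threshold: float = 0.8) -> None:
        self.enrolled = enrolled              # pre-stored audio features of the participant
        self.max_distance_m = max_distance_m  # preset distance range (assumed value)
        self.threshold = threshold            # match threshold (assumed value)

    def recognize(self, voices: List[CapturedVoice]) -> Optional[CapturedVoice]:
        for voice in voices:
            if voice.distance_m > self.max_distance_m:
                continue  # outside the preset distance range
            if cosine_similarity(voice.embedding, self.enrolled) >= self.threshold:
                return voice  # matches the pre-stored audio
        return None


def generate_speaking_instruction(terminal_id: str, module: VoiceRecognitionModule,
                                  voices: List[CapturedVoice]) -> Optional[Dict[str, str]]:
    # Module 903: produce an instruction only when the participant's voice is
    # recognized; module 904 would then send it to the management system.
    if module.recognize(voices) is None:
        return None
    return {"type": "speak", "terminal_id": terminal_id}


module = VoiceRecognitionModule(enrolled=[1.0, 0.0, 0.0])
own_near = CapturedVoice([0.9, 0.1, 0.0], distance_m=0.8)
own_far = CapturedVoice([1.0, 0.0, 0.0], distance_m=5.0)
other_near = CapturedVoice([0.0, 1.0, 0.0], distance_m=0.5)
print(generate_speaking_instruction("participant-1", module, [own_far, other_near]))  # None
print(generate_speaking_instruction("participant-1", module, [other_near, own_near]))
# {'type': 'speak', 'terminal_id': 'participant-1'}
```

Applying the distance filter before the similarity test reflects the "preset distance range" wording: a matching voice that is too far away (for example, from a loudspeaker) is still ignored.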

Optionally, the video networking conference management system is further communicatively connected with a plurality of conference terminals participating in the video networking conference, and the video networking terminal and the plurality of conference terminals are communicatively connected with a video networking server; the device further comprises:

an audio recording module, configured to record the recognized voice of the participating user as first audio data and store the first audio data locally;

a reply signaling receiving module, configured to receive a switching speech reply signaling sent by the video networking conference management system, where the switching speech reply signaling is generated by the Pamil client when the video networking terminal is switched to a speaking party in the video networking conference, and is sent to the video networking conference management system;

and an audio data sending module, configured to send the locally stored first audio data to the video networking server in response to the switching speech reply signaling, where the video networking server is configured to send the first audio data to each of the plurality of conference terminals.
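The recording and sending modules above can be sketched as follows, with raw bytes standing in for encoded audio and an in-memory dictionary standing in for delivery over the video network. `VideoNetworkingServer`, `TerminalAudioHandler` and their methods are hypothetical names, and emptying the local buffer once it has been sent is an assumption the text does not state.

```python
from typing import Dict, List


class VideoNetworkingServer:
    """Fans audio out to the conference terminals (hypothetical model)."""

    def __init__(self, conference_terminals: List[str]) -> None:
        self.delivered: Dict[str, List[bytes]] = {t: [] for t in conference_terminals}

    def distribute(self, audio: bytes) -> None:
        # Send the same audio to each of the conference terminals.
        for received in self.delivered.values():
            received.append(audio)


class TerminalAudioHandler:
    def __init__(self, server: VideoNetworkingServer) -> None:
        self.server = server
        self.local_store: List[bytes] = []  # stands in for local storage

    def record(self, chunk: bytes) -> None:
        # Audio recording module: keep the recognized voice as first audio data.
        self.local_store.append(chunk)

    def on_switching_speech_reply(self) -> None:
        # Audio data sending module: on the reply signaling, send the stored audio.
        first_audio = b"".join(self.local_store)
        self.local_store.clear()  # assumption: the buffer is emptied once sent
        if first_audio:
            self.server.distribute(first_audio)


server = VideoNetworkingServer(["participant-2", "participant-3"])
handler = TerminalAudioHandler(server)
handler.record(b"hel")
handler.record(b"lo")
handler.on_switching_speech_reply()
print(server.delivered)  # {'participant-2': [b'hello'], 'participant-3': [b'hello']}
```

Buffering locally until the reply signaling arrives is what lets the other participants hear what was said while the terminal was still waiting in the queue.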

Optionally, the apparatus may further include the following modules:

a second conference mode signaling receiving module, configured to receive a second conference mode signaling sent by the video networking conference management system, where the second conference mode signaling is generated by the Pamil client when it detects that the preset conference mode has been turned off;

and a voice recognition stopping module, configured to stop recognizing the voice of the participating user in response to the second conference mode signaling.

Optionally, the audio data sending module may specifically include the following units:

an audio acquiring unit, configured to acquire the first audio data from local storage in response to the switching speech reply signaling, and to record the current voice of the participating user as second audio data;

the mixing unit is used for synthesizing the first audio data and the second audio data into mixed audio data;

and the sending unit is used for sending the mixed audio data to the video networking server, and the video networking server is used for respectively sending the mixed audio data to the plurality of conference terminals.
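The text does not specify how the mixing unit combines the first and second audio data. Two plausible readings are sketched here, assuming both are lists of 16-bit PCM samples at the same sample rate: playing the buffered first audio followed by the live second audio, or summing them sample by sample with clipping. Both function names are hypothetical.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_sequential(first: List[int], second: List[int]) -> List[int]:
    """Play the buffered first audio data, then the live second audio data."""
    return first + second


def mix_overlay(first: List[int], second: List[int]) -> List[int]:
    """Sum the two signals sample by sample, clipping to the 16-bit PCM range."""
    length = max(len(first), len(second))
    mixed = []
    for i in range(length):
        a = first[i] if i < len(first) else 0
        b = second[i] if i < len(second) else 0
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed


print(mix_sequential([100, 200], [300]))  # [100, 200, 300]
print(mix_overlay([30000, 1], [5000]))    # [32767, 1]
```

The sequential reading preserves everything the participant said before the switch; the overlay reading matches the usual meaning of audio mixing but would superimpose two stretches of the same speaker's voice.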

Referring to fig. 10, a block diagram of another video networking conference processing apparatus according to the present application is shown. The apparatus may be applied to a Pamil client, the Pamil client may be communicatively connected to a video networking conference management system, and the video networking conference management system may be communicatively connected to a video networking terminal. The apparatus may specifically include the following modules:

a first conference mode signaling generation module 1001, configured to generate a first conference mode signaling when detecting that a preset conference mode of a video networking conference is turned on in a currently-performed video networking conference;

a first conference mode signaling sending module 1002, configured to send the first conference mode signaling to the video networking conference management system, where the video networking conference management system is configured to send the first conference mode signaling to the video networking terminal; the video networking terminal is used for responding to the first conference mode signaling and identifying the voice of the conference participating user corresponding to the video networking terminal from a plurality of voices in a preset distance range;

a speaking instruction receiving module 1003, configured to receive a speaking instruction sent by the video networking conference management system; the speaking instruction is generated by the video networking terminal when the voice of the participating user is recognized, and is sent to the video networking conference management system;

a speaking party switching module 1004, configured to switch the video network terminal to be a speaking party in the video network conference in response to the speaking instruction.

Optionally, the video networking conference management system is further communicatively connected with a plurality of conference terminals participating in the video networking conference; the speaking party switching module 1004 may specifically include the following units:

the list adding unit is used for adding the identifier of the video networking terminal into a preset speaking terminal list when the video networking terminal is determined not to be the terminal which speaks currently in the video networking conference; at least one conference terminal identifier is prestored in the preset speaking terminal list, the at least one conference terminal identifier is an identifier of a conference terminal requesting to speak in the plurality of conference terminals, and the at least one conference terminal identifier is arranged according to the time sequence of requesting to speak;

and the sequential switching unit is used for sequentially switching the conference terminal corresponding to the at least one conference terminal identifier into the speaking party in the video networking conference according to the time sequence of the request for speaking, and then switching the video networking terminal into the speaking party in the video networking conference.
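The list adding unit and sequential switching unit behave like a queue ordered by request time. The sketch below keeps the preset speaking terminal list as a sorted list of `(request time, terminal id)` pairs using the standard `bisect` module. `SpeakingPartyScheduler` and its methods are hypothetical names, and switching a requester in immediately when nobody is speaking is an assumption drawn from the fig. 8 description rather than from these units.

```python
import bisect
from typing import List, Optional, Tuple


class SpeakingPartyScheduler:
    """Models the preset speaking terminal list (hypothetical names)."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self._pending: List[Tuple[float, str]] = []  # (request time, terminal id), kept sorted

    def pending(self) -> List[str]:
        return [terminal_id for _, terminal_id in self._pending]

    def request(self, terminal_id: str, requested_at: float) -> None:
        if self.current is None:
            # Assumption: with nobody speaking, the requester is switched in at once.
            self.current = terminal_id
        elif terminal_id != self.current and terminal_id not in self.pending():
            # List adding unit: insert in order of request time.
            bisect.insort(self._pending, (requested_at, terminal_id))

    def finish_current(self) -> Optional[str]:
        # Sequential switching unit: the earliest pending requester speaks next.
        self.current = self._pending.pop(0)[1] if self._pending else None
        return self.current

    def clear(self) -> None:
        # Used when the preset conference mode is turned off (step S705).
        self._pending.clear()


scheduler = SpeakingPartyScheduler()
scheduler.request("terminal-A", 1.0)     # nobody speaking: A becomes the speaking party
scheduler.request("participant-1", 3.0)
scheduler.request("terminal-B", 2.0)
print(scheduler.pending())               # ['terminal-B', 'participant-1']
print(scheduler.finish_current())        # terminal-B
print(scheduler.finish_current())        # participant-1
```

Sorting by request time rather than arrival order keeps the list correct even if speaking instructions reach the Pamil client out of order.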

Optionally, the apparatus may further include the following modules:

a second conference mode signaling generation module, configured to generate a second conference mode signaling and empty the preset speaking terminal list when detecting that a preset conference mode of the video networking conference is closed;

and the second conference mode signaling sending module is used for sending the second conference mode signaling to the video networking conference management system, the video networking conference management system is used for sending the second conference mode signaling to the video networking terminal, and the video networking terminal is used for responding to the second conference mode signaling and stopping recognizing the voice of the participating user.

Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.

Referring to fig. 11, a schematic structural diagram of an electronic device 1100 according to an embodiment of the present application is shown, where the electronic device 1100 may be used for video networking conference processing, and may include a memory 1101, a processor 1102 and a computer program stored in the memory 1101 and executable on the processor, where the processor 1102 is configured to execute the video networking conference processing method.

The embodiment of the application also provides a computer readable storage medium, and a computer program stored on the storage medium enables a processor to execute the video networking conference processing method.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The video networking conference processing method, the video networking conference processing device, the electronic device and the computer readable storage medium provided by the application are introduced in detail, specific examples are applied in the text to explain the principle and the implementation of the application, and the description of the above embodiments is only used for helping to understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A video networking conference processing method, applied to a video networking terminal, wherein the video networking terminal is communicatively connected with a video networking conference management system, the video networking conference management system is communicatively connected with a Pamil client, and the method comprises the following steps:

receiving a first conference mode signaling sent by the video networking conference management system in a video networking conference in which the video networking terminal currently participates; wherein the first conference mode signaling is generated by the Pamil client when it detects that a preset conference mode of the video networking conference has been turned on, and is sent to the video networking conference management system, and the preset conference mode is a free speech mode in the video networking conference;

responding to the first conference mode signaling, and recognizing the voice of the participant user corresponding to the video network terminal from a plurality of voices within a preset distance range, wherein the recognizing includes: pre-storing the audio of the participant user, verifying whether each voice in the plurality of voices is matched with the pre-stored audio of the participant user corresponding to the video networking terminal, and if so, identifying the voice of the participant user corresponding to the video networking terminal;

2. The video networking conference processing method according to claim 1, wherein the video networking conference management system is further communicatively connected with a plurality of conference terminals participating in the video networking conference, and the video networking terminal and the plurality of conference terminals are communicatively connected with a video networking server; while recognizing the voice of the participating user and generating a speaking instruction, the method further comprises the following steps:

3. The video networking conference processing method of claim 1, wherein after sending the speaking instruction to the video networking conference management system, the method further comprises:

4. The video networking conference processing method of claim 2, wherein sending the locally stored first audio data to the video networking server in response to the switching speech reply signaling comprises:

5. A video networking conference processing method is applied to a Pamil client, the Pamil client is in communication connection with a video networking conference management system, the video networking conference management system is in communication connection with a video networking terminal, and the method comprises the following steps:

sending the first conference mode signaling to the video networking conference management system, wherein the video networking conference management system is configured to send the first conference mode signaling to the video networking terminal; the video networking terminal is configured to, in response to the first conference mode signaling, recognize the voice of the participant user corresponding to the video networking terminal from a plurality of voices within a preset distance range, wherein the recognizing comprises: pre-storing the audio of the participant user, verifying whether each voice in the plurality of voices matches the pre-stored audio of the participant user corresponding to the video networking terminal, and if so, identifying it as the voice of the participant user corresponding to the video networking terminal;

receiving a speaking instruction sent by the video networking conference management system; the speaking instruction is generated by the video networking terminal when the voice of the participating user is recognized and is sent to the video networking conference management system, the video networking conference management system is used for sending the speaking instruction to the Pamil client, and the Pamil client responds to the speaking instruction and switches the video networking terminal into a speaking party in the video networking conference.

6. The video networking conference processing method of claim 5, wherein the video networking conference management system is further communicatively connected with a plurality of conference terminals participating in the video networking conference; and switching the video networking terminal to be a speaking party in the video networking conference in response to the speaking instruction comprises:

7. The video networking conference processing method of claim 6, wherein after switching the video networking terminal to be a speaking party in the video networking conference, the method further comprises:

8. A video networking conference processing device, applied to a video networking terminal, wherein the video networking terminal is communicatively connected with a video networking conference management system, the video networking conference management system is communicatively connected with a Pamil client, and the device comprises:

the first conference mode signaling receiving module, configured to receive a first conference mode signaling sent by the video networking conference management system in a video networking conference in which the video networking terminal currently participates; wherein the first conference mode signaling is generated by the Pamil client when it detects that a preset conference mode of the video networking conference has been turned on, and is sent to the video networking conference management system; and the preset conference mode is a free speech mode in the video networking conference;

the voice recognition module, configured to, in response to the first conference mode signaling, recognize the voice of the participant user corresponding to the video networking terminal from a plurality of voices within a preset distance range, wherein the recognizing comprises: pre-storing the audio of the participant user, verifying whether each voice in the plurality of voices matches the pre-stored audio of the participant user corresponding to the video networking terminal, and if so, identifying it as the voice of the participant user corresponding to the video networking terminal;

9. A video networking conference processing device is applied to a Pamil client, the Pamil client is in communication connection with a video networking conference management system, the video networking conference management system is in communication connection with a video networking terminal, and the device comprises:

10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor is configured to perform the video networking conference processing method of any one of claims 1-4 or 5-7.

11. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor on a server side, enable the server to perform the video networking conference processing method of any of claims 1-4 or 5-7.