CN114979545A

CN114979545A - Multi-terminal call method, storage medium and electronic device

Info

Publication number: CN114979545A
Application number: CN202111621019.4A
Authority: CN
Inventors: 朱睿; 甄广启; 李岳鹏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-02-19
Filing date: 2021-12-27
Publication date: 2022-08-30

Abstract

The invention discloses a multi-terminal call method, a storage medium and an electronic device. The method comprises the following steps: when a first terminal is set as an extension terminal of a second terminal to participate in a target call conference conducted online, acquiring a first voice signal collected by the first terminal and a second voice signal collected by the second terminal, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal; performing target mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal; and sending the third voice signal to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing. The invention solves the technical problem of poor call effect of online conferences in the related art.

Description

Multi-terminal call method, storage medium and electronic device

Technical Field

The invention relates to the field of computers, in particular to a multi-terminal call method, a storage medium and electronic equipment.

Background

In the related art, a typical video conference room is equipped with only one conference phone for playing and collecting sound. If the conference room is large, or the conference phone is poorly placed, a speaker may be far from the conference phone, and other online participants may be unable to hear what is said on site.

Extension terminals for traditional conference room scenarios typically include an extension microphone and an extension speaker. With the growing popularity of consumer electronic devices and the improving audio and video transmission capability of the mobile internet, it is possible to use the audio playing and collecting capability of a hardware extension terminal, such as a wired microphone, to assist the original conference room equipment in playing and collecting voice. However, hardware extension terminals have some drawbacks. First, the cost is high: the equipment and consumables are varied, and purchase and maintenance are expensive. Second, they are difficult to use: professional equipment often needs to be configured by professionals, and the number and positions of the extension terminals are often fixed, which reduces the flexibility of conference communication and the efficiency of holding a conference.

For example, in an online classroom scenario with multiple participants, device A is the teacher's notebook computer and also the device that initiates the conference. After device A joins the conference through the computer on the podium, it collects the audio of the teacher's lecture and sends it to the other online students, and it can also receive the audio of students' questions in real time. Because the effective pickup distance of a notebook computer is limited, and participants may not be able to move conveniently, students in the back rows of the room may be unable to communicate by voice with other online students. For example, when user E is a student attending class in the back row of the classroom and wants to discuss a problem with other online students, the voice of user E cannot be effectively collected by computer A because user E is far from the main device A in the room, so the online students cannot hear what user E says.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a multi-terminal call method, a storage medium and electronic equipment, which at least solve the technical problem of poor call effect of an online conference in the related art.

According to an aspect of the embodiments of the present invention, a multi-terminal call method is provided, including: under the condition that a first terminal is set as an extension terminal of a second terminal to participate in a target call conference which is carried out online, acquiring a first voice signal acquired by the first terminal and a second voice signal acquired by the second terminal, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal; according to the first voice signal and the second voice signal, executing target sound mixing processing to obtain a third voice signal; and sending the third voice signal to terminals except the first terminal and the second terminal in the plurality of terminals for playing.

According to another aspect of the embodiments of the present invention, there is also provided a multi-terminal communication device, including:

an acquisition module, used for acquiring a first voice signal collected by a first terminal and a second voice signal collected by a second terminal when the first terminal is set as an extension terminal of the second terminal to participate in a target call conference conducted online, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal;

the audio mixing module is used for executing target audio mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal;

and the sending module is used for sending the third voice signal to the terminals except the first terminal and the second terminal in the plurality of terminals for playing.

Optionally,

the device is used for acquiring a first voice signal acquired by the first terminal and a second voice signal acquired by the second terminal in the following way: under the condition that the first terminal comprises N terminals, acquiring N paths of first voice signals acquired by the N terminals and one path of second voice signals acquired by the second terminal, wherein N is 1 or a natural number greater than 1, and each terminal in the N terminals is used for acquiring one path of first voice signals;

the device is used for executing target sound mixing processing according to the first voice signal and the second voice signal in the following mode to obtain a third voice signal: and executing the target sound mixing processing according to the N paths of first voice signals and the one path of second voice signals to obtain one path of third voice signals.

Optionally, the apparatus is configured to perform target mixing processing according to the first speech signal and the second speech signal to obtain a third speech signal by:

under the condition that a fourth voice signal which is played and received on the second terminal is collected by the first terminal, performing echo cancellation processing on the first voice signal according to the fourth voice signal which is sent to the second terminal to obtain a fifth voice signal, wherein the fourth voice signal is a voice signal collected by terminals except the first terminal and the second terminal in the plurality of terminals;

and executing the target sound mixing processing on the fifth voice signal and the second voice signal to obtain the third voice signal.

Optionally, the apparatus is configured to perform echo cancellation processing on the first voice signal according to the fourth voice signal sent to the second terminal, so as to obtain a fifth voice signal, by:

and eliminating the fourth voice signal sent to the second terminal in the first voice signal to obtain a fifth voice signal.
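One common way to eliminate a known reference signal from a microphone capture is an adaptive filter. The following is a minimal sketch that assumes a normalized least-mean-squares (NLMS) filter; the source does not name a specific echo cancellation algorithm, and the names `nlms_echo_cancel`, `taps` and `mu` are hypothetical.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract from `mic` the part that is a filtered copy of `ref`.

    mic: first voice signal (local speech plus echo of the fourth signal)
    ref: fourth voice signal that was sent to the second terminal
    Returns the residual, an estimate of the fifth voice signal.
    NLMS is an illustrative assumption, not the method of the source.
    """
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        buf = [r] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        err = m - echo_est                      # residual after echo removal
        norm = sum(b * b for b in buf) + eps    # reference power in the window
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        out.append(err)
    return out
```

The filter needs some samples to converge, so the earliest output still contains echo. A production echo canceller would also need to handle the case where local speech and echo overlap, which this sketch does not address.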

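Optionally, the apparatus is configured to perform target mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal by:

performing the target audio mixing processing on the first voice signal and the second voice signal to obtain a sixth voice signal;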

and under the condition that a fourth voice signal which is played and received on the second terminal is acquired by the first terminal, performing echo cancellation processing on the sixth voice signal according to the fourth voice signal which is sent to the second terminal to obtain the third voice signal, wherein the fourth voice signal is a voice signal acquired by a terminal except the first terminal and the second terminal in the plurality of terminals.

Optionally, the apparatus is configured to perform the target mixing process on the first speech signal and the second speech signal to obtain the sixth speech signal by:

when the first terminal includes N terminals, the N terminals collect N paths of the first voice signal, and the second terminal collects one path of the second voice signal, performing the target mixing processing on the N paths of first voice signals and the one path of second voice signal to obtain one path of the sixth voice signal, wherein N is 1 or a natural number greater than 1, and each of the N terminals is used for collecting one path of the first voice signal.

Optionally, the apparatus is configured to perform echo cancellation processing on the sixth voice signal according to the fourth voice signal sent to the second terminal, so as to obtain the third voice signal, by:

and eliminating the fourth voice signal sent to the second terminal in the sixth voice signal to obtain the third voice signal.

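Optionally, the apparatus is configured to perform target mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal by:

determining the delay duration of the first voice signal relative to the second voice signal according to the first voice signal and the second voice signal;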

adjusting the first voice signal into a seventh voice signal aligned with the second voice signal according to the delay time;

and mixing the seventh voice signal and the second voice signal to obtain a third voice signal.

Optionally, the apparatus is configured to determine a delay time duration of the first speech signal relative to the second speech signal according to the first speech signal and the second speech signal by:

extracting a first set of audio fingerprints and a first set of timestamps corresponding to the first set of audio fingerprints in the first speech signal, and extracting a second set of audio fingerprints and a second set of timestamps corresponding to the second set of audio fingerprints in the second speech signal, wherein the first set of audio fingerprints includes one or more audio fingerprints and the second set of audio fingerprints includes one or more audio fingerprints;

in the event that a first audio fingerprint of the first set of audio fingerprints matches a second audio fingerprint of the second set of audio fingerprints, obtaining a first timestamp of the first set of timestamps corresponding to the first audio fingerprint and a second timestamp of the second set of timestamps corresponding to the second audio fingerprint;

determining the delay time duration as a time interval between the first timestamp and the second timestamp.
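The fingerprint-matching steps above can be sketched as follows. The fingerprint scheme (one bit per frame recording whether the frame energy rises), the use of frame indices as timestamps, and the vote over timestamp differences are all assumptions chosen for brevity; the source does not specify how fingerprints are computed. The names `fingerprints` and `estimate_delay` are hypothetical.

```python
from collections import Counter


def fingerprints(signal, frame=64, bits=16):
    """Return (fingerprint, timestamp) pairs; timestamps are frame indices."""
    energies = [sum(s * s for s in signal[i:i + frame])
                for i in range(0, len(signal) - frame + 1, frame)]
    # One bit per frame boundary: 1 if the energy rises, else 0.
    rising = [1 if b > a else 0 for a, b in zip(energies, energies[1:])]
    return [(tuple(rising[i:i + bits]), i)
            for i in range(len(rising) - bits + 1)]


def estimate_delay(first, second, frame=64, bits=16):
    """Delay of `first` relative to `second` in samples, or None if no match.

    A positive result means `first` lags behind `second`.
    """
    index = {}
    for fp, t2 in fingerprints(second, frame, bits):
        index.setdefault(fp, []).append(t2)
    votes = Counter()
    for fp, t1 in fingerprints(first, frame, bits):
        for t2 in index.get(fp, []):
            votes[t1 - t2] += 1      # interval between matched timestamps
    if not votes:
        return None
    return votes.most_common(1)[0][0] * frame
```

Taking the most frequent timestamp difference, rather than the first match, is a design choice in this sketch that makes the estimate less sensitive to an occasional accidental fingerprint collision. The resolution is one frame.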

Optionally, the apparatus is configured to adjust the first speech signal to a seventh speech signal aligned with the second speech signal according to the delay time duration by:

and shifting the first voice signal forwards or backwards in time by the delay duration, so that the first group of audio fingerprints and the second group of audio fingerprints are aligned on the timestamps, thereby obtaining the seventh voice signal aligned with the second voice signal.
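As an illustration of the shift, the sketch below assumes the delay has already been expressed as a whole number of samples and that signals are plain Python lists; `align` and its `length` parameter are hypothetical names.

```python
def align(first, delay, length=None):
    """Shift `first` by `delay` samples to line it up with the second signal.

    delay > 0: `first` lags, so its leading samples are dropped.
    delay < 0: `first` leads, so silence is prepended.
    If `length` is given, the result is zero-padded or trimmed to it.
    """
    if delay >= 0:
        shifted = list(first[delay:])
    else:
        shifted = [0.0] * (-delay) + list(first)
    if length is not None:
        shifted = (shifted + [0.0] * length)[:length]
    return shifted
```

For example, `align([0, 0, 1, 2, 3], 2, 3)` drops the two leading samples and yields `[1, 2, 3]`, which can then be mixed sample by sample with a three-sample second signal.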

Optionally, the apparatus is further configured to:

acquiring a target setting instruction on a target display interface of the first terminal;

and responding to the target setting instruction, and setting the first terminal as an extended terminal of the second terminal to participate in the target call conference which is carried out on line.

Optionally, the apparatus is configured to obtain a target setting instruction on a target display interface of the first terminal by:

displaying the participant identifications of the plurality of terminals on the target display interface of the first terminal, wherein the participant identifications of the plurality of terminals comprise the participant identification of the second terminal; acquiring the target setting instruction on the target display interface, wherein the target setting instruction is used for selecting a conference participating identifier of the second terminal and indicating that the first terminal is set as an extension terminal of the second terminal to participate in the target conference call which is carried out on line; or

when the first terminal acquires target information pushed by the second terminal, displaying, in response to the target information, a target prompt option on the target display interface of the first terminal, wherein the target prompt option is used for prompting whether the first terminal agrees to serve as an extension terminal of the second terminal to participate in the target call conference conducted online; and acquiring the target setting instruction on the target display interface, wherein the target setting instruction is used for selecting an agreement option among the target prompt options and indicating that the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online.

Optionally, the apparatus is further configured to:

and under the condition that the first terminal and the second terminal carry out near field communication, acquiring the target information pushed by the second terminal at the first terminal, wherein the target information is used for triggering the target display interface to display the target prompt option.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned multi-terminal call method when running.

According to still another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the above-mentioned multi-terminal call method through the computer program.

In the embodiment of the invention, when a first terminal is set as an extension terminal of a second terminal to participate in a target call conference conducted online, a first voice signal collected by the first terminal and a second voice signal collected by the second terminal are acquired, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal; target mixing processing is performed according to the first voice signal and the second voice signal to obtain a third voice signal; and the third voice signal is sent to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing. By setting the first terminal as an extension terminal of the second terminal and mixing the voice signals collected by the two terminals before sending the result to the other terminals in the target call conference, the first terminal assists the second terminal in collecting voice signals. This achieves the technical effects of improving the call effect of the online conference and optimizing the participation experience, and solves the technical problem of poor call effect of online conferences in the related art.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:

fig. 1 is a schematic diagram of an application environment of an alternative multi-terminal call method according to an embodiment of the present invention;

fig. 2 is a flow chart illustrating an alternative multi-terminal call method according to an embodiment of the present invention;

fig. 3 is a schematic diagram of an alternative multi-terminal call method according to an embodiment of the present invention;

fig. 4 is a schematic diagram of another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 5 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 6 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 7 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 8 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 9 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 10 is a schematic diagram of still another alternative multi-terminal call method according to an embodiment of the present invention;

fig. 11 is a schematic structural diagram of an alternative multi-terminal communication device according to an embodiment of the present invention;

fig. 12 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some nouns or terms appearing in the description of the embodiments of the present application are explained as follows:

Co-location multi-extension-terminal scenario: at the same site (the same room, or locations physically close enough to be regarded as the same site), with an original single terminal already connected to the session, a plurality of other terminals are added to the same session in order to improve the quality of voice playing and collection. Here, a terminal refers specifically to a terminal device that supports voice playing and collection in a conference, such as a speaker, a microphone, and the like.

Full duplex conversation: also called bidirectional simultaneous conversation, refers to a conversation interaction mode in which two parties participating in a conversation can speak and listen to voice at the same time.

Duplicate pickup: the same sound is picked up repeatedly by different microphones, so that the listener hears the sound repeated several times. For example, the speaker says: ABCD; if two microphones on site record at the same time and their network delays are not aligned, the listener may hear: AABBCDCD.
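A toy numeric illustration of this effect, assuming two captures of the same speech that differ only by a one-sample delay; `naive_mix` is a hypothetical helper that adds the captures without any delay alignment.

```python
def naive_mix(a, b):
    """Add two captures sample by sample without any delay alignment."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


speech = [1, 0, 0, 2, 0, 0]     # two short sounds from one speaker
mic_near = speech               # captured without extra delay
mic_far = [0] + speech[:-1]     # the same sounds, arriving one sample late

print(naive_mix(mic_near, mic_far))  # [1, 1, 0, 2, 2, 0]
```

Each sound appears twice in the mixed output, which is why the captures need to be aligned before mixing.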

The invention is illustrated below with reference to examples:

According to an aspect of the embodiments of the present invention, a multi-terminal call method is provided. Optionally, in this embodiment, the multi-terminal call method may be applied to a hardware environment formed by a server 101 and a user terminal 103 as shown in fig. 1. As shown in fig. 1, the server 101 is connected to the user terminal 103 through a network and may be configured to provide services to the user terminal or to a client installed on the user terminal, where the client may be a video client, an instant messaging client, a browser client, an education client, a game client, a conference client, or the like. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 101, for example as a conference data storage server. The network may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, WIFI and other networks enabling wireless communication. The user terminal 103 may be a terminal configured with an application capable of conducting a target call conference, and may include but is not limited to at least one of the following: the server may be a single server, or a server cluster composed of a plurality of servers, or a cloud server.

As shown in fig. 1, the multi-terminal call method may be implemented in the server 101 through the following steps:

S1, when a first terminal is set as an extension terminal of a second terminal to participate in a target call conference conducted online, acquiring, on the server 101, a first voice signal collected by the first terminal and a second voice signal collected by the second terminal, where the terminals participating in the target call conference online include a plurality of terminals, and the plurality of terminals include the first terminal and the second terminal;

S2, performing, on the server 101, target mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal;

S3, sending, on the server 101, the third voice signal to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing.

Optionally, in this embodiment, the multi-terminal call method may also be performed by, but is not limited to, a client configured on the server.

The above is merely an example, and the present embodiment is not particularly limited.

Optionally, as an optional implementation manner, as shown in fig. 2, the multi-terminal call method includes:

S202, when a first terminal is set as an extension terminal of a second terminal to participate in a target call conference conducted online, acquiring a first voice signal collected by the first terminal and a second voice signal collected by the second terminal, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal;

S204, performing target mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal;

S206, sending the third voice signal to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing.
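Steps S202 to S206 can be sketched as a small server-side routine. This is an illustrative sketch under stated assumptions: the `Conference` record, the `handle_uplink` function, string terminal identifiers and the pluggable `mixer` callback are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Signal = List[float]


@dataclass
class Conference:
    """Hypothetical server-side record of one target call conference."""
    terminals: Set[str] = field(default_factory=set)
    extension_of: Dict[str, str] = field(default_factory=dict)  # extension -> main

    def group(self, main_id: str) -> Set[str]:
        """The second (main) terminal plus its first (extension) terminals."""
        return {main_id} | {e for e, m in self.extension_of.items() if m == main_id}


def handle_uplink(conf: Conference, main_id: str,
                  uplink: Dict[str, Signal],
                  mixer: Callable[[List[Signal]], Signal]) -> Dict[str, Signal]:
    group = conf.group(main_id)
    # S202: acquire the signals collected by the co-located terminals.
    captured = [uplink[t] for t in sorted(group) if t in uplink]
    # S204: mix them into a single third voice signal.
    third = mixer(captured)
    # S206: send the result only to terminals outside the group.
    return {t: third for t in conf.terminals - group}
```

Excluding the whole group from the recipients reflects the step of sending only to terminals other than the first and second terminals, so the co-located devices do not play back their own room's speech.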

Optionally, in this embodiment, the first terminal may include, but is not limited to, a terminal that can join the target call conference and has a voice signal collection function, for example, a mobile phone (e.g., an Android mobile phone, an iOS mobile phone, etc.), a notebook computer, a tablet computer, a palmtop computer, an MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, and the like. The second terminal may include, but is not limited to, a terminal of the same kind as the first terminal that participates in the target call conference conducted online, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.

Optionally, in this embodiment, the execution subject of the multi-terminal call method may include, but is not limited to, a server, where the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

For example, fig. 3 is a schematic diagram of an optional multi-terminal call method according to an embodiment of the present invention, and as shown in fig. 3, the multi-terminal call method may be applied to the following architectures:

an application 302: the above-mentioned application that runs online to participate in the target call conference;

intelligent cloud mixing (server) 304: the execution subject that performs the multi-terminal call method;

the extension device 306: namely the first terminal;

the master device 308: namely the second terminal.

In particular, the application may be used in, but is not limited to, a cloud conference application scenario:

cloud computing (cloud computing) is a computing model that distributes computing tasks over a pool of resources formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.

As a basic capability provider of cloud computing, a cloud computing resource pool (called an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use.

According to the division of logical functions, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; the SaaS layer can also be deployed directly on the IaaS layer. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is business software of various kinds, such as a web portal, a bulk SMS sender, etc. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.

The cloud conference is an efficient, convenient and low-cost conference form based on cloud computing technology. With simple, easy-to-use operations through an internet interface, a user can quickly and efficiently share voice, data files and videos with teams and clients all over the world, while complex technologies such as the transmission and processing of conference data are handled for the user by the cloud conference service provider.

At present, domestic cloud conferences mainly focus on service content in the SaaS (Software as a Service) mode, including service forms such as telephone, network and video; a video conference based on cloud computing is called a cloud conference.

In the cloud conference era, data transmission, processing and storage are all processed by computer resources of video conference manufacturers, so that users do not need to purchase expensive hardware and install complicated software, and can carry out efficient teleconference only by opening a browser and logging in a corresponding interface.

The above is merely an example, and the present embodiment is not limited in any way.

Optionally, in this embodiment, setting the first terminal as the extension terminal of the second terminal to participate in the target call conference conducted online may include, but is not limited to, setting in the manner shown in fig. 4, where fig. 4 is a schematic diagram of another optional multi-terminal call method according to an embodiment of the present invention. As shown in fig. 4, a display interface 402 is displayed on the first terminal, and the first terminal is set as the extension terminal of the second terminal to participate in the target call conference conducted online by performing an interactive operation on an interactive object on the display interface 402.

Optionally, in this embodiment, the acquiring the first voice signal acquired by the first terminal and the second voice signal acquired by the second terminal may include, but is not limited to, acquiring the first voice signal and the second voice signal that are uplink-transmitted by the first terminal and the second terminal.

Optionally, in this embodiment, the plurality of terminals participating in the target call conference online may include, but are not limited to, the first terminal, the second terminal, and other terminals of the same type as the second terminal that participate in the target call conference online.

According to this embodiment, when the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online, the first voice signal collected by the first terminal and the second voice signal collected by the second terminal are acquired, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal; target mixing processing is performed according to the first voice signal and the second voice signal to obtain a third voice signal; and the third voice signal is sent to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing. By setting the first terminal as an extension terminal of the second terminal and mixing the voice signals collected by the two terminals before sending the result to the other terminals in the target call conference, the first terminal assists the second terminal in collecting voice signals. This achieves the technical effects of improving the call effect of the online conference and optimizing the participation experience, and solves the technical problem of poor call effect of online conferences in the related art.

As an optional scheme,

the acquiring a first voice signal collected by the first terminal and a second voice signal collected by the second terminal includes: under the condition that the first terminal comprises N terminals, acquiring N paths of first voice signals acquired by the N terminals and one path of second voice signals acquired by the second terminal, wherein N is 1 or a natural number greater than 1, and each terminal in the N terminals is used for acquiring one path of first voice signals;

the executing a target audio mixing process according to the first voice signal and the second voice signal to obtain a third voice signal includes: and executing the target sound mixing processing according to the N paths of first voice signals and the one path of second voice signals to obtain one path of third voice signals.

Optionally, in this embodiment, in the case that the first terminal includes N terminals, the obtaining N paths of first voice signals collected by the N terminals and one path of second voice signals collected by the second terminal may include, but is not limited to, configuring the number of the first terminals to be one or more, where in the case that the number of the first terminals is N, each of the N terminals is used to collect one path of first voice signals, and the N terminals are all extension terminals of the second terminal.

Optionally, in this embodiment, performing a target mixing process according to N paths of first voice signals and one path of second voice signals, to obtain one path of third voice signals may include, but is not limited to, performing an echo cancellation operation and a delay alignment operation on each of the N paths of first voice signals and the second voice signal, and performing a mixing process on the processed N paths of voice signals and the second voice signal to obtain the one path of third voice signals, in other words, the target mixing process may include, but is not limited to, voice processing operations of an echo cancellation operation, a delay alignment operation, and a mixing operation.

Fig. 5 is a schematic diagram of another alternative multi-terminal call method according to the embodiment of the present invention, as shown in fig. 5, including but not limited to the following:

S1, the intelligent mixer obtains the second voice signal uploaded by the main microphone (corresponding to the aforementioned second terminal);

S2, the intelligent mixer obtains the N first voice signals uploaded by the N extension microphones (corresponding to the aforementioned N first terminals);

S3, the intelligent mixer performs the target mixing processing on the N first voice signals and the one second voice signal, and outputs one mixed third voice signal.
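Steps S1 to S3 can be illustrated with a minimal sketch. It assumes the streams are already echo-cancelled and delay-aligned, represents each stream as a list of floating-point samples in [-1.0, 1.0], and uses plain summation with hard limiting as the mixing rule. The function name `mix_streams` and the summation rule are assumptions made for illustration; they are not taken from the embodiment.

```python
from typing import List, Sequence


def mix_streams(first_signals: Sequence[Sequence[float]],
                second_signal: Sequence[float]) -> List[float]:
    """Mix N extension-microphone streams and one main-microphone stream.

    Assumption: samples are floats in [-1.0, 1.0] and the streams are
    already aligned in time. The result is one stream (the "third" signal).
    """
    streams = [list(second_signal)] + [list(s) for s in first_signals]
    length = min(len(s) for s in streams)  # mix only the common length
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        # Hard-limit the sum so the output stays in the valid sample range.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

With N = 2, calling `mix_streams([ext_a, ext_b], main)` returns a single list whose length equals that of the shortest input stream.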

Through this embodiment, when the first terminal comprises N terminals, the N first voice signals collected by the N terminals and the one second voice signal collected by the second terminal are acquired, and the target mixing processing is performed on the N first voice signals and the one second voice signal to obtain one third voice signal. The first terminal thereby assists the second terminal in collecting voice signals, which improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an optional scheme, the performing, according to the first speech signal and the second speech signal, a target mixing process to obtain a third speech signal includes:

and performing target sound mixing processing on the fifth voice signal and the second voice signal to obtain a third voice signal.

Optionally, in this embodiment, the fourth voice signal may include, but is not limited to, a voice signal transmitted by a terminal other than the first terminal and the second terminal in the plurality of terminals for playing on the second terminal, and the fourth voice signal sent to the second terminal may include, but is not limited to, a voice signal sent by an audio mixer to the second terminal for playing.

Optionally, in this embodiment, performing echo cancellation processing on the first voice signal according to the fourth voice signal sent to the second terminal to obtain the fifth voice signal may include, but is not limited to, using an echo canceller that takes the fourth voice signal as its reference signal and cancels the corresponding echo from the first voice signal to obtain the fifth voice signal.

Optionally, in this embodiment, the performing the target mixing process on the fifth speech signal and the second speech signal to obtain the third speech signal may include, but is not limited to, performing a delay alignment operation on the fifth speech signal and the second speech signal, and performing a mixing process on the processed fifth speech signal and the second speech signal to obtain the third speech signal, in other words, the target mixing process may include, but is not limited to, speech processing operations of a delay alignment operation and a mixing operation.

For example, in the case where the target mixing process includes a delay alignment operation, the following steps may be included, but not limited to:

determining the delay time length of the fifth voice signal relative to the second voice signal according to the fifth voice signal and the second voice signal;

adjusting the fifth voice signal to be aligned with the second voice signal according to the delay time;

and mixing the fifth voice signal aligned with the second voice signal and the second voice signal to obtain a third voice signal.
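The three steps above (determine the delay, align, mix) can be sketched as follows. Note that this sketch estimates the delay with a brute-force cross-correlation search, which is a stand-in chosen for brevity; the embodiment describes an audio-fingerprint approach later in the text. The names `estimate_lag`, `align_to_main` and `mix`, and the equal-weight averaging, are assumptions for illustration only.

```python
from typing import List


def estimate_lag(ext: List[float], main: List[float], max_lag: int) -> int:
    """Return the lag (in samples) at which ext best correlates with main.

    A positive result means ext is delayed relative to main.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, m in enumerate(main):
            j = i + lag
            if 0 <= j < len(ext):
                score += ext[j] * m
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_to_main(ext: List[float], lag: int) -> List[float]:
    """Shift ext by lag samples so that it lines up with main (length kept)."""
    if lag >= 0:
        return ext[lag:] + [0.0] * lag       # ext is late: drop its head
    return [0.0] * (-lag) + ext[:lag]        # ext is early: pad its head


def mix(a: List[float], b: List[float]) -> List[float]:
    """Equal-weight mix of two aligned signals (assumed mixing rule)."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


# Toy data: the "fifth" signal is the main signal delayed by two samples.
main = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
fifth = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = estimate_lag(fifth, main, max_lag=3)
third = mix(align_to_main(fifth, lag), main)
```

In this toy case the estimated lag is 2 samples, and after alignment the two signals coincide, so the mix reproduces the main signal without a doubled copy.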

Optionally, in this embodiment, fig. 6 is a schematic diagram of still another optional multi-terminal call method according to an embodiment of the present invention. As shown in fig. 6, the method may include, but is not limited to, the following: when the fourth voice signal played on the second terminal (the main device playback signal shown in fig. 6) is picked up by the first terminal (the extension microphone shown in fig. 6), echo cancellation processing is performed on the first voice signal (all voice signals collected by the extension microphone shown in fig. 6) according to the fourth voice signal sent to the second terminal, to obtain a fifth voice signal, where the fourth voice signal is a voice signal collected by a terminal other than the first terminal and the second terminal among the plurality of terminals; target mixing processing is then performed on the fifth voice signal and the second voice signal to obtain a third voice signal, and the third voice signal is sent to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing.

According to this embodiment, when the fourth voice signal played on the second terminal is picked up by the first terminal, echo cancellation processing is performed on the first voice signal according to the fourth voice signal sent to the second terminal to obtain a fifth voice signal, where the fourth voice signal is a voice signal collected by a terminal other than the first terminal and the second terminal among the plurality of terminals; target mixing processing is then performed on the fifth voice signal and the second voice signal to obtain a third voice signal. Performing echo cancellation on the first voice signal allows the first terminal to assist the second terminal in collecting voice signals while improving the quality of those signals, which improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an optional scheme, the performing echo cancellation processing on the first voice signal according to the fourth voice signal sent to the second terminal to obtain a fifth voice signal includes:

and eliminating the fourth voice signal sent to the second terminal in the first voice signal to obtain the fifth voice signal.

Optionally, in this embodiment, the removing, in the first voice signal, the fourth voice signal sent to the second terminal, and the obtaining of the fifth voice signal may be understood as removing, from all voice signals collected by the first terminal, a voice signal sent to the second terminal for playing, so as to obtain the fifth voice signal, which is used for performing subsequent audio mixing processing.
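One common way to remove a known playback signal from a microphone signal is an adaptive filter. The sketch below uses a normalized least-mean-squares (NLMS) filter; the embodiment does not name a specific algorithm, so NLMS, the function name `nlms_echo_cancel`, and the parameters `taps`, `mu` and `eps` are all assumptions for illustration. Here `mic` plays the role of the first voice signal, `ref` the fourth voice signal sent to the second terminal, and the return value the fifth voice signal.

```python
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Subtract an adaptive estimate of the echo of ref from mic.

    Assumption: the echo path can be modelled by a short FIR filter of
    length `taps`; real rooms need far longer filters and extra handling
    for double-talk, which this sketch omits.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent reference samples first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate         # what remains after echo removal
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

If `mic` contains only a delayed, attenuated copy of `ref`, the residual decays toward zero as the filter adapts; speech from a local participant that is uncorrelated with `ref` would remain in the output.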

According to the embodiment, the fourth voice signal sent to the second terminal is eliminated from the first voice signal to obtain the fifth voice signal, and the voice signals used for playing and sent to the second terminal in all the voice signals collected by the first terminal are removed, so that the multi-terminal conversation method can be applied to a full-duplex application scene, the purpose of using the first terminal to assist the second terminal in collecting the voice signals is achieved, the conversation effect of an online conference is improved, the technical effect of participation experience of the online conference is optimized, and the technical problem that the conversation effect of the online conference is poor in the related technology is solved.

Optionally, in this embodiment, the performing the target mixing processing on the first voice signal and the second voice signal to obtain the sixth voice signal may include, but is not limited to, performing a delay alignment operation on the first voice signal and the second voice signal, and performing mixing processing on the processed first voice signal and the processed second voice signal to obtain the sixth voice signal, in other words, the target mixing processing may include, but is not limited to, voice processing operations of a delay alignment operation and a mixing operation.

Optionally, in this embodiment, when the fourth voice signal played on the second terminal is picked up by the first terminal, echo cancellation processing is performed on the sixth voice signal according to the fourth voice signal sent to the second terminal to obtain the third voice signal, where the fourth voice signal is a voice signal collected by a terminal other than the first terminal and the second terminal among the plurality of terminals; the third voice signal is then sent to the terminals other than the first terminal and the second terminal among the plurality of terminals for playing.

As an optional solution, the performing the target mixing process on the first voice signal and the second voice signal to obtain the sixth voice signal includes:

when the first terminal includes N terminals, the N terminals collect N first voice signals, and the second terminal collects one second voice signal, performing the target mixing processing on the N first voice signals and the one second voice signal to obtain one sixth voice signal, where N is 1 or a natural number greater than 1, and each of the N terminals is used to collect one first voice signal.

Optionally, in this embodiment, the performing the target mixing processing on the N paths of first voice signals and the one path of second voice signals to obtain the one path of sixth voice signals may include, but is not limited to, performing delay alignment processing on the N paths of first voice signals and the one path of second voice signals, and specifically, may include, but is not limited to, the following steps:

determining the delay duration of each of the N first voice signals relative to the one second voice signal according to the N first voice signals and the one second voice signal;

adjusting the N first voice signals, according to the delay durations, into N first voice signals aligned with the one second voice signal;

and mixing the N aligned first voice signals with the one second voice signal to obtain one sixth voice signal.

Through this embodiment, when the first terminal comprises N terminals, the N terminals collect N first voice signals, and the second terminal collects one second voice signal, the target mixing processing is performed on the N first voice signals and the one second voice signal to obtain one sixth voice signal, where N is 1 or a natural number greater than 1 and each of the N terminals is used to collect one first voice signal. The first terminal thereby assists the second terminal in collecting voice signals, which improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an optional scheme, the performing echo cancellation processing on the sixth voice signal according to the fourth voice signal sent to the second terminal to obtain the third voice signal includes:

Optionally, in this embodiment, eliminating the fourth voice signal sent to the second terminal from the sixth voice signal to obtain the third voice signal may be understood as removing the voice signal that was sent to the second terminal for playing and is contained in the voice signals collected by the first terminal, to obtain the third voice signal, which is then sent to the other terminals for playing.

According to the embodiment, the fourth voice signal sent to the second terminal is eliminated from the sixth voice signal to obtain the third voice signal, and the voice signal used for playing and sent to the second terminal in all the voice signals collected by the first terminal is removed, so that the multi-terminal call method can be applied to a full-duplex application scene, the purpose of using the first terminal to assist the second terminal in collecting the voice signals is achieved, the call effect of an online conference is improved, the technical effect of participation experience of the online conference is optimized, and the technical problem that the call effect of the online conference is poor in the related technology is solved.

adjusting the first voice signal, according to the delay duration, into a seventh voice signal aligned with the second voice signal;

Optionally, in this embodiment, the delay duration of the first voice signal relative to the second voice signal may be estimated by, but is not limited to, a delay estimation method. For example, as shown in fig. 6, the delay estimation module estimates the delay Tn of each extension microphone (corresponding to the aforementioned first terminal) relative to the main microphone (corresponding to the aforementioned second terminal).

Optionally, in this embodiment, adjusting the first voice signal into the seventh voice signal aligned with the second voice signal according to the delay duration may include, but is not limited to, using a delay alignment method. For example, as shown in fig. 6, the delay alignment module dynamically adjusts the delay of each extension microphone's voice with reference to the estimated delay Tn, so that at any given moment the content of each extension microphone's voice is consistent with the content of the main microphone's voice signal. This compensates for the delay misalignment caused by the different uplink network conditions of the main microphone and the extension microphones. The delay alignment method may be based on, but is not limited to, audio fingerprint technology.

Optionally, in this embodiment, the mixing the seventh speech signal and the second speech signal to obtain the third speech signal may include, but is not limited to, as shown in fig. 7, where fig. 7 is a schematic diagram of yet another optional multi-terminal call method according to an embodiment of the present invention, where the method includes, but is not limited to, the following steps:

1) The signals collected by the multiple microphones (the main microphone and the extension microphones; four paths are illustrated in fig. 7) are sent to a pre-mixer (PRE MIX), which calculates the total energy without gain adjustment;

2) An energy ratio calculation module (Ratio calculator) calculates the ratio of each microphone's energy to the total energy given in step 1);

3) A gain control module (Gain Control) adjusts the energy of each microphone based on the energy ratio calculated in step 2);

4) The energy-adjusted audio from step 3) is mixed and superimposed, thereby mixing the seventh voice signal with the second voice signal to obtain the third voice signal.
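The four steps of fig. 7 can be sketched for a single audio frame as follows. The text does not state how the gain is derived from the energy ratio, so this sketch assumes the simplest mapping, gain equal to the ratio, which favors the microphone with the strongest signal. It also assumes that the "total energy" is the sum of the per-microphone frame energies. The function names are hypothetical.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples of one frame."""
    return sum(x * x for x in frame)


def ratio_gain_mix(frames: Sequence[Sequence[float]],
                   eps: float = 1e-12) -> List[float]:
    """Mix one frame per microphone using energy-ratio gains.

    frames[0] may be the main microphone and frames[1:] the aligned
    extension microphones; all frames must have the same length.
    """
    energies = [frame_energy(f) for f in frames]
    total = sum(energies)                      # step 1: total energy
    if total < eps:                            # silence on every path
        return [0.0] * len(frames[0])
    ratios = [e / total for e in energies]     # step 2: energy ratios
    # Step 3 (assumed rule): scale each path by its own energy ratio.
    scaled = [[r * x for x in f] for r, f in zip(ratios, frames)]
    # Step 4: superimpose the gain-adjusted paths sample by sample.
    return [sum(column) for column in zip(*scaled)]
```

Under this assumed rule, a path that carries all of the energy passes through unchanged, while two paths of equal energy are each halved before being summed.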

According to this embodiment, the delay duration of the first voice signal relative to the second voice signal is determined according to the first voice signal and the second voice signal, the first voice signal is adjusted into a seventh voice signal aligned with the second voice signal according to the delay duration, and the seventh voice signal and the second voice signal are mixed to obtain the third voice signal. Aligning the first voice signal with the second voice signal avoids capturing doubled sound, so that the first terminal can assist the second terminal in collecting voice signals. This improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an alternative, the determining a delay time duration of the first voice signal relative to the second voice signal according to the first voice signal and the second voice signal includes:

Optionally, in this embodiment, the determining the delay time duration of the first voice signal relative to the second voice signal according to the first voice signal and the second voice signal may include, but is not limited to, being based on an audio fingerprint technology.

Specifically, as shown in fig. 8, after the data streams of the main microphone and the extension microphone are acquired, the audio fingerprints of the two streams (corresponding to the aforementioned first group of audio fingerprints and second group of audio fingerprints) are analyzed and compared to determine whether the sound collected by the main terminal device is similar to the sound collected by the extension terminal (corresponding to the aforementioned case where a first audio fingerprint in the first group matches a second audio fingerprint in the second group); if so, the relative delay between the two sounds is determined.

An audio fingerprint is a compact digital signature with low dimensionality and uniqueness, formed by analyzing and extracting the important features of a sound; it is commonly used in fields such as fast song search and music copyright protection. In this application it is mainly used to compare fingerprints of call speech. Considering the real-time communication scenario, the duration of the speech segment from which a fingerprint is extracted is shortened, multiple sub-fingerprints are extracted from one frame of data, and the number of features is reduced in dimension.

When sub-fingerprints whose matching degree meets the requirement appear in the two fingerprint pools, the timestamps corresponding to the two matched sub-fingerprints are recorded (corresponding to obtaining the first timestamp corresponding to the first audio fingerprint in the first group of timestamps and the second timestamp corresponding to the second audio fingerprint in the second group of timestamps).

Finally, the delay decision module combines the short-term fingerprint comparison similarity, the timestamps of the similar fingerprints, the long-term statistical probability of high-similarity fingerprint occurrences, and the like, to give a delay estimate (corresponding to the aforementioned time interval).
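The matching and decision stages can be illustrated with a simplified sketch. It assumes each sub-fingerprint is already available as an integer bit pattern paired with a timestamp in milliseconds (fingerprint extraction itself is not shown), treats two sub-fingerprints as matching when their Hamming distance is small, and lets the most frequent timestamp difference win as a crude stand-in for the statistical decision described above. The names `hamming` and `estimate_delay_ms` and the threshold `max_distance` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Fingerprint = Tuple[int, int]  # (timestamp in ms, sub-fingerprint bits)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sub-fingerprints."""
    return bin(a ^ b).count("1")


def estimate_delay_ms(ext_pool: List[Fingerprint],
                      main_pool: List[Fingerprint],
                      max_distance: int = 2) -> Optional[int]:
    """Vote on the delay of the extension stream relative to the main one.

    Every sufficiently similar pair contributes one vote for the
    difference of its timestamps; the most frequent difference wins.
    Returns None when no pair matches.
    """
    votes: Counter = Counter()
    for t_ext, fp_ext in ext_pool:
        for t_main, fp_main in main_pool:
            if hamming(fp_ext, fp_main) <= max_distance:
                votes[t_ext - t_main] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


main_pool = [(0, 0b10101010), (20, 0b11110000),
             (40, 0b00001111), (60, 0b11001100)]
# Same content observed 40 ms later, with a few bit errors.
ext_pool = [(40, 0b10101011), (60, 0b11110000),
            (80, 0b00001111), (100, 0b11001101)]
delay = estimate_delay_ms(ext_pool, main_pool)
```

A positive result means the extension stream lags the main stream. Voting over many pairs makes the estimate tolerant of occasional false matches, which is the role the text assigns to the long-term statistics.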

Through this embodiment, a first group of audio fingerprints and a first group of timestamps corresponding to the first group of audio fingerprints are extracted from the first voice signal, and a second group of audio fingerprints and a second group of timestamps corresponding to the second group of audio fingerprints are extracted from the second voice signal, where each group includes one or more audio fingerprints. When a first audio fingerprint in the first group matches a second audio fingerprint in the second group, the first timestamp corresponding to the first audio fingerprint and the second timestamp corresponding to the second audio fingerprint are obtained, and the delay duration is determined as the time interval between the first timestamp and the second timestamp. Aligning the first voice signal with the second voice signal in this way avoids capturing doubled sound, so that the first terminal can assist the second terminal in collecting voice signals. This improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

and mixing the first voice signal and the second voice signal to obtain the third voice signal.

Optionally, in this embodiment, the mixing the first voice signal and the second voice signal to obtain the third voice signal may include, but is not limited to, as shown in fig. 7, where fig. 7 is a schematic diagram of yet another optional multi-terminal call method according to an embodiment of the present invention, where the method includes, but is not limited to, the following steps:

4) And mixing and superposing the audio frequency after the energy adjustment in the step 3) to realize the sound mixing of the first voice signal and the second voice signal and obtain a third voice signal.

As an optional solution, the adjusting the first voice signal to a seventh voice signal aligned with the second voice signal according to the delay time duration includes:

Optionally, in this embodiment, this may include, but is not limited to, the delay alignment module shifting the first voice signal forward or backward in time by the delay duration, so that the first group of audio fingerprints and the second group of audio fingerprints are aligned on their timestamps, thereby obtaining the seventh voice signal aligned with the second voice signal. Specifically, the relative delay between the audio of the main device and the audio of the extension device is adjusted according to the delay estimated by the delay estimation module so that the audio content is aligned, which avoids the doubled sound that would otherwise be captured when multiple microphones are close to one another. The main scheme is as follows:

The delay of the main device is taken as the reference, and the audio collected by the extension device is aligned to the main device. Audio data buffers are established for the main device and the extension device to buffer audio data of a certain length, generally 20-300 ms, which can be adjusted according to the usage scenario. When the audio data of the extension device arrives at the mixing server earlier than that of the main device, the data that arrives first is buffered, and the buffering length is the estimated relative delay between the extension device and the main device; when the audio data of the extension device arrives at the mixing server later than that of the main device, the earlier-arrived data in the buffer of the extension device is cleared, and the length of the cleared data is the delay of the main device relative to the extension device.
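The buffering rule above can be sketched at frame granularity. The sketch assumes fixed 20 ms frames, caps the shift at the 300 ms upper bound mentioned in the text, and models "holding back" early data by prepending placeholder silence frames to the extension stream. The function name `align_extension`, the sign convention of `lead_ms`, and the use of `None` as a silence placeholder are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def align_extension(ext_frames: Sequence[object], lead_ms: int,
                    frame_ms: int = 20, max_buffer_ms: int = 300,
                    silence: Optional[object] = None) -> List[object]:
    """Align the extension stream to the main stream (the reference).

    lead_ms > 0: the extension audio arrives earlier than the main audio,
                 so it is held back by that many milliseconds.
    lead_ms < 0: the extension audio arrives later, so its oldest frames
                 are discarded to catch up.
    Only the extension stream is modified; the main stream is untouched.
    """
    shift = min(abs(lead_ms), max_buffer_ms) // frame_ms
    frames = list(ext_frames)
    if lead_ms > 0:
        return [silence] * shift + frames
    if lead_ms < 0:
        return frames[shift:]
    return frames
```

For example, `align_extension(frames, 40)` delays the extension stream by two frames, while `align_extension(frames, -20)` drops its oldest frame.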

In this scheme, the main device is used as the reference and only the data stream of the extension device is adjusted, which keeps the overall audio stream smooth and avoids phenomena such as noise caused by packet loss. In addition, an implementation that takes any extension device as the reference is similar to the above.

Through this embodiment, the first voice signal is shifted forward or backward in time by the delay duration so that the first group of audio fingerprints and the second group of audio fingerprints are aligned on their timestamps, yielding the seventh voice signal aligned with the second voice signal. Aligning the first voice signal with the second voice signal avoids capturing doubled sound, so that the first terminal can assist the second terminal in collecting voice signals. This improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an optional solution, the method further comprises:

Optionally, in this embodiment, the target display interface of the first terminal may include, but is not limited to, as shown in fig. 4, where obtaining the target setting instruction on the target display interface may include, but is not limited to, performing a target interaction operation on an interaction object "yes", as the target setting instruction, so as to respond to the target setting instruction, and set the first terminal as an extension terminal of the second terminal to participate in the target conference call that is performed online.

It should be noted that fig. 9 is a schematic diagram of still another optional multi-terminal call method according to an embodiment of the present invention. As shown in fig. 9, the first terminal is taken as the extension device and the second terminal as the main device. With reference to fig. 4, when user E is located far from the second terminal, a target setting instruction is acquired on the target display interface of the first terminal, and in response to the target setting instruction, the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online. User B, user C, and user D are users whose voice signals can be clearly collected by the second terminal.

Through this embodiment, a target setting instruction is acquired on the target display interface of the first terminal, and in response to the target setting instruction, the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online. The first terminal is used as an extension device of the second terminal at a position that is far from the second terminal and cannot be effectively covered by it, and the target setting instruction acquired on the target display interface of the first terminal sets the first terminal as the extension terminal. The first terminal thereby assists the second terminal in collecting voice signals, which improves the call effect of the online conference, optimizes the participation experience, and solves the technical problem in the related art that the call effect of online conferences is poor.

As an optional scheme, the obtaining a target setting instruction on a target display interface of the first terminal includes:

Optionally, in this embodiment, the participant identifier may include, but is not limited to, an identifier, displayed on the first terminal, of the account logged in on the second terminal, where the account is an account of the application in which the target call conference is held.

For example, as shown in fig. 4, the avatar of user A is the participant identifier. When the avatar of user A is selected, a target prompt option "Join the meeting as an extension of user A?" is displayed on the target display interface, and the target setting instruction is then acquired, indicating that the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online.

It should be noted that instructing the first terminal to be set as an extension terminal of the second terminal to participate in the target call conference conducted online may include, but is not limited to, one step or two steps. For example, in the first step, the user avatar is double-clicked; in the second step, "yes" is selected to indicate that the first terminal is set as an extension terminal of the second terminal to participate in the target call conference conducted online.

As an optional solution, the method further comprises:

Optionally, in this embodiment, the near field communication between the first terminal and the second terminal may include, but is not limited to, methods such as NFC proximity or ultrasonic watermarking: the second terminal broadcasts an identification code, and the first terminal automatically responds to the identification code and prompts the user whether to join the conference as an extension device.

For example, fig. 10 is a schematic diagram of still another optional multi-terminal call method according to an embodiment of the present invention. As shown in fig. 10, when the near field communication method is configured as NFC identification, the main device, acting as the second terminal, broadcasts an identification code as the target information. When user E brings the extension device, acting as the first terminal, close to the main device, the extension device acquires the identification code pushed by the second terminal as the target information and displays a target prompt option, which may include, but is not limited to, the one shown in fig. 4.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

According to another aspect of the embodiment of the invention, a multi-terminal calling device for implementing the multi-terminal calling method is further provided. As shown in fig. 11, the apparatus includes:

an obtaining module 1102, configured to obtain a first voice signal acquired by a first terminal and a second voice signal acquired by a second terminal when the first terminal is set as an extension terminal of the second terminal to participate in a target call conference that is performed online, where the terminals participating in the target call conference online include a plurality of terminals, and the plurality of terminals include the first terminal and the second terminal;

a sound mixing module 1104, configured to perform target sound mixing processing according to the first voice signal and the second voice signal to obtain a third voice signal;

a sending module 1106, configured to send the third voice signal to a terminal, other than the first terminal and the second terminal, of the multiple terminals for playing.
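As a rough illustration of how the three modules cooperate, the following sketch chains an obtaining step, a mixing step and a sending step. It is written under simplifying assumptions: signals are plain lists of float samples, the terminal identifiers "A" to "D" are hypothetical, and the sample-wise average is only a placeholder for the target sound mixing processing, whose details are given elsewhere in the description.

```python
from typing import Dict, List, Tuple

Signal = List[float]


def obtain(captured: Dict[str, Signal], first_id: str, second_id: str) -> Tuple[Signal, Signal]:
    """Obtaining module: fetch the signals captured by the extension and main terminals."""
    return captured[first_id], captured[second_id]


def mix(first: Signal, second: Signal) -> Signal:
    """Mixing module (placeholder): sample-wise average over the common length."""
    length = min(len(first), len(second))
    return [(first[i] + second[i]) / 2.0 for i in range(length)]


def send(third: Signal, terminals: List[str], first_id: str, second_id: str) -> Dict[str, Signal]:
    """Sending module: deliver the mix to every terminal except the first and second."""
    return {t: third for t in terminals if t not in (first_id, second_id)}


terminals = ["A", "B", "C", "D"]          # A = first terminal, B = second terminal
captured = {"A": [0.5, 0.25], "B": [0.0, 0.25]}

first, second = obtain(captured, "A", "B")
third = mix(first, second)
delivered = send(third, terminals, "A", "B")
```

The first and second terminals are excluded from delivery because they share a room; playing the mix back to them would reintroduce their own speech.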

As an optional solution, the apparatus is configured to acquire the first voice signal acquired by the first terminal and the second voice signal acquired by the second terminal in the following manner: when the first terminal comprises N terminals, acquiring N paths of first voice signals acquired by the N terminals and one path of second voice signal acquired by the second terminal, where N is a natural number greater than or equal to 1, and each of the N terminals is used for acquiring one path of first voice signal.
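To make the N-path case concrete, the sketch below combines N first voice signals with one second voice signal. The combination rule shown here (sample-wise sum, zero padding of shorter paths, clipping to the range [-1, 1]) is an assumption chosen for illustration and is not stated to be the target sound mixing processing of the embodiment; the function name `mix_paths` is likewise hypothetical.

```python
from typing import List

Signal = List[float]


def mix_paths(first_signals: List[Signal], second_signal: Signal) -> Signal:
    """Combine N extension-terminal paths with one main-terminal path."""
    paths = first_signals + [second_signal]
    length = max(len(p) for p in paths)
    mixed = []
    for i in range(length):
        # Shorter paths contribute silence (0.0) once they run out of samples.
        total = sum(p[i] if i < len(p) else 0.0 for p in paths)
        # Clip so the result stays inside the normalized sample range.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# N = 2 extension terminals plus the main terminal.
result = mix_paths([[0.5, 0.5], [0.75, -0.25]], [0.25, 0.0])
```

With N = 1 the function reduces to mixing a single first voice signal with the second voice signal.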

As an alternative, the apparatus is configured to perform target mixing processing according to the first speech signal and the second speech signal to obtain a third speech signal by:

As an optional scheme, the apparatus is configured to perform echo cancellation processing on the first voice signal according to the fourth voice signal sent to the second terminal, so as to obtain a fifth voice signal:

As an alternative, the apparatus is configured to perform the target mixing process on the first speech signal and the second speech signal to obtain the sixth speech signal by:

As an optional scheme, the apparatus is configured to perform echo cancellation processing on the sixth voice signal according to the fourth voice signal sent to the second terminal, so as to obtain the third voice signal:

As an alternative, the apparatus is configured to determine a delay time duration of the first speech signal relative to the second speech signal according to the first speech signal and the second speech signal by:

determining the delay time length as a time interval between the first time stamp and the second time stamp.

As an alternative, the apparatus is configured to adjust the first speech signal to a seventh speech signal aligned with the second speech signal according to the delay time duration by:

and moving the first voice signal forwards or backwards according to the time delay duration, so that the first group of audio fingerprints and the second group of audio fingerprints are aligned on the time stamps, and the seventh voice signal aligned with the second voice signal is obtained.
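The delay determination and alignment steps can be sketched as follows. The sketch assumes that the first and second time stamps (in milliseconds) of a matching audio fingerprint are already known and that both signals share one sample rate. How the fingerprints are extracted and matched is outside its scope, and the function names are hypothetical.

```python
from typing import List

Signal = List[float]


def delay_in_samples(first_ts_ms: float, second_ts_ms: float, sample_rate_hz: int) -> int:
    """Delay of the first signal relative to the second, as a sample count."""
    return round((first_ts_ms - second_ts_ms) * sample_rate_hz / 1000)


def align(first: Signal, delay: int) -> Signal:
    """Shift the first signal by `delay` samples, keeping its original length."""
    if delay >= 0:
        shifted = first[delay:]               # first signal lags: move it earlier
    else:
        shifted = [0.0] * (-delay) + first    # first signal leads: move it later
    shifted = shifted[:len(first)]
    return shifted + [0.0] * (len(first) - len(shifted))


# The same event appears 2 ms later in the first signal than in the second.
first = [0.0, 0.0, 1.0, 2.0, 3.0]
second = [1.0, 2.0, 3.0, 0.0, 0.0]

delay = delay_in_samples(first_ts_ms=2, second_ts_ms=0, sample_rate_hz=1000)
seventh = align(first, delay)
```

Samples shifted out of range are discarded and the vacated positions are filled with silence, so the aligned signal can be mixed sample by sample with the second voice signal.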

As an optional solution, the apparatus is further configured to:

As an optional scheme, the apparatus is configured to obtain a target setting instruction on a target display interface of the first terminal by:

As an optional solution, the apparatus is further configured to:

According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the multi-terminal call method, where the electronic device may be the terminal device or the server shown in fig. 1. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 12, the electronic device comprises a memory 1202 and a processor 1204, the memory 1202 having a computer program stored therein, the processor 1204 being arranged to perform the steps of any of the above method embodiments by means of the computer program.

Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

S1, under the condition that a first terminal is set as an extension terminal of a second terminal to participate in a target call conference which is carried out online, acquiring a first voice signal acquired by the first terminal and a second voice signal acquired by the second terminal, wherein the terminals which participate in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal;

S2, according to the first voice signal and the second voice signal, executing target sound mixing processing to obtain a third voice signal;

S3, sending the third voice signal to the terminals except the first terminal and the second terminal in the plurality of terminals for playing.

Optionally, it can be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 12 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 12, or have a different configuration from that shown in fig. 12.

The memory 1202 may be used to store software programs and modules, such as program instructions/modules corresponding to the multi-terminal call method and apparatus in the embodiments of the present invention, and the processor 1204 executes various functional applications and data processing by running the software programs and modules stored in the memory 1202, that is, implements the multi-terminal call method described above. The memory 1202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 can further include memory located remotely from the processor 1204, which can be connected to a terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 may be used to store information such as voice signals, but is not limited thereto. As an example, as shown in fig. 12, the memory 1202 may include, but is not limited to, the obtaining module 1102, the mixing module 1104, and the sending module 1106 of the multi-terminal call device. In addition, the memory may further include, but is not limited to, other module units of the multi-terminal call device, which are not described in detail in this example.

Optionally, the transmitting device 1206 is configured to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmitting device 1206 includes a Network Interface Controller (NIC) that can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In another example, the transmitting device 1206 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

In addition, the electronic device further includes: a display 1208, configured to display the target display interface; and a connection bus 1210 for connecting the respective module parts in the above-described electronic apparatus.

In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes may form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal, or another electronic device, can become a node in the blockchain system by joining the peer-to-peer network.

According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the multi-terminal call method provided in the various optional implementations described above. The computer program is arranged to perform, when executed, the steps of any of the above method embodiments.

Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:

Optionally, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A multi-terminal call method is characterized by comprising the following steps:

under the condition that a first terminal is set as an extension terminal of a second terminal to participate in a target call conference which is carried out online, acquiring a first voice signal acquired by the first terminal and a second voice signal acquired by the second terminal, wherein the terminals participating in the target call conference online comprise a plurality of terminals, and the plurality of terminals comprise the first terminal and the second terminal;

according to the first voice signal and the second voice signal, executing target sound mixing processing to obtain a third voice signal;

and sending the third voice signal to terminals except the first terminal and the second terminal in the plurality of terminals for playing.

2. The method of claim 1,

3. The method according to claim 1, wherein the performing target mixing processing according to the first speech signal and the second speech signal to obtain a third speech signal comprises:

4. The method of claim 3, wherein performing echo cancellation processing on the first voice signal according to the fourth voice signal sent to the second terminal to obtain a fifth voice signal comprises:

5. The method according to claim 1, wherein the performing target mixing processing according to the first speech signal and the second speech signal to obtain a third speech signal comprises:

and under the condition that a fourth voice signal which is played and received on the second terminal is acquired by the first terminal, carrying out echo cancellation processing on the sixth voice signal according to the fourth voice signal which is sent to the second terminal to obtain the third voice signal, wherein the fourth voice signal is a voice signal acquired by a terminal except the first terminal and the second terminal in the plurality of terminals.

6. The method according to claim 5, wherein the performing the target mixing process on the first speech signal and the second speech signal to obtain the sixth speech signal comprises:

7. The method of claim 5, wherein the performing echo cancellation processing on the sixth voice signal according to the fourth voice signal sent to the second terminal to obtain the third voice signal comprises:

8. The method according to claim 1, wherein the performing target mixing processing according to the first speech signal and the second speech signal to obtain a third speech signal comprises:

9. The method of claim 8, wherein determining the delay duration of the first speech signal relative to the second speech signal based on the first speech signal and the second speech signal comprises:

10. The method of claim 9, wherein the adjusting the first speech signal to a seventh speech signal aligned with the second speech signal according to the delay time duration comprises:

11. The method according to any one of claims 1 to 10, further comprising:

12. The method according to claim 11, wherein the obtaining of the target setting instruction on the target display interface of the first terminal comprises:

displaying the participant identifications of the plurality of terminals on the target display interface of the first terminal, wherein the participant identifications of the plurality of terminals comprise the participant identification of the second terminal; acquiring the target setting instruction on the target display interface, wherein the target setting instruction is used for selecting a conference participating identifier of the second terminal and indicating that the first terminal is set as an extension terminal of the second terminal to participate in the target call conference which is carried out on line; or

13. The method of claim 12, further comprising:

14. A computer-readable storage medium, comprising a stored program, wherein the program is executable by a terminal device or a computer to perform the method of any one of claims 1 to 13.

15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of the claims 1 to 13 by means of the computer program.