CN114531425A - Processing method and processing device - Google Patents

Processing method and processing device

Info

Publication number
CN114531425A
Authority
CN
China
Prior art keywords
electronic device
information
target information
target
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111672799.5A
Other languages
Chinese (zh)
Inventor
刘扬
刘金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202111672799.5A priority Critical patent/CN114531425A/en
Publication of CN114531425A publication Critical patent/CN114531425A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066Session management
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • H04L65/403Arrangements for multi-party communication, e.g. for conferences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Abstract

The embodiment of the application discloses a processing method and a processing device, wherein the method comprises the following steps: after the first electronic device establishes communication connection with the second electronic device, first target information in a current space is obtained, wherein the first target information comes from at least one third electronic device connected with the first electronic device; performing first processing on the first target information to obtain second target information; outputting the second target information, or sending the second target information to a second electronic device for outputting; wherein the output effect of the second target information is better than the output effect of the first target information.

Description

Processing method and processing device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a processing method and a processing apparatus.
Background
With the rapid development of internet technology, online audio and video conferences become a common conference form in daily work of people, and a user can initiate an online conference through an application program with an online conference function and invite other users to participate in the online conference.
In conference and similar scenarios, when multiple employees in the same space participate in the same conference, each employee usually joins with a personal notebook computer. However, when employees attend the conference with their own notebook computers and similar devices, problems such as mutual information interference arise, and the conference effect is poor.
Disclosure of Invention
The technical scheme of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a processing method, including:
after the first electronic device establishes communication connection with the second electronic device, first target information in a current space is obtained, wherein the first target information comes from at least one third electronic device connected with the first electronic device;
performing first processing on the first target information to obtain second target information;
outputting the second target information, or sending the second target information to a second electronic device for outputting;
wherein the output effect of the second target information is better than the output effect of the first target information.
In a second aspect, an embodiment of the present application provides a processing apparatus, including:
the acquisition unit is configured to acquire first target information in a current space after the first electronic device establishes communication connection with the second electronic device, wherein the first target information is from at least one third electronic device connected with the first electronic device;
the processing unit is configured to perform first processing on the first target information to obtain second target information;
an output unit configured to output the second target information, or to send the second target information to the second electronic device for output; wherein the output effect of the second target information is better than the output effect of the first target information.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing a computer program capable of running on the processor;
a processor for performing the processing method according to the first aspect when running the computer program.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program, where the computer program is executed by at least one processor to implement the processing method according to the first aspect.
According to the scheme provided by the embodiment of the application, after the first electronic device establishes a communication connection with the second electronic device, first target information in a current space is obtained, wherein the first target information comes from at least one third electronic device connected with the first electronic device; first processing is performed on the first target information to obtain second target information; the second target information is output, or sent to the second electronic device for output; wherein the output effect of the second target information is better than that of the first target information. In this way, based on the connection between the first electronic device and the third electronic devices, the first target information is received and processed into the second target information, which is then output by the first electronic device or the second electronic device. Because the output effect of the second target information after the first processing is better than that of the first target information, the solution can adapt to multiple scenarios, and problems such as mutual interference between audio information or video information from multiple devices and interference among the users of the plurality of third electronic devices are avoided, thereby improving the conference effect and the conference quality.
Drawings
Fig. 1 is a schematic flow chart of a processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a spatial distribution of a plurality of electronic devices according to an embodiment of the present disclosure;
fig. 3 is a schematic view of an application scenario of a processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a process of sending and receiving information according to an embodiment of the present application;
fig. 5 is a schematic spatial arrangement diagram of an electronic device according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating a determination method of network delay according to an embodiment of the present application;
fig. 7 is a schematic flowchart of an echo cancellation process according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a processing apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should also be noted that, for convenience of description, only the parts related to the present application are shown in the drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not represent a specific ordering of the objects. It should be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein.
At present, for employees of large-scale enterprises and small-scale and medium-scale enterprises, the following current situations are frequently met when online conferences are carried out: (1) the small and medium-sized enterprises have no special meeting room equipment, and the employees can only take personal equipment to participate in the meeting. (2) When the staff of a large enterprise is in a meeting in a large conference room, the staff far away can dial in the meeting by using a personal notebook computer. (3) When the conference is shared with the colleagues around the office, everyone uses a notebook computer to dial in the conference.
In these situations, employees typically attend meetings with their personal notebook computers. However, when an employee attends a meeting using his or her own notebook computer or similar device, the following pain points usually arise: (1) When attending the same conference together with nearby colleagues in the same space: if everyone joins with earphones, nearby colleagues hear a speaker both directly in the room and online at the same time, the experience is poor, and interactive discussion with colleagues is difficult; if everyone joins with loudspeakers, mutual interference and howling occur; if only one notebook computer is used for the whole group, the limited pickup distance and angle of its microphone mean that a speaker sitting slightly farther away cannot be picked up well, and the small built-in loudspeaker of the notebook computer cannot meet everyone's listening needs. (2) When a meeting is held in a large conference room and a colleague far from the main sound pickup device speaks, far-end participants cannot hear the speech properly.
Although some solutions already exist, there are mainly two types of products: (1) A sound amplification console: it can only be used for picking up sound and requires the speaker to turn on the microphone manually. Its disadvantages are that the microphone cannot be turned on and off automatically, stereo mixing is not possible, the wired fixed connection makes it inflexible, and there is no loudspeaker playback. (2) A wireless audio receiver: it can only turn wired audio into wireless audio and receive and play audio, and it cannot realize the transmission of speech. In short, the existing solutions can only address some pain points through manual adjustment by the participants themselves, and still have shortcomings, which leads to poor user experience and efficiency.
Based on this, the embodiment of the present application provides a processing method, including: after the first electronic device establishes a communication connection with the second electronic device, obtaining first target information in a current space, wherein the first target information comes from at least one third electronic device connected with the first electronic device; performing first processing on the first target information to obtain second target information; and outputting the second target information, or sending the second target information to the second electronic device for output; wherein the output effect of the second target information is better than that of the first target information. Based on the connection between the first electronic device and the third electronic devices, the first target information is received and processed into the second target information, and the second target information is output by the first electronic device or the second electronic device. Because the output effect of the second target information after the first processing is better than that of the first target information, the conference system can adapt to various scenarios, and problems such as mutual interference between the audio information or video information of multiple devices and interference among the users of the plurality of third electronic devices are avoided, thereby improving the conference effect and the conference quality.
Of course, the technical scheme provided by the application can also be applied to other application scenes with pickup and/or image acquisition, such as live broadcast scenes, teaching scenes and the like. The following is mainly explained in the conference scene:
embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In an embodiment of the present application, referring to fig. 1, a flowchart of a processing method provided in an embodiment of the present application is shown. As shown in fig. 1, the method may include:
s101, after the first electronic device is in communication connection with the second electronic device, first target information in a current space is obtained, wherein the first target information is from at least one third electronic device connected with the first electronic device.
It should be noted that the processing method provided in the embodiment of the present application relates to a plurality of electronic devices, such as a first electronic device, a second electronic device, and a third electronic device. Here, a communication connection is established between the first electronic device and the second electronic device, and a communication connection is also established between the first electronic device and the at least one third electronic device. The communication connection manner between the electronic devices may include a wired connection, a wireless connection, and the like, which is not specifically limited in this embodiment of the application.
It should be noted that, for example, a large conference scene in which multiple persons participate is taken as an example, at this time, the third electronic device may be an electronic device equipped for the participant, for example, a notebook computer carried by the participant. In addition, the actual scene may also be live broadcast, lecture, and the like, which is not specifically limited in this embodiment of the application. The following describes the implementation of the processing method in detail by taking a conference scenario as an example.
In this embodiment of the application, the first target information may be audio information or video information; that is, the first target information may be only audio information from at least one third electronic device, or may be both audio information and video information from at least one third electronic device.
The first electronic device is capable of obtaining first target information from at least one third electronic device based on a communication connection between the first electronic device and the third electronic device, and the first target information is obtained from a current space in which the first electronic device and the third electronic device are located.
For the second electronic device, at least two scenarios are included in the embodiment of the present application, and in a local scenario, the second electronic device is an electronic device located in the same space as the first electronic device; in a remote scenario, the second electronic device is a remote device that is not in the same space as the first electronic device.
For the obtaining manner of the first target information, in some embodiments, obtaining the first target information in the current space may include:
determining the position relationship between the third electronic devices and the first electronic device in the current space, and taking audio information or video information collected by at least one third electronic device that has a target position relationship with the first electronic device as the first target information; or
determining user information of the third electronic devices in the current space, and taking audio information or video information collected by a third electronic device used by a target user as the first target information; or
acquiring attribute parameters of the audio information or video information collected by the third electronic devices in the current space, and determining audio information or video information with a target attribute as the first target information; or
if a first communication connection exists between the first electronic device and the second electronic device, acquiring, through a fourth electronic device, the audio information or video information collected by at least one third electronic device connected with the first electronic device.
It should be noted that the first electronic device and the third electronic devices are located in the same space, for example, in the same meeting room. Each third electronic device collects audio information or video information through devices such as a microphone and a camera. Screening is performed based on a certain condition: at least one third electronic device is screened out of the plurality of third electronic devices, the screened third electronic device(s) are referred to as target electronic devices, and the audio information or video information collected by the target electronic devices is taken as the first target information.
Here, the embodiment of the application may determine a position relationship between the third electronic device and the first electronic device in the current space, and then take at least one third electronic device having a target position relationship with the first electronic device as the target electronic device. Here, the target position relationship may be a preset position relationship, for example, a third electronic device which is a preset distance away from the first electronic device may be used as the target device, and the like, and may be specifically set in combination with actual requirements, and is not specifically limited herein.
In addition, the embodiment of the application may also select the third electronic device used by a target user as the target electronic device based on the user information of the third electronic devices in the current space. Here, the target user may be a conference host, a reporter, a lecturer, a leader, a teacher, or the like. The user information may include identity (ID) information bound or registered on the third electronic device, and the target user is screened according to the ID information, for example, the ID information of the third electronic device used by the target user is a specific ID, or the ID of the target user has higher authority, or the ID of the target user carries a special mark. The user information may also include face information collected by a camera of the third electronic device; the face information is matched against preset face information, a successfully matched user is the target user, and the corresponding third electronic device can be taken as the target electronic device. Users having a specific relationship with other devices may also be determined as target users, such as a user carrying a remote control, or a user wearing Virtual Reality (VR)/Augmented Reality (AR)/Mixed Reality (MR) glasses. The third electronic device used by the target user, for example, the third electronic device having a specific positional relationship with the target user, may also be determined in other feasible ways, which may be set in combination with the actual scenario and requirements and is not specifically limited herein.
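As an illustrative sketch only (the device IDs, the allow-list, and the helper names below are assumptions for illustration and are not part of this application), screening the target electronic device by the user ID bound on each third electronic device could look as follows:

```python
# Hypothetical sketch: pick the third electronic device whose bound user ID
# matches a target-user allow-list (e.g. the conference host). The IDs and
# data structures are illustrative assumptions, not part of this application.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ThirdDevice:
    device_id: str
    bound_user_id: str          # ID registered/bound on the device

def select_by_target_user(devices: list[ThirdDevice],
                          target_user_ids: set[str]) -> Optional[ThirdDevice]:
    """Return the first third electronic device used by a target user, if any."""
    for dev in devices:
        if dev.bound_user_id in target_user_ids:
            return dev
    return None

# Example: the host's ID is marked as the target user.
devices = [ThirdDevice("laptop-1", "u1001"), ThirdDevice("laptop-2", "host-42")]
print(select_by_target_user(devices, {"host-42"}).device_id)  # laptop-2
```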
The embodiment of the application may also obtain attribute parameters of the audio information or video information collected by the third electronic devices in the current space, analyze the obtained attribute parameters, and determine audio information or video information with a target attribute as the first target information. The attribute parameters may include the transmission time for sending the audio information or video information from a third electronic device to the first electronic device, the signal quality of the audio information or video information received by the first electronic device, the signal energy value, audio-related sound attribute parameters (such as decibel value, sampling frequency, sampling bit depth, number of channels, number of frames, and bit rate), and image/video-related attribute parameters (such as resolution, definition, and video frame rate). Audio information or video information whose attribute parameters match the target attribute is determined as the first target information. For example, for the same collected audio information, the third electronic device with the strongest signal energy value is taken as the target electronic device, and the audio information or video information it collects is taken as the first target information; or video information that contains the target user is taken as the first target information; or the audio information or video information with the shortest transmission time or the best data quality is taken as the first target information. That is, the target attribute may be determined in combination with the actual scenario and requirements, which is not specifically limited.
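As a minimal sketch of one such attribute-based rule, assuming frame-level audio buffers represented as NumPy arrays (the function names, array shapes, and synthetic data are illustrative assumptions), selecting the stream with the strongest signal energy could look like this:

```python
# Illustrative sketch: among the third electronic devices that captured the
# same utterance, keep the stream with the highest signal energy as the
# first target information. Array shapes and names are assumptions.
import numpy as np

def signal_energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of one audio frame."""
    return float(np.mean(np.square(frame.astype(np.float64))))

def pick_first_target(frames_by_device: dict) -> str:
    """Return the device id whose current frame has the strongest energy."""
    return max(frames_by_device, key=lambda dev: signal_energy(frames_by_device[dev]))

# Example with synthetic frames (device "B" is loudest, i.e. closest to the speaker).
rng = np.random.default_rng(0)
frames = {
    "A": 0.1 * rng.standard_normal(16000),
    "B": 0.8 * rng.standard_normal(16000),
    "C": 0.3 * rng.standard_normal(16000),
}
print(pick_first_target(frames))  # "B"
```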
If the first electronic device and the second electronic device have the first communication connection, the embodiment of the application can also acquire the first target information through a fourth electronic device. The first communication connection means that the first electronic device and the second electronic device are connected remotely, that is, the second electronic device is a remote device which is not in the same space as the first electronic device. In this case, the first electronic device serves as a master device of the current space, a first communication connection (i.e., a remote connection) is established between the first electronic device and the second device, the fourth electronic device serves as a transfer device of the current space, the first electronic device and the fourth electronic device can be connected in a wired manner, the fourth electronic device and the third electronic device can form a local networking in a wireless connection manner, the first electronic device can obtain audio information or audio-visual information collected by at least one third electronic device through the fourth electronic device, i.e., the fourth electronic device can first obtain the audio information or the audio-visual information from the at least one third electronic device, and determine the first target information.
Further, in some embodiments, taking audio information or video information collected by at least one third electronic device having a target position relationship with the first electronic device as the first target information may include:
combining a microphone and/or a camera on at least one third electronic device with a microphone and/or a camera on the first electronic device to form a corresponding microphone array and/or camera array;
determining a location of a third electronic device using the microphone array and/or the camera array;
taking audio information or video information collected by at least one third electronic device having a first position relationship with the first electronic device as the first target information; or
determining a transmission delay between the third electronic device and the first electronic device based on the location, and, based on the transmission delay, taking audio information or video information collected by at least one third electronic device having a second position relationship with the first electronic device as the first target information.
It should be further noted that each of the first electronic device and the third electronic devices may include a microphone and a camera, and a microphone array and/or a camera array may be formed based on the microphone and/or camera of the first electronic device together with the microphones and/or cameras of the plurality of third electronic devices.
Exemplarily, refer to fig. 2, which shows a schematic diagram of spatial distribution of a plurality of electronic devices provided in an embodiment of the present application. As shown in fig. 2, a first electronic device and a plurality of third electronic devices form a spatial array, including a microphone array and/or a camera array.
At this time, the location of the third electronic device may be determined using the microphone array and/or the camera array. For example: and taking the spatial position of the first electronic equipment as a reference coordinate of the microphone array and/or the camera array, and determining the position of each third electronic equipment in the microphone array and/or the camera array relative to the first electronic equipment based on the communication connection between the first electronic equipment and each third electronic equipment or the respective positioning information of the first electronic equipment and the third electronic equipment. And judging whether the position relation between the third electronic equipment and the first electronic equipment accords with the first position relation, if so, taking the third electronic equipment as target electronic equipment, and taking the audio information or the video information collected by the third electronic equipment as first target information. Here, the first positional relationship may be that a distance between the first electronic device and the third electronic device coincides with a preset distance, or that a position of the third electronic device coincides with preset coordinates, and so on.
In addition, for audio information, the third electronic device closest to the speaker can usually obtain the audio information with the best quality. Therefore, the embodiment of the application may also determine the transmission delay between a third electronic device and the first electronic device based on the position of the third electronic device, where the transmission delay refers to the difference between the time point at which the same audio information reaches the third electronic device and the time point at which it reaches the first electronic device.
It can be understood that the speaker corresponding to a certain third electronic device is the person who is closer to that third electronic device than to any other third electronic device. For example, as shown in fig. 2, for the third electronic device A, the corresponding speaker is located in front of the third electronic device A and is farther away from the other third electronic devices. Since the speaker corresponding to the third electronic device A is closest to it, the transmission time of the sound from the speaker to the third electronic device A can be ignored. In this case, the distance between the first electronic device and the third electronic device divided by the speed of sound is the standard transmission delay between the third electronic device and the first electronic device; that is, the standard transmission delay represents the time taken for the sound of a speaker in front of a certain third electronic device to propagate to the first electronic device.
As shown in fig. 2, taking the third electronic device A and the first electronic device as an example, the propagation delay between the two is the distance between them divided by the speed of sound. Then, for a speaker, if the speaker speaks at the location of the third electronic device A, the transmission delay between the third electronic device A and the first electronic device should substantially coincide with the standard transmission delay between the third electronic device A and the first electronic device.
And selecting at least one third electronic device having a second position relation with the first electronic device as a target electronic device based on the transmission delay between the third electronic device and the first electronic device, so as to acquire the first target information. Here, the second positional relationship may be a positional relationship indicating the closest distance to the speaker.
In order to obtain the clearest and most reliable audio information, the audio information or video information collected by the third electronic device closest to the speaker is usually taken as the first target information. In this case, for the third electronic devices other than the third electronic device A, since they are farther from the speaker, the sound takes longer to propagate to them than to the third electronic device A, so the transmission delay between each of those third electronic devices and the first electronic device is shorter than the transmission delay between the third electronic device A and the first electronic device. This indicates that the third electronic device A is the third electronic device corresponding to the current speaker, and the audio information or video information collected by the third electronic device A is taken as the first target information.
That is to say, in the embodiment of the present application, the distance between the third electronic device and the first electronic device may be obtained based on the position of the third electronic device, the transmission delay between the third electronic device and the first electronic device may be determined, and the third electronic device whose transmission delay is closest to the standard transmission delay may be determined as the target electronic device, so as to obtain the first target information. That is, in the embodiment of the application, sound source localization may be performed based on a microphone array formed by a plurality of microphones of the third electronic device and a microphone of the first electronic device, and the third electronic device in a specific location (for example, closest to the speaker) is selected to acquire the first target information.
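A rough sketch of this selection rule is given below; the speed-of-sound constant, the distances, and the measured delays are illustrative assumptions rather than values prescribed by this application. It also works through the standard transmission delay for a device 3.4 m from the first electronic device (about 10 ms).

```python
# Illustrative sketch: the standard transmission delay of a third electronic
# device is its distance to the first electronic device divided by the speed
# of sound; the device whose measured delay is closest to its own standard
# delay is treated as the one the current speaker is sitting at.
SPEED_OF_SOUND = 343.0  # m/s, an assumed room-temperature value

def standard_delay(distance_m: float) -> float:
    return distance_m / SPEED_OF_SOUND

def pick_speaker_device(distances: dict, measured_delays: dict) -> str:
    """distances/measured_delays: device id -> metres / seconds (assumed inputs)."""
    return min(
        measured_delays,
        key=lambda dev: abs(measured_delays[dev] - standard_delay(distances[dev])),
    )

# Worked example: device "A" is 3.4 m from the first electronic device, so its
# standard delay is about 3.4 / 343 ≈ 10 ms; its measured delay matches, so "A" wins.
distances = {"A": 3.4, "B": 5.1}
measured = {"A": 0.0099, "B": 0.0121}   # B's delay deviates from 5.1/343 ≈ 14.9 ms
print(pick_speaker_device(distances, measured))  # "A"
```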
For a scenario in which the first electronic device is remotely connected, when determining the location, the first electronic device may be replaced with a fourth electronic device for location determination and selection of the target electronic device.
In addition, in order to determine the third electronic device closest to the speaker, in the embodiment of the application, the time point when each third electronic device acquires the same audio information may be respectively obtained, and the third electronic device which acquires the audio information earliest may be determined as the target electronic device. In this way, the clearest and best quality audio information can also be collected.
S102, performing first processing on the first target information to obtain second target information.
After the first target information is obtained, first processing is performed on the first target information to obtain second target information.
In some embodiments, the performing the first processing on the first target information to obtain the second target information may include:
determining a communication connection state between the first electronic device and the second electronic device, and performing, by the first electronic device or the fourth electronic device, noise elimination and/or enhancement processing on the audio information or video information at least based on the communication connection state to obtain the second target information; or
if a second communication connection exists between the first electronic device and the second electronic device, performing, by the first electronic device, recognition processing on the audio information or video information to obtain the second target information, wherein the data volume of the second target information is less than that of the first target information.
It should be noted that, in the embodiment of the present application, the first processing for the first target information may be executed based on the communication connection state between the first electronic device and the second electronic device. When the first electronic device and the second electronic device are remotely connected, the fourth electronic device can perform denoising processing and/or enhancement processing on the first target information (including audio information or video information) so as to obtain second target information; or, after receiving the first target information forwarded by the fourth device, the first electronic device may perform corresponding processing on the first target information to obtain the second target information.
When the first electronic device and the second electronic device are in the same space, the connection mode between the first electronic device and the second electronic device is called local connection, and the first electronic device can perform denoising processing and/or enhancement processing on audio information or video information, so that second target information is obtained.
In addition, the connection state between the first electronic device and the second electronic device may include a normal connection state, a disconnection state, and the like, in addition to the remote connection and the local connection. When the communication between the first electronic device and the second electronic device is disconnected, the first electronic device cannot send information to the second electronic device, and the first electronic device performs denoising processing and/or enhancement processing on the audio information or the video information to obtain second target information.
Further, if there is a second communication connection (i.e., a local connection) between the first electronic device and the second electronic device, the first electronic device and the second electronic device may be in the same space. In this case, the first electronic device may also perform recognition processing on the first target information, for example, recognizing the audio information in the audio information or video information and converting it into text information, generating a meeting summary, and the like. The recognition processing may further optimize the audio information and/or convert the optimized audio information into text information, for example by deleting blank sections and repeated sentences or performing summarization; meanwhile, sound recordings of the audio information before and/or after optimization, video recordings of the video information, and the like may also be generated. In this case, since the recognition processing is performed and blank sections, repeated sentences, and the like are deleted, the data volume of the generated second target information is smaller than that of the first target information.
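As a hedged illustration of one part of such recognition processing, the sketch below removes near-silent ("blank") sections with a simple energy threshold; the frame length, the threshold value, and the omission of the speech-to-text step are assumptions, not the processing prescribed by this application.

```python
# Illustrative sketch: drop near-silent frames ("blank sections") from the
# audio before it is transcribed into text for the meeting summary.
# Frame length and energy threshold are assumed values.
import numpy as np

def remove_blank_sections(audio: np.ndarray, sr: int = 16000,
                          frame_ms: int = 20, energy_threshold: float = 1e-4) -> np.ndarray:
    frame_len = sr * frame_ms // 1000
    kept = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        if np.mean(frame.astype(np.float64) ** 2) >= energy_threshold:
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([], dtype=audio.dtype)
```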
Further, when the first electronic device and the second electronic device are remotely connected, the identification process may be performed on the first target information to obtain the second target information.
In this case, the embodiment of the present application may perform audio mixing processing on the first target information to obtain the second target information, so that when the second target information is transmitted to the remote device, the audio information heard by the far-end participant has a stereoscopic effect. Therefore, in some embodiments, performing the first processing on the first target information to obtain the second target information may include:
determining the position of the third electronic equipment and the transmission time delay between the third electronic equipment and the first electronic equipment or the fourth electronic equipment;
and performing sound mixing processing on the first target information from the plurality of third electronic devices based on the positions and the transmission time delays to obtain second target information.
It should be noted that, when the first target information is from a plurality of third electronic devices, each third electronic device respectively acquires audio information or video information of a speaker (or a non-speaker, etc.) that is closest to the third electronic device, at this time, the positions of the third electronic devices for acquiring the first target information (that is, the distance between the third electronic device and the first electronic device) may be determined, and the transmission delay between each third electronic device and the first electronic device may be determined. For a scenario in which the fourth electronic device is present as a relay device, the positions of the third electronic devices relative to the fourth electronic device are determined (that is, the distances between the third electronic device and the fourth electronic device are determined), and the transmission delays between the third electronic device and the fourth electronic device are determined.
And then, according to the position and the transmission delay of each third electronic device, performing sound mixing processing on the first target information from the plurality of third electronic devices to obtain second target information. The mixing process is to integrate the sound from multiple sources into a stereo track or a mono track. The specific implementation of the mixing process can be implemented according to the conventional understanding of those skilled in the art, and will not be described herein.
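A minimal sketch of such position- and delay-aware mixing is shown below; the pan law, the delay-alignment scheme, and all parameter values are illustrative assumptions rather than a prescribed implementation.

```python
# Illustrative sketch: align the streams from several third electronic devices
# using their transmission delays, then pan each stream by its position to
# produce a simple stereo mix. Pan law and parameter values are assumptions.
import numpy as np

def mix_to_stereo(streams, delays_s, pans, sr=16000):
    """streams: device -> mono np.ndarray; delays_s: device -> seconds;
    pans: device -> value in [-1 (left), +1 (right)] derived from device position."""
    max_delay = max(delays_s.values())
    length = max(len(s) + int((max_delay - delays_s[d]) * sr) for d, s in streams.items())
    mix = np.zeros((length, 2))
    for dev, s in streams.items():
        # a stream that arrived with less delay is shifted later so all streams line up
        offset = int((max_delay - delays_s[dev]) * sr)
        left, right = (1 - pans[dev]) / 2, (1 + pans[dev]) / 2
        mix[offset:offset + len(s), 0] += left * s
        mix[offset:offset + len(s), 1] += right * s
    return mix
```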
S103, outputting the second target information, or sending the second target information to a second electronic device for outputting; wherein the output effect of the second target information is better than the output effect of the first target information.
It should be noted that, after the second target information is obtained, the second target information may be output, or the second target information may be sent to the second electronic device and output by the second electronic device. Since the second target information is obtained by processing the first target information, the output effect of the second target information is better than that of the first target information.
It should be noted that, when the second target information is output, all the third electronic devices may be set to be in a mute state, so as to avoid interference between signals.
In some embodiments, outputting the second target information, or sending the second target information to the second electronic device for output, may include:
if the first electronic device and the second electronic device have the first communication connection, outputting the second target information to the second electronic device, or acquiring hardware configuration information of the first electronic device and the fourth electronic device and determining, based on the hardware configuration information, whether the first electronic device or the fourth electronic device outputs the second target information; or
if the first electronic device and the second electronic device have the second communication connection, acquiring hardware configuration information of the first electronic device and the second electronic device and determining, based on the hardware configuration information, whether the first electronic device or the second electronic device outputs the second target information; or
if the first electronic device and the second electronic device have the second communication connection, obtaining output parameters of the second target information and/or the current spatial environment, and determining, based on the output parameters and/or the current spatial environment, whether the first electronic device or the second electronic device outputs the second target information.
It should be noted that, when the second target information is output, if the first electronic device and the second electronic device have the first communication connection therebetween, that is, the first electronic device and the second electronic device are remotely connected, the second target information may be given to the second electronic device, so that the second electronic device may output the second target information at a far end, and a participant at the far end may acquire the second target information.
In the local space, the second target information may be output through the first electronic device or the fourth electronic device. At this time, the hardware configuration information of the first electronic device and the fourth electronic device may be obtained, where the hardware configuration information may include the maximum playback volume and the sound quality of the electronic device; if audio-visual information is to be output, the hardware configuration information may also include the screen size, screen resolution, image quality, and the like of the electronic device. Based on the hardware configuration information of the first electronic device and of the fourth electronic device, the electronic device with the better output effect is selected to output the second target information. For example, the electronic device that can play louder sound, has a larger screen, or has better sound quality or image quality is selected to output the second target information.
In addition, the electronic device for outputting the second target information may also be determined based on a specific scene, the number of people in the current space, environmental noise, the location distribution of the participants, and the like, in combination with hardware configuration information of the electronic device. For example, when the number of persons in the space is large, the environmental noise is large, and the positions of the participants are relatively dispersed, an electronic device playing a larger sound is selected as the electronic device outputting the second target information.
It should be noted that if the first electronic device and the second electronic device have a second communication connection therebetween, the first electronic device and the second electronic device are in the same space and are locally connected. In this case, the output parameters of the second target information and/or the current spatial environment may be obtained, where the output parameters of the second target information may include whether the second target information is stereo, whether it is audio information only or audio-visual information, the volume of the audio information, and the like; the current spatial environment may include the size of the conference room, the layout of the first electronic device, the second electronic device, and the third electronic devices, and the like.
And selecting one of the first electronic device and the second electronic device to output the second target information based on the output parameter of the second target information and/or the current spatial environment. For example, if the second target information is stereo, the electronic device with a stereo output function is selected to output the second target information, if the second target information is audio-visual information, the electronic device with a display screen is selected to output the second target information, and if the current space is large, the electronic device with larger playing sound is selected to output the second target information.
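A hedged sketch of this output-device selection is given below; the capability fields and scoring weights are assumptions chosen only to illustrate the idea of weighing hardware configuration information against the output parameters and the current spatial environment.

```python
# Illustrative sketch: score candidate output devices against the hardware
# configuration information and the output parameters of the second target
# information; the fields and weights are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class DeviceCaps:
    name: str
    max_volume_db: float
    has_display: bool
    supports_stereo: bool

def choose_output_device(candidates, needs_display: bool, needs_stereo: bool,
                         room_is_large: bool):
    def score(dev: DeviceCaps) -> float:
        s = 0.0
        if needs_display and dev.has_display:
            s += 10
        if needs_stereo and dev.supports_stereo:
            s += 5
        s += dev.max_volume_db * (2 if room_is_large else 1) / 10
        return s
    return max(candidates, key=score)

first = DeviceCaps("first_electronic_device", 70, True, False)
fourth = DeviceCaps("fourth_electronic_device", 85, False, True)
print(choose_output_device([first, fourth], needs_display=False,
                           needs_stereo=True, room_is_large=True).name)
# fourth_electronic_device
```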
Further, in some embodiments, the method may further comprise:
obtaining an echo cancellation signal sent by the first electronic device or the fourth electronic device;
and determining a target sound pickup device based on the position relationship between the third electronic devices and the first electronic device or the fourth electronic device, so as to provide the echo cancellation signal to the microphone of the first electronic device or the fourth electronic device to perform an echo cancellation operation, and to provide the echo cancellation signal to the target sound pickup device to perform an echo cancellation operation, wherein the target sound pickup device is a third electronic device in a sound pickup state.
It should be noted that, during audio output, there may also be echo interference. Therefore, in the local space where the first electronic device is located, the electronic device that outputs the second target information (i.e., the first electronic device or the fourth electronic device, or a second electronic device that is also local) may also generate an echo cancellation reference signal, and a target sound pickup device is determined based on the position relationship between the third electronic devices and the first electronic device or the fourth electronic device. The determination method is the same as that described above for determining the third electronic device used to acquire the first target information, and the target sound pickup device refers to a third electronic device in a sound pickup state.
For the first electronic device or the fourth electronic device, when performing the echo cancellation operation, the echo cancellation signal may be provided to its own microphone to perform the echo cancellation operation, and also provided to the target sound pickup device, so that the target sound pickup device performs the echo cancellation operation and avoids picking up echoes that would cause interference.
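As a sketch only, the echo cancellation at a target sound pickup device could apply the shared reference signal with a generic normalized least-mean-squares (NLMS) adaptive filter; the filter choice and all parameter values are assumptions and not the algorithm specified by this application.

```python
# Illustrative sketch: the playback device shares the signal it is about to
# render as an echo-cancellation reference; each target pickup device runs a
# simple NLMS adaptive filter against that reference. Generic example only;
# mic and reference are assumed to be time-aligned arrays of equal length.
import numpy as np

def nlms_echo_cancel(mic: np.ndarray, reference: np.ndarray,
                     taps: int = 128, mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    w = np.zeros(taps)
    out = np.zeros_like(mic, dtype=np.float64)
    for n in range(taps, len(mic)):
        x = reference[n - taps:n][::-1]         # most recent reference samples
        echo_est = np.dot(w, x)
        e = mic[n] - echo_est                   # error = mic minus estimated echo
        out[n] = e
        w += mu * e * x / (np.dot(x, x) + eps)  # NLMS weight update
    return out
```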
In some embodiments, the method may further comprise:
sending a detection signal to at least one third electronic device at a first time through the first electronic device or the fourth electronic device;
receiving a detection response signal returned by at least one third electronic device, and recording a second time for receiving the detection response signal of each third electronic device;
determining the network delay of each third electronic device according to the second time and the first time of each third electronic device;
determining the delay compensation time of each third electronic device according to the network delay of each third electronic device;
accordingly, after obtaining the second target information, the method may further include:
and after the delay compensation time of the third electronic equipment corresponding to the second target information is separated, outputting the second target information, or sending the second target information to the second electronic equipment for outputting.
It should be noted that different third electronic devices may have different network delays to the first electronic device (or to the fourth electronic device, if the fourth electronic device obtains the first target information) due to network or hardware reasons. Therefore, the embodiment of the application also performs delay compensation on the signal transmission of each third electronic device.
First, the first electronic device or the fourth electronic device respectively sends a detection signal to each third electronic device at a first time, receives a detection response signal returned by each third electronic device, and records a second time when the detection response signal returned by each third electronic device is received.
Then, based on the second time and the first time, a network delay of each third electronic device is determined. Specifically, one half of the time difference between the second time and the first time is taken as the network delay of the third electronic device.
Finally, the delay compensation time of each third electronic device is determined based on the network delays of the plurality of third electronic devices. Illustratively, the network delays of different third electronic devices differ: some are short, so signal transmission is fast, and some are long, so signal transmission is slow. In this case, if a third electronic device with a long network delay sends audio information first and a third electronic device with a short network delay then also sends audio information, the latter audio information may already start to be output before the former has been output, causing signal interference. Therefore, the delay compensation time of each third electronic device is determined based on its network delay, to ensure the timing consistency of the first target signals sent by the third electronic devices. Assuming that there are two third electronic devices A and B, where the network delay of the third electronic device A is 3 milliseconds (ms) and that of the third electronic device B is 4 ms, then, in order to ensure the timing consistency of the first target signals sent by the two devices, a delay compensation time of 4 - 3 = 1 ms is set for the third electronic device A, and the delay compensation time of the third electronic device B is 0.
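The round-trip measurement and the compensation rule described above can be sketched as follows; the probe callback and the use of milliseconds are assumptions, while the worked numbers match the 3 ms / 4 ms example above.

```python
# Illustrative sketch: network delay is taken as half the round-trip time of a
# detection probe, and each third electronic device is compensated up to the
# slowest one. The probe callback is a stand-in for the actual detection signal.
import time

def measure_network_delay_ms(send_probe_and_wait_for_reply) -> float:
    t_send = time.monotonic()          # first time: detection signal sent
    send_probe_and_wait_for_reply()    # blocks until the detection response arrives
    t_recv = time.monotonic()          # second time: detection response received
    return (t_recv - t_send) * 1000.0 / 2.0   # one-way delay ≈ half the round trip

def compensation_times(delays_ms: dict) -> dict:
    """delays_ms: device -> one-way network delay in milliseconds."""
    slowest = max(delays_ms.values())
    return {dev: slowest - d for dev, d in delays_ms.items()}

# Worked example matching the text: A = 3 ms, B = 4 ms.
print(compensation_times({"A": 3, "B": 4}))
# {'A': 1, 'B': 0}
```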
Correspondingly, when the second target information is output, after the second target information is obtained, the source of the first target information corresponding to the second target information, that is, the third electronic device corresponding to the second target information, is determined, and the delay compensation time of the third electronic device corresponding to the second target information is determined. And then, after the interval delay compensation time, outputting second target information or sending the second target information to second electronic equipment for outputting, thereby avoiding the interference problem caused by inconsistent network delay of each third electronic equipment.
In some embodiments, the method may further comprise:
obtaining third target information, wherein the third target information comes from electronic equipment outside the current space;
and determining to output the third target information by the first electronic device or the fourth electronic device based on the attribute information of the third target information and/or the environmental information and/or the device information in the current space.
It should be noted that, when there is a remote participant outside the current space, it is sometimes necessary to output third target information from outside the current space, where the third target information is from an electronic device outside the current space. Under the condition that the first electronic equipment and the second electronic equipment are remotely connected, the second electronic equipment is the electronic equipment outside the current space, and the first electronic equipment obtains audio information or video information collected by at least one third electronic equipment through a fourth electronic equipment; in the case where the first electronic device establishes a local connection with the second electronic device, an electronic device outside the current space may establish a remote connection with the first electronic device or the second electronic device.
And then determining to output the third target information by the first electronic device or the fourth electronic device based on the attribute information of the third target information and/or the environmental information and/or the device information in the current space. The device information may include hardware configuration information of the first electronic device and the fourth electronic device, and the specific determination manner for selecting the first electronic device or the fourth electronic device as the output device for outputting the third target information may refer to the foregoing manner for outputting the second target information, which is not described herein again.
The embodiment of the application provides a processing method, which includes: after the first electronic device establishes a communication connection with the second electronic device, obtaining first target information in a current space, wherein the first target information comes from at least one third electronic device connected with the first electronic device; performing first processing on the first target information to obtain second target information; and outputting the second target information, or sending the second target information to the second electronic device for output; wherein the output effect of the second target information is better than the output effect of the first target information. In this way, the first target information is received based on the connection between the first electronic device and the third electronic device, processed into the second target information, and output by the first electronic device or the second electronic device. Since the output effect of the second target information after the first processing is better than that of the first target information, the conference system can adapt to various scenes, and problems such as mutual interference among audio information or video information and mutual interference among the users of a plurality of third electronic devices when multiple devices are present are avoided, thereby improving the conference effect and, in turn, the conference quality.
In another embodiment of the present application, referring to fig. 3, an application scenario diagram of a processing method provided in the embodiment of the present application is shown. As shown in fig. 3, the scene is a conference scene in which local participants participate in the same conference in the same space while remote participants also participate in the conference through a remote connection. The local space includes devices such as an audio relay device, a master device, and n local devices (local device 1, local device 2, ..., local device n), and the remote space includes a remote device. The audio relay device may also be a video relay device: if the first target information to be processed is audio information, an audio relay device that processes only audio information may be used, and if the first target information is video information, a video relay device that processes both video information and audio information may be used, the choice being made according to the actual scene requirements.
It should be further noted that, in fig. 3, the audio relay may be the first electronic device or the fourth electronic device in the foregoing embodiment; the master device may be the second electronic device in the foregoing embodiment when the second electronic device is in the same space (i.e., in a local scene) as the first electronic device, and the remote device may be the second electronic device in the foregoing embodiment when the second electronic device is in a different space (i.e., in a remote scene) from the first electronic device. It can be understood that, for a far-end participant, the far-end space where the far-end participant is located may include one or more far-end devices, and the far-end space may have the same or similar device layout as the local space where the local participant is located, so that, for the far-end space, the video information or the audio-visual information may also be processed according to the method provided in the embodiment of the present application, and if, for the far-end participant, the space where the far-end participant is located is taken as the local space, the local space in fig. 3 is the far-end space relative to the far-end participant.
In the conference scenario shown in fig. 3, the master device may integrate the function of the audio relay device, that is, the audio relay device may be removed, and each local device directly establishes a connection with the master device. Here, as shown in fig. 3, taking an example that the audio relay device corresponds to a first electronic device, the main device corresponds to a second electronic device, the remote device corresponds to an electronic device outside the current space, and the local device corresponds to a third electronic device, the implementation of the processing method in this scenario will be described in detail.
As shown in fig. 3, wireless connections are established between the audio relay and the n local devices, and the audio relay may receive the first target information sent by the local devices through these wireless connections. The audio repeater and the main device are connected by a wired connection; the audio repeater can send information to the main device and receive information sent by the main device, and both have an audio playing function (and may also have a video or image playing function). The main device establishes a remote connection with the remote device, can send information to the remote device, and receives information sent by the remote device.
As shown in fig. 3, the host device may be a kiosk, a desktop, a notebook, or the like. The main device may be a dedicated device, for example a conference device or computer installed in the conference room in advance, or it may be any local device, for example a notebook computer carried by a local participant that serves as the main device during the conference.
Further, referring to fig. 4, a schematic diagram of a process of sending and receiving information provided by an embodiment of the present application is shown. As shown in fig. 4, each local apparatus includes a sound pickup device (microphone) for picking up audio information, where the user of local apparatus 1 is user 1, the user of local apparatus 2 is user 2, ..., and the user of local apparatus n is user n; in practice, several users may share one local apparatus or one user may use a plurality of local apparatuses, and each local apparatus is in a mute state. After obtaining the audio information of the user, the local device may first perform audio preprocessing on the audio information, such as denoising, and then send the preprocessed audio information to the audio relay based on a wireless connection between the local device and the audio relay, such as a Wireless Fidelity (WiFi) connection. The audio relay device may receive the audio information (first target information) sent by the local device through its WiFi module, decode the audio signal, and perform further noise reduction (i.e., a second round of noise reduction), Automatic Gain Control (AGC) and other processing on the audio signal through a Digital Signal Processor (DSP) or other processor to obtain processed audio information (second target information). Then, in the sending direction of the audio relay, the second target information is sent to the main device, and the main device plays the second target information through playing path 1 (playing path 1 indicates that the second target information is played by a device such as a speaker on the main device); when there is a far-end user (such as a far-end participant), the main device can also send the second target information to the far-end device through the cloud server, so that the far-end device plays and outputs the second target information. The audio relay device may also play the second target information through its own playing path 2 (playing path 2 indicates that the second target information is played by a device such as a speaker on the audio relay device).
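The send-direction flow in fig. 4 (receive over WiFi, decode, second-round noise reduction, AGC, then route to the playing paths and, when needed, to the cloud) can be sketched as follows. The stage names decode/denoise/agc and the routing callbacks are assumptions for illustration, not a disclosed API:

```python
import numpy as np

def relay_send_path(encoded_frame: bytes, decode, denoise, agc) -> np.ndarray:
    """Schematic of the audio relay's send direction:
    WiFi frame (first target information) -> decode -> secondary noise
    reduction -> automatic gain control -> second target information."""
    pcm = decode(encoded_frame)
    pcm = denoise(pcm)   # second round of noise reduction
    return agc(pcm)

def route_second_target(second_target, play_path_2, send_to_master, send_to_cloud=None):
    """Play on the relay's own path 2, hand the result to the main device
    (which plays it on path 1), and forward it to the cloud server only
    when a far-end participant exists."""
    play_path_2(second_target)
    send_to_master(second_target)
    if send_to_cloud is not None:
        send_to_cloud(second_target)
```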
In addition, in the receiving direction (in fig. 4, the sending and receiving are the sending and receiving of the audio relay device relative to the main device), in the case that there is a far-end user, the main device also receives the third target information sent by the far-end device through the cloud server, and plays and outputs the third target information through the playing path 1; or the third target information is sent to the audio relay, and the audio relay plays and outputs the third target information through the playing path 2 after receiving the third target information.
In the embodiment of the present application, at least one of the n local devices may be selected to pick up sound, and fig. 5 is a schematic diagram illustrating a spatial arrangement of an electronic device according to the embodiment of the present application. As shown in fig. 5, the system includes an audio relay device, a master device, and n local devices, and each local device can at least pick up audio information or video information of its corresponding user. The audio relay device receives and processes the audio information or the audio-visual information sent by the specific local device to obtain second target information, and sends the second target information to the main device to execute subsequent operations, or the audio relay device can be integrated with the main device to realize related functions.
As shown in fig. 4, a specific local device for sending the first target information may be determined as follows. For the same piece of audio information, since sound needs time to propagate, the local device closest to the user who utters it inevitably receives it earliest; in addition, when the local devices pick up sound in the near field, adjacent local devices may also pick up the same audio information. In this case, the time point at which each local device acquires the same audio information can be determined, the local device that receives the audio information earliest is determined as the target electronic device, and the audio information acquired by the target electronic device is taken as the first target information. For scenes with multiple speakers, the target electronic device corresponding to each piece of audio information can be obtained according to this method, and the first target information collected accordingly. In addition, the target electronic device may also be determined based on the transmission delay between the local device and the audio relay; for the specific manner, reference may be made to the foregoing embodiments.
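As a toy illustration of the earliest-arrival rule described above, the snippet below picks the target electronic device from per-device capture timestamps; the device names and timestamps are made up:

```python
def pick_target_device(arrival_times: dict[str, float]) -> str:
    """Return the local device that captured the same utterance earliest;
    its recording is then taken as the first target information."""
    return min(arrival_times, key=arrival_times.get)

# Hypothetical capture timestamps (seconds) for one utterance
assert pick_target_device({"local1": 0.012, "local2": 0.019, "local3": 0.025}) == "local1"
```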
If a local device has far-field sound pickup capability, the local device at a suitable position can additionally be selected for sound pickup according to the magnitude of the signal energy value, on top of the above judgment, so as to acquire the first target information.
Furthermore, in the embodiment of the application, distance information can be determined from the different time differences with which audio information from different positions reaches the local devices and the audio relay device, and the first target information from a plurality of local devices can then be mixed based on this distance information.
For example, when user 1 speaks at the location of local apparatus 1, the time difference between the time point when the sound reaches the microphone of local apparatus 1 and the time point when the sound reaches the microphone of the audio relay is ΔT1, and according to the sound propagation speed, the distance L1 between local apparatus 1 and the audio relay is approximately L1 = 340 × ΔT1; by analogy, the distances L2, L3, L4, ..., Ln between the other local devices and the audio relay device can be obtained.
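The distance estimate above is a direct application of the speed of sound (taken as 340 m/s in the text). A minimal sketch:

```python
SPEED_OF_SOUND = 340.0  # m/s, the value used in the text

def distance_from_time_difference(delta_t_seconds: float) -> float:
    """Li = 340 * dTi, where dTi is the arrival-time difference between
    local device i's microphone and the audio relay's microphone."""
    return SPEED_OF_SOUND * delta_t_seconds

# e.g. a 5 ms difference corresponds to roughly 1.7 m
assert abs(distance_from_time_difference(0.005) - 1.7) < 1e-9
```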
Therefore, in some special scenes where the microphones of several local devices pick up sound at the same time, the distance between each such local device and the audio repeater can be obtained from the corresponding time difference, and the audio channels picked up by these local devices can be mixed using stereo techniques to generate second target information with a surround stereo effect, which is sent to the far-end device; the physical position of each speaker at the sending end can then be perceived during far-end playback, giving a better conference experience.
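One way to realize the per-channel mixing described here is constant-power panning driven by the estimated positions. The sketch below is an assumed illustration only; the embodiment does not specify the stereo technique:

```python
import numpy as np

def mix_to_stereo(channels: dict[str, np.ndarray], pans: dict[str, float]) -> np.ndarray:
    """Mix each picked-up channel into a stereo pair using constant-power
    panning. pans maps device id -> pan position in [0, 1] (0 = left),
    which could be derived from the distance/position estimates above."""
    length = max(len(x) for x in channels.values())
    out = np.zeros((length, 2))
    for dev, signal in channels.items():
        theta = pans[dev] * np.pi / 2.0
        padded = np.pad(signal, (0, length - len(signal)))
        out[:, 0] += np.cos(theta) * padded   # left gain
        out[:, 1] += np.sin(theta) * padded   # right gain
    return out
```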
Further, for each local device, there may be network delay in wireless transmission between the local device and the audio repeater, and at this time, matching of delay compensation may be performed for each local device. Referring to fig. 6, a schematic diagram of a determination method of network delay provided in the embodiment of the present application is shown. As shown in fig. 6, the audio repeater may actively transmit a detection signal to each local device, receive a detection response signal returned by the local device, and determine one half of the time difference between transmission of the detection signal and reception of the detection response signal as the network delay. That is, as shown in fig. 6, the network delay of local device 1 is TM1 = t1/2; the network delay of local device 2 is TM2 = t2/2; the network delay of local device 3 is TM3 = t3/2; the network delay of local device 4 is TM4 = t4/2; ...; the network delay of local device n is TMn = tn/2; where t1, t2, t3, t4, ..., tn denote the respective time differences.
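A minimal version of the probe-based measurement, with send_probe and wait_for_response standing in for the actual wireless transport (assumed names):

```python
import time

def measure_network_delay(send_probe, wait_for_response) -> float:
    """One-way network delay estimated as half the probe round trip,
    matching TMi = ti/2 in fig. 6. send_probe and wait_for_response are
    assumed transport callables, not a real API."""
    t_sent = time.monotonic()
    send_probe()
    wait_for_response()
    t_back = time.monotonic()
    return (t_back - t_sent) / 2.0
```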
Therefore, by combining the foregoing method, the local device that is currently picking up sound can be obtained, and matching compensation of the network delay can then be performed, ensuring the timing consistency of the sound sent by each local device.
Further, in order to avoid the problem of echo interference, the embodiment of the present application also performs echo cancellation on the audio signal. Referring to fig. 7, a schematic flow chart of an echo cancellation process provided in the embodiment of the present application is shown. As shown in fig. 7, microphone 0 (mic0) represents the microphone of the audio relay, microphone 1 (mic1) represents the microphone of local device 1, microphone 2 (mic2) represents the microphone of local device 2, microphone 3 (mic3) represents the microphone of local device 3, microphone 4 (mic4) represents the microphone of local device 4, ..., and microphone n (micn) represents the microphone of local device n. Based on the connection between the audio relay and each local device, the audio relay is also connected to the microphone of each local device. Fig. 7 shows the case where local device 3 is the sound pickup device: the connection between microphone 3 and the audio relay device is ON, the connections between the other microphones and the audio relay device are all OFF, and at this time, among the audio information from the local devices, only the audio information picked up by microphone 3 of local device 3 is sent to the audio relay device as the first target information.
As shown in fig. 7, echo cancellation may involve both the Downlink and Uplink directions. The signal transmitted to the audio relay via the downlink is usually the third target information sent by the remote device. After this audio signal is received, it first undergoes a series of processing such as automatic gain control (AGC), dynamic range compression (DRC), equalization (Equalizer) and amplification (Amplifier); the resulting audio signal then enters the audio relay, and the audio relay generates an echo cancellation signal (AEC reference) based on this audio signal and passes it to its own microphone 0, so that microphone 0 performs the echo cancellation operation (AEC), the echo suppression operation (Echo Suppression), automatic gain control and the like based on the echo cancellation signal. It can be understood that in some cases the sound of a speaker may be picked up directly by microphone 0, for example when the speaker is closest to the audio relay, or when the device used by the person hosting the conference is the audio relay or a main device integrated with the audio relay. Based on this echo cancellation process, microphone 0 does not introduce echo interference when picking up sound, the audio relay does not output echo and the like when playing sound, and when audio information is sent to the remote device through the uplink, audio information containing echo, noise and the like is not sent out.
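The downlink chain and the generation of the AEC reference can be sketched as below; agc, drc, eq and amp stand in for the actual processing stages and are assumptions, not a disclosed API:

```python
import numpy as np

def downlink_process(x: np.ndarray, agc, drc, eq, amp):
    """Downlink chain for third target information from the far end:
    AGC -> dynamic range compression -> equalization -> amplification.
    The processed signal is also kept as the AEC reference so that mic0
    (and the active local microphone) can subtract the rendered sound
    from what they pick up. agc/drc/eq/amp are assumed stage callables."""
    y = amp(eq(drc(agc(x))))
    aec_reference = y.copy()   # handed to microphone 0 and the active pickup device
    return y, aec_reference
```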
The audio repeater also sends the echo cancellation signal to the local device that is currently picking up sound, so that the microphone of that local device performs echo cancellation based on the echo cancellation signal and does not pick up echo when acquiring audio information, thereby preventing the local device from sending echo information. After the local device picks up sound, it may perform automatic echo cancellation, echo suppression, noise reduction and the like based on the echo cancellation signal, and then send the processed audio information to the audio relay or the main device, which performs noise reduction again and automatic gain control and then sends the resulting audio information to the remote device through the uplink.
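For illustration, a textbook normalized-LMS adaptive filter is one way the cancellation at microphone 0 or at the active local device could consume that reference signal; the patent does not name a specific algorithm, so this is only a generic sketch:

```python
import numpy as np

def nlms_echo_cancel(mic: np.ndarray, reference: np.ndarray,
                     taps: int = 64, mu: float = 0.5) -> np.ndarray:
    """Small NLMS adaptive filter: estimate the echo of the reference
    (played-back) signal and subtract it from the microphone signal.
    Generic textbook AEC core, shown only to illustrate how the relayed
    AEC reference could be used."""
    w = np.zeros(taps)
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        x = reference[max(0, n - taps + 1):n + 1][::-1]   # newest sample first
        x = np.pad(x, (0, taps - len(x)))
        echo_est = float(np.dot(w, x))
        e = mic[n] - echo_est          # echo-cancelled output sample
        out[n] = e
        w += mu * e * x / (np.dot(x, x) + 1e-8)   # normalized update
    return out
```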
In short, the embodiment of the present application develops an audio relay (also referred to as an audio relay enhancer) applied to multi-person scenes such as conferences and lectures, or a master device integrating the function of the audio relay. When several people participate in the same conference in the same space, the local devices (such as notebooks and mobile phones) in the local space can access the audio relay through wireless connections to form a local network, which realizes the relaying of audio sending and receiving, so that a plurality of devices in the same space can simultaneously participate in the same conference.
For the audio repeater, in the sending direction, the embodiment of the application considers at least the following technical points:
(1) Fig. 5 illustrates how the microphone of each local device is combined with the microphone of the audio repeater to obtain distance information, so that the microphone of the local device at the optimal position picks up sound. In this scheme, the sound pickup capability of the local devices is near-field pickup, and electronic devices currently on the market, such as notebook computers and mobile phones, are likewise near-field pickup devices. The distance information is located according to the different time differences with which the sound reaches the audio repeater, and the position from which the sound comes is judged, so that the sound is picked up by the microphone of the local device at the matching optimal position. For example, in fig. 5, when user 1 speaks at the location of local device 1, the time difference between the time point when the sound reaches the microphone of local device 1 and the time point when the sound reaches the microphone of the audio relay is ΔT1; according to the sound propagation speed, L1 = 340 × ΔT1, and by analogy L2, L3, L4, L5, ..., Ln can be obtained. When a speaker speaks, the distance between each electronic device and the audio repeater can be obtained from the time difference ΔT, it is judged which local device's microphone the current speaker is closer to, the microphone of the closest local device is used to pick up sound, and noise reduction and enhancement are performed on the audio signal, improving the sound quality in the sending direction. If there is a local device with far-field sound pickup capability, the signal energy value can be added to the judgment criteria to select the microphone of the local device at the optimal position for sound pickup.
(2) Since there may be network delay in the wireless transmission between each local device and the audio relay, matching of delay compensation is also performed for each local device. For each local device, the audio repeater can actively send a detection signal, obtain the network delay data of that local device, and record it. In fig. 6, TMi (i = 1, 2, 3, ..., n) is the network delay between each local device and the audio repeater; combined with the method in technical point (1), the local device that is currently picking up sound is obtained, and matching compensation of the network delay is then performed to ensure the timing consistency of the audio information sent by each local device.
(3) In some special scenes, the microphones of several local devices pick up sound simultaneously. Combined with the distance information acquired in technical point (1), the audio information of these local devices is mixed by channel using stereo techniques to generate surround stereo that is transmitted to the remote device, so that the physical position of each speaker at the sending end can be perceived during remote playback, giving a better conference experience.
(4) The embodiment of the application also uses the cooperation of the audio repeater and the local devices to perform echo cancellation. Besides being provided to the microphone of the audio repeater as a reference, the echo cancellation signal generated by the audio repeater is transmitted, once the local device that is currently picking up sound has been determined, to that local device as a reference signal, thereby achieving echo cancellation.
As can be seen from the detailed description of the audio repeater and the processing method performed by the audio repeater provided in the embodiments of the present application, the technical advantages of the audio repeater in the transmission direction at least include: (1) each local device is placed at different positions, a microphone of each local device can be used as a sound pickup and forms a spatial combination with a microphone on the audio repeater, so that distance information of a speaker is obtained, and the microphone at the best position can be automatically used for picking up sound according to the position information. (2) The network delay of wireless transmission between the audio repeater and each local device can be automatically obtained, and then targeted delay compensation is carried out on the local devices which are picking up sound. (3) When the received audio information comes from a plurality of local devices, the audio information of each local device can be mixed into stereo in a channel-splitting mode and sent to the far-end device in a stereo mode, and then the physical position of a near-end speaker can be sensed when the far-end speaker plays the audio information. (4) Echo cancellation processing and secondary noise reduction are performed on the local equipment which is picking up sound, so that the far-end client has better listening experience.
Its technical advantages in the receiving direction may include at least: (1) the networked local devices can be set to mute, and the sound of remote participants, after being transmitted back, can be played on the main device, or played on the audio repeater when the volume of the main device is not sufficient, so as to meet the listening needs of all participants.
Therefore, the embodiment of the application integrates two directions of sending and receiving to meet the requirement of the conference audio, and simultaneously, the processing of noise reduction, echo cancellation and the like of the conference audio is added, so that a brand new audio experience can be provided for users. The positioning of the audio information or the video information can be more accurate in the conference process, and the audio information or the video information has better tone quality after being subjected to noise reduction, enhancement, echo elimination and the like.
In yet another embodiment of the present application, referring to fig. 8, a schematic structural diagram of a processing apparatus 80 provided in the embodiment of the present application is shown. As shown in fig. 8, the processing device 80 may include:
an obtaining unit 801 configured to obtain first target information in a current space after a first electronic device establishes a communication connection with a second electronic device, where the first target information is from at least one third electronic device connected with the first electronic device;
a processing unit 802 configured to perform a first processing on the first target information to obtain second target information;
an output unit 803 configured to output the second target information or to give the second target information to the second electronic device for output; wherein the output effect of the second target information is better than the output effect of the first target information.
It should be noted that, in the embodiment of the present application, the processing apparatus 80 may be the first electronic device, or may be integrated on the first electronic device.
In some embodiments, the obtaining unit 801 is specifically configured to determine a position relationship between a third electronic device in the current space and the first electronic device, and use audio information or audio-visual information acquired by at least one third electronic device having a target position relationship with the first electronic device as the first target information; or determining user information of third electronic equipment in the current space, and taking audio information or video information acquired by the third electronic equipment used by a target user as first target information; or obtaining attribute parameters of audio information or video information collected by third electronic equipment in the current space, and determining the audio information or video information with target attributes as first target information; or, if the first electronic device is in communication connection with the second electronic device, the fourth electronic device obtains audio information or video information collected by at least one third electronic device connected with the first electronic device.
It should be noted that, in the embodiment of the present application, in the case of having a fourth electronic device, the processing apparatus 80 may be the fourth electronic device, or be integrated on the fourth electronic device.
In some embodiments, the obtaining unit 801 is specifically configured to combine a microphone and/or a camera on at least one third electronic device with a microphone and/or a camera on the first electronic device to form a corresponding microphone array and/or camera array; and determining a location of a third electronic device using the microphone array and/or the camera array; the audio information or the video information collected by at least one third electronic device having a first position relation with the first electronic device is used as first target information; or determining transmission delay between the third electronic equipment and the first electronic equipment based on the positions, and taking audio information or video information collected by at least one third electronic equipment having a second position relation with the first electronic equipment as first target information based on the transmission delay.
In some embodiments, the processing unit 802 is specifically configured to determine a communication connection state between the first electronic device and the second electronic device, and perform denoising and/or enhancement processing on the audio information or the video information by the first electronic device or the fourth electronic device based on at least the communication connection state to obtain second target information; or if the first electronic equipment and the second electronic equipment are in second communication connection, the first electronic equipment identifies and processes the audio information or the video information to obtain second target information, wherein the data volume of the second target information is less than that of the first target information.
In some embodiments, the processing unit 802 is specifically configured to determine a location of the third electronic device and a transmission delay between the third electronic device and the first electronic device or the fourth electronic device; and performing sound mixing processing on the first target information from the plurality of third electronic devices based on the positions and the transmission time delays to obtain second target information.
In some embodiments, the obtaining unit 801 is further configured to obtain an echo cancellation signal sent by the first electronic device or the fourth electronic device;
the processing unit 802 is further configured to determine a target sound pickup apparatus based on a positional relationship between the third electronic apparatus and the first electronic apparatus or the fourth electronic apparatus, to perform an echo cancellation operation on a microphone of the first electronic apparatus or the fourth electronic apparatus with an echo cancellation signal, and to perform an echo cancellation operation on the target sound pickup apparatus with the echo cancellation signal, where the target sound pickup apparatus is the third electronic apparatus in a sound pickup state.
In some embodiments, the output unit 803 is specifically configured to output the second target information to the second electronic device if the first electronic device and the second electronic device have the first communication connection therebetween, or obtain hardware configuration information of the first electronic device and the fourth electronic device, and determine to output the second target information by the first electronic device or the fourth electronic device based on the hardware configuration information; or, if the first electronic device and the second electronic device have a second communication connection, acquiring hardware configuration information of the first electronic device and the second electronic device, and determining that the first electronic device or the second electronic device outputs second target information based on the hardware configuration information; or, if the first electronic device and the second electronic device have a second communication connection, obtaining the output parameter and/or the current spatial environment of the second target information, and determining to output the second target information by the first electronic device or the second electronic device based on the output parameter and/or the current spatial environment.
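As an illustration of choosing the output device from hardware configuration information, the sketch below prefers the device with the higher-rated speaker; the criterion and field names are assumptions, since the embodiment only states that the decision is based on hardware configuration information:

```python
def choose_output_device(hw_info: dict[str, dict]) -> str:
    """Pick which device outputs the second target information based on
    hardware configuration, here by preferring the device whose speaker
    has the higher rated output power (an assumed criterion)."""
    return max(hw_info, key=lambda dev: hw_info[dev].get("speaker_power_w", 0.0))

# e.g. a relay with a 3 W speaker vs a main device with a 10 W speaker
assert choose_output_device({"first": {"speaker_power_w": 3.0},
                             "second": {"speaker_power_w": 10.0}}) == "second"
```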
In some embodiments, the obtaining unit 801 is further configured to obtain third target information, where the third target information is from an electronic device outside the current space;
an output unit 803 further configured to determine to output the third target information by the first electronic device or the fourth electronic device based on the attribute information of the third target information and/or the environmental information and/or the device information within the current space.
In some embodiments, as shown in fig. 8, the processing apparatus 80 may further include a determining unit 804 configured to transmit a detection signal to at least one third electronic device at a first time through the first electronic device or the fourth electronic device; receiving a detection response signal returned by at least one third electronic device, and recording a second time for receiving the detection response signal of each third electronic device; determining the network delay of each third electronic device according to the second time and the first time of each third electronic device; determining the delay compensation time of each third electronic device according to the network delay of each third electronic device;
the output unit 803 is further configured to perform the step of outputting the second target information or giving the second target information to the second electronic device for output after the delay compensation time of the third electronic device corresponding to the second target information has elapsed.
For the embodiments of the present application, technical details that are not disclosed with respect to the processing device 80 can be understood by referring to the foregoing embodiments of the processing method.
It is understood that in this embodiment, a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may also be a module, or may also be non-modular. Moreover, each component in the embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on such understanding, the technical solution of the present embodiment, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Accordingly, the present embodiment provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the steps of the processing method of any one of the preceding embodiments.
Based on the composition of the processing device 80 and the computer storage medium described above, refer to fig. 9, which shows a schematic structural diagram of an electronic device 90 provided in an embodiment of the present application. As shown in fig. 9, the electronic device 90 may include: a communication interface 901, a memory 902, and a processor 903; the various components are coupled together by a bus system 904. It is understood that the bus system 904 is used to enable connection and communication among these components. In addition to a data bus, the bus system 904 includes a power bus, a control bus, and a status signal bus; however, for clarity of illustration, the various buses are all labeled as the bus system 904 in fig. 9. The communication interface 901 is configured to receive and send signals while information is being received from and sent to other external network elements;
a memory 902 for storing a computer program operable on the processor 903;
a processor 903 for executing, when running the computer program, the following:
after the first electronic device establishes communication connection with the second electronic device, first target information in a current space is obtained, wherein the first target information comes from at least one third electronic device connected with the first electronic device;
performing first processing on the first target information to obtain second target information;
outputting the second target information, or sending the second target information to a second electronic device for outputting;
wherein the output effect of the second target information is better than the output effect of the first target information.
It will be appreciated that the memory 902 in the embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 902 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 903 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 903 or by instructions in the form of software. The processor 903 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 902, and the processor 903 reads the information in the memory 902 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 903 is further configured to execute the steps of the method of any one of the preceding embodiments when running the computer program.
In still another embodiment of the present application, referring to fig. 10, a schematic structural diagram of another electronic device 90 provided in the embodiment of the present application is shown. As shown in fig. 10, the electronic device 90 may include the processing device 80 according to any of the previous embodiments.
In some embodiments, the electronic device 90 may be the aforementioned first electronic device or fourth electronic device.
The electronic device 90 includes the processing device 80 in the foregoing embodiment, so that information interference among multiple persons can be avoided when the persons meet, the meeting effect is improved, and the electronic device is suitable for multiple scenes.
It should be noted that, in the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of processing, comprising:
after a first electronic device establishes communication connection with a second electronic device, obtaining first target information in a current space, wherein the first target information is from at least one third electronic device connected with the first electronic device;
performing first processing on the first target information to obtain second target information;
outputting the second target information, or outputting the second target information to the second electronic equipment;
wherein an output effect of the second target information is better than an output effect of the first target information.
2. The method of claim 1, wherein the obtaining first target information within a current space comprises:
determining a position relationship between third electronic equipment and the first electronic equipment in a current space, and taking audio information or video information acquired by at least one third electronic equipment having a target position relationship with the first electronic equipment as the first target information; or,
determining user information of third electronic equipment in the current space, and taking audio information or video information acquired by the third electronic equipment used by a target user as the first target information; or,
acquiring attribute parameters of audio information or video information acquired by third electronic equipment in a current space, and determining the audio information or video information with target attributes as the first target information; or,
and if the first electronic equipment is in first communication connection with the second electronic equipment, acquiring audio information or video information acquired by at least one third electronic equipment connected with the first electronic equipment through a fourth electronic equipment.
3. The method of claim 2, wherein the using, as the first target information, audio information or audio-visual information collected by at least one third electronic device having a target location relationship with the first electronic device comprises:
combining a microphone and/or a camera on at least one third electronic device with a microphone and/or a camera on the first electronic device to form a corresponding microphone array and/or camera array;
determining a location of the third electronic device with the microphone array and/or camera array;
taking audio information or video information acquired by at least one third electronic device having a first position relation with the first electronic device as the first target information; or,
and determining transmission time delay between the third electronic equipment and the first electronic equipment based on the position, and taking audio information or video information acquired by at least one third electronic equipment having a second position relation with the first electronic equipment as the first target information based on the transmission time delay.
4. The method according to claim 2 or 3, wherein the performing the first processing on the first target information to obtain the second target information comprises:
determining a communication connection state between the first electronic device and the second electronic device, and performing denoising and/or enhancement processing on the audio information or the video information by the first electronic device or a fourth electronic device at least based on the communication connection state to obtain second target information; or,
and if the first electronic equipment and the second electronic equipment are in second communication connection, the first electronic equipment identifies the audio information or the audio and video information to obtain second target information, wherein the data volume of the second target information is less than that of the first target information.
5. The method according to claim 2 or 3, wherein the performing the first processing on the first target information to obtain the second target information comprises:
determining a position of a third electronic device and a transmission delay between the third electronic device and the first electronic device or a fourth electronic device;
and performing sound mixing processing on the first target information from the plurality of third electronic devices based on the positions and the transmission time delays to obtain the second target information.
6. The method of claim 2 or 3, further comprising:
obtaining an echo cancellation signal sent by the first electronic device or the fourth electronic device;
determining a target sound pickup equipment based on the position relation between the third electronic equipment and the first electronic equipment or the fourth electronic equipment, so as to give the echo cancellation signal to a microphone of the first electronic equipment or the fourth electronic equipment to execute an echo cancellation operation, and give the echo cancellation signal to the target sound pickup equipment to execute an echo cancellation operation, wherein the target sound pickup equipment is the third electronic equipment in a sound pickup state.
7. The method of any of claims 1-3, wherein the outputting the second target information or giving the second target information to the second electronic device for output comprises:
if the first electronic device and the second electronic device have a first communication connection, giving the second target information to the second electronic device for outputting, or obtaining hardware configuration information of the first electronic device and a fourth electronic device, and determining that the second target information is output by the first electronic device or the fourth electronic device based on the hardware configuration information; or,
if the first electronic device and the second electronic device have a second communication connection, obtaining hardware configuration information of the first electronic device and the second electronic device, and determining that the first electronic device or the second electronic device outputs the second target information based on the hardware configuration information; or,
and if the first electronic equipment and the second electronic equipment have second communication connection, obtaining output parameters and/or current spatial environment of the second target information, and determining that the first electronic equipment or the second electronic equipment outputs the second target information based on the output parameters and/or the current spatial environment.
8. The method of any of claims 1 to 3, further comprising:
obtaining third target information, wherein the third target information is from electronic equipment outside the current space;
determining to output the third target information by the first electronic device or a fourth electronic device based on attribute information of the third target information and/or environmental information and/or device information within the current space.
9. The method of claim 1, further comprising:
sending, by the first electronic device or a fourth electronic device, a detection signal to the at least one third electronic device at a first time;
receiving a detection response signal returned by the at least one third electronic device, and recording a second time when the detection response signal of the at least one third electronic device is received;
determining a network delay of each third electronic device according to the second time and the first time of each third electronic device;
determining the delay compensation time of each third electronic device according to the network delay of each third electronic device;
correspondingly, after the obtaining of the second target information, the method further includes:
and after the delay compensation time of the third electronic equipment corresponding to the second target information has elapsed, executing the step of outputting the second target information or sending the second target information to the second electronic equipment for outputting.
10. A processing apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire first target information in a current space after a first electronic device and a second electronic device establish communication connection, and the first target information is from at least one third electronic device connected with the first electronic device;
the processing unit is configured to perform first processing on the first target information to obtain second target information;
the output unit is configured to output the second target information or output the second target information to the second electronic device; wherein an output effect of the second target information is better than an output effect of the first target information.
CN202111672799.5A 2021-12-31 2021-12-31 Processing method and processing device Pending CN114531425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111672799.5A CN114531425A (en) 2021-12-31 2021-12-31 Processing method and processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111672799.5A CN114531425A (en) 2021-12-31 2021-12-31 Processing method and processing device

Publications (1)

Publication Number Publication Date
CN114531425A true CN114531425A (en) 2022-05-24

Family

ID=81621166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111672799.5A Pending CN114531425A (en) 2021-12-31 2021-12-31 Processing method and processing device

Country Status (1)

Country Link
CN (1) CN114531425A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008017126A (en) * 2006-07-05 2008-01-24 Yamaha Corp Voice conference system
US20140154968A1 (en) * 2012-12-04 2014-06-05 Timothy D. Root Audio system with centralized audio signal processing
CN110235428A (en) * 2017-02-02 2019-09-13 伯斯有限公司 Meeting room audio setting
CN107205199A (en) * 2017-06-16 2017-09-26 福建星网智慧科技股份有限公司 The microphone array and its communication means of a kind of Android phone
US10110994B1 (en) * 2017-11-21 2018-10-23 Nokia Technologies Oy Method and apparatus for providing voice communication with spatial audio

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114999490A (en) * 2022-08-03 2022-09-02 成都智暄科技有限责任公司 Intelligent cabin audio control system

Similar Documents

Publication Publication Date Title
US11539844B2 (en) Audio conferencing using a distributed array of smartphones
US11282532B1 (en) Participant-individualized audio volume control and host-customized audio volume control of streaming audio for a plurality of participants who are each receiving the streaming audio from a host within a videoconferencing platform, and who are also simultaneously engaged in remote audio communications with each other within the same videoconferencing platform
US9973561B2 (en) Conferencing based on portable multifunction devices
US10732924B2 (en) Teleconference recording management system
US11782674B2 (en) Centrally controlling communication at a venue
US11521636B1 (en) Method and apparatus for using a test audio pattern to generate an audio signal transform for use in performing acoustic echo cancellation
WO2021244159A1 (en) Translation method and apparatus, earphone, and earphone storage apparatus
JP2006254064A (en) Remote conference system, sound image position allocating method, and sound quality setting method
US11741984B2 (en) Method and apparatus and telephonic system for acoustic scene conversion
CN114531425A (en) Processing method and processing device
US20240064485A1 (en) Systems and methods for sound-enhanced meeting platforms
WO2022054900A1 (en) Information processing device, information processing terminal, information processing method, and program
US11089164B2 (en) Teleconference recording management system
JP2006339869A (en) Apparatus for integrating video signal and voice signal
US20190149917A1 (en) Audio recording system and method
CN111201784B (en) Communication system, method for communication and video conference system
WO2023286320A1 (en) Information processing device and method, and program
KR102363652B1 (en) Method and Apparatus for Playing Multiple Audio
CN117636928A (en) Pickup device and related audio enhancement method
CN116114241A (en) Information processing device, information processing terminal, information processing method, and program
JP2003069968A (en) Method for realizing electronic conference with sense of reality
JP2010148143A (en) Teleconference system, method for allocating sound image position, and method for setting sound quality

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination