CN111683183A - Multimedia conference non-participant conversation shielding processing method and system thereof

Info

Publication number: CN111683183A
Application number: CN202010474772.4A
Authority: CN
Inventors: 张晨成
Original assignee: Taicang Qinfeng Advertising Media Co ltd
Current assignee: Jieyuntong Beijing Technology Co ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-09-18
Anticipated expiration: 2040-05-29
Also published as: CN111683183B

Abstract

The invention provides a method and a system for shielding conversations with non-participants in a multimedia conference. The method comprises: collecting first voice information of a participant and extracting first voiceprint information from it; detecting a non-participant terminal near the participant terminal; collecting second voice information of the terminal's owner from the non-participant terminal and extracting second voiceprint information from it; judging whether the second voiceprint information is the same as the first voiceprint information and, if they differ, judging whether the owner of the non-participant terminal and the participant are acquainted; if they are acquainted, storing the second voiceprint information; collecting newly added voice information in the environmental sound and extracting third voiceprint information from it; judging whether the third voiceprint information is the same as the second voiceprint information; if it is, judging whether third voice information of the participant is generated immediately after the newly added voice information; if it is, judging whether the sound position of the third voice information has moved relative to the first voice information; and, if it has, shielding the third voice information and the newly added voice information.

Description

Multimedia conference non-participant conversation shielding processing method and system thereof

Technical Field

The invention relates to the technical field of multimedia conferences, in particular to a multimedia conference non-participant conversation shielding processing method and a system thereof.

Background

With the development of society, the application of audio and video conference software is increasingly popularized, and the audio and video conference software is a multimedia communication mode for holding a conference through a communication network by using electronic equipment, and can enable geographically dispersed participants to exchange and share real-time information through video and sound information streams so as to develop a cooperative working mode.

During a conference, the following scenario is often encountered: conversations between a participant and other people in the same physical space (such as everyday conversations between the participant and family members) are picked up and transmitted to the audio and video conference software, so that they are heard by the other participants, which affects both the quality of the conference and the privacy of the user.

To address this, the prior art applies noise reduction to the ambient sound, i.e., it recognizes the voices of non-participants and filters them out.

The disadvantage of this approach is that it filters only the voices of non-participants and does not also filter the participant's side of a conversation with a non-participant, so the filtering is incomplete and fails to meet the actual filtering requirement.

Disclosure of Invention

The purpose of the invention is as follows:

in order to overcome the disadvantages in the background art, embodiments of the present invention provide a method and a system for processing non-participant conversation screening in a multimedia conference, which can effectively solve the problems related to the background art.

The technical scheme is as follows:

a multimedia conference non-participant conversation screening processing method, the method comprising:

collecting first voice information of participants and extracting first voiceprint information from the first voice information;

detecting non-participant terminals near the participant terminals;

acquiring second voice information of the owner from the detected non-participant terminal and extracting second voiceprint information from the second voice information;

comparing the second voiceprint information with the first voiceprint information, judging whether the second voiceprint information is the same as the first voiceprint information, and if the second voiceprint information is different from the first voiceprint information, judging whether the owner of the non-participant terminal is acquainted with the participant;

if so, storing the second voiceprint information;

acquiring newly added voice information in the environmental sound and extracting third voiceprint information from the newly added voice information;

judging whether the third voiceprint information is the same as the second voiceprint information;

if they are the same, judging whether third voice information of the participant is generated immediately after the newly added voice information, and if so, judging whether the sound position of the third voice information has moved relative to the first voice information;

and if it has moved, shielding the third voice information and the newly added voice information.

As a preferred aspect of the present invention, a method for detecting a non-participant terminal in the vicinity of a participant terminal, includes:

sending a detection instruction to the router to which the participant terminal is connected, judging through the router whether other terminals are connected, and if so, confirming those terminals as nearby non-participant terminals;

or, the participant terminal detects whether any nearby Bluetooth device is turned on, and if so, confirms that device as a nearby non-participant terminal.

As a preferred mode of the present invention, the acquiring of the second voice information of the owner from the detected non-participant terminal includes:

sending a second voice information acquisition request to the detected non-participant terminal;

and the non-participant terminal identifies an installed chat application according to the acquisition request, obtains a segment of the owner's voice from the chat application, and feeds the segment back as the second voice information.

As a preferred aspect of the present invention, the process of masking the third speech information and the newly added speech information includes:

and canceling the conference transmission of the audio signal corresponding to the third voice information and the newly added voice information.

As a preferred mode of the present invention, the method further includes:

judging whether the newly added voice information in the environmental sound has stopped; if so, judging whether the sound position of the third voice information has been restored; and if so, canceling the shielding of the third voice information and the newly added voice information.

A system for processing non-participant conversation screening in a multimedia conference, comprising:

the first voice information acquisition module is used for acquiring first voice information of the participants;

the first voice print information extraction module is used for extracting first voice print information from the first voice information;

the non-participant terminal detection module is used for detecting non-participant terminals near the participant terminals;

the second voice information acquisition module is used for acquiring second voice information of the owner from the detected non-participant terminal;

the second voiceprint information extraction module is used for extracting second voiceprint information from the second voice information;

the first voiceprint information comparison module is used for comparing the second voiceprint information with the first voiceprint information and judging whether the second voiceprint information is the same as the first voiceprint information;

the personnel acquaintance judging module is used for judging whether the owner of the non-participant terminal and the participant are acquainted when the second voiceprint information is judged to be different from the first voiceprint information;

the second voiceprint information storage module is used for storing the second voiceprint information when the owner of the non-participant terminal is judged to be acquainted with the participant;

the newly-added voice information acquisition module is used for acquiring newly-added voice information in the environmental sound;

the third voiceprint information extraction module is used for extracting third voiceprint information from the newly added voice information;

a second voiceprint information judgment module, configured to judge whether the third voiceprint information is the same as the second voiceprint information;

the third voice information judging module is used for judging whether the third voice information of the participant is generated immediately after the newly added voice information when the third voiceprint information is judged to be the same as the second voiceprint information;

the voiceprint judging module is used for judging whether the sound position of the third voice information has moved relative to the first voice information when it is judged that the third voice information of the participant is generated immediately after the newly added voice information;

and the sound shielding module is used for shielding the third voice information and the newly added voice information when it is judged that the sound position of the third voice information has moved relative to the first voice information.

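As a preferred mode of the present invention, the non-participant terminal detection module is further configured to:

send a detection instruction to the router to which the participant terminal is connected, judge through the router whether other terminals are connected, and if so, confirm those terminals as nearby non-participant terminals;

or detect whether any nearby Bluetooth device is turned on, and if so, confirm that device as a nearby non-participant terminal.

As a preferred mode of the present invention, the second voice information collecting module is further configured to:

send a second voice information acquisition request to the detected non-participant terminal.

As a preferred mode of the present invention, the sound shielding module is further configured to:

cancel the conference transmission of the audio signals corresponding to the third voice information and the newly added voice information.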

As a preferred aspect of the present invention, a sound shielding module includes:

the newly added voice information judgment module is used for judging whether the newly added voice information in the environmental sound stops or not;

the sound level recovery judging module is used for judging whether the sound level of the third voice information is recovered or not when judging that the newly added voice information in the environmental sound stops;

and the sound shielding submodule is used for canceling the shielding processing of the third voice information and the newly added voice information when the sound level of the third voice information is judged to be recovered.

The invention realizes the following beneficial effects:

The invention collects first voice information of a participant and extracts first voiceprint information from it; detects a non-participant terminal near the participant terminal; collects second voice information of the owner from the detected non-participant terminal and extracts second voiceprint information from it; and stores the second voiceprint information when the second voiceprint information differs from the first voiceprint information and the owner of the non-participant terminal is acquainted with the participant. It then collects newly added voice information in the environmental sound and extracts third voiceprint information from it, and shields the third voice information and the newly added voice information when the third voiceprint information is the same as the second voiceprint information, third voice information of the participant is generated immediately after the newly added voice information, and the sound position of the third voice information has moved relative to the first voice information. As a result, if a conversation between the participant and a non-participant arises while the participant is in a multimedia conference with other participants, that conversation is sound-shielded: the conference itself is not affected (the participant can still hear the other participants normally), and the conversation between the participant and the non-participant is not heard by the other participants, which effectively protects both the quality of the conference and the privacy of the user.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a schematic flowchart of a multimedia conference non-participant conversation shielding processing method according to a first embodiment of the present invention;

Fig. 2 is a schematic flowchart of a multimedia conference non-participant conversation shielding processing method according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a multimedia conference non-participant conversation shielding processing system according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a sound shielding module according to the third embodiment of the present invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments; in the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure; one skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc.; in other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale; the same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted; the structures shown in the drawings are illustrative only and do not necessarily include all of the elements; for example, some components may be split and some components may be combined to show one device.

Example one

Refer to Fig. 1. This embodiment provides a multimedia conference non-participant conversation shielding processing method. The method can be implemented by software installed or configured on a device; the software can be an application program, such as a typical app, or the method can be implemented by the operating system running on the device. In this embodiment, a participant terminal is a terminal on which multimedia conference software is installed and which is currently running that software to hold a conference. A non-participant terminal is any other terminal located in the same space as the participant terminal. A non-participant terminal may also become a participant terminal: it is converted into a participant terminal when it runs the multimedia conference software to join the conference. A terminal in this embodiment can be an electronic device such as a mobile phone, a computer, or a tablet. The multimedia conference in this embodiment includes audio conferences and video conferences.

The method provided by the embodiment of the invention comprises the following steps:

S101, collecting first voice information of the participant and extracting first voiceprint information from the first voice information.

A participant in this embodiment is a person taking part in the multimedia conference, that is, a person using a participant terminal to hold a conference with other participants. The participant terminal (specifically, the app or the operating system) collects the participant's voice, i.e., the first voice information. The first voice information may be collected in real time or in advance, and voiceprint information is then extracted from the collected first voice information.

Voiceprint information is the sound wave spectrum, carrying speech information, that is displayed by an electroacoustic instrument. A voiceprint is both specific to the individual and relatively stable: after adulthood, a person's voice remains relatively stable for a long time. Experiments have shown that whether a speaker deliberately imitates another person's voice and tone or speaks in a whisper, the speaker's voiceprint remains distinct even when the imitation is lifelike. Voiceprint information can therefore be taken to represent a user's identity and can be used to distinguish different people.

S102, detecting non-participant terminals near the participant terminals.

In the embodiment of the present invention, S102 may specifically be implemented in the following manner:

Sending a detection instruction to the router to which the participant terminal is connected, judging through the router whether other terminals are connected, and if so, confirming those terminals as nearby non-participant terminals.

Or, the participant terminal detects whether any nearby Bluetooth device is turned on, and if so, confirms that device as a nearby non-participant terminal.

For the former, the participant terminal sends a detection instruction to the router to which it is connected. After receiving the detection instruction, the router determines which terminals are currently connected to it, and the detected terminals other than the participant terminal are confirmed as non-participant terminals.

For the latter, the participant terminal searches nearby bluetooth devices through a built-in bluetooth search function, and confirms the searched bluetooth devices as non-participant terminals in the vicinity of the participant terminal.
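The router-based variant of S102 can be sketched as follows. This is only a minimal illustration, assuming the router's reply has already been parsed into a list of connected-device records; the `Device` type, its fields, and `find_non_participant_terminals` are hypothetical names that do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical record for one terminal reported by the router."""
    mac: str
    name: str


def find_non_participant_terminals(connected, participant_mac):
    """Return every connected terminal other than the participant terminal."""
    # MAC addresses are compared case-insensitively (an assumption of this sketch).
    return [d for d in connected if d.mac.lower() != participant_mac.lower()]


connected = [
    Device("AA:BB:CC:00:00:01", "participant-laptop"),
    Device("AA:BB:CC:00:00:02", "family-phone"),
]
nearby = find_non_participant_terminals(connected, "aa:bb:cc:00:00:01")
```

The Bluetooth variant would feed the list of discovered Bluetooth devices through the same kind of filter.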

S103, second voice information of the owner is collected from the detected non-participant terminal, and second voiceprint information is extracted from the second voice information.

In this step, the second voice information of the owner collected from the detected non-participant terminal can be specifically realized by adopting the following mode:

sending a second voice information acquisition request to the detected non-participant terminal; and the non-participant terminal acquires the installed chat application according to the acquisition request, acquires one voice section of the owner from the chat application and feeds the voice section as second voice information.

Specifically, the participant terminal sends a second voice information acquisition request to the detected non-participant terminal. After receiving the request, the non-participant terminal identifies an installed chat application capable of sending voice messages, such as QQ or WeChat; other chat applications capable of sending and receiving voice messages also fall within the protection scope of the present invention. It then finds any voice chat record in the chat application, identifies the owner (who can be identified from the layout of the message interface), obtains a voice message sent by the owner, that is, a segment of the owner's voice, and feeds this segment back to the participant terminal as the second voice information.
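The selection of the owner's voice segment from a chat record might look like the sketch below. It assumes the chat record is already available as a list of message dictionaries; the keys `sender`, `kind`, and `audio`, and the function `pick_owner_voice`, are hypothetical and do not correspond to the API of any real chat application.

```python
def pick_owner_voice(messages, owner_id):
    """Return the audio of the first voice message sent by the owner, or None."""
    for message in messages:
        if message["sender"] == owner_id and message["kind"] == "voice":
            return message["audio"]
    return None


chat_record = [
    {"sender": "friend", "kind": "voice", "audio": b"friend-audio"},
    {"sender": "owner", "kind": "text", "audio": None},
    {"sender": "owner", "kind": "voice", "audio": b"owner-audio"},
]
# The segment that would be fed back as the second voice information.
second_voice = pick_owner_voice(chat_record, "owner")
```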

And S104, comparing the second voiceprint information with the first voiceprint information, judging whether the second voiceprint information is the same as the first voiceprint information or not, and executing S105 if the second voiceprint information is different from the first voiceprint information.

In S104, the voiceprint features of the second voiceprint information and of the first voiceprint information are extracted respectively and input into voiceprint recognition software for comparison, and a comparison result is output. Whether the two are the same is judged from the comparison result: if they are the same, the corresponding users are determined to be the same person; otherwise they are determined to be different users. If they are different, S105 is executed.
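The disclosure leaves the comparison to unspecified voiceprint recognition software. As one illustrative stand-in, the sketch below treats each voiceprint as a fixed-length feature vector and compares the vectors by cosine similarity against a threshold. The vector representation, the function names, and the threshold value of 0.8 are all assumptions of this sketch, not details taken from the source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def same_speaker(features_a, features_b, threshold=0.8):
    """Judge two voiceprints to be 'the same' when similarity reaches the threshold."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return cosine_similarity(features_a, features_b) >= threshold
```

In terms of the flow above, S105 would be reached only when `same_speaker(first, second)` returns `False`, that is, when the owner of the non-participant terminal is not the participant.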

And S105, judging whether the owner of the non-participant terminal is acquainted with the participant, and executing S106 if the owner of the non-participant terminal is acquainted with the participant.

In S105, whether the owner of the non-participant terminal and the participant are acquainted may be determined as follows. One way is through the address book or chat software: it is judged whether the contact details of one party exist in the address book or chat software of the other party's terminal, or whether a chat history exists between the two parties' contact details; if so, the two are judged to be acquainted. Another way is picture recognition: a picture of one person is obtained, the other person's terminal is searched for that picture, and if it is found, the two are considered acquainted.
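The address-book and chat-history check can be sketched as below. The data shapes are assumptions made for illustration: `address_books` maps a person's identifier to the set of contacts stored on that person's terminal, and `chat_pairs` is a set of unordered pairs that have a chat history. Neither structure nor the function name comes from the disclosure, and the picture-recognition route is not covered.

```python
def are_acquainted(participant_id, owner_id, address_books, chat_pairs):
    """Return True if either party lists the other, or a chat history exists."""
    in_address_book = (
        owner_id in address_books.get(participant_id, set())
        or participant_id in address_books.get(owner_id, set())
    )
    # frozenset makes the pair order-independent: (a, b) equals (b, a).
    has_chat_history = frozenset((participant_id, owner_id)) in chat_pairs
    return in_address_book or has_chat_history
```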

And S106, storing the second voiceprint information.

The second voiceprint information is stored in a designated database.

And S107, acquiring newly added voice information in the environmental sound and extracting third voiceprint information from the newly added voice information.

Newly added voice information refers to the voice of anyone other than the participant. Such voices include those of the other participants in the multimedia conference; in this step the participant terminal filters out the voices of the other participants (that is, it merely disregards them rather than filtering them in the practical sense), then detects whether newly appearing voice information exists in the environment, and if so, extracts third voiceprint information from it.

And S108, judging whether the third voiceprint information is the same as the second voiceprint information or not, and if so, executing S109.

Similar to S104, in S108 the participant terminal extracts the voiceprint features of the third voiceprint information and of the second voiceprint information respectively and inputs them into voiceprint recognition software for comparison, and a comparison result is output. Whether the third voiceprint information and the second voiceprint information are the same is judged from the comparison result: if they are the same, the corresponding users are determined to be the same person; otherwise they are determined to be different users. If they are the same, S109 is executed.

And S109, judging whether third voice information of the participant is generated immediately after the newly added voice information, and executing S110 if so.

Whether the third voice information follows immediately can be determined by judging whether the third voice information of the participant is generated within a preset time threshold after the newly added voice information is generated. The preset time threshold can be set according to actual conditions, for example by obtaining the average interval between the participant's speech and that of the newly added speaker (namely, the owner of the non-participant terminal) in actual conversation and setting the threshold accordingly. If the third voice information is generated within the preset time threshold, the third voice information of the participant is considered to be generated immediately after the newly added voice information.

The third voice information of the participant is not a segment of speech at one specific point in time; it may be continuous or intermittent. That is, any speech uttered by the participant within the preset time after the newly added voice information is generated (i.e., immediately after it) can be regarded as third voice information.
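The timing test in S109 reduces to a comparison of timestamps. The sketch below assumes that times are expressed in seconds and that the interval is measured from the moment the newly added voice was detected; the source does not fix either point, and the function names are hypothetical.

```python
def average_reply_gap(gaps_seconds):
    """Average observed interval between the two speakers, used to set the threshold."""
    if not gaps_seconds:
        raise ValueError("at least one observed interval is required")
    return sum(gaps_seconds) / len(gaps_seconds)


def follows_immediately(new_voice_time, participant_voice_time, threshold_seconds):
    """True if the participant speaks within the threshold after the new voice."""
    gap = participant_voice_time - new_voice_time
    return 0 <= gap <= threshold_seconds
```

For example, with a threshold of 2.0 seconds, participant speech starting 1.5 seconds after the new voice counts as immediate, while speech starting 3.0 seconds later does not.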

And S110, judging whether the sound position of the third voice information has moved relative to the first voice information, and if so, executing S111.

Specifically, when the participant terminal acquires the first voice information, it records the sound position corresponding to the first voice information, that is, the position of the sound source relative to the participant terminal. The sound source position may be determined by training in advance on sounds produced at different positions to serve as a basis for judgment; for example, when a sound source is at a given position, the band, frequency, volume value, and so on of the detected sound are acquired, so that the sound source information corresponding to different positions is recorded.

In this embodiment, the sound position of the third voice information must be determined. Once obtained, it is compared with the sound position of the first voice information; if the two differ, or differ by more than a certain range, the sound position of the third voice information is judged to have moved relative to that of the first voice information.

On this basis, when the sound position of the third voice information is judged to have moved relative to the first voice information, the third voice information is considered to be speech addressed to the newly added speaker, namely conversation between the participant and the owner of the non-participant terminal, rather than speech addressed to the other participants in the conference.
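Once sound source positions are available, the movement test in S110 is a tolerance comparison. The sketch below assumes each position has already been estimated as a coordinate tuple relative to the participant terminal; how that estimate is obtained (the trained band, frequency, and volume profiles described above) is outside its scope. The coordinate representation and the tolerance value are assumptions, not details from the source.

```python
import math


def position_moved(reference_position, current_position, tolerance=0.5):
    """True if the current sound source lies outside the tolerance around the reference."""
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    return math.dist(reference_position, current_position) > tolerance
```

A tolerance of zero corresponds to the "different" case in the text, and a positive tolerance to the "exceeds a certain range" case.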

And S111, shielding the third voice information and the newly added voice information.

In this embodiment of the present invention, the shielding processing on the third voice information and the newly added voice information may specifically refer to: and canceling the conference transmission of the audio signal corresponding to the third voice information and the newly added voice information.

That is, while the participant is in a multimedia conference with other participants, if a conversation arises between the participant and a non-participant, that conversation is sound-shielded. Specifically, the audio signal corresponding to the conversation is identified and its conference transmission is canceled, i.e., it is not transmitted to the ongoing multimedia conference. The conference is therefore not affected (the participant can still hear the other participants normally), while the conversation between the participant and the non-participant is not heard by the other participants, which effectively protects both the quality of the conference and the privacy of the user.
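Canceling conference transmission can be pictured as a filter on the outgoing audio. The sketch below assumes that audio has already been split into segments tagged with their source; the tags `third_voice` and `new_voice`, the tuple layout, and the function name are hypothetical and only illustrate the idea of withholding the shielded segments.

```python
MASKED_SOURCES = {"third_voice", "new_voice"}


def filter_outgoing(segments, masking_active):
    """Return the segments that may be sent to the conference.

    Each segment is a (source_tag, payload) tuple. While masking is active,
    segments tagged as third voice or newly added voice are withheld.
    """
    if not masking_active:
        return list(segments)
    return [s for s in segments if s[0] not in MASKED_SOURCES]


outgoing = filter_outgoing(
    [("conference_speech", b"a"), ("new_voice", b"b"), ("third_voice", b"c")],
    masking_active=True,
)
```

Incoming audio is untouched by this filter, which matches the statement that the participant can still hear the other participants normally.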

Example two

As shown with reference to fig. 2. In this embodiment, on the basis of the first embodiment, the method further includes the following steps:

S201, judging whether the newly added voice information in the environmental sound has stopped, and if so, executing S202.

Whether the newly added voice information has stopped can be determined by judging whether the time for which it has been absent exceeds a preset time threshold; if so, the newly added voice information in the environmental sound is considered to have stopped.

S202, judging whether the sound position of the third voice information has been restored, and if so, executing S203.

Specifically, the participant terminal determines whether the sound position of the third voice information has returned to its position before the movement, that is, the sound source position corresponding to the first voice information; if the two match or are close to matching, the sound position is regarded as restored.

In this case, the conversation between the participant and the non-participant will be considered to have ended.

And S203, canceling the shielding processing of the third voice information and the newly added voice information.

After the recovery, the audio signal corresponding to the third voice information can be normally transmitted to the multimedia conference, and other participants can normally hear the sound of the participants.
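The decision sequence S201 to S203 can be condensed into a single predicate. This is a minimal sketch, assuming the absence time of the newly added voice is tracked in seconds and that a separate check has already determined whether the sound source position has been restored; the function and parameter names are hypothetical.

```python
def should_unmask(absent_seconds, stop_threshold, position_restored):
    """Decide whether shielding can be canceled.

    S201: the newly added voice counts as stopped once it has been absent
          for longer than the preset threshold.
    S202: the participant's sound source position must also be restored.
    S203: shielding is canceled only when both conditions hold.
    """
    new_voice_stopped = absent_seconds > stop_threshold
    return new_voice_stopped and position_restored
```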

Example three

As shown with reference to fig. 3. The embodiment provides a non-participating session shielding processing system for a multimedia conference, which comprises:

the first voice information collecting module 301 is configured to collect first voice information of the participant.

A first voiceprint information extracting module 302, configured to extract first voiceprint information from the first voice information.

A non-participant terminal detecting module 303, configured to detect a non-participant terminal near the participant terminal.

And the second voice information acquisition module 304 is configured to acquire second voice information of the owner from the detected non-participant terminal.

A second voiceprint information extracting module 305, configured to extract second voiceprint information from the second voice information.

A first voiceprint information comparison module 306, configured to compare the second voiceprint information with the first voiceprint information and determine whether the two are the same.

A person acquaintance determination module 307, configured to determine whether the owner of the non-participant terminal and the participant are acquainted with each other when it is determined that the second voiceprint information is different from the first voiceprint information.

A second voiceprint information storage module 308, configured to store the second voiceprint information when it is determined that the owner of the non-participant terminal is acquainted with the participant.

A newly added voice information collecting module 309, configured to collect newly added voice information in the environmental sound.

A third voiceprint information extraction module 310, configured to extract third voiceprint information from the newly added voice information.

A second voiceprint information determining module 311, configured to determine whether the third voiceprint information is the same as the second voiceprint information.

A third voice information determining module 312, configured to determine whether third voice information of the participant is generated immediately after the newly added voice information when it is determined that the third voiceprint information is the same as the second voiceprint information.

A sound position determination module 313, configured to determine whether the third voice information has shifted in sound position relative to the first voice information when it is determined that third voice information of the participant is generated immediately after the newly added voice information.

A sound shielding module 314, configured to shield the third voice information and the newly added voice information when it is determined that the third voice information has shifted in sound position relative to the first voice information.
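The chain of judgments made by modules 306 to 314 can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this document: voiceprints are modeled as plain feature vectors, "the same voiceprint" is approximated by cosine similarity against a hypothetical threshold, and the acquaintance judgment, the reply detection and the sound position shift are passed in as already-computed booleans. The class and function names are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_voiceprint(a: Sequence[float], b: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for 'the voiceprints are the same'."""
    return cosine_similarity(a, b) >= threshold


@dataclass
class ShieldingDecider:
    first_voiceprint: List[float]  # voiceprint of the participant
    stored_voiceprints: List[List[float]] = field(default_factory=list)

    def register_nearby_owner(self, second_voiceprint: List[float],
                              acquainted: bool) -> bool:
        """Modules 306-308: store the owner's voiceprint only if it differs
        from the participant's and the two people are acquainted."""
        if same_voiceprint(second_voiceprint, self.first_voiceprint):
            return False
        if not acquainted:
            return False
        self.stored_voiceprints.append(second_voiceprint)
        return True

    def should_shield(self, third_voiceprint: List[float],
                      participant_replied: bool,
                      position_shifted: bool) -> bool:
        """Modules 311-314: shield only when the new voice belongs to a
        stored acquaintance, the participant replied right after it, and
        the participant's sound position shifted."""
        known = any(same_voiceprint(third_voiceprint, v)
                    for v in self.stored_voiceprints)
        return known and participant_replied and position_shifted
```

All three conditions must hold before shielding starts, which mirrors the order in which the modules hand their results to one another.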

Wherein the non-participant terminal detecting module 303 is further configured to:

Wherein the second voice information collecting module 304 is further configured to:

send a second voice information collection request to the detected non-participant terminal.

Wherein the sound shielding module 314 is further configured to:

Referring to Fig. 4, as a preferred implementation of this embodiment, the sound shielding module 314 includes:

A newly added voice information determining module 315, configured to determine whether the newly added voice information in the environmental sound has stopped.

A sound position restoration determining module 316, configured to determine whether the sound position of the third voice information is restored when it is determined that the newly added voice information in the environmental sound has stopped.

A sound shielding submodule 317, configured to cancel the shielding processing of the third voice information and the newly added voice information when it is determined that the sound position of the third voice information is restored.
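The interplay of submodules 315 to 317 amounts to a small two-state controller, sketched below under stated assumptions: the two judgments (whether the newly added voice has stopped, whether the sound position is restored) are supplied as booleans by the caller, and the class name `ShieldController` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShieldController:
    shielded: bool = False  # True while the audio is being shielded

    def start_shielding(self) -> None:
        """Called once the shielding condition has been met."""
        self.shielded = True

    def update(self, added_voice_stopped: bool,
               position_restored: bool) -> bool:
        """Cancel shielding only when the newly added voice has stopped
        AND the participant's sound position is restored.
        Returns the shielding state after the update."""
        if self.shielded and added_voice_stopped and position_restored:
            self.shielded = False
        return self.shielded
```

Requiring both conditions keeps the shielding in place during a pause in the side conversation, as long as the participant has not yet turned back to the original position.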

The implementation process of this embodiment is the same as that of the first and second embodiments; refer to the descriptions above for details.

The above embodiments are merely illustrative of the technical ideas and features of the present invention, and are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention. All equivalent changes or modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

1. A multimedia conference non-participant conversation shielding processing method, characterized by comprising the following steps:

detecting non-participant terminals near the participant terminals;

if so, storing the second voiceprint information;

2. The method as claimed in claim 1, wherein detecting non-participant terminals near the participant terminals comprises:

3. The method as claimed in claim 1, wherein the step of collecting the second voice information of the owner from the detected non-participant terminal comprises:

4. The method as claimed in claim 1, wherein the step of shielding the third voice information and the newly added voice information comprises:

5. The method as claimed in claim 1, wherein the method further comprises:

6. A multimedia conference non-participant conversation shielding processing system, comprising:

7. The system according to claim 6, wherein the non-participant terminal detection module is further configured to:

8. The system of claim 6, wherein the second voice information collection module is further configured to:

9. The system according to claim 6, wherein the sound shielding module is further configured to:

10. The system of claim 6, wherein the sound shielding module comprises: