CN111554314A

CN111554314A - Noise detection method, device, terminal and storage medium

Info

Publication number: CN111554314A
Application number: CN202010415327.0A
Authority: CN
Inventors: 鲍枫; 李岳鹏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2020-08-18

Abstract

The disclosure provides a noise detection method, apparatus, terminal and storage medium, and belongs to the technical field of artificial intelligence and cloud computing. The method comprises the following steps: displaying a first session interface of a multimedia session application; collecting an audio signal; determining the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal; and, in response to the parameter value of the noise indication parameter being greater than a parameter threshold, displaying noise prompt information on the first session interface. Rather than simply taking the signal state of the audio signal as the noise detection result, the method considers the signal energy of the current frame audio signal together with the detection information of the previous frame audio signal and compares the resulting noise indication parameter with the parameter threshold, so that the element of chance in the detection result is removed and the accuracy of the detection result is improved.

Description

Noise detection method, device, terminal and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence and cloud computing technologies, and in particular, to a noise detection method, an apparatus, a terminal, and a storage medium.

Background

With the development of artificial intelligence technology and cloud computing technology, the means and forms of conducting sessions keep changing. Multimedia sessions, with their variety of functions, are favored by more and more users and have become the mainstream session form. During a multimedia session, users may be in different environments, and those environments may change constantly. If noise is present in the background environment when a user speaks, other users may be unable to hear the session content clearly, which greatly reduces the session quality of the multimedia session.

In the related art, noise detection may be performed as follows: audio signals of a user are collected in real time; the collected audio signals are examined with VAD (Voice Activity Detection) to obtain the signal state of each audio signal, where the signal state is either a speech state or a non-speech state; if the signal state of the audio signal is a non-speech state, it is determined that noise is present in the audio signal.

Because the signal state of an audio signal is subject to a degree of chance, and the related art uses the signal state directly as the noise detection result, the detection result of the related art is not accurate enough.
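The related-art rule described above can be sketched in a few lines. This is only an illustration of the baseline being criticized, not code from the disclosure; the function name and the sample sequence of states are hypothetical, while the 1/0 encoding of the speech and non-speech states follows the convention given later in the description.

```python
from typing import Iterable, List

SPEECH = 1      # VAD reports a speech state
NON_SPEECH = 0  # VAD reports a non-speech state


def related_art_noise_flags(states: Iterable[int]) -> List[bool]:
    """Flag a frame as noise whenever its VAD state is non-speech."""
    return [state == NON_SPEECH for state in states]


# Hypothetical per-frame VAD output: one isolated non-speech frame
# immediately flips the detection result for that frame.
flags = related_art_noise_flags([SPEECH, SPEECH, NON_SPEECH, SPEECH])
print(flags)
```

Because each frame is judged in isolation, a single misclassified frame changes the result, which is the element of chance the disclosure sets out to remove.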

Disclosure of Invention

To improve the accuracy of environmental noise detection in multimedia sessions, the embodiments of the present disclosure provide a noise detection method, apparatus, terminal, and storage medium. The technical solution is as follows:

In one aspect, a noise detection method is provided, the method including:

displaying a first session interface of a multimedia session application based on a first user identifier currently logged in to the multimedia session application;

collecting an audio signal;

determining a change value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal in the process of detecting the audio signal frame by frame;

determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal;

and in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than a parameter threshold, displaying noise prompt information on the first session interface.

In another embodiment of the present disclosure, determining the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal includes:

in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than a first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a first value;

in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being smaller than the first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a second value;

wherein the first value and the second value are positive values, and the first value is greater than the second value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a third value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a third energy value and smaller than the second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fourth value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a fourth energy value and smaller than the third energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fifth value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being smaller than the fourth energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a sixth value;

wherein the third, fourth, fifth, and sixth values are negative values, the third value is greater than the fourth value, the fourth value is greater than the fifth value, and the fifth value is greater than the sixth value.
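The six cases above amount to a lookup keyed by signal state and energy band. The following is a minimal sketch of that lookup, not the disclosed implementation: every number in `ChangeTable` is a hypothetical placeholder chosen only to satisfy the stated orderings (first value > second value > 0 > third > fourth > fifth > sixth value, and second energy value > third energy value > fourth energy value). The text does not say what happens when the energy equals a threshold exactly; the sketch assumes equality falls into the lower band.

```python
from dataclasses import dataclass

SPEECH = 1
NON_SPEECH = 0


@dataclass(frozen=True)
class ChangeTable:
    # Energy thresholds: hypothetical numbers; only e2 > e3 > e4 is implied.
    e1: float = 1000.0
    e2: float = 800.0
    e3: float = 400.0
    e4: float = 100.0
    # Change values: hypothetical numbers with v1 > v2 > 0 > v3 > v4 > v5 > v6.
    v1: int = 4
    v2: int = 2
    v3: int = -1
    v4: int = -2
    v5: int = -3
    v6: int = -4


def change_value(state: int, energy: float, t: ChangeTable = ChangeTable()) -> int:
    """Return the change value of the noise indication parameter for one frame."""
    if state == SPEECH:
        # Speech state: positive change, larger when the energy is high.
        return t.v1 if energy > t.e1 else t.v2
    # Non-speech state: negative change, more negative as the energy drops.
    if energy > t.e2:
        return t.v3
    if energy > t.e3:
        return t.v4
    if energy > t.e4:
        return t.v5
    return t.v6
```

Keeping the thresholds and values in one frozen table makes it straightforward to check the ordering constraints in a single place when tuning.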

In another embodiment of the present disclosure, determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal, and the noise detection result corresponding to the previous frame audio signal includes:

in response to the noise detection result corresponding to the previous frame audio signal being that a noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being greater than a seventh value, determining the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that a noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the seventh value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that no noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being greater than an eighth value, determining the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

and in response to the noise detection result corresponding to the previous frame audio signal being that no noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the eighth value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.
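Read together, the four cases above clamp the previous parameter value to an upper bound before adding the change value, with the bound depending on whether the previous frame triggered a noise prompt. A minimal sketch under that reading follows; the function name and the default bounds (standing in for the seventh and eighth values) are hypothetical. When the previous value equals the bound, both readings give the same sum, so the unspecified equality case needs no extra assumption.

```python
def next_parameter_value(prev_value: int, prev_triggered: bool, change: int,
                         cap_triggered: int = 60, cap_idle: int = 40) -> int:
    """Compute the current frame's noise indication parameter value.

    cap_triggered plays the role of the seventh value and cap_idle the role
    of the eighth value; both defaults are hypothetical placeholders.
    """
    cap = cap_triggered if prev_triggered else cap_idle
    # Clamp the previous value to the bound, then apply the change value.
    return min(prev_value, cap) + change


# Previous frame triggered a prompt and its value exceeded the bound:
print(next_parameter_value(75, True, -2))   # uses 60 + (-2)
# Previous frame did not trigger and its value was below the bound:
print(next_parameter_value(30, False, 4))   # uses 30 + 4
```

The clamp keeps the accumulated value from growing without limit, so a long noisy stretch does not delay the return to a quiet result once the change values turn negative.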

In another aspect, a noise detection method is provided, the method including:

displaying a second session interface of the multimedia session application based on a second user identifier currently logged in to the multimedia session application;

receiving a prompt message sent by a server, wherein the prompt message is sent by the terminal logged in with a first user identifier upon detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal;

and in response to the prompt message, displaying noise prompt information corresponding to the first user identifier on the second session interface.

In another aspect, there is provided a noise detection apparatus, the apparatus including:

the display module is used for displaying a first session interface of the multimedia session application based on a first user identifier currently logged in to the multimedia session application;

the acquisition module is used for acquiring audio signals;

the determining module is used for determining a change value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal in the frame-by-frame detection process of the audio signal;

the determining module is configured to determine a parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal;

the display module is used for responding to the fact that the parameter value of the noise indication parameter is larger than a parameter threshold value, and displaying noise prompt information on the first session interface.

the display module is used for displaying a second session interface of the multimedia session application based on a second user identifier currently logged in to the multimedia session application;

the receiving module is used for receiving a prompt message sent by the server, wherein the prompt message is sent by a terminal logging in a first user identifier when detecting that a parameter value of a noise indication parameter of a current frame audio signal is larger than a parameter threshold value, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to a signal state and a signal energy of the current frame audio signal and detection information of a previous frame audio signal;

and the display module is used for displaying, in response to the prompt message, the noise prompt information corresponding to the first user identifier on the second session interface.

In another aspect, a terminal is provided, which includes a processor and a memory, where at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement the above noise detection method.

In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement the above-mentioned noise detection method.

The technical scheme provided by the embodiment of the disclosure has the following beneficial effects:

An audio signal is acquired, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are considered together, the element of chance in the detection result is removed and the accuracy of the detection result is improved.

In addition, the detection process uses only the signal state and the signal energy of the audio signal and adds no other complex detection logic, which saves detection resources.

In addition, when a noisy environment is determined according to the noise detection result, noise prompt information is displayed on the first session interface, so that the user learns about his or her own session environment in time and can take measures to create a good session environment for other users.

Drawings

To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an implementation environment related to a noise detection method provided by an embodiment of the present disclosure;

Fig. 2 is a flowchart of a noise detection method provided by an embodiment of the present disclosure;

Fig. 3 is a flowchart of a noise detection method provided by an embodiment of the present disclosure;

Fig. 4 is a flowchart of a noise detection method provided by an embodiment of the present disclosure;

Fig. 5 is a schematic diagram of a session interface of a multimedia session application in a quiet environment according to an embodiment of the present disclosure;

Fig. 6 is a schematic diagram of a session interface of a multimedia session application in a noisy environment according to an embodiment of the present disclosure;

Fig. 7 is a schematic diagram of a session interface of a multimedia session application in another noisy environment according to an embodiment of the present disclosure;

Fig. 8 is a schematic structural diagram of a noise detection apparatus according to an embodiment of the present disclosure;

Fig. 9 is a schematic structural diagram of a noise detection apparatus according to an embodiment of the present disclosure;

Fig. 10 shows a block diagram of a terminal according to an exemplary embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

It is to be understood that, as used in the embodiments of the present disclosure, "a plurality" means two or more, "each" refers to every one of the corresponding plurality, and "any" refers to any one of the corresponding plurality. For example, if a plurality of words includes 10 words, "each word" refers to every one of the 10 words, and "any word" refers to any one of the 10 words.

Before the embodiments of the present disclosure are described, the terms involved in the embodiments are explained first.

A noisy environment is an environment in which the noise level is high and is likely to cause auditory discomfort to a user.

Detection is the discrimination of an event or an environment.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The basic technologies of artificial intelligence generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer Vision (CV) is a science that studies how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to identify, track and measure targets, and further processes the resulting images so that they are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and the like, and also include common biometric recognition technologies such as face recognition and fingerprint recognition.

The key technologies of Speech Technology are automatic speech recognition, speech synthesis, and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of future human-computer interaction, and speech is expected to become one of the most promising modes of human-computer interaction.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will involve natural language, i.e. the language that people use everyday, so it is closely related to the research of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question and answer, knowledge mapping, and the like.

Machine Learning (ML) is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach for giving computers intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

A cloud conference is an efficient, convenient and low-cost conference form based on cloud computing technology. Through simple operations in an internet interface, a user can quickly and efficiently share voice, data files and video with teams and clients all over the world, while complex technologies such as the transmission and processing of conference data are handled by the cloud conference service provider on the user's behalf.

At present, domestic cloud conferences mainly focus on service content delivered primarily in the SaaS (Software as a Service) mode, including service forms such as telephone, network and video. A video conference based on cloud computing is called a cloud conference.

In the cloud conference era, data transmission, processing and storage are all processed by computer resources of video conference manufacturers, users do not need to purchase expensive hardware and install complicated software, and efficient teleconferencing can be performed only by opening a browser and logging in a corresponding interface.

The cloud conference system supports multi-server dynamic cluster deployment and provides a plurality of high-performance servers, which greatly improves conference stability, security and availability. In recent years, video conferences have become popular with many users because they greatly improve communication efficiency, continuously reduce communication cost and raise the level of internal management, and they are widely applied in fields such as government, the military, transportation, finance, telecom operators, education and enterprises. Undoubtedly, once video conferencing adopts cloud computing, it becomes more attractive in terms of convenience, speed and ease of use, which will inevitably stimulate a new surge in video conference applications.

Based on artificial intelligence technology and cloud conference technology, the embodiments of the present disclosure provide a noise detection method. The method acquires an audio signal of a first user, obtains the signal state and the signal energy of the current frame audio signal using a conventional voice activity detection algorithm, determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal, and the noise detection result corresponding to the previous frame audio signal, and displays noise prompt information on a session interface when the parameter value of the noise indication parameter of the current frame audio signal is greater than a parameter threshold. With this method, when the speaking first user is in a noisy environment and cannot communicate effectively with a second user, the first user is informed in time that he or she is in a noisy environment and that the second user may be unable to hear what is being said. When the parameter value of the noise indication parameter of the current frame audio signal of the first user is greater than the parameter threshold, the embodiments of the present disclosure also automatically report a prompt message, so that the second user in the multimedia session knows that the inability to hear the first user's speech clearly is caused not by network, software, hardware or device factors but by the environment on the first user's side.

Fig. 1 shows an implementation environment related to the noise detection method provided by the embodiments of the present disclosure. Referring to Fig. 1, the implementation environment includes: a first terminal 101, a server 102 and a second terminal 103.

Wherein the first terminal 101 and the second terminal 103 are terminals used by users currently participating in the multimedia session. The first terminal 101 is the terminal used by the first user currently speaking and the second terminal 103 is the terminal used by the second user in the multimedia session. The first terminal 101 and the second terminal 103 are both installed with a multimedia session application, and based on the installed multimedia session application, the first user and the second user can perform a multimedia session, where the multimedia session form includes one of a video conference, a voice conference, a video call, or a voice call. The first terminal 101 and the second terminal 103 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like.

The server 102 is a background server of the multimedia session application, and the server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.

The first terminal 101 and the server 102, and the server 102 and the second terminal 103 may be directly or indirectly connected through wired or wireless communication, and the embodiments of the present disclosure are not limited herein.

Based on the implementation environment shown in Fig. 1, the embodiments of the present disclosure provide a noise detection method, described here with the first terminal performing the method as an example. Referring to Fig. 2, the method flow provided by the embodiment of the present disclosure includes:

201. Display a first session interface of the multimedia session application based on the first user identifier currently logged in to the multimedia session application.

202. Collect an audio signal.

203. While detecting the audio signal frame by frame, determine the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal.

The signal state is either a speech state or a non-speech state; the speech state can be represented by 1 and the non-speech state by 0.

204. Determine the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal.

205. In response to the parameter value of the noise indication parameter being greater than the parameter threshold, display noise prompt information on the first session interface.

The method provided by the embodiment of the present disclosure acquires the current frame audio signal and determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are considered together, the element of chance in the detection result is removed and the accuracy of the detection result is improved.
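Steps 203 to 205 form a per-frame loop that carries the previous frame's detection information forward. The sketch below shows only that loop structure and is not the disclosed implementation: the VAD, the change-value rule and the update rule are passed in as functions, the toy stand-ins at the bottom are hypothetical, the initial detection information (value 0, no prompt triggered) is an assumption, and defining signal energy as the mean squared amplitude of a frame is likewise an assumption, since the text above does not define it.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude: an assumed definition of signal energy."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def detect(frames: Iterable[Frame],
           vad: Callable[[Frame], int],
           change_value: Callable[[int, float], int],
           next_value: Callable[[int, bool, int], int],
           threshold: int) -> Iterator[Tuple[int, bool]]:
    """Yield (parameter value, prompt triggered) for each frame."""
    value, triggered = 0, False  # assumed initial detection information
    for frame in frames:
        state = vad(frame)                                 # step 203: signal state
        change = change_value(state, frame_energy(frame))  # step 203: change value
        value = next_value(value, triggered, change)       # step 204
        triggered = value > threshold                      # step 205
        yield value, triggered


# Toy stand-ins, hypothetical and only for exercising the loop.
def toy_vad(frame: Frame) -> int:
    return 1 if frame_energy(frame) > 0.01 else 0


def toy_change(state: int, energy: float) -> int:
    return 3 if state == 1 else -1


def toy_next(prev: int, prev_triggered: bool, change: int) -> int:
    return min(prev, 10) + change


loud, quiet = [0.5, -0.5, 0.5, -0.5], [0.0, 0.0, 0.0, 0.0]
results = list(detect([loud, loud, loud, quiet], toy_vad, toy_change, toy_next, threshold=5))
print(results)
```

Passing the three rules in as callables keeps the loop independent of any particular thresholds, so the same skeleton can be exercised with different tunings.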

In another embodiment of the present disclosure, displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold includes:

in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, displaying a noise prompt text in the session member list of the first session interface at a position whose distance from the first user identifier is smaller than a preset distance, wherein the session member list comprises a plurality of user identifiers participating in the multimedia session.

In another embodiment of the present disclosure, displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold includes:

in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, changing the display color of the microphone identifier corresponding to the first user identifier on the first session interface.

In another embodiment of the present disclosure, in response to a parameter value of a noise indication parameter of the current frame audio signal being greater than a parameter threshold, before displaying the noise cue information on the first session interface, the method further includes:

and responding to the parameter value of the noise indication parameter of the current frame audio signal being larger than the parameter threshold value and the noise prompt function being started, executing the step of displaying the noise prompt information on the first conversation interface.

In another embodiment of the present disclosure, the method further comprises:

in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold and the automatic mute function being enabled, turning off (muting) the session audio corresponding to the first user identifier.

In another embodiment of the present disclosure, after displaying the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter of the current frame audio signal being greater than the parameter threshold, the method further includes:

hiding the noise prompt information in response to the parameter value of the noise indication parameter of the next frame audio signal being smaller than the parameter threshold.
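The optional embodiments above (noise prompt function, automatic mute function, hiding the prompt once the parameter value falls back) can be read as a small decision rule. The following is a minimal sketch under that reading; the names `PromptActions` and `decide_actions` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptActions:
    """What the first terminal does for the frame just evaluated (hypothetical type)."""
    show_noise_prompt: bool
    mute_session_audio: bool


def decide_actions(value: int, threshold: int,
                   prompt_enabled: bool, auto_mute_enabled: bool) -> PromptActions:
    """Map one frame's parameter value to UI actions.

    The prompt is shown only while the value is greater than the threshold
    and the noise prompt function is enabled; otherwise it is hidden.
    Muting is requested only when the automatic mute function is enabled.
    """
    noisy = value > threshold
    return PromptActions(
        show_noise_prompt=noisy and prompt_enabled,
        mute_session_audio=noisy and auto_mute_enabled,
    )
```

Calling `decide_actions` once per frame gives the hide behaviour for free: as soon as a later frame falls below the threshold, `show_noise_prompt` becomes `False`.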

In another embodiment of the present disclosure, after displaying the noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the method further includes:

sending a prompt message to a server, wherein the server forwards the prompt message to the terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger those terminals to display noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

In another embodiment of the present disclosure, determining a variation value of a noise indication parameter of a current frame audio signal according to a signal state and a signal energy of the current frame audio signal includes:

in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than a first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a first value;

in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being smaller than the first energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a second value;

wherein the first value and the second value are positive values, and the first value is greater than the second value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a third value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a third energy value and less than the second energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fourth value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a fourth energy value and less than the third energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a fifth value;

in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being less than the fourth energy value, determining that the change value of the noise indication parameter of the current frame audio signal is a sixth value;

wherein the third value, the fourth value, the fifth value and the sixth value are negative values, the third value is greater than the fourth value, the fourth value is greater than the fifth value, and the fifth value is greater than the sixth value.

In another embodiment of the present disclosure, determining a parameter value of a noise indication parameter of a current frame audio signal according to a variation value of the noise indication parameter of the current frame audio signal, a parameter value of the noise indication parameter of a previous frame audio signal, and a noise detection result corresponding to the previous frame audio signal includes:

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being greater than a seventh value, determining the sum of the seventh value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is triggered and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the seventh value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered and the parameter value of the noise indication parameter of the previous frame audio signal being greater than an eighth value, determining the sum of the eighth value and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal;

in response to the noise detection result corresponding to the previous frame audio signal being that the noise prompt is not triggered and the parameter value of the noise indication parameter of the previous frame audio signal being smaller than the eighth value, determining the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal as the parameter value of the noise indication parameter of the current frame audio signal.

All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.

Based on the implementation environment shown in fig. 1, an embodiment of the present disclosure provides a noise detection method. Taking execution by the second terminal as an example and referring to fig. 3, the flow of the method provided by the embodiment of the present disclosure includes:

301. Displaying a second session interface of the multimedia session application based on the second user identifier currently logged in to the multimedia session application.

302. Receiving a prompt message sent by the server.

The prompt message is sent by the terminal logged in with the first user identifier when that terminal detects that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold. The parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal.

303. In response to the prompt message, displaying noise prompt information corresponding to the first user identifier on the second session interface.

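According to the method provided by the embodiment of the disclosure, a prompt message is received. The prompt message is sent when the terminal logged in with the first user identifier detects that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold, and that parameter value is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are also taken into account, the accuracy of the detection result on which the displayed noise prompt information is based is improved.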

In another embodiment of the present disclosure, displaying noise prompt information corresponding to the first user identifier on the second session interface in response to the prompt message includes:

in response to the prompt message, displaying noise prompt text in the session member list of the second session interface, at a position whose distance from the first user identifier is smaller than a preset distance, wherein the session member list includes a plurality of user identifiers participating in the multimedia session.

In another embodiment of the present disclosure, in response to the prompt message, displaying noise prompt information corresponding to the first user identifier on the second session interface, including:

and changing the display color of the microphone identification corresponding to the first user identification on the second conversation interface in response to the prompt message.

Based on the implementation environment shown in fig. 1, an embodiment of the present disclosure provides a noise detection method. Taking interaction among the first terminal, the server and the second terminal shown in fig. 1 as an example and referring to fig. 4, the flow of the method provided by the embodiment of the present disclosure includes:

401. Based on the first user identifier currently logged in to the multimedia session application, the first terminal displays a first session interface of the multimedia session application.

The multimedia session application is an application capable of implementing a multimedia session, and may be a multimedia conference application, a social application, or the like. The session interface of the multimedia session application is an interface that carries the multimedia session process and is used to manage the multimedia session. The session interface may include the identifiers and avatars of the users taking part in the multimedia session. It may also include a plurality of session function options, such as a mute option, a start video option, a share screen option, an invite option, a manage members option, a chat option, an emoji option, a document option and a settings option, as well as session operation options, such as an end session option.

In order for a user to use the multimedia session application, each user in the multimedia session application registers a user identification and password for logging into the multimedia session application. When a first user logs in the multimedia session application through a first user identifier and a password input on a login interface of the multimedia session application and performs a multimedia session with a plurality of second users, the first terminal may display the first session interface of the multimedia session application based on the first user identifier of the currently logged-in multimedia session application.

402. Based on the second user identifier currently logged in to the multimedia session application, the second terminal displays a second session interface of the multimedia session application.

When the second user logs in to the multimedia session application with the second user identifier and the password entered on the login interface of the multimedia session application and conducts the multimedia session with the first user, the second terminal may display the second session interface of the multimedia session application based on the second user identifier currently logged in to the multimedia session application. The second user identifier is any user identifier in the multimedia session other than the first user identifier.

403. The first terminal collects an audio signal.

During the multimedia session with the second user, when the first user speaks, the first terminal collects the audio signal of the first user in real time using a device such as a microphone.

404. In the process of detecting the audio signal frame by frame, the first terminal determines the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal.

The first terminal divides the collected audio signal into a plurality of frames, and noise detection based on the audio signal is performed frame by frame. The frames can be arranged into a frame sequence in order of acquisition time. The current frame audio signal is any frame in the frame sequence, namely the frame processed in the present round of noise detection. The previous frame audio signal is the frame that precedes the current frame audio signal in the frame sequence, namely the frame processed in the previous round of noise detection. The current frame is not fixed: as time passes, the current frame audio signal becomes the previous frame of the next frame audio signal to be detected.
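As an illustration of the framing described above, the following minimal sketch splits a sample buffer into equal-length frames in acquisition order. The function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions made for the example; the disclosure does not specify a frame length or how leftover samples are handled.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split samples into consecutive frames of frame_len samples each.

    Frames are returned in acquisition order, so frame i-1 is the
    "previous frame" of frame i. A trailing partial frame is dropped
    (an assumption of this sketch).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

For example, ten samples with `frame_len=4` yield two frames of four samples, and the last two samples wait for more audio.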

For the current frame audio signal, the first terminal may use an energy algorithm to obtain its signal energy. When doing so, the first terminal can input the current frame audio signal into an oscilloscope to obtain the corresponding waveform, and then obtain the signal energy of the current frame audio signal from the amplitude of the waveform displayed on the oscilloscope.
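The disclosure does not give a formula for the energy algorithm. One common way to derive a per-frame energy in dB from waveform amplitude is the mean-square power of the samples; the sketch below assumes samples normalised to the range [-1, 1] and a hypothetical floor of -100 dB for silent frames.

```python
import math
from typing import Sequence


def frame_energy_db(frame: Sequence[float], floor_db: float = -100.0) -> float:
    """Return the mean-square energy of a frame in dB relative to full scale.

    Assumes samples lie in [-1, 1]. Empty or all-zero frames return
    floor_db, because log10(0) is undefined.
    """
    if not frame:
        return floor_db
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))
```

Under this definition a full-scale square wave measures 0 dB and a constant amplitude of 0.1 measures about -20 dB, which is on the same scale as the example energy values used in this embodiment.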

To obtain the signal state of the current frame audio signal, the first terminal may process the current frame audio signal with a voice activity detection (VAD) algorithm. VAD is used to locate the beginning and end of speech in a noisy audio signal, that is, to separate silence from actual speech. When VAD is used to process the current frame audio signal, a threshold can be preset: if the signal energy of the current frame audio signal is greater than the threshold, the signal state of the current frame audio signal is determined to be a speech state; if the signal energy of the current frame audio signal is less than the threshold, the signal state of the current frame audio signal is determined to be a non-speech state.
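The threshold comparison described above can be sketched as follows. The names `SignalState` and `signal_state` are hypothetical, the threshold is left to the caller because the disclosure gives no value for it, and treating an energy exactly equal to the threshold as non-speech is an assumption of this sketch.

```python
from enum import Enum


class SignalState(Enum):
    SPEECH = "speech"
    NON_SPEECH = "non_speech"


def signal_state(energy_db: float, vad_threshold_db: float) -> SignalState:
    """Classify one frame by comparing its energy with a preset threshold.

    Energy above the threshold means speech; anything else is non-speech.
    The equal case is not covered by the text and is mapped to non-speech here.
    """
    if energy_db > vad_threshold_db:
        return SignalState.SPEECH
    return SignalState.NON_SPEECH
```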

The noise indication parameter is a parameter used to determine the multimedia session environment. In the embodiment of the present disclosure, different signal states of the current frame audio signal lead to different variation values of the noise indication parameter of the current frame audio signal. When the first terminal determines the variation value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal, the cases include but are not limited to the following:

In the first case, the signal state of the current frame audio signal is a speech state.

In one possible implementation manner, in response to that the signal state of the current frame audio signal is a speech state and the signal energy value of the current frame audio signal is greater than the first energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is the first value.

In another possible implementation manner, in response to that the signal state of the current frame audio signal is a speech state and the signal energy value of the current frame audio signal is smaller than the first energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is a second value.

The first value and the second value can be determined according to statistical data. They are usually positive values, and the first value is larger than the second value; for example, the first value can be 50, 60, etc., and the second value can be 20, 30, etc. The first energy value may also be determined based on the statistical data, and may be -48 dB, -50 dB, and so on.

For example, the first energy value is -48 dB, the first value is 50, and the second value is 20. When the signal state of the current frame audio signal is a speech state and its signal energy value is -20 dB, which is greater than the first energy value of -48 dB, the variation value of the noise indication parameter of the current frame audio signal is determined to be 50. When the signal state of the current frame audio signal is a speech state and its signal energy value is -50 dB, which is less than the first energy value of -48 dB, the variation value of the noise indication parameter of the current frame audio signal is determined to be 20.

In the second case, the signal state of the current frame audio signal is a non-speech state.

In one possible implementation manner, in response to that the signal state of the current frame audio signal is a non-speech state and the signal energy value of the current frame audio signal is greater than the second energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is a third value.

In another possible implementation manner, in response to that the signal state of the current frame audio signal is a non-speech state and the signal energy value of the current frame audio signal is greater than the third energy value and less than the second energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is a fourth value.

In another possible implementation manner, in response to that the signal state of the current frame audio signal is a non-speech state and the signal energy value of the current frame audio signal is greater than the fourth energy value and less than the third energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is a fifth value.

In another possible implementation manner, in response to that the signal state of the current frame audio signal is a non-speech state and the signal energy value of the current frame audio signal is smaller than the fourth energy value, the first terminal determines that the variation value of the noise indication parameter of the current frame audio signal is a sixth value.

The second energy value, the third energy value and the fourth energy value can also be determined according to the statistical data. The second energy value is greater than the third energy value, and the third energy value is greater than the fourth energy value. The second energy value can be -38 dB, -39 dB, etc., the third energy value can be -42 dB, -43 dB, etc., and the fourth energy value can be -48 dB, -49 dB, etc. The third, fourth, fifth and sixth values can also be determined according to statistical data. They are usually negative values: the third value is greater than the fourth value, the fourth value is greater than the fifth value, and the fifth value is greater than the sixth value. The third value may be -320, the fourth value may be -400, the fifth value may be -440, the sixth value may be -640, etc.

For example, the second energy value is -38 dB, the third energy value is -42 dB, the fourth energy value is -48 dB, the third value is -320, the fourth value is -400, the fifth value is -440, and the sixth value is -640. When the signal state of the current frame audio signal is a non-speech state: if its signal energy value is -20 dB, which is greater than the second energy value of -38 dB, the variation value of the noise indication parameter of the current frame audio signal is determined to be -320; if its signal energy value is -40 dB, which is greater than the third energy value of -42 dB and less than the second energy value of -38 dB, the variation value is determined to be -400; if its signal energy value is -45 dB, which is greater than the fourth energy value of -48 dB and less than the third energy value of -42 dB, the variation value is determined to be -440; and if its signal energy value is -50 dB, which is less than the fourth energy value of -48 dB, the variation value is determined to be -640.
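The two cases of step 404 amount to a small lookup from signal state and energy band to a variation value. The sketch below uses the example values given in this embodiment; the constant and function names are hypothetical, and assigning an energy that exactly equals a boundary to the lower band is an assumption, since the text only covers "greater than" and "less than".

```python
# Example values taken from the embodiment; real values come from statistical data.
FIRST_ENERGY_DB = -48.0
SECOND_ENERGY_DB = -38.0
THIRD_ENERGY_DB = -42.0
FOURTH_ENERGY_DB = -48.0

FIRST_VALUE, SECOND_VALUE = 50, 20
THIRD_VALUE, FOURTH_VALUE, FIFTH_VALUE, SIXTH_VALUE = -320, -400, -440, -640


def variation_value(is_speech: bool, energy_db: float) -> int:
    """Return the variation value of the noise indication parameter for one frame."""
    if is_speech:
        # Speech state: positive step, larger when the frame is louder.
        return FIRST_VALUE if energy_db > FIRST_ENERGY_DB else SECOND_VALUE
    # Non-speech state: negative step, more negative when the frame is quieter.
    if energy_db > SECOND_ENERGY_DB:
        return THIRD_VALUE
    if energy_db > THIRD_ENERGY_DB:
        return FOURTH_VALUE
    if energy_db > FOURTH_ENERGY_DB:
        return FIFTH_VALUE
    return SIXTH_VALUE
```

With these constants, the function reproduces the worked examples in the text: 50 and 20 for speech frames at -20 dB and -50 dB, and -320, -400, -440 and -640 for non-speech frames at -20 dB, -40 dB, -45 dB and -50 dB.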

405. The first terminal determines the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal.

The detection information of the previous frame of audio signal includes a parameter value of a noise indication parameter of the previous frame of audio signal, a noise detection result corresponding to the previous frame of audio signal, and the like. The noise detection result corresponding to the previous frame of audio signal comprises a triggered noise prompt or an un-triggered noise prompt. The noise detection result is used for representing the multimedia session environment of the first user identifier, the noise detection result corresponding to the previous frame of audio signal is used for representing the multimedia session environment of the first user identifier when the previous frame of audio signal is collected, and the noise detection result corresponding to the current frame of audio signal is used for representing the multimedia session environment of the first user identifier when the current frame of audio signal is collected.

In the embodiment of the present disclosure, the noise detection results corresponding to the previous frame of audio signal are different, and the parameter values of the noise indication parameters of the current frame of audio signal are also different. Aiming at different noise detection results corresponding to a previous frame of audio signal, when the first terminal determines the parameter value of the noise indication parameter of the current frame of audio signal according to the variation value of the noise indication parameter of the current frame of audio signal, the parameter value of the noise indication parameter of the previous frame of audio signal and the noise detection result corresponding to the previous frame of audio signal, the following conditions are included:

In the first case, the noise detection result corresponding to the previous frame of audio signal is a triggered noise prompt.

In a possible implementation manner, in response to that the noise detection result corresponding to the previous frame of audio signal is the trigger noise prompt, and the parameter value of the noise indication parameter of the previous frame of audio signal is greater than the seventh value, the first terminal determines the sum of the seventh value and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.

In another possible implementation manner, in response to that the noise detection result corresponding to the previous frame of audio signal is the trigger noise prompt and the parameter value of the noise indication parameter of the previous frame of audio signal is smaller than the seventh value, the first terminal determines the sum of the parameter value of the noise indication parameter of the previous frame of audio signal and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.

The seventh value may be determined from the statistical data, and may be 1400 × 20, 1500 × 20, and so on. For example, the noise detection result corresponding to the previous frame of audio signal is a triggered noise prompt and the seventh value is 1400 × 20. If the parameter value of the noise indication parameter of the previous frame audio signal is 1500 × 20, which is greater than the seventh value of 1400 × 20, the sum of 1400 × 20 and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal. If the parameter value of the noise indication parameter of the previous frame audio signal is 1300 × 20, which is smaller than the seventh value of 1400 × 20, the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal.

In the second case, the corresponding noise detection result of the previous frame of audio signal is an un-triggered noise prompt.

In a possible implementation manner, in response to that the noise detection result corresponding to the previous frame of audio signal is that the noise prompt is not triggered, and the parameter value of the noise indication parameter of the previous frame of audio signal is greater than the eighth value, the first terminal determines the sum of the eighth value and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.

In another possible implementation manner, in response to that the noise detection result corresponding to the previous frame of audio signal is that the noise cue is not triggered and the parameter value of the noise indication parameter of the previous frame of audio signal is smaller than the eighth value, the first terminal determines the sum of the parameter value of the noise indication parameter of the previous frame of audio signal and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.

The eighth value may be determined from the statistical data, and may be 1200 × 20, 1300 × 20, and so on. For example, the noise detection result corresponding to the previous frame of audio signal is that the noise prompt is not triggered and the eighth value is 1300 × 20. If the parameter value of the noise indication parameter of the previous frame audio signal is 1400 × 20, which is greater than the eighth value of 1300 × 20, the sum of 1300 × 20 and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal. If the parameter value of the noise indication parameter of the previous frame audio signal is 1200 × 20, which is smaller than the eighth value of 1300 × 20, the sum of the parameter value of the noise indication parameter of the previous frame audio signal and the change value of the noise indication parameter of the current frame audio signal is determined as the parameter value of the noise indication parameter of the current frame audio signal.
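Both cases of step 405 have the same shape: the previous parameter value is capped (at the seventh value if the previous frame triggered the noise prompt, at the eighth value otherwise) and the change value is then added. A minimal sketch using the example values 1400 × 20 and 1300 × 20 follows; the function and constant names are hypothetical.

```python
SEVENTH_VALUE = 1400 * 20  # cap used when the previous frame triggered the noise prompt
EIGHTH_VALUE = 1300 * 20   # cap used when the previous frame did not trigger it


def update_parameter(prev_value: int, prev_triggered: bool, change: int) -> int:
    """Return the current frame's parameter value of the noise indication parameter.

    min() covers both branches of each case: a previous value above the cap
    is replaced by the cap, a previous value below the cap is kept as is.
    """
    cap = SEVENTH_VALUE if prev_triggered else EIGHTH_VALUE
    return min(prev_value, cap) + change
```

Writing the rule with `min` also settles the case where the previous value equals the cap, which the text does not mention, because both branches give the same sum there.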

406. In response to the parameter value of the noise-indicating parameter being greater than the parameter threshold, the first terminal displays a noise-prompting message on the first session interface.

In the embodiment of the present disclosure, the parameter threshold used to determine the multimedia session environment differs according to the signal energy of the current frame audio signal. For example, when the signal energy of the current frame audio signal is greater than -31 dB, the parameter threshold may be 5400; when the signal energy is greater than -38 dB and less than -31 dB, the parameter threshold may be 23600; when the signal energy is greater than -44 dB and less than -38 dB, the parameter threshold may be 16400; and when the signal energy is less than -44 dB, the parameter threshold may be 9200.
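The energy-dependent threshold can be sketched as a banded lookup. The values below are the example values from the paragraph above; the function name is hypothetical, and placing an energy exactly on a band boundary into the lower band is an assumption of the sketch.

```python
def parameter_threshold(energy_db: float) -> int:
    """Return the parameter threshold for a frame with the given signal energy."""
    if energy_db > -31.0:
        return 5400
    if energy_db > -38.0:
        return 23600
    if energy_db > -44.0:
        return 16400
    return 9200
```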

Based on the determined parameter threshold, the first terminal compares the parameter value of the noise indication parameter of the current frame audio signal with the parameter threshold, and determines that the noise detection result corresponding to the current frame audio signal is a trigger noise prompt in response to the fact that the parameter value of the noise indication parameter of the current frame audio signal is greater than or equal to the parameter threshold, namely that the multimedia session environment is a noisy environment; and in response to the fact that the parameter value of the noise indication parameter of the current frame audio signal is smaller than the parameter threshold value, determining that the noise detection result corresponding to the current frame audio signal is the non-triggered noise prompt, namely that the multimedia session environment is a quiet environment.
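To show how the detection information of one frame feeds the next, the following sketch carries the parameter value and the noise detection result across frames. It follows the "greater than or equal to" comparison of the paragraph above, whereas step 406 is worded as "greater than". The names `DetectionInfo`, `detect_frame` and `run_detection`, and the initial parameter value of 0, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

SEVENTH_VALUE = 1400 * 20  # example cap after a triggered noise prompt
EIGHTH_VALUE = 1300 * 20   # example cap after an untriggered noise prompt


@dataclass(frozen=True)
class DetectionInfo:
    """Detection information kept from one frame for use by the next."""
    value: int = 0           # parameter value of the noise indication parameter
    triggered: bool = False  # noise detection result (True means noisy environment)


def detect_frame(prev: DetectionInfo, change: int, threshold: int) -> DetectionInfo:
    """Update the parameter value and compare it with this frame's threshold."""
    cap = SEVENTH_VALUE if prev.triggered else EIGHTH_VALUE
    value = min(prev.value, cap) + change
    return DetectionInfo(value=value, triggered=value >= threshold)


def run_detection(frames: Iterable[Tuple[int, int]]) -> List[bool]:
    """Run detection over (change value, threshold) pairs, one pair per frame."""
    info = DetectionInfo()
    results = []
    for change, threshold in frames:
        info = detect_frame(info, change, threshold)
        results.append(info.triggered)
    return results
```

With a change value of 50 and a threshold of 9200 on every frame, the result first becomes noisy on the 184th frame (184 × 50 = 9200), and a single following frame with a change value of -640 returns it to quiet. This illustrates why one isolated frame does not flip the result.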

In the embodiment of the present disclosure, the first terminal may provide an interface for communication between the background and the foreground of the first terminal, through which the noise detection result of the multimedia session environment can be output to the foreground. The first terminal includes an algorithm layer, a middle layer and a UI layer: the algorithm layer belongs to the background, the middle layer connects the foreground and the background, and the UI layer belongs to the foreground. Based on the interface and the noise detection result corresponding to the current frame audio signal, the algorithm layer of the first terminal obtains a noise detection result identifier, which is either a noise identifier or a quiet identifier. The middle layer obtains the noise detection result identifier and determines from it whether to display noise prompt information on the first session interface. If the noise detection result identifier is the noise identifier, the UI layer of the first terminal displays the noise prompt information on the first session interface; otherwise, the UI layer does not display the noise prompt information on the first session interface.

The first terminal displays the noise prompt information on the first session interface, and the following modes can be adopted:

In a first mode, when a click operation on the conversation member option on the first session interface is detected, the first terminal displays a conversation member list, which comprises a plurality of user identifiers participating in the multimedia session. In response to the parameter value of the noise indication parameter being greater than the parameter threshold, that is, the multimedia session environment being a noisy environment, the first terminal can display noise prompt text at a position in the conversation member list of the first session interface whose distance from the first user identifier is smaller than a preset distance. The preset distance may be determined according to the row and column dimensions of the conversation member list, and may be, for example, 1 cm or 2 cm. To alert the first user, the first terminal may emphasize the noise prompt text, for example by displaying it in bold or highlighting it. Fig. 5 shows a session interface of a multimedia session application in a quiet environment, and fig. 6 shows a session interface of a multimedia session application in a noisy environment. Comparing fig. 5 and fig. 6 shows that, when the multimedia session environment corresponding to the user identifier AAAA is a noisy environment, the prompt text "noisy" is displayed near the user identifier AAAA in the conversation member list of the session interface shown in fig. 6.

In a second mode, in response to the parameter value of the noise indication parameter being greater than the parameter threshold, that is, the multimedia session environment being a noisy environment, the first terminal may change the display color of the microphone identifier corresponding to the first user identifier on the first session interface, for example, from green to red.

Referring to fig. 7, in order to provide more choices to the first user, the embodiment of the disclosure also adds a noise prompt sub-option and an automatic mute sub-option to the original audio options of the session interface. When a click operation on the noise prompt sub-option is detected, the noise prompt function is turned on, and in response to the parameter value of the noise indication parameter being greater than the parameter threshold, noise prompt information can be displayed on the first session interface. When a click operation on the automatic mute sub-option is detected, the automatic mute function is turned on. In this case, in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the first terminal can turn off the conversation sound corresponding to the first user identifier and stop sending the collected audio signal to the server, so that the second users participating in the multimedia session do not receive the audio signal of the first user. The first user is thereby muted while in a noisy environment. If the multimedia session environment corresponding to the first user identifier changes from a noisy environment back to a quiet environment, the first terminal may turn the conversation sound corresponding to the first user identifier on again and send the collected audio signal to the server, and the server sends the audio signal to each second user participating in the multimedia session.
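How the two sub-options gate the per-frame behavior can be sketched as below. This is an illustrative Python sketch under stated assumptions: `AudioOptions`, `FrameActions` and `actions_for_frame` are hypothetical names, and the sketch assumes the decision is re-evaluated for every frame, so that sending resumes as soon as the environment is quiet again.

```python
from dataclasses import dataclass


@dataclass
class AudioOptions:
    noise_prompt_enabled: bool = False   # state of the noise prompt sub-option
    auto_mute_enabled: bool = False      # state of the automatic mute sub-option


@dataclass
class FrameActions:
    show_prompt: bool   # display noise prompt information on the first session interface
    send_audio: bool    # forward the collected audio signal to the server


def actions_for_frame(parameter_value: float, threshold: float,
                      options: AudioOptions) -> FrameActions:
    """Decide what the first terminal does for the current frame."""
    noisy = parameter_value > threshold
    return FrameActions(
        show_prompt=noisy and options.noise_prompt_enabled,
        # Audio is withheld only while the environment is noisy and auto-mute is on.
        send_audio=not (noisy and options.auto_mute_enabled),
    )
```

For example, with both sub-options turned on, a noisy frame yields `FrameActions(show_prompt=True, send_audio=False)`, and the next quiet frame yields `FrameActions(show_prompt=False, send_audio=True)`.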

In another embodiment of the present disclosure, after the noise prompt information is displayed on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold, the first terminal hides the noise prompt information in response to the parameter value of the noise indication parameter of the next frame of audio signal being less than the parameter threshold, thereby preventing the displayed noise prompt information from disturbing the first user.

407. The first terminal sends a prompt message to the server.

In order to make other users participating in the multimedia session know about the current multimedia session environment of the first user, the first terminal may further send a prompt message to the server after detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold value.

408. The server sends the prompt message to the second terminal.

When receiving the prompt message sent by the first terminal, the server can send the prompt message to the second terminal in a wired or wireless communication mode.

409. When the prompt message sent by the server is received, the second terminal displays, in response to the prompt message, the noise prompt information corresponding to the first user identifier on the second session interface.

After receiving the prompt message sent by the server, the second terminal responds to the prompt message by displaying the noise prompt information corresponding to the first user identifier on the second session interface, so as to prompt the second user.

The second terminal displays the noise prompt information corresponding to the first user identifier on the second session interface, and the following modes can be adopted:

In a first mode, the second terminal can display noise prompt text at a position in the conversation member list of the second session interface whose distance from the first user identifier is smaller than the preset distance.

In a second mode, the second terminal can change the display color of the microphone identifier corresponding to the first user identifier on the second session interface.

In another embodiment of the present disclosure, when the multimedia session environment corresponding to the first user identifier changes from a noisy environment to a quiet environment, the second terminal also hides the noise prompt information in response to this change in the noise detection result, thereby avoiding disturbing the second user.
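Steps 407 to 409 and the hiding behavior above can be sketched as a simple relay. The Python below is an illustration only: the classes `Server` and `SecondTerminal`, the method names, and in particular the boolean `noisy` flag carried by the message are assumptions introduced here, since the disclosure does not specify the message format or how the second terminal learns that the environment has become quiet again.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SecondTerminal:
    user_id: str
    # First-user identifiers currently marked as noisy on the second session interface.
    noisy_members: Set[str] = field(default_factory=set)

    def on_prompt_message(self, first_user_id: str, noisy: bool) -> None:
        """Show or hide the noise prompt next to the given user identifier."""
        if noisy:
            self.noisy_members.add(first_user_id)
        else:
            self.noisy_members.discard(first_user_id)


@dataclass
class Server:
    terminals: Dict[str, SecondTerminal] = field(default_factory=dict)

    def join(self, terminal: SecondTerminal) -> None:
        self.terminals[terminal.user_id] = terminal

    def relay(self, sender_id: str, noisy: bool) -> None:
        """Forward the prompt message to every session member except the sender."""
        for user_id, terminal in self.terminals.items():
            if user_id != sender_id:
                terminal.on_prompt_message(sender_id, noisy)
```

A first terminal logged in as `AAAA` would call `server.relay("AAAA", True)` when its parameter value exceeds the threshold, and `server.relay("AAAA", False)` once the environment is quiet again.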

According to the method provided by the embodiment of the disclosure, the current frame audio signal is collected, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, but the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are comprehensively considered, the contingency of the detection result is removed, and the accuracy of the detection result is improved.

Referring to fig. 8, an embodiment of the present disclosure provides a noise detection apparatus, including:

a display module 801, configured to display a first session interface of a multimedia session application based on a first user identifier of a currently logged-in multimedia session application;

an acquisition module 802 for acquiring an audio signal;

a determining module 803, configured to determine a change value of a noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal;

a determining module 803, configured to determine, based on the audio signal frame-by-frame detection process, a parameter value of a noise indication parameter of the current frame audio signal according to a change value of the noise indication parameter of the current frame audio signal and detection information of the previous frame audio signal;

a display module 801, configured to display a noise prompt message on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold.

In another embodiment of the present disclosure, the display module 801 is configured to, in response to a parameter value of the noise indication parameter being greater than a parameter threshold value, display a noise prompt text at a position in a conversation member list of the first conversation interface where a distance between the conversation member list and the first user identifier is less than a preset distance, where the conversation member list includes a plurality of user identifiers participating in the multimedia conversation.

In another embodiment of the present disclosure, the display module 801 is configured to change a display color of a microphone identifier corresponding to the first user identifier on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold.

In another embodiment of the present disclosure,

the display module 801 is further configured to display noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than the parameter threshold and the noise prompt function being turned on;

the display module 801 is further configured to turn off the conversation sound corresponding to the first user identifier in response to the parameter value of the noise indication parameter being greater than the parameter threshold and the automatic mute function being turned on.

In another embodiment of the present disclosure, the apparatus further comprises:

a hiding module, configured to hide the noise prompt information in response to the parameter value of the noise indication parameter of the next frame of audio signal being smaller than the parameter threshold.

In another embodiment of the present disclosure, the apparatus further comprises:

a sending module, configured to send a prompt message to the server, where the prompt message is sent by the server to the terminals logged in with a plurality of second user identifiers, the prompt message is used for triggering those terminals to display noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

In another embodiment of the present disclosure, the determining module 803 is configured to determine, in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being greater than a first energy value, that the change value of the noise indication parameter of the current frame audio signal is a first value; and to determine, in response to the signal state of the current frame audio signal being a speech state and the signal energy value of the current frame audio signal being smaller than the first energy value, that the change value of the noise indication parameter of the current frame audio signal is a second value.

In another embodiment of the present disclosure, the determining module 803 is configured to determine, in response to the signal state of the current frame audio signal being a non-speech state and the signal energy value of the current frame audio signal being greater than a second energy value, that the change value of the noise indication parameter of the current frame audio signal is a third value; to determine, in response to the signal state being a non-speech state and the signal energy value being greater than a third energy value and less than the second energy value, that the change value is a fourth value; to determine, in response to the signal state being a non-speech state and the signal energy value being greater than a fourth energy value and less than the third energy value, that the change value is a fifth value; and to determine, in response to the signal state being a non-speech state and the signal energy value being less than the fourth energy value, that the change value is a sixth value.
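The two mappings above, from signal state and signal energy to a change value, can be sketched as a small lookup. The Python below is illustrative only: every number in `ChangeValueConfig` is a placeholder chosen for the example (the text only implies that the second energy value exceeds the third, which exceeds the fourth), and assigning a frame whose energy exactly equals a threshold to the lower band is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChangeValueConfig:
    # Energy thresholds: placeholder numbers, with second > third > fourth.
    first_energy: float = 60.0
    second_energy: float = 50.0
    third_energy: float = 40.0
    fourth_energy: float = 30.0
    # The "first" to "sixth" change values: placeholder numbers.
    values: Tuple[int, int, int, int, int, int] = (2, 0, 3, 2, 1, -1)


def change_value(is_speech: bool, energy: float,
                 cfg: ChangeValueConfig = ChangeValueConfig()) -> int:
    """Return the change value of the noise indication parameter for one frame."""
    first, second, third, fourth, fifth, sixth = cfg.values
    if is_speech:
        return first if energy > cfg.first_energy else second
    if energy > cfg.second_energy:
        return third
    if energy > cfg.third_energy:
        return fourth
    if energy > cfg.fourth_energy:
        return fifth
    return sixth
```

Checking the bands from highest to lowest keeps each branch to a single comparison, because a frame that reaches a later branch is already known to be below the earlier thresholds.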

In another embodiment of the present disclosure, the determining module 803 is configured to, in response to that the noise detection result corresponding to the previous frame of audio signal is a trigger noise prompt and the parameter value of the noise indication parameter of the previous frame of audio signal is greater than a seventh value, determine the sum of the seventh value and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal; and in response to the fact that the noise detection result corresponding to the previous frame of audio signal is the trigger noise prompt and the parameter value of the noise indication parameter of the previous frame of audio signal is smaller than the seventh value, determining the sum of the parameter value of the noise indication parameter of the previous frame of audio signal and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.

In another embodiment of the present disclosure, the determining module 803 is configured to, in response to that the noise detection result of the previous frame of audio signal is that no noise cue is triggered and the parameter value of the noise indicating parameter of the previous frame of audio signal is greater than an eighth value, determine the sum of the eighth value and the variation value of the noise indicating parameter of the current frame of audio signal as the parameter value of the noise indicating parameter of the current frame of audio signal; and in response to the fact that the noise detection result of the previous frame of audio signal is that the noise prompt is not triggered and the parameter value of the noise indication parameter of the previous frame of audio signal is smaller than the eighth value, determining the sum of the parameter value of the noise indication parameter of the previous frame of audio signal and the change value of the noise indication parameter of the current frame of audio signal as the parameter value of the noise indication parameter of the current frame of audio signal.
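The two update rules above share one shape: the previous parameter value is capped (at the seventh value if the previous frame triggered the noise prompt, at the eighth value otherwise) and the current change value is then added. A minimal Python sketch follows; the function name `update_parameter` and the default caps of 120 and 100 are assumptions for illustration, not values from the disclosure.

```python
def update_parameter(prev_value: float, prev_triggered: bool, delta: float,
                     seventh_value: float = 120.0,
                     eighth_value: float = 100.0) -> float:
    """Compute the current frame's noise indication parameter value.

    prev_value     -- parameter value of the previous frame
    prev_triggered -- whether the previous frame's result was "trigger noise prompt"
    delta          -- change value determined for the current frame
    """
    cap = seventh_value if prev_triggered else eighth_value
    # min() also covers the case prev_value == cap, which the text leaves open:
    # both branches give the same result there.
    return min(prev_value, cap) + delta
```

Capping before adding bounds how far the accumulated value can drift, so a long noisy stretch does not delay the return to the quiet state indefinitely.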

In summary, the apparatus provided in the embodiment of the present disclosure acquires the audio signal, and determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, but the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are comprehensively considered, the contingency of the detection result is removed, and the accuracy of the detection result is improved.

Referring to fig. 9, an embodiment of the present disclosure provides a noise detection apparatus, including:

a display module 901, configured to display a second session interface of the multimedia session application based on a second user identifier of the currently logged multimedia session application;

a receiving module 902, configured to receive a prompt message sent by a server, where the prompt message is sent by a terminal logging in a first user identifier when detecting that a parameter value of a noise indication parameter of a current frame audio signal is greater than a parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to a signal state and a signal energy of the current frame audio signal and detection information of a previous frame audio signal;

a display module 901, configured to display, in response to the prompt message, noise prompt information corresponding to the first user identifier on the second session interface.

In another embodiment of the present disclosure, the display module 901 is configured to, in response to the prompt message, display a noise prompt text at a position, in a conversation member list of the second conversation interface, where a distance between the conversation member list and the first user account is smaller than a preset distance, where the conversation member list includes a plurality of user identifiers participating in the multimedia conversation.

In another embodiment of the present disclosure, the display module 901 is configured to change a display color of a microphone identifier corresponding to the first user identifier on the second session interface in response to the prompt message.

In summary, the apparatus provided in the embodiment of the present disclosure receives a prompt message, where the prompt message is sent by the terminal when detecting that the parameter value of the noise indication parameter of the current frame audio signal is greater than the parameter threshold, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, but the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are comprehensively considered, the contingency of the detection result is removed, and the accuracy of the detection result is improved.

Fig. 10 shows a block diagram of a terminal 1000 according to an exemplary embodiment of the disclosure. The terminal 1000 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1000 can also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, terminal 1000 can include: a processor 1001 and a memory 1002.

Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.

Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement the noise detection methods provided by method embodiments herein.

In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuit 1004, display screen 1005, camera assembly 1006, audio circuit 1007, positioning component 1008, and power supply 1009.

The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.

The radio frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over the surface of the display screen 1005. The touch signal may be input to the processor 1001 as a control signal for processing. In this case, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there may be at least two display screens 1005, respectively disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, display screen 1005 may be a flexible display disposed on a curved surface or on a folded surface of terminal 1000. Furthermore, the display screen 1005 may be arranged in a non-rectangular irregular shape, that is, as an irregularly shaped screen. The display screen 1005 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or to the radio frequency circuit 1004 for voice communication. For stereo sound collection or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electric signal not only into a sound wave audible to humans, but also into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1007 may also include a headphone jack.

The positioning component 1008 is used to locate the current geographic location of terminal 1000 for navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.

Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 may cooperate to acquire a 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensor 1013 can be disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Fingerprint sensor 1014 can be disposed on the front, back, or side of terminal 1000. When a physical key or vendor Logo is provided on terminal 1000, fingerprint sensor 1014 can be integrated with the physical key or vendor Logo.

The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.

Proximity sensor 1016, also known as a distance sensor, is typically disposed on a front panel of terminal 1000. Proximity sensor 1016 is used to measure the distance between the user and the front face of terminal 1000. In one embodiment, when proximity sensor 1016 detects that the distance between the user and the front surface of terminal 1000 is gradually reduced, processor 1001 controls display screen 1005 to switch from a bright screen state to a dark screen state; when proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 is gradually increased, processor 1001 controls display screen 1005 to switch from the dark screen state to the bright screen state.

Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.

The terminal provided by the embodiment of the disclosure acquires the audio signal, and determines the parameter value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the detection result, but the signal energy of the current frame audio signal and the related detection information of the previous frame audio signal are comprehensively considered, the contingency of the detection result is removed, and the accuracy of the detection result is improved.

The disclosed embodiments provide a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the noise detection method shown in FIG. 2, FIG. 3, or FIG. 4. The computer-readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

With the computer-readable storage medium provided by the embodiment of the disclosure, an audio signal is collected, and the parameter value of the noise indication parameter of the current frame audio signal is determined according to the signal state and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal. Because the signal state of the audio signal is not simply taken as the noise detection result, and the signal energy of the current frame audio signal and the detection information of the previous frame audio signal are also taken into account, chance errors in the detection result are suppressed and the accuracy of the detection result is improved.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is intended to be exemplary only and is not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims

1. A method of noise detection, the method comprising:

collecting an audio signal;

in response to the parameter value of the noise indication parameter being greater than a parameter threshold, displaying noise prompt information on the first session interface.

2. The method of claim 1, wherein the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold comprises:

and in response to the parameter value of the noise indication parameter being greater than the parameter threshold, displaying noise prompt text at a position, in a session member list of the first session interface, whose distance from the first user identifier is less than a preset distance, wherein the session member list comprises a plurality of user identifiers participating in a multimedia session.

3. The method of claim 1, wherein the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold comprises:

and in response to the parameter value of the noise indication parameter being greater than the parameter threshold, changing the display color of the microphone identifier corresponding to the first user identifier on the first session interface.

4. The method of claim 1, wherein, before the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold, the method further comprises:

and in response to the parameter value of the noise indication parameter being greater than a parameter threshold value and a noise prompt function being turned on, performing the step of displaying noise prompt information on the first session interface.

5. The method of claim 1, further comprising:

and in response to the parameter value of the noise indication parameter being greater than the parameter threshold and an automatic mute function being turned on, muting the session audio corresponding to the first user identifier.

6. The method of claim 1, wherein, after the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold, the method further comprises:

7. The method of claim 1, wherein, after the displaying noise prompt information on the first session interface in response to the parameter value of the noise indication parameter being greater than a parameter threshold, the method further comprises:

and sending a prompt message to a server, wherein the server sends the prompt message to terminals logged in with a plurality of second user identifiers, the prompt message is used to trigger the terminals logged in with the second user identifiers to display the noise prompt information, and the second user identifiers are the user identifiers in the multimedia session other than the first user identifier.

8. The method according to any one of claims 1 to 7, wherein the determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal and the detection information of the previous frame audio signal comprises:

and determining the parameter value of the noise indication parameter of the current frame audio signal according to the change value of the noise indication parameter of the current frame audio signal, the parameter value of the noise indication parameter of the previous frame audio signal and the noise detection result corresponding to the previous frame audio signal.

9. A method of noise detection, the method comprising:

10. The method of claim 9, wherein the displaying noise prompt information corresponding to the first user identifier on the second session interface in response to the prompt message comprises:

and in response to the prompt message, displaying noise prompt text at a position, in a session member list of the second session interface, whose distance from the first user identifier is less than a preset distance, wherein the session member list comprises a plurality of user identifiers participating in the multimedia session.

11. The method of claim 9, wherein the displaying noise prompt information corresponding to the first user identifier on the second session interface in response to the prompt message comprises:

and in response to the prompt message, changing the display color of the microphone identifier corresponding to the first user identifier on the second session interface.

12. A noise detection apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring audio signals;

the determining module is used for determining the change value of the noise indication parameter of the current frame audio signal according to the signal state and the signal energy of the current frame audio signal in the frame-by-frame detection process of the audio signal;

13. A noise detection apparatus, characterized in that the apparatus comprises:

14. A terminal, characterized in that the terminal comprises a processor and a memory, in which at least one program code is stored, which is loaded and executed by the processor to implement the noise detection method according to any of claims 1 to 8, or to implement the noise detection method according to any of claims 9 to 11.

15. A computer-readable storage medium, having stored therein at least one program code, which is loaded and executed by a processor, to implement the noise detection method according to any one of claims 1 to 8, or the noise detection method according to any one of claims 9 to 11.