CN115706875A - Method, device and equipment for optimizing talkback voice quality and storage medium - Google Patents


Info

Publication number
CN115706875A
Authority
CN
China
Prior art keywords
voice
state
talkback
sound information
intercom
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110904992.0A
Other languages
Chinese (zh)
Inventor
刘建兵
冯波
刘永辉
徐圣杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fanvil Technology Co ltd
Original Assignee
Fanvil Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fanvil Technology Co ltd filed Critical Fanvil Technology Co ltd
Priority to CN202110904992.0A priority Critical patent/CN115706875A/en
Publication of CN115706875A publication Critical patent/CN115706875A/en
Pending legal-status Critical Current

Abstract

The invention relates to a talkback voice quality optimization method, device, equipment and storage medium. The method is applied to a first intercom device that is in communication connection with a second intercom device, and comprises the following steps: acquiring, in real time, sound information collected by the first intercom device, and analyzing the sound information to obtain a first voice state of the first intercom device, wherein the voice state is either a call state or a mute state; recording the first voice state, sending it to the second intercom device, and receiving, in real time, a second voice state sent by the second intercom device; and performing echo cancellation or noise suppression on the sound information in a preset sound processing mode selected according to the first voice state and the second voice state, and then sending the sound information to the second intercom device. The invention adjusts the sound processing mode of the intercom device in real time according to the voice states of the intercom devices at both ends, achieving echo cancellation and noise reduction while preserving voice quality.

Description

Method, device and equipment for optimizing talkback voice quality and storage medium
Technical Field
The invention relates to the technical field of talkback equipment, in particular to a method, a device, equipment and a storage medium for optimizing the quality of talkback voice.
Background
With the rapid development of the modern information technology industry, VoIP communication technology has also developed rapidly, and voice communication devices such as video conference systems, hands-free phones, mobile communication devices and hearing aids continue to appear, making communication more convenient and comfortable.
In VoIP communication, the requirements for voice call quality keep rising, and most current products use a dedicated DSP chip to implement echo cancellation and noise reduction. However, in embedded systems and Internet of Things products, considerations of product size and cost are making it an industry trend to abandon the conventional dedicated DSP chip and implement the DSP algorithms in software on a general-purpose chip.
Many DSP algorithms for echo cancellation and noise reduction exist, but most of them suit only a specific situation. For example, some echo cancellation algorithms work well when only one end is speaking, but when both ends speak simultaneously they also cancel the valid speech of the local end, so that the sound heard at the opposite end is intermittent and the intercom quality is poor.
Disclosure of Invention
The application provides a talkback voice quality optimization method, device, equipment and storage medium, which aim to solve the technical problem that the talkback voice quality of the existing talkback equipment is poor.
In order to solve the above problem, the present application provides, in one aspect, a talkback voice quality optimization method applied to a first intercom device, where the first intercom device is in communication connection with a second intercom device. The method comprises the following steps: acquiring, in real time, sound information collected by the first intercom device, and analyzing the sound information to obtain a first voice state of the first intercom device, wherein the voice state is either a call state or a mute state; recording the first voice state, sending it to the second intercom device, and receiving, in real time, a second voice state sent by the second intercom device; and performing echo cancellation or noise suppression on the sound information in a preset sound processing mode selected according to the first voice state and the second voice state, and then sending the sound information to the second intercom device.
As a further improvement of the present application, acquiring, in real time, sound information collected by a first intercom device, and analyzing the sound information to obtain a first speech state of the first intercom device, includes: and acquiring continuous sound information segments of the first intercom device according to a preset period, and confirming the first voice state according to the sound information segments.
As a further improvement of the present application, acquiring a continuous sound information segment of the first intercom device itself according to a preset period, and confirming the first speech state according to the sound information segment, includes: acquiring continuous sound information segments of the first intercom device according to a preset period, and determining whether the first intercom device recorded currently is in a conversation state or a mute state; when the first talkback equipment is in a conversation state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are non-voice, re-marking the voice state of the first talkback equipment as a mute state, otherwise, not replacing the voice state of the first talkback equipment; when the first talkback equipment is in a mute state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are voice, re-marking the voice state of the first talkback equipment as a call state, otherwise, not replacing the voice state of the first talkback equipment.
As a further improvement of the present application, the sound information segments are identified based on VAD (voice activity detection) technology.
As a further improvement of the present application, performing echo cancellation or noise suppression on the sound information in a preset sound processing mode according to the first voice state and the second voice state, and then sending the sound information to the second intercom device, includes: acquiring the first voice state and the second voice state; when the first voice state is a call state and the second voice state is a mute state, or both the first voice state and the second voice state are mute states, performing strong noise reduction on the sound information collected by the first intercom device, and then sending the sound information to the second intercom device; and when the first voice state is a mute state and the second voice state is a call state, or both the first voice state and the second voice state are call states, performing echo cancellation on the sound information collected by the first intercom device, and then sending the sound information to the second intercom device.
As a further improvement of the present application, when the first voice state is a mute state and the second voice state is a call state, or both the first voice state and the second voice state are call states, the method performs echo cancellation processing on the sound information collected by the first intercom device, and includes: when the first voice state is a mute state and the second voice state is a call state, carrying out strong echo cancellation processing on the voice information acquired by the first intercom device; and when the first voice state and the second voice state are both in a call state, carrying out weak echo cancellation processing on the voice information acquired by the first intercom device, wherein the echo cancellation capability of the weak echo cancellation processing is weaker than that of the strong echo cancellation processing.
As a further improvement of the present application, when the first voice state and the second voice state are both in a call state, the method further includes, while performing weak echo cancellation processing on the sound information collected by the first intercom device: and receiving the voice information sent by the second talkback equipment, and playing the voice information of the second talkback equipment after the voice information is suppressed.
In order to solve the above problem, the present application provides, in another aspect, a talkback voice quality optimization apparatus, which includes: the state confirmation module is used for acquiring the sound information acquired by the first intercom device in real time and analyzing the sound information to obtain a first voice state of the first intercom device, wherein the voice state comprises a call state and a mute state; the state transceiving module is used for recording a first voice state, sending the first voice state to the second talkback equipment and receiving a second voice state sent by the second talkback equipment in real time; and the sound processing module is used for carrying out echo cancellation or noise suppression processing on the sound information according to the first voice state and the second voice state in a preset sound processing mode and then sending the sound information to the second talkback equipment.
In order to solve the above problem, the present application provides in another aspect a computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions, which when executed by the processor, cause the processor to perform the steps of the talkback voice quality optimization method as in any one of the above.
In order to solve the above problem, the present application provides in another aspect a storage medium storing program instructions capable of implementing the talkback voice quality optimization method as in any one of the above.
Beneficial effects of the present application: the talkback voice quality optimization method analyzes the sound information collected by the local intercom device to determine whether the local device is in a call state or a mute state, and then performs echo cancellation or noise suppression on that sound information in a preset sound processing mode selected according to the voice states of both the local and the opposite intercom devices. In other words, whether the local device applies echo cancellation or noise suppression is not fixed but depends on the voice states of both ends. The local device can therefore adjust its sound processing algorithm in real time and apply the appropriate algorithm at the appropriate moment, which achieves effective echo cancellation and noise reduction without degrading voice quality.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:
FIG. 1 is a schematic flow chart diagram of a method for optimizing the quality of an intercom voice according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of an apparatus for optimizing the quality of intercom voice according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a talkback voice quality optimization method according to an embodiment of the present invention. In this embodiment, the method is applied to a first intercom device that is in communication connection with a second intercom device. It should be noted that, in this embodiment, the first intercom device and the second intercom device form a two-party intercom network, in which the first intercom device is the local device and the second intercom device is the opposite (far-end) device.
As shown in fig. 1, the method for optimizing the quality of the talkback voice includes:
Step S101: acquire, in real time, sound information collected by the first intercom device, and analyze the sound information to obtain a first voice state of the first intercom device, wherein the voice state is either a call state or a mute state.
Generally, during a two-party intercom session, three situations occur: one end talking, both ends talking, and both ends silent. The one-end-talking situation can be further divided into the local end talking while the opposite end is silent, and the local end silent while the opposite end is talking. For the first intercom device, the corresponding voice states are therefore a call state and a mute state. In step S101, in order to adjust the sound processing algorithm according to the voice state of the intercom device, the sound information currently collected by the first intercom device itself needs to be acquired in real time. This sound information may be speech uttered by the user or may be meaningless noise. After the sound information is acquired, it is analyzed to determine whether it is speech or noise. If the sound information contains speech, the user is currently sending speech to the second intercom device through the first intercom device, that is, the first intercom device is in a call state; if the sound information contains no speech, the user is not currently speaking into the first intercom device, and the first intercom device is in a mute state.
In some embodiments, to ensure the accuracy of the identification, step S101 may further be: acquiring continuous sound information segments of the first intercom device according to a preset period, and confirming the first voice state according to the sound information segments.
Specifically, a plurality of continuous sound information segments are identified individually to determine whether each segment is speech or noise, and the first voice state of the first intercom device is then confirmed from the identification results of all the segments. This reduces the probability of misjudgment that arises when the state is decided from a single detection.
It should be understood that, so as not to disturb the intercom session, the preset period is usually set to a very short time, for example 10 milliseconds or 20 milliseconds, and the number of continuous sound information segments may also be preset, for example 8 or 10. When the segments identified as speech account for a certain proportion of the continuous sound information segments, the first intercom device may be considered to be in a call state; otherwise it is considered to be in a mute state.
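The proportion-based decision described above can be sketched as follows. This is a minimal illustration, not the implementation from the application: the function name `classify_window`, the state labels "talk" and "mute", and the default threshold of 0.5 are all assumptions, since the text only says "a certain proportion".

```python
from typing import Sequence


def classify_window(speech_flags: Sequence[bool], min_speech_ratio: float = 0.5) -> str:
    """Decide the voice state from per-segment speech/non-speech results.

    speech_flags holds one boolean per consecutive segment (for example 8 or
    10 segments of 10-20 ms each). min_speech_ratio is a hypothetical
    threshold; the source text does not give a concrete value.
    """
    if not speech_flags:
        # No segments observed: treat the device as silent.
        return "mute"
    ratio = sum(speech_flags) / len(speech_flags)
    return "talk" if ratio >= min_speech_ratio else "mute"
```

With a window of ten segments and the assumed threshold, six speech segments would be classified as a call state, while a window with no speech segments would be classified as a mute state.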
Further, the voice state of the first intercom device may be determined in combination with its currently recorded voice state. In that case, the step of acquiring continuous sound information segments of the first intercom device according to a preset period and confirming the first voice state according to the sound information segments specifically includes:
1. Acquiring continuous sound information segments of the first intercom device according to a preset period, and determining whether the currently recorded state of the first intercom device is a call state or a mute state.
2. When the first intercom device is in a call state, sequentially identifying whether each sound information segment is speech or non-speech; when all the continuous sound information segments are non-speech, re-marking the voice state of the first intercom device as a mute state; otherwise, leaving the voice state of the first intercom device unchanged.
Specifically, it should be understood that the first voice state of the first intercom device needs to be recorded after it is confirmed. In this embodiment, when the currently recorded first voice state of the first intercom device is a call state, the sound information segments are identified one by one in time order to determine whether each is speech or non-speech. When all the continuous sound information segments are non-speech, the voice state of the first intercom device is considered to have changed, and the first voice state is re-marked as a mute state and recorded; otherwise, the first voice state is not changed. Non-speech includes noise and the like.
3. When the first intercom device is in a mute state, sequentially identifying whether each sound information segment is speech or non-speech; when all the continuous sound information segments are speech, re-marking the voice state of the first intercom device as a call state; otherwise, leaving the voice state of the first intercom device unchanged.
Specifically, when the recorded state of the first intercom device is a mute state, the plurality of sound information segments are identified; if the segments are all speech, the first voice state of the first intercom device is re-marked as a call state and recorded; otherwise, the state is not changed.
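Steps 1 to 3 amount to a two-state machine with hysteresis: the recorded state flips only when every segment in the window contradicts it. A minimal sketch follows; the names `update_voice_state`, `TALK` and `MUTE` are hypothetical, and the per-segment speech flags are assumed to come from a separate detector.

```python
from typing import Sequence

TALK = "talk"  # call state (label chosen for this sketch)
MUTE = "mute"  # mute state (label chosen for this sketch)


def update_voice_state(recorded: str, speech_flags: Sequence[bool]) -> str:
    """Return the new recorded state given the latest window of segments.

    The state changes only when all segments in the window contradict the
    currently recorded state; any mixed window leaves the state as it is.
    """
    if not speech_flags:
        return recorded
    if recorded == TALK and not any(speech_flags):
        return MUTE  # every segment was non-speech
    if recorded == MUTE and all(speech_flags):
        return TALK  # every segment was speech
    return recorded
```

Requiring a unanimous window in both directions means a short pause between words does not drop the call state, and a single noisy segment does not trigger it.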
Preferably, in this embodiment, the sound information segments can be identified by VAD technology.
VAD (voice activity detection), also known as voice endpoint detection or voice boundary detection, mainly serves to accurately locate the start and end points of speech in a noisy signal.
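The application does not say which VAD algorithm is used. Purely as a stand-in, the sketch below labels a segment as speech when its RMS level exceeds a fixed threshold. The function name `is_speech`, the assumption that samples are floats normalized to [-1.0, 1.0], and the -35 dBFS threshold are all illustrative choices; a practical VAD would typically also use spectral or statistical features.

```python
import math
from typing import Sequence


def is_speech(samples: Sequence[float], threshold_db: float = -35.0) -> bool:
    """Crude energy-based detector for one sound information segment.

    samples are assumed to be normalized to [-1.0, 1.0]. threshold_db is a
    hypothetical level in dB relative to full scale.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return False  # digital silence; avoid log10(0)
    level_db = 20.0 * math.log10(rms)
    return level_db > threshold_db
```

A pure energy threshold cannot distinguish loud noise from speech, which is one reason the method combines many segment decisions before changing the recorded state.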
Step S102: record the first voice state, send the first voice state to the second intercom device, and receive, in real time, a second voice state sent by the second intercom device.
In step S102, after the first voice state of the first intercom device is obtained from the sound information, the first voice state is recorded and also sent to the second intercom device. At the same time, the second voice state sent by the second intercom device is received, so that the first intercom device knows the voice state of the second intercom device.
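The text does not specify how the voice state is carried between the two devices. One possible encoding, shown only as an assumption, is a three-byte payload with a 16-bit sequence number and a one-byte state code; the names `encode_state`, `decode_state` and the code values are hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical wire codes for the two voice states.
STATE_CODES = {"mute": 0, "talk": 1}
STATE_NAMES = {code: name for name, code in STATE_CODES.items()}


def encode_state(state: str, seq: int) -> bytes:
    """Pack a 16-bit sequence number and a 1-byte state code, big-endian."""
    return struct.pack("!HB", seq & 0xFFFF, STATE_CODES[state])


def decode_state(payload: bytes) -> Tuple[int, str]:
    """Unpack a payload produced by encode_state into (sequence, state)."""
    seq, code = struct.unpack("!HB", payload)
    return seq, STATE_NAMES[code]
```

A sequence number lets the receiver discard a state update that arrives out of order, which matters if the states are sent over an unreliable transport.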
Step S103: perform echo cancellation or noise suppression on the sound information in a preset sound processing mode selected according to the first voice state and the second voice state, and then send the sound information to the second intercom device.
In step S103, after the first voice state and the second voice state are obtained, the sound processing mode of the first intercom device is adjusted in real time according to the two states. For example, when the first intercom device is in a call state and the second intercom device is in a mute state, the sound information collected by the first intercom device may be noise-reduced and then sent to the second intercom device. When the first intercom device is in a mute state and the second intercom device is in a call state, echo cancellation may be applied at the first intercom device, so that echo is not transmitted back to the second intercom device and does not degrade the experience of its user.
Further, step S103 specifically includes:
1. Acquiring the first voice state and the second voice state.
Specifically, the first voice state is the voice state of the first intercom device, and the second voice state is the voice state of the second intercom device.
2. When the first voice state is a call state and the second voice state is a mute state, or both the first voice state and the second voice state are mute states, performing strong noise reduction on the sound information collected by the first intercom device, and then sending the sound information to the second intercom device.
Specifically, in both of these cases the second intercom device is in a mute state and sends no speech to the first intercom device for playback. The sound collection sensor of the first intercom device therefore picks up little echo, and noise is the dominant impairment. Whether the first intercom device is in a call state or a mute state, strong noise reduction can be applied to the sound information it collects. If the first intercom device is in a call state, strong noise reduction removes the noise from the sound information, so that the speech transmitted to the second intercom device is clearer and of higher quality. If the first intercom device is in a mute state, strong noise reduction prevents the noise it collects from being transmitted to the second intercom device, which improves the experience of the user of the second intercom device.
3. When the first voice state is a mute state and the second voice state is a call state, or both the first voice state and the second voice state are call states, performing echo cancellation on the sound information collected by the first intercom device, and then sending the sound information to the second intercom device.
Specifically, in both of these cases the second intercom device is in a call state, and it sends speech to the first intercom device, where the speech is played. The sound information collected by the sound collection sensor of the first intercom device therefore contains more echo, which strongly affects voice quality, so echo cancellation is applied to the collected sound information before it is sent to the second intercom device.
Further, when the first voice state is a mute state and the second voice state is a call state, or both the first voice state and the second voice state are call states, the step of performing echo cancellation on the sound information collected by the first intercom device specifically includes:
3.1. When the first voice state is a mute state and the second voice state is a call state, performing strong echo cancellation on the sound information collected by the first intercom device.
3.2. When the first voice state and the second voice state are both call states, performing weak echo cancellation on the sound information collected by the first intercom device.
In this embodiment, echo cancellation has two modes, strong echo cancellation and weak echo cancellation, where the echo cancellation capability of weak echo cancellation is weaker than that of strong echo cancellation.
Specifically, when the first voice state is a mute state and the second voice state is a call state, the first intercom device has no local speech to collect and send to the second intercom device, so strong echo cancellation is applied to the sound information collected by the first intercom device in order to remove as much of the collected echo as possible. When the first voice state and the second voice state are both call states, the first intercom device also needs to send speech to the second intercom device. To avoid removing the valid speech collected by the first intercom device, weak echo cancellation is applied instead, which removes echo to a certain extent while preserving the quality of the speech and avoiding intermittent sound.
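The selection rules in items 2, 3, 3.1 and 3.2 reduce to a small decision table over the two voice states. The sketch below restates that table; the function name `select_processing` and the string labels are hypothetical, and the actual noise reduction and echo cancellation algorithms are outside its scope.

```python
VALID_STATES = ("talk", "mute")  # labels chosen for this sketch


def select_processing(first_state: str, second_state: str) -> str:
    """Pick the processing mode for sound captured by the first device.

    first_state is the local (first) device's voice state and second_state
    is the opposite (second) device's voice state.
    """
    if first_state not in VALID_STATES or second_state not in VALID_STATES:
        raise ValueError("voice state must be 'talk' or 'mute'")
    if second_state == "mute":
        # Far end silent: little echo is captured, noise dominates.
        return "strong_noise_reduction"
    if first_state == "mute":
        # Only the far end is talking: no local speech to protect.
        return "strong_echo_cancellation"
    # Both ends talking: cancel echo gently to keep local speech intact.
    return "weak_echo_cancellation"
```

Note that the far-end state alone decides between noise reduction and echo cancellation; the local state only selects the strength of the echo cancellation.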
Further, in this embodiment, when the first voice state and the second voice state are both call states, the method further includes, while performing weak echo cancellation on the sound information collected by the first intercom device:
3.3. Receiving the speech sent by the second intercom device, and playing it after suppression processing.
Specifically, if the echo in the sound information collected by the first intercom device is too strong, it cannot be cancelled effectively. To avoid this, in this embodiment, after the first intercom device receives the speech sent by the second intercom device, the speech is suppressed before being played on the first intercom device, so that the first intercom device does not pick up an excessively strong echo when collecting sound information.
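The text does not define what "suppression processing" of the received speech consists of. One simple reading, offered only as an assumption, is to attenuate the far-end signal by a fixed gain before playback while both ends are talking. The name `attenuate_playback` and the default gain of 0.5 are illustrative.

```python
from typing import List, Sequence


def attenuate_playback(samples: Sequence[float], both_talking: bool,
                       gain: float = 0.5) -> List[float]:
    """Scale far-end samples before playback when both ends are talking.

    gain is a hypothetical linear factor between 0 and 1. Outside the
    both-ends-talking case the samples are returned unchanged.
    """
    if not both_talking:
        return list(samples)
    return [s * gain for s in samples]
```

Lowering the playback level reduces the echo picked up by the microphone at its source, which leaves less for the weak echo canceller to remove.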
In summary, the talkback voice quality optimization method analyzes the sound information collected by the local intercom device to determine whether the local device is in a call state or a mute state, and then performs echo cancellation or noise suppression on that sound information in a preset sound processing mode selected according to the voice states of both the local and the opposite intercom devices. Whether the local device applies echo cancellation or noise suppression is thus not fixed but depends on the voice states of both ends. The local device can adjust its sound processing algorithm in real time and apply the appropriate algorithm at the appropriate moment, which achieves effective echo cancellation and noise reduction without degrading voice quality.
Fig. 2 is a schematic diagram of the functional modules of the talkback voice quality optimization device according to an embodiment of the invention. As shown in fig. 2, the talkback voice quality optimization device 20 includes: a state confirmation module 21, a state transceiver module 22 and a sound processing module 23.
The state confirmation module 21 is configured to acquire sound information acquired by the first intercom device in real time, and analyze the sound information to obtain a first voice state of the first intercom device, where the voice state includes a call state and a mute state; the state transceiver module 22 is configured to record a first voice state, send the first voice state to the second intercom device, and receive a second voice state sent by the second intercom device in real time; and the sound processing module 23 is configured to perform echo cancellation or noise suppression processing on the sound information according to the first speech state and the second speech state and according to a preset sound processing mode, and send the sound information to the second intercom device.
Preferably, the operation of the status confirmation module 21 performing real-time acquisition of the sound information collected by the first intercom device and analyzing the sound information to obtain the first voice status of the first intercom device may further be: and acquiring continuous sound information segments of the first intercom device according to a preset period, and confirming the first voice state according to the sound information segments.
Preferably, the operation of acquiring the continuous sound information segments of the first intercom device according to the preset period and confirming the first speech state according to the sound information segments by the state confirmation module 21 may further be: acquiring continuous sound information segments of the first intercom device according to a preset period, and determining whether the first intercom device recorded currently is in a conversation state or a mute state; when the first talkback equipment is in a conversation state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are non-voice, re-marking the voice state of the first talkback equipment as a mute state, otherwise, not replacing the voice state of the first talkback equipment; when the first talkback equipment is in a mute state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are voice, re-marking the voice state of the first talkback equipment as a call state, otherwise, not replacing the voice state of the first talkback equipment.
Preferably, the sound information segments are identified using voice activity detection (VAD).
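The state update described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `VoiceState` and `update_voice_state` are hypothetical, the VAD is assumed to run elsewhere and to supply one voice/non-voice flag per segment, and "the continuous sound information segments are non-voice (or voice)" is read here as "every segment in the preset period is non-voice (or voice)", which is an assumption about the intended meaning.

```python
from enum import Enum
from typing import Iterable


class VoiceState(Enum):
    CALL = "call"
    MUTE = "mute"


def update_voice_state(current: VoiceState,
                       segment_is_voice: Iterable[bool]) -> VoiceState:
    """Return the voice state to record after one preset period.

    segment_is_voice holds one flag per consecutive sound information
    segment, True when a VAD (assumed, not implemented here) classified
    the segment as voice.
    """
    flags = list(segment_is_voice)
    if not flags:
        # No segments were collected, so keep the recorded state.
        return current
    if current is VoiceState.CALL and not any(flags):
        # Assumption: all segments non-voice -> re-mark as mute.
        return VoiceState.MUTE
    if current is VoiceState.MUTE and all(flags):
        # Assumption: all segments voice -> re-mark as call.
        return VoiceState.CALL
    # Mixed results leave the recorded state unchanged.
    return current
```

Requiring a whole run of agreeing segments before switching gives the state some hysteresis, so a single misclassified segment does not flip the device between the call state and the mute state.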
Preferably, the sound processing module 23 performs echo cancellation or noise suppression processing on the sound information in the preset sound processing mode according to the first voice state and the second voice state, and sends the sound information to the second intercom device, as follows: it acquires the first voice state and the second voice state; when the first voice state is the call state and the second voice state is the mute state, or when the first voice state and the second voice state are both the mute state, it performs strong noise reduction processing on the sound information collected by the first intercom device and then sends the sound information to the second intercom device; and when the first voice state is the mute state and the second voice state is the call state, or when the first voice state and the second voice state are both the call state, it performs echo cancellation processing on the sound information collected by the first intercom device and then sends the sound information to the second intercom device.
Preferably, the echo cancellation processing performed by the sound processing module 23 on the sound information collected by the first intercom device is further divided as follows: when the first voice state is the mute state and the second voice state is the call state, strong echo cancellation processing is performed on the sound information collected by the first intercom device; and when the first voice state and the second voice state are both the call state, weak echo cancellation processing is performed on the sound information collected by the first intercom device, where the echo cancellation capability of the weak echo cancellation processing is weaker than that of the strong echo cancellation processing.
Preferably, when the first voice state and the second voice state are both the call state, the sound processing module 23, while performing weak echo cancellation processing on the sound information collected by the first intercom device, is further configured to receive the sound information sent by the second intercom device, apply suppression processing to that sound information, and then play it.
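The mapping from the two voice states to a processing mode, as described in the preceding paragraphs, can be sketched as a small lookup function. The names `Processing`, `Plan`, and `select_processing` are hypothetical and chosen only for illustration; the sketch selects a mode and does not implement echo cancellation, noise reduction, or suppression themselves.

```python
from enum import Enum
from typing import NamedTuple


class VoiceState(Enum):
    CALL = "call"
    MUTE = "mute"


class Processing(Enum):
    STRONG_NOISE_REDUCTION = "strong_noise_reduction"
    STRONG_ECHO_CANCELLATION = "strong_echo_cancellation"
    WEAK_ECHO_CANCELLATION = "weak_echo_cancellation"


class Plan(NamedTuple):
    outgoing: Processing      # applied to sound collected by the first device
    suppress_incoming: bool   # suppress the second device's sound before playback


def select_processing(first: VoiceState, second: VoiceState) -> Plan:
    """Choose the processing for the first device from both voice states."""
    if second is VoiceState.MUTE:
        # First in call state and second mute, or both mute.
        return Plan(Processing.STRONG_NOISE_REDUCTION, False)
    if first is VoiceState.MUTE:
        # Only the second device is in the call state.
        return Plan(Processing.STRONG_ECHO_CANCELLATION, False)
    # Both devices are in the call state (double talk).
    return Plan(Processing.WEAK_ECHO_CANCELLATION, True)
```

For example, `select_processing(VoiceState.CALL, VoiceState.CALL)` yields weak echo cancellation on the outgoing sound together with suppression of the incoming sound before playback, which matches the double-talk case described above.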
For other details of the technical solution implemented by each module in the talkback voice quality optimization apparatus in the foregoing embodiment, reference may be made to the description of the talkback voice quality optimization method in the foregoing embodiment, and details are not described herein again.
It should be noted that the embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Because the device embodiment is basically similar to the method embodiment, it is described briefly; for the relevant points, refer to the corresponding description of the method embodiment.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 3, the computer device 30 includes a processor 31 and a memory 32 coupled to the processor 31, wherein the memory 32 stores program instructions, and the program instructions, when executed by the processor 31, cause the processor 31 to perform the steps of the talkback voice quality optimization method according to any one of the embodiments.
The processor 31 may also be referred to as a CPU (Central Processing Unit). The processor 31 may be an integrated circuit chip having signal processing capabilities. The processor 31 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores program instructions 41 capable of implementing all the methods described above. The program instructions 41 may be stored in the storage medium in the form of a software product and include several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or a computer device such as a computer, a server, a mobile phone, or a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed computer device, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and an actual implementation may divide them differently; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
In the above description of the present specification, the terms "fixed," "mounted," "connected," and the like are to be construed broadly unless otherwise expressly specified or limited. For example, with the term "connected", the connection can be fixed, detachable, or integral; it can be mechanical or electrical; it can be direct or indirect through intervening media; or it can be an internal connection or any other suitable relationship. Therefore, unless otherwise specifically defined in the present specification, the specific meanings of the above-mentioned terms in the present invention can be understood by those skilled in the art according to specific situations.
In light of the foregoing description of the present specification, those skilled in the art will also understand that terms used herein, such as "upper," "lower," "front," "rear," "left," "right," "length," "width," "thickness," "vertical," "horizontal," "top," "bottom," "inner," "outer," "axial," "radial," "circumferential," "center," "longitudinal," "lateral," "clockwise," or "counterclockwise," indicate orientations and positional relationships based on those illustrated in the drawings of the present specification. They are intended merely to facilitate explanation of the invention and to simplify the description, and do not indicate or imply that the device or element involved must have a particular orientation or be constructed and operated in a particular orientation; such terms are therefore not to be understood or interpreted as limiting the scope of the present invention.
In addition, the terms "first" or "second", etc. used in this specification are used to refer to numbers or ordinal numbers only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In the description of the present specification, "a plurality" means at least two, for example, two, three or more, and the like, unless specifically defined otherwise.
While various embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that the module composition, equivalents, or alternatives falling within the scope of these claims be covered thereby.

Claims (10)

1. A talkback voice quality optimization method, characterized in that the method is applied to a first intercom device, and the first intercom device is in communication connection with a second intercom device; the method comprises the following steps:
obtaining, in real time, sound information collected by the first intercom device, and analyzing the sound information to obtain a first voice state of the first intercom device, wherein the voice state comprises a call state and a mute state;
recording the first voice state, sending the first voice state to the second intercom device, and receiving, in real time, a second voice state sent by the second intercom device;
and performing echo cancellation or noise suppression processing on the sound information in a preset sound processing mode according to the first voice state and the second voice state, and then sending the sound information to the second intercom device.
2. The method for optimizing the quality of the talkback voice according to claim 1, wherein the obtaining the sound information collected by the first talkback device in real time and analyzing the sound information to obtain the first voice state of the first talkback device comprises:
and acquiring continuous sound information segments of the first intercom device according to a preset period, and confirming the first voice state according to the sound information segments.
3. The talkback voice quality optimization method according to claim 2, wherein the obtaining of the continuous sound information segment of the first talkback device according to the preset period and the confirming of the first voice state according to the sound information segment comprises:
acquiring continuous sound information segments of the first intercom device according to a preset period, and determining whether the currently recorded voice state of the first intercom device is the call state or the mute state;
when the first intercom device is in the call state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are non-voice, re-marking the voice state of the first intercom device as the mute state, otherwise, not replacing the voice state of the first intercom device;
when the first intercom device is in the mute state, sequentially identifying whether each sound information segment is voice or non-voice, and when the continuous sound information segments are voice, re-marking the voice state of the first intercom device as the call state, otherwise, not replacing the voice state of the first intercom device.
4. The talkback voice quality optimization method according to claim 3, characterized in that the sound information segments are identified based on VAD technology.
5. The talkback voice quality optimization method according to claim 1, wherein the performing echo cancellation or noise suppression processing on the sound information in a preset sound processing mode according to the first voice state and the second voice state, and then sending the sound information to the second intercom device comprises:
acquiring the first voice state and the second voice state;
when the first voice state is the call state and the second voice state is the mute state, or both the first voice state and the second voice state are the mute state, performing strong noise reduction processing on the sound information collected by the first intercom device, and then sending the sound information to the second intercom device;
and when the first voice state is the mute state and the second voice state is the call state or both the first voice state and the second voice state are the call states, performing echo cancellation processing on the sound information acquired by the first talkback device, and then sending the sound information to the second talkback device.
6. The method for optimizing the quality of intercom voice according to claim 5, wherein when the first voice state is the mute state and the second voice state is the call state or both the first voice state and the second voice state are the call state, performing echo cancellation processing on the sound information collected by the first intercom device comprises:
when the first voice state is the mute state and the second voice state is the call state, performing strong echo cancellation processing on the sound information collected by the first intercom device;
and when the first voice state and the second voice state are both the call state, performing weak echo cancellation processing on the sound information collected by the first intercom device, wherein the echo cancellation capability of the weak echo cancellation processing is weaker than that of the strong echo cancellation processing.
7. The method for optimizing the quality of the intercom voice according to claim 6, wherein when the first voice state and the second voice state are both the call state, the method further comprises, while performing weak echo cancellation processing on the sound information collected by the first intercom device:
and receiving the sound information sent by the second intercom device, and playing the sound information of the second intercom device after applying suppression processing to it.
8. A talkback voice quality optimization device, characterized in that it comprises:
the state confirmation module is used for acquiring the sound information acquired by the first intercom device in real time and analyzing the sound information to obtain a first voice state of the first intercom device, wherein the voice state comprises a call state and a mute state;
the state transceiving module is used for recording the first voice state, sending the first voice state to the second talkback equipment and receiving a second voice state sent by the second talkback equipment in real time;
and the sound processing module is used for carrying out echo cancellation or noise suppression processing on the sound information according to the first voice state and the second voice state in a preset sound processing mode and then sending the sound information to the second intercom device.
9. A computer device, characterized in that it comprises a processor, a memory coupled to the processor, in which are stored program instructions that, when executed by the processor, cause the processor to carry out the steps of the talkback voice quality optimization method according to any one of claims 1 to 7.
10. A storage medium storing program instructions capable of implementing the talkback voice quality optimization method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904992.0A CN115706875A (en) 2021-08-07 2021-08-07 Method, device and equipment for optimizing talkback voice quality and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110904992.0A CN115706875A (en) 2021-08-07 2021-08-07 Method, device and equipment for optimizing talkback voice quality and storage medium

Publications (1)

Publication Number Publication Date
CN115706875A true CN115706875A (en) 2023-02-17

Family

ID=85179172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904992.0A Pending CN115706875A (en) 2021-08-07 2021-08-07 Method, device and equipment for optimizing talkback voice quality and storage medium

Country Status (1)

Country Link
CN (1) CN115706875A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116647634A (en) * 2023-07-27 2023-08-25 河北跃创科技有限公司 Broadcasting intercom terminal
CN116647634B (en) * 2023-07-27 2024-03-12 河北跃创科技有限公司 Broadcasting intercom terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination