CN115705847A - Method for processing audio watermark and audio watermark generating device - Google Patents

Method for processing audio watermark and audio watermark generating device

Info

Publication number
CN115705847A
Authority
CN
China
Prior art keywords
sound signal
watermark
sound
phase
reflected
Legal status
Pending
Application number
CN202110914948.8A
Other languages
Chinese (zh)
Inventor
杜博仁
张嘉仁
曾凯盟
Current Assignee
Acer Inc
Original Assignee
Acer Inc
Application filed by Acer Inc
Priority to CN202110914948.8A
Publication of CN115705847A
Legal status: Pending (current)

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention provide a sound watermark processing method and a sound watermark generating device. A call reception sound signal is acquired through a sound receiver. A reflected sound signal is generated according to a virtual reflection condition and the call reception sound signal. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The phase of the reflected sound signal is shifted according to a watermark identifier to generate a watermark sound signal, which includes the phase-shifted reflected sound signal. As a result, the echo cancellation mechanism at the receiving end can cancel the watermark sound signal that returns through the feedback path without affecting the voice signal on the call transmission path.

Description

Method for processing audio watermark and audio watermark generating device
Technical Field
The present invention relates to a sound signal processing technology, and in particular, to a sound watermark processing method and a sound watermark generating apparatus.
Background
Teleconferencing enables people in different locations or spaces to converse, and the related devices, protocols, and applications have become quite mature. Notably, some real-time conferencing programs mix a sound watermark signal into the voice signal, and the watermark can be used to identify the speaker.
For example, FIG. 1 is a schematic diagram illustrating an example of a mobile device M used for a conference call. Referring to FIG. 1, the mobile device M can receive a sound signal S1 over the network. The sound signal S1 includes a call reception signal, obtained by recording the remote talker, and a sound watermark signal. The sound watermark signal can be used to identify the other device that transmitted the sound signal S1. The call reception signal is played through the speaker S so that the user sp of the mobile device M can hear the other party. Meanwhile, a sound receiver R (e.g., a microphone) records the user sp to obtain a sound signal S2.
In general, the main function of echo cancellation C on the call transmission path is to remove the component belonging to the call reception signal from the sound signal S2 received by the sound receiver R, yielding an echo-free sound signal S3. However, the sound watermark signal may be generated along a path different from that of the ordinary call reception signal. When the sound receiver R picks up the sound from the speaker S via the feedback path fp, the component of the sound signal S1 that belongs to the sound watermark signal may not be cancelled and is then transmitted over the network, corrupting the voice component of the user sp in the sound signal S3 on the call transmission path.
Disclosure of Invention
The invention is directed to a sound watermark processing method and a sound watermark generating device that generate a sound watermark which an echo cancellation mechanism can cancel, thereby improving call quality.
According to an embodiment of the invention, the sound watermark processing method is applicable to a conference terminal that includes a sound receiver. The method includes (but is not limited to) the following steps. A call reception sound signal is acquired through the sound receiver. A reflected sound signal is generated according to a virtual reflection condition and the call reception sound signal. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The phase of the reflected sound signal is shifted according to a watermark identifier to generate a watermark sound signal, which includes the phase-shifted reflected sound signal.
According to an embodiment of the invention, a sound watermark generating device includes, but is not limited to, a memory and a processor. The memory stores program code. The processor is coupled to the memory and is configured to load and execute the program code to acquire a call reception sound signal, generate a reflected sound signal according to a virtual reflection condition and the call reception sound signal, and shift the phase of the reflected sound signal according to a watermark identifier to generate a watermark sound signal. The call reception sound signal is recorded by a sound receiver. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The watermark sound signal includes the phase-shifted reflected sound signal.
Based on the above, the sound watermark processing method and the sound watermark generating device of the embodiments simulate a sound signal reflected by an external object and encode the watermark by shifting the phase of this simulated signal, thereby generating the watermark sound signal. The ordinary call reception signal and the sound watermark signal can therefore both be retained at the speaker end. Moreover, both signals can be cancelled by existing echo cancellation algorithms, so the voice signal on the call transmission path is unaffected.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a diagram illustrating an example of a mobile device for a conference call;
FIG. 2 is a schematic diagram of a conference call system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method of processing a sound watermark according to an embodiment of the invention;
FIG. 4 is a flow chart of a method of generating a sound watermark according to an embodiment of the invention;
FIG. 5 is a schematic diagram illustrating virtual reflection conditions according to one embodiment of the present invention;
FIG. 6 is a diagram illustrating a filtering process according to one embodiment of the invention;
FIG. 7 is a schematic diagram illustrating multiple phase offsets according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating two phase offsets according to an embodiment of the present invention;
FIG. 9A is a simulation diagram illustrating an example of a call reception sound signal;
FIG. 9B is a simulation diagram illustrating an example of an embedded watermark signal;
FIG. 10 is a flow chart illustrating watermark identification according to one embodiment of the present invention.
Description of the reference numerals
M: mobile device;
S1-S3: sound signals;
S: speaker;
R: sound receiver;
sp: user;
C: echo cancellation;
fp: feedback path;
1: voice communication system;
10, 20: conference terminals;
50: cloud server;
11, 21: sound receivers;
13, 23: speakers;
15, 25, 55: communication transceivers;
17, 27, 57: memories;
19, 29, 59: processors;
70: sound watermark generating device;
S310 to S350, S410 to S450, S910 to S950: steps;
$S_{Rx}$: call reception sound signal;
$S_{Tx}$: call transmission sound signal;
$S_{WM}$, $S_{WM1}$: watermark sound signals;
$S_{Rx}+S_{WM}$: embedded watermark signal;
$S'_{Rx}$, $S''_{Rx}$, $S_{90^\circ}$, $S_{WO}$ (and the filtered variants of $S''_{Rx}$): reflected sound signals;
W: wall;
$\gamma_w$: reflection coefficient;
$d_s$, $d_w$: distances;
SS: sound source;
$W_O$, $W_E$: watermark identifiers;
$S_A$ (and its filtered and phase-shifted variants): transmitted sound signals.
Detailed Description
Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
FIG. 2 is a schematic diagram of a conference call system 1 according to an embodiment of the present invention. Referring to FIG. 2, the voice communication system 1 includes, but is not limited to, conference terminals 10 and 20 and a cloud server 50.
The conference terminals 10 and 20 may each be a wired telephone, a mobile phone, a network phone, a tablet computer, a desktop computer, a notebook computer, or a smart speaker.
The conference terminal 10 includes, but is not limited to, a sound receiver 11, a speaker 13, a communication transceiver 15, a memory 17, and a processor 19.
The sound receiver 11 may be a dynamic, condenser, or electret condenser microphone, or a combination of other electronic components, analog-to-digital converters, filters, and audio processors that can receive sound waves (e.g., human voices, ambient sounds, or machine operating sounds) and convert them into sound signals. In one embodiment, the sound receiver 11 records the talker's voice to obtain a call reception sound signal. In some embodiments, the call reception sound signal may include the talker's voice, the sound emitted by the speaker 13, and/or other ambient sounds.
The speaker 13 may be a horn or another type of loudspeaker. In one embodiment, the speaker 13 is used to play sound.
The communication transceiver 15 is, for example, a transceiver supporting a wired network such as Ethernet, an optical fiber network, or cable (which may include, but is not limited to, a connection interface, a signal converter, and a communication protocol processing chip), or a wireless network such as Wi-Fi or a fourth-generation (4G), fifth-generation (5G), or later mobile network (which may include, but is not limited to, an antenna, digital-to-analog/analog-to-digital converters, and a communication protocol processing chip). In one embodiment, the communication transceiver 15 is used to transmit or receive data.
The memory 17 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In one embodiment, the memory 17 stores program code, software modules, configurations, data (e.g., sound signals, watermark identifiers, or watermark sound signals), or files.
The processor 19 is coupled to the sound receiver 11, the speaker 13, the communication transceiver 15, and the memory 17. The processor 19 may be a central processing unit (CPU), a graphics processing unit (GPU), or another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or another similar component or a combination thereof. In one embodiment, the processor 19 executes all or part of the operations of the conference terminal 10 and can load and execute the software modules, files, and data stored in the memory 17.
The conference terminal 20 includes, but is not limited to, a sound receiver 21, a speaker 23, a communication transceiver 25, a memory 27, and a processor 29. For the implementations and functions of the sound receiver 21, the speaker 23, the communication transceiver 25, the memory 27, and the processor 29, refer to the descriptions of the sound receiver 11, the speaker 13, the communication transceiver 15, the memory 17, and the processor 19; they are not repeated here. The processor 29 executes all or part of the operations of the conference terminal 20 and can load and execute the software modules, files, and data stored in the memory 27.
The cloud server 50 is connected to the conference terminals 10 and 20 directly or indirectly via the network. The cloud server 50 may be a computer system, a server, or a signal processing device. In one embodiment, the conference terminal 10 or 20 may itself serve as the cloud server 50. In another embodiment, the cloud server 50 is a standalone server separate from the conference terminals 10 and 20. In some embodiments, the cloud server 50 includes (but is not limited to) a communication transceiver 55, a memory 57, and a processor 59 that are the same as or similar to those described above; their implementations and functions are not repeated.
In one embodiment, the sound watermark generating device 70 may be the conference terminal 10 or 20 or the cloud server 50. The sound watermark generating device 70 generates the sound watermark signal, as described in detail in the following embodiments.
The method of the embodiments of the present invention is described below with reference to the devices, components, and modules of the voice communication system 1. The steps of the method may be adjusted depending on the implementation and are not limited thereto.
It should be noted that, for convenience of description, identical components may perform the same or similar operations, which are not described again. For example, the processor 19 of the conference terminal 10, the processor 29 of the conference terminal 20, and/or the processor 59 of the cloud server 50 may each implement the same or similar methods of the embodiments of the present invention.
FIG. 3 is a flowchart of a sound watermark processing method according to an embodiment of the present invention. Referring to FIG. 3, the processor 29 obtains the call reception sound signal $S_{Rx}$ recorded through the sound receiver 21 (step S310). Specifically, assume that the conference terminals 10 and 20 have established a conference call, for example through video conferencing software, voice call software, or a telephone call, and the talker starts speaking. After the speech is recorded/picked up by the sound receiver 21, the processor 29 obtains the call reception sound signal $S_{Rx}$. The call reception sound signal $S_{Rx}$ relates to the voice of the talker at the conference terminal 20 (and may also include ambient sounds or other noise). The processor 29 of the conference terminal 20 may transmit the call reception sound signal $S_{Rx}$ through the communication transceiver 25 (i.e., via the network interface). In some embodiments, the call reception sound signal $S_{Rx}$ may have undergone echo cancellation, noise filtering, and/or other sound signal processing.
The processor 59 of the cloud server 50 receives the call reception sound signal $S_{Rx}$ from the conference terminal 20 through the communication transceiver 55. The processor 59 generates a reflected sound signal $S'_{Rx}$ according to the virtual reflection condition and the call reception sound signal (step S330). Specifically, a typical echo cancellation algorithm adaptively cancels, from the sound signal picked up from the outside by the sound receiver 11 or 21, the component belonging to a reference signal (e.g., the call reception sound signal $S_{Rx}$ on the call reception path). The sound recorded by the sound receiver 11 or 21 includes sound arriving over the shortest path from the speaker 13 or 23 and over various reflection paths in the environment (i.e., paths formed by sound reflecting off external objects). A reflected sound signal is affected by the reflection coefficient of the reflecting object, and the location of the reflection affects the time delay and attenuation of the sound signal. Furthermore, reflected sound signals may arrive from different directions, which causes phase shifts. In the embodiments of the invention, the known call reception sound signal $S_{Rx}$ is used to generate a virtual (simulated) reflected sound signal that an echo cancellation mechanism can cancel, and the sound watermark signal $S_{WM}$ is generated from it.
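The cancellation principle this step relies on can be illustrated with a generic normalized least-mean-squares (NLMS) adaptive filter, a common textbook echo canceller. The sketch below is an illustration under stated assumptions, not the echo cancellation mechanism of the conference terminals: it uses NumPy, takes $S_{Rx}$ as the reference, and assumes the sound receiver hears nothing but one hypothetical reflection (20-sample delay, gain 0.5) with no near-end speech or noise. The helper name `nlms_echo_canceller` and all parameter values are illustrative.

```python
import numpy as np

def nlms_echo_canceller(reference, mic, taps=64, mu=0.5, eps=1e-8):
    """Generic NLMS echo canceller (illustrative, not the patent's algorithm).

    Adapts an FIR estimate of the echo path from `reference` to `mic` and
    returns the residual e(n) = mic(n) - estimated_echo(n).
    """
    w = np.zeros(taps)                 # echo-path estimate
    x_buf = np.zeros(taps)             # x_buf[k] holds reference(n - k)
    residual = np.empty(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = reference[n]
        e = mic[n] - w @ x_buf         # subtract the estimated echo
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # normalized update
        residual[n] = e
    return residual

# Hypothetical scenario: the sound receiver hears only one reflection of
# the call reception signal (delay 20 samples, gain 0.5).
rng = np.random.default_rng(0)
s_rx = rng.standard_normal(20_000)
delay, alpha = 20, 0.5
mic = np.zeros_like(s_rx)
mic[delay:] = alpha * s_rx[:-delay]

residual = nlms_echo_canceller(s_rx, mic)
tail = slice(-2_000, None)             # measure after convergence
erle_db = 10 * np.log10(np.mean(mic[tail] ** 2) / np.mean(residual[tail] ** 2))
```

Because the simulated reflection falls within the filter's 64-tap span, the filter learns it and the residual decays toward zero; this is the property the watermark design relies on when it makes $S_{WM}$ look like such a reflection.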
FIG. 4 is a flowchart of a method for generating the sound watermark $S_{WM}$ according to an embodiment of the invention. Referring to FIG. 4, the processor 59 may set a virtual reflection condition for generating the reflected sound signal $S'_{Rx}$ (step S410). Specifically, the virtual reflection condition includes the positional relationship among the sound receiver 11 or 21, a sound source (e.g., the talker or the speaker 13 or 23), and an external object (e.g., a wall, a ceiling, furniture, or a person), for example the distance between the sound receiver 11 and the external object, the distance between the sound receiver 11 and the sound source, and/or the distance between the sound source and the external object. The reflected sound signal $S'_{Rx}$ simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver 11 or 21.
In one embodiment, the processor 59 may determine, according to the positional relationship and the reflection coefficient of the external object, the time delay and amplitude attenuation of the reflected sound signal $S'_{Rx}$ relative to the call reception sound signal $S_{Rx}$. For example, FIG. 5 is a schematic diagram illustrating a virtual reflection condition according to an embodiment of the invention. Referring to FIG. 5, assume that the virtual reflection condition is a single wall W (i.e., the external object) with reflection coefficient $\gamma_w$ (e.g., 0.7, 0.3, or 1), that the distance between the sound receiver 21 and the sound source SS is $d_s$ (e.g., 0.3, 0.5, or 0.8 meters), and that the distance between the sound receiver 21 and the wall W is $d_w$ (e.g., 1, 1.5, or 2 meters). The relationship between the reflected sound signal $S'_{Rx}$ and the call reception sound signal $S_{Rx}$ can then be expressed as follows:
[Equation (1), relating $s'_{Rx}(n)$ to $s_{Rx}(n)$ through $\gamma_w$, $d_s$, $d_w$, $T_s$, and $v_s$; image not reproduced]
wherein T is s For the sampling time, v s Then the speed of sound and n is the sampling point or time.
If the reflected sound signal $S'_{Rx}$ has a time delay $n_w$ and an amplitude attenuation $\alpha_w$ relative to the call reception sound signal $S_{Rx}$, the relationship between them can be expressed as follows:
$$s'_{Rx}(n) = \alpha_w \cdot s_{Rx}(n - n_w) \qquad (2)$$
From equations (1) and (2), it follows that:
[Equations giving $\alpha_w$ and $n_w$, the latter including the delays defined below; images not reproduced]
wherein n is f The time delay incurred for the filter (optionally, and as will be described in further detail below),
Figure BDA0003205176180000074
the time delay caused by the phase offset (optional, and will be described in further detail in the following embodiments).
It should be noted that the virtual reflection condition can be varied to suit different design requirements, for example by using more than one external object or different relative positions.
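The sketch below computes $n_w$ and $\alpha_w$ under a deliberately simple, assumed geometry and then applies equation (2). It is an illustration, not the patent's equation (1): it assumes the sound source, the sound receiver, and the wall W lie on one line with the receiver between source and wall, so the direct path is $d_s$ and the reflected path is $d_s + 2d_w$; amplitude is assumed to fall off as 1/distance; the speed of sound (343 m/s) and the sampling rate (48 kHz) are also assumptions. The filter delay $n_f$ and the phase-shift delay are ignored, and the helper names are hypothetical.

```python
import numpy as np

def reflection_params(d_s, d_w, gamma_w, fs=48_000, v_s=343.0):
    """Delay n_w (samples) and attenuation alpha_w of a simulated wall reflection.

    Assumed geometry (illustrative only): source, sound receiver and wall on
    one line, receiver between source and wall, so the direct path is d_s and
    the reflected path is d_s + 2*d_w; amplitude decays as 1/distance.
    """
    direct, reflected = d_s, d_s + 2.0 * d_w
    n_w = round((reflected - direct) * fs / v_s)   # extra path / (v_s * T_s), T_s = 1/fs
    alpha_w = gamma_w * direct / reflected
    return n_w, alpha_w

def reflect(s_rx, n_w, alpha_w):
    """Equation (2): s'_Rx(n) = alpha_w * s_Rx(n - n_w), zero before the echo arrives."""
    s_rx = np.asarray(s_rx, dtype=float)
    out = np.zeros_like(s_rx)
    if n_w < len(s_rx):
        out[n_w:] = alpha_w * s_rx[: len(s_rx) - n_w]
    return out

# Example values taken from the ranges given in the text.
n_w, alpha_w = reflection_params(d_s=0.5, d_w=1.5, gamma_w=0.7)
s_rx = np.random.default_rng(0).standard_normal(4_800)
s_reflected = reflect(s_rx, n_w, alpha_w)
```

With $d_s$ = 0.5 m, $d_w$ = 1.5 m, and $\gamma_w$ = 0.7, this toy model gives a delay of 420 samples (8.75 ms) and an attenuation of 0.1.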
Referring to FIG. 3, the processor 59 shifts the phase of the reflected sound signal $S'_{Rx}$ according to the watermark identifier $W_O$ to generate the watermark sound signal $S_{WM}$ (step S350). Specifically, for a conventional echo cancellation mechanism, changes in the time delay and amplitude of a reflected sound signal cause larger errors than a phase shift of that signal does. Such changes look like an entirely new interference environment and force the echo cancellation mechanism to re-adapt. Therefore, in the embodiments of the invention, the watermark sound signals $S_{WM}$ corresponding to different values of the watermark identifier $W_O$ differ only in phase, while their time delay and amplitude are the same. That is, the watermark sound signal $S_{WM}$ includes one or more phase-shifted reflected sound signals $S'_{Rx}$.
Referring to FIG. 4, in one embodiment, the processor 59 may apply a filter to generate a filtered reflected sound signal $S''_{Rx}$ (step S430). Specifically, a typical echo cancellation mechanism converges slowly on low-frequency sound signals (e.g., below 3 kHz or 4 kHz) but quickly (e.g., within 10 milliseconds (ms)) on high-frequency sound signals (e.g., above 3 kHz or 4 kHz). The processor 59 may therefore apply the phase shift only to the high-frequency part (e.g., 4 kHz, 5 kHz, and above) of the reflected sound signal $S'_{Rx}$, which also makes the resulting interference less noticeable to listeners.
For example, FIG. 6 is a diagram illustrating a filtering process according to an embodiment of the invention. Referring to FIG. 6, the processor 59 may low-pass filter the reflected sound signal $S'_{Rx}$ through the low-pass filter LPF to output a low-pass-filtered reflected sound signal. For example, the low-pass filter LPF blocks signals above 4 kHz and passes only signals below 4 kHz. In addition, the processor 59 may high-pass filter the reflected sound signal $S'_{Rx}$ through the high-pass filter HPF to output a high-pass-filtered reflected sound signal. For example, the high-pass filter HPF blocks signals below 4 kHz and passes only signals above 4 kHz.
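A minimal NumPy sketch of the FIG. 6 split is shown below. It substitutes an ideal (brick-wall) FFT mask for the LPF and HPF, which is an assumption: practical filters would introduce the delay $n_f$, whereas this zero-phase mask introduces none. The 48 kHz sampling rate is assumed; the 4 kHz cutoff follows the example in the text. Because the two masks are complementary, the low and high outputs sum back to the input.

```python
import numpy as np

def split_bands(x, fs=48_000, cutoff=4_000.0):
    """Split x into complementary low (< cutoff) and high (>= cutoff) bands."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < cutoff, X, 0), n=len(x))
    high = np.fft.irfft(np.where(freqs >= cutoff, X, 0), n=len(x))
    return low, high

fs = 48_000
n = np.arange(4_800)                      # 0.1 s, 10 Hz bin spacing
tone_1k = np.sin(2 * np.pi * 1_000 * n / fs)
tone_6k = np.sin(2 * np.pi * 6_000 * n / fs)
low, high = split_bands(tone_1k + tone_6k, fs)
```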
In another embodiment, the processor 59 does not filter the reflected sound signal $S'_{Rx}$ by frequency; that is, the reflected sound signal $S''_{Rx}$ equals the reflected sound signal $S'_{Rx}$.
Referring to FIG. 4, the processor 59 may shift the phase of the reflected sound signal $S''_{Rx}$ according to the watermark identifier $W_O$ (step S450). In an embodiment, the watermark identifier $W_O$ is encoded with a multi-level scheme that provides multiple possible values for each of its one or more bits (digits). In the binary system, for example, each bit of the watermark identifier $W_O$ may be "0" or "1". In the hexadecimal system, each digit of the watermark identifier $W_O$ may be "0", "1", "2", …, "E", or "F". In another embodiment, the watermark identifier is encoded with letters, characters, and/or symbols; for example, each digit of the watermark identifier $W_O$ may be any of the English letters "A" to "Z".
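As an illustration of these multi-level encodings, the hypothetical helpers below turn an integer identifier into fixed-width base-N digits and map the letters "A" to "Z" onto 26 levels. The function names, the most-significant-digit-first order, and the fixed width are assumptions.

```python
import string

def to_digits(identifier: int, base: int, width: int) -> list[int]:
    """Most-significant-first base-`base` digits of `identifier`, zero-padded."""
    if not 0 <= identifier < base ** width:
        raise ValueError("identifier does not fit in `width` digits")
    digits = []
    for _ in range(width):
        identifier, d = divmod(identifier, base)
        digits.append(d)
    return digits[::-1]

def letters_to_digits(word: str) -> list[int]:
    """26-level symbols: 'A' -> 0, ..., 'Z' -> 25."""
    return [string.ascii_uppercase.index(ch) for ch in word.upper()]

binary = to_digits(5, base=2, width=4)       # [0, 1, 0, 1]
hexa = to_digits(0xA7, base=16, width=2)     # [10, 7], i.e. "A", "7"
letters = letters_to_digits("ACER")          # [0, 2, 4, 17]
```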
In an embodiment, the different values of each bit of the watermark identifier $W_O$ correspond to different phase offsets. For example, FIG. 7 is a schematic diagram illustrating multiple phase offsets according to an embodiment of the invention. Referring to FIG. 7, assuming the watermark identifier $W_O$ uses a base-N system (N being a positive integer), each bit can take N values, and the N different values correspond to N different phase offsets, respectively.
FIG. 8 is a diagram illustrating two phase offsets according to an embodiment of the invention. Referring to FIG. 8, assuming the watermark identifier $W_O$ is binary, each bit can take 2 values (i.e., 1 and 0), which correspond to two phase offsets, respectively. For example, one phase offset is 90° and the other is -90° (i.e., the 90°-shifted signal multiplied by -1).
The processor 59 may shift the phase of the reflected sound signal $S''_{Rx}$ according to the values of one or more bits of the watermark identifier $W_O$. Taking FIG. 7 as an example, the processor 59 selects a phase offset according to one or more values of the watermark identifier $W_O$ and performs the phase shift with the selected offset. For example, if a bit of the watermark identifier $W_O$ has the value 1, the output phase-shifted reflected sound signal is offset, relative to the reflected sound signal $S''_{Rx}$, by the phase offset assigned to the value 1; the phase-shifted reflected sound signals for the other values are obtained in the same way. The phase shift can be implemented with the Hilbert transform or another phase-shift algorithm.
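One possible sketch of the selection and shift in step S450, assuming NumPy: a hypothetical value-to-offset table spreads the offsets evenly over -90° to 90° (for a binary identifier this gives the ±90° pair of FIG. 8), and the phase shift uses the Hilbert transform $\mathcal{H}$ as $x\cos\varphi - \mathcal{H}\{x\}\sin\varphi$, which shifts every frequency component of $x$ by $\varphi$. The FFT-based Hilbert transform here is circular, so a practical implementation would more likely process overlapping blocks or use an FIR approximation.

```python
import numpy as np

def phase_table(levels: int) -> np.ndarray:
    """Hypothetical value-to-offset map: `levels` offsets spread evenly over
    [-90, 90] degrees, so levels=2 gives value 0 -> -90 and value 1 -> +90."""
    return np.linspace(-90.0, 90.0, levels)

def hilbert_transform(x):
    """Discrete (circular) Hilbert transform: multiply positive bins by -j."""
    X = np.fft.rfft(x)
    X[0] = 0                        # DC has no quadrature component
    if len(x) % 2 == 0:
        X[-1] = 0                   # nor does the Nyquist bin
    return np.fft.irfft(-1j * X, n=len(x))

def phase_shift(x, phi_deg):
    """Shift every frequency component of x by phi_deg degrees."""
    phi = np.deg2rad(phi_deg)
    return np.cos(phi) * x - np.sin(phi) * hilbert_transform(x)

fs, n = 48_000, np.arange(480)
x = np.cos(2 * np.pi * 5_000 * n / fs)          # 5 kHz tone, exactly on an FFT bin
bit_value = 1
y = phase_shift(x, phase_table(2)[bit_value])   # +90 degrees for value 1
```

For the 5 kHz test tone, the +90° shift turns $\cos\theta$ into $\cos(\theta + 90^\circ) = -\sin\theta$, and the -90° shift is exactly its negation, matching the description of FIG. 8.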
In an embodiment, the watermark identifier includes multiple bits. The watermark sound signal $S_{WM}$ then includes multiple phase-shifted reflected sound signals, each occupying a length of time within $S_{WM}$. Let $L_b$ denote the duration of each bit (e.g., 0.1, 0.5, or 1 second, and greater than the time delay $n_w$). Similar to time-division multiplexing, the processor 59 divides the watermark sound signal $S_{WM}$ into sub-time units of the same or different lengths according to the number of bits in the watermark identifier $W_O$, and each sub-time unit carries the phase-shifted reflected sound signal corresponding to a different bit.
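The time-division layout can be sketched as a slot schedule. The sketch assumes equal-length sub-time units expressed in samples at a hypothetical 48 kHz rate ($L_b$ = 0.1 s = 4800 samples) and a reflection delay of 420 samples; the helper name is illustrative. The phase-shifted reflected sound signal for bit $i$ would occupy samples `start` to `end` of slot $i$.

```python
def bit_slots(bit_values, slot_len, n_w):
    """Return (start, end, value) sample ranges, one sub-time unit per bit.

    Equal-length slots are assumed; slot_len must exceed the reflection
    delay n_w, as the text requires for L_b.
    """
    if slot_len <= n_w:
        raise ValueError("each bit must last longer than the reflection delay n_w")
    return [(i * slot_len, (i + 1) * slot_len, v) for i, v in enumerate(bit_values)]

slots = bit_slots([0, 1, 0, 1], slot_len=4_800, n_w=420)   # 0.1 s per bit at 48 kHz
# [(0, 4800, 0), (4800, 9600, 1), (9600, 14400, 0), (14400, 19200, 1)]
```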
In one embodiment, if the filtering process of FIG. 6 is used, the processor 59 may combine the one or more phase-shifted reflected sound signals with the low-pass-filtered reflected sound signal. Taking FIG. 8 as an example, the high-pass-filtered reflected sound signal is phase-shifted by 90° (producing the phase-shifted reflected sound signal $S_{90^\circ}$), and the phase-shifted reflected sound signal $S_{WO}$ is output. The processor 59 then combines the low-pass-filtered reflected sound signal with the phase-shifted reflected sound signal $S_{WO}$ to generate the watermark sound signal $S_{WM1}$.
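The FIG. 8 synthesis can be sketched in the frequency domain, assuming NumPy, an ideal 4 kHz band split, and a 48 kHz sampling rate. Rotating every positive-frequency bin of a real signal by $e^{j\varphi}$ shifts those components by $\varphi$, which matches the Hilbert-based shift, so applying the rotation only above the cutoff yields the low-pass-filtered part plus the phase-shifted high-pass-filtered part in one step. The function name is hypothetical.

```python
import numpy as np

def synthesize_wm1(reflected, phi_deg=90.0, fs=48_000, cutoff=4_000.0):
    """Low band of `reflected` unchanged plus its high band shifted by phi_deg."""
    X = np.fft.rfft(reflected)
    freqs = np.fft.rfftfreq(len(reflected), d=1.0 / fs)
    # Gain 1 below the cutoff, a pure phase rotation exp(j*phi) above it.
    gain = np.where(freqs >= cutoff, np.exp(1j * np.deg2rad(phi_deg)), 1.0)
    return np.fft.irfft(X * gain, n=len(reflected))

fs, n = 48_000, np.arange(4_800)
theta_lo = 2 * np.pi * 1_000 * n / fs
theta_hi = 2 * np.pi * 6_000 * n / fs
s_wm1 = synthesize_wm1(np.cos(theta_lo) + np.cos(theta_hi))
# Expected shape: cos(theta_lo) + cos(theta_hi + 90 deg) = cos(theta_lo) - sin(theta_hi)
```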
In some embodiments, the processor 59 may generate multiple identical watermark sound signals corresponding to different main time units, i.e., output the watermark sound signal cyclically. To distinguish adjacent watermark sound signals, the processor 59 may insert an interval between them, for example a mute signal or another known high-frequency sound signal.
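A sketch of the cyclic output with mute intervals, assuming NumPy and a silent (all-zero) gap whose length is a hypothetical parameter; a known high-frequency marker could replace the zeros.

```python
import numpy as np

def repeat_with_gaps(watermark, repeats, gap_len):
    """Concatenate `repeats` copies of the watermark, separated by silence."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    gap = np.zeros(gap_len)
    pieces = []
    for i in range(repeats):
        pieces.append(np.asarray(watermark, dtype=float))
        if i < repeats - 1:
            pieces.append(gap)          # mute interval between adjacent copies
    return np.concatenate(pieces)

stream = repeat_with_gaps(np.ones(4), repeats=3, gap_len=2)
# [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
```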
In one embodiment, the processor 59 may transmit the call reception sound signal $S_{Rx}$ and the watermark sound signal $S_{WM}$ separately through the communication transceiver 55. In another embodiment, the processor 59 may combine the call reception sound signal $S_{Rx}$ and the watermark sound signal $S_{WM}$ to generate an embedded watermark signal $S_{Rx}+S_{WM}$ and then transmit it through the communication transceiver 55.
FIG. 9A is a simulation diagram illustrating an example of the call reception sound signal $S_{Rx}$, and FIG. 9B is a simulation diagram illustrating an example of the embedded watermark signal $S_{Rx}+S_{WM}$. Referring to FIGS. 9A and 9B, the two sounds are very close, and a listener finds them difficult or impossible to tell apart.
The processor 19 of the conference terminal 10 receives the watermark sound signal $S_{WM}$ or the embedded watermark signal $S_{Rx}+S_{WM}$ over the network through the communication transceiver 15, obtaining a transmitted sound signal $S_A$ (i.e., the transmitted watermark sound signal $S_{WM}$ or embedded watermark signal $S_{Rx}+S_{WM}$). Because the watermark sound signal $S_{WM}$ includes time-delayed and amplitude-attenuated copies of the call reception sound signal (i.e., reflected sound signals), the echo cancellation mechanism of the processor 19 can effectively cancel it. The call transmission sound signal $S_{Tx}$ on the call transmission path (e.g., the call reception sound signal that the conference terminal 10 is to transmit over the network) is therefore unaffected.
Regarding identification of the watermark sound signal $S_{WM}$, FIG. 10 is a flowchart illustrating watermark identification according to an embodiment of the present invention. Referring to FIG. 10, in an embodiment where the filtering process of FIG. 6 is used, the processor 19 may high-pass filter the transmitted sound signal $S_A$ with the same or a similar high-pass filter HPF (step S910) to output a high-pass-filtered transmitted sound signal. In another embodiment where the filtering process of FIG. 6 is not used, step S910 can be omitted (i.e., the high-pass-filtered transmitted sound signal equals the transmitted sound signal $S_A$).
The processor 19 may shift the phase of the high-pass-filtered transmitted sound signal according to the correspondence between values and phase offsets used in step S450 (step S930). Taking FIG. 8 as an example, the processor 19 generates a transmitted sound signal phase-shifted by 90°.
The processor 19 may identify the watermark identifier $W_E$ according to the correlation between the high-pass-filtered transmitted sound signal and the phase-shifted transmitted sound signal (step S950). For example, the processor 19 computes the orthogonal cross-correlation $R_{xy}(n_w)$ between these two signals at a time delay of $n_w$, where $-1 \le R_{xy}(n_w) \le 1$. With a threshold $Th_R$ defined by the processor 19, the watermark identifier $W_E$ can be expressed as:
$$W_E = \begin{cases} 1, & R_{xy}(n_w) > Th_R \\ 0, & \text{otherwise} \end{cases}$$
That is, if the correlation is above the threshold Th_R, the processor 19 determines that the value of this bit is the value corresponding to a phase shift of 90° (e.g., 1); if the correlation is below the threshold Th_R, the processor 19 determines that the value of this bit is the value corresponding to a phase shift of -90° (e.g., 0). In another embodiment, the processor 19 may use a deep-learning-based classifier to identify the values carried in successive time units of the transmission sound signal.
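Step S950 and the threshold rule above can be sketched as follows. This is one plausible reading under stated assumptions: R_xy(n_w) is taken to be the normalized correlation between s[n] and the 90°-shifted signal at n - n_w (this section does not give the exact formula), Th_R = 0 is a hypothetical threshold, and the synthetic test models the transmission as speech plus an attenuated, delayed, phase-shifted copy of itself. For a +90° watermark, the delayed watermark component lines up with the lagged 90°-shifted speech and the correlation is positive; a -90° watermark makes it negative.

```python
import numpy as np


def phase_shift(x, degrees):
    """Rotate every positive-frequency component of a real signal by `degrees`."""
    n = len(x)
    phi = np.deg2rad(degrees)
    spectrum = np.fft.rfft(x)
    rotated = spectrum * np.exp(1j * phi)
    rotated[0] = spectrum[0].real * np.cos(phi)  # DC cannot carry a phase
    if n % 2 == 0:
        rotated[-1] = spectrum[-1].real * np.cos(phi)  # neither can Nyquist
    return np.fft.irfft(rotated, n=n)


def orthogonal_cross_correlation(s_a, n_w):
    """Normalized correlation of s_a[n] with its 90-degree copy at n - n_w.

    Cauchy-Schwarz keeps the result in [-1, 1].
    """
    s_a = np.asarray(s_a, dtype=float)
    shifted = phase_shift(s_a, 90.0)
    a = s_a[n_w:]
    b = shifted[: len(s_a) - n_w]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def decode_bit(r_xy, th_r=0.0):
    """Threshold rule for W_E: above Th_R -> 1 (+90 deg), otherwise 0 (-90 deg).

    The text leaves R_xy == Th_R unspecified; here that tie falls to 0.
    """
    return 1 if r_xy > th_r else 0


def delayed(x, d):
    """Zero-padded delay by d samples (hypothetical helper for the test)."""
    out = np.zeros_like(x)
    out[d:] = x[: len(x) - d]
    return out


# Synthetic check: noise as stand-in speech, hypothetical 200-sample delay.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
n_w = 200
s_bit1 = speech + 0.5 * delayed(phase_shift(speech, 90.0), n_w)
s_bit0 = speech + 0.5 * delayed(phase_shift(speech, -90.0), n_w)
r1 = orthogonal_cross_correlation(s_bit1, n_w)
r0 = orthogonal_cross_correlation(s_bit0, n_w)
```

In this model the expected magnitude of R_xy(n_w) is roughly 0.5 / (1 + 0.5²) = 0.4, so a threshold of 0 leaves a wide margin.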
In summary, in the method for processing a sound watermark and the sound watermark generating apparatus according to the embodiments of the present invention, a reflected sound signal is simulated based on the principle of the echo cancellation mechanism, and the watermark is encoded by shifting the phase of the reflected sound signal. Therefore, at the receiving end, the sound watermark signal arriving through the feedback path can be eliminated by the echo cancellation mechanism, and it does not affect the communication signal on the communication transmission path.
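The embedding side summarized above can be sketched as follows: a simulated reflection of the call receiving sound signal (claim 2: extra delay and amplitude attenuation derived from the positional relationship and a reflection coefficient), followed by a per-bit phase shift (claims 3 and 4). This is a minimal sketch under assumptions not stated in this section: the external object is treated as a point reflector, attenuation follows 1/r spreading relative to the direct path, sound travels at 343 m/s, each bit occupies a fixed hypothetical `frame_len`, and the value-to-phase table uses the binary example of Fig. 8 (1 → +90°, 0 → -90°). All function names are hypothetical.

```python
import math

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value
PHASE_TABLE = {1: 90.0, 0: -90.0}  # hypothetical binary table (Fig. 8 example)


def phase_shift(x, degrees):
    """Rotate every positive-frequency component of a real signal by `degrees`."""
    n = len(x)
    phi = np.deg2rad(degrees)
    spectrum = np.fft.rfft(x)
    rotated = spectrum * np.exp(1j * phi)
    rotated[0] = spectrum[0].real * np.cos(phi)  # DC cannot carry a phase
    if n % 2 == 0:
        rotated[-1] = spectrum[-1].real * np.cos(phi)  # neither can Nyquist
    return np.fft.irfft(rotated, n=n)


def reflection_params(source, receiver, obstacle, reflection_coeff, fs):
    """Delay (samples) and gain of the reflected path relative to the direct path."""
    direct = math.dist(source, receiver)
    reflected = math.dist(source, obstacle) + math.dist(obstacle, receiver)
    delay_samples = round((reflected - direct) / SPEED_OF_SOUND * fs)
    gain = reflection_coeff * direct / reflected  # assumed 1/r spreading
    return delay_samples, gain


def simulate_reflection(call_signal, delay_samples, gain):
    """Delayed, attenuated copy of the call receiving sound signal."""
    call_signal = np.asarray(call_signal, dtype=float)
    out = np.zeros_like(call_signal)
    if delay_samples < len(call_signal):
        out[delay_samples:] = gain * call_signal[: len(call_signal) - delay_samples]
    return out


def embed_bits(reflected, bits, frame_len):
    """Phase-shift one frame of the reflected signal per watermark bit.

    Samples after the last bit are left unshifted (a hypothetical choice).
    """
    if len(bits) * frame_len > len(reflected):
        raise ValueError("reflected signal too short for the watermark bits")
    watermark = np.array(reflected, dtype=float)
    for i, bit in enumerate(bits):
        frame = slice(i * frame_len, (i + 1) * frame_len)
        watermark[frame] = phase_shift(watermark[frame], PHASE_TABLE[bit])
    return watermark


# Usage: source at the origin, microphone 3 m away, reflector at (0, 4).
delay_samples, gain = reflection_params((0.0, 0.0), (3.0, 0.0), (0.0, 4.0), 0.8, 16000)
impulse = np.zeros(400)
impulse[0] = 1.0
echo = simulate_reflection(impulse, delay_samples, gain)
theta = 2 * np.pi * 4 * np.arange(64) / 64
marked = embed_bits(np.tile(np.cos(theta), 2), [1, 0], frame_len=64)
```

In the usage example the reflected path is 4 m + 5 m = 9 m against a 3 m direct path, so the extra 6 m gives 6 / 343 × 16000 ≈ 279.9, rounded to 280 samples, and a gain of 0.8 × 3 / 9 ≈ 0.267.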
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for processing a sound watermark, applicable to a conference terminal comprising a sound receiver, the method comprising:
acquiring a call receiving sound signal through the sound receiver;
generating a reflected sound signal according to a virtual reflection condition and the call receiving sound signal, wherein the virtual reflection condition comprises the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver; and
shifting a phase of the reflected sound signal according to a watermark identifier to generate a watermark sound signal, wherein the watermark sound signal comprises at least the phase-shifted reflected sound signal.
2. The method of claim 1, wherein the step of generating the reflected sound signal according to the virtual reflection condition and the call receiving sound signal comprises:
determining a time delay and an amplitude attenuation of the reflected sound signal relative to the call receiving sound signal according to the positional relationship and a reflection coefficient of the external object.
3. The method of processing a sound watermark according to claim 1, wherein the watermark identifier is encoded in a multi-ary scheme that provides a plurality of possible values for each of at least one bit of the watermark identifier, and the step of shifting the phase of the reflected sound signal according to the watermark identifier comprises:
shifting the phase of the reflected sound signal according to the value of the at least one bit in the watermark identifier, wherein different values correspond to different phase shifts.
4. The method of processing a sound watermark according to claim 3, wherein the at least one bit of the watermark identifier comprises a plurality of bits, the watermark sound signal comprises a plurality of the phase-shifted reflected sound signals, and each of the phase-shifted reflected sound signals occupies a length of time in the watermark sound signal.
5. The method of processing a sound watermark according to claim 1, further comprising, before the step of shifting the phase of the reflected sound signal according to the watermark identifier:
low-pass filtering the reflected sound signal; and
performing a high-pass filtering process on the reflected sound signal, wherein only the phase of the high-pass-filtered reflected sound signal is shifted, and the step of generating the watermark sound signal further comprises:
synthesizing the phase-shifted reflected sound signal with the low-pass-filtered reflected sound signal.
6. The method of claim 1, wherein the method further comprises:
receiving a transmission sound signal via a network, wherein the transmission sound signal includes the transmitted watermark sound signal;
shifting a phase of the transmission sound signal; and
identifying the watermark identifier based on a correlation between the transmitted sound signal and the phase-shifted transmitted sound signal.
7. A sound watermark generation apparatus comprising:
a memory to store program code; and
a processor coupled to the memory, wherein the processor is configured to load and execute the program code to:
obtain a call receiving sound signal recorded by a sound receiver;
generate a reflected sound signal according to a virtual reflection condition and the call receiving sound signal, wherein the virtual reflection condition comprises the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound emitted by the sound source, reflected by the external object, and recorded by the sound receiver; and
shift a phase of the reflected sound signal according to a watermark identifier to generate a watermark sound signal, wherein the watermark sound signal comprises at least the phase-shifted reflected sound signal.
8. The sound watermark generation apparatus of claim 7, wherein the processor is further configured to:
determine a time delay and an amplitude attenuation of the reflected sound signal relative to the call receiving sound signal according to the positional relationship and a reflection coefficient of the external object.
9. The sound watermark generation apparatus of claim 7, wherein the watermark identifier is encoded in a multi-ary scheme that provides a plurality of possible values for each of at least one bit of the watermark identifier, and the processor is further configured to:
shift the phase of the reflected sound signal according to the value of the at least one bit in the watermark identifier, wherein different values correspond to different phase shifts.
10. The sound watermark generation apparatus according to claim 9, wherein the at least one bit of the watermark identifier comprises a plurality of bits, the watermark sound signal comprises a plurality of the phase-shifted reflected sound signals, and each of the phase-shifted reflected sound signals occupies a length of time in the watermark sound signal.
11. The sound watermark generation apparatus of claim 7, wherein the processor is further configured to:
low-pass filter the reflected sound signal;
perform a high-pass filtering process on the reflected sound signal, wherein only the phase of the high-pass-filtered reflected sound signal is shifted; and
synthesize the phase-shifted reflected sound signal with the low-pass-filtered reflected sound signal.
12. The sound watermark generation apparatus according to claim 7, wherein the watermark identifier is identified based on a correlation between the transmitted watermark sound signal and the phase-shifted watermark sound signal.
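The band-split variant of claims 5 and 11 (low-pass and high-pass filter the reflected sound signal, shift the phase of the high band only, then recombine) can be sketched as below. This assumes complementary ideal FFT-domain filters and a hypothetical `cutoff_hz`; the actual filters and cutoff are not specified in this section, and the function names are hypothetical.

```python
import numpy as np


def phase_shift(x, degrees):
    """Rotate every positive-frequency component of a real signal by `degrees`."""
    n = len(x)
    phi = np.deg2rad(degrees)
    spectrum = np.fft.rfft(x)
    rotated = spectrum * np.exp(1j * phi)
    rotated[0] = spectrum[0].real * np.cos(phi)  # DC cannot carry a phase
    if n % 2 == 0:
        rotated[-1] = spectrum[-1].real * np.cos(phi)  # neither can Nyquist
    return np.fft.irfft(rotated, n=n)


def split_bands(x, fs, cutoff_hz):
    """Complementary ideal low/high split, so that low + high == x."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    low_mask = np.fft.rfftfreq(len(x), d=1.0 / fs) < cutoff_hz
    low = np.fft.irfft(spectrum * low_mask, n=len(x))
    high = np.fft.irfft(spectrum * ~low_mask, n=len(x))
    return low, high


def band_limited_watermark(reflected, fs, cutoff_hz, degrees):
    """Shift only the high band, then add back the untouched low band."""
    low, high = split_bands(reflected, fs, cutoff_hz)
    return low + phase_shift(high, degrees)


# Usage: 100 Hz + 5000 Hz reflected signal, hypothetical 1 kHz cutoff, +90 degrees.
fs = 16000
t = np.arange(fs) / fs
reflected = np.cos(2 * np.pi * 100 * t) + np.cos(2 * np.pi * 5000 * t)
low, high = split_bands(reflected, fs, 1000.0)
marked = band_limited_watermark(reflected, fs, 1000.0, 90.0)
```

Because the split is complementary, setting `degrees` to 0 returns the input unchanged, which is a quick sanity check on the filters.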
CN202110914948.8A 2021-08-10 2021-08-10 Method for processing audio watermark and audio watermark generating device Pending CN115705847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914948.8A CN115705847A (en) 2021-08-10 2021-08-10 Method for processing audio watermark and audio watermark generating device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914948.8A CN115705847A (en) 2021-08-10 2021-08-10 Method for processing audio watermark and audio watermark generating device

Publications (1)

Publication Number Publication Date
CN115705847A true CN115705847A (en) 2023-02-17

Family

ID=85179528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914948.8A Pending CN115705847A (en) 2021-08-10 2021-08-10 Method for processing audio watermark and audio watermark generating device

Country Status (1)

Country Link
CN (1) CN115705847A (en)

Similar Documents

Publication Publication Date Title
US8842851B2 (en) Audio source localization system and method
JP5003531B2 (en) Audio conference system
JP2018528479A (en) Adaptive noise suppression for super wideband music
CN101370323A (en) Apparatus capable of performing acoustic echo cancellation and a method thereof
CN104243732A (en) Use of vibration sensor in acoustic echo cancellation
USRE49462E1 (en) Adaptive noise cancellation for multiple audio endpoints in a shared space
US9219958B2 (en) Systems and methods for acoustic echo cancellation with wireless microphones and speakers
CN108335701B (en) Method and equipment for sound noise reduction
US10354673B2 (en) Noise reduction method and electronic device
US9491306B2 (en) Signal processing control in an audio device
US8582754B2 (en) Method and system for echo cancellation in presence of streamed audio
JPH09233198A (en) Method and device for software basis bridge for full duplex voice conference telephone system
WO2012020394A2 (en) Background sound removal for privacy and personalization use
TWI790718B (en) Conference terminal and echo cancellation method for conference
CN115705847A (en) Method for processing audio watermark and audio watermark generating device
TWI790694B (en) Processing method of sound watermark and sound watermark generating apparatus
TWI806299B (en) Processing method of sound watermark and sound watermark generating apparatus
CN116486823A (en) Sound watermark processing method and sound watermark generating device
TWI806210B (en) Processing method of sound watermark and sound watermark processing apparatus
TWI837542B (en) Identifying method of sound watermark and sound watermark identifying apparatus
US11915710B2 (en) Conference terminal and embedding method of audio watermarks
US11955132B2 (en) Identifying method of sound watermark and sound watermark identifying apparatus
CN116137152A (en) Method and device for recognizing voice watermark
CN115798495A (en) Conference terminal and echo cancellation method for conference
CN116129919A (en) Sound watermark processing method and sound watermark generating device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination