CN115705847A

CN115705847A - Method for processing audio watermark and audio watermark generating device

Info

Publication number: CN115705847A
Application number: CN202110914948.8A
Authority: CN
Inventors: 杜博仁; 张嘉仁; 曾凯盟
Original assignee: Acer Inc
Current assignee: Acer Inc
Priority date: 2021-08-10
Filing date: 2021-08-10
Publication date: 2023-02-17

Abstract

Embodiments of the invention provide a method for processing a sound watermark and a sound watermark generating device. A call receiving sound signal is acquired through a sound receiver. A reflected sound signal is generated according to a virtual reflection condition and the call receiving sound signal. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound that is emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The phase of the reflected sound signal is shifted according to a watermark identifier to generate a watermark sound signal, which includes the phase-shifted reflected sound signal. As a result, the echo cancellation mechanism at the receiving end can cancel the watermark sound signal that arrives over the feedback path, without affecting the voice signal on the call transmission path.

Description

Method for processing audio watermark and audio watermark generating device

Technical Field

The present invention relates to a sound signal processing technology, and in particular, to a sound watermark processing method and a sound watermark generating apparatus.

Background

Teleconferencing enables people in different locations or spaces to converse, and conference-related devices, protocols, and applications have grown quite mature. Notably, some real-time conferencing programs combine the voice signal with a sound watermark signal, which is used to identify the talker.

For example, fig. 1 is a schematic diagram illustrating an example of a mobile device M for a conference call. Referring to fig. 1, the mobile device M receives the sound signal S1 through the network. The sound signal S1 includes a call receiving signal, recorded from the talker at the other end, and a sound watermark signal. The sound watermark signal may be used to identify the device that transmitted the sound signal S1. The call receiving signal is played through the speaker S so that the user sp of the mobile device M can hear the other party. Meanwhile, a sound receiver R (e.g., a microphone) records the user sp to acquire a sound signal S2.

In general, the main function of the echo cancellation C on the call transmission path is to cancel the components belonging to the call receiving signal from the sound signal S2 picked up by the sound receiver R, yielding an echo-free sound signal S3. However, the sound watermark signal may be generated along a path different from that of the ordinary call receiving signal. When the sound receiver R picks up sound from the speaker S via the feedback path fp, the components of the sound signal S1 that belong to the sound watermark signal may not be cancelled and are then transmitted onward via the network, degrading the voice component of the user sp in the sound signal S3 on the call transmission path.
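The source does not say which echo cancellation algorithm is used. As a minimal sketch, assuming a normalized least-mean-squares (NLMS) adaptive filter (a common choice, not necessarily the one intended here), the following shows how components of a known reference signal can be removed from the microphone signal. The function name, tap count, step size, and the toy feedback path are all hypothetical.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps              # most recent reference samples, newest first
    residual = []
    for r, m in zip(reference, mic):
        history = [r] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = m - echo_estimate           # what remains after cancellation (S3 in FIG. 1)
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(600)]
# Hypothetical feedback path: the echo is the reference delayed by 3 samples and halved.
mic = [0.0] * 3 + [0.5 * r for r in reference[:-3]]
residual = nlms_echo_cancel(reference, mic)
```

Because the toy echo is just a delayed, attenuated copy of the reference, the filter converges and the residual shrinks toward zero. A component that is not such a copy of the reference would be left in the residual, which is the problem described above.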

Disclosure of Invention

The invention is directed to a method for processing a sound watermark and a sound watermark generating device that generate a sound watermark which an echo cancellation mechanism can cancel, thereby improving call quality.

According to an embodiment of the invention, the method for processing a sound watermark is suitable for a conference terminal that includes a sound receiver. The method includes (but is not limited to) the following steps. A call receiving sound signal is acquired through the sound receiver. A reflected sound signal is generated according to a virtual reflection condition and the call receiving sound signal. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound that is emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The phase of the reflected sound signal is shifted according to a watermark identifier to generate a watermark sound signal. The watermark sound signal includes the phase-shifted reflected sound signal.

According to an embodiment of the present invention, a sound watermark generating apparatus includes, but is not limited to, a memory and a processor. The memory stores program code. The processor is coupled to the memory. The processor is configured to load and execute the program code to obtain a call receiving sound signal, generate a reflected sound signal according to a virtual reflection condition and the call receiving sound signal, and shift the phase of the reflected sound signal according to a watermark identifier to generate a watermark sound signal. The call receiving sound signal is recorded through a sound receiver. The virtual reflection condition includes the positional relationship among the sound receiver, a sound source, and an external object, and the reflected sound signal simulates the sound that is emitted by the sound source, reflected by the external object, and recorded by the sound receiver. The watermark sound signal includes the phase-shifted reflected sound signal.

Based on the above, according to the sound watermark processing method and the sound watermark generating apparatus of the embodiment of the present invention, the sound signal reflected by the external object is simulated, and the simulated sound signal is encoded by shifting the phase, thereby generating the watermarked sound signal. Therefore, the common call receiving signal and the voice watermark signal can be simultaneously kept at the loudspeaker end. In addition, both signals can be eliminated by the existing echo cancellation algorithm, so that the voice signal on the call transmission path is not influenced.

Drawings

The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

FIG. 1 is a diagram illustrating an example of a mobile device for a conference call;

FIG. 2 is a schematic diagram of a conference call system according to an embodiment of the present invention;

FIG. 3 is a flow chart of a method of processing a sound watermark according to an embodiment of the invention;

FIG. 4 is a flow chart of a method of generating a sound watermark according to an embodiment of the invention;

FIG. 5 is a schematic diagram illustrating virtual reflection conditions according to one embodiment of the present invention;

FIG. 6 is a diagram illustrating a filtering process according to one embodiment of the invention;

FIG. 7 is a schematic diagram illustrating multiphase offset according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating two phase offsets according to an embodiment of the present invention;

FIG. 9A is a simulation diagram illustrating an example of a call receiving voice signal;

FIG. 9B is a simulation diagram illustrating an example of an embedded watermark signal;

FIG. 10 is a flow chart illustrating watermark identification according to one embodiment of the present invention.

Description of the reference numerals

M: mobile device;

S1 to S3: sound signals;

S: speaker;

R: sound receiver;

sp: user;

C: echo cancellation;

fp: feedback path;

1: conference call system;

10, 20: conference terminals;

50: cloud server;

11, 21: sound receivers;

13, 23: speakers;

15, 25, 55: communication transceivers;

17, 27, 57: memories;

19, 29, 59: processors;

70: sound watermark generating device;

S310 to S350, S410 to S450, S910 to S950: steps;

S_Rx: call receiving sound signal;

S_Tx: call transmitting sound signal;

S_WM, S_WM1: watermark sound signals;

S_Rx+S_WM: embedded watermark signal;

S'_Rx, S''_Rx, S_90°, S_WO: reflected sound signals;

W: wall;

γ_w: reflection coefficient;

d_s, d_w: distances;

SS: sound source;

W_O, W_E: watermark identifiers;

a phase shift;

S_A: transmitted sound signal.

Detailed Description

Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.

Fig. 2 is a schematic diagram of a conference call system 1 according to an embodiment of the present invention. Referring to fig. 2, the conference call system 1 includes, but is not limited to, conference terminals 10 and 20 and a cloud server 50.

Each of the conference terminals 10 and 20 may be a wired phone, a mobile phone, a network phone, a tablet computer, a desktop computer, a notebook computer, or a smart speaker.

The conference terminal 10 includes, but is not limited to, a sound receiver 11, a speaker 13, a communication transceiver 15, a memory 17, and a processor 19.

The sound receiver 11 may be a dynamic, condenser, or electret condenser microphone. The sound receiver 11 may also be a combination of other electronic components, analog-to-digital converters, filters, and audio processors that can receive sound waves (e.g., human voice, ambient sound, or machine operation sound) and convert them into sound signals. In one embodiment, the sound receiver 11 records the talker to obtain a call receiving sound signal. In some embodiments, the call receiving sound signal may include the talker's voice, the sound emitted by the speaker 13, and/or other ambient sounds.

The loudspeaker 13 may be a horn or a loudspeaker. In one embodiment, the speaker 13 is used to play sound.

The communication transceiver 15 is, for example, a transceiver supporting a wired network such as an Ethernet (Ethernet), a fiber optic network, or a cable (which may include (but is not limited to) components such as a connection interface, a signal converter, a communication protocol processing chip), or a wireless network such as a Wi-Fi, a fourth generation (4G), a fifth generation (5G), or a later generation mobile network (which may include (but is not limited to) components such as an antenna, a digital-to-analog/analog-to-digital converter, a communication protocol processing chip). In one embodiment, the communication transceiver 15 is used to transmit or receive data.

The memory 17 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or the like. In one embodiment, the memory 17 stores program code, software modules, configuration settings, data (e.g., sound signals, watermark identifiers, or watermark sound signals), or files.

The processor 19 is coupled to the sound receiver 11, the speaker 13, the communication transceiver 15, and the memory 17. The processor 19 may be a central processing unit (CPU), a graphics processing unit (GPU), another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), another similar component, or a combination thereof. In one embodiment, the processor 19 executes all or part of the operations of the conference terminal 10 and can load and execute the software modules, files, and data stored in the memory 17.

The conference terminal 20 includes, but is not limited to, a sound receiver 21, a speaker 23, a communication transceiver 25, a memory 27, and a processor 29. For the implementation and functions of the sound receiver 21, the speaker 23, the communication transceiver 25, the memory 27, and the processor 29, refer to the descriptions of the sound receiver 11, the speaker 13, the communication transceiver 15, the memory 17, and the processor 19, which are not repeated here. The processor 29 executes all or part of the operations of the conference terminal 20 and can load and execute the software modules, files, and data stored in the memory 27.

The cloud server 50 is directly or indirectly connected to the conference terminals 10 and 20 via the network. The cloud server 50 may be a computer system, a server, or a signal processing device. In one embodiment, the conference terminal 10 or 20 may also serve as the cloud server 50. In another embodiment, the cloud server 50 may be a cloud server separate from the conference terminals 10 and 20. In some embodiments, the cloud server 50 includes (but is not limited to) a communication transceiver 55, a memory 57, and a processor 59 that are the same as or similar to those described above; their implementation and functions are not described again.

In one embodiment, the sound watermark generating apparatus 70 may be the conference terminal 10 or 20 or the cloud server 50. The sound watermark generating apparatus 70 generates the sound watermark signal, as described in detail in the following embodiments.

Hereinafter, the method according to the embodiment of the present invention will be described with reference to various devices, components and modules in the conference communication system 1. The various processes of the method may be adjusted according to the implementation, and are not limited thereto.

It should be noted that, for convenience of description, like components may perform the same or similar operations, which are not described in detail again. For example, the processor 19 of the conference terminal 10, the processor 29 of the conference terminal 20, and/or the processor 59 of the cloud server 50 may each implement the same or similar methods according to the embodiments of the present invention.

Fig. 3 is a flowchart of a method for processing a sound watermark according to an embodiment of the present invention. Referring to fig. 3, the processor 29 obtains the call receiving sound signal S_Rx by recording through the sound receiver 21 (step S310). Specifically, assume that the conference terminals 10 and 20 establish a conference call. For example, once a conference is established through video software, voice call software, or a telephone call, the talker can start speaking. After the sound receiver 21 records the sound, the processor 29 obtains the call receiving sound signal S_Rx. This call receiving sound signal S_Rx is related to the voice content of the talker at the conference terminal 20 (and may also include ambient sounds or other noise). The processor 29 of the conference terminal 20 may transmit the call receiving sound signal S_Rx through the communication transceiver 25 (i.e., via the network interface). In some embodiments, the call receiving sound signal S_Rx may undergo echo cancellation, noise filtering, and/or other sound signal processing.

The processor 59 of the cloud server 50 receives the call receiving sound signal S_Rx from the conference terminal 20 through the communication transceiver 55. The processor 59 generates a reflected sound signal S'_Rx according to the virtual reflection condition and the call receiving sound signal S_Rx (step S330). Specifically, a typical echo cancellation algorithm adaptively cancels, from the sound signal that the sound receiver 11, 21 picks up from the outside, the components belonging to a reference signal (for example, the call receiving sound signal S_Rx on the call receiving path). The sound recorded by the sound receiver 11, 21 arrives over the shortest path from the speaker 13, 23 to the sound receiver 11, 21 and over various reflection paths in the environment (i.e., paths formed by sound reflecting off external objects). A reflected sound signal is affected by the reflection coefficient of the reflecting object, and the location of the reflection affects its time delay and attenuation. Furthermore, reflected sound signals may arrive from different directions, which leads to phase shifts. In embodiments of the invention, the known sound signal S_Rx of the call receiving path is used to generate a virtual (simulated) reflected sound signal that the echo cancellation mechanism can cancel, and the sound watermark signal S_WM is generated from it.

FIG. 4 is a flowchart of a method of generating the sound watermark signal S_WM according to an embodiment of the invention. Referring to FIG. 4, the processor 59 may set a virtual reflection condition to generate the reflected sound signal S'_Rx (step S410). Specifically, the virtual reflection condition includes the positional relationship among the sound receiver 11, 21, a sound source (e.g., the talker or the speaker 13, 23), and an external object (e.g., a wall, a ceiling, furniture, or a person), for example the distance between the sound receiver 11 and the external object, the distance between the sound receiver 11 and the sound source, and/or the distance between the sound source and the external object. The reflected sound signal S'_Rx simulates the sound that is emitted by the sound source, reflected by the external object, and recorded by the sound receiver 11, 21.

In one embodiment, the processor 59 may determine the time delay and amplitude attenuation of the reflected sound signal S'_Rx relative to the call receiving sound signal S_Rx according to the positional relationship and the reflection coefficient of the external object. For example, FIG. 5 is a schematic diagram illustrating a virtual reflection condition according to an embodiment of the invention. Referring to fig. 5, assume that the virtual reflection condition is a single wall W (i.e., the external object) whose reflection coefficient is γ_w (e.g., 0.7, 0.3, or 1), that the distance between the sound receiver 21 and the sound source SS is d_s (e.g., 0.3, 0.5, or 0.8 meters), and that the distance between the sound receiver 21 and the wall W is d_w (e.g., 1, 1.5, or 2 meters). The relationship between the reflected sound signal S'_Rx and the call receiving sound signal S_Rx can then be expressed by equation (1) (not reproduced in this text), where T_s is the sampling time, v_s is the speed of sound, and n is the sample index or time.

If the reflected sound signal S'_Rx is set to have a time delay n_w and an amplitude attenuation α_w relative to the call receiving sound signal S_Rx, the relationship between the two can be expressed as:

$$s'_{Rx}(n) = \alpha_w \cdot s_{Rx}(n - n_w) \tag{2}$$

From equations (1) and (2), equation (3) (not reproduced in this text) is obtained, where n_f is the time delay introduced by the filter (optional, described further in later embodiments) and a further term is the time delay introduced by the phase shift (optional, described further in later embodiments).
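Equations (1) and (3) are not reproduced in this text, so the following is only a hedged sketch of how α_w and n_w might be derived from the virtual reflection condition. It assumes (this is not from the source) an image-source model in which the reflected path is longer than the direct path by 2·d_w, amplitude falls off as 1/distance, and the optional filter and phase-shift delays are zero. The function names and the 343 m/s default are hypothetical. Only `reflect` corresponds directly to equation (2).

```python
def reflection_parameters(gamma_w, d_s, d_w, sample_rate, v_s=343.0):
    """Return (alpha_w, n_w) for one virtual wall.

    ASSUMPTION (not from the source): the reflected path is d_s + 2*d_w long,
    amplitude falls off as 1/distance, and filter/phase-shift delays are zero.
    """
    extra_path = 2.0 * d_w                        # metres travelled beyond the direct path
    alpha_w = gamma_w * d_s / (d_s + extra_path)  # amplitude relative to the direct sound
    n_w = round(extra_path / v_s * sample_rate)   # delay in samples; T_s = 1 / sample_rate
    return alpha_w, n_w


def reflect(s_rx, alpha_w, n_w):
    """Equation (2): s'_Rx(n) = alpha_w * s_Rx(n - n_w), with zeros before the signal starts."""
    return [alpha_w * s_rx[n - n_w] if n >= n_w else 0.0 for n in range(len(s_rx))]


alpha_w, n_w = reflection_parameters(gamma_w=0.7, d_s=0.5, d_w=1.5, sample_rate=16000)
s_reflected = reflect([1.0, 0.0, 0.0, 0.0], alpha_w=0.5, n_w=2)
```

Under these assumptions, a wall 1.5 m away at a 16 kHz sampling rate gives a delay of about 140 samples. A different geometry or attenuation law would change both numbers but not the form of equation (2).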

It should be noted that the virtual reflection condition can be varied according to different design requirements. For example, there may be more than one external object, or the relative positions may differ.

Referring to fig. 3, the processor 59 shifts the phase of the reflected sound signal S'_Rx according to the watermark identifier W_O to generate the watermark sound signal S_WM (step S350). Specifically, in a conventional echo cancellation mechanism, changes in the time delay and amplitude of the reflected sound signal affect the cancellation error more than changes in its phase. A change in delay or amplitude resembles an entirely new interference environment and forces the echo cancellation mechanism to adapt again. Therefore, in embodiments of the invention, the watermark sound signals S_WM corresponding to different values of the watermark identifier W_O differ only in phase, while their time delay and amplitude are the same. That is, the watermark sound signal S_WM comprises one or more phase-shifted reflected sound signals S'_Rx.

Referring to fig. 4, in one embodiment, the processor 59 may select a filter to generate a filtered reflected sound signal S''_Rx (step S430). Specifically, a typical echo cancellation mechanism converges slowly when processing low-frequency sound signals (e.g., below 3 kHz or 4 kHz) but converges quickly (e.g., within 10 milliseconds (ms)) when processing high-frequency sound signals (e.g., above 3 kHz or 4 kHz). Thus, the processor 59 may apply the phase shift only to the high-frequency part (e.g., 4 kHz, 5 kHz, and above) of the reflected sound signal S'_Rx, which also makes the interference less perceptible to humans (i.e., the frequency of the high-frequency sound signal is outside the human hearing range).

For example, fig. 6 is a diagram illustrating a filtering process according to an embodiment of the invention. Referring to FIG. 6, the processor 59 may apply a low-pass filter LPF to the reflected sound signal S'_Rx to output a low-pass-filtered reflected sound signal. For example, the low-pass filter LPF blocks signals above 4 kHz and passes only signals below 4 kHz. The processor 59 may also apply a high-pass filter HPF to the reflected sound signal S'_Rx to output a high-pass-filtered reflected sound signal. For example, the high-pass filter HPF blocks signals below 4 kHz and passes only signals above 4 kHz.
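The source does not specify the filter design. As an illustrative sketch, assuming Hamming-windowed sinc FIR filters (a hypothetical choice, as are the function names and the tap count), the low-pass and high-pass branches can be built as a complementary pair, so that their outputs add back up to a delayed copy of the input. The fixed delay of such a linear-phase filter, (num_taps − 1) / 2 samples, is one possible source of the filter delay n_f mentioned earlier.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps (num_taps must be odd)."""
    fc = cutoff_hz / sample_rate
    mid = num_taps // 2
    taps = []
    for i in range(num_taps):
        k = i - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]          # unity gain at 0 Hz


def highpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Spectral inversion: a delayed unit impulse minus the low-pass taps."""
    taps = [-t for t in lowpass_taps(cutoff_hz, sample_rate, num_taps)]
    taps[num_taps // 2] += 1.0
    return taps


def fir(signal, taps):
    """Causal FIR filtering (output has the same length as the input)."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


fs = 16000
low_tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(400)]
high_tone = [math.sin(2 * math.pi * 6000 * n / fs) for n in range(400)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]
low_part = fir(mixed, lowpass_taps(4000, fs))    # keeps the 1 kHz tone
high_part = fir(mixed, highpass_taps(4000, fs))  # keeps the 6 kHz tone
```

With a complementary pair, only `high_part` needs to be phase-shifted, and adding `low_part` back restores the untouched low band.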

In another embodiment, the processor 59 may skip frequency-specific filtering of the reflected sound signal S'_Rx. That is, the reflected sound signal S''_Rx is equal to the reflected sound signal S'_Rx.

Referring to fig. 4, the processor 59 may shift the phase of the reflected sound signal S''_Rx according to the watermark identifier W_O (step S450). In an embodiment, the watermark identifier W_O is encoded in a multi-base (positional numeral) scheme, which provides a plurality of possible values for each of the one or more bits (digits) of the watermark identifier W_O. In binary, for example, each bit of the watermark identifier W_O may be "0" or "1". In hexadecimal, each bit may be "0", "1", "2", …, "E", or "F". In another embodiment, the watermark identifier is encoded in letters, words, and/or symbols. For example, each bit of the watermark identifier W_O may be any one of the English letters "A" to "Z".
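As a small illustration of the multi-base encoding, the sketch below splits an integer identifier into fixed-width digits in a chosen base. The function name `to_digits` and the fixed-width, most-significant-first layout are assumptions made for the example; the source does not prescribe a layout.

```python
def to_digits(identifier, base, width):
    """Split a non-negative integer into `width` base-`base` digits, most significant first."""
    if base < 2 or identifier < 0 or identifier >= base ** width:
        raise ValueError("identifier does not fit in the requested width")
    digits = []
    for _ in range(width):
        identifier, digit = divmod(identifier, base)
        digits.append(digit)
    return digits[::-1]


binary_digits = to_digits(0b1011, base=2, width=4)   # four binary digits
hex_digits = to_digits(0x2F, base=16, width=2)       # two hexadecimal digits
```

Each resulting digit is what the later steps map to a phase offset.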

In an embodiment, the different values of each bit of the watermark identifier W_O correspond to different phase offsets. For example, fig. 7 is a diagram illustrating multiple phase offsets according to an embodiment of the invention. Referring to fig. 7, assume that the watermark identifier W_O is encoded in base N (N being a positive integer); each bit can then take N values, and the N different values correspond to N different phase offsets.

Fig. 8 is a diagram illustrating two phase offsets according to an embodiment of the invention. Referring to fig. 8, assume that the watermark identifier W_O is binary; each bit can then take 2 values (i.e., 1 and 0), and these 2 values correspond to two phase offsets. For example, one phase offset is 90° and the other is −90° (i.e., its negative).
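The source gives concrete offsets only for the binary case (90° and −90°). As a hedged sketch, the table below assumes that for a general base the offsets are spread evenly over a full turn; that spacing rule and the name `phase_table` are hypothetical. For base 2 the rule yields −90° for the value 0 and 90° for the value 1, the assignment used in the identification step described later.

```python
def phase_table(base):
    """Map each digit value 0..base-1 to a phase offset in degrees.

    ASSUMPTION: offsets are spaced evenly over 360 degrees; the source only
    states that different values correspond to different phase offsets.
    """
    step = 360.0 / base
    return {value: -180.0 + step * (value + 0.5) for value in range(base)}


binary_table = phase_table(2)
quaternary_table = phase_table(4)
```

Any assignment works as long as the embedding side and the identifying side agree on it and the offsets are distinct.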

The processor 59 may shift the phase of the reflected sound signal S''_Rx according to the value of one or more bits of the watermark identifier W_O. Taking fig. 7 as an example, the processor 59 selects one of the phase offsets according to one or more values of the watermark identifier W_O and performs the phase shift using the selected phase offset. For example, if a bit of the watermark identifier W_O has the value 1, the output phase-shifted reflected sound signal is offset from the reflected sound signal S''_Rx by the phase offset corresponding to the value 1; the remaining phase-shifted reflected sound signals follow by analogy. The phase shift can be achieved with the Hilbert transform or another phase-shift algorithm.
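One way to realize a phase shift with the Hilbert transform is to combine the signal x with its Hilbert transform H{x} as x·cos φ − H{x}·sin φ, which advances every frequency component by φ. The sketch below does this with a naive DFT so that it needs only the standard library; the function names are hypothetical, and the circular (whole-block) transform is a simplification chosen for the example.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for short illustrative signals."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n) for m, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def hilbert(signal):
    """Circular Hilbert transform of a real, even-length signal."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if k == 0 or k == n // 2:
            spectrum[k] = 0            # DC and Nyquist have no quadrature partner
        elif k < n // 2:
            spectrum[k] *= -1j         # positive frequencies
        else:
            spectrum[k] *= 1j          # negative frequencies
    return [value.real for value in dft(spectrum, inverse=True)]


def phase_shift(signal, degrees):
    """Advance the phase of every frequency component by `degrees`."""
    phi = math.radians(degrees)
    quadrature = hilbert(signal)
    return [x * math.cos(phi) - q * math.sin(phi) for x, q in zip(signal, quadrature)]


n = 64
tone = [math.cos(2 * math.pi * 4 * m / n) for m in range(n)]
shifted = phase_shift(tone, 90.0)      # cos(theta) becomes cos(theta + 90 degrees)
```

The shift changes neither the amplitude nor the delay of the tone, which is the property the embodiment relies on.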

In an embodiment, the watermark identifier comprises a plurality of bits. The watermark sound signal S_WM then comprises a plurality of phase-shifted reflected sound signals, each occupying a length of time in the watermark sound signal S_WM. Suppose the time length of each bit is L_b (e.g., 0.1, 0.5, or 1 second, and greater than the time delay n_w). Similar to time-division multiplexing, the processor 59 divides the watermark sound signal S_WM, according to the number of bits in the watermark identifier W_O, into sub-time units of the same or different lengths, and each sub-time unit carries the phase-shifted reflected sound signal corresponding to a different bit.
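The time-division layout can be sketched as follows, assuming equal-length sub-time units (the source also allows unequal ones) and assuming that a full-length phase-shifted copy of the reflected signal has been prepared for each possible bit value. The names `bit_slots` and `assemble` and the toy data are hypothetical.

```python
def bit_slots(num_bits, bit_seconds, sample_rate):
    """Return one (start, stop) sample range per bit, laid end to end."""
    samples_per_bit = round(bit_seconds * sample_rate)
    return [(i * samples_per_bit, (i + 1) * samples_per_bit) for i in range(num_bits)]


def assemble(shifted_by_value, bits, slots):
    """For each slot, take the samples from the copy that was shifted for that bit's value."""
    out = []
    for bit, (start, stop) in zip(bits, slots):
        out.extend(shifted_by_value[bit][start:stop])
    return out


slots = bit_slots(num_bits=3, bit_seconds=0.5, sample_rate=8)
# Toy stand-ins for the reflected signal shifted by the offsets of value 0 and value 1.
shifted_by_value = {0: [-1] * 12, 1: [1] * 12}
watermark = assemble(shifted_by_value, bits=[1, 0, 1], slots=slots)
```

Preparing each shifted copy over the whole signal and then slicing keeps the delay and amplitude identical across slots, so only the phase changes at a bit boundary.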

In one embodiment, if the filtering process of FIG. 6 is adopted, the processor 59 may synthesize the one or more phase-shifted reflected sound signals with the low-pass-filtered reflected sound signal. Taking FIG. 8 as an example, the high-pass-filtered reflected sound signal undergoes the 90° phase shift (generating the phase-shifted reflected sound signal S_90°), and the phase-shifted reflected sound signal S_WO is output. The processor 59 then synthesizes the low-pass-filtered reflected sound signal and the phase-shifted reflected sound signal S_WO to generate the watermark sound signal S_WM1.

In some embodiments, the processor 59 may generate a plurality of identical watermark sound signals, each corresponding to a different master time unit; that is, the watermark sound signal is output cyclically. To distinguish adjacent watermark sound signals, the processor 59 may insert a gap between them, for example a mute signal or another known high-frequency sound signal.

In one embodiment, the processor 59 may transmit the call receiving sound signal S_Rx and the watermark sound signal S_WM separately through the communication transceiver 55. In another embodiment, the processor 59 may synthesize the call receiving sound signal S_Rx and the watermark sound signal S_WM to generate an embedded watermark signal S_Rx+S_WM, and then transmit the embedded watermark signal S_Rx+S_WM through the communication transceiver 55.

FIG. 9A is a simulation diagram illustrating an example of the call receiving sound signal S_Rx, and FIG. 9B is a simulation diagram illustrating an example of the embedded watermark signal S_Rx+S_WM. Referring to FIG. 9A and FIG. 9B, the two signals are very close, and a listener would find them difficult or impossible to tell apart.

The processor 19 of the conference terminal 10 receives the watermark sound signal S_WM or the embedded watermark signal S_Rx+S_WM via the network through the communication transceiver 15 to obtain a transmitted sound signal S_A (i.e., the watermark sound signal S_WM or the embedded watermark signal S_Rx+S_WM after transmission). Because the watermark sound signal S_WM consists of the call receiving sound signal with a time delay and an amplitude attenuation (i.e., a reflected sound signal), the echo cancellation mechanism of the processor 19 can effectively cancel the watermark sound signal S_WM. Thus, the call transmitting sound signal S_Tx on the call transmission path (e.g., the call receiving sound signal recorded at the conference terminal 10 that is to be transmitted via the network) is not affected.

FIG. 10 is a flowchart illustrating identification of the watermark sound signal S_WM according to an embodiment of the present invention. Referring to fig. 10, in one embodiment, if the filtering process of fig. 6 is adopted, the processor 19 may apply the same or a similar high-pass filter HPF to the transmitted sound signal S_A (step S910) to output a high-pass-filtered transmitted sound signal. In another embodiment, if the filtering process of fig. 6 is not adopted, step S910 can be omitted (i.e., the high-pass-filtered transmitted sound signal is the same as the transmitted sound signal S_A).

The processor 19 may shift the phase of the (high-pass-filtered) transmitted sound signal according to the correspondence between values and phase offsets used in step S450 (step S930). Taking fig. 8 as an example, the processor 19 generates a transmitted sound signal phase-shifted by 90°.

The processor 19 may identify the watermark identifier W_E according to the correlation between the transmitted sound signal and the phase-shifted transmitted sound signal (step S950). For example, the processor 19 calculates the orthogonal cross-correlation R_xy(n_w) between the transmitted sound signal and the phase-shifted transmitted sound signal at a time delay of n_w, where −1 ≤ R_xy(n_w) ≤ 1. The processor 19 defines a threshold Th_R, and the watermark identifier W_E can then be expressed as:

$$W_E = \begin{cases} 1, & R_{xy}(n_w) > Th_R \\ 0, & R_{xy}(n_w) < Th_R \end{cases}$$

That is, if the correlation is above the threshold Th_R, the processor 19 determines that the value of this bit is the value corresponding to a phase offset of 90° (e.g., 1); if the correlation is below the threshold Th_R, the processor 19 determines that the value of this bit is the value corresponding to a phase offset of −90° (e.g., 0). In another embodiment, the processor 19 may use a deep-learning-based classifier to identify the values carried by the transmitted sound signal in the successive time units.
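The correlation test of steps S930 and S950 can be illustrated with a toy round trip. This is a sketch under stated assumptions, not the patented detector: it uses white noise in place of speech, a circular delay, no high-pass filtering, a threshold of 0, and a naive DFT-based Hilbert transform. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def hilbert(signal):
    """Circular Hilbert transform via a naive DFT (real, even-length input)."""
    n = len(signal)
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    spectrum = [sum(x * twiddle[(k * m) % n] for m, x in enumerate(signal)) for k in range(n)]
    for k in range(n):
        if k == 0 or k == n // 2:
            spectrum[k] = 0
        else:
            spectrum[k] *= -1j if k < n // 2 else 1j
    return [sum(c * twiddle[(-k * m) % n] for k, c in enumerate(spectrum)).real / n
            for m in range(n)]


def delay(signal, n_w):
    """Circular delay by n_w samples (keeps the toy example short)."""
    return signal[-n_w:] + signal[:-n_w]


def correlation_at_lag(x, y, n_w):
    """Normalized correlation between x(n) and y(n - n_w), in [-1, 1]."""
    y_lag = delay(y, n_w)
    num = sum(a * b for a, b in zip(x, y_lag))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y_lag))
    return num / den if den else 0.0


def embed_bit(s_rx, bit, alpha_w, n_w):
    """Toy embedder: add a delayed, attenuated copy shifted by +90 (bit 1) or -90 (bit 0) degrees."""
    sign = -1.0 if bit == 1 else 1.0               # +90 degrees is -H{x}; -90 degrees is +H{x}
    watermark = delay([alpha_w * sign * q for q in hilbert(s_rx)], n_w)
    return [a + b for a, b in zip(s_rx, watermark)]


def detect_bit(received, n_w, threshold=0.0):
    """Return 1 if the correlation exceeds the threshold (+90 degrees), else 0 (-90 degrees)."""
    shifted = [-q for q in hilbert(received)]      # received signal advanced by 90 degrees
    return 1 if correlation_at_lag(received, shifted, n_w) > threshold else 0


rng = random.Random(1)
speech_like = [rng.gauss(0.0, 1.0) for _ in range(256)]
decoded = [detect_bit(embed_bit(speech_like, bit, alpha_w=0.5, n_w=20), n_w=20) for bit in (1, 0)]
```

When the embedded copy was advanced by 90°, it lines up with the 90°-shifted direct sound at lag n_w and the correlation is positive; when it was shifted by −90°, the sign flips. A real detector would repeat this per sub-time unit and would need a threshold tuned to the signal and channel.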

In summary, in the method for processing an audio watermark and the apparatus for generating an audio watermark according to the embodiments of the present invention, a reflected audio signal is simulated according to the principle of an echo cancellation mechanism, and the audio watermark signal is encoded by shifting the phase of the reflected audio signal. Therefore, at the receiving end, the voice watermark signal obtained through the feedback path can be eliminated by the echo cancellation mechanism, and the voice watermark signal will not affect the communication transmission signal on the communication transmission path.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A processing method of sound watermark is suitable for a conference terminal, the conference terminal comprises a radio receiver, and the processing method of the sound watermark comprises the following steps:

acquiring a call receiving sound signal through the radio;

generating a reflected sound signal according to a virtual reflection condition and the call receiving sound signal, wherein the virtual reflection condition comprises the position relation among the radio, a sound source and an external object, and the reflected sound signal is a sound signal obtained by simulating the sound emitted by the sound source, reflecting the sound by the external object and recording the sound by the radio; and

shifting a phase of the reflected sound signal according to a watermark identifier to generate a watermark sound signal, wherein the watermark sound signal comprises at least the reflected sound signal shifted in phase.

2. The method of claim 1, wherein the step of generating the reflected sound signal according to the virtual reflection condition and the call receiving sound signal comprises:

determining a time delay and an amplitude attenuation of the reflected sound signal relative to the call receiving sound signal according to the position relation and a reflection coefficient of the external object.
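As an illustration of claim 2 only, the following minimal sketch derives a delay and an attenuation from a position relation and a reflection coefficient. It assumes a two-dimensional layout, a flat wall at `x = wall_x` as the external object, 1/distance spreading loss, and a speed of sound of 343 m/s; the function names and parameters are hypothetical and are not taken from the specification.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def reflection_delay_and_gain(mic, source, wall_x, reflection_coeff, sample_rate):
    """Delay (in samples) and gain of a wall reflection relative to the direct path.

    Assumes a flat wall at x = wall_x and 1/distance spreading loss.
    """
    # Mirror the source across the wall to obtain the image source.
    image = (2.0 * wall_x - source[0], source[1])
    direct = math.dist(source, mic)
    reflected = math.dist(image, mic)
    # Extra travel time of the reflected path, rounded to whole samples.
    delay_samples = round((reflected - direct) / SPEED_OF_SOUND * sample_rate)
    # Attenuation: wall absorption times the extra spreading loss.
    gain = reflection_coeff * direct / reflected
    return delay_samples, gain


def apply_reflection(signal, delay_samples, gain):
    """Return the signal delayed and scaled, truncated to the input length."""
    delayed = [0.0] * delay_samples + list(signal)
    return [gain * x for x in delayed[: len(signal)]]
```

With a radio at the origin, a source 0.5 m away and a wall 2 m away, the reflected path is 3 m longer than the direct path, which corresponds to about 140 samples at 16 kHz.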

3. The method of processing a sound watermark according to claim 1, wherein the watermark identifier is encoded in a multi-level scheme that provides a plurality of values in each of at least one bit of the watermark identifier, and the step of shifting the phase of the reflected sound signal according to the watermark identifier comprises:

shifting the phase of the reflected sound signal according to the value of the bit in the watermark identifier, wherein different values correspond to different phase shifts.
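The mapping in claim 3 can be sketched as follows. This is a hedged example, not the claimed implementation: it assumes the phase shifts are spaced evenly over a full cycle and that the shift is applied to every non-DC frequency component through an FFT. `phase_for_value` and `shift_phase` are hypothetical names.

```python
import numpy as np


def phase_for_value(value, levels):
    """Map a digit in 0..levels-1 to a phase shift (assumed evenly spaced)."""
    return 2.0 * np.pi * value / levels


def shift_phase(x, phi):
    """Advance the phase of every non-DC frequency component of x by phi radians."""
    spectrum = np.fft.rfft(x)
    spectrum[1:] *= np.exp(1j * phi)  # leave the DC bin untouched
    return np.fft.irfft(spectrum, n=len(x))
```

Under these assumptions a four-level identifier would use shifts of 0, 90, 180 and 270 degrees, and a cosine shifted by 90 degrees becomes a negated sine.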

4. The method of processing a sound watermark according to claim 3, wherein the at least one bit of the watermark identifier comprises a plurality of bits, the watermark sound signal comprises a plurality of the phase-shifted reflected sound signals, and each of the phase-shifted reflected sound signals occupies a length of time in the watermark sound signal.
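One way to read claim 4 is that each bit of the identifier governs its own time segment of the reflected sound signal. The sketch below assumes equal-length, non-overlapping segments and evenly spaced phase shifts; `encode_digits`, `segment_len` and `levels` are hypothetical names chosen for illustration.

```python
import numpy as np


def shift_phase(x, phi):
    """Advance the phase of every non-DC frequency component of x by phi radians."""
    spectrum = np.fft.rfft(x)
    spectrum[1:] *= np.exp(1j * phi)
    return np.fft.irfft(spectrum, n=len(x))


def encode_digits(reflected, digits, levels, segment_len):
    """Phase-shift consecutive segments of `reflected`, one segment per digit."""
    reflected = np.asarray(reflected, dtype=float)
    if len(reflected) < len(digits) * segment_len:
        raise ValueError("reflected signal is too short for the identifier")
    out = np.zeros(len(digits) * segment_len)
    for i, digit in enumerate(digits):
        start, stop = i * segment_len, (i + 1) * segment_len
        phi = 2.0 * np.pi * digit / levels  # assumed even spacing of shifts
        out[start:stop] = shift_phase(reflected[start:stop], phi)
    return out
```

Shifting each segment independently introduces discontinuities at segment boundaries; a practical design would probably smooth them, which this sketch does not attempt.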

5. The method of processing a sound watermark according to claim 1, further comprising, before the step of shifting the phase of the reflected sound signal according to the watermark identifier:

performing a low-pass filtering process on the reflected sound signal; and

performing a high-pass filtering process on the reflected sound signal, wherein only the phase of the reflected sound signal subjected to the high-pass filtering process is shifted, and the step of generating the watermark sound signal further comprises:

synthesizing the phase-shifted reflected sound signal and the reflected sound signal subjected to the low-pass filtering process.
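The band split in claim 5 can be illustrated with the sketch below. The claim does not specify the filters, so this example assumes ideal FFT-domain low-pass and high-pass filters that share one cutoff frequency; `watermark_high_band` and its parameters are hypothetical.

```python
import numpy as np


def watermark_high_band(reflected, phi, cutoff_hz, sample_rate):
    """Shift only the band at or above cutoff_hz by phi, then recombine with the low band."""
    spectrum = np.fft.rfft(reflected)
    freqs = np.fft.rfftfreq(len(reflected), d=1.0 / sample_rate)
    is_high = freqs >= cutoff_hz
    low_band = np.where(is_high, 0.0, spectrum)   # low-pass filtered part, phase unchanged
    high_band = np.where(is_high, spectrum, 0.0)  # high-pass filtered part
    high_band = high_band * np.exp(1j * phi)      # only this part carries the watermark
    # Synthesis: add the two bands back together and return to the time domain.
    return np.fft.irfft(low_band + high_band, n=len(reflected))
```

For example, with a 1 kHz cutoff and a shift of 180 degrees, a 500 Hz component passes through unchanged while a 2 kHz component is inverted.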

6. The method of claim 1, wherein the method further comprises:

receiving a transmission sound signal via a network, wherein the transmission sound signal includes the transmitted watermark sound signal;

shifting a phase of the transmission sound signal; and

identifying the watermark identifier based on a correlation between the transmission sound signal and the phase-shifted transmission sound signal.
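A possible detector for claim 6 is sketched below. It is an assumption-laden illustration, not the claimed procedure: it supposes that the receiver knows the delay of the simulated reflection and the number of levels, tries each candidate phase shift on the transmission sound signal, and picks the candidate whose shifted copy correlates most strongly with the signal at that delay. `detect_value` is a hypothetical name.

```python
import numpy as np


def shift_phase(x, phi):
    """Advance the phase of every non-DC frequency component of x by phi radians."""
    spectrum = np.fft.rfft(x)
    spectrum[1:] *= np.exp(1j * phi)
    return np.fft.irfft(spectrum, n=len(x))


def detect_value(received, delay, levels):
    """Return the digit whose phase shift best explains the delayed component."""
    received = np.asarray(received, dtype=float)
    scores = []
    for value in range(levels):
        candidate = shift_phase(received, 2.0 * np.pi * value / levels)
        # Correlate received[n] with candidate[n - delay].
        scores.append(np.dot(received[delay:], candidate[: len(received) - delay]))
    return int(np.argmax(scores))
```

The idea is that the watermark component equals a delayed, attenuated and phase-shifted copy of the speech, so the correlation at the known delay is largest when the candidate shift matches the embedded one. Speech with strong self-similarity at that delay could bias the scores, which this sketch does not address.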

7. A sound watermark generation apparatus comprising:

a memory to store program code; and

a processor coupled to the memory, wherein the processor is configured to load and execute the program code to:

obtain a call receiving sound signal, wherein the call receiving sound signal is recorded by a radio;

8. The sound watermark generation apparatus of claim 7, wherein the processor is further configured to:

9. The sound watermark generation apparatus of claim 7, wherein the watermark identifier is encoded in a multi-level scheme that provides a plurality of values in each of at least one bit of the watermark identifier, and the processor is further configured to:

10. The sound watermark generation apparatus according to claim 9, wherein the at least one bit of the watermark identifier comprises a plurality of bits, the watermark sound signal comprises a plurality of the phase-shifted reflected sound signals, and each of the phase-shifted reflected sound signals occupies a length of time in the watermark sound signal.

11. The sound watermark generation apparatus of claim 7, wherein the processor is further configured to:

perform a low-pass filtering process on the reflected sound signal;

perform a high-pass filtering process on the reflected sound signal, wherein only a phase of the reflected sound signal subjected to the high-pass filtering process is shifted; and

12. The sound watermark generation apparatus according to claim 7, wherein the watermark identifier is identified based on a correlation between the transmitted watermark sound signal and the phase-shifted watermark sound signal.