TWI740460B - Voice watermark processing method, voice system and voice watermark processing device - Google Patents
- Publication number
- TWI740460B (Application TW109113032A)
- Authority
- TW
- Taiwan
- Prior art keywords
- voice
- signal
- watermark
- superimposed
- voice signal
Abstract
Description
The present invention relates to a signal processing method, an electronic system, and a processing device, and more particularly to a voice watermark processing method, a voice system, and a voice watermark processing device.
Speakers, and others who wish to protect their spoken works, often prohibit covert recording in order to prevent unauthorized sharing. However, recording devices keep shrinking in size and have even evolved into various disguised forms, so in practice covert recording is difficult to eliminate entirely.
Furthermore, remarks made by public figures in public may be maliciously spliced by bad actors to distort their original meaning. In mild cases this damages the speaker's reputation; in severe cases it can seriously disrupt social order.
Researchers are therefore working hard to develop voice watermarking technology, in the hope that a voice watermark can be used both to trace the source of a distributed recording and to check whether the speech has been maliciously spliced.
In addition, the voice watermark must be inserted immediately, while the person is speaking, for the protection to take practical effect. Achieving such real-time processing is one of the current technical bottlenecks.
The present invention relates to a voice watermark processing method, a voice system, and a voice watermark processing device that can generate an adaptive superimposed watermark signal in real time while a speaker is talking, so that the superimposed watermark signal is superimposed on the voice signal. As a result, anyone covertly recording the talk records the superimposed watermark signal together with the voice signal. The superimposed watermark signal carries information such as a standard timestamp or a location. If the recording is later distributed, the superimposed watermark signal can be extracted to identify the source of distribution, and it can also be used to check whether the voice signal has been spliced.
According to a first aspect of the present invention, a voice watermark processing method is provided. The method includes the following steps. A first voice signal is received in a first time interval. A predicted voice signal for a second time interval is predicted from the first voice signal. Voice intensity information of the predicted voice signal over time and frequency is analyzed. A superimposed watermark signal is generated according to the voice intensity information of the predicted voice signal. A second voice signal is received in the second time interval. The superimposed watermark signal is played, so that the superimposed watermark signal is superimposed on the second voice signal.
According to a second aspect of the present invention, a voice system is provided. The voice system includes a sound pickup device, a voice watermark processing device, and a playback device. The sound pickup device receives a first voice signal in a first time interval. The voice watermark processing device includes a prediction unit, an analysis unit, and a watermark generation unit. The prediction unit predicts a predicted voice signal for a second time interval from the first voice signal. The analysis unit analyzes voice intensity information of the predicted voice signal over time and frequency. The watermark generation unit generates a superimposed watermark signal according to the voice intensity information of the predicted voice signal. The sound pickup device further receives a second voice signal in the second time interval. The playback device plays the superimposed watermark signal, so that the superimposed watermark signal is superimposed on the second voice signal.
According to a third aspect of the present invention, a voice watermark processing device is provided. The voice watermark processing device includes a prediction unit, an analysis unit, and a watermark generation unit. The prediction unit predicts a predicted voice signal for a second time interval from a first voice signal received in a first time interval. The analysis unit analyzes voice intensity information of the predicted voice signal over time and frequency. The watermark generation unit generates a superimposed watermark signal according to the voice intensity information of the predicted voice signal. The superimposed watermark signal is superimposed on a second voice signal received in the second time interval.
In order to provide a better understanding of the above and other aspects of the present invention, embodiments are described in detail below with reference to the accompanying drawings:
100, 100': sound pickup device
200, 200': voice watermark processing device
210: prediction unit
220: analysis unit
230: watermark generation unit
240, 250: transmission unit
300, 300': playback device
900: network
1000, 1000': voice system
MS, MS2, MS3, MS4: superimposed watermark signal
MS0: template watermark signal
S110, S120, S130, S140, S141, S142, S143, S144, S150, S160: steps
S1: voice intensity information
S2: template intensity information
S3: superimposition intensity information
TP1, TP2, TP3, TP4: time intervals
VS, VS', VS1, VS2, VS3, VS4: voice signal
VS2', VS3', VS4': predicted voice signal
α: superimposition value
β: predetermined value
FIG. 1 is a schematic diagram of a voice system according to an embodiment.
FIG. 2 is a schematic diagram of a voice system according to another embodiment.
FIG. 3 illustrates the relationship between the voice signal and the superimposed watermark signal.
FIG. 4 is a block diagram of a voice system according to an embodiment.
FIG. 5 is a flowchart of a voice watermark processing method according to an embodiment.
FIG. 6 is a detailed flowchart of step S140 according to an embodiment.
FIG. 7 is a schematic diagram of a template watermark signal according to an embodiment.
FIG. 8 is a schematic diagram of continuous execution of the voice watermark processing method.
Please refer to FIG. 1, which is a schematic diagram of a voice system 1000 according to an embodiment. In one embodiment, the voice system 1000 is, for example, a system composed of a microphone, a desktop computer, and a loudspeaker. The voice system 1000 includes a sound pickup device 100, a voice watermark processing device 200, and a playback device 300. In the embodiment of FIG. 1, for example, a lecturer speaks while holding the sound pickup device 100 (e.g., a microphone). The voice signal VS is transmitted to the voice watermark processing device 200, which generates the superimposed watermark signal MS. The playback device 300 (e.g., a loudspeaker) then plays the superimposed watermark signal MS and the voice signal VS simultaneously, so that the two are superimposed. As a result, a listener who covertly records the talk records the superimposed watermark signal MS together with the voice signal VS. The superimposed watermark signal MS carries information such as a standard timestamp or a location. If the recording is later distributed, the superimposed watermark signal MS can be extracted to identify the source of distribution, and it can also be used to check whether the voice signal VS has been spliced.
Please refer to FIG. 2, which is a schematic diagram of a voice system 1000' according to another embodiment. In one embodiment, the voice system 1000' is, for example, a system composed of notebook computers and a server. The voice system 1000' includes a sound pickup device 100', a voice watermark processing device 200', and a playback device 300'. In the embodiment of FIG. 2, for example, an employee holds a video conference on a notebook computer. During the meeting, the employee speaks into the sound pickup device 100' (e.g., the notebook computer's microphone). The voice signal VS' is transmitted over the network 900 to the voice watermark processing device 200' (e.g., a server), which generates the superimposed watermark signal MS'. The superimposed watermark signal MS' and the voice signal VS' are then delivered over the network to the playback device 300' (e.g., the speaker of another notebook computer), which plays them simultaneously so that the two are superimposed. As a result, anyone covertly recording on the other end records the superimposed watermark signal MS' together with the voice signal VS'. The superimposed watermark signal MS' carries information such as a standard timestamp or a location. If the recording is later distributed, the superimposed watermark signal MS' can be extracted to identify the source of distribution, and it can also be used to check whether the voice signal VS' has been spliced.
Please refer to FIG. 3, which illustrates the relationship between the voice signal VS and the superimposed watermark signal MS. The voice signal VS is continuously captured by the sound pickup device 100. In this embodiment, the superimposed watermark signal MS is adaptively fine-tuned according to the content of the voice signal VS (for example, strengthened or weakened in proportion to the intensity of the voice signal VS) rather than being a fixed signal, which avoids audible noise that conflicts with the voice signal VS.
However, since generating the superimposed watermark signal MS takes processing time, when the voice system 1000 receives the voice signal VS1 in the time interval TP1, it uses the voice signal VS1 to predict the predicted voice signal VS2' corresponding to the time interval TP2, and then intersects it with the template watermark signal MS0 to generate the superimposed watermark signal MS2. In this way, when the voice signal VS2 is received in the time interval TP2, the real voice signal VS2 and the superimposed watermark signal MS2 can be played out simultaneously.
Please refer to FIG. 4, which is a block diagram of a voice system 1000 according to an embodiment. The voice watermark processing device 200 includes a prediction unit 210, an analysis unit 220, a watermark generation unit 230, a transmission unit 240, and a transmission unit 250. The voice watermark processing device 200 is, for example, a desktop computer, a notebook computer, or a remote server. The prediction unit 210, the analysis unit 220, and the watermark generation unit 230 are each, for example, a circuit, a chip, a circuit board, a program module, or a memory device storing program code. The transmission unit 240 and the transmission unit 250 are each, for example, a 3.5 mm audio port, a 6.3 mm audio port, a wired network transmission unit, or a wireless network transmission module. The voice watermark processing device 200 obtains the predicted voice signal VS2' through a prediction technique, and then intersects it with the template watermark signal MS0 to generate the superimposed watermark signal MS2. In this way, when the voice signal VS2 is received in the time interval TP2, the voice signal VS2 and the superimposed watermark signal MS2 can be played simultaneously. The operation of each of the above elements is described in detail below with reference to a flowchart.
Please refer to FIGS. 3 to 5. FIG. 5 is a flowchart of a voice watermark processing method according to an embodiment. In step S110, the sound pickup device 100 receives a first voice signal in a first time interval (as shown in FIG. 3, for example, the voice signal VS1 is received in the time interval TP1).
Next, in step S120, the prediction unit 210 predicts a predicted voice signal for a second time interval from the first voice signal (as shown in FIG. 3, for example, the predicted voice signal VS2' corresponding to the time interval TP2 is predicted from the voice signal VS1).
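The prediction step above can be sketched as follows. The patent uses an LSTM for this step; the stand-in predictor below is purely hypothetical and simply assumes the upcoming interval resembles the one just captured, which is enough to show where prediction sits in the pipeline.

```python
import numpy as np

def predict_next_interval(prev_interval: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the patent's LSTM predictor (step S120):
    assume speech characteristics change slowly, so the predicted signal
    for interval TP(k+1) is approximated by the signal from TP(k)."""
    return prev_interval.copy()

# Toy "VS1": a 440 Hz tone sampled at an assumed 16 kHz rate
vs1 = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)
vs2_pred = predict_next_interval(vs1)  # stands in for VS2'
```

A real implementation would replace `predict_next_interval` with a trained sequence model; the point is only that a prediction for TP2 must exist before TP2 begins.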
Then, in step S130, the analysis unit 220 analyzes voice intensity information of the predicted voice signal over time and frequency (as shown in FIG. 3, for example, the voice intensity information S1 of the predicted voice signal VS2' over time and frequency is analyzed). In a time-frequency plot (time on the horizontal axis, frequency on the vertical axis), each frequency at each time corresponds to an intensity: darker gray levels represent high intensity, and lighter gray levels represent low intensity. The prediction unit 210 obtains the predicted voice signal VS2' through a Long Short-Term Memory (LSTM) network algorithm.
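The time-frequency analysis of step S130 can be sketched with a plain short-time Fourier transform. The frame length, hop size, and Hanning window below are illustrative choices, not values taken from the patent.

```python
import numpy as np

def intensity_spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram S1(f, t): frequency bins on the vertical
    axis, frame index (time) on the horizontal axis, matching the
    patent's time-frequency plots."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # Real FFT along the frame axis -> shape (frame_len // 2 + 1, n_frames)
    return np.abs(np.fft.rfft(frames, axis=0))

x = np.sin(2 * np.pi * 1000 * np.arange(4096) / 16000)  # toy 1 kHz tone
S1 = intensity_spectrogram(x)
```

For the toy tone, the energy concentrates in the frequency bin corresponding to 1 kHz across all frames, which is the kind of intensity map the watermark generation step consumes.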
Next, in step S140, the watermark generation unit 230 generates the superimposed watermark signal according to the voice intensity information of the predicted voice signal (as shown in FIG. 3, for example, the superimposed watermark signal MS2 is generated according to the voice intensity information S1 of the predicted voice signal VS2'). Step S140 includes several sub-steps. Please refer to FIG. 6, which is a detailed flowchart of step S140 according to an embodiment. Step S140 includes steps S141 to S144. In step S141, the watermark generation unit 230 provides the template watermark signal MS0. For example, please refer to FIG. 7, which is a schematic diagram of the template watermark signal MS0 according to an embodiment. The template intensity information S2 of the template watermark signal MS0 presents a text character or a pattern in the time-frequency plot. The text or pattern may be related to a standard timestamp or a location. The template intensity information S2 of the template watermark signal MS0 is a binary pattern (containing only all-black or all-white values).
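A minimal sketch of such a binary template: a "t"-shaped glyph drawn on a frequency-by-time grid (1 = marked cell, 0 = blank), analogous to FIG. 7's MS0. The glyph geometry here is invented for illustration; a real template would encode, for example, a timestamp or a location.

```python
import numpy as np

def template_pattern(n_freq=64, n_time=64):
    """Toy binary template intensity S2(f, t): a block-letter 't'
    in the time-frequency plane (step S141)."""
    S2 = np.zeros((n_freq, n_time))
    bar = n_freq // 4
    S2[bar, n_time // 4 : 3 * n_time // 4] = 1.0  # horizontal bar of the 't'
    S2[bar:, n_time // 2] = 1.0                   # vertical stem of the 't'
    return S2

S2 = template_pattern()
```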
In step S142, the watermark generation unit 230 determines whether the voice intensity information S1 is higher than a predetermined value β and the template intensity information S2 is greater than 0. If the voice intensity information S1 is higher than the predetermined value β and the template intensity information S2 is greater than 0, the method proceeds to step S143; if the voice intensity information S1 is not higher than the predetermined value β or the template intensity information S2 is not greater than 0, the method proceeds to step S144.
In step S143, the watermark generation unit 230 sets the superimposition intensity information S3 to a superimposition value α. In one embodiment, the superimposition value α is, for example, a fixed value. In another embodiment, the superimposition value α may instead be proportional to the voice intensity information S1 (e.g., 0.1 times the voice intensity information S1).
In step S144, the watermark generation unit 230 sets the superimposition intensity information S3 to 0.
The above steps can be expressed, for example, as the following formula (1):

    S3(f, t) = α, if S1(f, t) > β and S2(f, t) > 0
    S3(f, t) = 0, otherwise                              (1)

where f is frequency, t is time, S3(f, t) is the superimposition intensity information S3 at each frequency and time, S1(f, t) is the voice intensity information S1 at each frequency and time, and S2(f, t) is the template intensity information S2 at each frequency and time.
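The rule of steps S142 through S144 maps directly onto an element-wise mask. The sketch below assumes S1 and S2 are equally sized time-frequency arrays, with illustrative values for α and β.

```python
import numpy as np

def superimposition_intensity(S1, S2, alpha=0.1, beta=0.05):
    """Formula (1): S3(f, t) = alpha where the predicted speech is loud
    enough (S1 > beta) AND the template marks that cell (S2 > 0);
    0 everywhere else."""
    return np.where((S1 > beta) & (S2 > 0), alpha, 0.0)

S1 = np.array([[0.2, 0.01], [0.5, 0.9]])  # toy predicted intensities
S2 = np.array([[1.0, 1.0], [0.0, 1.0]])   # toy binary template
S3 = superimposition_intensity(S1, S2)
# S3 == [[0.1, 0.0], [0.0, 0.1]]
```

For the proportional variant of step S143, `alpha` can be replaced by a scaled copy of S1, e.g. `np.where(mask, 0.1 * S1, 0.0)`.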
Once the superimposition intensity information S3 has been set for every frequency and time, the superimposed watermark signal MS2 is obtained. As shown in FIG. 3, the "t" pattern originally displayed in the template watermark signal MS0 remains roughly visible in the superimposed watermark signal MS2 after the template watermark signal MS0 is intersected with the predicted voice signal VS2'. Because this pattern is generated to match the predicted voice signal VS2', it does not produce noise beyond the voice signal VS2.
Next, in step S150, the sound pickup device 100 receives the second voice signal in the second time interval (as shown in FIG. 3, for example, the voice signal VS2 is received in the time interval TP2).
Then, in step S160, the playback device 300 plays the superimposed watermark signal, so that the superimposed watermark signal is superimposed on the second voice signal. As shown in FIG. 3, the playback device 300 simultaneously plays the superimposed watermark signal MS2 and the voice signal VS2 corresponding to the time interval TP2. A covert recorder therefore records the superimposed watermark signal MS2 and the voice signal VS2 together.
The above steps are repeated so that the superimposed watermark signal MS and the voice signal VS are played continuously. Please refer to FIG. 8, which is a schematic diagram of continuous execution of the voice watermark processing method. In the first execution of the method (with the voice signal VS1 and the voice signal VS2 as the first and second voice signals, respectively), the voice signal VS1 of the time interval TP1 is used to predict the predicted voice signal VS2' of the time interval TP2, which is then intersected with the template watermark signal MS0 to generate the superimposed watermark signal MS2.
When the method is executed again (with the voice signal VS2 and the voice signal VS3 as the first and second voice signals, respectively), the voice signal VS2 of the time interval TP2 is used to predict the predicted voice signal VS3' of the time interval TP3, which is then intersected with the template watermark signal MS0 to generate the superimposed watermark signal MS3.
Similarly, in the next execution (with the voice signal VS3 and the voice signal VS4 as the first and second voice signals, respectively), the voice signal VS3 of the time interval TP3 is used to predict the predicted voice signal VS4' of the time interval TP4, which is then intersected with the template watermark signal MS0 to generate the superimposed watermark signal MS4, and so on.
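The interval-by-interval repetition of FIG. 8 can be sketched as a sliding-window loop. The predictor, analysis, and template below are simplified toy stand-ins for the units described above, kept inline so the loop structure is visible.

```python
import numpy as np

def run_watermark_pipeline(intervals, alpha=0.1, beta=0.05):
    """Sketch of the repeated method: the signal captured in interval
    TP(k) prepares the watermark that is played during TP(k+1)."""
    predict = lambda seg: seg                         # stand-in for the LSTM (S120)
    analyze = lambda seg: np.abs(np.fft.rfft(seg))    # crude intensity S1 (S130)
    played = []
    pending = None                                    # watermark prepared for the next interval
    for seg in intervals:
        S1 = analyze(predict(seg))
        S2 = (np.arange(S1.size) % 2).astype(float)   # toy binary template (S141)
        watermark = np.where((S1 > beta) & (S2 > 0), alpha, 0.0)  # formula (1)
        played.append((seg, pending))  # play current speech with the previously prepared mark
        pending = watermark
    return played

segments = [np.random.randn(64) for _ in range(3)]  # VS1, VS2, VS3
out = run_watermark_pipeline(segments)
```

Note the one-interval lag inherent in the design: the first interval plays without a watermark because nothing could be predicted before it, and every later interval plays with the watermark generated from its predecessor.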
According to the above embodiments, while a speaker is talking, the adaptive superimposed watermark signal MS can be generated in real time and superimposed on the voice signal VS. As a result, anyone covertly recording the talk records the superimposed watermark signal MS together with the voice signal VS. The superimposed watermark signal MS carries information such as a standard timestamp or a location. If the recording is later distributed, the superimposed watermark signal MS can be extracted to identify the source of distribution, and it can also be used to check whether the voice signal VS has been spliced.
In summary, although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Those of ordinary skill in the art to which the present invention pertains may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109113032A TWI740460B (en) | 2020-04-17 | 2020-04-17 | Voice watermark processing method, voice system and voice watermark processing device |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI740460B true TWI740460B (en) | 2021-09-21 |
TW202141465A TW202141465A (en) | 2021-11-01 |
Family
ID=78777739
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI740460B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101918999A (en) * | 2007-11-12 | 2010-12-15 | 尼尔森(美国)有限公司 | Methods and apparatus to perform audio watermarking and watermark detection and extraction |
US20120289147A1 (en) * | 2011-04-06 | 2012-11-15 | Raleigh Gregory G | Distributing content and service launch objects to mobile devices |
CN102859585A (en) * | 2010-02-26 | 2013-01-02 | 弗兰霍菲尔运输应用研究公司 | Watermark signal provider and method for providing a watermark signal |
CN102893327A (en) * | 2010-03-19 | 2013-01-23 | 数字标记公司 | Intuitive computing methods and systems |
TW201732488A (en) * | 2016-03-02 | 2017-09-16 | 曦恩體感科技股份有限公司 | Method and mobile device with sensor time calibration |
- 2020-04-17: TW application TW109113032A filed; patent TWI740460B granted (active)
Also Published As
Publication number | Publication date |
---|---|
TW202141465A (en) | 2021-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10510352B2 (en) | Detecting replay attacks in voice-based authentication | |
CN110970045B (en) | Mixing processing method, mixing processing device, electronic equipment and storage medium | |
US8791977B2 (en) | Method and system for presenting metadata during a videoconference | |
US7739109B2 (en) | System and process for muting audio transmission during a computer network-based, multi-party teleconferencing session | |
CN110113316B (en) | Conference access method, device, equipment and computer readable storage medium | |
US10586131B2 (en) | Multimedia conferencing system for determining participant engagement | |
US20220270638A1 (en) | Method and apparatus for processing live stream audio, and electronic device and storage medium | |
CN111863011B (en) | Audio processing method and electronic equipment | |
TWI740460B (en) | Voice watermark processing method, voice system and voice watermark processing device | |
US11540078B1 (en) | Spatial audio in video conference calls based on content type or participant role | |
US20210050026A1 (en) | Audio fingerprinting for meeting services | |
Cecchi et al. | Investigation on audio algorithms architecture for stereo portable devices | |
US20200184973A1 (en) | Transcription of communications | |
CN112788489A (en) | Control method and device and electronic equipment | |
Tsilfidis et al. | Hierarchical perceptual mixing | |
US8059806B2 (en) | Method and system for managing a communication session | |
KR102505345B1 (en) | System and method for removal of howling and computer program for the same | |
Kamaris et al. | Audio system spatial image evaluation via binaural feature classification | |
Kamaris et al. | Binaural auditory feature classification for stereo image evaluation in listening rooms | |
Aguilera et al. | Spatial audio for audioconferencing in mobile devices: Investigating the importance of virtual mobility and private communication and optimizations | |
CN114500912B (en) | Call processing method, electronic device and storage medium | |
WO2024082928A1 (en) | Voice processing method and apparatus, and device and medium | |
US20220415340A1 (en) | Selective fine-tuning of speech | |
US20230377582A1 (en) | Determination of conference participant contribution | |
JP2022113375A (en) | Information processing method and monitoring system |