TWI661421B

TWI661421B - System and method with audio watermark

Info

Publication number: TWI661421B
Application number: TW107112560A
Authority: TW
Inventors: 呂仲理; 陳保清; 蘇亞凡; 廖宜斌
Original assignee: 中華電信股份有限公司
Priority date: 2018-04-12
Filing date: 2018-04-12
Publication date: 2019-06-01
Also published as: TW201944395A

Abstract

The invention discloses a system and method with an audio watermark. The method includes: selecting audio watermark transmission configuration parameters according to a user preference setting and a background noise sample; converting an action code script into an embedded symbol sequence according to the audio watermark transmission configuration parameters; embedding the embedded symbol sequence into original audio to produce watermark-embedded audio; decoding the watermark-embedded audio into a restored symbol and time sequence; and interpreting the restored symbol and time sequence into a restored action code script and time phase shift information between the recording start time and the original audio start time.

Description

System and method with audio watermark

The present invention relates to audio watermarking technology, and in particular to a system and method with an audio watermark.

Audio watermarking is a method of conveying information. A transmitting device can embed the message to be conveyed into audio in advance by means of signal space conversion and message frame encoding, and a receiving device can restore the message by means of message frame decoding.

Audio watermarking has two main applications: the first is verifying the integrity or source of an audio file; the second is embedding a message in audio and transmitting it over the air channel for a receiving device to decode. Signal space conversion embedding methods used in audio watermarking include low-bit encoding, phase encoding, echo hiding, and spread spectrum modulation.

One prior art proposes an architecture that uses time-varying watermarks for information integrity confirmation, tampering position detection, tampering method determination, and restoration of damaged regions, which addresses the problem of judging the authenticity of digital recordings in court. Another prior art proposes a watermark loading device that first computes the volume and pitch of the original audio, selects volume and pitch thresholds for the watermark loading target segment, and then compares against those thresholds to determine the watermark loading target segment and load the watermark information.

However, the above prior art cannot select different audio watermark transmission configuration parameters according to different user preference settings and background noise samples in order to adjust the transmission speed or accuracy of the watermark-embedded audio; nor can it use the audio watermark to synchronize timestamps and information between the sending device and the receiving device.

The present invention provides a system and method with an audio watermark that can be used to adjust the transmission speed or accuracy of watermark-embedded audio, or to synchronize timestamps and information between the sending device and the receiving device by means of the audio watermark.

The system with an audio watermark of the present invention includes: an audio watermark transmission configuration selection module, which selects audio watermark transmission configuration parameters according to a user preference setting and a background noise sample; a symbol sequence generation module, which converts the action code script to be embedded into an embedded symbol sequence according to the audio watermark transmission configuration parameters selected by the audio watermark transmission configuration selection module; an audio watermark embedding module, which embeds the embedded symbol sequence from the symbol sequence generation module into original audio to produce watermark-embedded audio; an audio watermark decoding module, which decodes the watermark-embedded audio from the audio watermark embedding module into a restored symbol and time sequence; and a symbol sequence interpretation module, which interprets the restored symbol and time sequence from the audio watermark decoding module into a restored action code script and time phase shift information between the recording start time and the original audio start time.

The method with an audio watermark of the present invention includes the following steps: selecting audio watermark transmission configuration parameters according to a user preference setting and a background noise sample; converting the action code script to be embedded into an embedded symbol sequence according to the selected audio watermark transmission configuration parameters; embedding the embedded symbol sequence into original audio to produce watermark-embedded audio; decoding the watermark-embedded audio into a restored symbol and time sequence; and interpreting the restored symbol and time sequence into a restored action code script and time phase shift information between the recording start time and the original audio start time.

To make the above features and advantages of the present invention easier to understand, embodiments are described below in detail with reference to the accompanying drawings. Additional features and advantages of the present invention are set forth in part in the following description, and in part will be apparent from the description or may be learned by practicing the invention. The features and advantages of the invention are realized and attained by means of the elements and combinations particularly pointed out in the claims. It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not intended to limit the claimed scope of the invention.

01‧‧‧System with audio watermark

010‧‧‧User preference setting

011‧‧‧Noise tolerance

012‧‧‧Reliability

020‧‧‧Background noise

021‧‧‧Background noise sample

030‧‧‧Action code definition table

040‧‧‧Action code script

042‧‧‧Time difference between action execution and the original audio start point

050‧‧‧Original audio

051‧‧‧Text information

060‧‧‧Air channel

100‧‧‧Audio watermark transmission configuration selection module

110‧‧‧Background noise attribute analysis submodule

119‧‧‧Background noise attributes

120‧‧‧Audio watermark communication protocol decision submodule

130‧‧‧Base byte generation submodule

190‧‧‧Audio watermark transmission configuration parameters

191‧‧‧Synchronization symbol group definition

192‧‧‧Packet symbol group definition

193‧‧‧Frame length

194‧‧‧Error correction mechanism

195‧‧‧Symbol set definition

196‧‧‧Base byte length

197‧‧‧Estimated decoding time

198‧‧‧Symbol-to-base-byte correspondence table

199‧‧‧Audio sampling rate

200‧‧‧Symbol sequence generation module

210‧‧‧Action code time axis adjustment submodule

211‧‧‧Packet number

212‧‧‧Time difference between the packet start point and the original audio start point

213‧‧‧Action code

214‧‧‧Time difference between action execution and the packet start point

219‧‧‧Action code time axis adjustment script

220‧‧‧Text-to-symbol-group conversion submodule

221‧‧‧Packet number

222‧‧‧Symbol group for the time difference between the packet start point and the original audio start point

223‧‧‧Action code symbol group

224‧‧‧Symbol group for the time difference between action execution and the packet start point

229‧‧‧Action symbol group script

230‧‧‧Error correction code computation submodule

231‧‧‧Packet number

232‧‧‧Symbol group for the time difference between the packet start point and the original audio start point

233‧‧‧Action code symbol group with error correction

234‧‧‧Symbol group, with error correction, for the time difference between action execution and the packet start point

239‧‧‧Action symbol group script with error correction

240‧‧‧Symbol sequence combination submodule

290‧‧‧Embedded symbol sequence

300‧‧‧Audio watermark embedding module

310‧‧‧Frame cutting submodule

319‧‧‧Audio frame

320‧‧‧Signal conversion submodule

329‧‧‧Audio parameters

330‧‧‧Audio watermark gain value decision submodule

338‧‧‧Audio watermark gain value of the previous frame

339‧‧‧Audio watermark gain value

340‧‧‧Symbol frame cutting submodule

349‧‧‧Embedded symbol frame

350‧‧‧Symbol frame embedding submodule

359‧‧‧Watermark-embedded audio parameters

360‧‧‧Inverse signal conversion submodule

369‧‧‧Watermark-embedded audio frame

370‧‧‧Frame combination submodule

390‧‧‧Watermark-embedded audio

391‧‧‧Watermark-embedded audio sample with background noise and unknown start point

400‧‧‧Audio watermark decoding module

410‧‧‧Symbol start position search submodule

419‧‧‧Nearest upcoming symbol start position

420‧‧‧Frame cutting submodule

428‧‧‧Decoded end position

429‧‧‧Audio frame

430‧‧‧Signal conversion submodule

439‧‧‧Audio parameters

440‧‧‧Symbol decoding submodule

448‧‧‧Decoding strength score

449‧‧‧Decoded symbol

450‧‧‧Symbol sequence combination submodule

490‧‧‧Restored symbol and time sequence

500‧‧‧Symbol sequence interpretation module

510‧‧‧Symbol sequence cutting submodule

511‧‧‧Symbol group for the time difference between the packet start and the original audio file start

512‧‧‧Action code symbol group with error correction

513‧‧‧Symbol group, with error correction, for the time difference between action execution and the original audio start point

514‧‧‧Time difference between the packet start and the recording start point

519‧‧‧Restored action symbol group script with error correction

520‧‧‧Error correction submodule

521‧‧‧Action code symbol group

522‧‧‧Symbol group for the time difference between action execution and the original audio start point

529‧‧‧Restored action symbol group script

530‧‧‧Restored action code script generation submodule

531‧‧‧Action code

532‧‧‧Action execution time

580‧‧‧Restored action code script

590‧‧‧Time phase shift information between the recording start time and the original audio start time

A‧‧‧Transmitting device

A1‧‧‧Speaker

B‧‧‧Receiving device

B1‧‧‧Microphone

S1 to S5‧‧‧Steps

FIG. 1 is a schematic architecture diagram of the system with an audio watermark of the present invention;

FIG. 2 is a schematic block diagram of the audio watermark transmission configuration selection module of FIG. 1;

FIG. 3 is a schematic block diagram of the symbol sequence generation module of FIG. 1;

FIG. 4 is a schematic block diagram of the audio watermark embedding module of FIG. 1;

FIG. 5 is a schematic block diagram of the audio watermark decoding module of FIG. 1;

FIG. 6 is a schematic block diagram of the symbol sequence interpretation module of FIG. 1;

FIG. 7 is a schematic diagram of the packet symbol group definition of the present invention;

FIG. 8 is a schematic diagram of the symbol-to-base-byte correspondence table of the present invention;

FIG. 9 is a schematic diagram of the action code definition table of the present invention;

FIG. 10 is a schematic diagram of the action code script of the present invention;

FIG. 11 is a schematic diagram of the action code time axis adjustment submodule of FIG. 3;

FIG. 12 is a schematic diagram of the action code time axis adjustment script of the present invention;

FIG. 13 is a schematic diagram of the action symbol group script of the present invention;

FIG. 14 is a schematic diagram of the action symbol group script with error correction of the present invention;

FIG. 15 is a schematic diagram of the embedded symbol table of the present invention;

FIG. 16 is a schematic diagram of the embedded symbol sequence of the present invention;

FIG. 17 is a schematic diagram of the restored symbol and time sequence of the present invention;

FIG. 18 is a schematic diagram of the data table of the synchronization symbol group and the payload symbol group of the present invention;

FIG. 19 is a schematic diagram of the restored action symbol group script with error correction of the present invention;

FIG. 20 is a schematic diagram of the restored action symbol group script of the present invention;

FIG. 21 is a schematic diagram of the error correction method of the present invention;

FIG. 22 is a schematic diagram of the restored action code and time difference sequence of the present invention;

FIG. 23 is a schematic diagram of the restored action code script of the present invention;

FIG. 24 is a schematic diagram of the time phase shift information between the recording start time and the original audio start time of the present invention;

FIG. 25 is a schematic flowchart of the method with an audio watermark of the present invention; and

FIG. 26 is a schematic diagram of achieving clock and information synchronization between the transmitting device and the receiving device using the system and method with an audio watermark of the present invention.

The following describes embodiments of the present invention by way of specific examples. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification, and the invention can also be implemented or applied through other specific embodiments.

FIG. 1 is a schematic architecture diagram of the system 01 with an audio watermark of the present invention. As shown, the system 01 can be composed of five main modules: the audio watermark transmission configuration selection module 100, the symbol sequence generation module 200, the audio watermark embedding module 300, the audio watermark decoding module 400, and the symbol sequence interpretation module 500.

The audio watermark transmission configuration selection module 100 can select or determine the transmission protocol and configuration parameters for the overall audio watermark. It takes the user preference setting 010 and the background noise sample 021 as input and outputs the audio watermark transmission configuration parameters 190, which the subsequent modules (such as the symbol sequence generation module 200, the audio watermark embedding module 300, the audio watermark decoding module 400, or the symbol sequence interpretation module 500) use to embed and decode information.

The symbol sequence generation module 200 can convert the action code script 040 that the user wants to embed into the embedded symbol sequence 290 according to the transmission configuration specified by the audio watermark transmission configuration parameters 190.

According to the transmission configuration specified by the frame length 193 (see FIG. 2) of the audio watermark transmission configuration parameters 190, the audio watermark embedding module 300 can embed the embedded symbol sequence 290 into the original audio 050 to produce the watermark-embedded audio 390.

According to the transmission configuration specified by the audio watermark transmission configuration parameters 190, the audio watermark decoding module 400 can begin recording, at an arbitrary point, the watermark-embedded audio 390 that has passed through the air channel 060 and picked up the background noise 020. In other words, it decodes the "restored symbol and time sequence 490" contained in the audio from the "watermark-embedded audio sample 391 with background noise and unknown start point".

According to the transmission configuration specified by the audio watermark transmission configuration parameters 190, the symbol sequence interpretation module 500 can interpret the "restored symbol and time sequence 490" into the "restored action code script 580" and the "time phase shift information 590 between the recording start time and the original audio start time".

FIG. 2 is a schematic block diagram of the audio watermark transmission configuration selection module 100 of FIG. 1. As shown, the audio watermark transmission configuration selection module 100 has a background noise attribute analysis submodule 110, an audio watermark communication protocol decision submodule 120, and a base byte generation submodule 130, and it takes the user preference setting 010 and the background noise sample 021 as its inputs. The user preference setting 010 includes a noise tolerance 011 and a reliability 012: the noise tolerance 011 is the user-specified tolerance for the noise caused by the embedded information, and the reliability 012 is the user-specified description of the required transmission reliability. The background noise sample 021 is an audio sample that the user simulates or actually records based on the background noise 020 of the intended application environment.

The background noise attribute analysis submodule 110 can use an audio conversion algorithm to derive the background noise attributes 119 from the background noise sample 021. The background noise attributes 119 include the volume attribute and the dynamic range attribute of the background noise sample 021.
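The analysis performed by submodule 110 can be illustrated with a minimal sketch. The patent does not name the audio conversion algorithm, so the sketch below swaps in a plain time-domain RMS measurement; the function name `analyze_background_noise`, the `frame_len` parameter, and the decibel definitions of volume and dynamic range are all assumptions made for illustration.

```python
import math


def analyze_background_noise(samples, frame_len=1024):
    """Return (volume_db, dynamic_range_db) for float samples in [-1, 1].

    Assumption: volume is the RMS level of the whole sample, and dynamic
    range is the spread between the loudest and quietest frame.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    eps = 1e-12  # floor that avoids log10(0) on silent input

    def rms_db(chunk):
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        return 20.0 * math.log10(max(rms, eps))

    volume_db = rms_db(samples)
    frame_levels = [rms_db(samples[i:i + frame_len])
                    for i in range(0, len(samples), frame_len)]
    dynamic_range_db = max(frame_levels) - min(frame_levels)
    return volume_db, dynamic_range_db
```

A steady noise sample yields a dynamic range near zero, while a sample that alternates between loud and quiet passages yields a large one; a decision stage could use both values when choosing the parameters 190.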

The audio watermark communication protocol decision submodule 120 can determine the audio watermark transmission configuration parameters 190 according to the background noise attributes 119 and the user preference setting 010. The audio watermark transmission configuration parameters 190 include the synchronization symbol group definition 191, the packet symbol group definition 192, the frame length 193, the error correction mechanism 194, the symbol set definition 195, the base byte length 196, the estimated decoding time 197, and the audio sampling rate 199.

The base byte generation submodule 130 can generate the specified number of base bytes according to the symbol count specified in the symbol set definition 195 and the base byte length 196, and record them in the "symbol-to-base-byte correspondence table 198" specified by the audio watermark transmission configuration parameters 190.
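One way to picture the correspondence table 198 is as a mapping from each symbol to a distinct bit pattern of the configured length. The sketch below is hypothetical: the patent does not say how the base bytes are chosen, so a seeded pseudo-random draw is assumed here, and the names `build_symbol_table` and `seed` are illustrative only.

```python
import random


def build_symbol_table(symbol_count, base_byte_length, seed=0):
    """Map each symbol index to a distinct bit pattern of base_byte_length bits.

    Assumption: patterns are drawn pseudo-randomly from a shared seed so that
    the embedding side and the decoding side derive the same table.
    """
    if base_byte_length < 1:
        raise ValueError("base_byte_length must be at least 1")
    if symbol_count > 2 ** base_byte_length:
        raise ValueError("not enough distinct patterns for the symbol set")
    rng = random.Random(seed)
    seen = set()
    patterns = []
    while len(patterns) < symbol_count:
        candidate = rng.getrandbits(base_byte_length)
        if candidate not in seen:  # keep every symbol's pattern unique
            seen.add(candidate)
            patterns.append(candidate)
    return {
        symbol: [(p >> bit) & 1 for bit in reversed(range(base_byte_length))]
        for symbol, p in enumerate(patterns)
    }
```

Calling `build_symbol_table(4, 8)` returns four distinct 8-bit patterns keyed by the symbol indices 0 to 3. Longer base bytes leave more distance between patterns, which is presumably why the length is a tunable parameter.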

FIG. 3 is a schematic block diagram of the symbol sequence generation module 200 of FIG. 1. As shown, the symbol sequence generation module 200 has an action code time axis adjustment submodule 210, a text-to-symbol-group conversion submodule 220, an error correction code computation submodule 230, and a symbol sequence combination submodule 240. The symbol sequence generation module 200 takes the audio watermark transmission configuration parameters 190 and the action code script 040 as input and produces the embedded symbol sequence 290, which determines the symbol sequence to be embedded into the audio in the next stage.

The user must provide the action code definition table 030 and the action code script 040 to specify the content to be embedded in the audio. The action code definition table 030 contains both action codes and descriptions of the action content (see FIG. 9); together with the restored action code script 580 (see FIG. 23), these allow the application side to perform the specified actions. The action code script 040 is a file whose content describes one or more sets of action code script information, each set containing an action code and the time difference between action execution and the beginning of the original audio (see FIG. 10).

The action code time axis adjustment submodule 210 can determine the position of the packet in which the information is actually embedded according to the packet symbol group definition 192, the frame length 193, and the estimated decoding time 197 of the audio watermark transmission configuration parameters 190, and generate the action code time axis adjustment script 219. The action code time axis adjustment script 219 contains the "packet number 211", the "time difference 212 between the packet start point and the original audio start point", the "action code 213", and the "time difference 214 between action execution and the packet start point". Specifically, the submodule 210 can compute the packet duration from the packet symbol group definition 192, the audio sampling rate 199, and the frame length 193, cut the audio from its beginning into packet frames of that duration, and number the packet frames sequentially. It can then subtract the estimated decoding time 197 and the packet duration from the action execution time specified in the action code script 040 to obtain the latest packet start point, take the nearest packet whose frame start position is earlier than the latest packet start point as the embedding packet, and record the packet number 211 of that packet.
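The packet selection arithmetic can be sketched as follows. The sketch assumes that each symbol occupies one audio frame, that the frame length 193 is expressed in samples, and that packets are numbered from 0; none of these details is stated in this part of the description, and the function and key names are hypothetical.

```python
import math


def choose_embedding_packet(action_time_s, symbols_per_packet, frame_length,
                            sample_rate, estimated_decoding_time_s):
    """Pick the packet that must carry an action so it can be decoded in time."""
    # Packet duration: symbols per packet x samples per frame / samples per second.
    packet_duration = symbols_per_packet * frame_length / sample_rate
    # Latest start point: execution time minus decoding time minus one packet.
    latest_start = action_time_s - estimated_decoding_time_s - packet_duration
    if latest_start <= 0:
        raise ValueError("action is too early to be announced by any packet")
    # Largest packet index whose start time is strictly before latest_start.
    packet_number = math.ceil(latest_start / packet_duration) - 1
    packet_start = packet_number * packet_duration
    return {
        "packet_number": packet_number,                    # item 211
        "packet_start_s": packet_start,                    # item 212
        "action_offset_s": action_time_s - packet_start,   # item 214
    }
```

For example, with 16 symbols per packet, 1024-sample frames, and a 16384 Hz sampling rate, each packet lasts 1 second. An action scheduled at 10.0 s with a 0.5 s estimated decoding time gives a latest start point of 8.5 s, so the action goes into packet 8, which starts at 8.0 s, with a 2.0 s offset from the packet start to the action.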

The text-to-symbol-group conversion submodule 220 can convert the text of the action code time axis adjustment script 219 into a sequence of symbol groups according to the symbol set definition 195 of the audio watermark transmission configuration parameters 190. Specifically, it converts the "packet number 211", the "time difference 212 between the packet start point and the original audio start point", the "action code 213", and the "time difference 214 between action execution and the packet start point" of the action code time axis adjustment script 219 into, respectively, the "packet number 221", the "symbol group 222 for the time difference between the packet start point and the original audio start point", the "action code symbol group 223", and the "symbol group 224 for the time difference between action execution and the packet start point" of the action symbol group script 229.

The error correction code computation submodule 230 can apply error correction coding, according to the error correction mechanism 194, to the "action code symbol group 223" and the "symbol group 224 for the time difference between action execution and the packet start point" of the action symbol group script 229, in order to improve the reliability of the transmitted content. The computation produces the action symbol group script 239 with error correction, which contains the "packet number 231", the "symbol group 232 for the time difference between the packet start point and the original audio start point", the "action code symbol group 233 with error correction", and the "symbol group 234, with error correction, for the time difference between action execution and the packet start point".
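The description treats the error correction mechanism 194 as a configurable choice and this passage does not name a specific code. Purely to illustrate how redundancy protects a symbol group, the sketch below swaps in a simple repetition code with majority-vote decoding; it should not be read as the mechanism the patent uses, and the function names are hypothetical.

```python
from collections import Counter


def ecc_encode(symbols, repeat=3):
    """Repeat every symbol `repeat` times (illustrative repetition code)."""
    return [s for s in symbols for _ in range(repeat)]


def ecc_decode(received, repeat=3):
    """Recover each symbol by majority vote over its group of repeats."""
    if len(received) % repeat != 0:
        raise ValueError("received length must be a multiple of repeat")
    decoded = []
    for i in range(0, len(received), repeat):
        group = received[i:i + repeat]
        decoded.append(Counter(group).most_common(1)[0][0])
    return decoded
```

With `repeat=3`, a single corrupted symbol inside any group of three is outvoted by the two intact copies, at the cost of tripling the number of symbols to transmit. That trade-off between reliability and transmission speed is the kind of choice the user preference setting 010 is meant to steer.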

The symbol sequence combination submodule 240 can combine the action symbol group script 239 with error correction into the embedded symbol sequence 290 according to the symbol set definition 195, the packet symbol group definition 192, and the synchronization symbol group definition 191 of the audio watermark transmission configuration parameters 190. The embedded symbol sequence 290 is a sequence of symbols defined in the symbol set definition 195. The sequence consists of consecutive packets; the symbol groups within each packet are arranged according to the packet symbol group definition 192, and the packet symbol group definition 192 includes the synchronization symbol group of the synchronization symbol group definition 191, which the decoding end (or receiving device) uses to decode the symbol sequence.
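The assembly of packets into one sequence can be sketched as below. The actual packet layout is given by the packet symbol group definition 192 (FIG. 7), which is not reproduced in the text; the sketch assumes that each packet begins with the synchronization symbol group and is followed by its payload symbol groups in order. The function name and argument shapes are hypothetical.

```python
def build_embedded_sequence(sync_group, packet_payloads):
    """Concatenate packets, each laid out as sync group then payload groups.

    sync_group: list of symbols marking the start of every packet.
    packet_payloads: one entry per packet, each a list of symbol groups.
    """
    sequence = []
    for payload_groups in packet_payloads:
        sequence.extend(sync_group)  # lets the decoder locate the packet start
        for group in payload_groups:
            sequence.extend(group)
    return sequence
```

For instance, a sync group `[0, 0, 1]` and two packets with payload groups `[[2], [3, 4]]` and `[[5], [6, 7]]` produce `[0, 0, 1, 2, 3, 4, 0, 0, 1, 5, 6, 7]`. A receiver that starts recording mid-stream can scan for the sync pattern to find the next packet boundary.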

FIG. 4 is a schematic block diagram of the audio watermark embedding module 300 of FIG. 1. As shown, the audio watermark embedding module 300 has a frame cutting submodule 310, a signal conversion submodule 320, an audio watermark gain value decision submodule 330, a symbol frame cutting submodule 340, a symbol frame embedding submodule 350, an inverse signal conversion submodule 360, and a frame combination submodule 370. Using the "frame length 193" and the "symbol-to-base-byte correspondence table 198" specified by the audio watermark transmission configuration parameters 190, the audio watermark embedding module 300 embeds the embedded symbol sequence 290 into the original audio 050 to produce the watermark-embedded audio 390.

The frame cutting submodule 310 can cut the original audio 050 sequentially into audio frames 319 according to the frame length 193, so that the signal conversion submodule 320 can process every audio frame 319 that is cut out. The signal conversion submodule 320 can apply the specified signal conversion to an audio frame 319 to obtain the audio parameters 329. The audio watermark gain value decision submodule 330 can compute the average volume from the audio parameters 329 and, with reference to the audio watermark gain value 338 of the previous frame, determine the audio watermark gain value 339 to be used for the current frame.
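Frame cutting and the gain decision can be sketched as follows. The patent does not give the gain formula, so the sketch assumes the target gain is proportional to the frame's RMS level and is smoothed toward the previous frame's gain; it also works on raw samples rather than converted audio parameters. The names `split_into_frames`, `decide_gain`, `noise_tolerance`, and `smoothing` are hypothetical.

```python
import math


def split_into_frames(samples, frame_length):
    """Cut samples into consecutive frames; a trailing partial frame is dropped."""
    count = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length] for i in range(count)]


def decide_gain(frame, previous_gain, noise_tolerance=0.1, smoothing=0.5):
    """Choose a watermark gain for this frame.

    Assumption: louder frames can mask a stronger watermark, so the target
    gain scales with RMS level; blending with the previous gain avoids
    abrupt jumps between neighboring frames.
    """
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    target = noise_tolerance * rms
    if previous_gain is None:  # first frame has no history to smooth against
        return target
    return smoothing * previous_gain + (1.0 - smoothing) * target
```

Iterating over the frames while passing each returned gain as `previous_gain` for the next call reproduces the dependency of gain value 339 on gain value 338 described above.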

The symbol frame cutting sub-module 340 takes embedded symbol frames 349 from the embedded symbol sequence 290 in order, to designate the symbol to be embedded in the current frame. The symbol frame embedding sub-module 350, according to the symbol and base byte correspondence table 198 and the audio watermark gain value 339, hides the symbol in the embedded symbol frame 349 in the audio parameters 329 in the form of its corresponding base byte set, yielding the watermark-embedded audio parameters 359. The inverse signal conversion sub-module 360 applies the inverse signal conversion to the watermark-embedded audio parameters 359, converting them back into a frame, the watermark-embedded audio frame 369. The sound frame combination sub-module 370 combines the watermark-embedded audio frames 369 in order to restore the watermark-embedded audio 390.

FIG. 5 is a schematic block diagram of the audio watermark decoding module 400 of FIG. 1. As shown, the audio watermark decoding module 400 has a symbol start position search sub-module 410, a sound frame cutting sub-module 420, a signal conversion sub-module 430, a symbol decoding sub-module 440, and a symbol sequence combination sub-module 450. Using the sound frame length 193 and the symbol and base byte correspondence table 198 specified by the audio watermark transmission configuration parameters 190, the audio watermark decoding module 400 decodes the audio watermark from the watermark-embedded audio sample 391, which contains background noise and has an unknown starting point, and finally outputs the restored symbol and time sequence 490.

The symbol start position search sub-module 410 starts from the decoded end position 428 of the watermark-embedded audio sample 391 (with background noise and unknown starting point) and searches for the nearest future symbol start position 419 according to the sound frame length 193 and the symbol and base byte correspondence table 198. In the initial state, the decoded end position 428 is the start of the audio sample 391. When the decoding strength score 448 is above a threshold, the nearest future symbol start position 419 can be obtained directly as the previous symbol start position 419 plus the sound frame length 193, which saves computation time.

The sound frame cutting sub-module 420 cuts an audio frame 429 whose length equals the sound frame length 193, starting at the nearest future symbol start position 419, and then updates the decoded end position 428 to the nearest future symbol start position 419 plus the sound frame length 193.

The signal conversion sub-module 430 converts the audio frame 429 into audio parameters 439 using the specified signal conversion. The symbol decoding sub-module 440 computes the distance between the audio parameters 439 and each base byte set in the symbol and base byte correspondence table 198, one by one. The resulting distance serves as the decoding strength score 448, and the symbol with the shortest distance is the decoded symbol 449.
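The nearest-base decoding described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not state which distance is used here, so cosine distance is assumed, and the function names `cosine_distance` and `decode_symbol` as well as the toy 4-element table are hypothetical. The caller is assumed to pass parameter and base vectors of equal length.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v): 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # treat an all-zero vector as carrying no information
    return 1.0 - dot / norm

def decode_symbol(params, table):
    """Return (symbol, distance) for the base vector nearest to params.

    table maps each symbol to its base vector (assumed layout).
    """
    best_symbol, best_distance = None, float("inf")
    for symbol, base in table.items():
        d = cosine_distance(params, base)
        if d < best_distance:
            best_symbol, best_distance = symbol, d
    return best_symbol, best_distance

# Toy usage with two hypothetical 4-element base vectors.
table = {"a": [1, 1, -1, -1], "b": [1, -1, 1, -1]}
symbol, distance = decode_symbol([0.9, 1.1, -1.0, -0.8], table)
```

The returned distance plays the role of the decoding strength score 448 in this sketch; how that score is compared against a threshold is left to the surrounding description.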

The symbol sequence combination sub-module 450 buffers all decoded symbols 449 computed by the symbol decoding sub-module 440 and, from the nearest future symbol start position 419, estimates the time difference between each decoded symbol 449 and the recording start point. It then concatenates the decoded symbols 449 and their time differences into the restored symbol and time sequence 490.

FIG. 6 is a schematic block diagram of the symbol sequence interpretation module 500 of FIG. 1. As shown, the symbol sequence interpretation module 500 has a symbol sequence cutting sub-module 510, an error correction sub-module 520, and a restored action code script generation sub-module 530. Using the synchronization symbol group definition 191, the packet symbol group definition 192, the error correction mechanism 194, and the symbol set definition 195 specified by the audio watermark transmission configuration parameters 190, the symbol sequence interpretation module 500 interprets the restored symbol and time sequence 490 to obtain the restored action code script 580 and the time shift information 590 between the start of the recording and the start of the original audio.

The symbol sequence cutting sub-module 510 matches the restored symbol and time sequence 490 against the synchronization symbol groups according to the synchronization symbol group definition 191 and, according to the packet symbol group definition 192, parses out the restored action symbol group script with error correction 519. The script 519 contains the symbol group 511 for the time difference between the packet start and the original audio file start, the action code symbol group with error correction 512, and the symbol group with error correction 513 for the time difference between the action execution and the original audio start point. The symbol sequence cutting sub-module 510 also reads, from the restored symbol and time sequence 490, the time difference between the first symbol of the packet and the recording start point, and records it as the time difference 514 between the packet start and the recording start point.

The error correction sub-module 520, in the manner specified by the error correction mechanism 194, restores the action code symbol group with error correction 512 and the symbol group with error correction 513 (time difference between the action execution and the original audio start point) in the script 519 to the action code symbol group 521 and the symbol group 522 for the time difference between the action execution and the original audio start point, respectively. Together with the symbol group 511 for the time difference between the packet start and the original audio file start and the time difference 514 between the packet start and the recording start point, these form the restored action symbol group script 529.

The restored action code script generation sub-module 530 converts the symbol groups in the restored action symbol group script 529 back to text according to the symbol set definition 195, and generates the restored action code script 580 and the time shift information 590 between the start of the recording and the start of the original audio. The restored action code script 580 may contain an action code 531 and an action execution time 532. The time shift information 590 can be estimated as the median of the differences between the symbol group 511 (time difference between the packet start and the original audio file start) and the time difference 514 between the packet start and the recording start point.
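The median-based estimate of the time shift can be sketched as below. This is an illustrative sketch under assumptions: both time differences are taken to be already converted to the same unit (samples), the sign convention (original minus recording) is chosen for illustration, and the name `estimate_time_shift` and the sample values are hypothetical.

```python
from statistics import median

def estimate_time_shift(packets):
    """Estimate the shift between original audio start and recording start.

    packets: list of (t_from_original_start, t_from_recording_start) pairs,
    one per decoded packet, both in the same unit (assumed: samples).
    """
    diffs = [t_orig - t_rec for t_orig, t_rec in packets]
    return median(diffs)

# Five hypothetical packets; the fourth one is a mis-decoded outlier.
packets = [
    (81920, 1920),
    (163840, 83850),
    (245760, 165760),
    (327680, 400000),
    (409600, 329600),
]
shift = estimate_time_shift(packets)
```

Using the median rather than the mean keeps a single badly decoded packet from distorting the estimate, which is presumably why the description specifies it.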

Preferred embodiments of the modules in FIG. 2 to FIG. 6 are further illustrated below with reference to FIG. 7 to FIG. 24. The following is given by way of example only, and the invention is not limited thereto.

As shown in FIG. 2, the audio watermark transmission configuration selection module 100 takes the user preference settings 010 and the background noise sample 021 as its inputs. The user preference settings 010 include a noise tolerance 011 and a reliability 012. The noise tolerance 011 lets the user specify how much noise caused by the embedded information is acceptable. For example, the noise tolerance 011 (e.g. Tlr) can be set to 3 (high), 2 (medium), or 1 (low), and the user-selectable reliability 012 (e.g. Conf) can be set to 3 (safer), 2 (moderate), or 1 (faster). The background noise sample 021 is an audio sample that the user simulates or actually records from the background noise 020 of the intended application environment.

The background noise attribute analysis sub-module 110 of FIG. 2 computes the average volume difference between the embedded audio and the background noise sample 021 and sets the volume attribute (e.g. V_noise) of the background noise attributes 119 to 3 (strong), 2 (medium), or 1 (weak). It also linearly scales the background noise sample 021 to the interval 0 to 1, computes the standard deviation of the volume variation of the background noise sample 021, and sets the dynamic range attribute (e.g. DR_noise) of the background noise attributes 119 to 3 (large), 2 (medium), or 1 (small).

The audio watermark communication protocol decision sub-module 120 of FIG. 2 determines the audio watermark transmission configuration parameters 190 from the background noise attributes 119 and the user preference settings 010. For example: [1] the sound frame length 193 is set to L_window=1024 (adjustable according to the required reliability); [2] the base byte length 196 is set to L_c=512 (this value must not exceed the sound frame length); [3] the error correction mechanism 194 is defined as a voting mechanism over six repeated copies of the information (adjustable according to the required reliability); [4] the estimated decoding time 197 is set to D_decode=3 seconds, that is, L_decode=D_decode*SR=132300 samples (this value must be obtained from experience or experiment); [5] the audio sampling rate 199 is set to SR=44100 samples/second; [6] according to the symbol set definition 195, the number of symbols is N_S=6, the usable symbol set is S={S₁=‘1’,S₂=‘2’,S₃=‘3’,S₄=‘4’,S₅=‘a’,S₆=‘b’}, the symbol set usable for synchronization symbol groups is S_sync={S₅=‘a’,S₆=‘b’}, and the symbol set usable for payload symbol groups is S_payload={S₁=‘0’,S₂=‘1’,S₃=‘2’,S₄=‘3’}; [7] according to the synchronization symbol group definition 191, the synchronization symbol groups are SS_SW={SS_SW1=S₅S₅S₆S₆, SS_SW2=S₅S₆S₅S₆, SS_SW3=S₆S₆S₅S₅}.

FIG. 7 is a schematic diagram of the packet symbol group definition 192. As shown in FIG. 2 and FIG. 7, the audio watermark communication protocol decision sub-module 120 uses the packet symbol group definition 192 to define the synchronization symbol groups, the payload symbol groups, and the way the symbol groups are combined. In this embodiment a packet is divided into six parts, in order: the first synchronization symbol group (e.g. SS_SW1, 4 symbols); the symbol group 232 for the time difference between the packet start point and the original audio start point (e.g. SS_start, 8 symbols); the second synchronization symbol group (e.g. SS_SW2, 4 symbols); the action code symbol group with error correction 233 (e.g. ECSSIX_action, 24 symbols); the third synchronization symbol group (e.g. SS_SW3, 4 symbols); and the symbol group with error correction 234 for the time difference between the action execution and the packet start point (e.g. ECSS_offset, 36 symbols). The symbol length of one packet is SSL_packet=80 symbols.

FIG. 8 is a schematic diagram of the symbol and base byte correspondence table 198. As shown in FIG. 2 and FIG. 8, the base byte generation sub-module 130 uses the number of symbols N_S=6 specified in the symbol set definition 195 and the base byte length 196 (L_c=512) to randomly select six base byte sets C₁ to C₆ of length L_c, corresponding to the six symbols S₁ to S₆ respectively. The table that maps the six symbols S₁ to S₆ to the six base byte sets C₁ to C₆ of length L_c is the symbol and base byte correspondence table 198 of FIG. 8. In addition, the orthogonality between each pair of base byte sets can be constrained: the vector distance between two base byte sets is computed as the absolute value of the cosine distance, and if the vector distance is below a specific threshold (e.g. 0.99; the threshold can be adjusted between 0 and 1 according to the reliability Conf), the combination of base byte sets must be regenerated.
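The random selection with an orthogonality check can be sketched as follows. Several points are assumptions rather than statements from the source: base elements are drawn from {-1, +1}, "the absolute value of the cosine distance" is read as 1 - |cos|, and the names `abs_cosine_distance` and `generate_bases` and the `max_tries` guard are hypothetical. The usage example uses a looser threshold (0.9) than the 0.99 mentioned above, because with fully random vectors of length 512 a purely random search would rarely satisfy 0.99 for all fifteen pairs.

```python
import math
import random

def abs_cosine_distance(u, v):
    """Return 1 - |cos(u, v)| (assumed reading of the distance in the text)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return 1.0 - abs(dot) / norm

def generate_bases(n_symbols, length, threshold, rng, max_tries=10000):
    """Draw random +/-1 base vectors until every pair is far enough apart."""
    for _ in range(max_tries):
        bases = [[rng.choice((-1, 1)) for _ in range(length)]
                 for _ in range(n_symbols)]
        pairs_ok = all(
            abs_cosine_distance(bases[i], bases[j]) >= threshold
            for i in range(n_symbols)
            for j in range(i + 1, n_symbols)
        )
        if pairs_ok:
            return bases
    raise RuntimeError("no sufficiently orthogonal set found")

# Six base vectors of length 512, as in the embodiment (N_S=6, L_c=512).
bases = generate_bases(6, 512, 0.9, random.Random(0))
table = dict(zip("1234ab", bases))  # symbol -> base vector
```

A stricter threshold would call for a structured construction (for example, orthogonalising the candidates) rather than repeated random draws; the source does not say which approach is used.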

The symbol sequence generation module 200 of FIG. 3 takes the audio watermark transmission configuration parameters 190 and the action code script 040 as inputs and produces the embedded symbol sequence 290, which determines the symbol sequence to be embedded into the audio in the next stage. The user must provide the action code definition table 030 and the action code script 040 to specify the content to be embedded in the audio.

FIG. 9 is a schematic diagram of the action code definition table 030. As shown, the action code definition table 030 is a document. For example, the action description corresponding to action code 3 is "close the browser".

FIG. 10 is a schematic diagram of the action code script 040. As shown, the action code script 040 is a document whose content describes one or more action codes and their execution times. For example, action number ASN=2 corresponds to action code IX_action=3, and the time difference 042 between executing the action and the original audio start point is T_action=20000 (milliseconds); expressed as a number of samples, this is SI_action=T_action*SR/1000=882000 (samples). In other words, this entry specifies that the action with action code 3, namely "close the browser" in FIG. 9, is executed 20000 milliseconds (20 seconds), or 882000 samples, after the start of the original audio.

FIG. 11 is a schematic diagram of the action code time axis adjustment sub-module 210 of FIG. 3. As shown in FIG. 3 and FIG. 11, the parameter calculation, packet frame cutting, and time axis adjustment performed by the action code time axis adjustment sub-module 210 are illustrated below.

For the parameter calculation, the action code time axis adjustment sub-module 210 uses the sound frame length 193 (e.g. L_window=1024), the audio sampling rate 199 (e.g. SR=44100), and the packet symbol group definition 192 of FIG. 7 to compute the symbol sample length (e.g. L_symbol=L_window=1024), the packet symbol length (e.g. SSL_packet=80), and the packet sample length (e.g. L_packet=L_symbol*SSL_packet=1024*80=81920). In addition, the estimated decoding time 197 is L_decode=132300.

For the packet frame cutting, the action code time axis adjustment sub-module 210 cuts the audio from its start into packet frames of the packet sample length L_packet and numbers them sequentially from 0 with the packet number PSN; the start point of the k-th packet is SI_packet(k)=k*L_packet.

For the time axis adjustment, the time difference SI_deadline(m) between the latest start point of the embedded packet for action number m and the original audio start point is obtained by subtracting L_decode and L_packet from SI_action(m), the time difference (in samples) between the execution time of action number m and the original audio start point. The time difference SI_embed(m) between the embedded packet start point of action number m and the original audio start point is the start point SI_packet(k) of the nearest packet whose packet frame start position is less than SI_deadline(m); that is, SI_embed(m)=SI_packet(k). The time difference SI_offset(m) between the execution of action number m and the start point of its embedded packet is SI_offset(m)=SI_action(m)-SI_embed(m). SI_deadline(m), SI_embed(m), and SI_offset(m) are recorded in the action code time axis adjustment script 219.

FIG. 12 is a schematic diagram of the action code time axis adjustment script 219. For example, in the time axis adjustment above, SI_action(2)=882000 (1); SI_deadline(2)=SI_action(2)-L_packet-L_decode=882000-81920-132300=667780 (2); the nearest packet whose start position is less than 667780 is packet number 8 (3) (that is, PSN=floor(667780/81920)=8), so the embedded watermark for action number ASN=2 starts at SI_embed(2), placed at the start position SI_packet(8) of packet number 8, SI_embed(2)=SI_packet(8)=L_packet*8=655360 (4); and SI_offset(2)=SI_action(2)-SI_embed(2)=882000-655360=226640 (5).
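The arithmetic of this worked example can be reproduced with a short sketch. The function name `schedule_action` and the dictionary layout are hypothetical; the formulas follow the description, including the use of floor division to pick the packet number.

```python
def schedule_action(t_action_ms, sr, l_packet, l_decode):
    """Compute where the packet carrying one action must be embedded.

    All returned values are sample counts from the original audio start,
    except "psn", which is the packet number.
    """
    si_action = t_action_ms * sr // 1000           # execution time in samples
    si_deadline = si_action - l_packet - l_decode  # latest packet start
    if si_deadline < 0:
        raise ValueError("action is too early to be embedded and decoded in time")
    psn = si_deadline // l_packet                  # packet number, floor as in the text
    si_embed = psn * l_packet                      # start of that packet
    si_offset = si_action - si_embed               # delay from packet start to action
    return {
        "si_action": si_action,
        "si_deadline": si_deadline,
        "psn": psn,
        "si_embed": si_embed,
        "si_offset": si_offset,
    }

# Values from the embodiment: SR=44100, L_packet=1024*80, L_decode=3*44100.
row = schedule_action(20000, 44100, 1024 * 80, 3 * 44100)
```

With these inputs the sketch yields the same five values as the example above (882000, 667780, packet 8, 655360, 226640).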

Furthermore, the action code time axis adjustment script 219 of FIG. 12 may contain the action number ASN; the action code IX_action; SI_action(m), the time difference (in samples) between the execution time of action number m and the original audio start point; SI_deadline(m), the time difference between the latest start point of the embedded packet for action number m and the original audio start point; the packet number PSN; SI_embed(m), the time difference between the embedded packet start point of action number m and the original audio start point; and SI_offset(m), the time difference between the execution time of action number m and the start point of its embedded packet.

FIG. 13 is a schematic diagram of the action symbol group script 229. As shown in FIG. 3 and FIG. 13, the text-to-symbol-group conversion sub-module 220 converts the text in the action code time axis adjustment script 219 into symbol group sequences according to the symbol set definition 195. The packet number PSN, the action code IX_action, and the time difference SI_offset(m) between the execution time of action number m and the start point of its embedded packet in the script 219 are converted, respectively, into the packet number symbol group SSPSN, the action code symbol group SSIX_action, and the symbol group SSSI_offset(m) for that time difference in the action symbol group script 229.

For example, the text-to-symbol-group conversion sub-module 220 may add, in order, data rows for packet numbers to which no action has been assigned, with the action code IX_action and the time difference SI_offset(m) both defaulting to 0. The action code symbol group SSIX_action is the action code IX_action expressed in base 4 and zero-padded to four digits; the padding width may vary with the total number of action codes. The packet number symbol group SSPSN is the packet number PSN expressed in base 4 and zero-padded to 8 digits; the padding width may vary with the length of the audio to be embedded. The symbol group SSSI_offset(m) is the time difference SI_offset(m) divided by 1000 and rounded to the nearest integer, expressed in base 4 and zero-padded to 6 digits; the padding width may vary with the length of the audio to be embedded.
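The base-4 conversion with zero padding can be sketched as below. The helper name `to_base4` is hypothetical, and the digits '0' to '3' are used directly as payload symbols, which is an assumption consistent with the S_payload set given earlier. Rounding half up is done with integer arithmetic to avoid the round-half-to-even behaviour of Python's built-in `round`.

```python
def to_base4(value, width):
    """Return value as a base-4 digit string, zero-padded to width digits."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = ""
    while value > 0:
        digits = str(value % 4) + digits
        value //= 4
    if len(digits) > width:
        raise ValueError("value does not fit in the requested width")
    return digits.rjust(width, "0")

# Values from the running example: IX_action=3, PSN=8, SI_offset=226640.
ss_ix_action = to_base4(3, 4)                    # action code, 4 digits
ss_psn = to_base4(8, 8)                          # packet number, 8 digits
ss_offset = to_base4((226640 + 500) // 1000, 6)  # offset / 1000, rounded half up
```

For the running example this gives '0003' for the action code, which matches the symbol group used in the error-correction example of FIG. 14.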

FIG. 14 is a schematic diagram of the action symbol group script with error correction 239. As shown in FIG. 3 and FIG. 14, the error correction code calculation sub-module 230 computes the error correction code to produce the action symbol group script with error correction 239, thereby improving the reliability of the transmitted content. In this embodiment, the error correction mechanism 194 is a voting mechanism over six repeated copies of the information. For example, an original symbol group ‘0003’ becomes, after processing by the error correction code calculation sub-module 230, the symbol group with error correction ‘000300030003000300030003’. The resulting action symbol group script with error correction 239 contains the action number ASN, the packet number PSN, the packet number symbol group SSPSN, and the two groups produced by the error correction code calculation sub-module 230: the action code symbol group with error correction ECSSIX_action, and the symbol group with error correction ECSSSI_offset(m) for the time difference between the execution time of action number m and the start point of its embedded packet.
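The six-fold repetition and its decoding by voting can be sketched as follows. The encoder follows the ‘0003’ example above. The decoder is an assumption: the source only says that the copies are voted on, so a per-position majority vote is used here, and the names `repeat_encode` and `vote_decode` are hypothetical.

```python
from collections import Counter

def repeat_encode(group, copies=6):
    """Repeat a symbol group 'copies' times, as in the '0003' example."""
    return group * copies

def vote_decode(coded, copies=6):
    """Recover the symbol group by majority vote at each symbol position.

    Ties are resolved in favour of the symbol seen first at that position.
    """
    if len(coded) % copies != 0:
        raise ValueError("coded length must be a multiple of the copy count")
    n = len(coded) // copies
    chunks = [coded[i * n:(i + 1) * n] for i in range(copies)]
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*chunks)
    )

coded = repeat_encode("0003")
# Two of the six received copies contain a symbol error.
received = "0003" + "0103" + "0003" + "0002" + "0003" + "0003"
restored = vote_decode(received)
```

Raising the number of copies trades payload capacity for robustness, which fits the remark that the mechanism can be adjusted to the required reliability.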

FIG. 15 is a schematic diagram of the embedded symbol table. As shown in FIG. 3 and FIG. 15, the symbol sequence combination sub-module 240 takes the action symbol group script with error correction 239, sorts it by packet number PSN, and, following the packet symbol group definition 192, combines the packet number PSN, the first synchronization symbol group SS_SW1, the packet number symbol group SSPSN, the second synchronization symbol group SS_SW2, the action code symbol group with error correction ECSSIX_action, the third synchronization symbol group SS_SW3, and the symbol group with error correction ECSSSI_offset(m) (time difference between the execution time of action number m and the start point of its embedded packet) into the embedded symbol table.

FIG. 16 is a schematic diagram of the embedded symbol sequence 290. As shown in FIG. 3 and FIG. 16, the symbol sequence combination sub-module 240 reads the embedded symbol table from left to right and from top to bottom, leaving out the packet number PSN column, and arranges the symbols in the table into a single sequence, which is the embedded symbol sequence 290 (e.g. sbl).
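Assembling packets and flattening the table into one sequence can be sketched as below. The synchronization groups follow the embodiment (S₅=‘a’, S₆=‘b’); the function names `build_packet` and `build_sequence`, the row layout, and the string representation of symbols are hypothetical.

```python
# Synchronization symbol groups SS_SW1, SS_SW2, SS_SW3 from the embodiment.
SYNC = ("aabb", "abab", "bbaa")

def build_packet(ss_psn, ecss_action, ecss_offset):
    """Concatenate the six parts of one packet in the order of FIG. 7."""
    return SYNC[0] + ss_psn + SYNC[1] + ecss_action + SYNC[2] + ecss_offset

def build_sequence(rows):
    """rows: (psn, ss_psn, ecss_action, ecss_offset) tuples (assumed layout).

    Rows are sorted by packet number; the packet number itself is not emitted.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return "".join(build_packet(*row[1:]) for row in ordered)

# Packet 8 carries action code 3 with offset 227; packet 9 carries no action.
rows = [
    (9, "00000021", "0000" * 6, "000000" * 6),
    (8, "00000020", "0003" * 6, "003203" * 6),
]
sequence = build_sequence(rows)
```

Each packet built this way is 4 + 8 + 4 + 24 + 4 + 36 = 80 symbols long, matching SSL_packet=80.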

In the audio watermark embedding module 300 of FIG. 4, the sound frame cutting sub-module 310 cuts the original audio 050 (e.g. x) sequentially, according to the sound frame length 193 (e.g. L_window=1024), into audio frames 319 (e.g. x_i) of 1024 samples each, and all of the audio frames 319 then go through the subsequent signal processing.

The signal conversion sub-module 320 of FIG. 4 applies the specified signal conversion to each audio frame 319 (e.g. x_i). In this embodiment a 1024-point discrete cosine transform (DCT) is used to obtain 1024-dimensional audio parameters 329 (e.g. f_i). The i-th frame x_i contains the audio samples x_{i,0} to x_{i,1023}, and the audio parameters 329 are the DCT coefficients f_{i,m}, m=0~1023.
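A DCT and its inverse can be sketched in pure Python as below. The source does not state which DCT type or normalisation is used, so the orthonormal DCT-II (with DCT-III as its inverse) is assumed here, and the names `dct` and `idct` are hypothetical. The direct O(N²) form is shown for clarity; a real implementation for 1024-point frames would use a fast transform.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence (assumed transform variant)."""
    n = len(x)
    out = []
    for m in range(n):
        scale = math.sqrt((1 if m == 0 else 2) / n)
        out.append(scale * sum(
            x[k] * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
            for k in range(n)
        ))
    return out

def idct(f):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(f)
    out = []
    for k in range(n):
        out.append(sum(
            math.sqrt((1 if m == 0 else 2) / n) * f[m]
            * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
            for m in range(n)
        ))
    return out

# Round trip on a short 8-sample frame.
frame = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, 0.5]
restored = idct(dct(frame))
```

Because the transform is orthonormal in this sketch, the energy of a frame is the same in the sample domain and in the coefficient domain, which is convenient when a gain is derived from the coefficients.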

The audio watermark gain value decision sub-module 330 of FIG. 4 computes the average volume (e.g. v_i) from the audio parameters 329 and, with reference to the audio watermark gain value 338 of the previous frame (e.g. g_{i-1}), determines the audio watermark gain value 339 (e.g. g_i) for the current frame, where v_i=Σ_m|f_{i,m}|², g₁=αv₁, and g_i=β(αv_i)+(1-β)g_{i-1}. The constants α and β can be set from listening experiments or experience. The symbol frame cutting sub-module 340 of FIG. 4 takes embedded symbol frames 349 (e.g. sbl_i) from the embedded symbol sequence 290 (e.g. sbl) in order, to designate the symbol to be embedded in the current frame.
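The gain recursion above can be sketched directly. The function names `frame_volume` and `watermark_gains` are hypothetical, and the values of α and β in the usage example are arbitrary placeholders, since the source leaves them to listening experiments.

```python
def frame_volume(f):
    """v_i: sum of squared magnitudes of one frame's audio parameters."""
    return sum(abs(c) ** 2 for c in f)

def watermark_gains(frames, alpha, beta):
    """Return g_1, g_2, ... for a list of per-frame parameter vectors.

    g_1 = alpha * v_1; g_i = beta * (alpha * v_i) + (1 - beta) * g_{i-1}.
    """
    gains = []
    for f in frames:
        target = alpha * frame_volume(f)
        if not gains:
            gains.append(target)
        else:
            gains.append(beta * target + (1 - beta) * gains[-1])
    return gains

# Three tiny hypothetical frames; alpha and beta are placeholder values.
gains = watermark_gains([[1.0, 1.0], [2.0, 0.0], [0.0, 0.0]], alpha=0.5, beta=0.5)
```

The recursion is an exponential smoothing of the per-frame target αv_i, so the watermark level follows the loudness of the host audio without jumping abruptly from one frame to the next.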

The symbol frame embedding sub-module 350 of FIG. 4 looks up the base byte set corresponding to sbl_i in the symbol and base byte correspondence table 198 of FIG. 8; for example, if sbl_i=a, the base byte set corresponding to a is C_{5,k}, k=0~511. The symbol frame embedding sub-module 350 then hides the base byte set in the audio parameters 329 (e.g. f_{i,m}) to produce the watermark-embedded audio parameters 359; a simpler implementation is the awm_encode function.

The inverse signal conversion sub-module 360 of FIG. 4 applies the inverse signal transform to the watermark-embedded audio parameters 359, converting them back into a frame that becomes the watermark-embedded audio frame 369, with sample index k = 0 to 1023.

In this way the watermark-embedded audio frames 369 are obtained, and the frame combination sub-module 370 of FIG. 4 combines them in order to reconstruct the watermark-embedded audio 390 (e.g., x').

In the audio watermark decoding module 400 of FIG. 5, the symbol start position search sub-module 410 takes the decoded end position 428 (e.g., offset) within the watermark-embedded audio samples 391 (e.g., y), which contain background noise and whose starting point is unknown, and, based on the frame length 193 (e.g., L_window = 1024) and the symbol and base byte correspondence table 198 (e.g., C), searches for the nearest upcoming symbol start position 419 (e.g., next_sbl_start). In the initial state, the decoded end position 428 is offset = L_window/2 and the decoding strength score 448 is score = 0. Items (1) to (3) below describe, with a simple example, how next_sbl_start is found:

(1) If the decoding strength score 448 (e.g., score) of the previous decoding exceeds the decoding strength score threshold (e.g., γ), the nearest upcoming symbol start position 419 is next_sbl_start = offset + 1, and processing proceeds directly to the frame cutting sub-module 420. If score ≤ γ, the symbol start position search sub-module 410 continues with the following steps.

(2) Starting from position offset − L_window/2 of y, take a frame z_0 of length L_window; starting from the next start position of y, take a frame z_1 of length L_window; and so on, yielding z_0 to z_p, where p = L_window − 1 = 1023.

(3) Using the same transform as the signal conversion sub-module 430, the audio frames 429 (e.g., z_0 to z_p) are converted into audio parameters 439 (e.g., h_0 to h_p), and the symbol decoding sub-module 440 produces, for h_0 to h_p, the decoded symbols 449 (e.g., sync_sbl_0 to sync_sbl_p) and the decoding strength scores 448 (e.g., sync_score_0 to sync_score_p). If sync_score_n is the maximum, next_sbl_start is updated according to the position of z_n, the decoded symbol sync_sbl_n is recorded in the restored symbol and time sequence 490, and the decoding strength score 448 becomes score = sync_score_n.

The frame cutting sub-module 420 of FIG. 5 cuts, starting at the nearest upcoming symbol start position 419 (e.g., next_sbl_start), an audio frame 429 (e.g., z_{next_sbl_start}) whose length is the frame length 193 (e.g., L_window = 1024), and then updates the decoded end position 428 to offset = next_sbl_start + L_window − 1.

The signal conversion sub-module 430 of FIG. 5 converts the audio frame 429 (e.g., z_{next_sbl_start}) into audio parameters 439 (e.g., h_{next_sbl_start}) using the specified signal transform. The transform used by the signal conversion sub-module 430 must be the same as the one used by the signal conversion sub-module 320 of the audio watermark embedding module 300.

The symbol decoding sub-module 440 of FIG. 5 arranges the elements of the audio parameters 439 (e.g., h_{next_sbl_start}) that carry the embedded watermark, in embedding order, as h_ordered; in this embodiment, h_ordered = h_{next_sbl_start}[512:1023]. The sub-module then computes, one by one, the similarity similarity_i between h_ordered and each base byte C_i in the symbol and base byte correspondence table 198; cosine similarity may be used. The symbol with the maximum similarity is the decoded symbol 449 (e.g., Sdec_n), which is appended as the last symbol of the temporary restored symbol sequence (e.g., tmp_symbol_seq). The time difference between this symbol and the recording start point is tdec = next_sbl_start/SR, and tdec is appended as the last number of the temporary restored time sequence (e.g., tmp_time_seq). The decoding strength score 448 is set to this maximum similarity, score = similarity_i, and the decoded end position 428 is updated to offset' = offset + L_window.
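The similarity-based decoding step can be illustrated with a short Python sketch. The names `cosine_similarity` and `decode_symbol` are hypothetical, the table is modeled as a plain dictionary from symbol to base vector, and the caller is assumed to have already extracted h_ordered (in Python's half-open slicing, elements 512 to 1023 inclusive would be `h[512:1024]`).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def decode_symbol(h_ordered, table):
    """Return (symbol, score) for the base vector most similar to h_ordered.

    table maps each symbol to its base vector; score is the maximum
    cosine similarity and plays the role of the decoding strength score.
    """
    best_symbol, best_score = None, -2.0  # cosine similarity is always >= -1
    for symbol, base in table.items():
        score = cosine_similarity(h_ordered, base)
        if score > best_score:
            best_symbol, best_score = symbol, score
    return best_symbol, best_score
```

Because cosine similarity ignores the overall scale of h_ordered, the decoder in this sketch does not need to know the gain that was used at embedding time.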

FIG. 17 is a schematic diagram of the restored symbol and time sequence 490 of the present invention. As shown in FIG. 5 and FIG. 17, the symbol sequence combination sub-module 450 concatenates tmp_symbol_seq and tmp_time_seq into the restored symbol and time sequence 490, which contains the restored symbol index DEC_SIX, the restored symbol DEC_S, and the time difference DEC_T between the restored symbol and the recording start point.

FIG. 18 is a schematic diagram of the data table of restored synchronization symbol groups and restored payload symbol groups of the present invention. As shown in FIG. 6 and FIG. 18, the symbol sequence cutting sub-module 510 of the symbol sequence interpretation module 500 matches the restored symbol and time sequence 490 against the synchronization symbol group definition 191 and cuts it accordingly, producing the data table of synchronization symbol groups and payload symbol groups.

FIG. 19 is a schematic diagram of the restored action symbol group script with error correction 519 of the present invention. As shown in FIG. 6 and FIG. 19, the symbol sequence cutting sub-module 510 assembles and decodes, according to the packet symbol group definition 192, the restored action symbol group script with error correction 519, which contains the decoded packet number DPSN, the restored packet number symbol group DEC_SSPSN, the restored action code symbol group with error correction DEC_ECSSIX_action, and the symbol group with error correction DEC_ECSSSI_offset for the time difference between the restored execution time and the start of the packet in which it is embedded. In addition, the symbol sequence cutting sub-module 510 takes the time difference from the recording start point that is recorded for the first symbol of the packet in the data table of synchronization symbol groups and payload symbol groups, and records it as the time difference between the packet start and the recording start point, REC_T_packet.

FIG. 20 is a schematic diagram of the restored action symbol group script 529 of the present invention. As shown in FIG. 6 and FIG. 20, the error correction sub-module 520 applies the method specified by the error correction mechanism 194 to the restored action symbol group script with error correction 519: DEC_ECSSIX_action and DEC_ECSSSI_offset are restored to the restored action code symbol group DEC_SSIX_action and to the symbol group DEC_SSSI_offset for the time difference between the restored execution time and the start of its packet, respectively. Together with the time difference between the packet start and the recording start point, REC_T_packet, these form the restored action symbol group script 529.

FIG. 21 is a schematic diagram of the error correction method of the present invention. In the embodiment of FIG. 20, taking restored packet number 2 as an example, its action code symbol group with error correction is "001200100010001000103010". The error correction method specified by the error correction mechanism 194 is voting over six repetitions of a four-symbol group. Following the voting process shown in FIG. 21, the action code symbol group after error correction (the voting result) is "0010".
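The repetition-and-vote scheme in this example can be sketched as follows. The function name `majority_vote` and its parameters are hypothetical; the tie-breaking rule (the symbol seen first among equally frequent ones wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(symbols, group_len=4, repeats=6):
    """Recover a group of group_len symbols sent `repeats` times in a row.

    For each position within the group, the most frequent symbol across
    the repetitions is kept. Ties fall to the symbol seen first
    (an assumption; the source does not specify tie handling).
    """
    if len(symbols) != group_len * repeats:
        raise ValueError("unexpected symbol string length")
    groups = [symbols[i * group_len:(i + 1) * group_len] for i in range(repeats)]
    voted = []
    for pos in range(group_len):
        counts = Counter(group[pos] for group in groups)
        voted.append(counts.most_common(1)[0][0])
    return "".join(voted)
```

Applied to the string from the example, the six groups are 0012, 0010, 0010, 0010, 0010, and 3010; the single errors in the first and last groups are outvoted at their positions.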

The restored action code script generation sub-module 530 of FIG. 5 converts the symbol groups in the restored action symbol group script 529 back into text according to the symbol set definition 195, and from them generates the restored action code script 580 and the phase shift information 590 between the recording and the start time of the original audio.

FIG. 22 is a schematic diagram of the restored action codes and time difference sequence of the present invention. As shown in FIG. 5 and FIG. 22, the way the restored action code script generation sub-module 530 converts symbol groups back into text must correspond to the text-to-symbol-group conversion sub-module 220. Decoding examples are given in FIG. 22 and in items (1) to (3) below:

(1) The restored action code DEC_IX_action is obtained by converting the restored action code symbol group DEC_SSIX_action, which is expressed in base 4, back to decimal. For example, if DEC_SSIX_action is "0010", DEC_IX_action is "4".

(2) The restored packet number DEC_PSN is obtained by converting the restored packet number symbol group DEC_SSPSN, which is expressed in base 4, back to decimal. For example, if DEC_SSPSN is "00000103", DEC_PSN is "19".

(3) The time difference DEC_SI_offset between the restored execution time and the start of its packet is obtained by converting the base-4 symbol group DEC_SSSI_offset back to decimal and multiplying by 1000. For example, if DEC_SSSI_offset is "010201", its decimal value is "289" and DEC_SI_offset is "289000".
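The three conversions above are plain base-4 to decimal conversions and can be checked with a small Python sketch. The helper name `quaternary_to_int` is hypothetical, and symbols are assumed to be the digit characters 0 to 3.

```python
def quaternary_to_int(symbols):
    """Convert a base-4 symbol string such as "0010" to its decimal value."""
    # int() with an explicit base accepts leading zeros and rejects digits above 3.
    return int(symbols, 4)


dec_ix_action = quaternary_to_int("0010")           # restored action code
dec_psn = quaternary_to_int("00000103")             # restored packet number
dec_si_offset = quaternary_to_int("010201") * 1000  # offset after scaling by 1000
```

These reproduce the values in the examples: 4, 19, and 289000 (that is, 289 multiplied by 1000).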

FIG. 23 is a schematic diagram of the restored action code script 580 of the present invention. As shown, the restored action code script 580 contains the restored action code DEC_IX_action, the time difference in samples between the restored execution time and the original audio start point, DEC_SI_action, and the same time difference in milliseconds, DEC_T_action. DEC_SI_action is computed from the restored packet number DEC_PSN, the packet sample length SL_packet (written L_packet in the formula), and the offset DEC_SI_offset between the restored execution time and the start of its packet: DEC_SI_action = DEC_PSN * L_packet + DEC_SI_offset. DEC_T_action is computed as DEC_T_action = DEC_SI_action / SR.
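As a worked illustration of the two formulas, the sketch below computes both values for one packet. The function name `action_time` and the numeric values in the usage note are hypothetical. Note that dividing a sample count by a sampling rate given in samples per second yields seconds; the source labels DEC_T_action in milliseconds, so a factor of 1000 may be implied, and the sketch follows the formula exactly as written.

```python
def action_time(dec_psn, l_packet, dec_si_offset, sample_rate):
    """Return (DEC_SI_action, DEC_T_action) from the formulas in the text.

    DEC_SI_action = DEC_PSN * L_packet + DEC_SI_offset   (in samples)
    DEC_T_action  = DEC_SI_action / SR                   (as written in the source)
    """
    dec_si_action = dec_psn * l_packet + dec_si_offset
    dec_t_action = dec_si_action / sample_rate
    return dec_si_action, dec_t_action
```

For example, with an assumed packet length of 48000 samples at a sampling rate of 48000, packet number 2 and an offset of 24000 samples place the action 120000 samples, or 2.5 time units, after the start of the original audio.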

FIG. 24 is a schematic diagram of the phase shift information 590 between the recording and the start time of the original audio in the present invention. As shown, the phase shift information 590 can be estimated as the median of the differences between the time difference (in milliseconds) of the packet start point from the original audio start point, DEC_T_packet, and the time difference between the packet start and the recording start point, REC_T_packet, where DEC_T_packet = DEC_PSN * L_packet / SR.
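The median-based estimate can be sketched as follows. The function name `phase_shift` is hypothetical, and the sign convention (DEC_T_packet minus REC_T_packet) is an assumption, since the text only says that the median of the gaps is used. Both time values are assumed to be expressed in the same unit.

```python
from statistics import median


def phase_shift(dec_psn_list, rec_t_packet_list, l_packet, sample_rate):
    """Estimate the recording-to-original offset as a median over decoded packets.

    For each packet, DEC_T_packet = DEC_PSN * L_packet / SR is compared with
    the observed REC_T_packet. The difference is taken as
    DEC_T_packet - REC_T_packet (sign convention assumed).
    """
    diffs = [
        psn * l_packet / sample_rate - rec_t
        for psn, rec_t in zip(dec_psn_list, rec_t_packet_list)
    ]
    return median(diffs)
```

Using the median rather than the mean keeps the estimate stable when a few packets are decoded at a wrong position.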

FIG. 25 is a schematic flowchart of the audio watermarking method of the present invention. As shown in FIG. 1 and FIG. 25, the main technical content of the method is as follows; the remaining technical content is as described for FIG. 1 to FIG. 24 above and is not repeated here.

In step S1 of FIG. 25, the audio watermark transmission configuration selection module 100 selects the audio watermark transmission configuration parameters 190 according to the user preference settings 010 and the background noise samples 021.

In step S2 of FIG. 25, the symbol sequence generation module 200 converts the action code script 040 that the user wishes to embed into the embedded symbol sequence 290, according to the audio watermark transmission configuration parameters 190 selected by the audio watermark transmission configuration selection module 100.

In step S3 of FIG. 25, the audio watermark embedding module 300 embeds the embedded symbol sequence 290 from the symbol sequence generation module 200 into the original audio 050 to produce the watermark-embedded audio 390.

In step S4 of FIG. 25, the audio watermark decoding module 400 decodes the watermark-embedded audio 390 from the audio watermark embedding module 300 into the restored symbol and time sequence 490.

In step S5 of FIG. 25, the symbol sequence interpretation module 500 interprets the restored symbol and time sequence 490 from the audio watermark decoding module 400 into the restored action code script 580 and the phase shift information 590 between the recording and the start time of the original audio.

FIG. 26 is a schematic diagram showing how the audio watermarking system and method of the present invention synchronize the clock and information of a transmitting device A and a receiving device B.

As shown in FIG. 1 and FIG. 26, the audio watermarking system and method of the present invention can be used with a transmitting device A having a speaker A1 and a receiving device B having a microphone B1. Through the system (e.g., the audio watermark embedding module 300) or the method, the transmitting device A embeds an embedded symbol sequence (e.g., text information 051) into the original audio 050 (e.g., multimedia audio) to produce the watermark-embedded audio 390, and plays the watermark-embedded audio 390 through its speaker A1 so that it reaches the receiving device B via the microphone B1. Through the system (e.g., the audio watermark decoding module 400) or the method, the receiving device B then decodes from the watermark-embedded audio 390 the restored action code script (e.g., text information 051) and the phase shift information 590 between the recording and the start time of the original audio.

In addition, in FIG. 1 and FIG. 26 above, the audio watermarking system and method of the present invention provide a two-layer architecture consisting of a symbol coding layer and an audio coding layer. The symbol coding layer performs the encoding, serialization, or error correction of the action symbol group script, and the audio coding layer uses several groups of base bytes to embed or hide the embedded symbol sequence to be transmitted in the original audio.

In summary, the audio watermarking system and method of the present invention may have at least one of the following technical effects or advantages.

(1) The present invention can select different audio watermark transmission configuration parameters according to different user preference settings and background noise samples, and thereby adjust the transmission speed or accuracy of the watermark-embedded audio, or select the fastest transmission speed at an acceptable accuracy in a customized manner.

(2) Through its time axis adjustment and timestamp transmission mechanism, the present invention lets the receiving device obtain the playback timestamp information of the transmitting device, achieving multi-screen synchronization. Furthermore, through the special sequence arrangement and packet design of the transmitting device, the receiving device can learn the current multimedia playback time of the transmitting device and the action codes that must be completed at specified times.

(3) Through the system or method of the present invention, a transmitting device having a speaker can embed an embedded symbol sequence (e.g., text information) into the original audio (e.g., multimedia audio) to produce watermark-embedded audio, and transmit it through the speaker to a receiving device having a microphone. Through the system or method of the present invention, the receiving device can decode from the watermark-embedded audio the restored action code script (e.g., text information) and the phase shift information between the recording and the start time of the original audio.

(4) The present invention proposes a two-layer architecture consisting of a symbol coding layer and an audio coding layer, which is applicable to various audio watermark embedding methods and error protection mechanisms and gives both application and development sides greater flexibility. The symbol coding layer performs the encoding, serialization, or error correction protection of the action symbol group script, and the audio coding layer uses several groups of base bytes to embed or hide the embedded symbol sequence to be transmitted in the original audio. The two-layer coding architecture and the clear processing flow make the system or method more flexible and adaptable to different application scenarios.

For example, the application scenarios of the audio watermarking system and method of the present invention may include, but are not limited to, the following.

(1) Multi-screen real-time interactive games or programs: synchronizing multiple screens with audio watermarks enables interactive quiz programs in which many people answer, and allows different advertisements to be pushed to different users.

(2) Electronic signage, concerts, or large-scale events: broadcasting information with audio watermarks to establish synchronization among the audience.

(3) Copyright protection: providing a media copyright protection mechanism with audio watermarks.

(4) Offline communication between electronic products: exchanging information with audio watermarks to achieve an Internet of Things effect.

The above embodiments merely illustrate the principles, features, and effects of the present invention and are not intended to limit its implementable scope. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. Any equivalent changes and modifications made using the disclosure of the present invention shall still be covered by the claims. The scope of protection of the present invention is therefore as set out in the claims.

Claims

A system with audio watermark, comprising: an audio watermark transmission configuration selection module, which selects audio watermark transmission configuration parameters according to user preference settings and background noise samples; a symbol sequence generation module, which converts an action code script to be embedded into an embedded symbol sequence according to the audio watermark transmission configuration parameters selected by the audio watermark transmission configuration selection module; an audio watermark embedding module, which embeds the embedded symbol sequence from the symbol sequence generation module into original audio to produce watermark-embedded audio; an audio watermark decoding module, which decodes the watermark-embedded audio from the audio watermark embedding module into a restored symbol and time sequence; and a symbol sequence interpretation module, which interprets the restored symbol and time sequence from the audio watermark decoding module into a restored action code script and phase shift information between the recording and the start time of the original audio.

The system of claim 1, wherein the audio watermark transmission configuration selection module has a background noise attribute analysis sub-module, which analyzes background noise attributes from the background noise samples through an audio conversion algorithm, the background noise attributes including a volume attribute and a dynamic range attribute of the background noise samples.

The system of claim 2, wherein the audio watermark transmission configuration selection module further has an audio watermark communication protocol decision sub-module, which determines the audio watermark transmission configuration parameters according to the background noise attributes and the user preference settings.

The system of claim 3, wherein the audio watermark transmission configuration selection module further has a base byte generation sub-module, which generates the specified number of base bytes according to the number of symbols in the set and the base byte length specified in the symbol set definition of the audio watermark transmission configuration parameters, and records those base bytes in the symbol and base byte correspondence table specified by the audio watermark transmission configuration parameters.

The system of claim 1, wherein the symbol sequence generation module has an action code time axis adjustment sub-module, which determines the position of the packet carrying the embedded information according to the packet symbol group definition, the frame length, and the estimated decoding time of the audio watermark transmission configuration parameters, and generates an action code time axis adjustment script.

The system of claim 5, wherein the symbol sequence generation module further has a text-to-symbol-group conversion sub-module, which converts the text of the action code time axis adjustment script into a sequence of symbol groups according to the symbol set definition of the audio watermark transmission configuration parameters.

The system of claim 6, wherein the symbol sequence generation module further has an error correction code operation sub-module, which applies the error correction mechanism specified by the audio watermark transmission configuration parameters to the action code symbol group in the action symbol group script and to the symbol group of the time difference between the action execution and the packet start point, to generate an action symbol group script with error correction.

The system of claim 7, wherein the symbol sequence generation module further has a symbol sequence combination sub-module, which combines the action symbol group script with error correction into the embedded symbol sequence according to the symbol set definition, the packet symbol group definition, and the synchronization symbol group definition of the audio watermark transmission configuration parameters.

The system of claim 1, wherein the audio watermark embedding module embeds the embedded symbol sequence into the original audio, according to the transmission configuration specified by the frame length of the audio watermark transmission configuration parameters, to produce the watermark-embedded audio.

The system of claim 1, wherein the audio watermark embedding module has a frame cutting sub-module and a signal conversion sub-module, the frame cutting sub-module sequentially cutting the original audio into audio frames of the frame length specified by the audio watermark transmission configuration parameters, and the signal conversion sub-module applying a signal transform to each cut audio frame to obtain audio parameters.

The system of claim 10, wherein the audio watermark embedding module further has an audio watermark gain decision sub-module and a symbol frame cutting sub-module, the audio watermark gain decision sub-module computing the average volume from the audio parameters and determining, with reference to the audio watermark gain of the previous frame, the audio watermark gain to be embedded in the current frame, and the symbol frame cutting sub-module taking an embedded symbol frame out of the embedded symbol sequence to designate the symbol to be embedded in the current frame.

The system of claim 11, wherein the audio watermark embedding module further has a symbol frame embedding sub-module, which, according to the symbol and base byte correspondence table specified by the audio watermark transmission configuration parameters and the audio watermark gain of the current frame, hides the symbol of the embedded symbol frame in the audio parameters by means of its corresponding base byte, to obtain watermark-embedded audio parameters.

The system of claim 12, wherein the audio watermark embedding module further has an inverse signal conversion sub-module and a frame combination sub-module, the inverse signal conversion sub-module applying the inverse signal transform to the watermark-embedded audio parameters to convert them back into a frame that becomes a watermark-embedded audio frame, and the frame combination sub-module combining the watermark-embedded audio frames in order to reconstruct the watermark-embedded audio.

The system of claim 1, wherein the audio watermark decoding module, according to the transmission configuration specified by the audio watermark transmission configuration parameters, decodes the restored symbol and time sequence from watermark-embedded audio samples containing background noise, the samples being the watermark-embedded audio recorded from a random starting point after passing through the air channel and picking up background noise.

The system according to item 1 of the scope of patent application, wherein the audio watermark decoding module has a symbol start position search sub-module which, in the embedded watermark audio samples containing background noise whose starting point is unknown, searches from the decoded end position for the closest future symbol start position according to the audio frame length and the symbol and base byte correspondence table specified by the audio watermark transmission configuration parameter.

The system according to item 15 of the scope of patent application, wherein the audio watermark decoding module further has an audio frame cutting sub-module, which cuts an audio frame of the audio frame length starting from the closest future symbol start position, and then updates the decoded end position to the closest future symbol start position plus the audio frame length.
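
The cut-and-advance bookkeeping in this claim can be sketched as follows. Positions are assumed to be sample indices into the recording, and the function name and the `None` return for a short recording are illustrative choices, not part of the claim.

```python
def cut_frame(samples, symbol_start, frame_length):
    """Cut one frame starting at `symbol_start` and advance the decoded end position.

    Returns (frame, new_decoded_end), or None if fewer than `frame_length`
    samples remain after `symbol_start`.
    """
    end = symbol_start + frame_length
    if end > len(samples):
        return None
    frame = samples[symbol_start:end]
    # The decoded end position becomes the symbol start position plus the frame length.
    return frame, end


# Example: a 10-sample recording, symbol found at index 3, frame length 4
result = cut_frame(list(range(10)), symbol_start=3, frame_length=4)
```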

The system according to item 16 of the scope of patent application, wherein the audio watermark decoding module further has a signal conversion sub-module and a symbol decoding sub-module; the signal conversion sub-module converts the audio frame into audio parameters through signal conversion, and the symbol decoding sub-module calculates, one by one, the distance between the audio parameters and each base byte in the symbol and base byte correspondence table, takes that distance as the decoding intensity score, and takes the symbol corresponding to the shortest distance as the decoded symbol.
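
The shortest-distance decoding described in this claim can be sketched as a nearest-neighbour lookup. The claim does not name the distance measure, so the Euclidean distance used here is an assumption, as are the dictionary-shaped table and the function name.

```python
import math


def decode_symbol(audio_params, table):
    """Return (decoded_symbol, score) for one frame's audio parameters.

    `table` maps each symbol to its base vector. The distance to every base
    vector is computed one by one; the shortest distance is kept as the
    decoding intensity score and its symbol is the decoded symbol.
    """
    scores = {sym: math.dist(audio_params, base) for sym, base in table.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Example with a hypothetical two-symbol table
table = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
symbol, score = decode_symbol([0.9, 0.2], table)
```

With this measure a smaller score means a more confident decision, so any later thresholding on the score would reject large values rather than small ones.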

The system according to item 17 of the scope of patent application, wherein the audio watermark decoding module further has a symbol sequence combination sub-module, which takes the decoded symbols from the symbol decoding sub-module, derives from the closest future symbol start position the time difference between each decoded symbol and the recording start point, and concatenates them into the restored symbol and time series.
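
One way to picture the restored symbol and time series is as a list of (symbol, seconds since recording start) pairs. The sketch below assumes symbol start positions are sample indices and that the sample rate is known; both the data layout and the function name are illustrative assumptions.

```python
def build_time_series(decoded, sample_rate):
    """Turn (symbol, start_sample) pairs into (symbol, seconds_from_recording_start) pairs."""
    return [(sym, start / sample_rate) for sym, start in decoded]


# Example: two symbols found at samples 0 and 22050 of a 44.1 kHz recording
series = build_time_series([("A", 0), ("B", 22050)], sample_rate=44100)
```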

The system according to item 1 of the scope of patent application, wherein the symbol sequence interpretation module has a symbol sequence cutting sub-module which, according to the synchronization symbol group definition specified by the audio watermark transmission configuration parameter, compares the restored symbol and time series against the synchronization symbol group, and decodes the restored action symbol group script with error correction according to the packet symbol group definition specified by the audio watermark transmission configuration parameter.
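
The synchronization-and-cutting step can be sketched as a scan for the synchronization symbol group followed by taking a packet of symbols. The claim leaves the packet symbol group definition to the configuration parameter, so the fixed `packet_length` used here, the (symbol, time) pair layout, and the function name are assumptions.

```python
def cut_packets(series, sync_group, packet_length):
    """Find each synchronization symbol group and return the packet that follows it.

    `series` is a list of (symbol, time) pairs; `sync_group` is the sequence
    of symbols marking a packet start; each packet is assumed to hold exactly
    `packet_length` symbols.
    """
    symbols = [sym for sym, _ in series]
    sync = list(sync_group)
    packets = []
    i = 0
    while i + len(sync) + packet_length <= len(symbols):
        if symbols[i:i + len(sync)] == sync:
            start = i + len(sync)
            packets.append(series[start:start + packet_length])
            i = start + packet_length  # resume the scan after this packet
        else:
            i += 1
    return packets


# Example: one packet of two symbols after the sync group ("S", "Y")
series = [("X", 0.0), ("S", 0.1), ("Y", 0.2), ("A", 0.3), ("B", 0.4), ("S", 0.5)]
packets = cut_packets(series, ["S", "Y"], packet_length=2)
```

Keeping the time stamps attached to each packet symbol lets a later stage relate the packet to the recording start point.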

The system according to item 19 of the scope of patent application, wherein the symbol sequence interpretation module further has an error correction sub-module, which restores the action code symbol group with error correction and the symbol group with error correction for the time difference between the execution action and the original audio start point, both contained in the restored action symbol group script with error correction, to the action code symbol group and the symbol group for the time difference between the execution action and the original audio start point, respectively.

The system according to item 20 of the scope of patent application, wherein the symbol sequence interpretation module further includes a restored action code script generation sub-module, which reads the symbol groups in the error-corrected restored action symbol group script back into text, and on that basis generates the restored action code script and the time-phase shift information between the recording and the original audio start time.

The system described in item 1 of the scope of patent application is applied to a transmitting device and a receiving device; the transmitting device uses the system to embed the embedded symbol sequence into the original audio to produce the embedded watermark audio, and the embedded watermark audio is then transmitted by the transmitting device to the receiving device, so that the receiving device uses the system to decode, from the embedded watermark audio, the restored action code script and the time-phase shift information between the recording and the original audio start time.

The system described in item 1 of the scope of patent application further includes a two-layer structure of a symbol encoding layer and an audio encoding layer, wherein encoding, serialization, or error correction of the action symbol group script is performed at the symbol encoding layer, and the embedded symbol sequence to be transmitted is embedded or hidden in the original audio using an array of base bytes at the audio encoding layer.

A method with an audio watermark, including the following steps: selecting an audio watermark transmission configuration parameter according to a user preference setting and a background noise sample; converting an action code script to be embedded into an embedded symbol sequence according to the selected audio watermark transmission configuration parameter; embedding the embedded symbol sequence into original audio to produce embedded watermark audio; decoding a restored symbol and time series from the embedded watermark audio; and interpreting the restored symbol and time series as a restored action code script and time-phase shift information between the recording and the original audio start time.

The method according to item 24 of the scope of patent application is applied to a transmitting device and a receiving device; the transmitting device uses the method to embed the embedded symbol sequence into the original audio to produce the embedded watermark audio, and the embedded watermark audio is then transmitted by the transmitting device to the receiving device, so that the receiving device uses the method to decode, from the embedded watermark audio, the restored action code script and the time-phase shift information between the recording and the original audio start time.

The method according to item 24 of the scope of patent application, further comprising providing a two-layer structure of a symbol encoding layer and an audio encoding layer, wherein encoding, serialization, or error correction of the action symbol group script is performed at the symbol encoding layer, and the embedded symbol sequence to be transmitted is embedded or hidden in the original audio using an array of base bytes at the audio encoding layer.