TWI377559B - Singing system with situation sound effect and method thereof - Google Patents


Info

Publication number
TWI377559B
Authority
TW
Taiwan
Prior art keywords
voice
parameter
singing
song
sound effect
Prior art date
Application number
TW97150672A
Other languages
Chinese (zh)
Other versions
TW201025289A (en)
Inventor
Jim W Chen
Po Ling Chang
Hilda Wang
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to TW97150672A priority Critical patent/TWI377559B/en
Publication of TW201025289A publication Critical patent/TW201025289A/en
Application granted granted Critical
Publication of TWI377559B publication Critical patent/TWI377559B/en

Links

Description

Description of the Invention

[Technical Field] The present invention relates to a singing system and a method thereof, and in particular to a singing system with situational sound effects, and a method thereof, that analyze the singing voice, video images, and the tune of a song in order to load suitable sound-effect audio.

[Prior Art] In recent years, with the rapid development of the semiconductor industry and growing interest in home entertainment, many activities that used to require going out, such as karaoke, can now be enjoyed at home with a karaoke machine. As time has passed, however, users are no longer satisfied with machines that merely play back songs, so how to add functions to the karaoke machine has become a problem every manufacturer is eager to solve.

Generally speaking, part of what makes karaoke entertaining, besides expressing oneself through song, is interaction with other listeners. When a person sings alone, no audience is present to give appropriate feedback such as cheering or applause, so the experience lacks entertainment value. In view of this, some vendors have proposed singing over a network so that the singer can interact with remote listeners in real time. However, network access is not available everywhere, nor is an audience always willing to listen, so online singing is still not enough to solve the problem of insufficient entertainment.
In summary, the prior art has long suffered from the problem that solo singing is insufficiently entertaining, so it is necessary to propose improved technical means to solve this problem.

[Summary of the Invention] The singing system with situational sound effects disclosed by the present invention comprises a song database, a singing module, a speech analysis module, an image recognition module, a tune analysis module, a processing module, and a sound effect module. The song database stores song audio and sound-effect audio, each sound-effect audio corresponding to a threshold interval. The singing module receives a selection condition and plays one of the songs according to the selection condition. The speech analysis module receives the singing voice and produces an emotion parameter after analyzing the singing voice according to a speech algorithm. The image recognition module captures video images and produces an expression parameter after recognizing the video images according to a facial algorithm. The tune analysis module analyzes the song being played according to a spectrum algorithm and produces a tune parameter according to the analysis result. The processing module loads at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to compute a threshold value. The sound effect module compares the threshold value with the threshold intervals and, according to the comparison result, loads and plays the sound-effect audio corresponding to the matching threshold interval.
As for the singing method with situational sound effects of the present invention, its steps include: providing song audio and sound-effect audio, each sound-effect audio corresponding to a threshold interval; receiving a selection condition and playing one of the songs according to the selection condition; receiving the singing voice and producing an emotion parameter after analyzing it according to a speech algorithm; capturing video images and producing an expression parameter after recognizing them according to a facial algorithm; analyzing the song being played according to a spectrum algorithm and producing a tune parameter according to the analysis result; loading at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to compute a threshold value; and comparing the threshold value with the threshold intervals and, according to the comparison result, loading and playing the sound-effect audio corresponding to the matching interval.

The system and method disclosed above differ from the prior art in that the present invention computes a threshold value by analyzing the singing voice, the video images, and the tune of the song, and compares this threshold value with the threshold intervals corresponding to the sound-effect audio, so that a suitable sound effect is loaded and played according to the comparison result. Through these technical means, the present invention achieves the technical effect of making singing more entertaining.

[Embodiments] The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the process by which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented.
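Read as a procedure, the claimed steps form a small pipeline. The sketch below is a non-authoritative illustration: the parameter values are the ones used later in the description's examples, but the helper names and the plain summation used to combine the parameters are assumptions — the patent only says the combination uses arithmetic operations or a Fourier transform, and leaves the analysis algorithms themselves unspecified.

```python
# Hypothetical sketch of method steps 201-207. The extractors for the
# emotion/expression/tune parameters are stand-ins; only the
# interval-matching behaviour follows the text directly.

EFFECTS = {(1, 100): "a.mp3", (101, 200): "b.wav"}  # interval -> effect file

def compute_threshold(parameters):
    """Step 206: combine the loaded parameters into one threshold value.

    Plain summation is an assumption standing in for the unspecified
    'arithmetic operations or Fourier transform'.
    """
    return sum(parameters)

def pick_effect(threshold):
    """Step 207: the effect whose threshold interval contains the value."""
    for (low, high), effect in EFFECTS.items():
        if low <= threshold <= high:
            return effect
    return None  # no interval matched; play nothing

# Parameter values from the description's example: emotion 20, expression 1, tune 10.
threshold = compute_threshold([20, 1, 10])
effect = pick_effect(threshold)
print(threshold, effect)  # 31 a.mp3
```

The interval dictionary mirrors the lookup-table scheme the description proposes for associating each sound-effect file with its threshold interval.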
Before the disclosed singing system with situational sound effects and its method are described, the environment in which they are used is introduced. The present invention can be applied in an electronic device connected to a sound-receiving device, a camera, and a speaker. The electronic device stores the song audio and sound-effect audio and has a digital signal processing (DSP) unit that processes the sound and video obtained by the sound-receiving device and the camera. In practice, this DSP unit, which comprises the speech analysis module, the image recognition module, the tune analysis module, and the processing module, can be implemented by software, hardware, or a combination of both; its processing flow is explained in detail below with the drawings.

The singing system with situational sound effects and its method are now described with reference to the drawings. Fig. 1 is a block diagram of the singing system with situational sound effects of the present invention, comprising a song database 101, a singing module 102, a speech analysis module 103, an image recognition module 104, a tune analysis module 105, a processing module 106, and a sound effect module 107, together with hardware such as a sound-receiving device 110 and a camera 111. The sound-receiving device 110 is hardware, such as a microphone, for receiving sound in real time during singing (for example, the user's singing voice), while the camera 111 is hardware, such as a video camera, for capturing video images in real time during singing.
As mentioned above, the song database 101 stores song audio and sound-effect audio, each sound-effect audio corresponding to a threshold interval. Both are stored as files (for example, playable media files such as those with the extension MP3). The song audio is played for the user to sing along with, while the sound-effect audio provides listener feedback such as cheering and applause. In particular, each sound-effect audio corresponds to one threshold interval: for example, the sound effect "a.mp3" may correspond to the threshold interval "1~100". In practice, the correspondence between a sound effect and its threshold interval can be recorded in a lookup table stored in the song database 101, or the threshold interval can be used directly as the file name of the sound effect, such as "1~100.mp3". These schemes are for illustration only; the present invention does not limit the way a sound effect and its threshold interval are associated.

The singing module 102 receives a selection condition and plays one of the songs according to it. The selection condition can be entered by pressing function keys or by pointing and clicking. For example, suppose each song in the song database 101 is assigned a numeric track number; after the number of the desired song is entered on the keyboard, the singing module 102 takes the entered track number as the selection condition, loads the selected song from the song database 101, and plays it for the user to sing along with.

The speech analysis module 103 receives the singing voice and produces an emotion parameter after analyzing it according to a speech algorithm; after receiving the singing voice, the module may further filter out its background sound according to a filter parameter.
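The association between a sound effect and its threshold interval, described above as either a lookup table or an interval-encoded file name, can be sketched as follows. The table layout and the parsing helper are illustrative assumptions; the file names are the examples used in the text.

```python
# Scheme 1 (assumed layout): an explicit lookup table in the song
# database mapping each sound-effect file to its threshold interval.
effect_table = {"a.mp3": (1, 100), "b.wav": (101, 200)}

# Scheme 2: the interval is encoded directly in the effect's file name,
# e.g. "1~100.mp3" as in the text.
def interval_from_filename(name):
    """Parse a file name like '1~100.mp3' into the interval (1, 100)."""
    stem = name.rsplit(".", 1)[0]   # drop the extension
    low, high = stem.split("~")
    return int(low), int(high)

print(interval_from_filename("1~100.mp3"))  # (1, 100)
```

Either scheme yields the interval needed by the sound effect module's comparison step; the file-name scheme simply avoids storing a separate table.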
The speech analysis module 103 may filter the background sound out of the singing voice according to a filter parameter before analysis. The singing voice itself is received through the sound-receiving device 110 (for example, a microphone), and the speech algorithm analyzes the received voice and produces a corresponding emotion parameter. For example, if the singing voice is loud and clipped, the speech analysis module 103 analyzes it according to the speech algorithm and produces a correspondingly high emotion parameter, such as the value "80"; if the singing voice is mournful and drawn out, the analysis likewise produces a corresponding emotion parameter, such as the value "20". In this example, the emotion parameter obtained for a loud, clipped voice is greater than that for a mournful, drawn-out one, so the magnitude of the emotion parameter indicates the character of the singing voice and can be used to infer the singer's mood (for example, loud and clipped suggests "happy", while mournful and drawn out suggests "sad"). In other words, the speech analysis module 103 uses the speech algorithm to quantize the singing voice into the value of the emotion parameter; since such speech algorithms are known techniques, they are not described further here.

The image recognition module 104 captures video images and produces an expression parameter after recognizing them according to a facial algorithm. The video is received through the camera 111, and the facial algorithm comprises three stages: face detection, feature extraction, and expression recognition. The recognizable expressions include at least the six basic expressions of happiness, sadness, anger, surprise, disgust, and fear.
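The speech algorithm above is treated as known art; the text fixes only its behaviour — loud, clipped singing maps to a high emotion parameter, mournful, drawn-out singing to a low one. A toy stand-in with that behaviour can be sketched as follows; the features and weights are invented for illustration and are not the patent's algorithm.

```python
# Toy emotion quantizer: louder signals and shorter phrases score
# higher, mirroring the description's "80 vs 20" example. Not the
# actual (unspecified) speech algorithm.
import math

def emotion_parameter(samples, phrase_seconds):
    """Map RMS loudness and phrase length to a 0-100 emotion value."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    loudness = min(rms, 1.0)                 # samples assumed in [-1, 1]
    brevity = 1.0 / (1.0 + phrase_seconds)   # shorter phrase -> higher
    return round(100 * (0.7 * loudness + 0.3 * brevity))

loud_clipped = emotion_parameter([0.8, -0.9, 0.85] * 100, 0.4)
mournful_long = emotion_parameter([0.1, -0.12, 0.11] * 100, 3.0)
assert loud_clipped > mournful_long
```

Any quantizer with this ordering property would serve the later threshold computation equally well.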
Since the facial algorithm is a known technique, it is not described further here. In practice, the six basic expressions can be defined as distinct numeric values serving as the value of the expression parameter. For example, suppose happiness, sadness, anger, surprise, disgust, and fear are defined as the values "1", "2", "3", "4", "5", and "6" respectively; when the image recognition module 104 recognizes the video image as "happy" according to the facial algorithm, the expression parameter it produces has the value "1", and so on.

The tune analysis module 105 analyzes the song being played according to a spectrum algorithm and produces a tune parameter according to the analysis result. The spectrum algorithm includes the computation of at least one of volume, pitch, and timbre; that is, at least one of the volume, pitch, and timbre of the song being played is analyzed according to the spectrum algorithm, and the character of the song so identified is used to produce the tune parameter. For example, if the sound intensity (volume) of the song is low, its frequency (pitch) is low, and its timbre is soft, the tune analysis module 105 analyzes the song according to the spectrum algorithm and produces a corresponding tune parameter, such as the value "10"; if the volume is high, the pitch is high, and the timbre is sharp, it likewise produces a corresponding tune parameter. In this example, the value of the tune parameter corresponding to a gentle song is smaller than that corresponding to an impassioned one. These examples are given for convenience of explanation; the present invention does not limit the way the tune parameter is computed.
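The spectrum analysis above is characterized only by what it measures: volume, pitch, and timbre. A minimal sketch with NumPy is shown below, assuming RMS level for volume, the dominant FFT bin for pitch, and the spectral centroid as a crude timbre ("brightness") proxy; the equal weighting of the three features is an invention for illustration, not the patent's method.

```python
# Hypothetical spectrum features for the tune parameter (requires NumPy).
import numpy as np

def tune_parameter(frame, sample_rate):
    """Score one audio frame: quiet, low, soft input -> small value."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    volume = np.sqrt(np.mean(frame ** 2))                   # RMS level
    pitch = freqs[np.argmax(spectrum)]                      # dominant bin
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # brightness
    nyquist = sample_rate / 2
    v, p, c = min(volume, 1.0), pitch / nyquist, centroid / nyquist
    return round(100 * (v + p + c) / 3)

sr = 8000
t = np.arange(1024) / sr
gentle = 0.1 * np.sin(2 * np.pi * 110 * t)        # quiet, low-pitched
impassioned = 0.9 * np.sin(2 * np.pi * 1500 * t)  # loud, high-pitched
assert tune_parameter(gentle, sr) < tune_parameter(impassioned, sr)
```

As with the emotion parameter, only the ordering matters to the rest of the method: gentle material must score lower than impassioned material.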
In addition, since spectrum analysis is also a known technique in the field of audio analysis, it is not described further here.

The processing module 106 loads at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to compute a threshold value. The loading condition can be a preset value: for example, the value "1" means only the emotion parameter is loaded; the value "2" means only the expression parameter is loaded; the value "3" means the emotion and expression parameters are loaded; the value "4" means only the tune parameter is loaded; and so on, up to the value "7", which means the emotion, expression, and tune parameters are all loaded. This loading condition can be set by the user by pressing function keys or by pointing and clicking. The threshold value can be a number or text. For example, suppose the emotion parameter is the value "20", the expression parameter is the value "1", the tune parameter is the value "10", and the loading condition is the value "7" (that is, the emotion, expression, and tune parameters are all loaded); the processing module 106 computes over the values of the loaded parameters according to the loading condition (for example, by arithmetic operations or by a Fourier transform) to obtain the threshold value, such as the value "88".

It should be noted that the emotion, expression, and tune parameters described above can be produced continuously or at intervals; taking production at intervals as an example, the three parameters can be produced at a preset time interval (for example, every five seconds). It should also be noted that the present invention does not limit the loading condition and the threshold value to numbers; besides numeric values, the computed loading condition and threshold value can also be represented by text such as "A", "B", and so on.
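The enumeration of loading conditions above ("1" loads only the emotion parameter, "2" only the expression parameter, "3" both, "4" only the tune parameter, up to "7" for all three) is exactly a three-bit mask, one bit per parameter. A sketch of that reading follows; treating the condition as a bit mask is an interpretation of the table, not something the text states explicitly.

```python
# Loading condition read as a bit mask: bit 0 = emotion,
# bit 1 = expression, bit 2 = tune. This reproduces the text's table
# (1 = emotion only, 3 = emotion + expression, 7 = all three).

def loaded_parameters(condition, emotion, expression, tune):
    params = []
    if condition & 0b001:
        params.append(emotion)
    if condition & 0b010:
        params.append(expression)
    if condition & 0b100:
        params.append(tune)
    return params

# The description's example values: emotion 20, expression 1, tune 10.
assert loaded_parameters(7, 20, 1, 10) == [20, 1, 10]  # load all three
assert loaded_parameters(5, 20, 1, 10) == [20, 10]     # emotion + tune
assert loaded_parameters(2, 20, 1, 10) == [1]          # expression only
```

Condition "5" (emotion plus tune) is the setting used later for the example of a singer who keeps a blank face.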

音效模組107肖以將門檻值與門檻區間進行比對,並 根據比對、纟t果載人對應門;^區間之音效語音進行播放,舉 例來說,假設門檻值為數值“88”,其音效模組1〇7將根 據此門檻值與各音效語音所對應的門檻區間進行比對,若 經比對後得知門檻值在門檻區間的範圍内,例如:音效語 音“a.mp3”所對應的門檻區間為“丨〜丨㈨”,而門檻值 88位於門捏區間1〜1〇〇内,因此,載入對應門播 區間之音效語音“a.mp3”進行播放。 如「第2圖」所示,「第2圖」為本發明具情境音效 的歌唱方法之流程圖,包含下列步驟:提供歌曲語音及音 效語音’其中各音效語音分別對應一個門植區間(步驟 12 1377559 • 2G1);接收選擇條件’絲據麵條件麟歌曲語音之一 • (步驟2G2),接錄唱語音,並根據語音演算分析歌唱語 音後產生情緒參數(步驟2〇3);擷取視訊影像 ,並根據臉 冑演异辨I域tfLf彡像後產生表情參糾步驟2()4);根據 麵演算分浦財的歌岭音,並減分析結果產生曲 調錄(轉2〇5);根龍人條件載人情緒錄、表情參 數及曲調參數至少射之—料朗檻值(轉2G0);將 健服11騎行比對,並鐵比縣雜入對應門 播區間之音效語音進行播放(步驟2G7卜透過上述步驟, 即可透過分析歌唱語音、視訊影像及歌曲曲調來計算門植 值’並且將此Η檻值與音效語音所對應的門檻區間進行比 對’以便根據輯結果載人並播放合適的音效語音,用以 提面歌唱娱樂性。 以下配合「第3圖」至「第5圖」以實施例的方式進 行如下說明,如「第3圖」所示意,「第3圖」為應用本 • ㈣轉賴語音進储放之示:til。當個者要進行歌 唱時,可先透過歌曲選擇介面3〇〇中的曲目顯示區塊3〇2 查詢歌曲資料庫101所提供的歌曲語音,並透過曲號輸入 區塊301輸入认歌唱的歌曲語音之曲號後,點選開始元件 . 地確定進行歌唱’此時’歌唱模組102接收所輸入的曲 ' $作騎祕件’並且娜此選擇餅銳對應的歌曲語 曰:另外’使用者亦可透過點選重唱元件3〇4使播放的歌 曲語音重新播放,以達到重唱的目的。The sound effect module 107 compares the threshold value with the threshold interval, and plays the sound effect voice according to the comparison, 纟t fruit carrying corresponding door; ^ interval, for example, assuming the threshold value is "88", The sound effect module 1〇7 will compare the threshold value with the threshold interval corresponding to each sound effect voice, and if the threshold value is found to be within the threshold range, for example, the sound effect voice “a.mp3” The corresponding threshold interval is "丨~丨(9)", and the threshold value 88 is located in the door pinch interval 1~1〇〇, so the sound effect voice "a.mp3" corresponding to the gate interval is loaded for playback. As shown in "Fig. 2", "Fig. 
2" is a flow chart of a singing method with contextual sound effects of the present invention, which includes the following steps: providing a song voice and a sound effect voice, wherein each sound effect voice corresponds to a gate interval (step 12 1377559 • 2G1); Receive selection condition 'one of the silk condition speech of the silk surface condition• (step 2G2), record the vocal speech, and analyze the singing voice according to the speech calculus to generate emotional parameters (step 2〇3); Video image, and according to the face 胄 辨 I I I t I I I 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生 产生); root dragon conditions, manned emotional records, expression parameters and tune parameters at least shot - material reading value (transfer 2G0); will be compared to the health service 11 riding, and iron than the county mixed into the corresponding sound range voice Play (Step 2G7 Bu through the above steps, you can calculate the physiology value by analyzing the singing voice, video image and song tunes] and compare this threshold with the threshold interval corresponding to the sound effect voice. Manned and played The appropriate sound effect is used to enhance the entertainment of the face. The following is a description of the following examples in conjunction with "3" to "5", as shown in "3", "3" Application Note • (4) Recalling the voice into the storage: til. When you want to sing, you can first query the song provided by the song database 101 through the track display block 3〇2 in the song selection interface. After the voice is input through the track number input block 301, the song number of the song is recognized, and the start component is clicked. The song is determined to be 'singing' and the singing module 102 receives the input song '$ for the secret'. 
And then choose the song language corresponding to the pie sharp: In addition, the user can also replay the played song by clicking the re-singing component 3〇4 to achieve the purpose of re-singing.

Referring next to Fig. 4, which illustrates singing with the present invention: after the user selects a song, the display switches from the song selection interface 300 to the singing interface 400, and the song is played in the audio-visual display block 410. At this point the user can sing through the sound-receiving device 110, which receives the user's voice and passes it to the speech analysis module 103. The speech analysis module 103 receives this sound as the singing voice and analyzes it according to the speech algorithm to produce an emotion parameter. For example, if the user's singing (that is, the received singing voice) is mournful and drawn out, the speech analysis module 103 analyzes it according to the speech algorithm and quantizes it to produce an emotion parameter such as the value "20"; in this example, the lower the value of the emotion parameter, the sadder the user's mood.
In addition to the sound-receiving device 110 receiving the user's singing voice, the camera 111 captures real-time images of the user and passes them to the image recognition module 104. The image recognition module 104 takes the real-time images captured by the camera 111 as the video images and recognizes the user's face in them according to the facial algorithm to produce an expression parameter. In practice, the user's various expressions, such as happiness, sadness, anger, surprise, disgust, and fear, can be registered in advance to improve the accuracy with which the facial algorithm recognizes expressions. The video image can also be shown in the video display block 411 so that the user can see his or her own expression, and this block can be hidden or shown by clicking the video component 412.

While the speech analysis module 103 and the image recognition module 104 perform their analysis and recognition, the tune analysis module 105 analyzes the song being played according to the spectrum algorithm to produce a tune parameter. For example, if the song is a gentle ballad, the tune analysis module 105 produces a tune parameter such as the value "10" from the spectrum of the song (for example, its melody and timbre); in this example, the value of the tune parameter corresponding to a gentle song is smaller than that corresponding to an impassioned one.

Next, the processing module 106 loads the emotion parameter "20", the expression parameter "1", and the tune parameter "10" produced above according to a preset loading condition, such as the value "7", to compute the threshold value, such as the value "88"; the computation can be realized by arithmetic operations or by a Fourier transform.
In practice, the user can also press a function key, such as the "s" key on the keyboard, to open a setting window (not shown) for the loading condition. For example, when the user tends to keep a blank face while singing, the loading condition can be set to the value "5" (that is, load the emotion and tune parameters), so that the processing module 106 computes the threshold value from only the emotion parameter "20" and the tune parameter "10"; with an expressionless face, the expression parameter has little reference value and is not suitable as an input to the threshold computation.

When the threshold value has been computed, the sound effect module 107 compares it with the threshold intervals of the sound-effect audio in the song database 101 and, according to the comparison result, plays the matching sound-effect audio from the song database 101. For example, suppose the threshold value is the value "88" and the threshold interval corresponding to the sound effect "a.mp3" is "1~100"; the threshold value lies within this interval, so the comparison matches. If the threshold value had not been in the range of the value "1" to the value "100", the comparison would not have matched. Since the comparison matches in this example, the corresponding sound effect "a.mp3" is loaded from the song database 101 and played through a speaker device, such as a loudspeaker.

In addition, when the user does not want to continue singing, the user can click the return component 413 to stop playing the song and return from the current singing interface 400 to the song selection interface 300. The emotion parameter, the expression parameter, and the tune parameter determined above can also be displayed as text in the context display block 414.
For example, if the emotion parameter is the value "20", the expression parameter is the value "1", and the tune parameter is the value "10", they can be displayed as the words "sad", "happy", and "gentle" respectively. The values of the three parameters in this example and the words representing them are for explanation only; the present invention does not limit the way these three parameters are presented.

As shown in Fig. 5, which illustrates setting a threshold interval with the present invention: as mentioned above, each sound-effect audio corresponds to a threshold interval, and in practice the correspondence can be achieved with a lookup table, for example a table that records each threshold interval and its corresponding sound-effect audio. The threshold intervals can be configured through the setting interface 500: the user sets the numeric range of an interval in the threshold interval setting block 510, such as the values "101~200", together with the file name of the corresponding sound-effect audio, such as "b.wav", and after the settings are complete clicks the confirm component 520 to store them, or clicks the cancel component 530 to discard them. Note that when entering the file name "b.wav" of the sound-effect audio, the file's path can also be entered before the file name.
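The setting interface above maintains the interval-to-file table. A small sketch follows, assuming that entries are validated so that threshold intervals do not overlap; the patent does not state this check, but it is a natural constraint for an unambiguous lookup.

```python
# Hypothetical store behind the setting interface (Fig. 5): each
# confirmed entry maps a threshold interval to a sound-effect file.

def add_mapping(table, low, high, filename):
    """Add an interval -> file entry, rejecting overlaps (assumed rule)."""
    if low > high:
        raise ValueError("empty interval")
    for lo, hi in table:
        if low <= hi and lo <= high:  # the two intervals intersect
            raise ValueError(f"overlaps existing interval {lo}~{hi}")
    table[(low, high)] = filename

table = {(1, 100): "a.mp3"}
add_mapping(table, 101, 200, "b.wav")  # the example entered in Fig. 5
print(sorted(table.items()))  # [((1, 100), 'a.mp3'), ((101, 200), 'b.wav')]
```

With non-overlapping intervals, the sound effect module's comparison step resolves any threshold value to at most one sound effect.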
In summary, the difference between the present invention and the prior art is that a threshold value is calculated by analyzing the singing voice, the video image, and the song tune; the threshold value is compared with the threshold intervals corresponding to the sound effect voices; and the appropriate sound effect voice is loaded and played according to the comparison result, so that suitable sound effects are played in different situations. This solves the problems existing in the prior art and thereby achieves the technical effect of improving the entertainment value of singing. While the present invention has been described in the foregoing embodiments, they are not intended to limit the invention; the present invention may be modified and refined without departing from its spirit and scope, and the scope of patent protection shall be as defined in the appended claims. [Brief description of the drawings] Figure 1 is a block diagram of the singing system with contextual sound effects of the present invention. Figure 2 is a flow chart of the singing method with contextual sound effects of the present invention. Figure 3 is a schematic diagram of selecting songs using the present invention. Figure 4 is a schematic diagram of singing using the present invention. Figure 5 is a schematic diagram of setting the threshold interval using the present invention.
[Main component symbol description] 101 song database; 102 singing module; 103 voice analysis module; 104 image recognition module; 105 tune analysis module; 106 processing module; 107 sound effect module; 110 sound-receiving device; 111 photography device; 300 song selection interface; 301 track number input block; 302 track display block; 303 start component; 304 re-sing component; 400 singing interface; 410 video display block; 411 video display block; 412 video component; 413 return component; 414 context display block; 500 setting interface; 510 threshold interval setting block; 520 determine component; 530 cancel component. Step 201: provide at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval. Step 202: receive a selection condition, and play one of the song voices according to the selection condition. Step 203: receive a singing voice, and analyze the singing voice according to a voice algorithm to generate an emotion parameter. Step 204: capture a video image, and identify the video image according to a facial algorithm to generate an expression parameter. Step 205: analyze the song voice being played according to a spectrum algorithm, and generate a tune parameter according to the analysis result. Step 206: load at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value. Step 207: compare the threshold value with the threshold intervals, and load and play the sound effect voice corresponding to the matching threshold interval according to the comparison result.

Claims (1)

X. Claims:
1. A singing system with situational sound effects, comprising:
a song database for storing at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval;
a singing module for receiving a selection condition, and playing one of the song voices according to the selection condition;
a voice analysis module for receiving a singing voice, and analyzing the singing voice according to a voice algorithm to generate an emotion parameter;
an image recognition module for capturing a video image, and identifying the video image according to a facial algorithm to generate an expression parameter;
a tune analysis module for analyzing the song voice being played according to a spectrum algorithm, and generating a tune parameter according to the analysis result;
a processing module for loading at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value; and
a sound effect module for comparing the threshold value with the threshold intervals, and loading and playing the sound effect voice corresponding to the matching threshold interval according to the comparison result.
2. The singing system with situational sound effects of claim 1, wherein the threshold interval is a range of one of numerical values and text.
3. The singing system with situational sound effects of claim 1, wherein the singing voice is received through a sound-receiving device, and the video image is captured through a photography device.
4. The singing system with situational sound effects of claim 1, wherein the emotion parameter, the expression parameter, and the tune parameter are generated continuously or at intervals.
5. The singing system with situational sound effects of claim 1, wherein the voice analysis module further filters the background sound of the singing voice according to a filtering parameter after receiving the singing voice.
6. A singing method with situational sound effects, the steps comprising:
providing at least one song voice and at least one sound effect voice, wherein each sound effect voice corresponds to a threshold interval;
receiving a selection condition, and playing one of the song voices according to the selection condition;
receiving a singing voice, and analyzing the singing voice according to a voice algorithm to generate an emotion parameter;
capturing a video image, and identifying the video image according to a facial algorithm to generate an expression parameter;
analyzing the song voice being played according to a spectrum algorithm, and generating a tune parameter according to the analysis result;
loading at least one of the emotion parameter, the expression parameter, and the tune parameter according to a loading condition to calculate a threshold value; and
comparing the threshold value with the threshold intervals, and loading and playing the sound effect voice corresponding to the matching threshold interval according to the comparison result.
7. The singing method with situational sound effects of claim 6, wherein the threshold interval is a range of one of numerical values and text.
8. The singing method with situational sound effects of claim 6, wherein the singing voice is received through a sound-receiving device, and the video image is captured through a photography device.
9. The singing method with situational sound effects of claim 6, wherein the emotion parameter, the expression parameter, and the tune parameter are generated continuously or at intervals.
10. The singing method with situational sound effects of claim 6, further comprising, after receiving the singing voice, filtering the background sound of the singing voice according to a filtering parameter.
TW97150672A 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof TWI377559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Publications (2)

Publication Number Publication Date
TW201025289A TW201025289A (en) 2010-07-01
TWI377559B true TWI377559B (en) 2012-11-21

Family

ID=44852558

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97150672A TWI377559B (en) 2008-12-25 2008-12-25 Singing system with situation sound effect and method thereof

Country Status (1)

Country Link
TW (1) TWI377559B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI498880B (en) * 2012-12-20 2015-09-01 Univ Southern Taiwan Sci & Tec Automatic Sentiment Classification System with Scale Sound

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104581348A (en) * 2015-01-27 2015-04-29 苏州乐聚一堂电子科技有限公司 Vocal accompaniment special visual effect system and method for processing vocal accompaniment special visual effects
TWI725535B (en) * 2019-08-30 2021-04-21 韓駿逸 Voice interaction method to detect user behavior and attribute characteristics

Also Published As

Publication number Publication date
TW201025289A (en) 2010-07-01

Similar Documents

Publication Publication Date Title
US8847884B2 (en) Electronic device and method for offering services according to user facial expressions
US8407055B2 (en) Information processing apparatus and method for recognizing a user&#39;s emotion
US7058208B2 (en) Method and apparatus of managing information about a person
US11355099B2 (en) Word extraction device, related conference extraction system, and word extraction method
WO2005069171A1 (en) Document correlation device and document correlation method
WO2022042129A1 (en) Audio processing method and apparatus
TW201214413A (en) Modification of speech quality in conversations over voice channels
CN110675886A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
WO2019114015A1 (en) Robot performance control method and robot
JP2010011409A (en) Video digest apparatus and video editing program
TWI377559B (en) Singing system with situation sound effect and method thereof
JP2007271977A (en) Evaluation standard decision device, control method, and program
JP2023527473A (en) AUDIO PLAYING METHOD, APPARATUS, COMPUTER-READABLE STORAGE MEDIUM AND ELECTRONIC DEVICE
JP6409652B2 (en) Karaoke device, program
JP2010176544A (en) Conference support device
CN112885318A (en) Multimedia data generation method and device, electronic equipment and computer storage medium
JP2007256619A (en) Evaluation device, control method and program
CN113707113B (en) User singing voice repairing method and device and electronic equipment
CN112235183B (en) Communication message processing method and device and instant communication client
CN111696566B (en) Voice processing method, device and medium
WO2022041192A1 (en) Voice message processing method and device, and instant messaging client
US20040054524A1 (en) Speech transformation system and apparatus
JP2012068419A (en) Karaoke apparatus
WO2020154883A1 (en) Speech information processing method and apparatus, and storage medium and electronic device
CN112584225A (en) Video recording processing method, video playing control method and electronic equipment

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees