TWI356636B

TWI356636B - System and method for recording and reproducing vi

Info

Publication number: TWI356636B
Application number: TW96150444A
Authority: TW
Inventors: Jim W Chen
Original assignee: Inventec Besta Co Ltd
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2012-01-11
Also published as: TW200930088A

Description

IX. Description of the Invention

[Technical Field]
The present invention relates to an audio-visual playback system and method, and more particularly to an audio-visual recording and playback system and method in which the playback time points of the recorded vocals, video, and music can each be adjusted so that all three play back in synchronization.

[Prior Art]

After karaoke was introduced to Taiwan from Japan, it gradually evolved into KTV, which combines singing with on-screen video. Singing venues likewise evolved from open spaces into private rooms, and home karaoke machines were eventually developed, so that singing has become a popular leisure activity. Because Internet technology advances rapidly and more and more people use the Internet, online KTV has also been developed, allowing users to enjoy singing KTV at any time.

When a user downloads a song's accompaniment file to a computer over the network and sings using the computer's peripherals (such as a microphone and speakers), the user can also use recording software and a webcam to record the singing voice and a video of the performance, and produce a personal music video (MV). However, when a webcam captures video, the image is often delayed by the computer. Moreover, the webcam's video recording function and the audio recording function usually have to be started through computer software, so the moment at which recording actually starts may differ from the moment at which the accompaniment starts playing. Consequently, if the video recorded by the webcam is combined directly with the vocals and the music into an audio-visual file, playing that file reveals that image and sound are out of sync, or even that the user's singing does not match the mouth shapes in the image.

[Summary of the Invention]
In view of the problem that audio-visual files recorded with conventional techniques play back with sound and image out of sync, the present invention discloses an audio-visual recording and playback system and method that allow the recorded audio and video signals to be played back in synchronization.

The present invention provides an audio-visual recording and playback system comprising a recording module, an adjustment module, a signal synthesis module, and a playback module. The recording module records a vocal signal and a video signal, where the vocal signal has a first time point and the video signal has a second time point. The adjustment module loads a music signal having a third time point and, using the music signal as the reference, adjusts the displacement of the vocal signal and the video signal along the time axis so that the positions of the first time point and the second time point on the time axis are the same as that of the third time point. The signal synthesis module synthesizes the adjusted vocal signal and the music signal into an audio signal and generates an audio-visual signal from the audio signal and the video signal. The playback module then plays the audio-visual signal.

The present invention also provides an audio-visual recording and playback method with the following steps. First, a vocal signal and a video signal are recorded, where the vocal signal has a first time point and the video signal has a second time point. Next, a music signal having a third time point is loaded and, using the music signal as the reference, the displacement of the vocal signal and the video signal along the time axis is adjusted so that the positions of the first time point and the second time point on the time axis are the same as that of the third time point. The adjusted vocal signal and the music signal are then synthesized into an audio signal, an audio-visual signal is generated from the audio signal and the video signal, and the audio-visual signal is played.

The system and method disclosed by the present invention are as described above. The difference from the prior art is that, before signal synthesis, the present invention lets the user adjust the time point at which singing begins in the vocal signal, and the time point at which the mouth shape of the first word of the song appears in the video signal, to match the time point at which the song's main melody begins in the music signal. This eliminates the time difference between the audio signal and the video signal in the audio-visual signal that is subsequently played, so that sound and image play back in synchronization.

[Embodiment]
The detailed features and implementation of the present invention are described below with reference to the drawings. The description is sufficient for any person skilled in the relevant art to understand the technical content of the invention and implement it, and from the content and drawings disclosed in this specification any such person can readily understand the related objects and advantages of the invention.

Fig. 1 is a block diagram of the audio-visual recording and playback system of the present invention in a first embodiment. Referring to Fig. 1, the audio-visual recording and playback system 100 includes a recording module 110, an adjustment module 120, a signal synthesis module 130, and a playback module 140. The recording module 110 records the vocal signal Ap and the video signal V.

In this embodiment, the system 100 may also include an image capture module 150 and an audio receiving module 160. The video signal V is, for example, captured by the image capture module 150 and then sent to the recording module 110. The vocal signal Ap may be a signal received by the audio receiving module 160, which forwards it to the recording module 110 for recording.

In this embodiment, the music signal Am may be pre-stored in the system 100, or it may be transferred to the system 100 from another storage device or over a network. The present invention does not limit the source of the music signal Am.

The image capture module 150 is, for example, a webcam (web camera), and the audio receiving module 160 is, for example, a microphone. In other embodiments, the image capture module 150 and the audio receiving module 160 may be integrated into a single device, such as a webcam with a built-in microphone; the present invention places no limitation on this.

After the vocal signal Ap and the video signal V have been recorded through the recording module 110, the user can use an operation interface to mark a first time point p in the vocal signal Ap and a second time point v in the video signal V, as shown in Fig. 2A. The first time point p is the time at which singing begins in the vocal signal Ap; the second time point v is the time at which the mouth shape of the first word of the song appears in the video signal V. The music signal Am stored in the system 100 has a third time point m, which is the time at which the song's main melody begins in the music signal Am.

The adjustment module 120 adjusts the displacement of the vocal signal Ap and the video signal V along the time axis so that the positions of the first time point p and the second time point v on the time axis are the same as that of the third time point m, as shown in Fig. 2B. In other words, the adjustment module 120 adjusts the vocal signal Ap and the video signal V recorded by the recording module 110 so that the time at which singing begins in Ap, and the time at which the mouth shape of the first word of the song appears in V, coincide with the time at which the main melody begins in Am.

Notably, in this embodiment, after the adjustment module 120 has adjusted the vocal signal Ap and the video signal V, and before the signal synthesis module 130 performs signal synthesis, the playback module 140 can first play the music signal Am together with the adjusted vocal signal Ap and video signal V. The user can thus preview whether the first time point of the adjusted Ap and the second time point of V really coincide with the third time point of Am, so that synchronized playback is achieved.
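The alignment performed by the adjustment module can be illustrated with a minimal Python sketch. It assumes each signal is described only by its marked time point and an offset along a shared time axis, both in seconds; the `Track` class, the `align_to` helper, and the sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for one signal (Ap, V, or Am)."""
    name: str
    marker: float        # seconds from the start of the track to its marked time point
    offset: float = 0.0  # displacement of the whole track along the shared time axis

    def position(self) -> float:
        """Position of the marked time point on the shared time axis."""
        return self.offset + self.marker


def align_to(reference: Track, track: Track) -> Track:
    """Shift `track` so its marked time point coincides with the reference's."""
    shift = reference.position() - track.position()
    return Track(track.name, track.marker, track.offset + shift)


# Music signal Am is the reference; its main melody starts at m = 12.0 s.
music = Track("Am", marker=12.0)
# Singing starts at p = 12.5 s in Ap; the first mouth shape appears at v = 12.75 s in V.
vocal = align_to(music, Track("Ap", marker=12.5))
video = align_to(music, Track("V", marker=12.75))
```

With these sample values the vocal track is shifted by -0.5 s and the video track by -0.75 s, after which all three marked time points sit at 12.0 s on the shared axis. Keeping the music signal fixed mirrors the text's use of Am as the reference.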

Referring again to Fig. 1, once the adjustment module 120 has adjusted the time point at which singing begins in the vocal signal Ap, and the time point at which the mouth shape of the first word of the song appears in the video signal V, to match the time point at which the main melody begins in the music signal Am, the signal synthesis module 130 synthesizes the adjusted vocal signal Ap and the music signal Am into an audio signal A, and generates an audio-visual signal VM from the audio signal A and the video signal V. Those skilled in the art are familiar with audio-visual signal synthesis techniques, so they are not described here.

The playback module 140 plays the audio-visual signal VM generated by the signal synthesis module 130. The file format of VM in this embodiment is, for example, MPEG-4 (Moving Picture Experts Group 4) or WMV (Windows Media Video). In other embodiments, VM may use any other commonly used audio-visual file format; the present invention places no limitation on this.

Note that because the audio signal A is synthesized from the vocal signal Ap and the music signal Am, the total length of A depends on the total length of Am. Therefore, if the total length of the video signal V recorded by the recording module 110 does not match the total length of A, V must be processed while VM is being synthesized so that it has the same length as A.

Fig. 3A is a schematic diagram of the video signal V and the audio signal A of the present invention on the time axis. Referring to Fig. 3A, the second time point v of the video signal and the fourth time point a of the audio signal (equivalent to the third time point m of the music signal Am) are at the same position on the time axis. In this embodiment, however, the total length of the video signal V recorded by the recording module 110 is less than the total length of the audio signal A. A compensation signal therefore has to be added to make up the time difference between V and A, so that VM does not start playing with sound but no image. This is explained further below with an embodiment and drawings.

Fig. 4 is a block diagram of the audio-visual recording and playback system of the present invention in a second embodiment. Referring to Fig. 4, the system 400 includes a recording module 410, an adjustment module 420, a signal synthesis module 430, and a playback module 440. The recording module 410, the adjustment module 420, and the playback module 440 are similar to the recording module 110, the adjustment module 120, and the playback module 140 of the preceding embodiment. The system 400 may also include an image capture module 450 and an audio receiving module 460, whose functions are similar to those of the image capture module 150 and the audio receiving module 160 and are not described again.

As noted above, to address the delay in which the video fails to keep up during audio-visual synthesis, the second embodiment makes some improvements to the signal synthesis module 430, which further includes a video signal processing unit 432. While the signal synthesis module 430 generates the audio-visual signal VM from the audio signal A and the video signal V, the video signal processing unit 432 generates a compensation video signal Vc according to the time difference between V and A, and combines Vc with the original video signal V into a post-production video signal VB. The length of Vc equals the difference in duration between V and A, as shown in Fig. 3B. The audio-visual signal VM in this embodiment is therefore actually synthesized from the post-production video signal VB and the audio signal A.
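As a rough illustration of how the compensation video signal Vc is sized, the Python sketch below works purely with durations in seconds. Placing Vc at the start of the timeline is an assumption drawn from the stated goal of avoiding sound without image when playback begins; the function names and sample durations are hypothetical.

```python
def compensation_length(video_len: float, audio_len: float) -> float:
    """Duration of the compensation video signal Vc: the shortfall of V against A."""
    return max(0.0, audio_len - video_len)


def post_production_timeline(video_len: float, audio_len: float):
    """Return (label, start, end) segments that make up VB on the time axis.

    Only the case where V is not longer than A is handled here.
    """
    gap = compensation_length(video_len, audio_len)
    segments = []
    if gap > 0:
        segments.append(("Vc", 0.0, gap))         # assumed: filler shown before V starts
    segments.append(("V", gap, gap + video_len))  # the recorded video signal
    return segments


# Recorded video lasts 180 s, synthesized audio lasts 185 s.
timeline = post_production_timeline(video_len=180.0, audio_len=185.0)
```

Here Vc lasts 5 s, so the resulting VB spans the same 185 s as the audio signal A.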
The total length of the video signal V recorded by the recording module 410 may also be greater than the total length of the audio signal A, as shown in Fig. 5A. In that case the video signal processing unit 432, for example, deletes the portion of V that extends beyond A, producing a post-production video signal VB with the same duration as A, as shown in Fig. 5B.

In summary, before synthesizing the recorded vocal signal Ap, the video signal V, and the music signal Am, the system of the present invention first adjusts the displacement of Ap and V along the time axis, and only then synthesizes the signals. This removes the time differences that various factors introduce between the signals during recording and so avoids sound and image falling out of sync. In addition, during synthesis the video signal is first processed into a post-production video signal with the same duration as the audio signal. The post-production video signal and the audio signal can then be synthesized into an audio-visual signal whose sound and image are synchronized.

To help those skilled in the art better understand the invention, the operation of the system is illustrated below by example. Fig. 6 is a flowchart of the audio-visual recording and playback method of the present invention in the first embodiment. Referring to Fig. 6, when the user starts the webcam's recording function, plays the music, and begins singing, the system first receives the user's vocal signal and the accompanying music signal and captures the user's video signal (step 610). It then starts recording the vocal signal and the video signal (step 620), and records the time point at which singing begins in the vocal signal and the time point at which the mouth shape of the first word of the song appears in the video signal (as shown in Fig. 2A).

After recording is complete, the system displays on screen, for the user's reference, the positions on the time axis of the music signal stored in the system and of the video and vocal signals recorded in step 620 (see Fig. 2A), and provides an operation interface. Using the music signal as the reference, the user can enter commands to adjust the displacement of the vocal signal and the video signal along the time axis (step 630), so that the time point at which singing begins in the vocal signal and the time point at which the mouth shape of the first word appears in the video signal both coincide with the time point at which the main melody begins in the music signal (as shown in Fig. 2B).

Notably, after the user has adjusted the positions of the vocal and video signals on the time axis through the operation interface, this embodiment, for example, first plays the adjusted vocal signal, the video signal, and the music signal (step 640) for the user to preview, and lets the user judge whether the three signals are synchronized (step 650). If step 650 finds that the vocal, video, and music signals are still out of sync, step 630 is repeated; that is, the displacement of the vocal and video signals along the time axis is adjusted again according to the preview. If step 650 finds that the three signals are synchronized, the adjusted vocal signal and the music signal are synthesized into an audio signal (step 660), and an audio-visual signal is generated from the audio signal and the video signal (step 670). The audio-visual signal can then be played (step 680). The file format of the audio-visual signal VM is, for example, MPEG-4 (Moving Picture Experts Group 4), WMV (Windows Media Video), or any other commonly used audio-visual file format.

As described earlier, when the audio-visual signal is generated from the audio signal and the video signal, the total length of the video signal may differ from the total length of the audio signal (see Fig. 3A and Fig. 5A). The method of the present invention in the second embodiment is therefore as shown in Fig. 7. In step 670, it is first determined whether the total lengths of the video signal and the audio signal are the same (step 771). If they are, the method continues with step 680 of Fig. 6. If they differ, it is determined whether the total length of the video signal is less than that of the audio signal (step 772). If it is less, the difference in duration between the video signal and the audio signal is calculated (step 773), and a compensation video signal is generated according to that difference (step 774). The compensation video signal is combined with the original video signal into a post-production video signal, and it may be, for example, the song's title sequence. The audio signal and the post-production video signal are then synthesized into the audio-visual signal (step 776). When the audio-visual signal is later played (step 680 of Fig. 6), this avoids sound without image at the start of playback.

When the total length of the video signal is greater than that of the audio signal, the portion of the video signal that extends beyond the audio signal is deleted (step 775) to produce a post-production video signal with the same duration as the audio signal. The audio signal and the post-production video signal are then synthesized into the audio-visual signal (step 776).

In summary, the difference between the present invention and the prior art is that the invention first adjusts the displacement of the recorded vocal signal and video signal along the time axis, using the music signal as the reference, and only then synthesizes the signals. This technical means resolves the prior-art problem of a time difference between the recorded audio and video signals, and achieves synchronized playback of the vocal signal, the video signal, and the music signal. The invention also provides a technical means of processing the video signal during synthesis, ensuring that the final audio-visual signal has synchronized sound and image.

Although embodiments of the present invention are disclosed above, they are not intended to directly limit the scope of patent protection of the invention. Any person having ordinary skill in the art to which the invention belongs may make minor changes in the form and details of implementation without departing from the spirit and scope disclosed herein. The scope of patent protection of the invention shall be as defined by the appended claims.

[Brief Description of the Drawings]
Fig. 1 is a block diagram of the audio-visual recording and playback system of the present invention in the first embodiment.
Fig. 2A is a schematic diagram of the vocal signal, the video signal, and the music signal on the time axis before adjustment.
Fig. 2B is a schematic diagram of the vocal signal, the video signal, and the music signal on the time axis after adjustment.
Fig. 3A is a schematic diagram of the video signal and the audio signal of the present invention on the time axis in the first embodiment.
Fig. 3B is a schematic diagram of the compensation video signal, the video signal, and the audio signal of the present invention on the time axis in the first embodiment.
Fig. 4 is a block diagram of the audio-visual recording and playback system of the present invention in the second embodiment.
Fig. 5A is a schematic diagram of the video signal and the audio signal of the present invention on the time axis in the second embodiment.
Fig. 5B is a schematic diagram of the processed video signal and the audio signal of the present invention on the time axis in the second embodiment.
Fig. 6 is a flowchart of the audio-visual recording and playback method of the present invention in the first embodiment.
Fig. 7 is a flowchart of the audio-visual recording and playback method of the present invention in the second embodiment.

[Description of Main Reference Symbols]
100  audio-visual recording and playback system
110  recording module
120  adjustment module
130  signal synthesis module
140  playback module
150  image capture module
160  audio receiving module
400  audio-visual recording and playback system
410  recording module
420  adjustment module
430  signal synthesis module
432  video signal processing unit
440  playback module
450  image capture module
460  audio receiving module
A    audio signal
Ap   vocal signal
Am   music signal
V    video signal
Vc   compensation video signal
VB   post-production video signal
a    fourth time point
m    third time point
p    first time point
v    second time point
Step 610  Receive the vocal signal and capture the video signal
Step 620  Record the vocal signal and the video signal
Step 630  Using the music signal as the reference, adjust the displacement of the vocal signal and the video signal along the time axis
Step 640  Play the adjusted vocal signal, the video signal, and the music signal
Step 650  Are the three signals synchronized?
Step 660  Synthesize the adjusted vocal signal and the music signal into an audio signal
Step 670  Generate an audio-visual signal from the audio signal and the video signal
Step 680  Play the audio-visual signal
Step 771  Is the total length of the video signal the same as the total length of the audio signal?
Step 772  Is the total length of the video signal less than the total length of the audio signal?
Step 773  Calculate the difference in duration between the video signal and the audio signal
Step 774  Generate a compensation video signal according to the difference, and combine it with the video signal into a post-production video signal
Step 775  Delete the portion of the video signal that extends beyond the audio signal to generate a post-production video signal
Step 776  Synthesize the audio signal and the post-production video signal into the audio-visual signal
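The length-matching branch of the second-embodiment method (steps 771 to 775 of Fig. 7) can be sketched in Python as follows. The sketch assumes the video signal is a sequence of frames and that the audio length has already been converted to a frame count. Prepending the filler and trimming from the tail are assumptions, since the text does not state which end is affected, and `match_video_length` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def match_video_length(frames: Sequence[Frame], target: int, filler: Frame) -> List[Frame]:
    """Return a post-production frame list whose length equals `target`.

    `target` is the audio signal's length expressed in frames; `filler` is
    the frame repeated to build the compensation video signal.
    """
    if len(frames) == target:              # step 771: lengths already match
        return list(frames)
    if len(frames) < target:               # step 772: video shorter than audio
        missing = target - len(frames)     # step 773: length difference
        compensation = [filler] * missing  # step 774: compensation video signal
        return compensation + list(frames)  # assumed: filler goes first
    return list(frames[:target])           # step 775: drop the excess (assumed: tail)


padded = match_video_length(["f1", "f2"], target=4, filler="title")
trimmed = match_video_length(["f1", "f2", "f3"], target=2, filler="title")
```

The returned list stands in for the post-production video signal VB, which step 776 would then combine with the audio signal.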

Claims

X. Scope of Patent Application:

1. An audio-visual recording and playback system, comprising: a recording module for recording a vocal signal and a video signal, wherein the vocal signal has a first time point and the video signal has a second time point; an adjustment module for loading a music signal having a third time point and, using the music signal as a reference, adjusting the displacement of the vocal signal and the video signal along the time axis so that the positions of the first time point and the second time point on the time axis are the same as that of the third time point; a signal synthesis module for synthesizing the adjusted vocal signal and the music signal into an audio signal, and generating an audio-visual signal from the audio signal and the video signal; and a playback module for playing the audio-visual signal.

2. The audio-visual recording and playback system of claim 1, wherein after the displacement of the vocal signal and the video signal along the time axis is adjusted, and before the adjusted vocal signal and the video signal are synthesized, the playback module further plays the vocal signal, the video signal, and the music signal simultaneously.

3. The audio-visual recording and playback system of claim 1, wherein the signal synthesis module further comprises a video signal processing unit which, when the total length of the video signal differs from the total length of the audio signal, processes the video signal to generate a post-production video signal, wherein the total length of the post-production video signal is the same as the total length of the audio signal, and the audio-visual signal is synthesized from the audio signal and the post-production video signal.

4. The audio-visual recording and playback system of claim 3, wherein when the total length of the video signal is less than the total length of the audio signal, the video signal processing unit generates a compensation video signal according to the difference in duration between the video signal and the audio signal, and the post-production video signal is synthesized from the video signal and the compensation video signal.

5. The audio-visual recording and playback system of claim 3, wherein when the total length of the video signal is greater than the total length of the audio signal, the video signal processing unit deletes the portion of the video signal that extends beyond the audio signal to generate the post-production video signal.

6. The audio-visual recording and playback system of claim 1, further comprising: an image capture module for capturing the video signal and transmitting the video signal to the recording module; and an audio receiving module for receiving the vocal signal and transmitting the vocal signal to the recording module.

7. The audio-visual recording and playback system of claim 6, wherein the image capture module is a web camera.

8. The audio-visual recording and playback system of claim 6, wherein the audio receiving module is a microphone.

9. An audio-visual recording and playback method, comprising the steps of: recording a vocal signal and a video signal, wherein the vocal signal has a first time point and the video signal has a second time point; loading a music signal having a third time point and, using the music signal as a reference, adjusting the displacement of the vocal signal and the video signal along the time axis so that the positions of the first time point and the second time point on the time axis are the same as that of the third time point; synthesizing the adjusted vocal signal and the music signal into an audio signal, and generating an audio-visual signal from the audio signal and the video signal; and playing the audio-visual signal.

10. The audio-visual recording and playback method of claim 9, wherein after the displacement of the vocal signal and the video signal along the time axis is adjusted, and before the adjusted vocal signal and the video signal are synthesized, the method further comprises simultaneously playing the adjusted vocal signal, the video signal, and the music signal.

11. The audio-visual recording and playback method of claim 9, wherein the step of generating the audio-visual signal from the audio signal and the video signal comprises: determining whether the total length of the video signal is the same as the total length of the audio signal, wherein when the total length of the video signal differs from the total length of the audio signal, the video signal is processed to generate a post-production video signal whose total length is the same as the total length of the audio signal; and synthesizing the audio signal and the post-production video signal into the audio-visual signal.

12. The audio-visual recording and playback method of claim 11, wherein when the total length of the video signal is less than the total length of the audio signal, processing the video signal comprises: calculating a difference in duration between the video signal and the audio signal; generating a compensation video signal according to the difference in duration; and synthesizing the compensation video signal and the video signal into the post-production video signal.

13. The audio-visual recording and playback method of claim 11, wherein when the total length of the video signal is greater than the total length of the audio signal, processing the video signal comprises deleting the portion of the video signal that extends beyond the audio signal to generate the post-production video signal.

14. The audio-visual recording and playback method of claim 9, wherein before the vocal signal and the video signal are recorded, the method further comprises: capturing the video signal; and receiving the vocal signal.
