TW200939797A

TW200939797A - Portable electronic device and image and audio data combining method

Info

Publication number: TW200939797A
Application number: TW97107747A
Authority: TW
Inventors: Yi Li
Original assignee: Mitac Int Corp
Priority date: 2008-03-05
Filing date: 2008-03-05
Publication date: 2009-09-16

Abstract

The present invention discloses a portable electronic device comprising a storage unit, an operation interface, a recognition unit and a processing unit. The storage unit is used for storing at least one image data and one audio data. The operation interface is used for selecting the above-mentioned image and audio data. The recognition unit is used for converting the selected audio data into a text message. The processing unit is used for combining the above-mentioned text message and the selected image data, so as to generate a combined image. Accordingly, the portable electronic device of this invention such as a cell phone or a digital camera is able to convert recorded or communication audio data into text messages, and then combine them with image data. Therefore, the usage efficiency of the data storage space of the portable electronic device is improved and the advantage of adding text messages to image data is achieved.

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to a portable electronic device and a method for combining image and audio data, and in particular to a technique for conveniently annotating images with text on a portable device.

[Prior Art]

People often travel and take photographs, using pictures to record the details of their lives. When people travel and record the local scenery along the way, they are likely to want to attach information such as the shooting location to their photographs. Likewise, pictures such as news pictures, art photographs, or medical images often need an appropriate caption. At present, a physical picture is usually annotated on its front or back, while an electronic picture must be annotated manually, for example through a touchpad or keyboard input.

In addition, when people use a communication device such as a mobile phone to talk with others, the caller sometimes conveys information to the receiver as audio, and that information may consist of simple numbers and text. With the communication devices currently in use, the receiver has to write such information down, an address for example, with pen and paper. A receiver who has forgotten to carry pen and paper cannot record the information, which greatly reduces the practicality of current communication devices.

For this reason, some vendors offer communication devices that can record audio, so that users can record such numbers and text. However, these devices are limited by the capacity of their built-in memory or hard disk and cannot record for long periods, which restricts their practical use. If speech content recognition were introduced into communication devices, the memory or hard disk capacity could be used far more efficiently.

In view of these problems in the prior art, and on the basis of years of research, development, and practical experience, the inventor proposes a portable electronic device and a method for combining image and audio data as a way of remedying the above shortcomings.

[Summary of the Invention]

In view of the above, the object of the present invention is to provide a portable electronic device and a method for combining image and audio data that use speech content recognition to annotate electronic pictures, thereby improving operating convenience.

According to one object of the present invention, a portable electronic device is provided that comprises a storage unit, an operation interface, a recognition unit, and a processing unit. The storage unit stores at least one image data and at least one audio data. The operation interface is used to select from the image data and the audio data. The recognition unit converts the selected audio data into a text message, and the processing unit combines the text message with the selected image data to generate a combined image.

The present invention further provides a method for combining image and audio data, comprising the following steps. First, image data is captured from outside through an image capturing unit. Then, a sound is received from outside through a microphone. Then, the sound is converted into audio data by a recording unit. After that, the audio data is converted into a text message. Finally, the text message is combined into the image data to generate a combined image.

The present invention also provides another method for combining image and audio data, comprising the following steps. First, a selected image is obtained from a storage unit. Then, a wireless signal is received from outside through a wireless communication unit. Then, the wireless signal is converted into audio data. Then, an audio segment is extracted from the audio data and converted into a text message. Finally, the text message is combined with the selected picture to generate a combined image.

As described above, the portable electronic device and the method for combining image and audio data of the present invention can convert a passage spoken by the user directly into text and attach it to a digital image. The present invention uses the recognition unit to convert audio data into a text message. For the same amount of information, a text message requires far less storage space than audio data, so the portable electronic device of the present invention can store a larger amount of information. In addition, the portable electronic device of the present invention can add narration or subtitles to a piece of video by dictation, so that a digital image can be edited quickly.
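The storage argument above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming uncompressed 8 kHz, 16-bit mono PCM audio, a five-second recording, and a made-up message; none of these figures comes from the specification, and a compressed audio codec would narrow the gap.

```python
def pcm_bytes(seconds, sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Size in bytes of uncompressed PCM audio (assumed format, not from the patent)."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def text_bytes(message, encoding="utf-8"):
    """Size in bytes of the same message stored as encoded text."""
    return len(message.encode(encoding))


# Hypothetical spoken message that takes about five seconds to say.
message = "Meet at 12 Harbour Road, call 555-0100"

audio_size = pcm_bytes(5.0)       # 5 s * 8000 samples/s * 2 bytes
text_size = text_bytes(message)   # one byte per ASCII character
ratio = audio_size / text_size

print(f"audio: {audio_size} bytes, text: {text_size} bytes, ratio: {ratio:.0f}x")
```

Under these assumptions the recording is three orders of magnitude larger than the recognized text, which is the effect the specification relies on.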

To give the examiner a further understanding of the technical features and effects of the present invention, preferred embodiments are described in detail below.

[Embodiments]

The portable electronic device and the method for combining image and audio data according to preferred embodiments of the present invention are described below with reference to the accompanying drawings. Identical elements in the following embodiments are denoted by identical reference numerals.

Please refer to FIG. 1, which is a block diagram of the portable electronic device of the present invention. The portable electronic device 100 comprises a storage unit 110, an operation interface 120, a recognition unit 130, and a processing unit 150. The storage unit 110 stores at least one image data 111 and at least one audio data 112, and the operation interface 120 is used to select from the image data 111 and the audio data 112. The recognition unit 130 converts the selected audio data 112 into a text message 140, and the processing unit 150 combines the text message 140 with the selected image data 111 to generate a combined image 151. In this way, the portable electronic device 100 converts the audio data 112 into the text message 140 and combines it into the image data 111, which both increases the amount of information the portable electronic device 100 can store and allows the text message 140 to be added to the image data 111 quickly.

The image data 111 may be a still picture, a moving picture, or a video. The portable electronic device 100 may further comprise an image capturing unit for capturing images from outside as the image data 111; or it may further comprise a microphone and a recording unit for receiving a sound from outside and recording it as the audio data 112; or it may further comprise a wireless communication unit and a recording unit for receiving a wireless signal from outside and recording it as the audio data 112.

Please refer to FIG. 2, which is a block diagram of an embodiment of the portable electronic device of the present invention. In this embodiment, the portable electronic device 101 includes a microphone 113 and a recording unit 114. The microphone 113 receives the user's spoken narration, or any other sound the user wants to input into the portable electronic device 100.
The recording unit 114 converts the received sound into the audio data 112, which is then stored in the storage unit 110. The audio data 112 may be a message containing numbers and text, such as a telephone number, a personal identification number, a password, or an address.

The recognition unit 130 is preferably implemented in software, as an audio processing application, or as a single chip with a speech recognition function. The recognition unit 130 can thus convert the audio data 112 into a text message 140 by means such as waveform comparison. Through the speech-to-text function of the recognition unit 130, the portable electronic device 100 can store a larger amount of information. The recognition unit 130 can also filter out the background sound of the audio data 112, so that background music or noise does not degrade recognition of the text message, for example by sampling only the human voice at frequencies between 0 Hz and 4 kHz.

The portable electronic device 101 may, as needed, further include a wireless communication unit 115. A wireless signal is received through the wireless communication unit 115, converted into the audio data 112 by the recording unit 114, and stored in the storage unit 110. The portable electronic device 101 may therefore be a portable communication device, a personal digital assistant (PDA), an MP3 player, or a similar device.

From another point of view, an embodiment of the portable electronic device 100 may further include an image capturing unit 116. The image capturing unit 116 captures a plurality of images from outside as the image data 111 and stores the image data 111 in the storage unit 110. The image capturing unit 116 may include a lens 117 and a sensor 118.
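The specification does not say how the recognition unit 130 restricts the audio to the 0 Hz to 4 kHz voice band. One common way to do it, offered here only as a sketch, is a windowed-sinc low-pass FIR filter. The 16 kHz sample rate, the 101-tap length, the Hamming window, and all function names below are assumptions made for illustration.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, taps=101):
    """Windowed-sinc low-pass FIR coefficients (Hamming window, unity DC gain)."""
    fc = cutoff_hz / sample_rate_hz          # normalized cutoff, cycles per sample
    mid = (taps - 1) / 2
    coeffs = []
    for n in range(taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        coeffs.append(ideal * window)
    total = sum(coeffs)
    return [c / total for c in coeffs]


def apply_fir(samples, coeffs):
    """Direct-form convolution; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if i - k >= 0:
                acc += c * samples[i - k]
        out.append(acc)
    return out


def tone(freq_hz, sample_rate_hz, count):
    """Unit-amplitude sine wave, used here as a stand-in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


SAMPLE_RATE_HZ = 16000                       # assumed; must exceed twice the cutoff
voice_band = lowpass_fir(4000, SAMPLE_RATE_HZ)

in_band = apply_fir(tone(1000, SAMPLE_RATE_HZ, 800), voice_band)
out_of_band = apply_fir(tone(7000, SAMPLE_RATE_HZ, 800), voice_band)

# Skip the first 200 samples so the filter's start-up transient is excluded.
print("1 kHz tone RMS after filtering:", round(rms(in_band[200:]), 3))
print("7 kHz tone RMS after filtering:", round(rms(out_of_band[200:]), 3))
```

A frequency filter of this kind removes only energy above the cutoff; background sound that falls inside the voice band would need a different technique.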
The lens 117 receives optical signals from the external environment, and the sensor 118 converts the optical signals into electronic signals, thereby capturing an image of the external environment. The sensor 118 may be a CCD sensor or a CMOS sensor. An embodiment of the portable electronic device 100 may therefore also be a digital camera or a video camera.

In one embodiment, the portable electronic device 100 may also be a mobile digital television. In that case, the wireless communication unit 115 may include a digital television signal receiving chip that complies with the American ATSC 8-VSB, the European DVB-T COFDM, or the Japanese ISDB-T COFDM wireless digital television transmission standard, and the frame being played is captured through the built-in image capturing unit 116. The portable electronic device 100 can thus capture the image data 111 and the audio data 112 as appropriate, convert the audio data 112 into the text message 140 through the recognition unit 130, and combine the text message 140 with the image data 111 into the combined image 151. After some time has passed, the user can use the combined image 151 to recall or trace the source of the image data 111.

In addition, the operation interface 120 may include an audio trigger unit 121, which begins selecting the audio data 112 once it is triggered. For example, the audio trigger unit 121 may be a spring-loaded push-button switch: excerpting of the audio data 112 begins when the user presses the switch and stops when the user releases it. The user can thereby select the desired segment of the audio data 112 for speech recognition.
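The press-and-release behaviour of the audio trigger unit 121 amounts to cutting a time window out of the recorded samples. The sketch below shows that idea; the class name `PushToTalk`, its methods, and the use of timestamps in seconds are hypothetical and are not taken from the specification.

```python
class PushToTalk:
    """Excerpt the samples recorded between a button press and its release."""

    def __init__(self, samples, sample_rate_hz):
        self.samples = samples
        self.sample_rate_hz = sample_rate_hz
        self._pressed_at = None

    def press(self, t_seconds):
        """Button pressed: remember where the excerpt starts."""
        self._pressed_at = t_seconds

    def release(self, t_seconds):
        """Button released: return the audio segment selected by the user."""
        if self._pressed_at is None:
            raise RuntimeError("release() called before press()")
        start = int(round(self._pressed_at * self.sample_rate_hz))
        end = int(round(t_seconds * self.sample_rate_hz))
        self._pressed_at = None
        return self.samples[start:end]


# Eight seconds of placeholder audio at 10 samples per second.
recording = list(range(80))
trigger = PushToTalk(recording, sample_rate_hz=10)

trigger.press(2.0)
segment = trigger.release(3.5)   # samples 20 through 34
print(len(segment), segment[0], segment[-1])
```

Only the returned segment would then be passed to speech recognition, which keeps unrelated parts of a call out of the text message.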
In addition, the portable electronic device 100 can also be applied to producing video subtitles. When the image data 111 is a video, the text message 140 can be combined with the video as subtitles. The portable electronic device 100 can thus add narration or subtitles to a piece of video by dictation, so that the video can be edited quickly.

The portable electronic device 101 may further include an editing unit 160 for further editing the combined image 151. For example, please refer to FIG. 3, which is a schematic diagram of a combined image according to an embodiment of the present invention. In the figure, the combined image 200 comprises an image message 210 and a text message 220. The position of the text message 220 relative to the image message 210 may default to above or below the image message 210. The editing unit 160 can also be used to change the default, add other parameters or icons, adjust brightness and chroma, and so on.

As a further example, please refer to FIG. 4. After the user has combined a first text message 320 and an image message 310 into a combined image 200, the combined image 200 can itself be treated as the image message 310 and combined with a second text message 330 to produce a combined image 300. The user can preset the relative positions of the first text message 320, the second text message 330, and the image message 310 through the editing unit 160.

In summary, by converting the audio data 112 into the text message 140 through the recognition unit 130, the portable electronic device 100 not only records the audio data 112 accurately but also spares the user from recording the audio data 112 with pen and paper.
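The layering described for FIG. 3 and FIG. 4, where a combined image is itself treated as an image and combined with a second text message, is naturally recursive. The following sketch models only the layout order, not pixel rendering; the `Picture` and `Combined` types, the `layout` function, and the sample captions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Picture:
    """A plain image with no text attached yet."""
    name: str


@dataclass(frozen=True)
class Combined:
    """An image (plain or already combined) plus one text message."""
    base: Union[Picture, "Combined"]
    text: str
    position: str = "below"      # "above" or "below" the base image


def layout(item):
    """Return the rows of the final image from top to bottom."""
    if isinstance(item, Picture):
        return [f"[image:{item.name}]"]
    rows = layout(item.base)
    if item.position == "above":
        return [item.text] + rows
    if item.position == "below":
        return rows + [item.text]
    raise ValueError(f"unknown position: {item.position!r}")


# First pass: one caption below the picture (compare FIG. 3).
first = Combined(Picture("harbour.jpg"), "Victoria Harbour", position="below")

# Second pass: the result is reused as the image and gets a second caption (compare FIG. 4).
second = Combined(first, "2008-03-05", position="above")

for row in layout(second):
    print(row)
```

Keeping the position as data on each layer mirrors the role the specification gives the editing unit 160, which lets the user change the default placement.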
Moreover, because the text message 140 requires far less storage capacity than the audio data 112, the portable electronic device 100 can store more of the information contained in the audio data 112. Furthermore, the portable electronic device 100 spares the user from writing personal confidential information, such as account numbers, passwords, and identification numbers, on paper, where accidental loss of the paper could leak the secret.

Please refer to FIG. 5, which is a flow chart of the steps of the method for combining image and audio data of the present invention. The method comprises the following steps. First, in step S11, image data is captured from outside through an image capturing unit. Then, in step S12, a sound is received from outside through a microphone. Then, in step S13, the sound is converted into audio data by a recording unit. After that, in step S14, the audio data is converted into a text message. Finally, in step S15, the text message is combined into the image data to generate a combined image.

In one embodiment, the method further includes recording the audio data in a storage unit. The image capturing unit includes at least a lens and a sensor; that is, the image capturing unit is preferably a video camera. To improve text recognition, this embodiment further includes filtering out the background sound of the audio data so that recognition focuses on the human voice alone. Finally, this embodiment may further use an editing unit to edit the video subtitles and refine the result.
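Steps S11 to S15 form a simple linear pipeline. The sketch below wires the steps together with injected callables so that no particular camera, microphone, or speech recognizer is implied; the function name, parameter names, and the stand-in lambdas are hypothetical, and the optional `denoise` hook corresponds to the background-sound filtering mentioned for this embodiment.

```python
def combine_image_and_audio(capture_image, receive_sound, record,
                            recognize, compose, denoise=None):
    """Run steps S11 to S15 using caller-supplied implementations of each unit."""
    image = capture_image()          # S11: image capturing unit
    sound = receive_sound()          # S12: microphone
    audio = record(sound)            # S13: recording unit
    if denoise is not None:
        audio = denoise(audio)       # optional background-sound filtering
    text = recognize(audio)          # S14: audio data to text message
    return compose(image, text)      # S15: combined image


# Stand-in implementations so the sketch runs without any hardware.
result = combine_image_and_audio(
    capture_image=lambda: "IMG_0001",
    receive_sound=lambda: [0.1, 0.2, 0.3],
    record=lambda sound: tuple(sound),
    recognize=lambda audio: f"{len(audio)} samples heard",
    compose=lambda image, text: {"image": image, "caption": text},
)
print(result)
```

Passing the units in as parameters keeps the order of the steps fixed while leaving each unit replaceable, which matches the way the specification describes software and single-chip recognizers as alternatives.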
The method for combining image and audio data of the present invention is therefore well suited to broadcast education. Teaching footage is obtained through a video camera, and the explanation that accompanies the footage is obtained through a microphone. The recognition unit then quickly converts the explanation into a text message, which is attached to the teaching footage.

Please refer to FIG. 6, which is a flow chart of the steps of another method for combining image and audio data of the present invention. The method comprises the following steps. First, in step S21, a selected image is obtained from a storage unit. Then, in step S22, a wireless signal is received from outside through a wireless communication unit. Then, in step S23, the wireless signal is converted into audio data by a recording unit. Then, in step S24, an audio segment is extracted from the audio data through an operation interface. Then, in step S25, the audio segment is converted into a text message. After that, in step S26, the text message is combined with the selected image to generate a combined image. Finally, in step S27, the combined image is stored.

In one embodiment of this method, the operation interface further includes an audio trigger unit. The audio trigger unit begins excerpting the audio data once it is triggered and obtains the audio segment once the trigger ends, thereby excerpting the audio segment the user actually wants. The audio trigger unit is therefore preferably a spring-loaded push-button switch. This embodiment further includes capturing a plurality of pictures from outside with an image capturing unit and storing the pictures in the storage unit.
The image capturing unit includes at least a lens and a sensor, and the sensor may be a CCD sensor or a CMOS sensor. To improve text recognition, this embodiment further includes filtering out the background sound of the audio segment, so that background music or noise does not degrade recognition of the text message. Finally, this embodiment further includes using an editing unit to edit the combined image, so that the picture and the text complement each other.

The above is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall fall within the scope of the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the portable electronic device of the present invention;
FIG. 2 is a block diagram of an embodiment of the portable electronic device of the present invention;
FIG. 3 is a schematic diagram of an example of the combined image of the present invention;
FIG. 4 is a schematic diagram of another example of the combined image of the present invention;
FIG. 5 is a flow chart of the steps of the method for combining image and audio data of the present invention; and
FIG. 6 is a flow chart of the steps of the method for combining image and audio data of the present invention.

[Description of Main Reference Numerals]

100, 101: portable electronic device;
110: storage unit;
111: image data;
112: audio data;
113: microphone;
114: recording unit;
115: wireless communication unit;
116: image capturing unit;
117: lens;
118: sensor;
120: operation interface;
121: audio trigger unit;
130: recognition unit;
140: text message;
150: processing unit;
151: combined image;
160: editing unit;
200, 300: combined image;
210, 310: image data;
220: text message;
320: first text message;
330: second text message; and
S11 to S15, S21 to S27: process steps.

Claims

X. Scope of the Patent Application:

1. A portable electronic device, comprising: a storage unit for storing at least one image data and at least one audio data; an operation interface for selecting from the image data and the audio data; a recognition unit for converting the selected audio data into a text message; and a processing unit for combining the text message and the selected image data to generate a combined image.

2. The portable electronic device of claim 1, further comprising a microphone and a recording unit, wherein the microphone receives a sound from outside, and the recording unit converts the sound into the audio data and stores it in the storage unit.

3. The portable electronic device of claim 1, further comprising a wireless communication unit and a recording unit, wherein the wireless communication unit receives a wireless signal from outside, and the recording unit converts the wireless signal into the audio data and stores it in the storage unit.

4. The portable electronic device of claim 3, wherein the portable electronic device is a portable communication device.

5. The portable electronic device as described in the scope of the patent application, further comprising an image capturing unit for capturing a plurality of images from outside.

6. The portable electronic device of claim 5, wherein the image capturing unit comprises a lens and a sensor.

7. The portable electronic device of claim 5, wherein the portable electronic device is a digital camera or a video camera.

8. The portable electronic device of claim 1, wherein the operation interface comprises an audio trigger unit, and a user can trigger the audio trigger unit to extract an audio segment from the audio data.

9. The portable electronic device of claim 8, wherein the audio trigger unit is a spring-loaded push-button switch.

10. The portable electronic device of claim 1, wherein if the image data is a video, the text message can be combined with the video as subtitles or as a marquee.

11. The portable electronic device of claim 1, wherein the recognition unit further filters out a background sound of the audio data.

12. The portable electronic device of claim 1, further comprising an editing unit for editing the combined image.

13. A method for combining image and audio data, comprising the steps of: capturing image data from outside through an image capturing unit; receiving a sound from outside through a microphone; converting the sound into audio data by using a recording unit; converting the audio data into a text message; and combining the text message into the image data to generate a combined image.

14. The method for combining image and audio data of claim 13, wherein the image capturing unit comprises a lens and a sensor.

15. The method for combining image and audio data of claim 13, further comprising filtering out a background sound of the audio data.

16. A method for combining image and audio data, comprising the steps of: obtaining a selected image from a storage unit; receiving a wireless signal from outside through a wireless communication unit; converting the wireless signal into audio data by using a recording unit; extracting an audio segment from the audio data by using an operation interface; converting the audio segment into a text message; and combining the text message and the selected image to generate a combined image.

17. The method for combining image and audio data of claim 16, wherein the operation interface comprises an audio trigger unit, excerpting of the audio data begins after the audio trigger unit is triggered, and the audio segment is obtained after the triggering ends.

18. The method for combining image and audio data of claim 17, wherein the audio trigger unit is a spring-loaded push-button switch.

19. The method for combining image and audio data of claim 16, further comprising using an image capturing unit to capture a plurality of images from outside and storing the plurality of images in the storage unit.

20. The method for combining image and audio data of claim 19, wherein the image capturing unit comprises a lens and a sensor.