TWI359603B - A personal reminding apparatus and method thereof - Google Patents

A personal reminding apparatus and method thereof

Info

Publication number
TWI359603B
Authority
TW
Taiwan
Prior art keywords
image
person
camera
mobile phone
user
Prior art date
Application number
TW96110927A
Other languages
Chinese (zh)
Other versions
TW200840312A (en)
Inventor
Jung Tang Huang
Original Assignee
Jung Tang Huang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-03-29
Filing date
2007-03-29
Publication date
2012-03-01
Application filed by Jung Tang Huang
Priority to TW96110927A
Publication of TW200840312A
Application granted
Publication of TWI359603B

Landscapes

  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Description

IX. Description of the Invention:

[Technical Field]

The present invention relates to a method of applying a camera-equipped smartphone to person recognition, and in particular to an apparatus and method for prompting the memory of an individual or an elderly person.

[Prior Art]

In modern life almost everyone owns a mobile phone and carries and uses it every day. Analysis reports and tracking records indicate that the average person uses a mobile phone daily to contact relatives and friends or to pass the time with entertainment. Using so widespread a device only for phone calls seems wasteful; adding more applications and value-added services to the phone would enlarge the market opportunity. Phone applications that assist the elderly or people with weak memory remain few, and most go no further than GPS (satellite positioning) tracking to keep the person from getting lost. More everyday applications have not been developed, and the problem of easing memory decline in the elderly in particular has received little attention. A method is therefore needed that assists the memory of the elderly or of people with weak memory by reminding them who the person they have just met is, so that they neither fail to recall that person nor annoy others by repeating the same remarks.

[Summary of the Invention]

In view of the growing elderly population and the spread of mobile phones, the invention integrates a camera module into a wireless earphone so that images can be captured and transmitted to the mobile phone. After image recognition, the pronunciation of the person's name is sent to the wireless earphone to tell the elderly user, or the user with weak memory, who the other party is. Alternatively, the built-in microphone is used for the determination: many applications need either to recognize who the current speaker is, or to confirm from the voice of a person who claims an identity whether the speaker really is that person.
Furthermore, the similarity of conversation content is compared so that the user does not annoy others by repeating the same remarks.

The mnemonic apparatus and its setting method, which prompt the memory of the elderly or of people with weak memory, use a camera phone to capture an image of the encountered person or to record that person's voice. The image and voice data are sent to a database, where they are compared to determine who the encountered person is. The method comprises the following steps:

(a) A fixture holds the phone steady with its camera facing forward. A camera program is started to capture an image of the encountered person for recognition, or a microphone recording program is started, and the program stores the image and voice data.

(b) The stored data are sent to the database over Bluetooth or a wireless network for comparison. If a match is found, the database side returns the name of the encountered person to the phone over Bluetooth or the wireless network, and the phone speaks the name to the user through a Bluetooth earphone. The user thus learns who the encountered person is and is spared embarrassment and repeated remarks. The voice can also be compared against the database to reinforce the reliability of image recognition.

(c) In addition, a conversation with a stranger can be recorded or forwarded to a caregiver for assessment, to guard against deception by strangers and close security loopholes.

As described above, the mnemonic apparatus of the invention uses an everyday camera phone to store and then recognize the image and/or voice of the encountered person. Comparison reveals who that person is, so the apparatus prompts the memory of the elderly or of people with weak memory.

[Embodiments]
In the setting method of this mnemonic apparatus for the elderly or for people with weak memory, a phone camera captures an image of the encountered person, from which the person's identity is determined; alternatively, a microphone recording is compared to determine the identity. This keeps the user from repeating the same remarks or forgetting other people's names.

For person recognition, Haar-like features are used to search the image for facial features. Once a face region is obtained, an identification step and an acquaintance-determination step follow: the face region is compared with the face images in the face database to identify the encountered person and then to determine whether that person is an acquaintance.

Speech recognition divides into two main parts: model building and training, and recognition of unknown speech. Both model building and speaker recognition require extraction of the speaker's speech features. The system is therefore first simulated and analyzed on a remote machine; through simulation and parameter tuning, the recognition engine can reach its target recognition rate at a lower computational cost. Present speaker recognition systems are still mostly developed on personal-computer-based platforms. Few are developed on embedded systems, and those that are mostly compare features with a low computational cost, such as LPC or LPCC. If the camera phone's hardware is capable enough, however, the system can also be implemented as an embedded system.

Many applications need either to recognize who the current speaker is, or to confirm from the voice of a person who claims an identity whether the speaker really is that person. The former is speaker identification; the latter is speaker verification. The core techniques of the two are similar, so only minor modifications are needed to provide both functions.
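
The difference between speaker identification and speaker verification can be illustrated with a minimal sketch. It is not the patent's recognition engine: the enrolled "voiceprints" are hypothetical three-element vectors, the names are invented, and plain Euclidean distance with an arbitrary threshold stands in for whatever scoring a real system would use.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(enrolled, sample):
    """Speaker identification: pick the enrolled speaker closest to the sample."""
    return min(enrolled, key=lambda name: distance(enrolled[name], sample))


def verify(enrolled, claimed_name, sample, threshold):
    """Speaker verification: accept the claimed identity only if the sample
    lies within `threshold` of that speaker's enrolled voiceprint."""
    if claimed_name not in enrolled:
        return False
    return distance(enrolled[claimed_name], sample) <= threshold


# Hypothetical enrolled voiceprints (toy vectors, not real MFCC features).
enrolled = {
    "Alice": [1.0, 0.0, 0.5],
    "Bob": [0.0, 1.0, 0.5],
}
sample = [0.9, 0.1, 0.5]

who = identify(enrolled, sample)                              # open question: who is speaking?
ok_alice = verify(enrolled, "Alice", sample, threshold=0.3)   # claim: "I am Alice"
ok_bob = verify(enrolled, "Bob", sample, threshold=0.3)       # claim: "I am Bob"
```

Both functions share the same distance computation, which is the sense in which only minor changes are needed to offer both modes.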
For speech recognition, the first part of the invention uses Mel-scale Frequency Cepstral Coefficients (MFCC) as the speaker's feature parameters. In this part the speaker's signal is picked up by the phone's microphone and passed through an amplifier and an analog-to-digital converter to obtain a digital signal, from which the MFCC computation extracts the speaker's voiceprint features. Audio codecs (Audio CODEC) currently on the market already integrate the front-end analog amplification and the digital/analog conversion. Their maximum sampling frequency ([value illegible in the source]) is sufficient for the needs of the invention.

The second part captures the sound of the conversation content in order to compare sound signals. Because a sound signal is continuous, a judgment cannot be based on a single sample alone. This part may use an improved group vector quantization (GVQ) algorithm.
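
The MFCC front end mentioned above rests on the mel scale, which spaces analysis filters narrowly at low frequencies and widely at high ones. The sketch below shows only that spacing step, not a full MFCC computation. It assumes one commonly used mel formula, `2595 * log10(1 + f / 700)`, and a 16 kHz sampling rate with 10 filters; neither value comes from the patent.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one commonly used formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, sample_rate_hz, low_hz=0.0):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale
    between low_hz and the Nyquist frequency (sample_rate_hz / 2)."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(sample_rate_hz / 2.0)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


# Assumed values for illustration only: 10 filters, 16 kHz sampling rate.
centers = mel_center_frequencies(num_filters=10, sample_rate_hz=16000)
for c in centers:
    print(round(c, 1))
```

In a complete MFCC pipeline these centers would define triangular filters applied to each frame's power spectrum, followed by a logarithm and a discrete cosine transform.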

After training, the GVQ model can serve as [...] within a small range. The aim is that GVQ, a VQ-type neural network that uses competitive learning, can compute its parameters with a statistical model in the same way as the traditional VQ method while keeping the high classification accuracy of a neural network. The improved GVQ retains the GVQ mechanism of grouping the feature parameters and feeding them into an LVQ network, and adds a conditional whole-group training mechanism so that comparison is carried out on whole groups of sound signals.

The recognition engine used by the invention can also identify the speaker with a back-propagation neural network. A back-propagation neural network can be implemented in hardware, in which case the feature extraction and recognition work of the whole recognition system can be done in hardware, which is far more efficient than a software implementation.

The encountered person's conversation can be converted into text by speech recognition. A text-comparison search engine then compares it with the conversation records stored under that person's name in the personal database. If the similarity exceeds a threshold, the encountered person is taken to know the content already; the user can then change the topic, or continue it in greater depth, to avoid annoying the other party. If the encountered person is a stranger, the conversation should be recorded or forwarded to a caregiver for assessment, or specifically checked for text about remittances or money transactions, to guard against deception by strangers and close security loopholes.

Fig. 1 shows the usage environment of the mnemonic apparatus according to one embodiment of the invention. The elderly person or person with weak memory 101 is the intended user of the product. The hardware required by the memory-assisting method comprises a camera phone 105 and a Bluetooth earphone 107; the encountered person is denoted 103.

Fig. 2 shows a possible camera phone configuration for the mnemonic apparatus, in which 201 is the built-in digital camera, 203 the built-in microphone, and 205 the display screen.

Fig. 3 is a flowchart of the setting program in one embodiment of the mnemonic apparatus. After the system is started, the program is triggered when someone is detected approaching the user (step 301). The photographing and/or recording program is then started (step 303) and begins recording images and/or sound (step 305). The recognition program is started to determine who the encountered person is (step 307) and the record file is stored (step 309). The record file is transmitted wirelessly to the database for comparison (step 311), and the images and/or sounds are compared (step 313) to check that recognition and return transmission are both working normally.

When the comparison is complete (step 315), the matching name is returned to the phone (step 317) and the phone speaks the encountered person's name through the wireless earphone (step 319). After recognition the program is either closed (step 321) or proceeds to record and compare the conversation content: the recognition program (step 307) stores the conversation record file (step 309), the record file is transmitted wirelessly to the database for comparison (step 311), and the sounds are compared (step 313) to check that recognition and return transmission are both working normally. When the comparison is complete (step 315), the matching conversation content is returned to the phone (step 317), and the phone reports the conversation similarity by voice through the wireless earphone (step 319), leaving the user to decide how to continue the conversation.

The person-matching technique above relies first on face image recognition and then on speaker identification and speaker verification; this dual recognition by image and voice makes a higher recognition rate easier to achieve. If the two recognition methods still cannot yield a unique name, the invention returns several names ranked by similarity, narrowing the user's search so that the name can be recalled quickly.

Once the name is confirmed, the conversation content is recorded and sent to the remote database, where it is placed in the conversation file under that person's name and compared with the previously stored content. If several keywords or key sentences are similar, the user is told that the current topic was discussed with this person on a recent date and that repetition should be avoided.
If there are not several similar keywords or key sentences, the conversation is recorded by date and time in the conversation file under that person's name.

In summary, the invention provides a setting method that assists in prompting an individual's memory. A phone camera captures the image or voice of the encountered person (here "phone" refers broadly to mobile phones, PDAs and other mobile devices with camera and audio input/output functions), and the person's identity is determined from it; alternatively, voice recognition is used to tell the user whether conversation content is being repeated with the encountered person. The method comprises the following steps:

(a) starting a camera setting program, which stores, analyzes and recognizes the captured images, and also starting a recording program that records the encountered person's conversation for assessment and reminds the user;

(b) determining whether the image has been captured and sent to the host; if an image has been captured, an image processing program performs step (b1), and if the phone camera has not fully captured an image, the phone camera instead performs step (b2):

(b1) driving the phone camera to capture an image of the encountered person; if no image of the encountered person is captured, performing step (b11), and if one is captured, performing step (b12):

(b11) issuing a warning message asking the user to adjust the phone camera position so that the camera can capture the encountered person's image effectively;

(b12) recognizing who the encountered person is from the captured image, including recognizing that person's features, which are then sent to the database for comparison, with the result returned to the user; and

(b2) when no image of the encountered person is captured, prompting the user to confirm that the phone camera faces forward, so that an image can be captured easily to determine whether the person is an acquaintance in the database;

in steps (b1) and (b2), if the user has not used the recognition function for more than a set period, the phone enters its ordinary sleep/standby state;

(c) starting a microphone recording setting program, which stores, analyzes and recognizes the recorded sound. The main purpose is to extract the speech of key words, such as people, times, events, places and things, for comparison of sound signals. Because a sound signal is continuous, a judgment cannot be based on a single sample alone. The invention may use:

(c1) speech tagging, which does not rely on a speech model but compares the feature parameters of the speech directly against another speech file, looking for the closest speech content. Although this approach is independent of any language model, its results on long speech files do not meet the requirement for high accuracy. Speech text retrieval, in contrast, first performs speech recognition under a given speech model and then searches with the recognized text. Because text comparison technology is mature, the key to this approach is the quality of the speech recognition, while the range of text that can be recognized is very broad;

(c2) speech recognition, used mainly to determine the textual content of the sound.
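
The keyword comparison of conversation content might be sketched as follows. The patent does not specify a similarity measure; this sketch assumes Jaccard overlap between keyword sets, a hypothetical threshold of 0.5, invented dates and keywords, and a hypothetical watch list of money-related words for conversations with strangers.

```python
def keyword_overlap(current, previous):
    """Jaccard similarity between two keyword collections (0.0 to 1.0)."""
    current, previous = set(current), set(previous)
    if not current and not previous:
        return 0.0
    return len(current & previous) / len(current | previous)


def find_repeats(current_keywords, history, threshold=0.5):
    """Return (date, score) pairs for stored conversations whose overlap with
    the current one meets the threshold, most similar first."""
    hits = []
    for date, past_keywords in history.items():
        score = keyword_overlap(current_keywords, past_keywords)
        if score >= threshold:
            hits.append((date, score))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical watch list for conversations with strangers.
MONEY_WORDS = {"remittance", "transfer", "loan", "cash"}


def mentions_money(keywords):
    """True if any keyword is on the money-related watch list."""
    return bool(set(keywords) & MONEY_WORDS)


# Hypothetical conversation file stored under one person's name, keyed by date.
history = {
    "2007-03-01": {"grandson", "school", "exam"},
    "2007-03-15": {"garden", "tomato", "weather"},
}
current = {"grandson", "exam", "school", "piano"}

repeats = find_repeats(current, history)
if repeats:
    print("Already discussed on", repeats[0][0])
else:
    print("New topic; store it under today's date")
```

When `find_repeats` returns nothing, the current keywords would be stored under the current date and time, matching the fallback described above.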
For a preferred embodiment of the apparatus required by the invention, see Fig. 4. The apparatus is divided into four main modules: input end 40, input tools 50, artificial intelligence module 60 and output tools 70.

Input end 40 includes: self voice input 41, which gives the mnemonic apparatus simple voice commands such as "track", "recognize", "retry" and "stop"; other-person voice input 42, which provides voice input from encountered people at random or from acquaintances for training; image input 43, which provides image input of encountered people at random or of acquaintances for training; keyboard or memory card input 44, which lets the user supply required data through a keyboard, touch screen or memory card; and wireless data transmission 45, which brings in required data by wireless communication.

Input tools 50 are mainly a microphone 51 and a digital camera 52. The digital camera can lock onto faces automatically, and a telephoto lens can be added so that it can detect the faces of people farther away.

Artificial intelligence module 60 includes: a voice and image recognition module 61, which extracts features from the encountered person's image and recognizes whether the person is an acquaintance, or stores the image features of a new acquaintance, and likewise recognizes from the voice whether the person is an acquaintance, or stores the voiceprint features of a new acquaintance; a computation module 62, which performs the computation needed apart from recognition and conversation-content comparison; an interpretation module 63, which matches the encountered person with the conversation content; a storage database 64, which stores the encountered person's image features and conversation content, in particular people, times, events, places and things; and a text-to-speech module 65, which converts the result from interpretation module 63 into speech. Artificial intelligence module 60 is preferably hosted on a remote personal computer or on a server within the phone's service coverage; if the phone's hardware is powerful enough, it can also be built into the phone.

Output tools 70 include a loudspeaker 71 and an earphone 72. The earphone, and especially a wireless earphone, is the preferred choice because it is more convenient.

The hardware configuration of the invention comprises a camera phone, a fixture that keeps the phone camera lens facing forward and locked onto the person's face, and a remote computer (with database and computing functions).

The foregoing describes the spirit of the invention in detail. A person of ordinary skill in the art who understands the preferred embodiments may alter and modify the techniques taught here without departing from the spirit and scope of the invention.

[Brief Description of the Drawings]

To make the above and other objects, features, advantages and embodiments of the invention easier to understand, the accompanying drawings are described as follows:

Fig. 1 shows the environment of the user and the encountered person according to an embodiment of the invention.
Fig. 2 is a plan view of a camera phone according to an embodiment of the invention (a bar-type phone is taken as the example).
Fig. 3 is a flowchart of the mnemonic apparatus setting program according to an embodiment of the invention.
Fig. 4 is a block diagram of the architecture modules of the mnemonic apparatus according to an embodiment of the invention.

[Description of Main Reference Numerals]

40 Input end
41 Self voice input
42 Other-person voice input
43 Image input
44 Keyboard or memory card input
45 Wireless data transmission
50 Input tools
51 Microphone
52 Digital camera
60 Artificial intelligence module
61 Voice and image recognition module
62 Computation module
63 Interpretation module
64 Storage database
65 Text-to-speech module
70 Output tools
71 Loudspeaker (speaker)
72 Earphone
101 User
103 Encountered person
105 Phone camera
107 Wireless earphone
201 Built-in camera
203 Microphone
205 Display screen
301 Detect someone approaching
303 Start program
305 Record
307 Recognition and determination
309 Store record
311 Wireless data return
313 Comparison determination
315 Comparison complete
317 Return matching name
319 Notify user
321 Close program

Claims (1)

十、申請專利範圍: 1. -種協助個人提示記憶的設定方法 遇者之影像或聲音,據此判斷相遇者為相機摘取相 識,來提示使用者是否與相遇者者透過聲音辨 列步驟: U室偏話_容’其特徵包含下 做峨錄的影像 並提醒使岐:後動錄音料錄下麵麵對話來顯 ,⑻判斷該影像是否存並傳送社機,若 象後則由一影像處理程式執行乡_ 11 影像—人,目α二 (),若該手機相機未捕捉 以象二則由該手機相機另外執行步驟㈣,包含以下步驟: 遇者該手機相機擷取—相遇者影像,若未齡到該相 =劇執砂驟_,若擷取_㈣者影侧執行步驟 (bl2) ’包含以下步驟: 〜(bll)發出警告訊息要求使用者調整手機相機位置,使 得該手機相機能有效擷取相遇者影像; (M2)依據所擷取之該影像來辨識該相遇者為何人包 含辨認該相遇者特徵,以進-步傳送到資料庫做比對,比 對後將結果回傳使用者;以及 (b2)未捕捉到相遇者影像時則提示使用者確認是否有使手 機相機位置朝前,以利易於捕捉影像判斷是否為資料庫中的相 識者; 在步驟(M)以及(b2)中,若該使用者超過一段時間未使用 辨認功能則手機則進入休眠的一般待機狀況。 2, 如申請專利範圍第丨項所述的設定方法’其中該步驟(Μ2)包含 1359603 ———— 〜 101年1月$日修正替換頁 步驟_1)以及(bl22),係由該助憶裝置設定— (bl21)所得之該相遇者之身分選擇執行(M22)以及(bl23),若該使 用者為相識者則執行該步驟(M22),若為陌生者則執行該步驟 (M23),該些步驟包含: (bl21)比對該相遇者影像與資料庫内所儲存之人臉影 ‘ 像’藉以識別該使用者身份;將比對符合的人名回傳至手機, 手機將相遇者人名的聲音透過有線或無線耳機發出給使用 者; (bl22)啟動一麥克風錄音設定程式,並由該麥克風錄音 設定程式將錄製的聲音做儲存與壓縮,將壓縮檔案以無線方 式傳到資料庫中作比對,完成比對後,則將比對符合的談話 内谷回傳至手機,手機將談話内容相似度的比率透過有線或 無線耳機發出,由使用者自行決定接續的談話内容;以及 (bl23)其交談應採取紀錄或傳送至照護者判斷,或特別比 對是否有匯款或金錢交易的文字内容,以避免陌生人的欺編, 杜絕安全漏洞。 3. 如申„月專利範圍第2項所述的設定方法,其中該步驟作121)進 一步可利用語者識別(Speaker Identification),以及語者驗證 (SpeakerVerification)藉由影像與語音的雙重辨識,以提高辨 識率。 4. 如中請專利範圍第丨項所述的設定方法,其中該手機可以泛指 具有照相功能與聲音輸出入功能的行動電話、pDA等行動裝置。 5. 如申β月專利範圍帛2項所述的設定方法,其中該步驟(犯2)所 謂的相似度是指凡是有多個關鍵字或關鍵句相似者,例如人時 事地物等,即回傳給使用者告知目前的談話内容曾在過去最近 的曰期中對此相遇者談過,應避免重複;若是沒有多侧鍵字 15 1359603 年】月2日修正替換頁 或關鍵句相似者,即依照日期時間加以記錄於、 下的談話内容檔之内β I有人名之 6. 人提示記憶的裝置,係包含—手機相機,及一具資 ”運异功能的遠端電腦,該手機相機泛指具有昭相盘聲立 輸出入魏的行動裝置,分為四大模組:、n聲曰 求助=執輸入’可用來提供簡易聲控命令要 德以日音輸A ’用來提供相遇者隨 騎音H雜人,料提_遇者隨機 由鍵盤、觸控式榮幕或記_入== Η 輸’用來提供無線通訊的方式輪人所f的資料; 輸入工具,主要是指麥克風與數位相機; 傻、隹t智慧模組,包含語音影像辨識模組,用於將相遇者影 徵加且加以辨識是否為相識者,或將新認識者的影像特 聲紋特是:為:識者,或將新認識者的 以外所需的運算:處2 =握f订除了辨識與談話内容比對 内容.儲在-二 解釋模用來比對相遇者與其談話 並是人日料庫’儲存相遇者的影像特徵與其談話内容,尤 結果轉成語文字轉語音模組’用於將解釋模組得到的 耳機咖㈣,細使用選擇是 7..如申請專利範圍第6項所述的 8 m電腦或屬於手機服務範圍_服器内,一組建立於 •申請專利範圍第0項所述的穿詈, 歸屬手機服務範_伺服糾。、/m日慧模組建立於 16 1359603 1〇】年】月?日修正替換頁 9·如申請專利範圍第6項所述的裝置, 手機内。 1〇如申請專利範圍第6項所述的裝置,其愤位相機具有自動鎖 定臉部的功能,進-步加人望遠鏡頭,使其能對較遠的 行臉部偵測。 % 11.如申請專利範圍第6項所述的裝置,進一步包括—個使手機相 機鏡頭朝前鎖定人物臉部的器具。 17X. The scope of application for patents: 1. 
The image or sound of the person who assists the personal reminder of the memory setting method, according to which the judges are asked to pick up the acquaintance for the camera, to prompt the user whether to distinguish the person through the voice through the steps: U-room partial _ 容 'characteristics include the image of the transcript and reminder 岐: the following recordings are recorded below the recording, (8) to determine whether the image is stored and transmitted to the social machine, if the image is followed by an image The processing program executes the township_11 image-person, the object α2(). If the camera of the mobile phone is not captured, the camera performs another step (4), including the following steps: The camera captures the image of the encounterer. If the age is not reached to the phase = drama sand _, if the _ (four) image side execution step (bl2) 'includes the following steps: ~ (bll) issued a warning message asking the user to adjust the phone camera position, so that the phone The camera can effectively capture the image of the encounterer; (M2) according to the captured image to identify the person who includes the identification of the meetr, and then forward the data to the database for comparison, after comparison The result is returned to the user; and (b2) when the encounterer image is not captured, the user is prompted to confirm whether the camera position is facing forward, so that it is easy to capture the image to determine whether it is an acquaintance in the database; And (b2), if the user does not use the recognition function for a certain period of time, the mobile phone enters a general standby state of sleep. 
2, as set forth in the scope of the patent application section [where the step (Μ2) contains 1359603 ————~ January 101$day correction replacement page steps _1) and (bl22) Recall that the device settings - (bl21) the selected person's identity selection execution (M22) and (bl23), if the user is a acquaintance, perform this step (M22), if it is a stranger, perform this step (M23) The steps include: (bl21) identifying the user's identity by comparing the image of the encounter with the image of the person stored in the database; returning the matching name to the mobile phone, the mobile phone will meet The name of the person is sent to the user via a wired or wireless headset; (bl22) activates a microphone recording setting program, and the microphone recording setting program stores and compresses the recorded sound, and wirelessly transmits the compressed file to the database. For comparison, after the comparison is completed, the matching conversation is transmitted back to the mobile phone, and the ratio of the similarity of the conversation content is sent through the wired or wireless earphone, and the user decides to continue the conversation. ; And (bl23) record their conversation should be taken or transmitted to the caregiver judge, or whether there are special than text or money remittance transactions, in order to avoid the bully compiled strangers, eliminate security vulnerabilities. 3. The setting method according to item 2 of the patent scope of the application, wherein the step 121) further utilizes speaker identification, and speaker verification (Speaker Verification) by dual recognition of image and voice. In order to improve the recognition rate. 4. The setting method described in the scope of the patent scope is as follows, wherein the mobile phone can be generally referred to as a mobile phone, a pDA, etc. having a camera function and a voice input and output function. 
The setting method described in the scope of Patent 帛 2, wherein the step (criminal 2), the so-called similarity refers to any person who has multiple keywords or key sentences, such as a person's current affairs, etc., that is, the user is informed of the return. The current conversation has been discussed in this past season in the past, and should be avoided; if there is no multi-sided key 15 1359603] 2nd day to correct the replacement page or key sentence similar, that is, according to the date and time to record In the context of the conversation, the name of the person is 6. The device for prompting memory includes a mobile phone camera and a remote computer with a different function. The camera refers to the mobile device with the sound of the sound of the disc, and is divided into four modules: n 曰 曰 = = = input can be used to provide a simple voice command to be used in the Japanese The encounter with the rider H miscellaneous, the material is _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ It refers to the microphone and the digital camera; the silly, 隹t smart module, including the voice image recognition module, is used to add the encounter image and identify whether it is a acquaintance, or the image of the new acquaintance is special : for: knowing the person, or the operation required outside the new acquaintance: at 2 = grasping f to distinguish the content of the comparison with the transcript. The stored-two interpretation model is used to compare the interview with the person and is the person's day The library 'stores the image characteristics of the encounter and its conversation content, especially the result of the idiom text-to-speech module' is used to interpret the module to obtain the headset coffee (four), the fine use selection is 7.. 
7. The device according to claim 6, wherein [...] is set up in the remote computer or within the mobile phone service range [...].
8. The device according to claim [...], wherein the smart module is established in [...] a server within the mobile phone service range.
9. The device according to claim 6, wherein [...] is established in the mobile phone.
10. The device according to claim 6, wherein the camera has a function of automatically locking onto a face, and a telephoto lens can further be added to enable detection of distant faces.
11. The device according to claim 6, further comprising a fixture that keeps the camera lens facing forward so that it can lock onto the face of the encountered person.
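By way of illustration only, and not as part of the claims, the branching among steps (b121) to (b123) and the keyword-based similarity test can be sketched as follows. This is a minimal sketch under stated assumptions: the names `PersonRecord`, `handle_encounter`, `is_repeated_topic`, and the threshold `MIN_SHARED_KEYWORDS` are hypothetical, the image and voice matching is assumed to have already produced a name (or nothing), and conversations are assumed to have been reduced to sets of keywords. The patent does not specify how any of these are implemented.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: how many shared keywords count as "multiple".
MIN_SHARED_KEYWORDS = 2


@dataclass
class PersonRecord:
    """Database entry for one known person (hypothetical structure)."""
    name: str
    past_conversations: list = field(default_factory=list)  # each item is a set of keywords


def shared_keywords(current, past):
    """Return the keywords that the current and a past conversation share."""
    return set(current) & set(past)


def is_repeated_topic(current, record):
    """Treat the topic as 'similar' if multiple keywords match any past conversation."""
    return any(
        len(shared_keywords(current, past)) >= MIN_SHARED_KEYWORDS
        for past in record.past_conversations
    )


def handle_encounter(matched_name, current_keywords, database):
    """Dispatch after identification (b121): acquaintance -> (b122), stranger -> (b123).

    matched_name is None when image/voice matching found no one in the database.
    Returns a (step, message) pair; the message stands in for the headset prompt.
    """
    record = database.get(matched_name) if matched_name else None
    if record is None:
        # (b123): stranger, so record the talk or forward it to a caregiver.
        return "b123", "Unknown person: conversation recorded for caregiver review"
    if is_repeated_topic(current_keywords, record):
        # (b122): a similar conversation was found, so warn the user.
        return "b122", f"{record.name}: this topic was already discussed"
    # (b122): nothing similar, so store the new conversation under this person.
    record.past_conversations.append(set(current_keywords))
    return "b122", f"{record.name}: new topic recorded"
```

In this sketch a stranger is simply any name absent from the database, and a repeated topic is flagged as soon as one earlier conversation shares at least two keywords; a real system would presumably also store the date and time with each recorded conversation.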
TW96110927A 2007-03-29 2007-03-29 A personal reminding apparatus and method thereof TWI359603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96110927A TWI359603B (en) 2007-03-29 2007-03-29 A personal reminding apparatus and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96110927A TWI359603B (en) 2007-03-29 2007-03-29 A personal reminding apparatus and method thereof

Publications (2)

Publication Number Publication Date
TW200840312A TW200840312A (en) 2008-10-01
TWI359603B true TWI359603B (en) 2012-03-01

Family

ID=44821119

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96110927A TWI359603B (en) 2007-03-29 2007-03-29 A personal reminding apparatus and method thereof

Country Status (1)

Country Link
TW (1) TWI359603B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI413106B (en) * 2010-08-04 2013-10-21 Hon Hai Prec Ind Co Ltd Electronic recording apparatus and method thereof
TWI700630B (en) * 2018-05-31 2020-08-01 技嘉科技股份有限公司 Voice-controlled display device and method for retriving voice signal

Also Published As

Publication number Publication date
TW200840312A (en) 2008-10-01

Similar Documents

Publication Publication Date Title
Tao et al. End-to-end audiovisual speech recognition system with multitask learning
US7751597B2 (en) Apparatus and method for identifying a name corresponding to a face or voice using a database
US9330658B2 (en) User intent analysis extent of speaker intent analysis system
US10142461B2 (en) Multi-party conversation analyzer and logger
US9374463B2 (en) System and method for tracking persons of interest via voiceprint
Singh et al. Applications of speaker recognition
JP2004533640A (en) Method and apparatus for managing information about a person
TW201905675A (en) Data update method, client and electronic device
US20110082874A1 (en) Multi-party conversation analyzer & logger
JP2022523921A (en) Liveness detection and verification method, biological detection and verification system, recording medium, and training method for biological detection and verification system.
Upadhyay et al. [Retracted] SmHeSol (IoT‐BC): Smart Healthcare Solution for Future Development Using Speech Feature Extraction Integration Approach with IoT and Blockchain
JP2015104078A (en) Imaging apparatus, imaging system, server, imaging method and imaging program
Zewoudie et al. The use of audio fingerprints for authentication of speakers on speech operated interfaces
TWI359603B (en) A personal reminding apparatus and method thereof
KR20220005232A (en) Method, apparatur, computer program and computer readable recording medium for providing telemedicine service based on speech recognition
US10447968B1 (en) Controlled-environment facility video communications monitoring system
TWM635534U (en) Artificial intelligence voice controlled banking transaction system
CN114862420A (en) Identity recognition method, device, program product, medium and equipment
JP2012003698A (en) Conference support device, conference support method, conference support program and recording medium
JP2020155061A (en) Individual identification system, individual identification device, individual identification method and computer program
US20240038222A1 (en) System and method for consent detection and validation
Lutsenko et al. VOICE SPEAKER IDENTIFICATION AS ONE OF THE CURRENT BIOMETRIC METHODS OF IDENTIFICATION OF A PERSON
JP2024037110A (en) Information processing program, information processing method, and information processing apparatus
JP2024060181A (en) Vocabulary evaluation device, vocabulary evaluation method, and vocabulary evaluation program
JP2023141357A (en) Estimation program, estimation method, and information processing device

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees