TWI839285B - Image-to-speech assistive device for the visually impaired - Google Patents
Image-to-speech assistive device for the visually impaired
- Publication number: TWI839285B
- Application number: TW112129404A
- Authority
- TW
- Taiwan
Abstract
The present invention relates to an image-to-speech assistive device for the visually impaired. Its main structure comprises a head-mounted device with a camera element, on which a first wireless communication element, an image recognition module, an article generation module, a speech conversion module, and a playback device are provided, together with a guide cane having a first control unit for triggering the camera element and a second wireless communication element wirelessly linked to the first wireless communication element. A visually impaired user simply presses the first control unit on the guide cane to trigger the camera element and photograph the scene ahead; object recognition technology and ChatGPT's AI natural language processing then convert the image information into text, which is output as speech through the playback device. The device is thus easy to operate, recognizes objects quickly, and automatically narrates the scene in front of the user.
Description
The present invention provides an image-to-speech assistive device for the visually impaired whose operation is simple and unobtrusive, and which quickly identifies objects and automatically narrates the scene ahead.
By way of background, visually impaired persons are those with a partial or complete impairment in the structure or function of the visual organs. They cannot, or can only with difficulty, identify external objects, and often need visual assistive devices to carry out everyday activities.
With recent advances in technology, many functions realized through digital techniques can be applied to assistive devices to give users, disabled or otherwise, far more help. To date, however, few assistive devices have been digitized, and only a handful incorporate image recognition. Complex artificial intelligence computation requires relatively large processors and huge databases; such systems are not portable, and because AI response times lag behind user needs when processing large volumes of image data, visually impaired persons still lack a friendly, convenient assistive device for going out.
Taking Republic of China patent No. M584676, "Digital Assistive Device", as an example, the following problems and shortcomings remain to be addressed in use:
First, its camera is integrated into a wearable accessory and continuously recognizes the scene ahead with voice feedback. The user, however, does not always need voice assistance and cannot control when the camera operates; the constant stream of audio clutters the user's hearing and can even mask surrounding sounds, which is actually more dangerous.
Second, its camera is a high-resolution lens shooting video continuously, which places a heavy load on the computing device and slows voice feedback.
Third, its speech output is based on Tesseract and TTS (text-to-speech), which analyze the semantics arising from text layout and convert it into speech. Neither technology uses AI natural language processing; they can only stitch together pre-recorded words, so the generated speech sounds stilted.
Accordingly, how to solve the above conventional problems and shortcomings is precisely the direction the applicant of this invention and manufacturers in this field are eager to research and improve.
In view of these shortcomings, the applicant collected relevant information, evaluated it from many angles, and, drawing on years of experience in this industry and on repeated prototyping and modification, designed the present image-to-speech assistive device for the visually impaired, whose operation is simple and unobtrusive and which quickly identifies objects and automatically narrates the scene ahead.
The primary objective of the present invention is as follows: the article generation module is ChatGPT, a language model built from deep neural networks, an upgraded chatbot driven by AI natural language processing that can learn and improve on its own and converse in a relatively natural, fluent manner. Combined with the image recognition module, it can produce fluent sentences describing the content of the image information, making it easy for the user to understand.
Another main objective of the present invention: driving the camera element on the head-mounted device from the first control unit on the guide cane means a recognition request is issued only when needed, making operation simpler and more discreet.
To achieve the above objectives, the main structure of the present invention includes a head-mounted device with a camera element, a first wireless communication element, an image recognition module, an article generation module, a speech conversion module, and a playback device, together with a guide cane having a first control unit and a second wireless communication element. The camera element is mounted on the head-mounted device with its shooting direction aligned with the wearer's line of sight, so as to capture image information. The first wireless communication element is mounted on the head-mounted device and electrically connected to the camera element; the first control unit drives the camera element, and the second wireless communication element is electrically connected to the first control unit and wirelessly linked to the first wireless communication element. The image recognition module is housed in the head-mounted device and electrically connected to the camera element so as to recognize a plurality of object items in the image information. The article generation module, also housed in the head-mounted device and electrically connected to the image recognition module, has an image description database and generates fluent text from the object information; it is ChatGPT, a language model built from deep neural networks. The speech conversion module, housed in the head-mounted device and electrically connected to the article generation module, converts the text into a voice message, and the playback device is mounted on the head-mounted device and electrically connected to the speech conversion module.
When the user employs the present invention as a visually impaired assistive device, he or she simply wears the head-mounted device, holds the guide cane, and pairs the two wirelessly through the first and second wireless communication elements; recognition and conversion are then available at any time. When the user stops at a spot and wants to know what lies ahead, pressing the first control unit on the guide cane triggers the camera element to capture a single still image. The image recognition module automatically identifies the objects in the image; the article generation module, drawing on the image description database, integrates the image and object information into fluent text; and the speech conversion module finally converts the text into a voice message that the playback device plays to the user. A simple, discreet operation thus compensates for the user's limited visual perception of the surroundings.
Through the above techniques, the invention overcomes the drawbacks of conventional digital assistive devices: recognition timing that cannot be controlled, cluttered voice messages that interfere with normal hearing, dynamic image recognition that is burdensome and inefficient, and stilted speech output produced without artificial intelligence, thereby achieving the practical advantages stated above.
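The capture, recognize, describe, and speak flow summarized above can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: `capture`, `detect_objects`, `generate_description`, and `synthesize_speech` are hypothetical stand-ins for the camera element, the YOLO-style recognizer, the ChatGPT-based article generation module, and the TTS module.

```python
from typing import Callable, List

def run_pipeline(
    capture: Callable[[], bytes],                      # camera element: one still image
    detect_objects: Callable[[bytes], List[str]],      # image recognition module
    generate_description: Callable[[List[str]], str],  # article generation module
    synthesize_speech: Callable[[str], bytes],         # speech conversion module
) -> bytes:
    """One button press on the cane triggers this whole chain."""
    image = capture()
    items = detect_objects(image)
    text = generate_description(items)
    return synthesize_speech(text)

# Toy stand-ins so the flow can be exercised without real hardware or models:
audio = run_pipeline(
    capture=lambda: b"<jpeg bytes>",
    detect_objects=lambda img: ["beach", "pavilion", "mountain"],
    generate_description=lambda items: "The scene shows a " + ", a ".join(items) + ".",
    synthesize_speech=lambda text: text.encode("utf-8"),  # placeholder for real TTS
)
```

Keeping each stage behind a plain function boundary mirrors the patent's module structure: each module is electrically connected to the next and only the text flows between them.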
1: Head-mounted device
11: Camera element
111: First wireless communication element
12: Image recognition module
13: Article generation module
131: Image description database
132: Real-scene map database
133: Known-person database
14: Speech conversion module
15: Playback device
16: Sound pickup element
17: Positioning element
18: Facial recognition module
2: Guide cane
21: First control unit
22: Second wireless communication element
23: Second control unit
Figure 1 is a perspective view of a preferred embodiment of the present invention.
Figure 2 is a structural block diagram of a preferred embodiment of the present invention.
Figure 3 is a schematic diagram of photographing in a preferred embodiment of the present invention.
Figure 4 is a schematic diagram of image recognition in a preferred embodiment of the present invention.
Figure 5 is a schematic diagram of voice playback in a preferred embodiment of the present invention.
Figure 6 is an implementation schematic of a further preferred embodiment of the present invention.
Figure 7 is a structural block diagram of yet another preferred embodiment of the present invention.
Figure 8 is a schematic diagram of fine positioning in yet another preferred embodiment of the present invention.
Figure 9 is a structural block diagram of another preferred embodiment of the present invention.
Figure 10 is a schematic diagram of person recognition in another preferred embodiment of the present invention.
To achieve the above objectives and effects, the technical means and construction adopted by the present invention are illustrated and described in detail below with reference to the preferred embodiments, so that their features and functions may be fully understood.
Please refer to Figures 1 and 2, a perspective view and a structural block diagram of a preferred embodiment of the present invention, from which it can be clearly seen that the invention includes:
a head-mounted device 1, which is one of a pair of glasses, sunglasses, or goggles;
a camera element 11, mounted on the head-mounted device 1 with its shooting direction aligned with the wearer's line of sight so as to capture image information; in this embodiment the camera element 11 is exemplified by a lens;
a first wireless communication element 111, mounted on the head-mounted device 1 and electrically connected to the camera element 11;
a guide cane 2, having a first control unit 21 for driving the camera element 11 and a second wireless communication element 22 electrically connected to the first control unit 21 and wirelessly linked to the first wireless communication element 111; in this embodiment the first control unit 21 is exemplified by a push switch, and the first and second wireless communication elements 111, 22 by a Bluetooth link;
an image recognition module 12, housed in the head-mounted device 1 and electrically connected to the camera element 11 so as to recognize a plurality of object items in the image information; the image recognition module 12 performs object detection, exemplified here by software such as YOLO or AI DIY Platform;
an article generation module 13, housed in the head-mounted device 1, electrically connected to the image recognition module 12, and provided with an image description database 131, so as to generate fluent text from the image information and the object information; the article generation module 13 is ChatGPT, a language model built from deep neural networks;
a speech conversion module 14, housed in the head-mounted device 1 and electrically connected to the article generation module 13 to convert the text into a voice message, exemplified here by text-to-speech (TTS) conversion; and
a playback device 15, mounted on the head-mounted device 1 and electrically connected to the speech conversion module 14, exemplified here by a speaker.
The structure of the present technology can be understood from the above description. Through the cooperation of these components, the advantages of simple, discreet operation and rapid object identification with automatic spoken narration of the scene can be achieved, as explained in detail below.
Referring now to Figures 1 through 5, the perspective view through the voice playback schematic of the preferred embodiment: with the components assembled as above, the physical equipment of the invention consists only of the head-mounted device 1 and the guide cane 2, the same equipment a visually impaired person ordinarily carries. The user therefore simply wears the head-mounted device 1 and holds the guide cane 2, with no outward difference in appearance; after Bluetooth pairing between the head-mounted device 1 and the guide cane 2 through the first wireless communication element 111 and the second wireless communication element 22, recognition and conversion are available at any time.
When the user stops at a spot and wants to know what lies ahead (for example, a painting), he or she simply presses the first control unit 21 on the guide cane 2 to drive the camera element 11 and capture a single still image. For the user this is just a flick of a finger on the hand holding the cane, simple and discreet, invisible to bystanders. The image recognition module 12 then automatically identifies the objects in the image. Its recognition principle comprises several parts: image scanning, object feature recognition, and an output interface. Specifically, the camera element 11 first scans the input into one or more images; the image recognition module 12 then detects the physical appearance features of the objects in each image and matches the result against a cloud-based object feature database to select the corresponding object names, which are passed to the article generation module 13 as object information, for example: a beach, a pavilion, a mountain, three birds, two boats.
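The object-information step described above (collapsing raw detections into counted items such as "three birds, two boats") can be sketched as follows. The label list is a hypothetical output of a YOLO-style detector, not the device's actual interface, and the pluralization is deliberately naive.

```python
from collections import Counter

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

def detections_to_items(labels):
    """Group repeated detector labels into counted item strings,
    e.g. ['bird', 'bird', 'bird', 'boat', 'boat'] -> ['three birds', 'two boats']."""
    counts = Counter(labels)  # preserves first-encounter order (Python 3.7+)
    items = []
    for label, n in counts.items():
        word = NUMBER_WORDS.get(n, str(n))
        items.append(f"{word} {label}" + ("s" if n > 1 else ""))
    return items

items = detections_to_items(["bird", "bird", "bird", "boat", "boat", "pavilion"])
# items -> ['three birds', 'two boats', 'one pavilion']
```

The counted strings are exactly the kind of object information the text passes to the article generation module 13.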
The article generation module 13 then combines the data in the image description database 131 and integrates all the object information into fluent text. The module is ChatGPT, a language model built from deep neural networks; GPT (generative pre-trained transformer) is a large language model and an important framework of generative AI. It can accept text and image input, interact in natural human dialogue, and handle quite complex language tasks, including automatic text generation, question answering, and summarization. Given some text or images, the article generation module 13 can therefore automatically compose a fluent passage. The GPT-4 model released in 2023 can further be adapted to specific tasks and subject areas to form a more targeted system; for example, incorporating the image description database 131 of the present invention into the training model makes the article generation module 13 better at generating text that describes the content of image information. In this embodiment the image is a painting, and the module integrates the object information provided by the image recognition module 12 to generate text such as: "The picture is a landscape painting whose scene is a beach; at the end of the beach and the sea stands a mountain, a pavilion sits on the beach, two sailboats are on the sea, and three birds are in the sky."
Finally, the speech conversion module 14 converts the text into a voice message, which the playback device 15 plays to the user. The conversion principle involves two steps: text processing and speech synthesis. First, text processing performs linguistic analysis, determines word boundaries, and splits sentences, which improves the quality and fluency of the synthesized speech. Second, speech synthesis generates the speech signal from the processed text using specific algorithms and rules, or by concatenating many pre-recorded speech segments stored in a database. Because the input is a single photograph, the input conditions are simple, so the image recognition module 12 and article generation module 13 run faster with a lower load and complete the recognition and conversion more quickly; for the user, a simple, discreet operation compensates for limited visual perception of the surroundings.
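The first step of the conversion principle above, text processing with sentence segmentation, can be sketched like this. The punctuation set is an assumption; a real system would hand each resulting segment to a TTS engine for synthesis.

```python
import re

def segment_for_tts(text):
    """Split generated text at clause/sentence punctuation so the
    synthesizer can insert natural pauses between segments."""
    parts = re.split(r"[,.;，、。；]\s*", text)
    return [p for p in parts if p]  # drop empty trailing fields

segments = segment_for_tts(
    "The beach is below, a pavilion sits on it. Two boats are at sea."
)
```

Segmenting before synthesis is what gives concatenative or rule-based TTS its pause structure; without it the output runs together and sounds stilted, the very flaw the background section criticizes.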
Referring also to Figure 6, an implementation schematic of a further preferred embodiment: this embodiment differs from the one above only in that the head-mounted device 1 carries a sound pickup element 16 electrically connected to the article generation module 13, which receives the user's voice commands so that the module can adjust the content of its text, and the guide cane 2 carries a second control unit 23 for driving the sound pickup element 16; the sound pickup element 16 is a miniature microphone and the second control unit 23 a push switch beside the first control unit 21. If the user finds the generated text unclear, he or she can press the second control unit 23 to activate the sound pickup element 16 and dictate a voice command giving the article generation module 13 more specific conditions, such as "What is the spatial relationship between those things?" The module then regenerates the text from the positional relationships of the objects in the image: "The beach is at the lower right of the scene and the sea at the lower left; the mountain extends from the right to the centre at the end of the sea; the pavilion is on the beach at the foot of the mountain; the two sailboats are close to shore; and the three birds fly low between the sailboats and the mountain." In this way the user can interact with the article generation module 13 through the sound pickup element 16 until a satisfactory reply is obtained.
Referring also to Figures 7 and 8, a structural block diagram and fine-positioning schematic of yet another preferred embodiment: this embodiment differs from the above only in that the head-mounted device 1 contains a positioning element 17 electrically connected to the article generation module 13 to produce position information, and the article generation module 13 has a real-scene map database 132, allowing it to integrate the image information, object information, and position information into fine positioning information; the positioning element 17 is a GPS receiver and the real-scene map database 132 stores street-view map data. Because the positioning element 17 relies on satellites, it can give only the user's approximate location, typically within about 100 metres of a road section, and when the user is stationary at a fixed point it cannot determine which way the user is facing. Therefore, when the user presses the second control unit 23 to activate the sound pickup element 16 and it receives a position query, the article generation module 13 integrates the image, object, and position information and searches the street-view data in the real-scene map database 132 to determine the precise position of the street scene ahead and output text describing it.
For example, the positioning element 17 may locate the user within 100 metres of No. 1, XX Road; the image recognition module 12 recognizes the XX coffee shop on the corner ahead; and cross-referencing the real-scene map database 132 yields the exact address of that corner coffee shop within the 100-metre radius. The user thus obtains fine positioning information and the direction currently faced, which not only helps the user understand where they are but also lets them give family and friends their correct location if they get lost.
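The fine-positioning idea above, narrowing a roughly 100-metre GPS fix using a landmark the recognizer saw, can be sketched with a toy street-view database. The database entries, field names, and threshold are illustrative assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def refine_position(gps, landmark, streetview_db, radius_m=100):
    """Pick the database entry within the GPS radius whose landmark matches
    what the image recognizer reported (e.g. the corner coffee shop)."""
    for entry in streetview_db:
        d = haversine_m(gps[0], gps[1], entry["lat"], entry["lon"])
        if d <= radius_m and entry["landmark"] == landmark:
            return entry["address"]
    return None  # no landmark match inside the GPS uncertainty circle

streetview_db = [
    {"lat": 25.0330, "lon": 121.5654, "landmark": "XX coffee shop", "address": "No. 1, XX Road"},
    {"lat": 25.0500, "lon": 121.5654, "landmark": "XX coffee shop", "address": "No. 99, YY Road"},
]
address = refine_position((25.0331, 121.5655), "XX coffee shop", streetview_db)
```

Even with two branches of the same shop in the database, only the one inside the GPS uncertainty circle is returned, which is exactly how the landmark disambiguates the coarse satellite fix.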
Referring also to Figures 9 and 10, a structural block diagram and person-recognition schematic of another preferred embodiment: this embodiment differs from the above only in that the head-mounted device 1 contains a facial recognition module 18 and a known-person database 133 electrically connected to the facial recognition module 18 and the article generation module 13, so that the article generation module 13 can adjust its text for the persons in the image. In this embodiment the playback device 15 is exemplified by earphones and the facial recognition module 18 by face recognition software such as FaceMe or FaceMaze. Face recognition is a form of biometrics: it extracts facial feature values as vectors and compares them with the feature values of pre-enrolled faces, using deep neural networks, algorithms, and mathematical formulas to measure facial variables, convert them into feature values, and match them against a database to establish the correct identity. Concretely, face recognition involves three steps. First, face detection: even if only part of a face appears in the frame, the technology can accurately scan, detect, and box the position of faces in an image or video. Second, facial feature extraction: the face recognition engine divides the boxed face into n dimensions; for example, a high-precision engine with n = 1024 splits the face into a 1024-dimensional matrix and extracts vector-based feature values for variables such as nose length and width, forehead width, and eye shape. Third, face recognition proper: the feature values are compared with the pre-enrolled faces in the database to identify the person; in a 1:N comparison, the feature values of the face in the frame are compared against N pre-enrolled faces in the database to establish identity.
When the user takes a photograph with the first control unit 21 and the image recognition module 12 detects a person in the image, it automatically hands off to the facial recognition module 18, which compares the person's face against the faces in the known-person database 133. If a match is found, the rather generic person description in the object information (such as "a woman") is replaced with the person's name or title, so that the article generation module 13 can adjust the text sent to the playback device 15, for example to "your sister is approaching". If no match is found in the known-person database 133, the person is judged to be a stranger and the generic description is kept. This not only helps the user identify the people in front of them but also avoids superfluous text.
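The 1:N comparison with a stranger fallback described above can be sketched with cosine similarity over feature vectors. The toy embeddings, the threshold, and the fallback string are illustrative assumptions rather than FaceMe or FaceMaze internals.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def identify_person(face_vec, known_db, threshold=0.9, fallback="a person"):
    """1:N match: compare the captured embedding against each enrolled one;
    return the best-matching name above the threshold, otherwise keep the
    generic description (the person is treated as a stranger)."""
    best_name, best_score = None, threshold
    for name, enrolled in known_db.items():
        score = cosine_similarity(face_vec, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_name else fallback

known = {"your sister": [0.9, 0.1, 0.4]}
print(identify_person([0.88, 0.12, 0.41], known))  # close vector -> "your sister"
```

The threshold is what separates "replace the generic description with a name" from "keep calling them a person", matching the stranger behaviour the embodiment specifies.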
However, the above is merely a preferred embodiment of the present invention and does not limit its patent scope; all simple modifications and equivalent structural changes made using the contents of this specification and drawings shall likewise fall within the patent scope of the present invention.
Accordingly, the key points by which the image-to-speech assistive device for the visually impaired of the present invention improves on the prior art are:
First, the article generation module 13 is ChatGPT, a language model built from deep neural networks: an upgraded chatbot driven by AI natural language processing that can learn and improve on its own and converse relatively naturally and fluently. Combined with the image recognition module 12, it produces fluent sentences describing the content of the image information, making it easy for the user to understand.
Second, driving the camera element 11 on the head-mounted device 1 from the first control unit 21 on the guide cane 2 means a recognition request is issued only when needed, making operation simpler and more discreet.
Third, the input is a single still photograph taken semi-automatically by the user, so the input conditions are simple, the image recognition module 12 and article generation module 13 run faster with a lower load, and recognition and conversion finish more quickly.
Fourth, thanks to the sound pickup element 16, when the user finds the text generated by the article generation module 13 unclear, they can dictate a voice command with more specific conditions and interact with the module until a satisfactory reply is obtained.
Fifth, the positioning element 17 and real-scene map database 132 provide fine positioning information and the direction currently faced, which not only helps the user know where they are but also lets them give family and friends their correct location if they get lost.
Sixth, the facial recognition module 18 and known-person database 133 not only help the user identify the people in front of them but also avoid generating superfluous text about strangers.
In summary, the image-to-speech assistive device for the visually impaired of the present invention truly achieves its efficacy and purpose in use; the invention is therefore of excellent practicality. To meet the requirements for an invention patent, this application is filed in accordance with the law, and the applicant respectfully asks the examiners to grant it at an early date to protect the applicant's hard work; should the examiners have any questions, please do not hesitate to write, and the applicant will cooperate to the fullest.
1: Head-mounted device
11: Camera element
111: First wireless communication element
12: Image recognition module
13: Article generation module
131: Image description database
14: Speech conversion module
15: Playback device
2: Guide cane
21: First control unit
22: Second wireless communication element
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW112129404A TWI839285B (en) | 2023-08-04 | 2023-08-04 | Image-to-speech assistive device for the visually impaired |
Publications (1)
Publication Number | Publication Date |
---|---|
TWI839285B true TWI839285B (en) | 2024-04-11 |
Family
ID=91619071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW112129404A TWI839285B (en) | 2023-08-04 | 2023-08-04 | Image-to-speech assistive device for the visually impaired |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI839285B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101505710A (en) * | 2006-08-15 | 2009-08-12 | 皇家飞利浦电子股份有限公司 | Assistance system for visually handicapped persons |
CN106537290A (en) * | 2014-05-09 | 2017-03-22 | 谷歌公司 | Systems and methods for biomechanically-based eye signals for interacting with real and virtual objects |
TWM590450U (en) * | 2019-10-07 | 2020-02-11 | 陳詠涵 | Wearable navigation and risk escape device |
TW202211895A (en) * | 2020-09-23 | 2022-04-01 | 大葉大學 | Blind guidance assistance method and blind guidance assistance system for achieving the efficacy of prompting and preventing the user from colliding with the corresponding physical object |
US20230050825A1 (en) * | 2021-08-13 | 2023-02-16 | Vilnius Gediminas Technical University | Hands-Free Crowd Sourced Indoor Navigation System and Method for Guiding Blind and Visually Impaired Persons |
TWM648987U (en) * | 2023-08-04 | 2023-12-01 | 上弘醫療設備股份有限公司 | Image-to-speech assistive device for visually impaired |
- 2023-08-04: Application TW112129404A filed; patent TWI839285B (en) active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109065055B (en) | Method, storage medium, and apparatus for generating AR content based on sound | |
JP7483798B2 (en) | Wordflow annotation | |
US11783524B2 (en) | Producing realistic talking face with expression using images text and voice | |
US9949056B2 (en) | Method and apparatus for presenting to a user of a wearable apparatus additional information related to an audio scene | |
JP2019535059A (en) | Sensory eyewear | |
US11482134B2 (en) | Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner | |
Prado et al. | Visuo-auditory multimodal emotional structure to improve human-robot-interaction | |
US9525841B2 (en) | Imaging device for associating image data with shooting condition information | |
CN106570473A (en) | Deaf-mute sign language recognition interactive system based on robot | |
Oliveira et al. | Automatic sign language translation to improve communication | |
US20190240588A1 (en) | Communication apparatus and control program thereof | |
TWM648987U (en) | Image-to-speech assistive device for visually impaired | |
Vogler et al. | A framework for motion recognition with applications to American sign language and gait recognition | |
Kanvinde et al. | Bidirectional sign language translation | |
Prabha et al. | Vivoice-Reading Assistant for the Blind using OCR and TTS | |
CN110139021B (en) | Auxiliary shooting method and terminal equipment | |
TWI839285B (en) | Image-to-speech assistive device for the visually impaired | |
JP7130290B2 (en) | information extractor | |
JP7096626B2 (en) | Information extraction device | |
Khan et al. | Sign language translation in urdu/hindi through microsoft kinect | |
Mustafa et al. | Intelligent glasses for visually impaired people | |
Saitoh et al. | Lip25w: Word-level lip reading web application for smart device | |
JP6754154B1 (en) | Translation programs, translation equipment, translation methods, and wearable devices | |
CN111933131A (en) | Voice recognition method and device | |
Mishra et al. | Environment descriptor for the visually impaired |