TWM648987U - Image-to-speech assistive device for visually impaired - Google Patents
Image-to-speech assistive device for the visually impaired
- Publication number: TWM648987U
- Application number: TW112208257U
- Authority: TW (Taiwan)
Abstract
The present utility model relates to an image-to-speech assistive device for the visually impaired. The main structure comprises a head-mounted device with a camera element, on which are provided a first wireless communication element, an image recognition module, an article generation module, a speech conversion module, and a playback device; and a guide cane having a first control part for triggering the camera element and a second wireless communication element wirelessly linked to the first wireless communication element. A visually impaired user need only press the first control part on the guide cane to make the camera element capture the scene ahead; object recognition technology and chatGPT's AI natural-language processing then convert the image information into text, which is output as speech to the user through the playback device. The device is thus easy to operate, recognizes quickly, and automatically describes the scene ahead in spoken language.
Description
The present utility model provides an image-to-speech assistive device for the visually impaired whose operation is simple and discreet and which can quickly identify objects and automatically describe the scene ahead in spoken language.
Visually impaired persons are those with a partial or complete impairment in the structure or function of the visual organs. Because they cannot, or can only with difficulty, identify things in their surroundings, they often rely on visual assistive devices to carry out everyday activities.
With recent advances in technology, many functions have been realized digitally and applied to assistive devices, offering considerable help both to disabled users and to others. At present, however, few assistive devices are digitized, and only a small number incorporate image recognition. Complex artificial-intelligence computation requires relatively large processors and huge databases, and such large systems are unsuitable for carrying around; moreover, because the response time of AI processing of large volumes of image information cannot keep up with the user's needs, visually impaired persons still lack a friendly, convenient assistive device for going out.
Taking Republic of China Patent No. M584676, "Digital Assistive Device," as an example, the following problems and deficiencies remain to be improved:
First, its camera lens is integrated into a wearable accessory and performs image recognition and voice feedback on the scene ahead at all times. The user, however, does not always need voice assistance and cannot control when the camera operates, so the user's ears are flooded with messages that may even mask other surrounding sounds, which is in fact more dangerous.
Second, its camera is a high-resolution lens that shoots video continuously, placing a heavy load on the computing device and slowing the voice feedback.
Third, its semantic output is based on Tesseract and text-to-speech (TTS) technology, which analyzes the meaning of arranged text and converts it into speech. Neither technology uses AI natural-language processing; they can only piece together pre-recorded words, so the resulting speech sounds rather stilted.
How to solve the above conventional problems and deficiencies is therefore the direction that the applicant of the present utility model and manufacturers in this industry are eager to research and improve.

In view of the above shortcomings, the applicant collected relevant information, evaluated and weighed it from many angles, and drew on many years of experience in this industry; through continual prototyping and modification, the applicant designed this utility model: an image-to-speech assistive device for the visually impaired whose operation is simple and discreet and which can quickly identify objects and automatically describe the scene ahead in spoken language.
A main object of the present utility model is that the article generation module is chatGPT, a language model built from a deep neural network: an upgraded chatbot driven by AI natural-language processing that can learn and improve on its own and simulates relatively natural, fluent human dialogue. Combined with the image recognition module, it can therefore describe the content of the image information in smooth, coherent sentences that are easy for the user to understand.
Another main object of the present utility model is that, because the first control part on the guide cane triggers the camera element on the head-mounted device, a recognition request is issued only when needed, making operation simpler and more discreet.
To achieve the above objects, the main structure of the present utility model comprises: a head-mounted device with a camera element, a first wireless communication element, an image recognition module, an article generation module, a speech conversion module, and a playback device; and a guide cane having a first control part and a second wireless communication element. The camera element is mounted on the head-mounted device with its shooting direction aligned with the wearer's line of sight, so as to obtain image information. The first wireless communication element is mounted on the head-mounted device and electrically connected to the camera element; the first control part triggers the camera element; and the second wireless communication element is electrically connected to the first control part and wirelessly linked to the first wireless communication element. The image recognition module is housed in the head-mounted device and electrically connected to the camera element, so as to identify a plurality of pieces of object information in the image information. The article generation module is housed in the head-mounted device, is electrically connected to the image recognition module, and has an image description database for generating coherent text information from the object information; the article generation module is chatGPT, a language model built from a deep neural network. The speech conversion module is housed in the head-mounted device and electrically connected to the article generation module, so as to convert the text information into a voice message, and the playback device is mounted on the head-mounted device and electrically connected to the speech conversion module.
When the user employs the present utility model as an assistive device, he or she need only wear the head-mounted device, hold the guide cane, and wirelessly link the two through the first and second wireless communication elements; recognition and conversion can then be performed at any time. When the user stops at a given spot and wants to know what is ahead, he or she simply presses the first control part on the guide cane to trigger the camera element and capture a single still image. The image recognition module automatically identifies the object information in the image; the article generation module, drawing on the image description database, integrates the image information and all the object information into coherent text; and the speech conversion module converts the text into a voice message that the playback device plays to the user. A simple, discreet operation thus compensates for the user's limited visual recognition of the surroundings.
The above technology overcomes the problems of conventional digital assistive devices, namely uncontrollable recognition timing, cluttered voice messages that interfere with normal hearing, the heavy load and low efficiency of continuous video recognition, and stilted semantic output produced without artificial intelligence, thereby achieving the practical advances described above.
1: Head-mounted device
11: Camera element
111: First wireless communication element
12: Image recognition module
13: Article generation module
131: Image description database
132: Street-view map database
133: Known-person database
14: Speech conversion module
15: Playback device
16: Sound pickup element
17: Positioning element
18: Facial recognition module
2: Guide cane
21: First control part
22: Second wireless communication element
23: Second control part
Fig. 1 is a perspective view of a preferred embodiment of the present utility model.
Fig. 2 is a structural block diagram of the preferred embodiment.
Fig. 3 is a schematic diagram of photographing in the preferred embodiment.
Fig. 4 is a schematic diagram of image recognition in the preferred embodiment.
Fig. 5 is a schematic diagram of voice playback in the preferred embodiment.
Fig. 6 is a schematic diagram of a further preferred embodiment.
Fig. 7 is a structural block diagram of yet another preferred embodiment.
Fig. 8 is a schematic diagram of fine positioning in that embodiment.
Fig. 9 is a structural block diagram of another preferred embodiment.
Fig. 10 is a schematic diagram of person recognition in that embodiment.
To achieve the above objects and effects, the technical means and construction adopted by the present utility model are described in detail below with reference to the drawings of the preferred embodiments, so that they may be fully understood.

Referring to Figs. 1 and 2, a perspective view and a structural block diagram of a preferred embodiment, the present utility model comprises:
a head-mounted device 1, which is one of a pair of glasses, sunglasses, or goggles;

a camera element 11, mounted on the head-mounted device 1 with its shooting direction aligned with the wearer's line of sight so as to obtain image information; in this embodiment the camera element 11 is exemplified by a lens;

a first wireless communication element 111, mounted on the head-mounted device 1 and electrically connected to the camera element 11;

a guide cane 2, having a first control part 21 for triggering the camera element 11 and a second wireless communication element 22 electrically connected to the first control part 21 and wirelessly linked to the first wireless communication element 111; in this embodiment the first control part 21 is exemplified by a push switch, and the first and second wireless communication elements 111, 22 by a Bluetooth link;

an image recognition module 12, housed in the head-mounted device 1 and electrically connected to the camera element 11, so as to identify a plurality of pieces of object information in the image information; the image recognition module 12 performs object detection, exemplified in this embodiment by software such as YOLO or AI DIY Platform;

an article generation module 13, housed in the head-mounted device 1, electrically connected to the image recognition module 12, and having an image description database 131 for generating coherent text information from the image information and the object information; the article generation module 13 is chatGPT, a language model built from a deep neural network;

a speech conversion module 14, housed in the head-mounted device 1 and electrically connected to the article generation module 13, so as to convert the text information into a voice message, exemplified in this embodiment by text-to-speech (TTS) conversion; and

a playback device 15, mounted on the head-mounted device 1 and electrically connected to the speech conversion module 14, exemplified in this embodiment by a speaker.
From the above description the structure of the present technology can be understood, and through the corresponding cooperation of these components the advantages of simple, discreet operation and of quickly identifying objects and automatically describing the scene ahead in spoken language are achieved, as explained in detail below.
Referring to Figs. 1 to 5, when the above components are assembled it can be clearly seen that the only physical equipment of the present utility model is the head-mounted device 1 and the guide cane 2, the same equipment a visually impaired person ordinarily carries. The user therefore simply wears the head-mounted device 1 and holds the guide cane 2, with no outward difference in appearance; once the head-mounted device 1 and the guide cane 2 are paired over Bluetooth through the first wireless communication element 111 and the second wireless communication element 22, recognition and conversion can be performed at any time.
When the user stops at a given spot and wants to know what is ahead (for example, a painting), he or she simply presses the first control part 21 on the guide cane 2 to trigger the camera element 11 and capture a single still image. For the user this means merely moving a finger on the hand holding the guide cane 2, an action so small and discreet that bystanders cannot notice it. The image recognition module 12 then automatically identifies the object information in the image. The recognition principle comprises several stages: image scan, object feature recognition, and output interface. Specifically, the camera element 11 first scans the input into one or more images, which are sent to the image recognition module 12 to detect the physical appearance features of the objects in the image; the result is matched against a cloud-based object feature database, from which the corresponding object names are selected, and these names are passed to the article generation module 13 as object information, for example: a beach, a pavilion, a mountain, three birds, two boats.
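The scan, detect, match, and report flow described above can be sketched as follows. This is a minimal illustration only: `Detection`, `recognize_objects`, and the stub `fake_beach_detector` are hypothetical names standing in for a real object-detection model such as the YOLO software the embodiment cites, and the confidence threshold is an assumed parameter.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    label: str        # object name matched from the feature database
    confidence: float

def recognize_objects(image_bytes: bytes,
                      detector: Callable[[bytes], List[Detection]],
                      min_confidence: float = 0.5) -> List[str]:
    """Image scan -> feature detection -> database match -> output interface:
    return the names of objects detected with sufficient confidence."""
    detections = detector(image_bytes)
    return [d.label for d in detections if d.confidence >= min_confidence]

# Stub detector standing in for a real model; its output is invented.
def fake_beach_detector(_img: bytes) -> List[Detection]:
    return [Detection("beach", 0.91), Detection("pavilion", 0.88),
            Detection("mountain", 0.84), Detection("bird", 0.79),
            Detection("boat", 0.40)]  # below threshold, filtered out

print(recognize_objects(b"...", fake_beach_detector))
# -> ['beach', 'pavilion', 'mountain', 'bird']
```

The filtered name list is exactly what the description says is handed to the article generation module 13.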
The article generation module 13 then combines the data in the image description database 131 and integrates all the object information into coherent text. The article generation module 13 is chatGPT, a language model built from a deep neural network; GPT (generative pre-trained transformer) is a large language model and an important framework for generative artificial intelligence. It accepts text and image input, interacts in text in the manner of natural human dialogue, and can handle quite complex language tasks, including automatic text generation, question answering, and summarization, so given some text or images the article generation module 13 can automatically compose a coherent passage. The GPT-4 model released in 2023 can further be adapted to specific tasks and/or subject domains to form a more targeted system; for example, incorporating the image description database 131 of the present utility model into the training model makes the article generation module 13 better at generating text that describes image content. In this embodiment the image information is a painting; integrating the object information provided by the image recognition module 12, the module generates text such as: "The picture is a landscape painting of a beach; at the far end of the beach and the sea there is a mountain, a pavilion stands on the beach, two sailboats are on the sea, and three birds are in the sky."
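The patent does not disclose how the article generation module 13 actually queries the language model, but one plausible intermediate step is assembling the recognizer's object list (and any later dictated follow-up) into a prompt. The function name and prompt wording below are assumptions for illustration, not the patented implementation:

```python
from typing import List, Optional

def build_caption_prompt(objects: List[str],
                         follow_up: Optional[str] = None) -> str:
    """Assemble a natural-language request for the language model from the
    recognizer's object list and an optional spoken follow-up question."""
    lines = [
        "You are describing a scene to a visually impaired user.",
        "Objects detected in a single still photo: " + ", ".join(objects) + ".",
        "Write one short, fluent paragraph describing the scene.",
    ]
    if follow_up:
        lines.append("Additional request from the user: " + follow_up)
    return "\n".join(lines)

print(build_caption_prompt(["a beach", "a pavilion", "a mountain",
                            "three birds", "two sailboats"]))
```

The resulting string would then be sent to the model, whose reply becomes the text information handed to the speech conversion module 14.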
Finally, the speech conversion module 14 converts the text information into a voice message, which the playback device 15 plays to the user. The conversion comprises two steps: text processing and speech synthesis. First, text processing performs linguistic analysis, determines word boundaries, and segments sentences, which improves the quality and fluency of the synthesized speech. Second, speech synthesis generates the speech signal from the analyzed text using specific algorithms and rules, or by concatenating pre-recorded speech stored in a database. Because the input is a single photograph, the input conditions are simple, so the image recognition module 12 and the article generation module 13 run faster with a lower load and complete the recognition and conversion more quickly. For the user, a simple, discreet operation compensates for limited visual recognition of the surroundings.
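The two-step conversion (text processing, then synthesis) might be sketched as follows, with a naive regular-expression sentence splitter standing in for the text-processing stage and a stand-in callback in place of a real synthesizer; both function names are invented for illustration:

```python
import re
from typing import Callable, Iterable, List

def split_sentences(text: str) -> List[str]:
    """Text-processing stage: break the generated description into sentence
    units so the synthesizer can pace the speech naturally."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def speak(text: str, synthesize: Callable[[str], bytes]) -> Iterable[bytes]:
    # Synthesis stage: each sentence becomes one audio chunk for the speaker.
    for sentence in split_sentences(text):
        yield synthesize(sentence)

chunks = list(speak("The scene is a beach. A pavilion stands on the sand.",
                    synthesize=lambda s: s.encode("utf-8")))  # stand-in
print(len(chunks))  # -> 2
```

A production device would replace the lambda with an actual TTS engine call.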
Referring to Fig. 6, a further preferred embodiment differs from the above only in that the head-mounted device 1 is provided with a sound pickup element 16 electrically connected to the article generation module 13 for receiving the user's voice commands, so that the article generation module 13 can adjust the content of the text information, and the guide cane 2 is provided with a second control part 23 for activating the sound pickup element 16. The sound pickup element 16 is a miniature microphone, and the second control part 23 is a push switch beside the first control part 21. When the user finds the text information produced by the article generation module 13 insufficiently clear, he or she can press the second control part 23 to activate the sound pickup element 16 and dictate a voice command giving the article generation module 13 more specific conditions, for example, "What are the positions of those things relative to one another?" The article generation module 13 then regenerates the text from the positional relationships of the objects in the image: "The lower right of the scene is the beach and the lower left is the sea; the mountain extends from the right to the center at the far end of the sea; the pavilion is on the beach at the foot of the mountain; the two sailboats are near the shore; and the three birds fly low between the sailboats and the mountain." In this way the user can interact with the article generation module 13 through the sound pickup element 16 until a satisfactory reply is obtained.
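The interaction this enables, keeping the last photo's object information so a spoken follow-up can refine the description without retaking the photo, could look like the following sketch; the `DescriptionSession` class and its `generate` callback are hypothetical names, not part of the disclosed design:

```python
from typing import Callable, List, Optional

class DescriptionSession:
    """Holds the last photo's recognition result so a dictated follow-up
    can re-query the article generation module with extra conditions."""
    def __init__(self, generate: Callable[[List[str], Optional[str]], str]):
        self.generate = generate
        self.last_objects: List[str] = []

    def on_photo(self, objects: List[str]) -> str:
        # First control part pressed: describe the fresh photo.
        self.last_objects = objects
        return self.generate(objects, None)

    def on_voice_command(self, question: str) -> str:
        # Second control part pressed: refine using the stored objects.
        return self.generate(self.last_objects, question)

# Toy generator standing in for the language model.
session = DescriptionSession(
    generate=lambda objs, q: "Scene with " + ", ".join(objs)
                             + (f"; answering: {q}" if q else ""))
session.on_photo(["beach", "pavilion"])
print(session.on_voice_command("What is the spatial arrangement?"))
```

In the device, `on_photo` would be wired to the first control part 21 and `on_voice_command` to the second control part 23.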
Referring to Figs. 7 and 8, yet another preferred embodiment differs from the above only in that the head-mounted device 1 contains a positioning element 17 electrically connected to the article generation module 13 for producing position information, and the article generation module 13 has a street-view map database 132, so that it can integrate the image information, the object information, and the position information to produce fine positioning information. The positioning element 17 is a GPS receiver, and the street-view map database 132 stores map street-view imagery. Because the positioning element 17 relies on satellite positioning, it can give only the user's approximate location, generally within a range of about 100 meters along a road, and when the user is stationary at a fixed point it cannot determine the user's orientation. Therefore, when the user presses the second control part 23 to activate the sound pickup element 16 and a position query is received, the article generation module 13 integrates the image information, object information, and position information, searches the street-view data in the street-view map database 132 to determine the precise position of the street scene ahead, and outputs text describing that precise position.
For example, the positioning element 17 locates the user within 100 meters of No. 1, XX Road, and the image recognition module 12 determines that the XX coffee shop on the corner lot is ahead; consulting the street-view map database 132 then yields the exact address of that corner coffee shop within 100 meters of No. 1, XX Road. The resulting fine positioning information and current facing direction not only help the user understand where he or she is, but also let the user give family and friends accurate information about his or her location when lost.
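A minimal sketch of this intersection of the coarse GPS fix with the recognized landmark, assuming a toy street-view database on a local metric grid; every record, coordinate, and name below is invented for illustration:

```python
from math import hypot

# Hypothetical street-view records: positions in metres on a local grid.
MAP_DB = [
    {"name": "XX Coffee", "addr": "No. 1, XX Road", "pos": (12.0, 40.0)},
    {"name": "XX Coffee", "addr": "No. 210, YY Road", "pos": (950.0, 80.0)},
    {"name": "Bookstore", "addr": "No. 3, XX Road", "pos": (15.0, 55.0)},
]

def fine_localize(gps_pos, landmark_name, radius_m=100.0):
    """Intersect the ~100 m GPS uncertainty circle with the recognized
    landmark: if exactly one matching record lies inside, its address
    fixes the user's precise position."""
    candidates = [r for r in MAP_DB
                  if r["name"] == landmark_name
                  and hypot(r["pos"][0] - gps_pos[0],
                            r["pos"][1] - gps_pos[1]) <= radius_m]
    return candidates[0]["addr"] if len(candidates) == 1 else None

print(fine_localize((0.0, 0.0), "XX Coffee"))  # -> No. 1, XX Road
```

Only one "XX Coffee" falls inside the 100 m circle, so the ambiguity left by the satellite fix is resolved.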
Referring to Figs. 9 and 10, another preferred embodiment differs from the above only in that the head-mounted device 1 contains a facial recognition module 18 and a known-person database 133 electrically connected to the facial recognition module 18 and the article generation module 13, so that the article generation module 13 can adjust the text information for persons appearing in the image. In this embodiment the playback device 15 is exemplified by earphones, and the facial recognition module 18 by face recognition software such as FaceMe or FaceMaze. Face recognition is a form of biometric identification: facial feature values are extracted as vectors and compared with the feature values of pre-enrolled faces; a deep neural network measures the variables of the face with algorithms and mathematical formulas, converts them into feature values, and matches them against a database to establish the face's identity. Specifically, face recognition involves three steps: face detection, facial feature extraction, and face identification. First, face detection precisely scans, detects, and frames the position of a face in an image or video, even when only part of the face appears in the frame. Second, facial feature extraction uses the recognition engine to divide the framed face into n dimensions; for example, a high-precision engine with n = 1024 divides the face into a 1024-dimensional matrix and extracts vector-based feature values from variables such as nose length and width, forehead width, and eye shape. Third, face identification compares these feature values with the pre-enrolled faces in the database to establish identity; in 1:N comparison, for example, the feature values of the face in the frame are compared with N pre-enrolled faces to identify the person.
When the user takes a photo with the first control part 21 and the image recognition module 12 detects a person in the image, the facial recognition module 18 is invoked automatically to compare that person's facial information with the known-person database 133. If a match is found, the generic description in the object information (such as "a woman") is replaced with the person's name or title, so that the article generation module 13 adjusts the text sent to the playback device 15, for example to "Your older sister is approaching." If no match is found in the known-person database 133, the person is judged to be a stranger and the original generic description is retained. This not only helps the user identify the people ahead but also avoids superfluous text.
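The 1:N comparison step can be sketched with cosine similarity over toy three-dimensional feature vectors (a real engine would use on the order of 1024 dimensions, as noted above); the enrolled names, vectors, and acceptance threshold are all assumptions for illustration:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical pre-enrolled embeddings in the known-person database.
KNOWN_FACES = {"older sister": [0.9, 0.1, 0.4], "friend Lin": [0.1, 0.8, 0.5]}

def identify(embedding, threshold=0.95):
    """1:N comparison: return the enrolled name whose feature vector is most
    similar to the probe, or None (so the generic wording is retained)."""
    best_name, best_score = None, 0.0
    for name, ref in KNOWN_FACES.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

label = identify([0.88, 0.12, 0.41]) or "a woman"
print(label)  # -> older sister
```

A probe far from every enrolled vector falls below the threshold, and the fallback keeps the generic "a woman" description, matching the stranger case above.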
The above is merely a preferred embodiment of the present utility model and does not limit its patent scope; all simple modifications and equivalent structural changes made using the contents of this specification and drawings are likewise included within the patent scope of the present utility model.
The key points by which the image-to-speech assistive device for the visually impaired of the present utility model improves on conventional technology are therefore:
First, the article generation module 13 is chatGPT, a language model built from a deep neural network: an upgraded chatbot driven by AI natural-language processing that can learn and improve on its own and simulates relatively natural, fluent human dialogue. Combined with the image recognition module 12, it can describe the content of the image information in smooth, coherent sentences that are easy for the user to understand.

Second, the first control part 21 on the guide cane 2 triggers the camera element 11 on the head-mounted device 1, so a recognition request is issued only when needed, making operation simpler and more discreet.

Third, the input is a single still photo taken semi-automatically by the user, so the input conditions are simple; the image recognition module 12 and the article generation module 13 run faster with a lower load and complete the recognition and conversion more quickly.

Fourth, with the sound pickup element 16, when the user finds the text information produced by the article generation module 13 insufficiently clear, he or she can dictate a voice command giving more specific conditions and interact with the article generation module 13 until a satisfactory reply is obtained.

Fifth, with the positioning element 17 and the street-view map database 132, fine positioning information and the current facing direction can be obtained, which not only helps the user understand where he or she is but also provides family and friends with accurate location information when the user is lost.

Sixth, with the facial recognition module 18 and the known-person database 133, the user is helped to identify the people ahead while superfluous text about strangers is avoided.
In summary, the image-to-speech assistive device for the visually impaired of the present utility model truly achieves its effects and purposes in use. It is a utility model of excellent practicality that meets the requirements for a utility model patent application, and the application is filed in accordance with the law. The applicant respectfully asks the examiners to grant this utility model promptly to protect the applicant's hard work, and will gladly cooperate with any further instructions from the examining office.
1: Head-mounted device
11: Camera element
111: First wireless communication element
12: Image recognition module
13: Article generation module
131: Image description database
14: Speech conversion module
15: Playback device
2: Guide cane
21: First control part
22: Second wireless communication element
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112208257U | 2023-08-04 | 2023-08-04 | Image-to-speech assistive device for visually impaired |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| TWM648987U | 2023-12-01 |
Family
ID=90040220
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112208257U | Image-to-speech assistive device for visually impaired | 2023-08-04 | 2023-08-04 |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWM648987U (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI839285B | 2023-08-04 | 2024-04-11 | 上弘醫療設備股份有限公司 | Image-to-speech assistive device for the visually impaired |
Similar Documents
| Publication | Title |
|---|---|
| CN109065055B | Method, storage medium, and apparatus for generating AR content based on sound |
| US7676372B1 | Prosthetic hearing device that transforms a detected speech into a speech of a speech form assistive in understanding the semantic meaning in the detected speech |
| Rajmohan et al. | Efficient Indian Sign Language Interpreter For Hearing Impaired |
| Kishore et al. | Optical flow hand tracking and active contour hand shape features for continuous sign language recognition with artificial neural networks |
| JP2019531538A | Wordflow annotation |
| KR20170034409A | Method and apparatus to synthesize voice based on facial structures |
| Madhuri et al. | Vision-based sign language translation device |
| CN113835522A | Sign language video generation, translation and customer service method, device and readable medium |
| Prado et al. | Visuo-auditory multimodal emotional structure to improve human-robot-interaction |
| Oliveira et al. | Automatic sign language translation to improve communication |
| TWM648987U | Image-to-speech assistive device for visually impaired |
| CN110524559A | Intelligent human-machine interaction system and method based on human behavior data |
| Patil et al. | Guidance system for visually impaired people |
| Kanvinde et al. | Bidirectional sign language translation |
| KR100348823B1 | Apparatus for Translating of Finger Language |
| KR100730573B1 | Sign Language Phone System using Sign Recconition and Sign Generation |
| TWI839285B | Image-to-speech assistive device for the visually impaired |
| Khan et al. | Sign language translation in urdu/hindi through microsoft kinect |
| Saitoh et al. | Lip25w: Word-level lip reading web application for smart device |
| JP2022075662A | Information extraction apparatus |
| Bhuiyan et al. | An assistance system for visually challenged people based on computer vision and iot |
| JP2022075661A | Information extraction apparatus |
| JP2023117068A | Speech recognition device, speech recognition method, speech recognition program, speech recognition system |
| Özkul et al. | Multimodal analysis of upper-body gestures, facial expressions and speech |
| Manglani et al. | Lip Reading Into Text Using Deep Learning |