TWI809301B

TWI809301B - Visually impaired voice assistance system and visually impaired voice assistance method

Info

Publication number: TWI809301B
Application number: TW109128593A
Authority: TW
Inventors: 曾建勳
Original assignee: 崑山科技大學
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2023-07-21
Also published as: TW202209176A

Abstract

The present invention provides a voice assistance system for the visually impaired and a voice assistance method for the visually impaired. The voice assistance system for the visually impaired includes a head-mounted device, a portable electronic device, and a recognition server. The head-mounted device captures the environment in front of the user to generate a real-time environment image. The portable electronic device receives the real-time environment image and transmits it. A database of the recognition server stores a plurality of reference object features and the corresponding reference object information. A learning module of the recognition server recognizes a plurality of environmental object features from the real-time environment image sent by the portable electronic device and, when the environmental object features match the reference object features, generates real-time auxiliary information containing the corresponding reference object information and feeds it back to the portable electronic device. The portable electronic device generates an auxiliary voice message according to the real-time auxiliary information and controls the head-mounted device to output it.

Description

Visually impaired voice assistance system and visually impaired voice assistance method

The present invention relates to a recognition system, and in particular to a voice assistance system for the visually impaired and a voice assistance method for the visually impaired.

Visually impaired people face many inconveniences when moving about because of their visual impairment. Thanks to advances in technology, many companies have worked to develop various mobility aids for the visually impaired, so that visually impaired people can engage in various activities more easily and live more conveniently at home.

However, although the above mobility aids for the visually impaired can assist the movement of visually impaired people, a fully featured aid tends to be relatively complicated in its overall structural design when put into actual use. Moreover, most mobility aids for the visually impaired only alert the user to obstacles, whereas an outdoor environment contains many variables that restrict the user's movement and may even endanger the user, so such aids are not comprehensive enough. Furthermore, most mobility aids for the visually impaired directly instruct the user how to walk, which deprives the user of the opportunity to choose a walking path independently.

In view of the foregoing, the inventor of the present invention conceived and designed a voice assistance system for the visually impaired and a voice assistance method for the visually impaired, with the aim of remedying the deficiencies of the prior art and thereby enhancing industrial applicability.

In view of the above problems of the prior art, an object of the present invention is to provide a voice assistance system for the visually impaired and a voice assistance method for the visually impaired that solve the problems faced in the prior art.

Based on the above object, the present invention provides a voice assistance system for the visually impaired, which includes a head-mounted device, a portable electronic device, and a recognition server. The head-mounted device includes a camera module and a sound output module. The camera module captures the environment in front of the user to generate a real-time environment image. The sound output module outputs an auxiliary voice message. The portable electronic device includes a processing module and a wireless transmission module. The wireless transmission module executes multi-threaded scheduling between the client and the server according to the Transmission Control Protocol (TCP) and transmits the real-time environment image to the recognition server. The recognition server includes a database and a learning module. The recognition server receives the real-time environment image from the portable electronic device; the database stores a plurality of reference object features and the respectively corresponding reference object information; the learning module recognizes a plurality of environmental object features from the real-time environment image, and compares and analyzes each environmental object feature; when an environmental object feature matches a reference object feature, the learning module generates real-time auxiliary information containing the corresponding reference object information and feeds it back to the portable electronic device. The wireless transmission module executes the multi-threaded scheduling between the client and the server according to the Transmission Control Protocol, so that the real-time auxiliary information can be synchronously stored remotely on a cloud server or locally for quick query, retrieval, deployment, and analysis. The processing module generates the auxiliary voice message according to the real-time auxiliary information and controls the sound output module to output the auxiliary voice message.

Preferably, the auxiliary voice message includes each piece of reference object information and the position and distance, relative to the user, of the object corresponding to each piece of reference object information in the real-time environment image.

Preferably, the learning module uses the Single Shot MultiBox Detector (SSD) model of TensorFlow and a deep-learning convolutional neural network (CNN) algorithm to compare whether each environmental object feature, including object features and facial features, matches each reference object feature.

Preferably, the head-mounted device further includes a sound receiving module, which receives an inquiry message from the user; the processing module generates the auxiliary voice message according to the inquiry message and the real-time auxiliary information, and controls the sound output module to output the auxiliary voice message.

Preferably, when the real-time environment image contains QR code information, the processing module connects to an application server according to the QR code information to read the corresponding item information, generates the auxiliary voice message according to the item information, and controls the sound output module to output the auxiliary voice message.

Based on the above object, the present invention further provides a voice assistance method for the visually impaired, which includes the following steps: capturing the environment in front of the user with a camera module of a head-mounted device to generate a real-time environment image; receiving the real-time environment image through a portable electronic device and transmitting it to a recognition server; recognizing a plurality of environmental object features from the real-time environment image, and comparing whether each environmental object feature matches a plurality of reference object features in a database; generating real-time auxiliary information containing the reference object information corresponding to the matched reference object features, and feeding it back to the portable electronic device; and generating an auxiliary voice message according to the real-time auxiliary information, and controlling a sound output module of the head-mounted device to output the auxiliary voice message.

Preferably, the auxiliary voice message includes each piece of reference object information, the position in the real-time environment image of the object corresponding to each piece of reference object information, and the relative distance between that object and the user.

Preferably, the voice assistance method for the visually impaired further includes the following step: comparing whether each environmental object feature matches each reference object feature according to a TensorFlow algorithm for object recognition and a deep-learning convolutional neural network algorithm for face recognition, and estimating the distance between each recognized environmental object and the user, wherein the environmental objects include human faces.

Preferably, the voice assistance method for the visually impaired further includes the following step: comparing, according to a deep-learning convolutional neural network, whether each facial feature matches each reference object feature in a trained data set.

Preferably, the voice assistance method for the visually impaired further includes the following steps: receiving an inquiry message from the user; and generating the auxiliary voice message according to the inquiry message and the real-time auxiliary information, and controlling the sound output module to output the auxiliary voice message.

Preferably, the voice assistance method for the visually impaired further includes the following steps: determining whether the real-time environment image contains QR code information; if so, connecting to an application server according to the QR code information to read the corresponding item information; and generating the auxiliary voice message according to the item information, and controlling the sound output module to output the auxiliary voice message.

As described above, the voice assistance system for the visually impaired and the voice assistance method for the visually impaired according to the present invention have one or more of the following advantages:

(1) In the voice assistance system and voice assistance method for the visually impaired of the present invention, after the camera module obtains the real-time environment image, the image can be sent through the portable electronic device to the recognition server for object recognition and distance estimation. The system can thus recognize objects such as pedestrians, vehicles, road signs, potholes, and stairs, estimate their distances, and give safety prompts, which improves the user's experience.

(2) In the voice assistance system and voice assistance method for the visually impaired of the present invention, the recognition server uses the Single Shot MultiBox Detector (SSD) model of TensorFlow for object recognition, so that the various objects and people in the environment can be recognized quickly and effectively in real time.

(3) In the voice assistance system and voice assistance method for the visually impaired of the present invention, the recognition server uses a deep-learning convolutional neural network for face recognition, so that familiar people in the environment can be recognized quickly and effectively in real time.

(4) The voice assistance system and voice assistance method for the visually impaired of the present invention can issue safety prompts in response to the user's inquiries, which gives the user a more active experience during use.

The technical features of the present invention are described in detail below with specific embodiments and the accompanying drawings, so that those of ordinary skill in the art can readily understand the objects, technical features, and advantages of the present invention.

1: Voice assistance system for the visually impaired

10: Head-mounted device

11: Camera module

111: Real-time environment image

12: Sound output module

121: Auxiliary voice message

13: Sound receiving module

131: Inquiry message

20: Portable electronic device

21: Processing module

22: Wireless transmission module

30: Recognition server

31: Database

311: Reference object feature

312: Reference object information

32: Learning module

321: Environmental object feature

322: Real-time auxiliary information

91: QR code information

92: Application server

921: Item information

S11~S15: Steps

S21~S22: Steps

S31~S33: Steps

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings based on them.

FIG. 1 is a schematic block diagram of the first embodiment of the voice assistance system for the visually impaired according to the present invention.

FIG. 2 is a schematic flow chart of the learning module of the first embodiment of the voice assistance system for the visually impaired according to the present invention.

FIG. 3 is a schematic block diagram of the second embodiment of the voice assistance system for the visually impaired according to the present invention.

FIG. 4 is a schematic block diagram of the third embodiment of the voice assistance system for the visually impaired according to the present invention.

FIG. 5 is a schematic diagram of the first set of steps of the voice assistance method for the visually impaired according to the present invention.

FIG. 6 is a schematic diagram of the second set of steps of the voice assistance method for the visually impaired according to the present invention.

FIG. 7 is a schematic diagram of the third set of steps of the voice assistance method for the visually impaired according to the present invention.

The advantages, features, and technical methods of the present invention will be more readily understood from the following detailed description of exemplary embodiments with reference to the accompanying drawings. The present invention may be implemented in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure is thorough and complete and fully conveys the scope of the present invention to those of ordinary skill in the art, and the present invention is defined only by the appended claims.

It should be understood that although the terms "first", "second", and the like may be used in the present invention to describe various elements, components, regions, sections, layers, and/or parts, these elements, components, regions, sections, layers, and/or parts should not be limited by these terms. These terms are used only to distinguish one element, component, region, section, layer, and/or part from another.

Unless otherwise defined, all terms (including technical and scientific terms) used in the present invention have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art and the present invention, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Please refer to FIG. 1 and FIG. 2, which are respectively a block diagram of the first embodiment of the voice assistance system for the visually impaired of the present invention and a flow chart of its learning module. The present invention is mainly intended for visually impaired users, but it may also be applied to general users where appropriate, and is therefore not limited to visually impaired users. As shown in FIG. 1, a voice assistance system 1 for the visually impaired according to the present invention includes a head-mounted device 10, a portable electronic device 20, and a recognition server 30.

The head-mounted device 10 may be smart glasses, a smart hat, a smart helmet, or the like, and includes a camera module 11 and a sound output module 12. The head-mounted device 10 may of course further include a communication module, a battery module, and other components well known to those of ordinary skill in the art, which are not described further here. The camera module 11, preferably a depth camera module, captures the environment in front of the user to generate a real-time environment image 111. The sound output module 12 outputs an auxiliary voice message 121 conveying, for example, the name of an object in front of the user, its position, or its distance relative to the user.

The portable electronic device 20 includes a processing module 21 and a wireless transmission module 22. The processing module 21 may be realized by the cooperation of an application (APP) installed on the portable electronic device 20, a processor, and other components. The wireless transmission module 22 may use a transmission method such as Bluetooth, 2.4G, 5G, or Wi-Fi. In the present invention, a USB serial port connection is used to electrically connect to the camera module 11 so as to transfer the real-time environment image 111. The wireless transmission module 22 then uses Wi-Fi (2.4G, 5G, or the like) to quickly transmit the real-time environment image 111 to the recognition server 30, executing client-server multi-threaded scheduling according to the Transmission Control Protocol (TCP).
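
As a non-limiting illustration, the TCP client-server exchange described above might be sketched as follows. The specification does not define a wire format, so the 4-byte length prefix, the function names, and the one-thread-per-client worker are all assumptions made for this example.

```python
import socket
import struct
import threading

# Assumed framing: each message is a 4-byte big-endian length followed by the payload.
HEADER = struct.Struct("!I")


def send_frame(sock, payload: bytes) -> None:
    """Send one length-prefixed message over a connected TCP socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


def serve_client(sock, recognize) -> None:
    """Server-side worker, intended to run in its own thread per client.

    `recognize` is a hypothetical callable that maps image bytes to reply bytes.
    """
    with sock:
        try:
            while True:
                image = recv_frame(sock)
                send_frame(sock, recognize(image))
        except ConnectionError:
            pass  # the client disconnected
```

On the server side, each accepted connection would be handed to `threading.Thread(target=serve_client, args=(conn, recognize))`, so that several image streams can be served concurrently.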

The recognition server 30 includes a database 31 and a learning module 32. The database 31 stores a plurality of reference object features 311 and the respectively corresponding reference object information 312. For example, the reference object features 311 may be the shape features of trained objects such as pedestrians, vehicles, potholes, and stairs, and the reference object information 312 is the name of each such object, or its name together with further information.

As shown in FIG. 2, the recognition server 30 receives the real-time environment image 111 from the portable electronic device 20, the learning module 32 recognizes a plurality of environmental object features 321 from the real-time environment image 111, and the learning module 32 compares whether the environmental object features 321 match the reference object features 311. The learning module 32 is implemented mainly with a TensorFlow algorithm for object recognition and a deep-learning convolutional neural network algorithm for face recognition. That is, the learning module 32 uses the TensorFlow algorithm to recognize the plurality of environmental object features 321 from the real-time environment image 111, uses the deep-learning convolutional neural network to compare whether any face among the environmental object features 321 matches the facial features among the reference object features 311, and estimates the distance between each recognized environmental object (including faces) and the user.
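
The specification does not state how the distance is estimated. One possible approach, sketched below purely as an assumption, takes a depth map aligned with the image (such as a depth camera module could provide) and uses the median of the valid depth readings inside the bounding box of a recognized object. The function name and the data layout are hypothetical.

```python
from statistics import median


def estimate_distance_m(depth_map, box):
    """Estimate the distance to an object from an aligned depth map.

    depth_map: list of rows, each a list of per-pixel depths in meters
               (0 marks an invalid reading; this convention is assumed).
    box: (top, left, bottom, right) pixel bounds, bottom/right exclusive.
    Returns the median valid depth inside the box, or None if there is none.
    """
    top, left, bottom, right = box
    samples = [
        depth
        for row in depth_map[top:bottom]
        for depth in row[left:right]
        if depth > 0
    ]
    return median(samples) if samples else None
```

The median is used here rather than the mean so that a few background pixels inside the box do not dominate the estimate.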

The object recognition algorithm may use a Single Shot MultiBox Detector (SSD) model trained on the Microsoft Common Objects in Context (COCO) dataset, together with a frozen inference graph exported from TensorFlow, to infer object names from the dynamic real-time environment image. The face recognition algorithm uses Caffe-based face tracking to detect where faces are in the dynamic real-time image (the number, position, and size of the faces, their facial features, and so on). It then uses a deep-learning convolutional neural network and data on the positional features and geometric relationships of the facial features, where the feature items include the main features of the eyebrows, eyes, nose, mouth, face shape, and so on, to compare features against the database, check whether the same face exists, and rank the similarities. In this way it further determines who the detected face is and whether the image is indeed a face; if the conditions are met, the relevant face data is returned.
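
As an illustration of what happens after inference, the sketch below post-processes detector outputs of the kind commonly produced by SSD models exported with the TensorFlow object detection tooling: boxes normalized as (ymin, xmin, ymax, xmax), numeric class ids, and confidence scores. Running the model itself is omitted. The score threshold, the function name, and the partial label map are assumptions for this example and are not taken from the specification.

```python
# Partial, assumed label map using COCO-style class ids.
LABELS = {1: "person", 2: "bicycle", 3: "car"}


def filter_detections(boxes, classes, scores, image_w, image_h, min_score=0.5):
    """Keep confident detections and convert normalized boxes to pixels.

    boxes:   sequence of (ymin, xmin, ymax, xmax), each value in [0, 1]
    classes: sequence of numeric class ids
    scores:  sequence of confidence scores in [0, 1]
    """
    results = []
    for box, cls, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        ymin, xmin, ymax, xmax = box
        results.append({
            "name": LABELS.get(int(cls), "unknown object"),
            "score": float(score),
            "box": (
                round(ymin * image_h),
                round(xmin * image_w),
                round(ymax * image_h),
                round(xmax * image_w),
            ),
        })
    return results
```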

When the environmental object features 321 match the reference object features 311, the learning module 32 generates real-time auxiliary information 322 containing the corresponding reference object information 312 and feeds it back to the portable electronic device 20. The processing module 21 of the portable electronic device 20 can then generate the auxiliary voice message 121 according to the real-time auxiliary information 322 and control the sound output module 12 to output the auxiliary voice message 121. It is worth noting that the auxiliary voice message 121 includes each piece of reference object information 312 and the position, in the real-time environment image 111, of the object corresponding to each piece of reference object information 312.
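
The matching step can be pictured with a small sketch. It assumes, for illustration only, that each feature is a numeric vector and that a match means the cosine similarity to the best reference feature reaches a threshold; the specification does not fix the feature representation or the matching criterion, and all names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_assist_info(env_features, reference_db, threshold=0.8):
    """Return the reference object information for every matched feature.

    reference_db: list of {"feature": vector, "info": str} entries, standing in
                  for reference object features and reference object information.
    """
    info = []
    for feature in env_features:
        best = max(
            reference_db,
            key=lambda ref: cosine_similarity(feature, ref["feature"]),
            default=None,
        )
        if best and cosine_similarity(feature, best["feature"]) >= threshold:
            info.append(best["info"])
    return info
```

Features whose best similarity stays below the threshold produce no entry, which corresponds to the case in which no reference object feature is matched.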

For example, after the camera module 11 captures the scene ahead and the above procedure is carried out, the processing module 21 controls the sound output module 12 to tell the user by voice what objects (including familiar or unknown people) are on the left of the surroundings ahead and how far they are from the user, what objects are in the middle and how far away they are, and what objects are on the right and how far away they are. Having learned this, the user can greet acquaintances and decide independently how to walk, for example veering left, keeping to the middle, or veering right, rather than merely walking as the voice of an assistive device instructs or passively following the device. Incidentally, because the user may be walking forward continuously, in the subsequent process the processing module 21 may control the sound output module 12 to give voice prompts only for newly appearing objects or for objects that are too close to the user.
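
The left, middle, and right prompts and the later restriction to new or nearby objects could be sketched as follows. Splitting the image width into three equal regions, the 1.5 meter proximity threshold, and the data layout are assumptions chosen for this example only.

```python
PHRASES = {"left": "on the left", "middle": "straight ahead", "right": "on the right"}


def region_of(center_x, image_width):
    """Classify a horizontal pixel position into one of three equal regions."""
    third = image_width / 3
    if center_x < third:
        return "left"
    if center_x < 2 * third:
        return "middle"
    return "right"


def compose_message(objects, image_width):
    """Build the spoken text, listing the left, then middle, then right region."""
    parts = []
    for region in ("left", "middle", "right"):
        for obj in objects:
            if region_of(obj["center_x"], image_width) == region:
                parts.append(
                    f'{obj["name"]} {PHRASES[region]}, '
                    f'about {obj["distance_m"]:.1f} meters away'
                )
    return "; ".join(parts)


def select_updates(objects, already_announced, near_m=1.5):
    """Keep only objects that are new or closer than the assumed threshold."""
    return [
        obj for obj in objects
        if obj["name"] not in already_announced or obj["distance_m"] < near_m
    ]
```

A caller would announce `compose_message(objects, width)` for the first frame and `compose_message(select_updates(objects, announced), width)` for later frames, adding each announced name to `announced`.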

Please refer to FIG. 3, which is a schematic block diagram of the second embodiment of the voice assistance system for the visually impaired of the present invention. The present invention is mainly intended for visually impaired users, but it may also be applied to general users where appropriate, and is therefore not limited to visually impaired users. As shown in the figure, a voice assistance system 1 for the visually impaired according to the present invention includes a head-mounted device 10, a portable electronic device 20, and a recognition server 30.

The main difference between this embodiment and the preceding embodiment is that the head-mounted device 10 further includes a sound receiving module 13, which receives an inquiry message 131 from the user. The processing module 21 generates the auxiliary voice message 121 according to the inquiry message 131 and the real-time auxiliary information 322, and controls the sound output module 12 to output the auxiliary voice message 121.

Here, after the camera module 11 captures the scene ahead and the above procedure is carried out, the processing module 21 does not proactively control the sound output module 12 to prompt the user by voice. Only after receiving the user's inquiry message 131 does the processing module 21 generate the auxiliary voice message 121 according to the inquiry message 131 and the real-time auxiliary information 322 and control the sound output module 12 to output the auxiliary voice message 121.

For example, after the camera module 11 captures the scene ahead and the above procedure is carried out, the processing module 21 remains in a standby state. The user can then issue, through the sound receiving module 13, an inquiry message 131 asking what objects are on the left. In response, the processing module 21 controls the sound output module 12 to tell the user by voice that there is, for example, a pothole on the left ahead and how far away it is. In practice, the processing module 21 may also first announce all objects ahead when the user starts walking, and then prompt the user according to inquiry messages 131 while the user is walking.
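
A minimal sketch of this inquiry-driven behavior is given below. It assumes that the inquiry has already been converted to text by some speech recognition step, which the specification does not describe, and that each entry of the real-time auxiliary information carries a region label and a distance. The keyword matching and all names are hypothetical.

```python
REGIONS = ("left", "middle", "right")


def answer_inquiry(inquiry_text, assist_info):
    """Answer a spoken inquiry (already transcribed) from the current info.

    assist_info: list of {"name": str, "region": str, "distance_m": float}.
    If the inquiry names no region, every detected object is reported.
    """
    text = inquiry_text.lower()
    region = next((r for r in REGIONS if r in text), None)
    matches = [
        obj for obj in assist_info
        if region is None or obj["region"] == region
    ]
    if not matches:
        return "Nothing detected" + (f" in the {region} region." if region else ".")
    return "; ".join(
        f'{obj["name"]} in the {obj["region"]} region, '
        f'{obj["distance_m"]:.1f} meters away'
        for obj in matches
    )
```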

It is worth noting that the sound output module 12 and the sound receiving module 13 may be of the bone conduction type, in which sound is conveyed directly to the receiving end of the brain through bone vibration, so as to avoid interference from external noise.

Please refer to FIG. 4, which is a schematic block diagram of the third embodiment of the voice assistance system for the visually impaired of the present invention. The present invention is mainly intended for visually impaired users, but it may also be applied to general users where appropriate, and is therefore not limited to visually impaired users. As shown in the figure, a voice assistance system 1 for the visually impaired according to the present invention includes a head-mounted device 10, a portable electronic device 20, and a recognition server 30.

The main difference between this embodiment and the preceding embodiments is that, when the real-time environment image 111 contains QR code information 91, the processing module 21 connects to an application server 92 according to the QR code information 91 to read the corresponding item information 921, generates the auxiliary voice message 121 according to the item information 921, and controls the sound output module 12 to output the auxiliary voice message 121.

That is to say, after the camera module 11 has captured the scene ahead and the real-time environment image 111 has been transmitted in real time to the portable electronic device 20, the processing module 21 can first determine whether the real-time environment image 111 contains mobile barcode information 91. If it does, the processing module 21 connects to the corresponding application server 92 according to the mobile barcode information 91 to read the corresponding item information 921. The processing module 21 can then generate the auxiliary voice message 121 according to the item information 921, so as to control the sound output module 12 to emit the auxiliary voice message 121, for example a message stating what object the mobile barcode information 91 corresponds to.
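
The barcode branch described above can be sketched as follows. This is only an illustration under stated assumptions: the barcode decoder and the application-server lookup are passed in as callables, and the names `barcode_voice_message`, `decode_barcode`, `fetch_item_info`, and the example URL are all hypothetical; the specification does not name a decoding library or a server interface.

```python
from typing import Callable, Optional


def barcode_voice_message(
    frame: object,
    decode_barcode: Callable[[object], Optional[str]],
    fetch_item_info: Callable[[str], str],
) -> Optional[str]:
    """Return the text to speak for a frame, or None if it holds no barcode."""
    payload = decode_barcode(frame)        # e.g. a URL encoded in the QR code
    if not payload:
        return None                        # no barcode found in this frame
    item_info = fetch_item_info(payload)   # stands in for the application server query
    return f"This item is: {item_info}"


# Stand-ins so the sketch runs without a camera or a network connection.
fake_catalogue = {"https://shop.example/item/42": "oolong tea, 150 g"}
decode = lambda frame: frame.get("qr")
fetch = lambda url: fake_catalogue.get(url, "unknown item")

print(barcode_voice_message({"qr": "https://shop.example/item/42"}, decode, fetch))
print(barcode_voice_message({}, decode, fetch))
```

Injecting the decoder and the lookup keeps the decision logic separate from any particular camera, decoding library, or server.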

In addition, in other embodiments, to avoid continuously announcing information about the objects corresponding to mobile barcode information 91, the user may switch to a mode such as a shopping mode, for example through the sound receiving module 13. Only in this mode does the processing module 21 determine whether the real-time environment image 111 contains mobile barcode information 91.
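
As a rough illustration of this mode gating, the sketch below keeps a single flag that a voice command toggles and that is consulted before any barcode check. The class name `BarcodeScanGate` and the command phrases are hypothetical; the specification only states that the mode can be set through the sound receiving module 13 or similar means.

```python
class BarcodeScanGate:
    """Tracks whether barcode checking is enabled (the 'shopping mode')."""

    def __init__(self) -> None:
        self.shopping_mode = False

    def on_voice_command(self, command: str) -> None:
        # Hypothetical command phrases; the source does not specify them.
        text = command.strip().lower()
        if text == "start shopping mode":
            self.shopping_mode = True
        elif text == "stop shopping mode":
            self.shopping_mode = False

    def should_check_barcode(self) -> bool:
        # Frames are examined for barcodes only while shopping mode is on.
        return self.shopping_mode


gate = BarcodeScanGate()
before = gate.should_check_barcode()
gate.on_voice_command("Start shopping mode")
during = gate.should_check_barcode()
gate.on_voice_command("Stop shopping mode")
after = gate.should_check_barcode()
print(before, during, after)
```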

Please refer to FIGS. 5, 6, and 7, which are schematic diagrams of the first, second, and third steps, respectively, of the voice assistance method for the visually impaired of the present invention.

As shown in FIG. 5, a voice assistance method for the visually impaired of the present invention includes the following steps: (S11) using a camera module 11 of a head-mounted device 10 to capture the environment in front of the user to generate a real-time environment image 111; (S12) receiving the real-time environment image 111 through a portable electronic device 20 and transmitting it to a recognition server 30; (S13) recognizing a plurality of environmental object features 321 from the real-time environment image 111, and comparing whether each environmental object feature 321 matches a plurality of reference object features 311 in a database 31; (S14) generating real-time auxiliary information 322 containing reference object information 312 corresponding to the reference object feature 311, and feeding it back to the portable electronic device 20; and (S15) generating an auxiliary voice message 121 according to the real-time auxiliary information 322, and controlling a sound output module 12 of the head-mounted device 10 to emit the auxiliary voice message 121.
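
Steps S13 to S15 can be sketched in a few lines. This is a toy stand-in, not the recognition method itself: each feature is reduced to a string key and the database 31 to a dictionary, whereas the claims describe the comparison as being performed by a learning module. The function names `recognize` and `to_voice_message` and all sample values are hypothetical.

```python
from __future__ import annotations


def recognize(env_features: list[str], database: dict[str, str]) -> list[str]:
    """S13-S14: keep the reference object information of every matched feature."""
    return [database[f] for f in env_features if f in database]


def to_voice_message(assist_info: list[str]) -> str:
    """S15: turn the real-time auxiliary information into text to be spoken."""
    if not assist_info:
        return ""
    return "Ahead: " + ", ".join(assist_info)


# Reference object features -> reference object information (toy values).
database = {"feat_pothole": "pothole", "feat_stairs": "stairs"}
# Features extracted from one frame; "feat_unknown" has no reference entry.
env_features = ["feat_stairs", "feat_unknown", "feat_pothole"]

assist_info = recognize(env_features, database)
print(to_voice_message(assist_info))
```

In the described system the first function would run on the recognition server 30 and the second on the portable electronic device 20; here both run in one process purely for illustration.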

As shown in FIG. 6, the voice assistance method for the visually impaired of the present invention further includes the following steps: (S21) receiving an inquiry message 131 from the user; and (S22) generating the auxiliary voice message 121 according to the inquiry message 131 and the real-time auxiliary information 322, and controlling the sound output module 12 to emit the auxiliary voice message 121.

As shown in FIG. 7, the voice assistance method for the visually impaired of the present invention further includes the following steps: (S31) determining whether the real-time environment image 111 contains mobile barcode information 91; (S32) if so, connecting to an application server 92 according to the mobile barcode information 91 to read corresponding item information 921; and (S33) generating the auxiliary voice message 121 according to the item information 921, so as to control the sound output module 12 to emit the auxiliary voice message 121.

The detailed description and implementations of the voice assistance method for the visually impaired of the present invention have already been given above in the description of the voice assistance system for the visually impaired, and are not repeated here for brevity.

The above description is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall be included in the scope of the appended claims.

1: Voice assistance system for the visually impaired

10: Head-mounted device

11: Camera module

111: Real-time environment image

12: Sound output module

121: Auxiliary voice message

20: Portable electronic device

21: Processing module

22: Wireless transmission module

30: Recognition server

31: Database

311: Reference object feature

312: Reference object information

32: Learning module

321: Environmental object feature

322: Real-time auxiliary information

Claims

A voice assistance system for the visually impaired, comprising: a head-mounted device including a camera module and a sound output module, wherein the camera module captures the environment in front of a user to generate a real-time environment image, and the sound output module emits an auxiliary voice message; a portable electronic device including a processing module and a wireless transmission module, wherein the wireless transmission module transmits the real-time environment image; and a recognition server including a database and a learning module, wherein the database stores a plurality of reference object features and corresponding reference object information, the recognition server receives the real-time environment image from the portable electronic device, the learning module recognizes a plurality of environmental object features from the real-time environment image, and, when the learning module determines by comparison that the environmental object features match the reference object features, the learning module generates real-time auxiliary information containing the corresponding reference object information, which is fed back to the portable electronic device and synchronously stored in a cloud server; wherein the processing module generates the auxiliary voice message according to the real-time auxiliary information and controls the sound output module to emit the auxiliary voice message, the auxiliary voice message includes a plurality of orientations and the reference object information corresponding to the plurality of orientations, and the auxiliary voice message does not directly instruct the user how to walk; wherein, when the user maintains the direction of travel, the processing module controls the sound output module to prompt the user with the auxiliary voice message only for newly appearing objects among the reference objects; and wherein the learning module compares whether each environmental object feature matches each reference object feature based on a single shot multibox detector (SSD) module of the TensorFlow algorithm and a deep-learning convolutional neural network algorithm, and, when a face among the environmental object features matches a facial feature of the reference objects, the processing module controls the sound output module to announce by voice the orientation of the face matching the facial feature of the reference object, its distance from the user, and the corresponding face data.

The voice assistance system for the visually impaired as described in claim 1, wherein the head-mounted device further includes a sound receiving module that receives an inquiry message from the user, and the processing module generates the auxiliary voice message according to the inquiry message and the real-time auxiliary information, and controls the sound output module to emit the auxiliary voice message.

The voice assistance system for the visually impaired as described in claim 1, wherein, when the real-time environment image contains mobile barcode (QR code) information, the processing module connects to an application server according to the mobile barcode information to read corresponding item information, and generates the auxiliary voice message according to the item information, so as to control the sound output module to emit the auxiliary voice message.

A voice assistance method for the visually impaired, comprising the following steps: using a camera module of a head-mounted device to capture the environment in front of a user to generate a real-time environment image; receiving the real-time environment image through a portable electronic device and transmitting it to a recognition server; recognizing a plurality of environmental object features from the real-time environment image, and comparing whether each of the environmental object features matches a plurality of reference object features in a database; generating real-time auxiliary information containing reference object information corresponding to the reference object features, feeding it back to the portable electronic device, and synchronously storing it in a cloud server; and generating an auxiliary voice message according to the real-time auxiliary information, and controlling a sound output module of the head-mounted device to emit the auxiliary voice message, wherein the auxiliary voice message includes a plurality of orientations and the reference object information corresponding to the plurality of orientations, and the auxiliary voice message does not directly instruct the user how to walk; wherein, when the user maintains the direction of travel, the processing module controls the sound output module to prompt the user with the auxiliary voice message only for newly appearing objects; and wherein the learning module compares whether each environmental object feature matches each reference object feature based on a single shot multibox detector (SSD) module of the TensorFlow algorithm and a deep-learning convolutional neural network algorithm, and, when a face among the environmental object features matches a facial feature of the reference objects, the processing module controls the sound output module to announce the orientation of the face matching the facial feature of the reference object, its distance from the user, and the corresponding face data.

The voice assistance method for the visually impaired as described in claim 4, further comprising the following steps: receiving an inquiry message from the user; and generating the auxiliary voice message according to the inquiry message and the real-time auxiliary information, and controlling the sound output module to emit the auxiliary voice message.

The voice assistance method for the visually impaired as described in claim 4, further comprising the following steps: determining whether the real-time environment image contains mobile barcode (QR code) information; if so, connecting to an application server according to the mobile barcode information to read corresponding item information; and generating the auxiliary voice message according to the item information, so as to control the sound output module to emit the auxiliary voice message.