CN113539252A - Barrier-free intelligent voice system and control method thereof - Google Patents


Info

Publication number
CN113539252A
CN113539252A (application CN202010320575.7A)
Authority
CN
China
Prior art keywords
voice
audio
database
tag
mobile device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010320575.7A
Other languages
Chinese (zh)
Inventor
庄连豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010320575.7A priority Critical patent/CN113539252A/en
Publication of CN113539252A publication Critical patent/CN113539252A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

The invention provides a barrier-free intelligent voice system and a control method thereof. The system recognizes, from a voice audio, a plurality of words that can serve as independent semantic units, and determines whether each word is one of a plurality of voice tags created by the user. A voice tag may represent the name or code of an object or piece of information, a single or combined instruction, a program, a voice message, a recording message, and the like. From the combination of successfully matched voice tags, the system determines the target object, program instruction, and remark description that the tags point to in a database, and accordingly starts the corresponding program or triggers a remote device to act. Serving as an AI voice-processing engine, the system lets the user define different types of voice tag combinations by himself, which effectively reduces the computational load and increases the processing speed of the system, while also achieving confidentiality, theft prevention, barrier-free use, and independence from any particular language.

Description

Barrier-free intelligent voice system and control method thereof
Technical Field
The present invention relates to audio recognition technology, and more particularly to a barrier-free intelligent voice system and a control method thereof. The system recognizes a plurality of independent semantic units from an audio, checks whether each recognized unit is one of a plurality of voice tags created by the user, and determines, from the combination of matched voice tags (which may represent the name or code of a target object or piece of information, a single or combined command, a program, a voice message, a recording message, and the like), a voice command (also called a voice code) corresponding to the audio, so as to start the corresponding program or trigger other controlled devices to operate.
Background
With the development of technology, mobile devices with speech recognition systems are becoming popular. Most speech recognition systems allow users to communicate with mobile devices directly in natural language through language understanding technology; for example, a user may issue a continuous voice command such as "buy a China Airlines ticket to Tokyo on Saturday". However, to reach the level of spoken language understanding, the speech recognition system must perform syntactic analysis (e.g., with a syntax analyzer) and semantic interpretation (e.g., with a semantic analyzer) on the continuous command: after part-of-speech tagging each word, it extracts word stems, builds a structure tree, assigns a semantic role to each word in the tree, and only then analyzes the meaning of the whole sentence, which generates a large computational load.
Furthermore, the grammatical structure of such continuous voice commands usually must follow specific grammar rules (both syntactic and lexical), and those structures differ between languages. If the continuous voice command issued by the user is complicated, contains superfluous words or slight pauses, is not grammatically well-formed, is spoken with an individual accent, or mixes single and multiple languages according to the user's habit, the recognition accuracy of the speech recognition system may suffer, and training the natural language processing (NLP) model also becomes difficult.
Moreover, without voiceprint recognition technology, an existing speech recognition system cannot tell from the user's voice whether the user is authorized to perform a specific action. For speech recognition systems that rely on general language understanding technology, it therefore remains an open problem to provide an audio recognition technology that reduces the computational load of speech recognition, reduces the influence of grammatical structure on the system, supports barrier-free use, verifies usage permissions, and offers confidentiality and theft prevention.
Disclosure of Invention
In order to achieve the above object, the present invention provides a control method for a barrier-free intelligent voice system, comprising:
(1) a step of analyzing the voice audio: a voice recognition unit is connected to a voice database and performs a voice analysis on a voice audio received by a voice receiving unit to identify a plurality of voices therefrom, then performs a word-formation analysis on the plurality of voices to identify a plurality of words that can serve as independent semantic units;
(2) a step of comparing voice tags: the voice recognition unit is connected to a tag database to judge whether the plurality of words are among a plurality of target voice tags defined by a mobile device and whether they are among a plurality of instruction voice tags defined by the mobile device;
(3) a step of executing the corresponding voice command: a processor of the mobile device causes the mobile device to execute, on the target object pointed to by the matched target voice tag in the tag database, the program instruction pointed to by the matched instruction voice tag in the tag database.
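The three steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the tag tables, function names, and the assumption that the speech has already been transcribed into words are mine, not the patent's.

```python
# Minimal sketch of steps (1)-(3). TARGET_TAGS / INSTRUCTION_TAGS stand in
# for the tag database; real phonetic analysis is stubbed out, so the
# "audio" arrives as already-transcribed words (a simplifying assumption).

TARGET_TAGS = {"boss zhuang": "contact:Zhuang"}   # target voice tags -> target object O
INSTRUCTION_TAGS = {"call": "launch_phone_app"}   # instruction voice tags -> program instruction I

def analyze_voice_audio(transcribed_words):
    """Step (1): treat each recognized word as an independent semantic unit."""
    return [w.lower() for w in transcribed_words]

def compare_voice_tags(words):
    """Step (2): check the words against user-defined target/instruction tags."""
    target = next((TARGET_TAGS[w] for w in words if w in TARGET_TAGS), None)
    instruction = next((INSTRUCTION_TAGS[w] for w in words if w in INSTRUCTION_TAGS), None)
    return target, instruction

def execute_voice_command(target, instruction):
    """Step (3): run the program instruction on the target object (stubbed)."""
    if target is None or instruction is None:
        return None  # no full tag match: the audio is not treated as a command
    return f"{instruction}({target})"

words = analyze_voice_audio(["Boss Zhuang", "Call"])
result = execute_voice_command(*compare_voice_tags(words))
```

Because matching is done against whole user-defined tags rather than a parsed sentence, no syntax tree or semantic-role labeling is needed, which is the computational saving the disclosure claims.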
To achieve the above objective, the present invention provides a barrier-free intelligent voice system suitable for a mobile device having a processor, the system comprising: a voice receiving unit, in information connection with the processor, for receiving a voice audio; a communication unit in information connection with the processor; a voice database storing a plurality of voice audio samples; a tag database storing a plurality of target voice tags and a plurality of instruction voice tags; and a voice recognition unit, respectively in information connection with the communication unit, the voice database, and the tag database, for receiving the voice audio from the voice receiving unit, performing a voice analysis on it, identifying a plurality of voices based on the result of reading the voice database, and performing a word-formation analysis on those voices to identify a plurality of independent words. The voice recognition unit further judges, based on the result of reading the tag database, whether the words are among the target voice tags defined by the mobile device and whether they are among the instruction voice tags defined by the mobile device; if both match, the voice recognition unit causes the processor, through the communication unit, to make the mobile device execute, on the target object pointed to by the matched target voice tag in the tag database, the program instruction pointed to by the matched instruction voice tag in the tag database.
Further, the method also comprises a step of detecting a wake-up voice: the voice recognition unit judges whether the voice receiving unit has received a predefined wake-up audio; if so, it is treated as a wake-up operation, and the step of analyzing the voice audio continues to be executed on subsequent voice audio.
Further, when the step of comparing voice tags is performed, the voice recognition unit determines whether the words identified from the voice audio also include a remark voice tag defined by the mobile device; if so, the processor adjusts the program instruction or the content of the target object according to the remark description pointed to by the remark voice tag in the tag database.
Further, the method also comprises a step of verifying usage rights: a permission verification unit judges the level permission corresponding to the voice audio based on the matched target voice tag and the matched instruction voice tag, so as to determine, when the processor performs the step of executing the corresponding voice command, whether the mobile device may execute the program instruction under its current level permission.
Further, the method also comprises a step of detecting a sleep voice: the voice recognition unit judges whether the voice receiving unit has received a predefined sleep audio; if so, it is treated as a sleep operation, and the step of analyzing the voice audio is stopped.
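The wake-up and sleep detection steps amount to a simple gate in front of the analysis step. Below is a sketch under assumed wake/sleep phrases and exact-string matching; the patent specifies neither.

```python
# Gate sketch for the wake/sleep steps: voice audio proceeds to analysis
# only between a predefined wake-up audio and a predefined sleep audio.
# The phrases and exact-string matching are simplifying assumptions.

WAKE_AUDIO = "hello system"
SLEEP_AUDIO = "goodbye system"

class VoiceGate:
    def __init__(self):
        self.awake = False

    def feed(self, audio):
        """Return True if this audio should proceed to the analysis step."""
        if audio == WAKE_AUDIO:
            self.awake = True   # treated as a wake-up operation
            return False        # the wake phrase itself is not analyzed
        if audio == SLEEP_AUDIO:
            self.awake = False  # treated as a sleep operation
            return False
        return self.awake

gate = VoiceGate()
before_wake = gate.feed("boss zhuang call")
gate.feed(WAKE_AUDIO)
after_wake = gate.feed("boss zhuang call")
gate.feed(SLEEP_AUDIO)
after_sleep = gate.feed("boss zhuang call")
```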
The invention also provides a barrier-free intelligent voice system, comprising:
a voice receiving unit, in information connection with a processor of a mobile device, for receiving a voice audio;
a communication unit in information connection with the processor;
a voice database storing a plurality of voice audio samples;
a tag database storing a plurality of target voice tags and a plurality of instruction voice tags;
a voice recognition unit, respectively in information connection with the communication unit, the voice database, and the tag database, for receiving the voice audio from the voice receiving unit, performing a voice analysis on it, identifying a plurality of voices based on the result of reading the voice database, and then performing a word-formation analysis on those voices to identify a plurality of independent words;
the voice recognition unit is further used for judging, based on the result of reading the tag database, whether the words are among the target voice tags defined by the mobile device and whether they are among the instruction voice tags defined by the mobile device; and the voice recognition unit causes the processor, through the communication unit, to make the mobile device execute, on the target object pointed to by the matched target voice tag in the tag database, the program instruction pointed to by the matched instruction voice tag in the tag database.
Furthermore, the voice recognition unit is also used for judging whether the voice receiving unit has received a predefined wake-up audio or a predefined sleep audio: upon the wake-up audio, the voice analysis and word-formation analysis continue to be executed on subsequent voice audio; upon the sleep audio, they are stopped.
Further, the voice recognition unit is also used for judging, based on the result of reading the tag database, whether the words recognized from the voice audio also include a remark voice tag defined by the mobile device; if so, the processor adjusts the program instruction or the content of the target object according to the remark description pointed to by the remark voice tag in the tag database.
Furthermore, the system also comprises a permission verification unit, in information connection with the voice recognition unit, for judging the level permission corresponding to the voice audio based on the matched target voice tag and the matched instruction voice tag, so that the voice recognition unit lets the processor determine whether to execute the program instruction based on the current level permission of the mobile device.
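The level permission described here can be pictured as an ordered comparison: a command executes only if the level resolved from the matched tags is at least as high as the level the program instruction requires. The level names and numbering below are illustrative assumptions; the disclosure only states that the first level is the highest.

```python
# Permission-check sketch: level 1 is the highest, matching the disclosure's
# "first level (highest level)". The level names are invented for illustration.

LEVELS = {"owner": 1, "family": 2, "guest": 3}

def may_execute(user_level, instruction_level):
    """A lower number outranks a higher one, so '<=' means 'has permission'."""
    return LEVELS[user_level] <= LEVELS[instruction_level]

owner_ok = may_execute("owner", "family")      # owner may run a family-level command
guest_blocked = may_execute("guest", "owner")  # guest may not run an owner-level one
```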
The present invention also provides a barrier-free intelligent voice system, which is suitable for a mobile device having a processor, and is characterized by comprising:
a voice receiving unit, in information connection with the processor, for receiving a voice audio;
a voice database storing a plurality of voice audio samples;
a tag database storing a plurality of target voice tags and a plurality of instruction voice tags;
a voice recognition unit, respectively in information connection with the voice database and the tag database, for receiving the voice audio from the voice receiving unit, performing a voice analysis on it, identifying a plurality of voices based on the result of reading the voice database, and then performing a word-formation analysis on those voices to identify a plurality of independent words;
the voice recognition unit is further used for judging, based on the result of reading the tag database, whether the words are among a plurality of target voice tags defined by the mobile device and whether they are among a plurality of instruction voice tags defined by the mobile device; and if both match, the voice recognition unit causes the processor to make the mobile device execute, on the target object pointed to by the matched target voice tag in the tag database, the program instruction pointed to by the matched instruction voice tag in the tag database.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a block diagram of an intelligent speech system according to the present invention.
Fig. 2 is a schematic information flow diagram of the intelligent speech system of the present invention.
Fig. 3 is a flowchart of a control method of the intelligent voice system of the present invention.
FIG. 4 is a schematic diagram of a step of analyzing a voice audio according to the present invention.
FIG. 5A is a schematic diagram of an implementation scenario (one) of the present invention.
FIG. 5B is a schematic diagram of an implementation scenario (two) of the present invention.
Fig. 6A is a schematic information flow diagram of another embodiment (one) of the present invention.
Fig. 6B is a schematic diagram of an implementation scenario (one) of another embodiment (one) of the present invention.
FIG. 6C is a schematic diagram of an implementation scenario (two) of another embodiment (one) of the present invention.
Fig. 7 is a block diagram of another embodiment (two) of the present invention.
Fig. 8 is a flowchart of a method according to another embodiment (three) of the present invention.
Fig. 9 is a block diagram illustrating another embodiment (four) of the present invention.
Fig. 10 is a schematic diagram of an implementation scenario of another embodiment (five) of the present invention.
Description of the reference numerals
10 barrier-free intelligent voice system
101 mobile device
1011 voice receiving unit
1012 processor
1013 communication unit
102 server
1021 voice database
1022 tag database
1023 voice recognition unit
1024 permission verification unit
103 controlled device
V voice audio
L1 target voice tag
L2 instruction voice tag
L3 remark voice tag
O target object
I program instruction
R remark description
N network
S control method of the barrier-free intelligent voice system
S5 detecting a wake-up voice
S10 analyzing the voice audio
S20 comparing the voice tags
S25 verifying usage rights
S30 executing the corresponding voice command
S35 detecting a sleep voice
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, as will be readily apparent to those of ordinary skill in the art, the present invention may be practiced in ways other than those specifically described here without departing from its spirit, and the present invention is therefore not limited to the specific embodiments disclosed below.
Referring to fig. 1, a block diagram of the intelligent voice system according to the present invention, and to fig. 2, a schematic information flow diagram, the barrier-free intelligent voice system 10 of this embodiment includes:
(1) a voice receiving unit 1011, in information connection with a processor 1012 of a mobile device 101, for receiving a voice audio V; the voice receiving unit 1011 may be a microphone of the mobile device 101 or a wireless earphone connected to the mobile device 101 via wireless communication (e.g., Bluetooth);
(2) a communication unit 1013 capable of being in communication with the processor 1012;
(3) a voice database 1021, which stores a plurality of voice audio samples, where each group of phones in a voice audio sample corresponds to a word that can serve as an independent semantic unit; a voice recognition unit 1023 can use the various initials and finals of a voice audio V to identify one or more syllables in the voice audio sample, and the language of the voice audio samples may be Chinese, English, Southern Min (Minnan), Cantonese, Japanese, Korean, etc., but is not limited thereto;
(4) a tag database 1022 storing a plurality of target voice tags L1 and a plurality of command voice tags L2;
(5) a voice recognition unit 1023, respectively in information connection with the communication unit 1013, the voice database 1021, and the tag database 1022, for receiving the voice audio V transmitted by the voice receiving unit 1011 through a network N, performing a phonetic analysis on the voice audio V to identify a plurality of phones based on the result of reading the voice database 1021, and then performing a morphological (word-formation) analysis on those phones to identify independent words; preferably, after recognizing the phones, the system can simultaneously perform a phoneme analysis on the voice audio V to recognize the phonemes of the speech in its language, which helps the voice recognition unit 1023 assemble the phones into words, since a phoneme is the smallest unit of sound that carries semantic function;
(6) the voice recognition unit 1023 can also determine whether the words are one of a plurality of target voice tags L1 defined by the mobile device 101 and one of a plurality of command voice tags L2 defined by the mobile device 101 based on the results of reading the tag database 1022;
(7) the voice recognition unit 1023 can also make the processor 1012 of the mobile device 101 execute the program instruction I on the target object O according to a target object O pointed by the compared target voice tag L1 in the tag database 1022 and a program instruction I pointed by the compared instruction voice tag L2 in the tag database 1022 through the communication unit 1013;
(8) in this embodiment, the voice receiving unit 1011, the processor 1012 and the communication unit 1013 are all operable on the mobile device 101.
(9) In this embodiment, the voice database 1021, the tag database 1022 and the voice recognition unit 1023 can all run on a server 102, and the server 102 can also have a second processor, which is not described herein.
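The two-stage analysis in item (5), a phonetic analysis into phones/syllables followed by a word-formation analysis into independent words, can be sketched as a longest-match segmentation against the voice database. The syllable inventory, romanization, and greedy strategy below are all illustrative assumptions, not the patent's method.

```python
# Sketch of word-formation analysis: syllables produced by phonetic analysis
# are grouped into known words by greedy longest match against the voice
# database. The inventory and romanization are invented for illustration.

VOICE_DATABASE = {
    ("zhuang", "lao", "ban"): "Boss Zhuang",  # assumed romanization of the name tag
    ("da",): "call",
    ("dian", "hua"): "phone",
}

def word_formation_analysis(syllables):
    """Greedy longest-match segmentation of syllables into known words."""
    words, i = [], 0
    while i < len(syllables):
        for length in range(len(syllables) - i, 0, -1):
            key = tuple(syllables[i:i + length])
            if key in VOICE_DATABASE:
                words.append(VOICE_DATABASE[key])
                i += length
                break
        else:
            i += 1  # unknown syllable: skip it
    return words

words = word_formation_analysis(["zhuang", "lao", "ban", "da", "dian", "hua"])
```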
The target object O illustrated in fig. 1 may be, for example, contact information stored in the mobile device 101 (such as a contact name or a contact code), target object information (such as a target object name or a target object code), trip information, to-do item information, to-do list information, file address information, or a hyperlink, but is not limited thereto; it may also be a controlled device (not shown) communicatively connected to the communication unit 1013 of the mobile device 101 through the network N.
The network N illustrated in fig. 1 may be, for example, a public or private network, such as a wireless network (e.g., 3G, 4G LTE, Wi-Fi), a wired network, a local area network (LAN), or a wide area network (WAN), but is not limited thereto.
The Server 102 illustrated in fig. 1 may be, for example, an independent Server providing a connection service, a Virtual Machine (VM) installed and running in the Server, a Server running in the form of a Virtual Private Server (Virtual Private Server), a public cloud, a Private cloud, an edge device (edge device), or the like, but is not limited thereto.
The processor 1012 illustrated in fig. 1 may be, for example, a Central Processing Unit (CPU), a Microprocessor (MPU), a Microcontroller (MCU), an Application Processor (AP), an embedded processor, or an Application Specific Integrated Circuit (ASIC), but is not limited thereto.
For example, the voice database 1021 and the tag database 1022 illustrated in fig. 1 may each be a physical database host or a cloud database, or may be stored in the server 102 in the form of a plurality of data tables as a relational or non-relational database, but are not limited thereto.
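When the databases are kept as data tables in a relational database, as the paragraph above allows, the tag lookup reduces to simple keyed queries. A sketch using Python's built-in sqlite3, with an invented schema and invented rows:

```python
# Sketch of the tag database as relational data tables. The schema, table
# names, and rows are illustrative assumptions using the sqlite3 stdlib module.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_tags (tag TEXT PRIMARY KEY, target_object TEXT)")
conn.execute("CREATE TABLE instruction_tags (tag TEXT PRIMARY KEY, program_instruction TEXT)")
conn.execute("INSERT INTO target_tags VALUES ('boss zhuang', 'contact:Zhuang')")
conn.execute("INSERT INTO instruction_tags VALUES ('call', 'phone_app')")

def resolve(words):
    """Look up the first matching target tag and instruction tag."""
    target = instruction = None
    for w in words:
        row = conn.execute(
            "SELECT target_object FROM target_tags WHERE tag = ?", (w,)).fetchone()
        if row and target is None:
            target = row[0]
        row = conn.execute(
            "SELECT program_instruction FROM instruction_tags WHERE tag = ?", (w,)).fetchone()
        if row and instruction is None:
            instruction = row[0]
    return target, instruction

resolved = resolve(["boss zhuang", "call"])
```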
Referring to fig. 3, which is a flowchart of a control method of an intelligent speech system according to the present invention, and referring to fig. 1-2, the barrier-free intelligent speech system 10 of the present embodiment is adapted to analyze a speech audio V, and includes a speech receiving unit 1011, a speech database 1021, a tag database 1022, a speech recognition unit 1023, and a processor 1012 of the mobile device 101, and the control method S of the barrier-free intelligent speech system includes the following steps:
(1) analyzing the voice audio (step S10): the voice recognition unit 1023 is connected to the voice database 1021 and performs a phonetic analysis on the voice audio V received by the voice receiving unit 1011 to recognize a plurality of phones, then performs a word-formation analysis on those phones to recognize a plurality of words that can serve as independent semantic units (e.g., "book" or "boy", each of which can stand alone as a word). Referring to the schematic diagram of the voice audio analyzing step in fig. 4: whatever form the voice audio V takes, and whether or not it violates grammar rules, e.g., "call Boss Zhuang", "Boss Zhuang call", "give Boss Zhuang a Call", or the same request with the Southern Min "thau-ke" (boss) in place of "Boss Zhuang", it can be recognized as a plurality of words consisting of a name word ("Boss Zhuang" or "thau-ke") plus an action word ("call", "make a phone call", or "Call"); similarly, if the voice audio V is recognized as the phones of "show Boss Zhuang's photos", "Boss Zhuang photos show", "Show photos Boss Zhuang", or the equivalent with "thau-ke", regardless of whether the actual expression violates grammar rules, it can be further recognized as a plurality of words consisting of "Boss Zhuang" (or "thau-ke"), "photos", and "show" (or "Show"); this is merely an example and not a limitation;
(2) comparing the voice tags (step S20): as also shown in fig. 4, the voice recognition unit 1023 is connected to the tag database 1022 to determine whether the words are among the target voice tags L1 defined by the mobile device 101 and among the instruction voice tags L2 defined by the mobile device 101. More specifically, if the voice audio V is recognized as the words "Boss Zhuang" + "call" (in any of the phrasings above), then in step S20 the voice recognition unit 1023 matches the target voice tag L1 "Boss Zhuang" with the instruction voice tag L2 "call" (or "Call"), or matches another target voice tag L1 "thau-ke" with the instruction voice tag L2 "call" (or "Call"); similarly, if the voice audio V is recognized as the words "Boss Zhuang" + "photos" + "show", the voice recognition unit 1023 matches the target voice tag L1 "Boss Zhuang photos" with the instruction voice tag L2 "show" (or "Show"), or matches another target voice tag L1 "thau-ke photos" with the instruction voice tag L2 "show" (or "Show"). If the comparison results for the target voice tag L1 and the instruction voice tag L2 both match, step S30 follows; if not, the speech is not recognized, and step S10 or step S20 can be executed again;
(3) executing the corresponding voice command (step S30): the processor 1012 causes the mobile device 101 to execute, on the target object O pointed to by the matched target voice tag L1 in the tag database 1022, the program instruction I pointed to by the matched instruction voice tag L2 in the tag database 1022. More specifically, if the voice audio V is recognized as the words "Boss Zhuang" + "call", the voice recognition unit 1023 in step S30 determines from the matched target voice tag L1 "Boss Zhuang" that it corresponds to the target object O "Boss Zhuang's contact phone number" in the mobile device 101, determines from the matched instruction voice tag L2 "call" that it corresponds to the program instruction I "execute the phone application (APP) of the mobile device 101 on the target object", and executes it, as shown in the implementation scenario diagram (one) of fig. 5A; similarly, if the voice audio V is recognized as the words "Boss Zhuang" + "photos" + "show", the voice recognition unit 1023 determines from the matched target voice tag L1 "Boss Zhuang photos" that it corresponds to the target object O "photos with Boss Zhuang" in the mobile device 101, determines from the matched instruction voice tag L2 "show" that it corresponds to the program instruction I "execute a photo viewer program installed on the mobile device 101", and executes it, as shown in the implementation scenario diagram (two) of fig. 5B.
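One point the examples above make implicitly is worth making explicit: because step S20 matches whole user-defined tags, the word order of the utterance is irrelevant, unlike grammar-based parsing. A sketch of that order insensitivity, with illustrative tag values:

```python
# Sketch of grammar-free matching: whatever order the user speaks the tags
# in, the same (target, instruction) pair is resolved. Tag values are
# illustrative assumptions, not the patent's data.

TARGET_TAGS = {"boss zhuang photos"}
INSTRUCTION_TAGS = {"show"}

def resolve(words):
    """Order-insensitive: set intersection, not grammar parsing."""
    bag = {w.lower() for w in words}
    target = next(iter(bag & TARGET_TAGS), None)
    instruction = next(iter(bag & INSTRUCTION_TAGS), None)
    return target, instruction

a = resolve(["Boss Zhuang photos", "Show"])
b = resolve(["Show", "Boss Zhuang photos"])  # reversed order, same result
```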
Referring to fig. 6A to 6C, which are respectively an information-flow schematic diagram and implementation scenario diagrams (one) and (two), together with fig. 1 to 3: when the voice recognition unit 1023 of this embodiment executes step S20 (comparing the voice tags), it can further determine whether the words recognized from the voice audio V include a remark voice tag L3 defined by the mobile device 101 itself. If so, the processor 1012 of the mobile device 101 can adjust the content of the program instruction I or of the target object O according to a remark explanation R pointed to in the tag database 1022 by the remark voice tag L3. More specifically, if the voice audio V consists of the words "Boss Zhuang" + "call" + "home", then in step S20 the voice recognition unit 1023 compares the target voice tag L1 "Boss Zhuang", the instruction voice tag L2 "call", and the remark voice tag L3 "home". Accordingly, when executing step S30 (executing the corresponding voice instruction), the voice recognition unit 1023 determines from the compared target voice tag L1 "Boss Zhuang" and the remark voice tag L3 "home" that the audio corresponds to the target object O "Boss Zhuang's home contact phone number" in the mobile device 101, and determines from the compared instruction voice tag L2 "call" that it corresponds to the program instruction I "launch the telephone application (APP) carried by the mobile device 101 on the target object", which is then executed, as shown in fig. 6B; that is, the embodiment of fig. 6B is an example in which the remark voice tag L3 adjusts the content of the target object O. Similarly, if the voice audio V consists of the words "Boss Zhuang" + "group photo" + "show" + "remarks", the voice recognition unit 1023 in step S20 compares the target voice tag L1 "Boss Zhuang group photo", the instruction voice tag L2 "show", and the remark voice tag L3 "remarks". When executing step S30, it determines from the compared target voice tag L1 that the audio corresponds to the target object O "group photo with Boss Zhuang" in the mobile device 101, and determines, from the compared instruction voice tag L2 "show" together with the compared remark voice tag L3 "remarks", that it corresponds to the program instruction I "launch a photo viewer program installed on the mobile device 101 and play or present the remark information associated with the target object", which is then executed, as shown in fig. 6C; that is, the embodiment of fig. 6C is an example in which the remark voice tag L3 adjusts the content of the program instruction I.
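The two remark-tag behaviors of figs. 6B and 6C can be sketched as below; the rule table, function name, and string identifiers are hypothetical illustrations, not taken from the patent:

```python
# Hypothetical sketch of the remark voice tag L3: according to the remark
# explanation R it points to, a remark tag refines either the target object O
# (fig. 6B, "home") or the program instruction I (fig. 6C, "remarks").
REMARK_RULES = {
    "home": {"adjust": "target", "value": "contact:boss_zhuang_home_phone"},
    "remarks": {"adjust": "instruction", "value": "launch_photo_viewer_with_remarks"},
}

def apply_remark(target, instruction, remark_tag):
    """Adjust the resolved (target, instruction) pair per the remark rule;
    with no matching rule, the pair passes through unchanged."""
    rule = REMARK_RULES.get(remark_tag)
    if rule is None:
        return target, instruction
    if rule["adjust"] == "target":
        return rule["value"], instruction
    return target, rule["value"]
```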
Please refer to fig. 7, which is a block diagram of another embodiment (two) of the present invention. The present embodiment is similar to the techniques illustrated in fig. 1 to 3; the main difference is that the barrier-free intelligent voice system 10 of this embodiment further includes a permission verification unit 1024, which stores a plurality of program instructions and is in information connection with the voice recognition unit 1023. Based on the result of the voice recognition unit 1023 reading the tag database 1022 with the compared target voice tag L1 and the compared instruction voice tag L2, the permission verification unit 1024 determines a level permission corresponding to the voice audio V, so that the voice recognition unit 1023 can determine whether the mobile device 101 may execute the program instruction I under the current level permission. In other words, the control method S of the barrier-free intelligent voice system of this embodiment may include a step of "verifying the usage permission" (step S25): the permission verification unit 1024 determines, based on the compared target voice tag L1 and the compared instruction voice tag L2 and according to the result of the voice recognition unit 1023 reading the tag database 1022, a level permission corresponding to the voice audio V, so as to decide whether the processor 1012 executes the program instruction I in step S30 (executing the corresponding voice instruction). For example, if the permission verification unit 1024 determines that the target voice tag L1 and the instruction voice tag L2 corresponding to the voice audio V belong, in the data table stored in the tag database 1022, to the first level (the highest level), it determines that the current level permission of the mobile device 101 is that of the owner; when the level permission required by the program instruction I is the first level, the mobile device 101 is judged to have permission to execute the program instruction I. The second level and the third level may correspond, for example, to family users, who cannot execute program instructions I belonging to the first level. If the permission verification unit 1024 determines that the target voice tag L1 and the instruction voice tag L2 corresponding to the voice audio V belong to the second level in the data table stored in the tag database 1022, then when the level permission required by the program instruction I is the second level or the third level, it determines that the mobile device 101 currently has permission to execute the program instruction I. Conversely, if it determines that the tags belong to the third level and the level permission required by the program instruction I is the second level, it determines that the mobile device 101 does not currently have permission to execute the program instruction I, and so on.
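The level-permission check of step S25 reduces to a comparison of two levels. A minimal sketch, in which the numeric convention (1 = highest, the owner level) follows the paragraph above and the function name is an assumption:

```python
# Hedged sketch of step S25 (verify usage permission). Levels are numbered
# so that 1 is the highest (owner); a voice audio verified at level N may
# execute program instructions whose required level is N or any lower-
# privileged level (numerically >= N), but never a higher-privileged one.
def has_permission(audio_level: int, instruction_level: int) -> bool:
    """Return True when the level permission of the voice audio is
    sufficient for the level permission required by instruction I."""
    return instruction_level >= audio_level

# First-level (owner) audio may run a first-level instruction;
# second-level audio may run second- or third-level instructions;
# third-level audio may NOT run a second-level instruction.
```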
Please refer to fig. 8, which is a method flowchart of another embodiment (three). The present embodiment is similar to the techniques illustrated in fig. 1 to 3; the main difference is that the control method S of the barrier-free intelligent voice system of this embodiment may include a step of "detecting a wake-up voice" (step S5): the voice recognition unit 1023 first determines whether the voice receiving unit 1011 has received a predefined wake-up audio, such as "Xiao Bai" or "secretary"; if so, the wake-up audio is regarded as a wake-up operation, and step S10 (analyzing the voice audio) is then executed on the voice audio V, i.e., voice analysis and word-formation analysis are performed on the voice audio V in a continuous manner. In addition, the control method S of this embodiment may further include a step of "detecting a dormant voice" (step S35): the voice recognition unit 1023 determines whether the voice receiving unit 1011 has received a predefined dormant audio, such as "close the reminder" or "secretary, rest"; if so, the dormant audio is regarded as a dormant operation, step S10 (analyzing the voice audio) is no longer executed on the voice audio V, i.e., voice analysis and word-formation analysis of the voice audio V are suspended, and only step S5 continues to be executed. Step S35 may also be executed after step S5, step S10, or step S20; the order is not limited to the sequence illustrated in fig. 8.
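Steps S5 and S35 together amount to a small wake/sleep gate placed in front of step S10. A minimal sketch: the wake and sleep phrases below are the patent's own examples, while the class and method names are assumptions:

```python
# Illustrative state machine for steps S5 (detect wake-up voice) and
# S35 (detect dormant voice): phrases pass on to step S10 analysis only
# while the system is awake.
class VoiceGate:
    WAKE_PHRASES = {"xiao bai", "secretary"}
    SLEEP_PHRASES = {"close the reminder", "secretary, rest"}

    def __init__(self):
        self.awake = False  # until a wake-up operation, only step S5 runs

    def feed(self, phrase: str) -> bool:
        """Return True when the phrase should proceed to step S10
        (voice analysis and word-formation analysis)."""
        p = phrase.lower()
        if p in self.WAKE_PHRASES:
            self.awake = True    # wake-up operation
            return False
        if p in self.SLEEP_PHRASES:
            self.awake = False   # dormant operation: suspend analysis
            return False
        return self.awake
```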
Referring to fig. 9, which is a block diagram of another embodiment (four) of the present invention: compared to the solutions illustrated in fig. 1 to 3, in the barrier-free intelligent voice system 10 of this embodiment the voice database 1021, the tag database 1022, the voice recognition unit 1023, and the permission verification unit 1024 are all mounted in the mobile device 101, and the communication unit 1013 shown in fig. 1 can be omitted, so that the information of the voice audio V, the target object O, the program instruction I, and the remark explanation R can be transmitted and received between the processor 1012 and the voice recognition unit 1023 without passing through the network N shown in fig. 1. In other words, the barrier-free intelligent voice system 10 of this embodiment allows the user to operate the mobile device 101 directly even in the absence of a communication network, with the recognition and execution of the voice command completed locally on the device.
Referring to fig. 10, which is an implementation scenario diagram of another embodiment (five) of the present invention, together with fig. 1 to 3: in the barrier-free intelligent voice system 10 of this embodiment, when the voice recognition unit 1023 causes the processor 1012 to execute, according to the compared target voice tag L1 and the compared instruction voice tag L2, the program instruction I pointed to in the tag database 1022 on the target object O pointed to in the tag database 1022, the target object O may be a controlled device 103 other than the mobile device 101, such as a powered door, a lamp, a television, or another electrical appliance. For example, the user may operate the mobile device 101 so that, after the voice receiving unit 1011 receives the voice audio V, the voice recognition unit 1023 recognizes the voice audio V as consisting of "turn on" (corresponding to the instruction voice tag L2) + "TVS" (corresponding to the target voice tag L1). Regardless of whether the grammatical structure of the voice audio V is correct, the voice recognition unit 1023 can match the target voice tag L1 to the target object O "TVS news channel on the television" and the instruction voice tag L2 to the program instruction I "use the communication unit 1013 of the mobile device 101 to wirelessly turn on the smart television", and execute accordingly.
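The external controlled-device case of fig. 10 could be routed as in the following sketch; the device table, the transport placeholder, and all function names are hypothetical, and the only elements taken from the text are the "turn on" + "TVS" example tags:

```python
# Hypothetical sketch of embodiment (five): a target object O resolved from
# tag L1 may be an external controlled device 103, reached wirelessly via
# the communication unit 1013 instead of an on-device application.
CONTROLLED_DEVICES = {
    "tvs": {"device": "smart_tv", "action_on": "power_on_and_tune:TVS_news"},
}

def dispatch(target_tag: str, instruction_tag: str):
    """Route 'turn on' + 'TVS' (in either order of arrival) to a wireless
    command tuple rather than launching a local app."""
    dev = CONTROLLED_DEVICES.get(target_tag.lower())
    if dev is not None and instruction_tag.lower() == "turn on":
        # stand-in for transmission through communication unit 1013
        return ("send_wireless", dev["device"], dev["action_on"])
    return None
```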
In another embodiment of the present invention, a computer program product for executing the control method S of the barrier-free intelligent voice system is provided; after the plurality of program instructions of the computer program product are loaded into a computer system, at least steps S5, S10, S20, S25, S30, and S35 of the control method S of the barrier-free intelligent voice system are completed.
Accordingly, the present invention can achieve at least the following advantages:
(1) Regardless of whether the grammar of the voice audio is correct, the invention can, by comparison against the voice tag combinations created by the user (representing the name, code, single or combined instruction, program, voice information, recording information, and the like of a certain object or piece of information), identify whether the voice audio corresponds to a specific target object and program instruction.
(2) By comparing against the voice tag combination created by the user, the invention can judge the current level permission of the voice audio, providing a verification technique similar to audio encryption: if a third party does not know the voice tag combination created by the user, a voice audio that does not come from the original user cannot drive the mobile device to execute the specific function, nor start a controlled device other than the mobile device.
However, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention; any equivalent changes and modifications within the spirit and scope of the present invention should be covered by the protection scope of the present invention.

Claims (10)

1. A control method of a barrier-free intelligent voice system, the barrier-free intelligent voice system being adapted to analyze a voice audio and comprising a voice receiving unit, a voice database, a tag database, a voice recognition unit, a permission verification unit, and a processor of a mobile device, the control method comprising:
a step of analyzing the voice audio: the voice recognition unit, in information connection with the voice database, performs a voice analysis on the voice audio received by the voice receiving unit to recognize a plurality of voices from the voice audio, and then performs a word-formation analysis on the plurality of voices to recognize, from the voice audio, a plurality of words usable as independent semantic units;
a step of comparing voice tags: the voice recognition unit, in information connection with the tag database, determines whether any of the words is one of a plurality of target voice tags defined by the mobile device and whether any of the words is one of a plurality of instruction voice tags defined by the mobile device; and
a step of executing a corresponding voice instruction: the processor causes the mobile device to execute a program instruction on a target object according to the target object pointed to in the tag database by the compared target voice tag and the program instruction pointed to in the tag database by the compared instruction voice tag.
2. The control method of the barrier-free intelligent voice system of claim 1, further comprising a step of detecting a wake-up voice: the voice recognition unit determines whether the voice receiving unit has received a predefined wake-up audio; if so, the wake-up audio is regarded as a wake-up operation, and the step of analyzing the voice audio is continuously executed on the voice audio.
3. The control method of the barrier-free intelligent voice system of claim 1, wherein, when the step of comparing voice tags is performed, the voice recognition unit determines whether the words recognized from the voice audio also include a remark voice tag defined by the mobile device; if so, the processor adjusts the content of the program instruction or of the target object according to a remark explanation pointed to in the tag database by the remark voice tag.
4. The control method of the barrier-free intelligent voice system of claim 1, further comprising a step of verifying the usage permission: the permission verification unit determines a level permission corresponding to the voice audio based on the compared target voice tag and the compared instruction voice tag, so as to decide, in the step of executing the corresponding voice instruction, whether the mobile device may execute the program instruction under the current level permission.
5. The control method of the barrier-free intelligent voice system of claim 1 or 2, further comprising a step of detecting a dormant voice: the voice recognition unit determines whether the voice receiving unit has received a predefined dormant audio; if so, the dormant audio is regarded as a dormant operation, and the step of analyzing the voice audio is stopped.
6. A barrier-free intelligent voice system, comprising:
a voice receiving unit, in information connection with a processor of a mobile device, for receiving a voice audio;
a communication unit, in information connection with the processor;
a voice database, storing a plurality of voice audio samples;
a tag database, storing a plurality of target voice tags and a plurality of instruction voice tags; and
a voice recognition unit, in information connection with the communication unit, the voice database, and the tag database respectively, for receiving the voice audio sent by the voice receiving unit, performing a voice analysis on the voice audio to recognize a plurality of voices based on the result of reading the voice database, and then performing a word-formation analysis on the plurality of voices to recognize a plurality of independent words;
wherein the voice recognition unit is further configured to determine, based on the result of reading the tag database, whether any of the words is one of the target voice tags defined by the mobile device and whether any of the words is one of the instruction voice tags defined by the mobile device; and
wherein the voice recognition unit is further configured to cause the processor, through the communication unit, to make the mobile device execute a program instruction on a target object according to the target object pointed to in the tag database by the compared target voice tag and the program instruction pointed to in the tag database by the compared instruction voice tag.
7. The barrier-free intelligent voice system of claim 6, wherein the voice recognition unit is further configured to determine whether the voice receiving unit receives a predefined wake-up audio or a predefined dormant audio; upon receiving the wake-up audio, the voice analysis and the word-formation analysis continue to be performed on the voice audio, and upon receiving the dormant audio, the voice analysis and the word-formation analysis on the voice audio are stopped.
8. The barrier-free intelligent voice system of claim 6, wherein the voice recognition unit is further configured to determine, based on the result of reading the tag database, whether the words recognized from the voice audio also include a remark voice tag defined by the mobile device; if so, the processor adjusts the content of the program instruction or of the target object according to a remark explanation pointed to in the tag database by the remark voice tag.
9. The barrier-free intelligent voice system of claim 6, further comprising a permission verification unit, in information connection with the voice recognition unit, for determining a level permission corresponding to the voice audio based on the compared target voice tag and the compared instruction voice tag, so that the voice recognition unit can determine whether the mobile device may execute the program instruction under the level permission.
10. A barrier-free intelligent voice system, adapted for use with a mobile device having a processor, comprising:
a voice receiving unit, in information connection with the processor, for receiving a voice audio;
a voice database, storing a plurality of voice audio samples;
a tag database, storing a plurality of target voice tags and a plurality of instruction voice tags; and
a voice recognition unit, in information connection with the voice database and the tag database respectively, for receiving the voice audio sent by the voice receiving unit, performing a voice analysis on the voice audio to recognize a plurality of voices based on the result of reading the voice database, and then performing a word-formation analysis on the plurality of voices to recognize a plurality of independent words;
wherein the voice recognition unit is further configured to determine, based on the result of reading the tag database, whether any of the words is one of the target voice tags defined by the mobile device and whether any of the words is one of the instruction voice tags defined by the mobile device; and
wherein, if the result of the determination is a match, the voice recognition unit causes the processor to make the mobile device execute a program instruction on a target object according to the target object pointed to in the tag database by the compared target voice tag and the program instruction pointed to in the tag database by the compared instruction voice tag.
CN202010320575.7A 2020-04-22 2020-04-22 Barrier-free intelligent voice system and control method thereof Pending CN113539252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010320575.7A CN113539252A (en) 2020-04-22 2020-04-22 Barrier-free intelligent voice system and control method thereof


Publications (1)

Publication Number Publication Date
CN113539252A true CN113539252A (en) 2021-10-22


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077714A (en) * 2013-01-29 2013-05-01 华为终端有限公司 Information identification method and apparatus
CN103839547A (en) * 2012-11-27 2014-06-04 英业达科技有限公司 System for loading corresponding instruction elements by comparing voice operation signals and method thereof
CN104462262A (en) * 2014-11-21 2015-03-25 北京奇虎科技有限公司 Method and device for achieving voice search and browser client side
CN107180631A (en) * 2017-05-24 2017-09-19 刘平舟 A kind of voice interactive method and device
CN108198550A (en) * 2017-12-29 2018-06-22 江苏惠通集团有限责任公司 A kind of voice collecting terminal and system
CN108665895A (en) * 2018-05-03 2018-10-16 百度在线网络技术(北京)有限公司 Methods, devices and systems for handling information
CN110211583A (en) * 2019-05-31 2019-09-06 北京声赫科技有限公司 A kind of voice interactive method and interactive voice equipment based on intelligent line traffic control
CN110349578A (en) * 2019-06-21 2019-10-18 北京小米移动软件有限公司 Equipment wakes up processing method and processing device
CN110718225A (en) * 2019-11-25 2020-01-21 深圳康佳电子科技有限公司 Voice control method, terminal and storage medium
CN110767232A (en) * 2019-09-29 2020-02-07 深圳和而泰家居在线网络科技有限公司 Speech recognition control method and device, computer equipment and computer storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination