CN107910003A - Voice interaction method and voice control system for a smart device - Google Patents

Voice interaction method and voice control system for a smart device Download PDF

Info

Publication number
CN107910003A
CN107910003A (application CN201711407315.8A)
Authority
CN
China
Prior art keywords
sound
scene
voice
source
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711407315.8A
Other languages
Chinese (zh)
Inventor
Lin Shuhong (林树宏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chi Tong (xiamen) Technology Co Ltd
Original Assignee
Chi Tong (xiamen) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chi Tong (xiamen) Technology Co Ltd filed Critical Chi Tong (xiamen) Technology Co Ltd
Priority to CN201711407315.8A
Publication of CN107910003A
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00
    • G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters

Abstract

The invention discloses a voice interaction method and a voice control system for a smart device. By collecting the task scene the device is in and determining whether the user's speech constitutes a gain sound source, the system decides whether the device should execute the voice instruction the user intends to issue. Under this scheme, the user is spared the step of first saying a specific wake-up word: by simply speaking the relevant content in the relevant scene when a command is needed, the user can make the smart device execute the voice command directly. This constitutes an intelligent and effective voice interaction mode.

Description

Voice interaction method and voice control system for a smart device
Technical field
The present invention relates to the field of voice control for smart devices, and more particularly to a voice interaction method and a voice control system for a smart device.
Background art
Voice control technology is widely deployed on many kinds of intelligent terminals. At present, the voice interaction between a user and a device is mostly a two-stage interaction comprising a wake-up interaction and a content interaction. Taking "Siri" on the iPhone as an example, the user must first say the preset wake-up word "Hey, Siri!" to the phone's microphone; the system then enters the Siri interactive interface and listens for the user's voice instruction.
Such an interaction mode has the following problems: (1) the user must first say the wake-up word of the corresponding voice control system and wait for the system to enter the content-interaction mode, so the command has to be delivered to the voice control system in two utterances separated by an interval of time, which is not intelligent enough; (2) different classes of devices on the market use a variety of wake-up words (for example, the wake-up word of Android phones is "OK, Google!"), which aggravates interface fragmentation in the voice control field, increases the user's learning cost, and hinders standardization and integration; (3) in a noisy or multi-speaker environment, the system has difficulty distinguishing whether the user has said the wake-up word, so the voice system may fail to wake up or may wake up falsely.
Summary of the invention
It is an object of the present invention to provide an intelligent and effective voice control scheme under which the user is spared the step of first saying a specific wake-up word, thereby solving the above technical problems.
To achieve the above object, a first aspect of the present invention provides a voice interaction method for a smart device, comprising the following steps:
Step S1: receive a voice input and recognize the voice content of the voice input;
Step S2: extract the acoustic characteristic parameters of the voice input, and determine from them whether this input voice constitutes a gain sound source; if it is determined that a gain sound source is constituted, perform step S3;
Step S3: directly execute the voice instruction corresponding to the voice content.
In one embodiment, step A1 is also performed while step S1 is performed: collect the task scene the device is in;
After steps S1 and A1 are performed and before step S3 is performed, the following steps are also performed:
Step A2: determine whether the voice content matches the above task scene;
If the determination results of both step S2 and step A2 are affirmative, perform step S3.
In one embodiment, step A2 is performed before step S2; if the determination result of step A2 is affirmative, step S2 is performed.
In one embodiment, step S2 includes the following steps:
Step S21: build a sound-source characteristic parameter library, which contains the preset effective ranges of the acoustic characteristic parameters that can constitute a gain sound source;
Step S22: extract the vocal segments from the voice input, and extract their acoustic characteristic parameters from them;
Step S23: compare whether the extracted acoustic characteristic parameters fall within the effective ranges of the above characteristic parameter library; if they do, determine that this input voice constitutes a gain sound source; otherwise, determine that it does not.
In one embodiment, the gain sound source includes a volume-gain sound source and/or an angle-gain sound source;
When the gain sound source is an angle-gain sound source, the corresponding acoustic characteristic parameter is the input angle of the sound source relative to the device's voice input apparatus;
When the gain sound source is a volume-gain sound source, the corresponding acoustic characteristic parameter is the volume of the sound source.
In one embodiment, the voice input apparatus of the device is a microphone;
The device is equipped with multiple microphones forming a microphone array. When the microphone array receives a voice input, the input angle of the input sound source relative to the device's microphone array is obtained through processes such as sampling, processing, and computing on the voice.
In one embodiment, the task scene in step A1 corresponds to a task the device needs to process;
Step A1 includes the following steps:
Step A11: assign a corresponding scene identifier to each task the device needs to process, and build a scene identifier library;
Step A12: when the device starts a certain task, output the scene identifier of that task;
Step A13: recognize the scene identifier.
In one embodiment, step A2 includes the following steps:
Step A21: build a voice instruction set, which is the set of available voice instructions under each task scene;
Step A22: convert the voice input into voice content in a device-readable form, and convert that voice content into a pseudo voice instruction in the same format as the above voice instructions;
Step A23: extract all the available voice instructions under the task scene recognized in step A1, and compare the pseudo voice instruction obtained in step A22 with the above available voice instructions one by one;
Step A24: if the pseudo voice instruction matches an available voice instruction under a certain task scene, end the comparison and determine that the voice content matches the task scene; otherwise, determine that they do not match.
To achieve the above object, a second aspect of the present invention provides a voice control system for a smart device, comprising a voice input apparatus and a microprocessor;
The microprocessor has a built-in gain sound source determination unit, an instruction execution unit, and a content recognition unit; the content recognition unit is connected to the voice input apparatus to recognize the content of the voice input;
The gain sound source determination unit is connected to the voice input apparatus and can extract the acoustic characteristic parameters of the voice input, to determine whether the input voice constitutes a gain sound source;
The instruction execution unit is connected to the content recognition unit and to the gain sound source determination unit; when the determination result of the gain sound source determination unit is affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
In one embodiment, the system further includes a storage device, which stores a characteristic parameter library containing the preset effective ranges of the acoustic characteristic parameters that can constitute a gain sound source;
The gain sound source determination unit is connected to the characteristic parameter library, and compares whether the extracted acoustic characteristic parameters fall within the effective ranges of the above characteristic parameter library.
In one embodiment, the gain sound source includes a volume-gain sound source and/or an angle-gain sound source; when the gain sound source is an angle-gain sound source, the corresponding acoustic characteristic parameter is the input angle of the sound source relative to the device's voice input apparatus; when the gain sound source is a volume-gain sound source, the corresponding acoustic characteristic parameter is the volume of the sound source.
In one embodiment, the voice input apparatus is multiple microphones forming a microphone array;
When the microphone array receives a voice input, the input angle of the input sound source relative to the device's microphone array is obtained through processes such as sampling, processing, and computing on the voice, and is output to the gain sound source determination unit;
The gain sound source determination unit also includes a volume detection unit to detect the volume of the voice input.
In one embodiment, the microprocessor also has a built-in scene matching unit, which is connected to the content recognition unit to determine whether the voice content matches the task scene the device is in;
The instruction execution unit is also connected to the scene matching unit; when the determination results of both the scene matching unit and the gain sound source determination unit are affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
In one embodiment, the system further includes a storage device, which stores a scene identifier library and a voice instruction set;
The scene identifier library contains the scene identifiers assigned to the tasks the device needs to process; the voice instruction set is the set of available voice instructions corresponding to each scene identifier.
In one embodiment, the microprocessor also includes a task processing unit for processing each task of the device, which is connected to the scene identifier library of the storage device and to the scene matching unit; when the device starts a certain task, the task processing unit outputs the scene identifier of that task to the scene matching unit.
In one embodiment, the scene matching unit is connected to the voice instruction set of the storage device; after the scene matching unit receives the scene identifier, it extracts all the available voice instructions under the corresponding task scene according to the scene identifier;
The content recognition unit converts the voice input into a pseudo voice instruction in the same format as the above voice instructions and outputs it to the scene matching unit; the scene matching unit compares the pseudo voice instruction with the above available voice instructions one by one, to determine whether the voice content matches the task scene the device is in.
Compared with the prior art, the present invention has the following advantages:
In the voice interaction method and voice control system provided by the invention, the task scene the device is in serves as one condition for whether to execute, and in addition the sound source of the voice input must constitute a gain sound source. When both conditions are satisfied, it means the user has said a suitable voice instruction under that scene and is speaking to the device in that specific scene; the device should then process the instruction corresponding to what the user has just said. The device thus directly executes the content of the user's voice input, and the user no longer needs to say a wake-up word.
In this way, an intelligent and efficient voice interaction mode is realized, the situations of failing to wake up or waking up falsely are avoided, and the user's learning and usage costs are reduced; moreover, this voice interaction mode can be generalized uniformly, which is conducive to resource integration in the voice control industry.
Brief description of the drawings
Fig. 1 shows the flow chart of the voice interaction method in embodiment one;
Fig. 2 shows the flow chart of the voice interaction method in embodiment two;
Fig. 3 shows the flow chart of the voice interaction method in embodiment three;
Fig. 4 shows the system composition diagram of the voice control system in embodiment four;
Fig. 5 shows the system composition diagram of the voice control system in embodiment five;
Fig. 6 shows the system composition diagram of the voice control system in embodiment six.
Detailed description of the embodiments
The present invention provides a voice interaction method and a voice control system for a smart device, which are further described below with reference to the drawings and embodiments. It should be noted that the smart device of the present invention may be a terminal device such as a mobile phone, tablet computer, computer, or intelligent robot, but is not limited thereto.
Referring to Fig. 1, which shows the flow chart of the voice interaction method in embodiment one, the method includes the following steps:
Step S1: receive a voice input and recognize the voice content of the voice input;
Step S2: extract the acoustic characteristic parameters of the voice input, and determine from them whether this input voice constitutes a gain sound source; if it is determined that a gain sound source is constituted, perform step S3;
Step S3: directly execute the voice instruction corresponding to the voice content.
The gain sound source embodies the user's degree of attention to the device. When the user raises his attention to the device, his voice input to the device constitutes a gain sound source, which means the user wishes the device to listen to what he says and to process it; the device should then process the instruction corresponding to what the user has just said. The embodiment of a gain sound source representing the user's attention may be reflected in the angle-gain sound source and the volume-gain sound source described below.
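As an illustration only (the patent prescribes no code; the function names and the shape of the parameter library are assumptions), the one-stage flow of steps S1 to S3 can be sketched as:

```python
def is_gain_source(acoustic_params, effective_ranges):
    """Step S2: the input constitutes a gain sound source only if every
    extracted acoustic parameter falls inside its preset effective range."""
    return all(lo <= acoustic_params[name] <= hi
               for name, (lo, hi) in effective_ranges.items())

def handle_voice_input(voice_content, acoustic_params, effective_ranges, execute):
    """Steps S1-S3: the content has already been recognized (step S1);
    test the gain-source condition, then execute the instruction directly."""
    if is_gain_source(acoustic_params, effective_ranges):
        execute(voice_content)   # step S3: no wake-up word was needed
        return True
    return False                 # input is ignored: not addressed to the device
```

The effective ranges here (a volume window and an angle window) follow the volume-gain and angle-gain parameters introduced below.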
Next refer to Fig. 2, which shows the flow chart of the voice interaction method in embodiment two. Embodiment two differs from embodiment one in that:
Step A1 is also performed while step S1 is performed: collect the task scene the device is in;
After steps S1 and A1 are performed and before step S3 is performed, the following steps are also performed:
Step A2: determine whether the voice content matches the above task scene;
If the determination results of both step S2 and step A2 are affirmative, perform step S3.
By adding steps A1 and A2, the task scene the device is in is added as one of the criteria for whether to execute the voice command. When both the gain sound source determination and the scene matching satisfy their conditions, it means the user has said a suitable voice instruction to the device under a suitable task scene. Relative to embodiment one, this optimizes the system's processing and analysis flow and eliminates unnecessary redundant processing steps, so the system can run more efficiently.
Next refer to Fig. 3, which shows the flow chart of the voice interaction method in embodiment three. Embodiment three differs from embodiment two in that in embodiment two the gain sound source determination and the scene matching determination are carried out in parallel, whereas in this embodiment the scene matching determination is carried out first, and the gain sound source determination is carried out only when the result of the scene matching determination is a match. Making the scene matching determination precede the gain sound source determination in this way simplifies the system's determination flow and further optimizes it.
That is: step A2 is performed before step S2, and if the determination result of step A2 is affirmative, step S2 is performed.
In this embodiment, the scene matching determination is given priority over the gain sound source determination mainly because the actual matching rate of the scene matching determination is higher; in other embodiments, the order may be reversed so that the gain sound source determination precedes the scene matching determination.
The above embodiments supply an interaction mode that eliminates the wake-up step of the prior art and adopts a one-stage interaction. It is mainly based on the task scene the device is in as one condition for whether to execute; in addition, the sound source of the voice input must constitute a gain sound source. When both conditions are satisfied, it means the user is speaking to the device in that specific scene, so the device directly executes the content of the user's voice input and the user no longer needs to say a wake-up word.
Specifically, as for how step S2 determines whether the input voice constitutes a gain sound source representing a rise in the user's attention to the device, it includes the following steps:
Step S21: build a sound-source characteristic parameter library, which contains the preset effective ranges of the acoustic characteristic parameters that can constitute a gain sound source;
Step S22: extract the vocal segments from the voice input, and extract their acoustic characteristic parameters from them;
Step S23: compare whether the extracted acoustic characteristic parameters fall within the effective ranges of the above characteristic parameter library; if they do, determine that this input voice constitutes a gain sound source; otherwise, determine that it does not.
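A minimal sketch of steps S21 to S23 for the volume parameter alone, in which a plain frame-energy threshold stands in for vocal-segment extraction (a real system would use a proper voice activity detector; the threshold, frame size, and library layout are all illustrative assumptions):

```python
import numpy as np

def vocal_segments(signal, frame=160, threshold=0.02):
    """Step S22 (sketch): keep frames whose RMS energy exceeds a floor,
    treating them as voiced."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    return [f for f in frames if np.sqrt(np.mean(f ** 2)) > threshold]

def constitutes_gain_source(signal, param_library):
    """Steps S21/S23: compare the extracted volume against the preset
    effective range stored in the characteristic parameter library."""
    voiced = vocal_segments(signal)
    if not voiced:
        return False                         # nothing voice-like to measure
    rms = np.sqrt(np.mean(np.concatenate(voiced) ** 2))
    volume_db = 20 * np.log10(rms + 1e-12)   # volume as dB full scale
    lo, hi = param_library["volume_db"]
    return lo <= volume_db <= hi
```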
Preferably, even in a multi-speaker or noisy environment, when the user wishes to issue a voice instruction to the device, his attention to the device rises naturally, which is reflected in the volume at which he speaks to the device and the angle at which he faces it. In a concrete scheme, the gain sound source includes a volume-gain sound source and/or an angle-gain sound source. When the gain sound source is an angle-gain sound source, the corresponding acoustic characteristic parameter is the input angle of the sound source relative to the device's voice input apparatus; when the gain sound source is a volume-gain sound source, the corresponding acoustic characteristic parameter is the volume of the sound source. In this embodiment, the determination of a gain sound source preferably requires satisfying both the input angle and the volume, but in other embodiments only one of the two may be required.
Preferably, the voice input apparatus of the device is a microphone; the device is equipped with multiple microphones forming a microphone array. When the microphone array receives a voice input, the input angle of the input sound source relative to the device's microphone array is obtained through processes such as sampling, processing, and computing on the voice.
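One common way to obtain such an input angle from a two-microphone array (the patent does not specify the algorithm) is the time difference of arrival given by the cross-correlation peak of the two channels. The sketch below assumes a 16 kHz sample rate and 10 cm microphone spacing, neither of which the patent fixes:

```python
import numpy as np

FS = 16000            # sample rate in Hz (assumed)
MIC_SPACING = 0.1     # distance between the two microphones in metres (assumed)
SPEED_OF_SOUND = 343.0

def doa_angle(left, right):
    """Estimate the input angle of the sound source from the inter-channel
    delay at the cross-correlation peak. Returns degrees, with 0 degrees
    meaning the source is directly in front of the array (broadside)."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # delay of `left` in samples
    tau = lag / FS                             # delay in seconds
    sin_theta = np.clip(tau * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

The gain sound source determination unit would then compare this angle against the effective angle range in the characteristic parameter library.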
In addition, the method for collecting the task scene in step A1 specifically includes the following steps:
Step A11: assign a corresponding scene identifier to each task the device needs to process, and build a scene identifier library;
Step A12: when the device starts a certain task, output the scene identifier of that task;
Step A13: recognize the scene identifier.
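Steps A11 to A13 amount to maintaining a registry of scene identifiers; a minimal sketch, with all task and identifier names hypothetical:

```python
class SceneRegistry:
    """Sketch of the scene identifier library of steps A11-A13."""

    def __init__(self):
        self._scene_ids = {}           # task name -> scene identifier (step A11)

    def register_task(self, task, scene_id):
        self._scene_ids[task] = scene_id

    def on_task_start(self, task):
        """Step A12: the device outputs the scene identifier of the task."""
        return self._scene_ids[task]

    def recognize(self, scene_id):
        """Step A13: check that the identifier exists in the library."""
        return scene_id in self._scene_ids.values()
```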
As for how step A2 determines whether the voice matches the scene, this is realized by the following steps:
Step A21: build a voice instruction set, which is the set of available voice instructions under each task scene;
Step A22: convert the voice input into voice content in a device-readable form, and convert that voice content into a pseudo voice instruction in the same format as the above voice instructions;
Step A23: extract all the available voice instructions under the task scene recognized in step A1, and compare the pseudo voice instruction obtained in step A22 with the above available voice instructions one by one;
Step A24: if the pseudo voice instruction matches an available voice instruction under a certain task scene, end the comparison and determine that the voice content matches the task scene; otherwise, determine that they do not match.
For example, when the device is in the multimedia task scene and the user says "next", the voice control system determines that the user has said a suitable, correctly matched voice instruction under that task scene, and directly executes the "next" command.
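The "next" example can be traced through a sketch of steps A21 to A24, where the instruction sets, the scene identifiers, and the simple string normalization standing in for step A22 are all illustrative assumptions:

```python
# Step A21: the voice instruction set, keyed by task scene identifier
INSTRUCTION_SET = {
    "SCENE_MEDIA": {"play", "pause", "next", "previous"},
    "SCENE_CALL":  {"answer", "hang up"},
}

def matches_scene(voice_content, scene_id, instruction_set=INSTRUCTION_SET):
    """Steps A22-A24: normalize the recognized content into a pseudo
    instruction, then compare it one by one against the instructions
    available under the current task scene."""
    pseudo = voice_content.strip().lower()               # step A22 (simplified)
    for candidate in instruction_set.get(scene_id, ()):  # step A23
        if pseudo == candidate:                          # step A24: a hit ends the scan
            return True
    return False
```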
Referring now to Figs. 4 and 5, another aspect of the present invention further provides a voice control system. Fig. 4 shows the voice control system in embodiment four, which includes a voice input apparatus and a microprocessor.
The microprocessor has a built-in gain sound source determination unit, an instruction execution unit, and a content recognition unit; the content recognition unit is connected to the voice input apparatus to recognize the content of the voice input;
The gain sound source determination unit is connected to the voice input apparatus and can extract the acoustic characteristic parameters of the voice input, to determine whether the input voice constitutes a gain sound source;
The instruction execution unit is connected to the content recognition unit and to the gain sound source determination unit; when the determination result of the gain sound source determination unit is affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
With the above structure, a voice control system based on the voice interaction method of embodiment one is constructed, providing hardware support for it. By loading this system in a device, the user's voice interaction with the device becomes more intelligent and more efficient.
Preferably, in embodiment five shown in Fig. 5, the voice control system further includes a storage device, which stores a characteristic parameter library containing the preset effective ranges of the acoustic characteristic parameters that can constitute a gain sound source.
The gain sound source determination unit is connected to the characteristic parameter library, and compares whether the extracted acoustic characteristic parameters fall within the effective ranges of the above characteristic parameter library.
Specifically, the gain sound source includes a volume-gain sound source and/or an angle-gain sound source; when the gain sound source is an angle-gain sound source, the corresponding acoustic characteristic parameter is the input angle of the sound source relative to the device's voice input apparatus; when the gain sound source is a volume-gain sound source, the corresponding acoustic characteristic parameter is the volume of the sound source.
Preferably, the voice input apparatus is multiple microphones forming a microphone array. When the microphone array receives a voice input, the input angle of the input sound source relative to the device's microphone array is obtained through processes such as sampling, processing, and computing on the voice, and is output to the gain sound source determination unit.
Further, the gain sound source determination unit also includes a volume detection unit to detect the volume of the voice input.
Finally, refer to Fig. 6, which shows the voice control system in embodiment six; the system of embodiment six corresponds to the voice interaction method of embodiment two or embodiment three. Compared with embodiment five, in this embodiment the microprocessor also has a built-in scene matching unit, which is connected to the content recognition unit to determine whether the voice content matches the task scene the device is in.
In addition, the instruction execution unit is also connected to the scene matching unit; when the determination results of both the scene matching unit and the gain sound source determination unit are affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
In this embodiment, the storage device stores the scene identifier library, the voice instruction set, and the characteristic parameter library described in embodiment five. The scene identifier library contains the scene identifiers assigned to the tasks the device needs to process; the voice instruction set is the set of available voice instructions corresponding to each scene identifier.
In a concrete structure, the microprocessor also includes a task processing unit for processing each task of the device, which is connected to the scene identifier library of the storage device and to the scene matching unit. When the device starts a certain task, the task processing unit outputs the scene identifier of that task to the scene matching unit.
The scene matching unit is connected to the voice instruction set of the storage device; after the scene matching unit receives the scene identifier, it extracts all the available voice instructions under the corresponding task scene according to the scene identifier. In addition, the content recognition unit converts the voice input into a pseudo voice instruction in the same format as the above voice instructions and outputs it to the scene matching unit, which compares the pseudo voice instruction with the above available voice instructions one by one, to determine whether the voice content matches the task scene the device is in.
In this way, through the cooperation of the scene identifier library, the voice instruction set, the task processing unit, and the content recognition unit, the scene matching unit can determine whether the content of the voice input matches the task scene the device is in. Combined with the gain sound source determination unit, the instruction execution unit then chooses whether to execute the user's command according to the determination results of the scene matching unit and the gain sound source determination unit.
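Putting the units of embodiment six together, a behavioral sketch (the unit boundaries follow the description above, while the internals, thresholds, and identifiers are assumptions made for illustration):

```python
class VoiceControlSystem:
    """Sketch of embodiment six: the instruction execution unit fires only
    when BOTH the scene matching unit and the gain sound source
    determination unit return an affirmative result."""

    def __init__(self, instruction_set, volume_range):
        self.instruction_set = instruction_set   # scene id -> available instructions
        self.volume_range = volume_range         # effective volume range (dB)
        self.current_scene = None
        self.executed = []

    def on_task_start(self, scene_id):
        """The task processing unit outputs the scene identifier."""
        self.current_scene = scene_id

    def scene_matches(self, content):
        """Scene matching unit: compare against the current scene's set."""
        return content in self.instruction_set.get(self.current_scene, ())

    def is_gain_source(self, volume_db):
        """Gain sound source determination unit (volume parameter only)."""
        lo, hi = self.volume_range
        return lo <= volume_db <= hi

    def on_voice_input(self, content, volume_db):
        """Instruction execution unit: execute only on a double affirmative."""
        if self.scene_matches(content) and self.is_gain_source(volume_db):
            self.executed.append(content)
            return True
        return False
```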
The above are merely preferred embodiments of the present invention and do not thereby limit its scope of claims; every equivalent structural transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.

Claims (16)

1. A voice interaction method for a smart device, characterized by comprising the following steps:
Step S1: receive a voice input and recognize the voice content of the voice input;
Step S2: extract the acoustic characteristic parameters of the voice input, and determine from them whether this input voice constitutes a gain sound source; if it is determined that a gain sound source is constituted, perform step S3;
Step S3: directly execute the voice instruction corresponding to the voice content.
2. The voice interaction method for a smart device according to claim 1, characterized in that step A1 is also performed while step S1 is performed: collect the task scene the device is in;
After steps S1 and A1 are performed and before step S3 is performed, the following steps are also performed:
Step A2: determine whether the voice content matches the above task scene;
If the determination results of both step S2 and step A2 are affirmative, perform step S3.
3. The voice interaction method for a smart device according to claim 2, characterized in that step A2 is performed before step S2, and if the determination result of step A2 is affirmative, step S2 is performed.
4. The voice interaction method for a smart device according to claim 1, characterized in that step S2 includes the following steps:
Step S21: build a sound-source characteristic parameter library, which contains the preset effective ranges of the acoustic characteristic parameters that can constitute a gain sound source;
Step S22: extract the vocal segments from the voice input, and extract their acoustic characteristic parameters from them;
Step S23: compare whether the extracted acoustic characteristic parameters fall within the effective ranges of the above characteristic parameter library; if they do, determine that this input voice constitutes a gain sound source; otherwise, determine that it does not.
5. The voice interaction method for a smart device according to claim 4, characterized in that: the gain sound source includes a volume-gain sound source and/or an angle-gain sound source;
When the gain sound source is an angle-gain sound source, the corresponding acoustic feature parameter is the input angle of the sound source relative to the voice input device of the device;
When the gain sound source is a volume-gain sound source, the corresponding acoustic feature parameter is the volume of the sound source.
6. The voice interaction method for a smart device according to claim 5, characterized in that: the voice input device of the device is a microphone;
The device is equipped with multiple microphones forming a microphone array; when the microphone array receives a voice input, the voice is sampled, processed and computed to obtain the input angle of the input sound source relative to the microphone array of the device.
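The claim does not specify how the array computes the angle; a common approach (assumed here, not stated in the patent) is far-field direction-of-arrival estimation from the inter-microphone time delay, where for a two-microphone array the angle from broadside is arcsin(c·τ/d):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def arrival_angle(delay_s, mic_spacing_m):
    """Far-field direction of arrival for a two-microphone array.
    delay_s: time difference of arrival between the two microphones.
    mic_spacing_m: distance between the microphones.
    Returns the angle in degrees from broadside (0 = straight ahead)."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp numerical overshoot before asin
    return math.degrees(math.asin(ratio))
```

A zero delay means the source is dead ahead; a delay equal to spacing/c puts it at 90 degrees, fully to one side. The resulting angle is exactly the parameter claim 5 feeds to the angle-gain check.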
7. The voice interaction method for a smart device according to claim 2, characterized in that the task scenes in step A1 correspond to the tasks the device needs to process;
Step A1 comprises the following steps:
Step A11: assigning a corresponding scene identifier to each task the device needs to process, and building a scene identifier library;
Step A12: when the device starts a task, outputting the scene identifier corresponding to that task;
Step A13: recognizing the scene identifier.
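Steps A11–A13 describe a simple bidirectional mapping between tasks and scene identifiers. A sketch under illustrative assumptions (the task names and identifier strings are invented for the example):

```python
# Step A11: assign a scene identifier to each task the device can process,
# forming the scene identifier library. Entries here are illustrative.
SCENE_LIBRARY = {
    "music_playback": "SCENE_01",
    "navigation":     "SCENE_02",
    "phone_call":     "SCENE_03",
}
# Reverse index used in step A13 to recognize an emitted identifier.
SCENE_TO_TASK = {sid: task for task, sid in SCENE_LIBRARY.items()}

def start_task(task):
    """Step A12: when the device starts a task, emit its scene identifier."""
    return SCENE_LIBRARY[task]

def identify_scene(scene_id):
    """Step A13: map a received identifier back to its task scene."""
    return SCENE_TO_TASK[scene_id]
```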
8. The voice interaction method for a smart device according to claim 2, characterized in that step A2 comprises the following steps:
Step A21: building a voice instruction set, which is the set of available voice instructions under each task scene;
Step A22: converting the voice input into voice content in a device-readable form, and converting that readable voice content into a pseudo voice instruction in the same format as the above voice instructions;
Step A23: extracting all available voice instructions under the task scene identified in step A1, and comparing the pseudo voice instruction obtained in step A22 against the available voice instructions one by one;
Step A24: if the pseudo voice instruction matches an available voice instruction under the task scene, ending the comparison and determining that the voice content matches the task scene; otherwise, determining a mismatch.
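Steps A21–A24 can be sketched as a per-scene instruction lookup. The scene identifiers, instruction strings, and the normalization rule are all illustrative assumptions; the patent only requires that the pseudo instruction share the stored instructions' format:

```python
# Step A21: instruction set keyed by scene identifier; each value is the set
# of available voice instructions under that task scene (entries invented).
INSTRUCTION_SET = {
    "SCENE_01": {"play", "pause", "next track"},
    "SCENE_02": {"zoom in", "zoom out", "reroute"},
}

def to_pseudo_instruction(voice_content):
    """Step A22: convert recognized content into a pseudo voice instruction
    in the same format as the stored instructions (here: lowercased, trimmed)."""
    return voice_content.strip().lower()

def matches_scene(voice_content, scene_id):
    """Steps A23-A24: compare the pseudo instruction one by one against the
    available instructions of the identified scene; True means a match."""
    pseudo = to_pseudo_instruction(voice_content)
    return any(pseudo == instr for instr in INSTRUCTION_SET.get(scene_id, ()))
```

So "Pause" spoken while the device plays music (SCENE_01) matches, while the same utterance during navigation (SCENE_02) is rejected as out of scene.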
9. A voice control system for a smart device, characterized in that it comprises: a voice input device and a microprocessor;
The microprocessor has a built-in gain-sound-source determination unit, an instruction execution unit and a content recognition unit; the content recognition unit is connected to the voice input device to recognize the content of the voice input;
The gain-sound-source determination unit is connected to the voice input device and can extract the acoustic feature parameters of the voice input, to determine whether the input voice constitutes a gain sound source;
The instruction execution unit is connected to the content recognition unit and the gain-sound-source determination unit respectively; when the determination result of the gain-sound-source determination unit is affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
10. The voice control system for a smart device according to claim 9, characterized in that: it further comprises a storage device, the storage device storing a feature parameter library, the feature parameter library containing preset valid ranges of the acoustic feature parameters that can constitute a gain sound source;
The gain-sound-source determination unit is connected to the feature parameter library, and compares whether the extracted acoustic feature parameters fall within the valid ranges of the feature parameter library.
11. The voice control system for a smart device according to claim 10, characterized in that: the gain sound source includes a volume-gain sound source and/or an angle-gain sound source; when the gain sound source is an angle-gain sound source, the corresponding acoustic feature parameter is the input angle of the sound source relative to the voice input device of the device; when the gain sound source is a volume-gain sound source, the corresponding acoustic feature parameter is the volume of the sound source.
12. The voice control system for a smart device according to claim 11, characterized in that: the voice input device comprises multiple microphones forming a microphone array;
When the microphone array receives a voice input, the voice is sampled, processed and computed to obtain the input angle of the input sound source relative to the microphone array of the device, and the angle is output to the gain-sound-source determination unit;
The gain-sound-source determination unit further comprises a volume detection unit for detecting the volume of the voice input.
13. The voice control system for a smart device according to claim 9, characterized in that:
The microprocessor also has a built-in scene matching unit, the scene matching unit being connected to the content recognition unit to determine whether the voice content matches the task scene in which the device is operating;
The instruction execution unit is also connected to the scene matching unit; when the determination results of both the scene matching unit and the gain-sound-source determination unit are affirmative, the instruction execution unit executes the voice instruction corresponding to the voice content.
14. The voice control system for a smart device according to claim 13, characterized in that: it further comprises a storage device, the storage device storing a scene identifier library and a voice instruction set;
The scene identifier library contains the scene identifiers assigned to the tasks the device needs to process; the voice instruction set is the set of available voice instructions corresponding to each scene identifier.
15. The voice control system for a smart device according to claim 14, characterized in that: the microprocessor further comprises a task processing unit for processing each task of the device, connected to the scene identifier library of the storage device and to the scene matching unit; when the device starts a task, the task processing unit outputs the scene identifier of the corresponding task to the scene matching unit.
16. The voice control system for a smart device according to claim 15, characterized in that: the scene matching unit is connected to the voice instruction set of the storage device; after the scene matching unit receives a scene identifier, it extracts all available voice instructions under the corresponding task scene according to that scene identifier;
The content recognition unit converts the voice input into a pseudo voice instruction in the same format as the above voice instructions and outputs it to the scene matching unit; the scene matching unit compares the pseudo voice instruction against the available voice instructions one by one, to determine whether the voice content matches the task scene in which the device is operating.
CN201711407315.8A 2017-12-22 2017-12-22 A kind of voice interactive method and speech control system for smart machine Pending CN107910003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711407315.8A CN107910003A (en) 2017-12-22 2017-12-22 A kind of voice interactive method and speech control system for smart machine


Publications (1)

Publication Number Publication Date
CN107910003A true CN107910003A (en) 2018-04-13

Family

ID=61870713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711407315.8A Pending CN107910003A (en) 2017-12-22 2017-12-22 A kind of voice interactive method and speech control system for smart machine

Country Status (1)

Country Link
CN (1) CN107910003A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1372071B1 (en) * 2002-04-08 2008-01-23 Matsushita Electric Industrial Co., Ltd. Management of software components in an image processing system
US20120253803A1 (en) * 2011-03-30 2012-10-04 Motonobu Sugiura Voice recognition device and voice recognition method
CN104967726A (en) * 2015-04-30 2015-10-07 努比亚技术有限公司 Voice instruction processing method, voice instruction processing device and mobile terminal
CN105094807A (en) * 2015-06-25 2015-11-25 三星电子(中国)研发中心 Method and device for implementing voice control
CN106157955A (en) * 2015-03-30 2016-11-23 阿里巴巴集团控股有限公司 A kind of sound control method and device
CN106254612A (en) * 2015-06-15 2016-12-21 中兴通讯股份有限公司 A kind of sound control method and device
CN107146622A (en) * 2017-06-16 2017-09-08 合肥美的智能科技有限公司 Refrigerator, voice interactive system, method, computer equipment, readable storage medium storing program for executing
CN107316641A (en) * 2017-06-30 2017-11-03 联想(北京)有限公司 A kind of sound control method and electronic equipment


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN108877791B (en) * 2018-05-23 2021-10-08 百度在线网络技术(北京)有限公司 Voice interaction method, device, server, terminal and medium based on view
US11727927B2 (en) 2018-05-23 2023-08-15 Baidu Online Network Technology (Beijing) Co., Ltd. View-based voice interaction method, apparatus, server, terminal and medium
CN108831455A (en) * 2018-05-25 2018-11-16 四川斐讯全智信息技术有限公司 A kind of method and system of intelligent sound box streaming interaction
CN109637531A (en) * 2018-12-06 2019-04-16 珠海格力电器股份有限公司 A kind of sound control method, device, storage medium and air-conditioning
CN109637531B (en) * 2018-12-06 2020-09-15 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN110706707A (en) * 2019-11-13 2020-01-17 百度在线网络技术(北京)有限公司 Method, apparatus, device and computer-readable storage medium for voice interaction
US11393490B2 (en) 2019-11-13 2022-07-19 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, device and computer-readable storage medium for voice interaction
CN112787899A (en) * 2021-01-08 2021-05-11 青岛海尔特种电冰箱有限公司 Equipment voice interaction method, computer readable storage medium and refrigerator
CN112787899B (en) * 2021-01-08 2022-10-28 青岛海尔特种电冰箱有限公司 Equipment voice interaction method, computer readable storage medium and refrigerator
CN112882394A (en) * 2021-01-12 2021-06-01 北京小米松果电子有限公司 Device control method, control apparatus, and readable storage medium

Similar Documents

Publication Publication Date Title
CN107910003A (en) A kind of voice interactive method and speech control system for smart machine
WO2021093449A1 (en) Wakeup word detection method and apparatus employing artificial intelligence, device, and medium
US9117449B2 (en) Embedded system for construction of small footprint speech recognition with user-definable constraints
US9741343B1 (en) Voice interaction application selection
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
US20160019886A1 (en) Method and apparatus for recognizing whisper
WO2017012511A1 (en) Voice control method and device, and projector apparatus
US20140379334A1 (en) Natural language understanding automatic speech recognition post processing
JP2016502829A (en) Terminal voice control method, apparatus, terminal, and program
US20130289996A1 (en) Multipass asr controlling multiple applications
CN101923857A (en) Extensible audio recognition method based on man-machine interaction
EP3608906A1 (en) System for processing user voice utterance and method for operating same
KR102563817B1 (en) Method for processing user voice input and electronic device supporting the same
US20170110131A1 (en) Terminal control method and device, voice control device and terminal
CN110223687B (en) Instruction execution method and device, storage medium and electronic equipment
CN110706707B (en) Method, apparatus, device and computer-readable storage medium for voice interaction
CN109712623A (en) Sound control method, device and computer readable storage medium
US11437022B2 (en) Performing speaker change detection and speaker recognition on a trigger phrase
CN109859752A (en) A kind of sound control method, device, storage medium and voice joint control system
CN109979446A (en) Sound control method, storage medium and device
US11620996B2 (en) Electronic apparatus, and method of controlling to execute function according to voice command thereof
CN113421573B (en) Identity recognition model training method, identity recognition method and device
CN106584486A (en) Voice recognition based industrial robot control system and method
CN114999496A (en) Audio transmission method, control equipment and terminal equipment
CN101299333A (en) Built-in speech recognition system and inner core technique thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180413