CN113160810A - LD 3320-based voice recognition interaction method and system - Google Patents
- Publication number
- CN113160810A (application CN202110042343.4A)
- Authority
- CN
- China
- Prior art keywords
- voice
- keywords
- matching
- voice recognition
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G — Physics
- G10 — Musical instruments; acoustics
- G10L — Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
- G10L15/00 — Speech recognition
- G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
- G10L15/08 — Speech classification or search
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/088 — Word spotting
- G10L2015/223 — Execution procedure of a spoken command
Abstract
One or more embodiments of the present specification provide an LD3320-based voice recognition interaction method and system. Received voice is analyzed, voice features are extracted and matched against preset primary instruction keywords; after a successful match, the secondary instruction keywords associated with that primary instruction are set as candidate keywords. Voice received within a set time is then analyzed again, its features are extracted and matched against the candidate keywords; after a successful match, the voice recognition content is output and feedback is broadcast according to that content, completing the voice recognition interaction. The method reduces the computation and energy consumption of keyword matching, can recognize the voice of unspecified speakers without recording or training, has high universality, reduces interference from noise and natural sounds, and achieves a speech recognition effect with high accuracy and practicality.
Description
Technical Field
One or more embodiments of the present disclosure relate to the technical field of single-chip microcomputers, and in particular, to a voice recognition interaction method and system based on LD 3320.
Background
Language is the most direct, natural, effective and convenient mode of communication in people's lives, so researching and designing voice control devices brings great convenience to production and daily life.
In the prior art, voice recognition interaction suffers reduced recognition efficiency because noise, natural sounds and the voices of surrounding people are unavoidable in real life; this makes it difficult to recognize accurate and correct content during use, so voice recognition accuracy is low.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure are directed to a method and a system for voice recognition interaction based on LD3320, so as to solve the problem of low accuracy of voice recognition.
In view of the above, one or more embodiments of the present specification provide a LD 3320-based voice recognition interaction method, including:
analyzing the received voice, and extracting voice characteristics;
matching preset primary instruction keywords, and if the matching is successful, taking secondary instruction keywords associated with the primary instruction as candidate keywords;
analyzing the voice received within the set time after the matching is successful, and extracting the voice characteristics;
matching the candidate keywords, and if the matching is successful, outputting the successfully matched keywords as voice recognition content;
and selecting preset feedback contents matched with the voice recognition contents for broadcasting according to the voice recognition contents.
Preferably, the method further comprises: and after the primary instruction keywords are successfully matched, broadcasting voice prompt feedback.
Preferably, a plurality of primary instruction keywords are provided, and each primary instruction keyword is associated with a plurality of secondary instruction keywords.
Preferably, when the primary instruction keywords and the secondary instruction keywords are preset, a pinyin table is set that satisfies the following conditions: each pinyin string corresponds to a single code character, and no pinyin string may appear more than once.
Preferably, when the matching with the preset primary instruction keyword is successful, if the preset control instruction keyword is matched within the set time, executing the corresponding operation according to the control instruction keyword.
Preferably, the primary instruction keywords and the secondary instruction keywords are modifiable by the user.
An LD 3320-based voice recognition interaction system, comprising:
the voice acquisition and recognition module is used for analyzing the received voice, extracting voice characteristics, matching preset primary instruction keywords, taking secondary instruction keywords associated with the primary instruction as candidate keywords if the matching is successful, analyzing the voice received within set time after the matching is successful, extracting the voice characteristics, matching the candidate keywords, and outputting the successfully matched keywords as voice recognition contents if the matching is successful;
the voice synthesis module is used for selecting preset feedback contents matched with the voice recognition contents according to the voice recognition contents and broadcasting the feedback contents;
and the relay bottom plate is used for connecting the voice acquisition and recognition module with the voice synthesis module to realize the communication between the voice acquisition and recognition module and the voice synthesis module.
Preferably, the voice collecting and recognizing module comprises an LD3320 voice recognizing chip.
Preferably, the voice synthesis module comprises an MR628 voice synthesis chip, and a USB-to-TTL chip is further disposed on the relay backplane.
Preferably, the relay backplane further comprises an indicator light group, and the indicator light group prompts the user when the voice acquisition and recognition module successfully matches a secondary instruction.
As can be seen from the above, the LD3320-based voice recognition interaction method and system provided in one or more embodiments of the present disclosure analyze received voice, extract voice features and match them against the primary instruction keywords, set the secondary instruction keywords associated with the matched primary instruction as candidate keywords, analyze the voice received within a set time, extract its features and match them against the candidate keywords, output the voice recognition content after a successful match, and broadcast feedback according to that content to complete the voice recognition interaction. This reduces the computation workload and energy consumption of keyword matching, recognizes the voice of unspecified speakers without recording or training, has high versatility, reduces interference from noise and natural sounds, and achieves an accurate and practical speech recognition effect.
Drawings
To illustrate one or more embodiments of the present specification or the prior-art solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description illustrate only one or more embodiments of the present specification, and that other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flow diagram illustrating a LD 3320-based voice recognition interaction method according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of an LD 3320-based speech recognition interaction system in accordance with one or more embodiments of the present description;
FIG. 3 is a diagram of an LD3320 speech recognition chip according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of a relay backplane according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a CH340C chip in a relay backplane of one or more embodiments of the present disclosure;
fig. 6 is a schematic diagram of a relay element in a relay backplane of one or more embodiments of the present disclosure;
FIG. 7 is a schematic diagram of an LD3320 speech recognition chip interface and an MR628 speech synthesis chip interface according to one or more embodiments of the present description;
FIG. 8 is a USB square socket diagram in accordance with one or more embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail below with reference to specific embodiments.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
One or more embodiments of the present specification provide a LD 3320-based voice recognition interaction method, as shown in fig. 1, including the following steps:
s101, analyzing the received voice and extracting voice characteristics;
s102, matching preset primary instruction keywords, and if the matching is successful, taking secondary instruction keywords associated with the primary instruction as candidate keywords;
s103, analyzing the voice received within the set time after the matching is successful, and extracting voice features;
s104, matching the candidate keywords, and if the matching is successful, outputting the successfully matched keywords as voice recognition contents;
and S105, according to the voice recognition content, selecting preset feedback content matched with the voice recognition content for broadcasting.
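The two-stage flow of steps S101 to S105 can be sketched as a small state machine. The following is a simulation sketch, not the chip-level implementation: the keyword sets and feedback texts are illustrative, recognized text strings stand in for the extracted voice features, and the set-time window is simplified to a single following utterance.

```python
# Simulation sketch of the two-stage keyword matching in S101-S105.
# Keyword sets and feedback texts below are illustrative examples only.

PRIMARY = {"garbage sorting"}            # primary instruction keywords
SECONDARY = {                            # secondary keywords per primary instruction
    "garbage sorting": {"peel": "kitchen waste", "tissue": "other waste"},
}

def interact(utterances):
    """Run the two-stage flow over a list of recognized utterances."""
    broadcasts = []
    candidates = None                    # secondary candidates, active after stage 1
    for text in utterances:              # stand-in for 'extract voice features'
        if candidates is None:
            if text in PRIMARY:          # S102: match a primary keyword
                candidates = SECONDARY[text]
                broadcasts.append("hello")            # prompt feedback
        else:
            if text in candidates:       # S104: match a candidate keyword
                broadcasts.append(candidates[text])   # S105: broadcast feedback
            candidates = None            # back to waiting for a primary keyword
    return broadcasts

print(interact(["noise", "garbage sorting", "peel"]))
# -> ['hello', 'kitchen waste']
```

Note that unmatched input in the first stage (such as ambient noise) is simply ignored, which is the mechanism by which the primary instruction filters out interference.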
According to the LD3320-based voice recognition interaction method provided by this specification, received voice is analyzed, voice features are extracted and matched against the primary instruction keywords, the secondary instruction keywords associated with the matched primary instruction are set as candidate keywords, the voice received within a set time is analyzed again, its features are extracted and matched against the candidate keywords, the voice recognition content is output after a successful match, and feedback is broadcast according to that content, completing the voice recognition interaction.
As an embodiment, the method further comprises: after a primary instruction keyword is successfully matched, broadcasting a voice prompt as feedback, so that the user is informed that recognition of secondary instruction keywords has begun and a secondary instruction keyword can now be spoken.
As an implementation, a plurality of primary instruction keywords are provided, and each primary instruction keyword is associated with a plurality of secondary instruction keywords. For example, with 4 primary instruction keywords each associated with 10 secondary instruction keywords, after one primary instruction keyword is successfully matched, the extracted voice features only need to be matched against the 10 associated secondary keywords, which improves the accuracy of voice recognition.
As an implementation, when the primary and secondary instruction keywords are preset, a pinyin table is set that satisfies the following conditions: each pinyin string corresponds to a single code character, and while one code number may have several corresponding pinyin strings, no pinyin string may be repeated. For example, code number 1 may have two corresponding strings, but a pinyin string already registered under one number may not appear again.
As an implementation, after matching with a preset primary instruction keyword succeeds, if a preset control instruction keyword is matched within the set time, the operation corresponding to that control instruction keyword is executed. For example, the control instruction keywords may include "return"; if the extracted voice features successfully match "return", the return operation is executed and matching of the primary instruction keywords is performed again.
As an implementation, the primary instruction keywords, the secondary instruction keywords and the control instruction keywords can be modified by the user: a recognition keyword only needs to be passed to the system as a character string to take effect in the next recognition pass. For example, in programming, the content of a recognition keyword such as "hello" is dynamically passed to the system simply by setting a chip register, after which the system can recognize the newly set keyword; editing the keyword list is therefore very convenient.
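The dynamically editable keyword list described above can be sketched as follows. This is a behavioral simulation only: the `Recognizer` class and its methods are hypothetical names, and the register-level download protocol of the chip is deliberately omitted.

```python
# Sketch of dynamically replacing the recognition keyword list at run time.
# Class and method names are illustrative, not from the patent or datasheet.

class Recognizer:
    def __init__(self):
        self.keywords = {}               # pinyin string -> code character

    def set_keywords(self, items):
        """Replace the list; takes effect for the next recognition pass."""
        self.keywords = {pinyin: code for code, pinyin in items}

    def recognize(self, pinyin):
        """Return the code of a matched keyword, or None."""
        return self.keywords.get(pinyin)

r = Recognizer()
r.set_keywords([(1, "ni hao")])
print(r.recognize("ni hao"))        # -> 1
r.set_keywords([(2, "la ji fen lei")])
print(r.recognize("ni hao"))        # -> None (old list was replaced)
```

The point illustrated is that the whole candidate list is swapped atomically, so a new keyword set is effective from the very next recognition cycle.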
The present specification further provides a LD 3320-based voice recognition interactive system, as shown in fig. 2, including:
the voice acquisition and recognition module is used for analyzing the received voice, extracting voice characteristics, matching preset primary instruction keywords, taking secondary instruction keywords associated with the primary instruction as candidate keywords if the matching is successful, analyzing the voice received within set time after the matching is successful, extracting the voice characteristics, matching the candidate keywords, and outputting the successfully matched keywords as voice recognition contents if the matching is successful;
the voice synthesis module is used for selecting preset feedback contents matched with the voice recognition contents according to the voice recognition contents and broadcasting the feedback contents;
and the relay bottom plate is used for connecting the voice acquisition and recognition module with the voice synthesis module to realize the communication between the voice acquisition and recognition module and the voice synthesis module.
For example, the voice acquisition and recognition module includes an LD3320 voice recognition chip, as shown in fig. 3, and is implemented using the "keyword list" recognition technology of the LD3320. The LD3320 integrates a speech recognition processor and the external circuitry, including a microphone interface and a sound output interface. The chip can recognize at most 50 candidate phrases at a time; in practical use, the user only needs to store the recognition keywords in the chip as character strings for recognition to take effect immediately.
The voice synthesis module comprises an MR628 voice synthesis chip, and voice synthesis is realized using the MR628's automatic text-to-speech conversion. The MR628 receives instructions over a serial port to perform text-to-speech conversion, and supports Chinese, English (read letter by letter) and numeric reading. Up to 250 bytes of text can be synthesized at a time, and text analysis and speech playback proceed simultaneously, enabling continuous, gapless speech synthesis. A built-in audio power amplifier can directly drive a 0.5 W/8 Ω or 3 W/4 Ω loudspeaker; the LD3320 and MR628 interfaces are shown in fig. 7.
The relay backplane is shown in fig. 4 and includes a USB square port (fig. 8), a speech recognition module interface, a speech synthesis module interface, a USB-to-TTL chip (CH340C, fig. 5), and a four-way relay (fig. 6).
The USB square port supplies power and is used for burning programs when connected to a computer; the voice recognition module interface connects the LD3320 voice recognition chip, communicating with the main board and returning recognized voice data; the voice synthesis module interface connects the MR628 voice synthesis chip and transmits information, facilitating voice interaction.
Because the computer's USB port uses USB signal levels while the single-chip microcomputer receives TTL levels, the two cannot communicate directly; signals must be converted before they can be exchanged. The CH340C module performs this USB-to-TTL level conversion.
A relay bridges the control circuit and the controlled circuit: it is an "automatic switch" that uses a small current to control a larger one, providing automatic adjustment, safety protection and circuit switching. The relay backplane integrates the modules and facilitates communication and connection among them.
When wiring the system, the relay backplane integrates the interfaces of all modules and is the main circuit board of the whole system. A capacitor is connected on the backplane to improve the stability of the system's power supply. The three male (pin) ports on the left side of the backplane connect the MR628 module; from top to bottom they are GND, TXD and VCC, connected respectively to the MR628 voice synthesis chip's black wire (ground), yellow wire (to the single-chip microcomputer) and red wire (to the 5 V supply). The two rows of female sockets on the right side of the backplane connect the LD3320 voice recognition chip; the bottom of the LD3320 module is simply inserted into the backplane's female sockets. With this, the circuit connection is complete.
The software design flow of the system is as follows:
(1) The system can be configured with or without a primary instruction. In a system without a primary instruction, every module runs continuously, waiting to recognize a set phrase; in real life, unavoidable noise, natural sounds and the voices of surrounding people reduce recognition efficiency and prevent accurate content from being recognized, so the system is improved by adding a primary instruction. Voice recognition with a primary instruction requires a preset start instruction: only when the voice recognition module recognizes the preset primary instruction is the subsequent recognition program started. Taking garbage sorting as an example, when the primary instruction "garbage sorting" is spoken into the microphone and successfully recognized, a "hello" voice feedback is given; the type of garbage can then be asked and answered, while the system enters a cyclic waiting state for the next primary instruction or a close command. The secondary instruction keywords associated with "garbage sorting" include "leftovers", "peels", "tissues", and so on, with the corresponding feedback voices "kitchen waste", "other waste", and so on.
(2) Recognizing phrase writing
When writing the recognition phrases into the LD3320 voice recognition chip, the pinyin table must be set reasonably. The LD3320 requires the pinyin table to satisfy: each pinyin string corresponds to a single code character; one code number may have several corresponding pinyin strings, but no pinyin string may be repeated. For example, code number 1 may have two corresponding strings, but a pinyin string already registered may not appear again. The maximum pinyin string length supported by the LD3320 chip is 50 characters.
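The table constraints above can be checked before the list is written to the chip. This is a sketch under one reading of the rule (one code number may map to several pinyin strings, but no pinyin string may appear twice, and the chip accepts at most 50 candidates as stated earlier); the function name is illustrative.

```python
# Sketch: validate a pinyin table before writing it to the recognition chip.
# Rule reading assumed: codes may repeat across items, pinyin strings may not.

MAX_ITEMS = 50  # per the 50-candidate limit stated in the description

def validate_pinyin_table(items):
    """items: list of (code, pinyin_string). Raise ValueError on violations."""
    if len(items) > MAX_ITEMS:
        raise ValueError(f"at most {MAX_ITEMS} candidates are supported")
    seen = set()
    for code, pinyin in items:
        if pinyin in seen:
            raise ValueError(f"duplicate pinyin string: {pinyin!r}")
        seen.add(pinyin)
    return True

# Code 1 legitimately has two strings; duplicating a string is rejected.
print(validate_pinyin_table([(1, "la ji"), (1, "la ji fen lei")]))  # -> True
try:
    validate_pinyin_table([(1, "la ji"), (2, "la ji")])
except ValueError as e:
    print(e)  # duplicate pinyin string: 'la ji'
```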
(3) Voice feedback program writing
The MR628 voice synthesis chip can be driven by any single-chip microcomputer with a serial port. The MR628 takes <G> as a frame header, followed by the content to be synthesized, to realize voice broadcasting. Appropriate feedback content is selected and broadcast according to the content recognized by the voice recognition module, achieving voice interaction.
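Composing such a broadcast command can be sketched as building the serial frame below. Only the "<G>" frame header and the 250-byte synthesis limit are taken from the description above; the GB2312 text encoding and the exact byte layout are assumptions, so the MR628 datasheet should be consulted for the real format.

```python
# Sketch: build an MR628 text-to-speech command for serial transmission.
# Frame header "<G>" is from the description; the encoding is an assumption.

MAX_PAYLOAD = 250  # bytes of text per synthesis, per the description

def build_tts_frame(text: str) -> bytes:
    payload = text.encode("gb2312")   # assumed encoding for Chinese text
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("text exceeds the 250-byte synthesis limit")
    return b"<G>" + payload

frame = build_tts_frame("kitchen waste")
print(frame)  # -> b'<G>kitchen waste'
```

In the real system this frame would be written to the UART connected to the MR628 (for example via the CH340C bridge during development).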
As an implementation, the relay backplane further includes an indicator light group that prompts the user when the voice acquisition and recognition module successfully matches a secondary instruction. A personalized light-up routine can be written in during programming, adding functionality while protecting the circuit; for example, after a certain instruction is recognized, custom lighting operations can be defined.
The system has the following characteristics:
(1) non-specific person voice recognition technology: the recording training is not needed;
(2) Dynamically editable recognition keyword list: a recognition keyword only needs to be passed into the system as a character string to take effect in the next recognition pass. For example, in programming, the content of a recognition keyword such as "hello" is dynamically passed to the system simply by setting a chip register, after which the system can recognize the newly set keyword; keyword list editing is very convenient;
(3) true single chip solution: no external auxiliary Flash and RAM are needed;
(4) high accuracy and practical speech recognition effect, and low cost.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the present disclosure, features from the above embodiments or from different embodiments may be combined, steps may be implemented in any order, and many other variations of different aspects of one or more embodiments of the present description exist, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (10)
1. An LD 3320-based voice recognition interaction method is characterized by comprising the following steps:
analyzing the received voice, and extracting voice characteristics;
matching preset primary instruction keywords, and if the matching is successful, taking secondary instruction keywords associated with the primary instruction as candidate keywords;
analyzing the voice received within the set time after the matching is successful, and extracting the voice characteristics;
matching the candidate keywords, and if the matching is successful, outputting the successfully matched keywords as voice recognition content;
and selecting preset feedback contents matched with the voice recognition contents for broadcasting according to the voice recognition contents.
2. The LD3320 based voice recognition interaction method of claim 1, wherein the method further comprises: and after the primary instruction keywords are successfully matched, broadcasting voice prompt feedback.
3. The LD3320 based speech recognition interaction method of claim 1, wherein the primary command keyword is provided in plurality, and each of the primary command keywords is associated with a plurality of secondary command keywords.
4. The LD3320-based voice recognition interaction method of claim 1, wherein, when the primary instruction keywords and the secondary instruction keywords are preset, a pinyin table is set that satisfies the following conditions: each pinyin string corresponds to a unique code character, and no pinyin string may be repeated.
5. The LD 3320-based voice recognition interaction method of claim 1, wherein when matching with a preset primary command keyword is successful, if a preset control command keyword is matched within a set time, then executing the corresponding operation according to the control command keyword.
6. The LD3320 based voice recognition interaction method of claim 1, wherein the primary instruction keywords and the secondary instruction keywords are modifiable by a user.
7. An LD 3320-based voice recognition interactive system, comprising:
the voice acquisition and recognition module is used for analyzing the received voice, extracting voice characteristics, matching preset primary instruction keywords, taking secondary instruction keywords associated with the primary instruction as candidate keywords if the matching is successful, analyzing the voice received within set time after the matching is successful, extracting the voice characteristics, matching the candidate keywords, and outputting the successfully matched keywords as voice recognition contents if the matching is successful;
the voice synthesis module is used for selecting preset feedback contents matched with the voice recognition contents according to the voice recognition contents and broadcasting the feedback contents;
and the relay bottom plate is used for connecting the voice acquisition and recognition module with the voice synthesis module to realize the communication between the voice acquisition and recognition module and the voice synthesis module.
8. The LD3320-based voice recognition interaction system of claim 7, wherein the voice acquisition and recognition module comprises an LD3320 voice recognition chip.
9. The LD3320-based voice recognition interaction system of claim 7, wherein the voice synthesis module comprises an MR628 voice synthesis chip, and a USB-to-TTL chip is further disposed on the relay backplane.
10. The LD3320-based voice recognition interaction system of claim 7, wherein the relay backplane further comprises an indicator light set for providing a prompt when the voice acquisition and recognition module successfully matches a secondary instruction keyword.
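The two-stage flow in the method claims (match a primary instruction keyword, then accept only its associated secondary instruction keywords within a set time window) can be sketched as a small host-side state machine. This is an illustrative sketch, not code from the patent or the LD3320 SDK; the class name, the sample pinyin keywords, and the 5-second window are assumptions.

```python
import time


class TwoStageMatcher:
    """Minimal sketch of the primary/secondary keyword flow.

    After a primary keyword is matched, only its associated secondary
    keywords are candidates, and only until `window_s` seconds elapse.
    """

    def __init__(self, keyword_map, window_s=5.0, clock=time.monotonic):
        self.keyword_map = keyword_map  # primary -> list of secondary keywords
        self.window_s = window_s
        self.clock = clock              # injectable clock, for testing
        self._armed = None              # currently matched primary keyword
        self._deadline = 0.0

    def feed(self, recognized):
        """Feed one recognized keyword; return an action string or None."""
        now = self.clock()
        if self._armed is not None and now > self._deadline:
            self._armed = None          # window expired, disarm
        if self._armed is None:
            if recognized in self.keyword_map:
                self._armed = recognized
                self._deadline = now + self.window_s
                return "armed"          # now waiting for a secondary keyword
            return None                 # secondary keywords ignored when not armed
        if recognized in self.keyword_map[self._armed]:
            self._armed = None
            return f"execute:{recognized}"  # matched secondary -> execute it
        return None
```

A fake clock makes the timeout behavior easy to exercise: arming with a primary keyword, executing a secondary one inside the window, and silently dropping one after the window has expired.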
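The keyword-list constraint of claim 4 (every pinyin string maps to a unique coded character, and no pinyin string is registered twice) can be checked mechanically before the list is loaded into the recognizer. A minimal sketch, assuming keywords are held as (pinyin, code) pairs; the function names and sample pinyin are hypothetical.

```python
def build_pinyin_table(entries):
    """Build a pinyin -> code map, rejecting repeated pinyin strings
    (claim 4: no duplicate pinyin strings may appear)."""
    table = {}
    for pinyin, code in entries:
        if pinyin in table:
            raise ValueError(f"repeated pinyin string: {pinyin!r}")
        table[pinyin] = code
    return table


def codes_are_unique(table):
    """Return True if every pinyin string carries a unique coded character
    (claim 4: each pinyin corresponds to a unique code)."""
    codes = list(table.values())
    return len(codes) == len(set(codes))
```

With such checks in place, a host program can refuse an ill-formed keyword list up front instead of registering ambiguous entries with the chip.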
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110042343.4A CN113160810A (en) | 2021-01-13 | 2021-01-13 | LD 3320-based voice recognition interaction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113160810A true CN113160810A (en) | 2021-07-23 |
Family
ID=76878478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110042343.4A Pending CN113160810A (en) | 2021-01-13 | 2021-01-13 | LD 3320-based voice recognition interaction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113160810A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102332265A (en) * | 2011-06-20 | 2012-01-25 | Zhejiang Geely Automobile Research Institute Co., Ltd. | Method for improving the voice recognition rate of an automobile voice control system
US20150120301A1 (en) * | 2013-01-29 | 2015-04-30 | Huawei Device Co., Ltd. | Information Recognition Method and Apparatus
CN205721373U (en) * | 2016-06-24 | 2016-11-23 | Shenzhen Zhongshi Tongchuang Technology Co., Ltd. | Desktop robot
CN108320736A (en) * | 2018-01-31 | 2018-07-24 | Changshu Institute of Technology | Voice-semantics training system and method for a cerebral palsy rehabilitation robot
CN109360569A (en) * | 2018-12-25 | 2019-02-19 | Zhongxiang Boqian Information Technology Co., Ltd. | Home voice control system and method
CN110931119A (en) * | 2018-09-19 | 2020-03-27 | Beijing Aerospace Changfeng Co., Ltd. | Method for realizing operating-room management control by generating instructions through voice recognition
CN111261142A (en) * | 2020-01-17 | 2020-06-09 | Binzhou University | Single-chip microcomputer-based intelligent switch system with speaker-independent voice recognition
Non-Patent Citations (1)
Title |
---|
Chen Chao (陈超): "Guide Robot Localization and Path Planning Technology" (《导盲机器人定位与路径规划技术》), 31 December 2015, National Defense Industry Press *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107220292A (en) | Intelligent dialogue device, and feedback-type intelligent voice control system and method | |
CN103745722B (en) | Voice interaction smart home system and voice interaction method | |
CN107357772A (en) | List filling method, device and computer equipment | |
DE60125397D1 (en) | Language-independent voice-based user interface | |
CN103714727A (en) | Man-machine interaction-based foreign language learning system and method thereof | |
CN109241330A (en) | The method, apparatus, equipment and medium of key phrase in audio for identification | |
CN106453043A (en) | Multi-language conversion-based instant communication system | |
CN110265012A (en) | Interactive intelligent voice home control device and control method based on open-source hardware | |
CN101383150B (en) | Control method of speech soft switch and its application in geographic information system | |
CN207458054U (en) | Intelligent translation machine | |
CN102476509A (en) | Printing device with voice recognition function and printing method thereof | |
CN101329667A (en) | Intelligent translation apparatus for multi-language mutual voice translation and control method thereof | |
CN209433403U (en) | Novel translation device based on optical scanning head and microphone | |
CN104934031A (en) | Speech recognition system and method for newly added spoken vocabularies | |
CN110750999A (en) | Translation machine capable of automatically recognizing target language | |
CN101825953A (en) | Chinese character input product with combined voice input and Chinese phonetic alphabet input functions | |
CN106710587A (en) | Speech recognition data pre-processing method | |
CN100578613C (en) | System and method for speech recognition utilizing a merged dictionary | |
CN109712450A (en) | English translation exercise device | |
CN113160810A (en) | LD 3320-based voice recognition interaction method and system | |
CN201251767Y (en) | Intelligent electronic dictionary | |
CN206162525U (en) | Interactive forestry English translation device | |
CN111261142A (en) | Single-chip microcomputer-based intelligent switch system with speaker-independent voice recognition | |
CN111968646A (en) | Voice recognition method and device | |
CN201845546U (en) | Device capable of controlling mobile phone through speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210723 |