CN112735390B - Intelligent voice terminal equipment with voice recognition function

Info

Publication number: CN112735390B
Application number: CN202011564820.5A
Authority: CN
Inventors: 刘伟; 杨志
Original assignee: Jiangxi Taide Intelligence Technology Co Ltd
Current assignee: Jiangxi Taide Intelligence Technology Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2023-02-28
Anticipated expiration: 2040-12-25
Also published as: CN112735390A

Abstract

The invention discloses an intelligent voice terminal device with a voice recognition function, relates to voice terminals, and belongs to the technical field of intelligent voice recognition. The device comprises a voice acquisition module, a voice processing module, a voice storage module, a voice matching module and a voice recognition module. The voice acquisition module acquires voice information and sends it to the voice processing module; the voice processing module processes the received voice and sends the result to the voice storage module. When the central controller detects voice input, it controls the voice acquisition module to acquire the voice and send it to the voice processing module; the voice processing module extracts voice segments and passes them to the voice matching module for matching. If the voice segments match data in the voice storage module, the voice recognition module acquires the complete voice, recognizes it in combination with the knowledge processing module, and the recognized voice is displayed in the recognition display module.

Description

Intelligent voice terminal equipment with voice recognition function

Technical Field

The invention relates to a voice terminal, in particular to an intelligent voice terminal device with a voice recognition function, and belongs to the technical field of intelligent voice recognition.

Background

In general, an intelligent terminal is a type of embedded computer system, so its architecture is consistent with that of an embedded system. At the same time, because the intelligent terminal is one application direction of embedded systems with clearly defined application scenarios, its system structure is more clearly defined and finer-grained than that of a general embedded system, and it has certain characteristics of its own.

The system structure of an intelligent terminal is divided into a hardware structure and a software structure. In terms of hardware, an intelligent terminal generally adopts the classical computer architecture, the von Neumann structure, consisting of five parts: an arithmetic unit, a controller, a memory, an input device and an output device, where the arithmetic unit and the controller together form the core of the computer, the central processing unit. In the software structure of the intelligent terminal, system software mainly comprises an operating system and middleware. The operating system manages all resources (hardware and software) of the intelligent terminal and is the kernel and foundation of the intelligent terminal system.

Existing intelligent voice terminal equipment can automatically take the voice information input by a user and convert it into text, which is displayed for the user to confirm. However, no large voice database is established to provide a reference for later user authentication, and no voice filtering is performed after the user is confirmed, so the recognized voice information may be inaccurate when other noise is present nearby.

Therefore, an intelligent voice terminal device with a voice recognition function is provided.

Disclosure of Invention

The invention aims to provide intelligent voice terminal equipment with a voice recognition function, so as to solve the following problems: existing intelligent voice terminal equipment can automatically take the voice information input by a user and convert it into text displayed for the user to confirm, but no large voice database is established to provide a reference for later user authentication, no voice filtering is performed after the user is confirmed, and the recognized voice information may be inaccurate when other noise is present nearby. The invention comprises a voice acquisition module, a voice processing module, a voice storage module, a voice matching module, a voice recognition module, a user authentication module, a central controller, a recognition display module, a voice output module and a knowledge processing module. The voice acquisition module acquires voice information and sends it to the voice processing module; the voice processing module processes the received voice and sends the result to the voice storage module. When the central controller detects voice input, it controls the voice acquisition module to acquire the voice and send it to the voice processing module; the voice processing module extracts voice segments and passes them to the voice matching module for matching. If the voice segments match data in the voice storage module, the voice recognition module acquires the complete voice, recognizes it in combination with the knowledge processing module, and the recognized voice is displayed in the recognition display module.

The purpose of the invention can be realized by the following technical scheme:

an intelligent voice terminal device with a voice recognition function comprises a voice acquisition module, a voice processing module, a voice storage module, a voice matching module, a voice recognition module, a user authentication module, a central controller, a recognition display module, a voice output module and a knowledge processing module; the central controller is electrically connected with the voice acquisition module, the voice acquisition module is in wireless communication connection with the voice processing module, the voice processing module is in wireless communication connection with the voice storage module, the voice processing module is in wireless communication connection with the voice matching module, the voice matching module is in wireless communication connection with the voice recognition module, and the voice recognition module and the knowledge processing module are in wireless communication connection with the voice output module and the recognition display module;

the voice acquisition module acquires voice information and sends it to the voice processing module; the voice processing module processes the received voice and sends the result to the voice storage module; when the central controller detects voice input, it controls the voice acquisition module to acquire the voice and send it to the voice processing module; the voice processing module extracts voice segments and passes them to the voice matching module for matching; if the voice segments match data in the voice storage module, the voice recognition module acquires the complete voice, recognizes it in combination with the knowledge processing module, and the recognized voice is displayed in the recognition display module.
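The connections listed above can be written down as a small graph to check which modules a signal can reach. This is only an illustrative sketch: the module names are hypothetical identifiers, the edges are treated as directed in the order the text lists them (an assumption), and the user authentication module is left out because the text does not state what it is connected to.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical identifiers; one entry per connection named in the text.
LINKS: Dict[str, List[str]] = {
    "central_controller": ["voice_acquisition"],       # electrical connection
    "voice_acquisition": ["voice_processing"],          # wireless from here on
    "voice_processing": ["voice_storage", "voice_matching"],
    "voice_matching": ["voice_recognition"],
    "voice_recognition": ["voice_output", "recognition_display"],
    "knowledge_processing": ["voice_output", "recognition_display"],
}


def reachable(start: str) -> Set[str]:
    """Breadth-first search over LINKS, returning every module reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in LINKS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream = reachable("central_controller")
```

Under these assumptions the recognition display module is downstream of the central controller, while the knowledge processing module only feeds the two output modules.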

Specifically, the specific way of storing data by the voice storage module includes the following processes:

the user inputs an account password through the user authentication module and then logs in the intelligent voice terminal equipment through authentication, the central controller marks the user as a storage user, generates an identity code of the storage user, and acquires a voice fragment of the storage user through the voice acquisition module;

the voice acquisition module sends the acquired voice segments of the storage users to the voice processing module, and the voice processing module normalizes the amplitude of the voice segments, corrects frequency response, divides frames, adds windows and detects the starting and ending endpoints;

acquiring the amplitude of the voice segment, the frequency of the voice segment and the overtone interval, and denoting them Ai, F and Ci respectively, where i is the frame number within the voice segment, i = 1, 2, …, m;

and sending the Ai, the F, the Ci and the identity codes of the stored users to a voice storage module.
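The stored record described above (Ai, F, Ci and the identity code) can be pictured as a small data structure. The following is a minimal sketch, not the patented implementation: `VoiceRecord` and `VoiceStore` are hypothetical names, the store is an in-memory dictionary, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VoiceRecord:
    """Features of one stored user, using the symbols from the text."""
    identity_code: str
    amplitudes: List[float]          # Ai, one value per frame i = 1..m
    frequency: float                 # F, a single value for the segment
    overtone_intervals: List[float]  # Ci, one value per frame i = 1..m


@dataclass
class VoiceStore:
    """Hypothetical in-memory stand-in for the voice storage module."""
    records: Dict[str, VoiceRecord] = field(default_factory=dict)

    def save(self, record: VoiceRecord) -> None:
        # Ai and Ci are both indexed by frame, so they must have equal length.
        if len(record.amplitudes) != len(record.overtone_intervals):
            raise ValueError("Ai and Ci must cover the same frames")
        self.records[record.identity_code] = record

    def get(self, identity_code: str) -> VoiceRecord:
        return self.records[identity_code]


store = VoiceStore()
store.save(VoiceRecord("user-0001", [0.8, 0.6, 0.7], 182.0, [180.0, 184.0, 181.0]))
```

Keying the store by identity code means a later match only needs to return that code for the rest of the device to find the user's data.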

Specifically, the voice matching module is configured to perform voice matching on a user, and a specific matching process includes the following steps:

the voice acquisition module acquires voice information of a user and sends the voice information of the user to the voice processing module, the voice processing module intercepts voice fragments with the same length, and the voice processing module normalizes the amplitude of the voice fragments, corrects frequency response, divides frames, adds windows and detects a start end point and a tail end point;

acquiring the amplitude of the voice segment, the frequency of the voice segment and the overtone interval, and denoting them Ai', F' and Ci' respectively;

the voice matching degree Pc between the user and a plurality of stored users is calculated by using a calculation formula

Wherein a1, a2 and a3 are preset values, and a1 > a2 > a3; c denotes the number of the stored user, c = 1, 2, …, m;

setting a voice matching degree threshold, if the voice matching degree Pc is larger than the voice matching degree threshold, the voice matching module carries out descending order arrangement on the calculated voice matching degree Pc, and the identity code of the user with the largest voice matching degree Pc is sent to the voice recognition module;

if the voice matching degree Pc is not larger than the voice matching degree threshold value, the user is represented as a new user, and the user is reminded to perform user authentication through the voice output module.
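The threshold decision in the two branches above can be sketched as follows. The function takes the matching degrees Pc as already computed, since the matching formula itself is not reproduced in this text; `select_stored_user` and the sample scores are hypothetical.

```python
from typing import Dict, Optional


def select_stored_user(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the identity code of the best match, or None for a new user.

    `scores` maps each stored user's identity code to its matching degree Pc,
    computed elsewhere by the matching formula.
    """
    # Keep only the stored users whose Pc is larger than the threshold.
    passing = {code: pc for code, pc in scores.items() if pc > threshold}
    if not passing:
        return None  # no stored user matches: treat the speaker as a new user
    # Arrange in descending order of Pc and take the first entry.
    ranked = sorted(passing.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0]


best = select_stored_user({"user-0001": 0.62, "user-0002": 0.91}, threshold=0.8)
unknown = select_stored_user({"user-0001": 0.62}, threshold=0.8)
```

A `None` result corresponds to the branch in which the voice output module reminds the user to perform user authentication.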

Specifically, after the voice recognition module receives the identity code of the user sent by the voice matching module, complete voice of the user is obtained, character recognition is carried out, and the recognized voice is converted into characters by combining the knowledge processing module and displayed in the recognition display module.

Specifically, after the voice recognition module obtains complete voice, the voice recognition module performs character recognition and sends recognized characters to the knowledge processing module, and common phrases and words of the user are stored in the knowledge processing module.
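The text does not say how the stored phrases and words are used during recognition. One plausible reading, shown here purely as an assumption, is that the knowledge processing module keeps per-user word counts and prefers the candidate transcription that best fits them. `KnowledgeBase`, `learn` and `pick` are hypothetical names, and the whitespace tokenization is a simplification that would not suit unsegmented Chinese text.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class KnowledgeBase:
    """Hypothetical per-user store of commonly used words."""

    def __init__(self) -> None:
        self._words: DefaultDict[str, Counter] = defaultdict(Counter)

    def learn(self, identity_code: str, text: str) -> None:
        # Count every whitespace-separated word of the recognized text.
        self._words[identity_code].update(text.lower().split())

    def pick(self, identity_code: str, candidates: List[str]) -> str:
        # Assumed heuristic: prefer the candidate whose words the user has
        # used most often; on a tie the earliest candidate is kept.
        counts = self._words[identity_code]
        return max(candidates, key=lambda c: sum(counts[w] for w in c.lower().split()))


kb = KnowledgeBase()
kb.learn("user-0001", "turn on the light")
kb.learn("user-0001", "turn off the light")
choice = kb.pick("user-0001", ["turn on the night", "turn on the light"])
```

Here `pick` expects at least one candidate; an empty list raises `ValueError` from `max`.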

Specifically, the user authentication module is used for a new user to input personal information for registration and login, and perform user authentication when logging in next time, wherein the personal information comprises name, age and home address, the personal information of the user who successfully registers is stored in the voice storage module, and the user authentication module generates an identity code at the same time.
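Registration and later login, as described above, might look like the following sketch. Everything beyond the listed personal fields is an assumption: the names `UserAuth` and `Account`, the random 16-character identity code, and the salted PBKDF2 password hash are illustrative choices, and the accounts are kept in a local dictionary rather than in the voice storage module.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Account:
    identity_code: str
    name: str
    age: int
    home_address: str
    salt: bytes
    password_hash: bytes


class UserAuth:
    """Hypothetical stand-in for the user authentication module."""

    def __init__(self) -> None:
        self._accounts: Dict[str, Account] = {}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        # Salted PBKDF2 so that the plain password is never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, account_name: str, password: str,
                 name: str, age: int, home_address: str) -> str:
        """Store the personal information and return a new identity code."""
        if account_name in self._accounts:
            raise ValueError("account already registered")
        salt = secrets.token_bytes(16)
        identity_code = secrets.token_hex(8)
        self._accounts[account_name] = Account(
            identity_code, name, age, home_address, salt, self._hash(password, salt))
        return identity_code

    def login(self, account_name: str, password: str) -> Optional[str]:
        """Return the identity code on success, or None if authentication fails."""
        account = self._accounts.get(account_name)
        if account is None:
            return None
        candidate = self._hash(password, account.salt)
        if hmac.compare_digest(candidate, account.password_hash):
            return account.identity_code
        return None


auth = UserAuth()
code = auth.register("alice", "s3cret", "Alice", 30, "1 Example Road")
```

Returning the identity code from `login` mirrors the text, where the same code later links the user to the stored voice features.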

Specifically, the working process of the intelligent voice terminal equipment with the voice recognition function comprises the following steps:

step one: storing the user voice;

the user inputs an account password through the user authentication module and then logs in the intelligent voice terminal equipment through authentication, the central controller marks the user as a storage user and generates an identity code of the user, and a voice fragment of the storage user is obtained through the voice acquisition module; the voice acquisition module sends the acquired voice fragments of the storage user to the voice processing module, and the voice processing module normalizes the amplitude of the voice fragments, corrects frequency response, divides frames, adds windows and detects the starting and ending points;

acquiring the amplitude of the voice segment, the frequency of the voice segment and the overtone interval, and denoting them Ai, F and Ci respectively, where i is the frame number within the voice segment; sending Ai, F, Ci and the identity code of the stored user to the voice storage module;

step two: intelligent voice matching;

when a user inputs voice, the voice acquisition module acquires voice information of the user and sends it to the voice processing module; the voice processing module intercepts voice segments of the same length, normalizes their amplitude, corrects the frequency response, divides them into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai', F' and Ci' respectively;

the voice matching degree Pc between the user and the stored user is calculated by the formula

if the voice matching degree Pc is not larger than the voice matching degree threshold value, the user is represented as a new user, and the user is reminded to perform user authentication through the voice output module;

step three: intelligent voice recognition;

and when the voice recognition module receives the identity code of the user sent by the voice matching module, the complete voice of the user is acquired, character recognition is carried out, and the recognized voice is converted into characters by combining the knowledge processing module and displayed in the recognition display module.

Compared with the prior art, the invention has the beneficial effects that:

1. The intelligent voice recognition system comprises a voice acquisition module, a voice processing module, a voice storage module, a voice matching module, a voice recognition module, a user authentication module, a central controller, a recognition display module, a voice output module and a knowledge processing module; the central controller is electrically connected with the voice acquisition module, the voice acquisition module is in wireless communication connection with the voice processing module, the voice processing module is in wireless communication connection with the voice storage module, the voice processing module is in wireless communication connection with the voice matching module, the voice matching module is in wireless communication connection with the voice recognition module, and the voice recognition module and the knowledge processing module are in wireless communication connection with the voice output module and the recognition display module; the voice acquisition module acquires voice information and sends it to the voice processing module; the voice processing module processes the received voice and sends the result to the voice storage module; when the central controller detects voice input, it controls the voice acquisition module to acquire the voice and send it to the voice processing module; the voice processing module extracts voice segments and passes them to the voice matching module for matching; if the voice segments match data in the voice storage module, the voice recognition module acquires the complete voice, recognizes it in combination with the knowledge processing module, and the recognized voice is displayed in the recognition display module.

2. The voice storage module stores the voice of authenticated users and provides a database for later voice authentication. Specifically, the user inputs an account number and a password through the user authentication module and logs in to the intelligent voice terminal equipment after authentication; the central controller marks the user as a stored user and generates an identity code for that user, and the voice acquisition module acquires a voice segment of the stored user; the voice acquisition module sends the acquired voice segment to the voice processing module, and the voice processing module normalizes the amplitude of the voice segment, corrects the frequency response, divides it into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai, F and Ci respectively; Ai, F, Ci and the identity code of the stored user are sent to the voice storage module.

3. The voice recognition system is provided with a voice matching module, which is used for matching the voice of a user: the voice acquisition module acquires voice information of the user and sends it to the voice processing module; the voice processing module intercepts voice segments of the same length, normalizes their amplitude, corrects the frequency response, divides them into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai', F' and Ci' respectively; the voice matching degree Pc between the user and a plurality of stored users is calculated by using a calculation formula

A voice matching degree threshold is set. If the voice matching degree Pc is larger than the threshold, the voice matching module arranges the calculated voice matching degrees Pc in descending order, and the identity code of the user with the largest voice matching degree Pc is sent to the voice recognition module; if the voice matching degree Pc is not larger than the threshold, the user is regarded as a new user and is reminded through the voice output module to perform user authentication.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of an intelligent voice terminal device with a voice recognition function according to the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to Fig. 1, an intelligent voice terminal device with a voice recognition function includes a voice acquisition module, a voice processing module, a voice storage module, a voice matching module, a voice recognition module, a user authentication module, a central controller, a recognition display module, a voice output module, and a knowledge processing module; the central controller is electrically connected with the voice acquisition module, the voice acquisition module is in wireless communication connection with the voice processing module, the voice processing module is in wireless communication connection with the voice storage module, the voice processing module is in wireless communication connection with the voice matching module, the voice matching module is in wireless communication connection with the voice recognition module, and the voice recognition module and the knowledge processing module are in wireless communication connection with the voice output module and the recognition display module;

The specific way of storing data by the voice storage module comprises the following processes:

the voice acquisition module sends the acquired voice fragments of the storage user to the voice processing module, and the voice processing module normalizes the amplitude of the voice fragments, corrects frequency response, divides frames, adds windows and detects the starting and ending points;
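The preprocessing chain named above (amplitude normalization, frequency response correction, framing, windowing, endpoint detection) can be sketched with standard signal-processing steps. The concrete choices here are assumptions, since the text names the steps without specifying them: a first-order pre-emphasis filter stands in for the frequency response correction, a Hamming window is used for windowing, and endpoints are found by a simple frame-energy threshold. All function names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple


def normalize(samples: List[float]) -> List[float]:
    """Scale the signal so that its largest absolute sample is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def pre_emphasize(samples: List[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass filter, assumed here as the frequency response correction."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def split_frames(samples: List[float], size: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


def hamming(frame: List[float]) -> List[float]:
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, x in enumerate(frame)]


def detect_endpoints(frames: List[List[float]], ratio: float = 0.1) -> Tuple[int, int]:
    """Return (first, last) indices of frames whose energy exceeds ratio * max energy."""
    energies = [sum(x * x for x in f) for f in frames]
    limit = ratio * max(energies, default=0.0)
    active = [i for i, e in enumerate(energies) if e > limit]
    return (active[0], active[-1]) if active else (-1, -1)


# Synthetic input: 0.1 s of silence, 0.2 s of a 200 Hz tone, 0.1 s of silence.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
frames = [hamming(f) for f in split_frames(pre_emphasize(normalize(signal)), size=200, hop=100)]
start, end = detect_endpoints(frames)
```

The returned frame indices bracket the voiced part of the signal, so later feature extraction can ignore the leading and trailing silence; `(-1, -1)` signals that no frame carried energy.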

The voice matching module is used for matching voice of a user, and the specific matching process comprises the following steps:

acquiring the amplitude of the voice segment, the frequency of the voice segment and the overtone interval, and denoting them Ai', F' and Ci' respectively;

Wherein a1, a2 and a3 are preset values, and a1 > a2 > a3; c denotes the number of the stored user, c = 1, 2, …, m;
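The three quantities compared during matching (amplitude, frequency and overtone interval) are named in the text but not defined. The sketch below shows one way they could be measured, under explicitly assumed definitions: the amplitude of a frame is its peak absolute sample, the frequency is the lower of the two strongest spectral peaks averaged over all frames, and the overtone interval is the spacing in Hz between those two peaks. The function names are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Tuple


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..len(frame)//2; adequate for short frames."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def two_strongest_peaks(spectrum: List[float]) -> List[int]:
    """Bin indices of the two largest local maxima, in ascending order."""
    peaks = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]
    return sorted(sorted(peaks, key=lambda k: spectrum[k], reverse=True)[:2])


def frame_features(frame: List[float], rate: int) -> Tuple[float, float, float]:
    """Return (amplitude, lower peak in Hz, peak spacing in Hz) for one non-empty frame."""
    amplitude = max(abs(x) for x in frame)
    peaks = two_strongest_peaks(magnitude_spectrum(frame))
    bin_hz = rate / len(frame)
    lowest = peaks[0] * bin_hz if peaks else 0.0
    spacing = (peaks[1] - peaks[0]) * bin_hz if len(peaks) == 2 else 0.0
    return amplitude, lowest, spacing


def segment_features(frames: List[List[float]], rate: int) -> Tuple[List[float], float, List[float]]:
    """Return (A, F, C): per-frame amplitudes, mean frequency, per-frame spacings."""
    per_frame = [frame_features(f, rate) for f in frames]
    amplitudes = [a for a, _, _ in per_frame]
    spacings = [c for _, _, c in per_frame]
    frequency = sum(f for _, f, _ in per_frame) / len(per_frame) if per_frame else 0.0
    return amplitudes, frequency, spacings


# Synthetic frame: a 200 Hz fundamental plus a weaker 400 Hz overtone.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / rate) + 0.5 * math.sin(2 * math.pi * 400 * t / rate)
         for t in range(400)]
A, F, C = segment_features([frame, frame], rate)
```

With a 400-sample frame at 8000 Hz each DFT bin is 20 Hz wide, so both test tones fall exactly on a bin; real speech would need a window and a more robust pitch estimator.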

When the voice recognition module receives the identity code of the user sent by the voice matching module, complete voice of the user is obtained, character recognition is carried out, and the recognized voice is converted into characters by combining the knowledge processing module and displayed in the recognition display module.

After the voice recognition module acquires complete voice, the voice recognition module performs character recognition and sends recognized characters to the knowledge processing module, and commonly used phrases and words of a user are stored in the knowledge processing module.

The user authentication module is used for inputting personal information to register and log in by a new user, and performing user authentication when logging in next time, wherein the personal information comprises name, age and home address, the personal information of the user who successfully registers is stored in the voice storage module, and the user authentication module simultaneously generates an identity code.

The working process of the intelligent voice terminal equipment with the voice recognition function comprises the following steps:

step one: storing the user voice;

step two: intelligent voice matching;

when a user inputs voice, the voice acquisition module acquires voice information of the user and sends it to the voice processing module; the voice processing module intercepts voice segments of the same length, normalizes their amplitude, corrects the frequency response, divides them into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai', F' and Ci' respectively;

the voice matching degree Pc of the user and the stored user is calculated by the calculation formula

step three: intelligent voice recognition;

When a user inputs an account password and an identity code to log in to the intelligent voice terminal equipment, the voice matching module performs authentication matching. When the matching passes, the data acquisition module acquires the complete voice and sends it to the voice recognition module; the voice recognition module filters out other extraneous sounds during recognition and, in combination with the knowledge processing module, outputs the voice of the user who entered the account password and identity code in the recognition display module.

All of the above formulas operate on dimensionless numerical values. The formula is the one closest to the real situation, obtained by collecting a large amount of data and performing software simulation, and the preset parameters in the formula are set by those skilled in the art according to the actual situation.

The working principle of the invention is as follows: the user inputs an account password through the user authentication module and logs in to the intelligent voice terminal equipment after authentication; the central controller marks the user as a stored user and generates an identity code for that user, and a voice segment of the stored user is obtained through the voice acquisition module; the voice acquisition module sends the acquired voice segment to the voice processing module, and the voice processing module normalizes the amplitude of the voice segment, corrects the frequency response, divides it into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai, F and Ci respectively, where i is the frame number within the voice segment; Ai, F, Ci and the identity code of the stored user are sent to the voice storage module;

when a user inputs voice, the voice acquisition module acquires voice information of the user and sends it to the voice processing module; the voice processing module intercepts voice segments of the same length, normalizes their amplitude, corrects the frequency response, divides them into frames, applies windows and detects the start and end points; the amplitude of the voice segment, the frequency of the voice segment and the overtone interval are acquired and denoted Ai', F' and Ci' respectively; the voice matching degree Pc of the user and the stored user is calculated by the calculation formula

Setting a voice matching degree threshold, if the voice matching degree Pc is larger than the voice matching degree threshold, the voice matching module carries out descending order arrangement on the calculated voice matching degree Pc, and the identity code of the user with the largest voice matching degree Pc is sent to the voice recognition module; if the voice matching degree Pc is not larger than the voice matching degree threshold value, the user is represented as a new user, and the user is reminded to perform user authentication through the voice output module;

and when the voice recognition module receives the identity code of the user sent by the voice matching module, acquiring the complete voice of the user, recognizing characters, converting the recognized voice into characters by combining the knowledge processing module, and displaying the characters in the recognition display module.

In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. An intelligent voice terminal device with a voice recognition function is characterized by comprising a voice acquisition module, a voice processing module, a voice storage module, a voice matching module, a voice recognition module, a user authentication module, a central controller, a recognition display module, a voice output module and a knowledge processing module; the central controller is electrically connected with the voice acquisition module, the voice acquisition module is in wireless communication connection with the voice processing module, the voice processing module is in wireless communication connection with the voice storage module, the voice processing module is in wireless communication connection with the voice matching module, the voice matching module is in wireless communication connection with the voice recognition module, and the voice recognition module and the knowledge processing module are in wireless communication connection with the voice output module and the recognition display module;

the voice acquisition module is used for acquiring voice information and sending the voice information to the voice processing module; the voice processing module is used for processing the received voice and sending the processing result to the voice storage module; when the central controller detects voice input, the central controller controls the voice acquisition module to acquire the voice and send the acquired voice to the voice processing module; the voice processing module extracts voice segments and passes them to the voice matching module for matching; if the voice segments match data in the voice storage module, the voice recognition module acquires the complete voice, recognizes it in combination with the knowledge processing module, and displays the recognized voice in the recognition display module;

sending the Ai, the F, the Ci and the identity codes of the stored users to a voice storage module;

the voice matching module is used for carrying out voice matching on a user, and the specific matching process comprises the following steps:

2. The intelligent voice terminal equipment with the voice recognition function as claimed in claim 1, wherein after the voice recognition module receives the identity code of the user sent by the voice matching module, the complete voice of the user is obtained, character recognition is performed, and the recognized voice is converted into characters by combining the knowledge processing module and displayed in the recognition display module.

3. The intelligent voice terminal device with the voice recognition function as claimed in claim 1, wherein after the voice recognition module obtains complete voice, the voice recognition module performs character recognition and sends recognized characters to the knowledge processing module, and common phrases and words of the user are stored in the knowledge processing module.

4. The intelligent voice terminal device with voice recognition function as claimed in claim 1, wherein the user authentication module is used for a new user to input personal information for registration login, and to perform user authentication at the next login, wherein the personal information includes name, age and home address, and the personal information of the user who successfully registers is stored in the voice storage module, and the user authentication module generates the identity code at the same time.

5. An intelligent voice terminal device with voice recognition function according to claim 1, characterized in that the working process of the intelligent voice terminal device with voice recognition function comprises the following steps:

step one: storing the user voice;

step two: intelligent voice matching;

step three: intelligent voice recognition;