CN113421558A - Voice recognition system and method - Google Patents

Voice recognition system and method

Info

Publication number
CN113421558A
CN113421558A (application CN202110978244.7A)
Authority
CN
China
Prior art keywords
real-time voice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110978244.7A
Other languages
Chinese (zh)
Inventor
杨代福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinhe Technology Co ltd
Original Assignee
Beijing Xinhe Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinhe Technology Co ltd filed Critical Beijing Xinhe Technology Co ltd
Priority to CN202110978244.7A
Publication of CN113421558A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress-induced speech
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Navigation (AREA)

Abstract

The invention relates to the technical field of voice analysis and processing, and in particular to a voice recognition system and method. The method comprises the following steps: acquiring real-time voice information in the vehicle; analyzing the real-time voice information online and judging whether it fluctuates; if it fluctuates, performing audio extraction on it to obtain fluctuating voice information; comparing the fluctuating voice information with interference voice information to obtain a comparison result; and performing voice recognition on the fluctuating voice information and executing a corresponding instruction according to the recognition result. By acquiring and analyzing the in-vehicle voice in real time, filtering it against the collected interference sound sources to remove noise, and executing an instruction only after judging that the current voice contains the driver's speech, the method accurately recognizes the user's voice message, avoids the influence of the external environment, and improves recognition accuracy.

Description

Voice recognition system and method
Technical Field
The invention belongs to the technical field of voice analysis processing, and particularly relates to a voice recognition system and method.
Background
Speech recognition is an interdisciplinary field. Over the last two decades, speech recognition technology has advanced significantly and has begun to move from the laboratory to the market. It is entering fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The disciplines involved include signal processing, pattern recognition, probability and information theory, sound production and hearing mechanisms, and artificial intelligence.
Speech recognition systems are increasingly fitted in current automotive head units. A voice recognition system built into the head unit lets the driver interact with the vehicle, for example switching the air conditioner on and off, or raising and lowering the windows, by voice.
However, the existing in-vehicle voice system requires a relatively quiet environment; if music is playing or a window is open, recognition accuracy drops sharply.
Disclosure of Invention
To overcome, or at least partially solve, the above problems, embodiments of the present invention provide a speech recognition method and system that correct and process the speech recognition pipeline in real time, ensuring that the user's voice message is accurately recognized, avoiding the influence of the external environment, and improving recognition accuracy.
The embodiment of the invention is realized in such a way that a voice recognition method comprises the following steps:
acquiring real-time voice information in the vehicle;
carrying out online analysis on the real-time voice information, and judging whether the real-time voice information fluctuates or not;
if the real-time voice information fluctuates, audio extraction is carried out on the real-time voice information to obtain fluctuating voice information;
comparing the fluctuating voice information with the interference voice information stored in the interference sound source database to obtain a comparison result;
and carrying out voice recognition on the fluctuating voice information according to the comparison result, and executing a corresponding instruction according to the voice recognition result.
Preferably, the step of performing online analysis on the real-time voice information and determining whether the real-time voice information fluctuates specifically includes:
segmenting the real-time voice information to obtain a real-time voice segment;
numbering real-time voice sections, wherein the corresponding recording durations of the real-time voice sections are the same, and the real-time voice sections are numbered continuously;
and sequentially comparing each pair of adjacent real-time voice sections in numbering order, and judging whether the real-time voice information fluctuates according to the comparison result.
Preferably, the step of performing audio extraction on the real-time speech information to obtain fluctuating speech information specifically includes:
positioning a real-time voice section with fluctuation in real-time voice information to obtain a first voice section;
reading a previous real-time voice section adjacent to the real-time voice section with fluctuation to obtain a second voice section;
and filtering the first voice section on the basis of the second voice section to obtain fluctuating voice information.
Preferably, the step of comparing the fluctuating voice information with the interference voice information stored in the interference sound source database to obtain a comparison result specifically includes:
sequentially reading interference voice information in an interference voice source database, wherein the interference voice information at least comprises song interference information and wind sound interference information;
and comparing the interference voice information with the fluctuation voice information one by one to obtain a plurality of comparison results.
Preferably, the step of performing voice recognition on the fluctuating voice information according to the comparison result and executing the corresponding instruction according to the voice recognition result specifically includes:
analyzing all comparison results, and judging whether interference voice information is matched with fluctuating voice information or not in the comparison results;
if no interference voice information matches the fluctuating voice information, carrying out voice recognition on the fluctuating voice information to obtain a voice recognition result;
and retrieving and executing the corresponding instruction according to the voice recognition result.
Preferably, the interference voice information is recorded in real time and updated regularly.
Preferably, the voice recognition process adopts networking recognition or local recognition.
It is another object of an embodiment of the present invention to provide a speech recognition system, including:
the information acquisition module is used for acquiring real-time voice information in the vehicle;
the voice analysis module is used for carrying out online analysis on the real-time voice information and judging whether the real-time voice information fluctuates or not;
the audio extraction module is used for extracting audio from the real-time voice information to obtain fluctuating voice information if the real-time voice information fluctuates;
the audio comparison module is used for comparing the fluctuating voice information with the interference voice information stored in the interference voice source database to obtain a comparison result;
and the voice recognition module is used for carrying out voice recognition on the fluctuating voice information according to the comparison result and executing a corresponding instruction according to the voice recognition result.
Preferably, the voice analysis module includes:
the data segmentation unit is used for segmenting the real-time voice information to obtain a real-time voice segment;
the data numbering unit is used for numbering the real-time voice sections, the corresponding recording durations of the real-time voice sections are the same, and the numbering is continuous;
and the data comparison unit is used for sequentially comparing each pair of adjacent real-time voice sections in numbering order and judging whether the real-time voice information fluctuates according to the comparison result.
Preferably, the audio extraction module includes:
the audio positioning unit is used for positioning a real-time voice section with fluctuation in real-time voice information to obtain a first voice section;
the audio reading unit is used for reading a previous real-time voice section adjacent to the real-time voice section with fluctuation to obtain a second voice section;
and the filtering unit is used for filtering the first voice section on the basis of the second voice section to obtain the fluctuating voice information.
According to the voice recognition method provided by the embodiment of the invention, the real-time voice in the vehicle is obtained and analyzed in real time, the filtering processing is carried out on the real-time voice according to the collected interference sound source, the noise in the real-time voice is removed, and the corresponding instruction is executed after the current voice is judged to contain the voice information of the driver, so that the voice message of the user can be accurately recognized, the influence of the external environment is avoided, and the recognition accuracy is improved.
Drawings
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of performing online analysis on real-time voice information and determining whether there is fluctuation in the real-time voice information according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a step of performing audio extraction on real-time speech information to obtain fluctuating speech information according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a step of comparing the fluctuating voice information with the interfering voice information stored in the interfering voice source database to obtain a comparison result according to the embodiment of the present invention;
fig. 5 is a flowchart illustrating steps of performing speech recognition on fluctuating speech information according to a comparison result and executing a corresponding instruction according to a speech recognition result according to an embodiment of the present invention;
FIG. 6 is an architecture diagram of a speech recognition system according to an embodiment of the present invention;
FIG. 7 is an architecture diagram of a speech analysis module according to an embodiment of the present invention;
fig. 8 is an architecture diagram of an audio extraction module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
Speech recognition systems are increasingly fitted in current automotive head units. A voice recognition system built into the head unit lets the driver interact with the vehicle, for example switching the air conditioner on and off, or raising and lowering the windows, by voice. However, the existing head-unit voice system requires a relatively quiet environment; if music is playing or a window is open, recognition accuracy drops sharply.
According to the voice recognition method provided by the embodiment of the invention, the real-time voice in the vehicle is obtained and analyzed in real time, the filtering processing is carried out on the real-time voice according to the collected interference sound source, the noise in the real-time voice is removed, and the corresponding instruction is executed after the current voice is judged to contain the voice information of the driver, so that the voice message of the user can be accurately recognized, the influence of the external environment is avoided, and the recognition accuracy is improved.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention, where the method includes:
and S100, acquiring real-time voice information in the vehicle.
The head unit (car machine) is an in-vehicle infotainment product installed in the car; functionally, it enables information exchange between people and the vehicle and between the vehicle and the outside world (vehicle to vehicle). With technological progress, in-car systems have evolved from early CD and DVD navigation toward intelligence and connectivity. Besides the traditional radio, music and video playback, and navigation functions, today's head units provide communication between occupants, the vehicle, and the outside world, with strengthened functions for user experience, services, and safety.
In this step, real-time voice information in the vehicle is acquired, where the real-time voice information is any sound generated in the vehicle. Specifically, it may be collected by a microphone, and the microphone is placed close to the driver so that the driver's voice is picked up more clearly.
S200, carrying out online analysis on the real-time voice information, and judging whether the real-time voice information fluctuates.
In this step, the real-time voice information is analyzed online. Because it is acquired continuously, sound inside the vehicle is collected throughout the drive. Most of the time the in-vehicle sound changes little, but when the driver speaks, the collected real-time voice information changes noticeably; the online analysis therefore makes it possible to judge whether the real-time voice information fluctuates.
And S300, if the real-time voice information fluctuates, performing audio extraction on the real-time voice information to obtain fluctuating voice information.
In this step, if the real-time voice information is judged to fluctuate, the sound in the vehicle has changed. The driver may be speaking, but the change may also come from music being played or a window being opened, so further judgment is needed: audio extraction is performed on the real-time voice information, and the changed portion is extracted to obtain the fluctuating voice information.
S400, comparing the fluctuating voice information with the interference voice information stored in the interference sound source database to obtain a comparison result.
In this step, an interference sound source database is read. The database stores interference voice information of at least two types: wind noise from outside the vehicle while driving, and music being played. The former is collected by the microphone, which records the current wind sound; the latter is obtained directly from the head unit. During the judgment, the fluctuating voice information is compared with the stored interference voice information to determine whether the current fluctuation is wind noise or is caused by the music being played.
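The interference sound source database described above can be sketched as a small in-memory store. This is a hypothetical structure for illustration only; the patent specifies merely that it holds at least wind-noise and music clips, recorded in real time and updated regularly.

```python
class InterferenceDatabase:
    """Minimal in-memory interference-source store (illustrative sketch).

    The patent names at least two source types: wind noise captured by
    the microphone while driving, and music taken directly from the
    head unit. Clips are recorded in real time and refreshed regularly.
    """

    def __init__(self):
        self._clips = {}

    def update(self, name, clip):
        # Record a new source or refresh an existing one, e.g.
        # db.update("wind", mic_samples) or db.update("song", head_unit_audio).
        self._clips[name] = list(clip)

    def items(self):
        # Yield (name, clip) pairs for the one-by-one comparison in S400.
        return list(self._clips.items())
```

The regular-update requirement is what `update` models: re-recording a named source simply overwrites the stale clip.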
And S500, performing voice recognition on the fluctuating voice information according to the comparison result, and executing a corresponding instruction according to the voice recognition result.
In this step, the comparison result is examined. If it shows that the current fluctuating voice information is caused neither by wind noise nor by music playback, the fluctuation can be attributed to the driver speaking. Voice recognition is therefore performed on the fluctuating voice information, the text it contains is recognized, and the corresponding instruction is executed according to that text; if no valid content can be recognized, no instruction is executed.
As shown in fig. 2, as a preferred embodiment of the present invention, the step of performing online analysis on the real-time voice information and determining whether the real-time voice information has fluctuation specifically includes:
s201, segmenting the real-time voice information to obtain a real-time voice segment.
In this step, the real-time voice information is segmented. Because it is recorded continuously, the whole stream is continuous, and segmenting it makes subsequent processing easier.
S202, numbering is carried out on the real-time voice sections, the recording time lengths corresponding to the real-time voice sections are the same, and the numbering is continuous numbering.
In this step, the real-time voice segments are numbered in chronological order: the segment recorded first receives the earlier number and the segment recorded later the following number, which facilitates the subsequent comparison and processing.
S203, comparing the real-time voice sections adjacent to the two ends in sequence according to the numbering sequence, and judging whether the real-time voice information fluctuates according to the comparison result.
In this step, the real-time voice segments are read segment by segment, and each segment is compared with its predecessor to judge whether fluctuation occurs between two adjacent segments. For example, when the segments are numbered with Arabic numerals as real-time voice segments 01, 02, ..., 0N, then when segment 02 is read, segment 01 serves as the comparison item: if segment 02 differs greatly from segment 01, fluctuation is present; otherwise it is not.
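Steps S201 to S203 can be sketched as follows. The patent does not specify the comparison metric, so a jump in RMS energy between adjacent equal-length segments, with an assumed 2x ratio threshold, is used here purely for illustration.

```python
def segment(samples, seg_len):
    """S201/S202: split the stream into equal-length, consecutively
    numbered segments (a trailing partial segment is dropped)."""
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

def rms(seg):
    """Root-mean-square energy of one segment."""
    return (sum(x * x for x in seg) / len(seg)) ** 0.5

def fluctuating_segments(samples, seg_len, ratio=2.0):
    """S203: compare each segment with its predecessor; a large jump
    in energy marks the segment as fluctuating. The ratio threshold is
    an assumption, not taken from the patent."""
    segs = segment(samples, seg_len)
    flagged = []
    for n in range(1, len(segs)):
        prev, cur = rms(segs[n - 1]), rms(segs[n])
        if prev > 0 and cur / prev >= ratio:
            flagged.append(n)  # fluctuation begins at segment n
    return segs, flagged
```

On a synthetic stream of quiet samples followed by loud ones, only the first loud segment is flagged, which matches the "compare with the previous segment" rule: once the loud level becomes the new baseline, no further fluctuation is reported.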
As shown in fig. 3, as a preferred embodiment of the present invention, the step of performing audio extraction on the real-time speech information to obtain fluctuating speech information specifically includes:
s301, positioning a real-time voice section with fluctuation in the real-time voice information to obtain a first voice section.
In this step, the real-time voice segment in which fluctuation occurs is located and defined as the first voice segment. The first voice segment is the part of the real-time voice information where fluctuation begins, so it contains both the ambient sound and, possibly, the sound made by the driver.
S302, reading a previous real-time voice section adjacent to the real-time voice section with fluctuation to obtain a second voice section.
And S303, filtering the first voice section on the basis of the second voice section to obtain fluctuating voice information.
In this step, the real-time voice segment immediately preceding the fluctuating one is read as the second voice segment. Fluctuation has not yet appeared in the second voice segment, so its main content is ambient sound. The first voice segment is then filtered on the basis of the second: the second voice segment is removed from the first, thereby removing the ambient sound.
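Steps S301 to S303 can be sketched as below. The patent says only that the second segment is "removed from" the first, so a sample-wise subtraction of the ambient reference is assumed here; a real system would more plausibly subtract in the spectral domain.

```python
def extract_fluctuating_voice(segs, flagged):
    """S301: the first flagged segment is the first voice segment;
    S302: its predecessor is the second (ambient-only) voice segment;
    S303: remove the second from the first to suppress ambient sound."""
    n = flagged[0]
    first, second = segs[n], segs[n - 1]
    # Assumed reading of "remove the second voice segment from the
    # first": per-sample subtraction of the ambient reference.
    return [a - b for a, b in zip(first, second)]
```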
As shown in fig. 4, as a preferred embodiment of the present invention, the step of comparing the fluctuating voice information with the interfering voice information stored in the interfering voice source database to obtain a comparison result specifically includes:
s401, sequentially reading interference voice information in an interference voice source database, wherein the interference voice information at least comprises song interference information and wind sound interference information.
In this step, the interference voice information in the interference sound source database is read in sequence. The main sources of interference are the music being played and the sound generated by high-speed airflow while driving. If the current sound fluctuation is not caused by either of these, it can be attributed to the driver, indicating that the driver may want a corresponding operation performed.
S402, comparing the interference voice information with the fluctuation voice information one by one to obtain a plurality of comparison results.
In this step, the fluctuating voice information is compared against each piece of interference voice information in turn, so that the source of the current fluctuation can be judged directly; each comparison of the fluctuating voice information with one interference source yields one comparison result.
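Steps S401/S402 need a matching rule that the patent leaves open; this sketch assumes normalized correlation against each stored clip with an illustrative 0.9 threshold.

```python
def correlation(a, b):
    """Normalized dot product of two clips trimmed to equal length;
    result lies in [-1, 1]."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

def compare_with_database(fluctuation, interference_db, threshold=0.9):
    """S402: one (source name, matched?) result per interference clip.
    interference_db is an iterable of (name, clip) pairs; the 0.9
    threshold is an assumption for illustration."""
    return [(name, correlation(fluctuation, clip) >= threshold)
            for name, clip in interference_db]
```

Each stored source thus produces exactly one comparison result, matching the "plurality of comparison results" wording in the claims.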
As shown in fig. 5, as a preferred embodiment of the present invention, the step of performing voice recognition on the fluctuating voice information according to the comparison result, and executing the corresponding instruction according to the voice recognition result specifically includes:
s501, analyzing all comparison results, and judging whether interference voice information is matched with fluctuating voice information in the comparison results.
In this step, the comparison results are analyzed. The number of comparison results equals the number of types contained in the interference voice information; for example, if the interference voice information covers four external sound sources, four groups of comparison results are produced after comparison. If any group shows that the interference voice information matches the fluctuating voice information, the current fluctuation was not caused by the driver and is ignored.
And S502, if no match is found, performing voice recognition on the fluctuating voice information to obtain a voice recognition result.
In this step, if all comparison results indicate that the interference voice information does not match the fluctuating voice information, the current fluctuation was caused by the driver, so voice recognition is performed on it to obtain a voice recognition result.
S503, retrieving the corresponding command according to the voice recognition result and executing.
In this step, after voice recognition there are generally two cases. If the recognition result contains text, a search is performed according to that text, and the corresponding instruction, if found, is executed. If there is no text, or the text does not correspond to any instruction, execution is abandoned.
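Steps S501 to S503 amount to a guard plus a lookup. A minimal sketch follows, with a hypothetical instruction table and a caller-supplied recogniser standing in for the networked or local ASR engine mentioned earlier; none of these names come from the patent.

```python
# Hypothetical command table; the patent does not enumerate instructions.
INSTRUCTIONS = {
    "open window": "WINDOW_DOWN",
    "turn off air conditioner": "AC_OFF",
}

def handle_fluctuation(comparison_results, recognise):
    """S501: if any interference source matched, the fluctuation was
    wind or music and is ignored. S502: otherwise run recognition.
    S503: look up the recognised text; unknown or empty text is dropped."""
    if any(matched for _, matched in comparison_results):
        return None                # interference: ignore the fluctuation
    text = recognise()             # placeholder for the ASR engine
    return INSTRUCTIONS.get(text)  # None means "abandon execution"
```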
As shown in fig. 6, a speech recognition system provided for an embodiment of the present invention includes:
and the information acquisition module 100 is used for acquiring real-time voice information in the vehicle.
In the system, the information acquisition module 100 acquires real-time voice information in the vehicle, where the real-time voice information is any sound generated in the vehicle. Specifically, it may be collected by a microphone, and the microphone is placed close to the driver so that the driver's voice is picked up more clearly.
And the voice analysis module 200 is configured to perform online analysis on the real-time voice information, and determine whether the real-time voice information fluctuates.
In this system, the voice analysis module 200 analyzes the real-time voice information online. Because the information is acquired continuously, sound inside the vehicle is collected throughout the drive. Most of the time the in-vehicle sound changes little, but when the driver speaks, the collected real-time voice information changes noticeably; the online analysis therefore makes it possible to judge whether the real-time voice information fluctuates.
And the audio extraction module 300 is configured to, if the real-time voice information fluctuates, perform audio extraction on the real-time voice information to obtain fluctuating voice information.
In the system, if the audio extraction module 300 determines that the real-time voice information fluctuates, the sound in the vehicle has changed. The driver may be speaking, but the change may also come from music being played or a window being opened, so further judgment is needed: audio extraction is performed on the real-time voice information, and the changed portion is extracted to obtain the fluctuating voice information.
The audio comparison module 400 is configured to compare the fluctuating voice information with the interference voice information stored in the interference sound source database, so as to obtain a comparison result.
In the system, the audio comparison module 400 reads the interference sound source database, in which the interference voice information is stored, and during the judgment compares the fluctuating voice information with the stored interference voice information to determine whether the current fluctuation is wind noise or is caused by music playback.
And the voice recognition module 500 is configured to perform voice recognition on the fluctuating voice information according to the comparison result, and execute a corresponding instruction according to the voice recognition result.
In the system, the voice recognition module 500 examines the comparison results; if they indicate that the current fluctuating voice information is caused neither by wind noise nor by music playback, the fluctuation can be attributed to the driver speaking.
As shown in fig. 7, as a preferred embodiment of the present invention, the voice analysis module includes:
the data segmentation unit 201 is configured to segment the real-time speech information to obtain a real-time speech segment.
In this module, the real-time voice information is segmented. Because it is recorded continuously, the whole stream is continuous, and segmenting it makes subsequent processing easier.
The data numbering unit 202 is configured to number the real-time voice segments, where the recording durations corresponding to the real-time voice segments are the same, and the numbers are consecutive numbers.
In this module, the data numbering unit 202 numbers the real-time voice segments in chronological order: the segment recorded first receives the earlier number and the segment recorded later the following number, which facilitates the subsequent comparison and processing.
And the data comparison unit 203 is used for sequentially comparing each pair of adjacent real-time voice segments in numbering order and judging whether the real-time voice information fluctuates according to the comparison result.
In this module, the data comparison unit 203 reads the real-time voice segments segment by segment and compares each segment with its predecessor, thereby judging whether fluctuation occurs between two adjacent segments.
As shown in fig. 8, as a preferred embodiment of the present invention, the audio extraction module includes:
the audio positioning unit 301 is configured to position a real-time speech segment in which fluctuation occurs in the real-time speech information, and obtain a first speech segment.
In this module, the audio positioning unit 301 locates the real-time voice segment in which fluctuation occurs and defines it as the first voice segment; the first voice segment is thus the part of the real-time voice information where the fluctuation begins to appear.
The audio reading unit 302 is configured to read a previous real-time speech segment adjacent to the real-time speech segment with the fluctuation to obtain a second speech segment.
The filtering unit 303 is configured to filter the first speech segment based on the second speech segment to obtain the fluctuating speech information.
In this module, the real-time voice segment immediately preceding the segment in which fluctuation occurs is read and defined as the second voice segment. Since no fluctuation has yet appeared in the second voice segment, its main content is the ambient sound. The first voice segment is then filtered on the basis of the second voice segment, i.e., the second voice segment is removed from the first voice segment, so that the ambient sound is removed.
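Removing the second (environment-only) segment from the first segment could be realized, for example, by spectral subtraction. The specification does not name the filtering algorithm, so this sketch is an assumption, not the patented method.

```python
import numpy as np

def filter_with_reference(first_seg, second_seg):
    """Subtract the magnitude spectrum of the preceding, environment-only
    segment from the fluctuating segment (spectral subtraction).
    Magnitudes are floored at zero; the phase of the first segment is kept."""
    F = np.fft.rfft(first_seg)
    N = np.fft.rfft(second_seg)
    mag = np.maximum(np.abs(F) - np.abs(N), 0.0)
    phase = np.angle(F)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(first_seg))
```

When the two segments contain the same ambient sound, the subtraction leaves (approximately) only the newly appeared component, i.e. the fluctuating voice information.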
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in a sequence indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times; the order of performance of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method of speech recognition, the method comprising:
acquiring real-time voice information in the vehicle;
carrying out online analysis on the real-time voice information, and judging whether the real-time voice information fluctuates or not;
if the real-time voice information fluctuates, audio extraction is carried out on the real-time voice information to obtain fluctuating voice information;
comparing the fluctuating voice information with the interference voice information stored in the interference sound source database to obtain a comparison result;
and carrying out voice recognition on the fluctuating voice information according to the comparison result, and executing a corresponding instruction according to the voice recognition result.
2. The speech recognition method according to claim 1, wherein the step of performing online analysis on the real-time speech information and determining whether the real-time speech information fluctuates specifically comprises:
segmenting the real-time voice information to obtain a real-time voice segment;
numbering real-time voice sections, wherein the corresponding recording durations of the real-time voice sections are the same, and the real-time voice sections are numbered continuously;
and sequentially comparing every two adjacent real-time voice sections according to the numbering sequence, and judging whether the real-time voice information fluctuates according to the comparison result.
3. The speech recognition method according to claim 2, wherein the step of performing audio extraction on the real-time speech information to obtain fluctuating speech information specifically comprises:
positioning a real-time voice section with fluctuation in real-time voice information to obtain a first voice section;
reading a previous real-time voice section adjacent to the real-time voice section with fluctuation to obtain a second voice section;
and filtering the first voice section on the basis of the second voice section to obtain fluctuating voice information.
4. The speech recognition method according to claim 1, wherein the step of comparing the fluctuating speech information with the interfering speech information stored in the interfering sound source database to obtain a comparison result specifically comprises:
sequentially reading interference voice information in an interference sound source database, wherein the interference voice information at least comprises song interference information and wind sound interference information;
and comparing the interference voice information with the fluctuation voice information one by one to obtain a plurality of comparison results.
5. The speech recognition method according to claim 4, wherein the step of performing speech recognition on the fluctuating speech information according to the comparison result and executing the corresponding instruction according to the speech recognition result specifically comprises:
analyzing all comparison results, and judging whether interference voice information is matched with fluctuating voice information or not in the comparison results;
if so, carrying out voice recognition on the fluctuating voice information to obtain a voice recognition result;
and retrieving and executing the corresponding instruction according to the voice recognition result.
6. The speech recognition method of claim 4, wherein the interfering speech information is recorded in real time and updated periodically.
7. The speech recognition method of claim 1, wherein the speech recognition process employs network recognition or local recognition.
8. A speech recognition system, the system comprising:
the information acquisition module is used for acquiring real-time voice information in the vehicle;
the voice analysis module is used for carrying out online analysis on the real-time voice information and judging whether the real-time voice information fluctuates or not;
the audio extraction module is used for extracting audio from the real-time voice information to obtain fluctuating voice information if the real-time voice information fluctuates;
the audio comparison module is used for comparing the fluctuating voice information with the interference voice information stored in the interference sound source database to obtain a comparison result;
and the voice recognition module is used for carrying out voice recognition on the fluctuating voice information according to the comparison result and executing a corresponding instruction according to the voice recognition result.
9. The speech recognition system of claim 8, wherein the speech analysis module comprises:
the data segmentation unit is used for segmenting the real-time voice information to obtain a real-time voice segment;
the data numbering unit is used for numbering the real-time voice sections, the corresponding recording durations of the real-time voice sections are the same, and the numbering is continuous;
and the data comparison unit is used for sequentially comparing every two adjacent real-time voice sections according to the numbering sequence and judging whether the real-time voice information fluctuates according to the comparison result.
10. The speech recognition system of claim 8, wherein the audio extraction module comprises:
the audio positioning unit is used for positioning a real-time voice section with fluctuation in real-time voice information to obtain a first voice section;
the audio reading unit is used for reading a previous real-time voice section adjacent to the real-time voice section with fluctuation to obtain a second voice section;
and the filtering unit is used for filtering the first voice section on the basis of the second voice section to obtain the fluctuating voice information.
CN202110978244.7A 2021-08-25 2021-08-25 Voice recognition system and method Pending CN113421558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110978244.7A CN113421558A (en) 2021-08-25 2021-08-25 Voice recognition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110978244.7A CN113421558A (en) 2021-08-25 2021-08-25 Voice recognition system and method

Publications (1)

Publication Number Publication Date
CN113421558A true CN113421558A (en) 2021-09-21

Family

ID=77719400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110978244.7A Pending CN113421558A (en) 2021-08-25 2021-08-25 Voice recognition system and method

Country Status (1)

Country Link
CN (1) CN113421558A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625857A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Self-adaptive voice endpoint detection method
CN103077708A (en) * 2012-12-27 2013-05-01 安徽科大讯飞信息科技股份有限公司 Method for improving rejection capability of speech recognition system
CN105448294A (en) * 2015-12-09 2016-03-30 江苏天安智联科技股份有限公司 Intelligent voice recognition system for vehicle equipment
CN106548771A (en) * 2015-09-21 2017-03-29 上海日趋信息技术有限公司 For the method that speech recognition system eliminates burst noise
US20200027462A1 (en) * 2016-09-29 2020-01-23 Hefei Hualing Co., Ltd. Voice control system, wakeup method and wakeup apparatus therefor, electrical appliance and co-processor

Similar Documents

Publication Publication Date Title
CN108122556B (en) Method and device for reducing false triggering of voice wake-up instruction words of driver
US20210065735A1 (en) Sequence models for audio scene recognition
US10861459B2 (en) Apparatus and method for determining reliability of recommendation based on environment of vehicle
CN111081279A (en) Voice emotion fluctuation analysis method and device
CN107958669B (en) Voiceprint recognition method and device
US9311930B2 (en) Audio based system and method for in-vehicle context classification
CN110972112B (en) Subway running direction determining method, device, terminal and storage medium
CN113903409B (en) Molecular data processing method, model construction and prediction method and related devices
KR20220053662A (en) Neural network to identify radio technologies
CN111368061B (en) Short text filtering method, device, medium and computer equipment
CN112397073B (en) Audio data processing method and device
US20070256435A1 (en) Air Conditioner Control Device and Air Conditioner Control Method
CN112351047B (en) Double-engine based voiceprint identity authentication method, device, equipment and storage medium
CN113421558A (en) Voice recognition system and method
CN112053686A (en) Audio interruption method and device and computer readable storage medium
Romero et al. Animal sound classification using sequential classifiers
CN111933187B (en) Emotion recognition model training method and device, computer equipment and storage medium
CN115376522A (en) Voiceprint control method of air conditioner, air conditioner and readable storage medium
CN113571092A (en) Method for identifying abnormal sound of engine and related equipment thereof
CN113420178A (en) Data processing method and equipment
KR102487936B1 (en) System and method for speaker authentication based on deep neural networks that compensate for short speech through segment aggregation
US20080228492A1 (en) Device Control Device, Speech Recognition Device, Agent Device, Data Structure, and Device Control
CN115954007B (en) Voiceprint detection method and device, electronic equipment and storage medium
CN113495974B (en) Sound classification processing method, device, equipment and medium
CN117690455B (en) Sliding window-based partial synthesis fake voice detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210921