CN108922520B - Voice recognition method, voice recognition device, storage medium and electronic equipment - Google Patents


Publication number
CN108922520B
Authority
CN
China
Prior art keywords
voice recognition
electronic equipment
historical
matching degree
voice
Prior art date
Legal status
Active
Application number
CN201810764393.1A
Other languages
Chinese (zh)
Other versions
CN108922520A (en)
Inventor
陈岩
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201810764393.1A priority Critical patent/CN108922520B/en
Publication of CN108922520A publication Critical patent/CN108922520A/en
Application granted granted Critical
Publication of CN108922520B publication Critical patent/CN108922520B/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 Speech to text systems
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/02 Services making use of location information
    • H04W 4/025 Services making use of location information using location based information parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Navigation (AREA)
  • Telephone Function (AREA)

Abstract

An embodiment of the application provides a voice recognition method, a voice recognition device, a storage medium, and an electronic device. The voice recognition method includes: when voice information input by a user is received, acquiring the current geographic position of the electronic device; acquiring a voice recognition matching degree threshold according to the geographic position and a correspondence between historical geographic positions and voice recognition matching degree thresholds; matching the voice information against a preset voice recognition model to obtain a voice recognition matching degree; and, when the voice recognition matching degree is greater than the voice recognition matching degree threshold, executing the operation corresponding to the instruction in the voice information. With this method, the electronic device can dynamically adjust the voice recognition matching degree threshold according to the user's usage habits in different places, which reduces the number of failed recognitions, saves the time the electronic device spends on voice recognition, and improves its voice recognition efficiency.

Description

Voice recognition method, voice recognition device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of speech recognition technologies, and in particular, to a speech recognition method, a speech recognition apparatus, a storage medium, and an electronic device.
Background
With the rapid development of electronic technology, electronic devices such as smart phones have increasingly rich functions. For example, a user may control an electronic device through voice to perform various functions of the electronic device.
When a user performs voice control on the electronic device, the electronic device first needs to recognize the user's voice. When the user uses the voice control function frequently, the electronic device performs the same full recognition procedure every time, which reduces the efficiency of voice recognition.
Disclosure of Invention
The embodiment of the application provides a voice recognition method, a voice recognition device, a storage medium and an electronic device, which can improve the efficiency of the electronic device in voice recognition.
The embodiment of the application provides a voice recognition method, which comprises the following steps:
when voice information input by a user is received, acquiring the current geographic position of the electronic equipment;
acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold;
matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree;
and when the voice recognition matching degree is larger than the voice recognition matching degree threshold value, executing the operation corresponding to the instruction in the voice information.
An embodiment of the present application further provides a speech recognition apparatus, including:
the first acquisition module is used for acquiring the current geographic position of the electronic equipment when receiving voice information input by a user;
the second acquisition module is used for acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold;
the matching module is used for matching the voice information with a preset voice recognition model so as to obtain a voice recognition matching degree;
and the execution module is used for executing the operation corresponding to the instruction in the voice information when the voice recognition matching degree is greater than the voice recognition matching degree threshold value.
An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is caused to execute the above-mentioned speech recognition method.
The embodiment of the application also provides an electronic device, which comprises a processor and a memory, wherein a computer program is stored in the memory, and the processor is used for executing the voice recognition method by calling the computer program stored in the memory.
An embodiment of the present application further provides an electronic device, including a processor and a microphone electrically connected to the processor, wherein:
the microphone is used for receiving voice information input by a user;
the processor is configured to:
acquiring the current geographic position of the electronic equipment;
acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold;
matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree;
and when the voice recognition matching degree is larger than the voice recognition matching degree threshold value, executing the operation corresponding to the instruction in the voice information.
The voice recognition method provided by the embodiment of the application comprises the following steps: when voice information input by a user is received, acquiring the current geographic position of the electronic equipment; acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold; matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree; and when the voice recognition matching degree is larger than the voice recognition matching degree threshold value, executing the operation corresponding to the instruction in the voice information. In the voice recognition method, the electronic equipment can dynamically adjust the threshold value of the voice recognition matching degree according to the use habits of users in different places, so that the times of recognition failure can be reduced, the time consumed by the electronic equipment in voice recognition can be saved, and the efficiency of the electronic equipment in voice recognition can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram of voice control of an electronic device by a user.
Fig. 2 is a schematic flowchart of a first speech recognition method according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a second speech recognition method according to an embodiment of the present application.
Fig. 4 is a third flowchart illustrating a speech recognition method according to an embodiment of the present application.
Fig. 5 is a fourth flowchart illustrating a speech recognition method according to an embodiment of the present application.
Fig. 6 is a fifth flowchart of a speech recognition method according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present application.
Fig. 8 is another schematic structural diagram of a speech recognition apparatus according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 10 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
The terms "first," "second," "third," and the like in the description, in the claims of the present application, and in the above drawings, if any, are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the objects so described are interchangeable under appropriate circumstances. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, electronic device, or system comprising a list of steps is not necessarily limited to the steps, modules, or units explicitly listed, and may include steps, modules, or units that are not explicitly listed or that are inherent to such a process, method, apparatus, electronic device, or system.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a user performing voice control on an electronic device. The user utters a segment of speech, and the electronic device collects the user's voice information. The electronic device then compares the collected voice information with a voice recognition model stored in the electronic device. When the voice information matches the voice recognition model, the electronic device recognizes the control instruction in the voice information. The electronic device then executes the operation corresponding to the control instruction, such as lighting the screen, opening an application, exiting an application, or locking the screen, thereby realizing the user's voice control of the electronic device.
The embodiment of the application provides a voice recognition method which can be applied to electronic equipment. The electronic device may be a smart phone, a tablet computer, a game device, an AR (Augmented Reality) device, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or the like.
As shown in fig. 2, the speech recognition method may include the following steps:
and 110, acquiring the current geographical position of the electronic equipment when receiving the voice information input by the user.
A positioning system is provided in the electronic device. For example, the electronic device may include a positioning system such as GPS (Global Positioning System) or BDS (BeiDou Navigation Satellite System).
After the electronic equipment starts the voice recognition function, the electronic equipment can continuously collect the voice information of the user. For example, a microphone may be provided in the electronic device, and the electronic device may collect voice information input by the user through the microphone. The voice information of the user is a sentence input to the electronic equipment by the user. The voice information is used for carrying out voice control on the electronic equipment. One or more instructions, such as "lock screen", "volume up", etc., may be included in the voice message.
When the electronic device receives voice information input by a user, the electronic device can acquire the current geographic position of the electronic device through the positioning system. The geographic position acquired by the electronic device may include coordinate information of the current geographic position or area information of the current geographic position, and the like. The coordinate information of the geographic location may include, for example, longitude, latitude, and the like of the current geographic location. The regional information of the geographic location may include, for example, information of a street, a cell, a supermarket, a subway station, etc. where the current geographic location is located.
And 120, acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold.
The electronic device may preset a corresponding relationship between the historical geographic location and the threshold of the matching degree of the voice recognition. The historical geographic location includes a geographic location at which the electronic device has been voice recognized. In the correspondence, the speech recognition matching degree thresholds corresponding to different historical geographic positions may be different. Thus, when the electronic device is in different geographic locations, the corresponding speech recognition match thresholds may also be different.
The voice recognition matching degree threshold marks the boundary between a successful and a failed match of the voice information against the voice recognition model. When the matching degree between the voice information and the preset voice recognition model in the electronic device is greater than the threshold, the match is successful; when the matching degree is less than or equal to the threshold, the match fails.
After the electronic equipment acquires the current geographic position, the voice recognition matching degree threshold value can be acquired according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold value.
And 130, matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree.
The electronic device may match the received voice information with a voice recognition model preset in the electronic device to obtain a matching degree between the voice information and the voice recognition model. Wherein the matching degree represents the similarity degree or the coincidence degree between the voice information and the voice recognition model.
The preset voice recognition model can be a voice recognition model which is generated by the electronic equipment according to training voice information collected by the electronic equipment when a user starts a voice recognition function of the electronic equipment for the first time.
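The patent does not specify how the matching degree between the voice information and the model is computed. As a hedged illustration only, the sketch below scores an utterance feature vector against a model reference vector using cosine similarity rescaled to [0, 1]; the function name and the vector representation are assumptions, not the patented method.

```python
import math

def matching_degree(features: list[float], model: list[float]) -> float:
    """Illustrative 'matching degree': cosine similarity between an
    utterance's feature vector and the model's reference vector,
    rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(features, model))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(sum(b * b for b in model))
    if norm == 0.0:
        return 0.0  # no signal: treat as a complete mismatch
    return (dot / norm + 1.0) / 2.0

# A vector close to the model's reference scores near 1.0.
print(matching_degree([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Real systems typically score acoustic features (e.g. MFCCs) against a trained model rather than comparing raw vectors; this stub exists only to make the threshold comparison in step 140 concrete.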
And 140, when the voice recognition matching degree is greater than the voice recognition matching degree threshold value, executing the operation corresponding to the instruction in the voice information.
After the electronic device obtains the matching degree between the voice information and the preset voice recognition model, it can compare the matching degree with the voice recognition matching degree threshold to determine whether the matching degree exceeds the threshold.
And when the matching degree is greater than the threshold value of the voice recognition matching degree, the voice information is successfully matched with the preset voice recognition model. Subsequently, the electronic device may further analyze the voice information to obtain a control instruction included in the voice information, and perform an operation corresponding to the instruction, for example, control the electronic device to lock a screen, control the electronic device to increase a volume, and the like.
In the embodiment of the application, when the electronic device is located at different geographic positions, the obtained threshold values of the matching degrees of the voice recognition may be different. Therefore, when the electronic equipment performs voice recognition at different geographic positions, the voice recognition matching degree threshold value can be dynamically adjusted according to the use habits of users at different places, the times of recognition failure can be reduced, the time consumed by the electronic equipment during voice recognition is saved, and the efficiency of the electronic equipment during voice recognition can be improved.
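The decision logic of steps 120 and 140 can be sketched as follows, assuming a plain dictionary as the stored correspondence; the location labels, threshold values, and fallback default are hypothetical.

```python
DEFAULT_THRESHOLD = 0.95  # assumed fallback for a location with no history

def recognize(degree: float, location: str, threshold_map: dict) -> bool:
    """Step 120: pick the location-specific threshold (or the assumed
    default). Step 140: execute the instruction only if the matching
    degree obtained in step 130 exceeds that threshold."""
    threshold = threshold_map.get(location, DEFAULT_THRESHOLD)
    return degree > threshold

# The same utterance score can succeed at a frequently used location
# (lower threshold) and fail at a rarely used one (higher threshold).
thresholds = {"home": 0.80, "office": 0.85}  # hypothetical values
print(recognize(0.82, "home", thresholds))    # True
print(recognize(0.82, "office", thresholds))  # False
```

This is exactly the mechanism the paragraph above describes: the threshold, not the scoring, adapts to the place.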
In some embodiments, as shown in fig. 3, step 110, before acquiring the current geographic location of the electronic device when receiving the voice information input by the user, further includes the following steps:
151, obtaining a plurality of voice recognition history data of an electronic device, wherein each voice recognition history data comprises a historical geographic location of the electronic device when the electronic device performs voice recognition;
152, performing cluster analysis on the plurality of voice recognition historical data to obtain the number of times of voice recognition performed by the electronic device at each historical geographic location;
153, generating a corresponding relation between the historical geographical position and the threshold of the matching degree of the voice recognition according to the times of the voice recognition of the electronic equipment in each historical geographical position.
The electronic device may record the geographic location of the electronic device each time speech recognition is performed, thereby forming speech recognition history data, and store the speech recognition history data in the electronic device. And the geographic position in the voice recognition historical data is the historical geographic position.
The electronic device may obtain a plurality of speech recognition history data. Wherein each voice recognition history data comprises the historical geographic position of the electronic equipment when the electronic equipment performs voice recognition. For example, the speech recognition history data acquired by the electronic device may include 100 pieces of history data.
Subsequently, the electronic device may perform cluster analysis on a plurality of the speech recognition history data to obtain the number of times that the electronic device performs speech recognition at each of the historical geographic locations. For example, the results from the cluster analysis may include: the number of speech recognition times corresponding to the historical geographic location a is 20, the number of speech recognition times corresponding to the historical geographic location B is 30, the number of speech recognition times corresponding to the historical geographic location C is 10, and the number of speech recognition times corresponding to the historical geographic location D is 40.
And then, the electronic equipment generates a corresponding relation between the historical geographic position and a threshold of the matching degree of the voice recognition according to the times of the voice recognition of the electronic equipment at each historical geographic position. For example, the more times the electronic device performs speech recognition at a historical geographic location, the lower the corresponding speech recognition match threshold.
For example, the correspondence generated by the electronic device between the historical geographic location and the speech recognition matching degree threshold may be as shown in Table 1:
TABLE 1

| Historical geographic location | Number of speech recognitions | Speech recognition matching degree threshold |
| ------------------------------ | ----------------------------- | -------------------------------------------- |
| D                              | 40                            | 80%                                          |
| ……                             | ……                            | ……                                           |
| B                              | 30                            | 85%                                          |
| ……                             | ……                            | ……                                           |
| A                              | 20                            | 90%                                          |
| ……                             | ……                            | ……                                           |
| C                              | 10                            | 95%                                          |
| ……                             | ……                            | ……                                           |
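The text states only that more recognitions at a location imply a lower threshold. The sketch below reproduces the pattern of Table 1 under an assumed linear rule (95% at 10 uses, dropping 0.5 points per extra use, floored at 80%); the patent does not prescribe any formula.

```python
def build_threshold_map(counts: dict) -> dict:
    """Map each historical location's recognition count to a threshold:
    more recognitions -> lower threshold. The linear rule and the
    80%-95% clamp are assumptions chosen to match Table 1's examples."""
    thresholds = {}
    for location, n in counts.items():
        t = 0.95 - 0.005 * (n - 10)
        thresholds[location] = max(0.80, min(0.95, t))
    return thresholds

# Counts from the cluster-analysis example above.
counts = {"D": 40, "B": 30, "A": 20, "C": 10}
print(build_threshold_map(counts))
```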
In some embodiments, as shown in fig. 4, step 153, generating a correspondence between the historical geographic location and a threshold of a matching degree of speech recognition according to the number of times of speech recognition performed by the electronic device at each of the historical geographic locations, includes the following steps:
1531, setting a plurality of preset times intervals;
1532, determining a plurality of historical geographic positions corresponding to the times included in each preset time interval according to the times of voice recognition of the electronic device in each historical geographic position;
1533, a threshold of voice recognition matching degree is set for a plurality of historical geographic positions corresponding to the times included in each preset time interval.
The electronic device may have performed speech recognition at a large number of historical geographic locations, and the number of times speech recognition is performed at each historical geographic location may be different, thereby complicating the correspondence between the historical geographic locations generated by the electronic device and the threshold of speech recognition matching.
Therefore, a plurality of preset number intervals can be set in the electronic equipment according to the number of times that the electronic equipment performs voice recognition at each historical geographic position. For example, the electronic device may be provided with a plurality of preset number intervals such as (0, 15), (15, 25), (25, 35), (35, 45), and the like.
The electronic device may determine, according to the number of times voice recognition was performed at each historical geographic location, the plurality of historical geographic locations whose counts fall within each preset number-of-times interval. For example, if the preset interval (15, 25) includes the counts 16, 18, 20, and 22, and the historical geographic locations corresponding to those counts are A1, A2, A, and A3, respectively, then the plurality of historical geographic locations corresponding to the interval (15, 25) includes A1, A2, A, and A3.
The electronic device may set a threshold of speech recognition matching degree for a plurality of historical geographic locations corresponding to the times included in each preset time interval. Therefore, a plurality of historical geographic positions with similar times can correspond to the same voice recognition matching degree threshold value, so that the corresponding relation between the historical geographic positions and the voice recognition matching degree threshold value is simplified.
For example, the correspondence between the historical geographic location and the speech recognition matching degree threshold may be as shown in Table 2:
TABLE 2
[Table 2 appears as an image in the original publication; it groups historical geographic locations by preset number-of-times interval and assigns each group a shared speech recognition matching degree threshold.]
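Steps 1531 to 1533 can be sketched as follows, using the interval bounds and counts given as examples in the text; the half-open interval convention, the threshold per interval, and the default value are assumptions.

```python
# Preset number-of-times intervals, each paired with one shared
# threshold for every location whose count falls inside it.
INTERVALS = [          # ((low, high], shared threshold)
    ((0, 15), 0.95),
    ((15, 25), 0.90),
    ((25, 35), 0.85),
    ((35, 45), 0.80),
]

def threshold_for_count(n: int) -> float:
    for (low, high), t in INTERVALS:
        if low < n <= high:
            return t
    return 0.95  # assumed default for counts outside every interval

# Hypothetical counts from the text's example: A1, A2, A, A3 all fall
# in the interval (15, 25) and therefore share one threshold.
counts = {"A1": 16, "A2": 18, "A": 20, "A3": 22, "D": 40}
grouped = {loc: threshold_for_count(n) for loc, n in counts.items()}
print(grouped)  # A1, A2, A, A3 all share 0.90; D gets 0.80
```

Binning by interval keeps the stored correspondence small: one threshold per interval instead of one per location.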
In some embodiments, as shown in fig. 5, the step 152 of performing cluster analysis on a plurality of the speech recognition history data to obtain the number of times of speech recognition performed by the electronic device at each of the historical geographic locations includes the following steps:
1521, performing cluster analysis on the plurality of voice recognition historical data to obtain the number of times of voice recognition performed at each historical time when the electronic device is at each historical geographic location;
step 153, generating a corresponding relationship between the historical geographic location and a threshold of a matching degree of voice recognition according to the number of times of voice recognition performed by the electronic device at each historical geographic location, including the following steps:
1534 generating a corresponding relationship between the historical geographic position, the historical time and the threshold of the matching degree of the voice recognition according to the number of times of the voice recognition of the electronic equipment at each historical time when the electronic equipment is at each historical geographic position.
Each piece of voice recognition historical data acquired by the electronic equipment further comprises historical time when the electronic equipment performs voice recognition. The historical time at which the electronic device performs speech recognition may be represented by the time of day.
The electronic device may perform cluster analysis on the plurality of pieces of speech recognition history data to obtain the number of times speech recognition was performed at each historical time at each historical geographic location. For example, the results of the cluster analysis may include: 5 recognitions at historical geographic location A at time T1, 8 at location A at time T2, 3 at location B at time T1, 10 at location B at time T2, and so on.
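Counting recognitions per (location, time) pair, as step 1521 describes, reduces to a frequency count over the history records. A sketch with hypothetical data, bucketing time by hour of day (the patent says only "historical time"):

```python
from collections import Counter

# Hypothetical speech recognition history: (location, hour-of-day).
history = [
    ("A", 8), ("A", 8), ("A", 19),
    ("B", 8), ("B", 19), ("B", 19),
]

# (location, hour) -> number of recognitions at that place and time.
counts = Counter(history)
print(counts[("A", 8)])   # 2
print(counts[("B", 19)])  # 2
```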
The electronic equipment can generate the historical geographic positions and the corresponding relation between the historical time and the threshold value of the matching degree of the voice recognition according to the number of times of voice recognition at each historical time of each historical geographic position.
For example, the correspondence among the historical geographic location, the historical time, and the speech recognition matching degree threshold generated by the electronic device may be as shown in Table 3:
TABLE 3
[Table 3 appears as an image in the original publication; it lists, for each pair of historical geographic location and historical time, the corresponding speech recognition matching degree threshold.]
In this way, the voice recognition matching degree threshold corresponds to both the historical geographic location and the historical time; that is, the threshold can be adjusted according to the geographic location and the time simultaneously. The electronic device can therefore adjust the threshold more precisely and dynamically during voice recognition, which preserves recognition accuracy while improving recognition efficiency.
In some embodiments, as shown in fig. 6, step 110, after acquiring the current geographic location of the electronic device when receiving the voice information input by the user, further includes the following steps:
161, acquiring the current time;
step 120, obtaining a threshold value of the matching degree of the voice recognition according to the geographic position and the corresponding relation between the historical geographic position and the threshold value of the matching degree of the voice recognition, and the method comprises the following steps:
and 121, acquiring a voice recognition matching degree threshold according to the geographic position, the time and the corresponding relation among the historical geographic position, the historical time and the voice recognition matching degree threshold.
After obtaining the current geographic location, the electronic device can further obtain the current time. The current time may be represented as a time of day, for example, 16:00.
The electronic device then acquires the voice recognition matching degree threshold according to the geographic location, the time, and the correspondence among the historical geographic location, the historical time, and the voice recognition matching degree threshold.
For example, if the current geographic location is geographic location B and the current time is 16:00, the electronic device can determine from the correspondence that the voice recognition matching degree threshold is 88%.
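As a minimal sketch, the correspondence can be held as a table keyed by (location, time) pairs. Only the pair from the example above (location B at 16:00, 88%) is taken from the text; the other entries, the key names, and the default value for unseen pairs are assumptions made for illustration:

```python
# Hypothetical correspondence table: (historical location, historical time)
# -> voice recognition matching degree threshold. Only the ("location_B",
# "16:00") entry comes from the example in the text.
CORRESPONDENCE = {
    ("location_A", "15:00"): 0.90,
    ("location_B", "16:00"): 0.88,
    ("location_B", "20:00"): 0.85,
}

DEFAULT_THRESHOLD = 0.95  # fallback for unseen (location, time) pairs (assumed)

def lookup_threshold(location, time_of_day):
    """Return the matching degree threshold for the current location and time."""
    return CORRESPONDENCE.get((location, time_of_day), DEFAULT_THRESHOLD)
```

With this table, `lookup_threshold("location_B", "16:00")` yields 0.88, matching the example above.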
In specific implementations, the present application is not limited to the described execution order of the steps; some steps may be performed in other orders or simultaneously when no conflict arises.
As can be seen from the above, the voice recognition method provided in the embodiments of the present application includes: when voice information input by a user is received, acquiring the current geographic location of the electronic device; acquiring a voice recognition matching degree threshold according to the geographic location and the correspondence between the historical geographic location and the voice recognition matching degree threshold; matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree; and when the voice recognition matching degree is greater than the voice recognition matching degree threshold, executing the operation corresponding to the instruction in the voice information. With this method, the electronic device can dynamically adjust the voice recognition matching degree threshold according to the user's usage habits in different places, which reduces the number of recognition failures, saves the time the electronic device spends on voice recognition, and improves the efficiency of voice recognition.
An embodiment of the present application further provides a voice recognition apparatus, which may be integrated in an electronic device. The electronic device may be a smart phone, a tablet computer, a game device, an AR (Augmented Reality) device, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or the like.
As shown in fig. 7, the speech recognition apparatus 200 may include: a first obtaining module 201, a second obtaining module 202, a matching module 203, and an executing module 204.
The first obtaining module 201 is configured to obtain a current geographic location of the electronic device when the voice information input by the user is received.
A positioning system is provided in the electronic device. For example, the electronic device may include a positioning system such as GPS (Global Positioning System), BDS (BeiDou Navigation Satellite System), and the like.
After the voice recognition function is enabled, the electronic device can continuously collect the user's voice information. For example, a microphone may be provided in the electronic device, through which the electronic device collects voice information input by the user. The user's voice information is a sentence that the user inputs to the electronic device, and it is used for voice control of the electronic device. The voice information may include one or more instructions, such as "lock screen", "volume up", and the like.
When the electronic device receives voice information input by a user, the first obtaining module 201 may obtain the current geographic location of the electronic device through the positioning system. The geographic location acquired by the first obtaining module 201 may include coordinate information of the current geographic location, such as its longitude and latitude, or area information of the current geographic location, such as the street, residential district, supermarket, or subway station where it is located.
A second obtaining module 202, configured to obtain a threshold of matching degree of speech recognition according to the geographic location and a correspondence between the historical geographic location and the threshold of matching degree of speech recognition.
The electronic device may preset a corresponding relationship between the historical geographic location and the threshold of the matching degree of the voice recognition. The historical geographic location includes a geographic location at which the electronic device has been voice recognized. In the correspondence, the speech recognition matching degree thresholds corresponding to different historical geographic positions may be different. Thus, when the electronic device is in different geographic locations, the corresponding speech recognition match thresholds may also be different.
The voice recognition matching degree threshold represents the boundary between successful and failed matching of the voice information against the voice recognition model. When the matching degree between the voice information and the preset voice recognition model in the electronic device is greater than the voice recognition matching degree threshold, the matching succeeds; when the matching degree is less than or equal to the threshold, the matching fails.
After the first obtaining module 201 obtains the current geographic location, the second obtaining module 202 may obtain the threshold of the matching degree of the voice recognition according to the geographic location and the corresponding relationship between the historical geographic location and the threshold of the matching degree of the voice recognition.
The matching module 203 is configured to match the voice information with a preset voice recognition model to obtain a voice recognition matching degree.
The matching module 203 may match the received voice information with the voice recognition model preset in the electronic device to obtain the matching degree between the voice information and the voice recognition model. The matching degree represents the degree of similarity or conformity between the voice information and the voice recognition model.
The preset voice recognition model may be generated by the electronic device from training voice information collected when the user first enables the voice recognition function of the electronic device.
The executing module 204 is configured to execute an operation corresponding to an instruction in the voice information when the voice recognition matching degree is greater than the voice recognition matching degree threshold.
After the matching module 203 obtains the matching degree between the voice information and the preset voice recognition model, the executing module 204 may compare the matching degree with the voice recognition matching degree threshold to determine which of the two is larger.
When the matching degree is greater than the voice recognition matching degree threshold, the voice information is successfully matched with the preset voice recognition model. The execution module 204 may then parse the voice information to obtain the control instruction it contains and perform the operation corresponding to that instruction, for example, locking the screen or increasing the volume of the electronic device.
In the embodiments of the present application, the voice recognition matching degree thresholds obtained when the electronic device is at different geographic locations may differ. Therefore, when performing voice recognition at different geographic locations, the electronic device can dynamically adjust the threshold according to the user's usage habits in those places, which reduces the number of recognition failures, saves the time spent on voice recognition, and improves the efficiency of voice recognition.
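The compare-then-execute behavior of the matching and execution modules can be sketched as follows. The handler names and return values are hypothetical; a match succeeds only when the degree strictly exceeds the threshold, as described above:

```python
def recognize_and_execute(match_degree, threshold, instruction, handlers):
    """Run the handler for the parsed instruction if the voice information
    matched the model strictly above the threshold; otherwise report failure."""
    if match_degree > threshold:
        action = handlers.get(instruction)
        return action() if action else "unknown instruction"
    return "match failed"

# Illustrative instruction handlers (names assumed for this sketch).
HANDLERS = {
    "lock screen": lambda: "screen locked",
    "volume up": lambda: "volume increased",
}
```

For example, `recognize_and_execute(0.91, 0.88, "volume up", HANDLERS)` returns "volume increased", while a degree of exactly 0.88 against the same threshold fails, since the boundary value counts as a failed match.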
In some embodiments, as shown in fig. 8, the speech recognition apparatus 200 further comprises a generation module 205. The generating module 205 is configured to perform the following steps:
acquiring a plurality of voice recognition historical data of electronic equipment, wherein each voice recognition historical data comprises a historical geographic position of the electronic equipment when the electronic equipment performs voice recognition;
performing cluster analysis on the plurality of voice recognition historical data to obtain the times of voice recognition of the electronic equipment at each historical geographic position;
and generating a corresponding relation between the historical geographic position and a voice recognition matching degree threshold according to the times of voice recognition of the electronic equipment at each historical geographic position.
The electronic device may record its geographic location each time voice recognition is performed, thereby forming voice recognition history data, and store that data on the device. The geographic location recorded in the voice recognition history data is the historical geographic location.
The generation module 205 may obtain a plurality of pieces of speech recognition history data, each of which includes the historical geographic location of the electronic device when that speech recognition was performed. For example, the acquired speech recognition history data may include 100 records.
Subsequently, the generating module 205 may perform cluster analysis on a plurality of the speech recognition history data to obtain the number of times of speech recognition performed by the electronic device at each of the historical geographic locations. For example, the results from the cluster analysis may include: the number of speech recognition times corresponding to the historical geographic location a is 20, the number of speech recognition times corresponding to the historical geographic location B is 30, the number of speech recognition times corresponding to the historical geographic location C is 10, and the number of speech recognition times corresponding to the historical geographic location D is 40.
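The counting step can be sketched by grouping records with identical location labels. This exact-label grouping is a simplification standing in for the cluster analysis (real coordinate data would need a spatial clustering step); the record counts below reproduce the example above:

```python
from collections import Counter

# Illustrative history: one location label per past recognition,
# reproducing the counts from the example (A: 20, B: 30, C: 10, D: 40).
history = ["A"] * 20 + ["B"] * 30 + ["C"] * 10 + ["D"] * 40

# Group identical labels to count recognitions per historical location.
counts = Counter(history)
```

`counts` then maps D to 40, B to 30, A to 20, and C to 10, as in the example.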
Then, the generating module 205 generates a corresponding relationship between the historical geographic location and a threshold of a matching degree of speech recognition according to the number of times of speech recognition performed by the electronic device at each historical geographic location. For example, the more times the electronic device performs speech recognition at a historical geographic location, the lower the corresponding speech recognition match threshold.
For example, the correspondence between the historical geographic location generated by the generation module 205 and the threshold speech recognition matching degree may be the correspondence shown in table 4:
TABLE 4
Historical geographic location | Number of speech recognitions | Speech recognition matching degree threshold
Historical geographic location D | 40 | 80%
…… | …… | ……
Historical geographic location B | 30 | 85%
…… | …… | ……
Historical geographic location A | 20 | 90%
…… | …… | ……
Historical geographic location C | 10 | 95%
…… | …… | ……
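The generation rule (more recognitions at a location, lower threshold) can be sketched with a step function whose breakpoints reproduce Table 4; the exact mapping is not specified by the text, so the breakpoints are an assumption:

```python
def threshold_for_count(count):
    """Map a recognition count to a matching degree threshold; the more
    often a location is used, the lower its threshold (breakpoints assumed)."""
    if count >= 40:
        return 0.80
    if count >= 30:
        return 0.85
    if count >= 20:
        return 0.90
    return 0.95

# Counts per historical location, from the cluster-analysis example above.
counts = {"D": 40, "B": 30, "A": 20, "C": 10}
correspondence = {loc: threshold_for_count(n) for loc, n in counts.items()}
```

This reproduces the pairs in Table 4: location D (40 recognitions) gets an 80% threshold, while location C (10 recognitions) gets 95%.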
In some embodiments, when generating the corresponding relationship between the historical geographic location and the threshold of the matching degree of speech recognition according to the number of times of speech recognition performed by the electronic device at each historical geographic location, the generating module 205 is configured to perform the following steps:
setting a plurality of preset times intervals;
determining a plurality of historical geographic positions corresponding to the times included in each preset time interval according to the times of voice recognition of the electronic equipment in each historical geographic position;
and setting a voice recognition matching degree threshold value for a plurality of historical geographic positions corresponding to the times included in each preset time interval.
The electronic device may have performed speech recognition at a large number of historical geographic locations, and the number of times speech recognition is performed at each historical geographic location may be different, thereby complicating the correspondence between the historical geographic locations generated by the electronic device and the threshold of speech recognition matching.
Accordingly, the generation module 205 may set, in the electronic device, a plurality of preset times intervals for the number of times the electronic device performs speech recognition at each historical geographic location. For example, the electronic device may be provided with preset times intervals such as (0, 15), (15, 25), (25, 35), (35, 45), and the like.
The generating module 205 may determine, according to the number of times of performing voice recognition at each historical geographic location, the plurality of historical geographic locations whose counts fall in each preset times interval. For example, if the preset times interval (15, 25) includes the counts 16, 18, 20, and 22, and the historical geographic locations corresponding to these counts are A1, A2, A, and A3 respectively, then the plurality of historical geographic locations corresponding to the preset times interval (15, 25) includes the historical geographic locations A1, A2, A, and A3.
The generating module 205 may set one speech recognition matching degree threshold for the plurality of historical geographic locations whose counts fall in each preset times interval. In this way, historical geographic locations with similar counts correspond to the same threshold, which simplifies the correspondence between the historical geographic location and the speech recognition matching degree threshold.
For example, the correspondence between the historical geographic location and the threshold speech recognition matching degree may be a correspondence as shown in table 5:
TABLE 5
[Table 5 is provided as an image in the original publication.]
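The interval-based simplification can be sketched as below. The interval bounds are the illustrative ones from the text; pairing each interval with a particular threshold, and treating each interval as exclusive of its lower bound and inclusive of its upper bound, are assumptions:

```python
# Preset times intervals paired with the threshold assigned to each
# (bounds from the text; threshold-per-interval assignment assumed).
INTERVALS = [((0, 15), 0.95), ((15, 25), 0.90), ((25, 35), 0.85), ((35, 45), 0.80)]

def threshold_for(count):
    """Return the threshold of the preset interval containing `count`."""
    for (low, high), threshold in INTERVALS:
        if low < count <= high:
            return threshold
    return None  # count outside all preset intervals

# Locations with counts 16, 18, 20, 22 all fall in the (15, 25] interval,
# so they share a single threshold, as described above.
shared = {loc: threshold_for(n) for loc, n in
          {"A1": 16, "A2": 18, "A": 20, "A3": 22}.items()}
```

All four locations end up with the same 90% threshold, so the correspondence holds one entry per interval rather than one per location.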
In some embodiments, when performing cluster analysis on a plurality of the speech recognition history data to obtain the number of times the electronic device performs speech recognition at each of the historical geographic locations, the generating module 205 is configured to perform the following steps:
performing cluster analysis on the plurality of voice recognition historical data to obtain the number of times of voice recognition of each historical moment when the electronic equipment is located at each historical geographic position;
when generating the corresponding relationship between the historical geographic location and the threshold of the matching degree of the voice recognition according to the number of times that the electronic device performs the voice recognition at each historical geographic location, the generating module 205 is configured to execute the following steps:
and generating a corresponding relation between the historical geographic position, the historical time and a voice recognition matching degree threshold according to the number of times of voice recognition of the electronic equipment at each historical time when the electronic equipment is at each historical geographic position.
Each piece of speech recognition history data acquired by the generating module 205 further includes a history time when the electronic device performs speech recognition. The historical time at which the electronic device performs speech recognition may be represented by the time of day.
The generating module 205 may perform cluster analysis on a plurality of the speech recognition history data to obtain the number of times speech recognition was performed at each historical time at each historical geographic location. For example, the results of the cluster analysis may include: the number of speech recognitions corresponding to historical geographic location A at time T1 is 5, to location A at time T2 is 8, to location B at time T1 is 3, and to location B at time T2 is 10, and so on.
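Extending the earlier counting sketch to two keys, records can be grouped by (location, time) pairs. The counts below reproduce the example above, and the exact-pair grouping again stands in for the cluster analysis:

```python
from collections import Counter

# Illustrative history: one (location, historical time) pair per recognition,
# reproducing the example counts (A/T1: 5, A/T2: 8, B/T1: 3, B/T2: 10).
records = ([("A", "T1")] * 5 + [("A", "T2")] * 8 +
           [("B", "T1")] * 3 + [("B", "T2")] * 10)

# Count recognitions per (location, time) pair.
pair_counts = Counter(records)
```

`pair_counts[("B", "T2")]` is then 10, matching the example.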
The generating module 205 may generate a corresponding relationship between the historical geographic location and the historical time and a threshold of the matching degree of the voice recognition according to the number of times of performing the voice recognition at each historical time of each historical geographic location.
For example, the correspondence among the historical geographic location, the historical time, and the voice recognition matching degree threshold generated by the generation module 205 may be the correspondence shown in Table 6:
TABLE 6
[Table 6 is provided as an image in the original publication.]
In this way, the voice recognition matching degree threshold corresponds to both the historical geographic location and the historical time; that is, the threshold can be adjusted according to the geographic location and the time simultaneously. The electronic device can therefore adjust the voice recognition matching degree threshold more specifically and dynamically during voice recognition, which guarantees recognition accuracy while improving recognition efficiency.
In some embodiments, the first obtaining module 201 is further configured to perform the following steps:
acquiring the current time;
when the threshold of the matching degree of the voice recognition is obtained according to the geographic location and the corresponding relationship between the historical geographic location and the threshold of the matching degree of the voice recognition, the second obtaining module 202 is configured to perform the following steps:
and acquiring a voice recognition matching degree threshold according to the geographic position, the time and the corresponding relation among the historical geographic position, the historical time and the voice recognition matching degree threshold.
After obtaining the current geographic location, the first obtaining module 201 may further obtain the current time. The current time may be represented as a time of day, for example, 16:00.
Subsequently, the second obtaining module 202 obtains the threshold of the matching degree of the voice recognition according to the geographic location, the time, and the corresponding relationship between the historical geographic location, the historical time, and the threshold of the matching degree of the voice recognition.
For example, if the current geographic location is geographic location B and the current time is 16:00, the second obtaining module 202 can determine from the correspondence that the voice recognition matching degree threshold is 88%.
In specific implementation, the modules may be implemented as independent entities, or may be combined arbitrarily and implemented as one or several entities.
As can be seen from the above, in the voice recognition apparatus 200 provided in the embodiments of the present application, the first obtaining module 201 obtains the current geographic location of the electronic device when voice information input by the user is received; the second obtaining module 202 obtains a voice recognition matching degree threshold according to the geographic location and the correspondence between the historical geographic location and the voice recognition matching degree threshold; the matching module 203 matches the voice information with a preset voice recognition model to obtain a voice recognition matching degree; and the execution module 204 executes the operation corresponding to the instruction in the voice information when the voice recognition matching degree is greater than the threshold. The apparatus can dynamically adjust the voice recognition matching degree threshold according to the habits of users using the electronic device in different places, which reduces the number of recognition failures, saves the time the electronic device spends on voice recognition, and improves its voice recognition efficiency.
The embodiment of the application also provides the electronic equipment. The electronic device can be a smart phone, a tablet computer and the like. As shown in fig. 9, the electronic device 300 includes a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is the control center of the electronic device 300. It connects the various parts of the electronic device using various interfaces and lines, and performs the various functions of the device and processes its data by running or invoking the computer program stored in the memory 302 and invoking the data stored in the memory 302, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to the processes of one or more computer programs into the memory 302, and runs the computer program stored in the memory 302 to implement the following functions:
when voice information input by a user is received, acquiring the current geographic position of the electronic equipment;
acquiring a voice recognition matching degree threshold according to the geographic position and the corresponding relation between the historical geographic position and the voice recognition matching degree threshold;
matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree;
and when the voice recognition matching degree is larger than the voice recognition matching degree threshold value, executing the operation corresponding to the instruction in the voice information.
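The four steps above can be sketched end to end as below. The device helpers are hypothetical stand-ins for the positioning system, the correspondence lookup, the recognition model, and the instruction parser; the fixed values they return come from the examples earlier in the text:

```python
class StubDevice:
    """Hypothetical device whose methods stand in for the real components."""
    def current_location(self):
        return "B"  # step 1 source: positioning system
    def threshold_for(self, location):
        return {"B": 0.88}.get(location, 0.95)  # step 2 source: correspondence
    def match(self, voice_info):
        return 0.91  # step 3 source: pretend matching degree from the model
    def parse_instruction(self, voice_info):
        return voice_info  # trivial parser for this sketch
    def execute(self, instruction):
        return "executed: " + instruction

def handle_voice_input(voice_info, device):
    location = device.current_location()        # step 1: current location
    threshold = device.threshold_for(location)  # step 2: threshold lookup
    degree = device.match(voice_info)           # step 3: match against model
    if degree > threshold:                      # step 4: execute if above
        return device.execute(device.parse_instruction(voice_info))
    return None  # matching degree did not exceed the threshold
```

With the stub values, a degree of 0.91 against the 88% threshold for location B leads to the instruction being executed.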
In some embodiments, when receiving the voice information input by the user, before acquiring the current geographic location of the electronic device, the processor 301 further performs the following steps:
acquiring a plurality of voice recognition historical data of electronic equipment, wherein each voice recognition historical data comprises a historical geographic position of the electronic equipment when the electronic equipment performs voice recognition;
performing cluster analysis on the plurality of voice recognition historical data to obtain the times of voice recognition of the electronic equipment at each historical geographic position;
and generating a corresponding relation between the historical geographic position and a voice recognition matching degree threshold according to the times of voice recognition of the electronic equipment at each historical geographic position.
In some embodiments, when generating the corresponding relationship between the historical geographic location and the threshold of the matching degree of speech recognition according to the number of times of speech recognition performed by the electronic device at each historical geographic location, the processor 301 performs the following steps:
setting a plurality of preset times intervals;
determining a plurality of historical geographic positions corresponding to the times included in each preset time interval according to the times of voice recognition of the electronic equipment in each historical geographic position;
and setting a voice recognition matching degree threshold value for a plurality of historical geographic positions corresponding to the times included in each preset time interval.
In some embodiments, each of the speech recognition history data further includes a history time at which the electronic device performs speech recognition;
when performing cluster analysis on a plurality of the speech recognition history data to obtain the number of times of speech recognition performed by the electronic device at each of the historical geographic locations, the processor 301 performs the following steps:
performing cluster analysis on the plurality of voice recognition historical data to obtain the number of times of voice recognition of each historical moment when the electronic equipment is located at each historical geographic position;
when generating the corresponding relationship between the historical geographic location and the threshold of the matching degree of speech recognition according to the number of times of speech recognition performed by the electronic device at each historical geographic location, the processor 301 executes the following steps:
and generating a corresponding relation between the historical geographic position, the historical time and a voice recognition matching degree threshold according to the number of times of voice recognition of the electronic equipment at each historical time when the electronic equipment is at each historical geographic position.
In some embodiments, after acquiring the current geographic location of the electronic device when the voice information input by the user is received, the processor 301 further performs the following steps:
acquiring the current time;
when obtaining the threshold of the matching degree of voice recognition according to the geographic location and the corresponding relationship between the historical geographic location and the threshold of the matching degree of voice recognition, the processor 301 executes the following steps:
and acquiring a voice recognition matching degree threshold according to the geographic position, the time and the corresponding relation among the historical geographic position, the historical time and the voice recognition matching degree threshold.
The memory 302 may be used to store computer programs and data. The memory 302 stores computer programs containing instructions executable by the processor; a computer program may comprise various functional modules. The processor 301 executes various functional applications and performs data processing by invoking the computer programs stored in the memory 302.
In some embodiments, as shown in fig. 10, the electronic device 300 further comprises: radio frequency circuit 303, display screen 304, control circuit 305, input unit 306, audio circuit 307, sensor 308, and power supply 309. The processor 301 is electrically connected to the rf circuit 303, the display 304, the control circuit 305, the input unit 306, the audio circuit 307, the sensor 308, and the power source 309, respectively.
The radio frequency circuit 303 is used for transceiving radio frequency signals to communicate with a network device or other electronic devices through wireless communication.
The display screen 304 may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be comprised of images, text, icons, video, and any combination thereof.
The control circuit 305 is electrically connected to the display screen 304, and is used for controlling the display screen 304 to display information.
The input unit 306 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 306 may include a fingerprint recognition module.
The audio circuit 307 may provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 307 includes a microphone, which is electrically connected to the processor 301 and is used to receive voice information input by the user.
The sensor 308 is used to collect external environmental information. The sensor 308 may include one or more of an ambient light sensor, an acceleration sensor, a gyroscope, and the like.
The power supply 309 is used to power the various components of the electronic device 300. In some embodiments, the power source 309 may be logically coupled to the processor 301 through a power management system, such that functions to manage charging, discharging, and power consumption management are performed through the power management system.
Although not shown in fig. 10, the electronic device 300 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
As can be seen from the above, the embodiments of the present application provide an electronic device that performs the following steps: when voice information input by a user is received, acquiring the current geographic location of the electronic device; acquiring a voice recognition matching degree threshold according to the geographic location and the correspondence between the historical geographic location and the voice recognition matching degree threshold; matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree; and when the voice recognition matching degree is greater than the voice recognition matching degree threshold, executing the operation corresponding to the instruction in the voice information. The electronic device can dynamically adjust the voice recognition matching degree threshold according to the user's usage habits in different places, which reduces the number of recognition failures, saves the time spent on voice recognition, and improves the efficiency of voice recognition.
An embodiment of the present application further provides a storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer executes the speech recognition method according to any of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, which may include, but is not limited to: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
The speech recognition method, speech recognition apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the present application, and the above description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (5)

1. A speech recognition method, comprising:
acquiring a plurality of pieces of voice recognition historical data of the electronic equipment, wherein each piece of voice recognition historical data comprises a historical geographic position of the electronic equipment when performing voice recognition and a historical moment at which the electronic equipment performed voice recognition; performing cluster analysis on the plurality of pieces of voice recognition historical data to obtain the number of voice recognitions performed at each historical moment when the electronic equipment was located at each historical geographic position; and generating a correspondence among the historical geographic position, the historical moment, and a voice recognition matching degree threshold according to the number of voice recognitions performed at each historical moment when the electronic equipment was at each historical geographic position;
when voice information input by a user is received, acquiring the current geographical position and the current time of the electronic equipment;
the electronic equipment acquires the geographic position through a positioning system, wherein the geographic position comprises coordinate information of the geographic position or area information of the geographic position;
acquiring a voice recognition matching degree threshold according to the geographic position, the time and the corresponding relation among the historical geographic position, the historical time and the voice recognition matching degree threshold;
matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree;
when the voice recognition matching degree is greater than the voice recognition matching degree threshold, indicating that the voice information successfully matches the preset voice recognition model, further analyzing, by the electronic device, the voice information to acquire a control instruction contained in the voice information, and executing an operation corresponding to the instruction in the voice information.
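The history-driven threshold generation in claim 1 can be sketched as follows. The claim does not specify a clustering algorithm or a threshold formula, so this sketch uses simple grid bucketing of (latitude, longitude, hour) records and lowers the threshold linearly with the recognition count per cluster; all function names, parameters, and numeric values are illustrative assumptions.

```python
from collections import Counter

def bucket(record, cell=0.01):
    """Quantize a (lat, lon, hour) history record into a coarse cluster key.
    This stands in for the cluster analysis named in the claim."""
    lat, lon, hour = record
    return (round(lat / cell), round(lon / cell), hour)

def build_threshold_map(history, base=0.90, step=0.02, floor=0.70):
    """history: iterable of (lat, lon, hour) records of past recognitions.
    Returns {cluster_key: threshold}, lowering the threshold as the number
    of recognitions in that cluster grows, down to a floor value."""
    counts = Counter(bucket(r) for r in history)
    return {key: max(floor, base - step * n) for key, n in counts.items()}

# Example: ten past recognitions at roughly the same place around 9 a.m.
history = [(22.5431 + i * 1e-4, 114.0579, 9) for i in range(10)]
thresholds = build_threshold_map(history)
```

With this sketch, a place-and-time cluster where the user frequently performs voice recognition receives a lower matching degree threshold, which is the correspondence the claim generates from the clustered history.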
2. A speech recognition apparatus, comprising:
a generation module, configured to acquire a plurality of pieces of voice recognition historical data of the electronic equipment, wherein each piece of voice recognition historical data comprises a historical geographic position of the electronic equipment when performing voice recognition and a historical moment at which the electronic equipment performed voice recognition; perform cluster analysis on the plurality of pieces of voice recognition historical data to obtain the number of voice recognitions performed at each historical moment when the electronic equipment was located at each historical geographic position; and generate a correspondence among the historical geographic position, the historical moment, and a voice recognition matching degree threshold according to the number of voice recognitions performed at each historical moment when the electronic equipment was at each historical geographic position;
a first acquisition module, configured to acquire the current geographic position and the current time of the electronic equipment when voice information input by a user is received, wherein the electronic equipment acquires the geographic position through a positioning system, and the geographic position comprises coordinate information of the geographic position or area information of the geographic position;
a second acquisition module, configured to acquire a voice recognition matching degree threshold according to the geographic position, the time, and the correspondence among the historical geographic position, the historical moment, and the voice recognition matching degree threshold;
a matching module, configured to match the voice information with a preset voice recognition model to obtain a voice recognition matching degree; and
an execution module, configured to, when the voice recognition matching degree is greater than the voice recognition matching degree threshold, indicating that the voice information successfully matches the preset voice recognition model, further analyze the voice information to acquire a control instruction contained in the voice information and execute an operation corresponding to the instruction in the voice information.
3. A storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the speech recognition method of claim 1.
4. An electronic device, characterized in that the electronic device comprises a processor and a memory, in which a computer program is stored, the processor being adapted to perform the speech recognition method of claim 1 by calling the computer program stored in the memory.
5. An electronic device, comprising a processor and a microphone electrically connected to the processor, wherein:
the microphone is used for receiving voice information input by a user;
the processor is configured to:
acquire a plurality of pieces of voice recognition historical data of the electronic equipment, wherein each piece of voice recognition historical data comprises a historical geographic position of the electronic equipment when performing voice recognition and a historical moment at which the electronic equipment performed voice recognition; perform cluster analysis on the plurality of pieces of voice recognition historical data to obtain the number of voice recognitions performed at each historical moment when the electronic equipment was located at each historical geographic position; generate a correspondence among the historical geographic position, the historical moment, and a voice recognition matching degree threshold according to the number of voice recognitions performed at each historical moment when the electronic equipment was at each historical geographic position; and, when voice information input by a user is received, acquire the current geographical position and the current time of the electronic equipment;
acquiring a current geographical position of electronic equipment, wherein the geographical position is acquired by the electronic equipment through a positioning system and comprises coordinate information of the geographical position or area information of the geographical position;
acquiring a voice recognition matching degree threshold according to the geographic position, the time and the corresponding relation among the historical geographic position, the historical time and the voice recognition matching degree threshold;
matching the voice information with a preset voice recognition model to obtain a voice recognition matching degree;
when the voice recognition matching degree is greater than the voice recognition matching degree threshold, indicating that the voice information successfully matches the preset voice recognition model, further analyze the voice information to acquire a control instruction contained in the voice information and execute an operation corresponding to the instruction in the voice information.
CN201810764393.1A 2018-07-12 2018-07-12 Voice recognition method, voice recognition device, storage medium and electronic equipment Active CN108922520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810764393.1A CN108922520B (en) 2018-07-12 2018-07-12 Voice recognition method, voice recognition device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN108922520A CN108922520A (en) 2018-11-30
CN108922520B true CN108922520B (en) 2021-06-01

Family

ID=64412232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810764393.1A Active CN108922520B (en) 2018-07-12 2018-07-12 Voice recognition method, voice recognition device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN108922520B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109509473B (en) * 2019-01-28 2022-10-04 维沃移动通信有限公司 Voice control method and terminal equipment
CN110570837B (en) * 2019-08-28 2022-03-11 卓尔智联(武汉)研究院有限公司 Voice interaction method and device and storage medium
CN110602391B (en) * 2019-08-30 2021-08-24 Oppo广东移动通信有限公司 Photographing control method and device, storage medium and electronic equipment
CN115731926A (en) * 2021-08-30 2023-03-03 佛山市顺德区美的电子科技有限公司 Control method and device of intelligent equipment, intelligent equipment and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657433B1 (en) * 2006-09-08 2010-02-02 Tellme Networks, Inc. Speech recognition accuracy with multi-confidence thresholds
CN105788598B (en) * 2014-12-19 2019-12-24 联想(北京)有限公司 Voice processing method and electronic equipment
CN104902070A (en) * 2015-04-13 2015-09-09 青岛海信移动通信技术股份有限公司 Mobile terminal voice control method and mobile terminal
JP6716968B2 (en) * 2016-03-07 2020-07-01 株式会社デンソー Speech recognition device, speech recognition program
CN106250751B (en) * 2016-07-18 2019-09-17 青岛海信移动通信技术股份有限公司 A kind of mobile device and the method for adjusting sign information detection threshold value
CN106250876B (en) * 2016-08-19 2019-10-22 深圳市金立通信设备有限公司 A kind of fingerprint identification method and terminal


Similar Documents

Publication Publication Date Title
CN108922520B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN110121118B (en) Video clip positioning method and device, computer equipment and storage medium
CN109003607B (en) Voice recognition method, voice recognition device, storage medium and electronic equipment
CN111476306B (en) Object detection method, device, equipment and storage medium based on artificial intelligence
US20210005216A1 (en) Multi-person speech separation method and apparatus
CN107147618B (en) User registration method and device and electronic equipment
CN110471858B (en) Application program testing method, device and storage medium
CN111063342B (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN111933112B (en) Awakening voice determination method, device, equipment and medium
CN108810280B (en) Voice acquisition frequency processing method and device, storage medium and electronic equipment
CN111339737B (en) Entity linking method, device, equipment and storage medium
CN110556127A (en) method, device, equipment and medium for detecting voice recognition result
CN110827824B (en) Voice processing method, device, storage medium and electronic equipment
CN111105788B (en) Sensitive word score detection method and device, electronic equipment and storage medium
US11636867B2 (en) Electronic device supporting improved speech recognition
CN108600559B (en) Control method and device of mute mode, storage medium and electronic equipment
CN110858479B (en) Voice recognition model updating method and device, storage medium and electronic equipment
CN110248401B (en) WiFi scanning control method and device, storage medium and mobile terminal
CN113220590A (en) Automatic testing method, device, equipment and medium for voice interaction application
CN108920065B (en) Split screen window adjusting method and device, storage medium and electronic equipment
CN107154996B (en) Incoming call interception method and device, storage medium and terminal
CN113744736B (en) Command word recognition method and device, electronic equipment and storage medium
CN112116908B (en) Wake-up audio determining method, device, equipment and storage medium
CN113032560B (en) Sentence classification model training method, sentence processing method and equipment
CN113823266A (en) Keyword detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant