WO2018219105A1

WO2018219105A1 - Speech recognition and related products

Info

Publication number: WO2018219105A1
Application number: PCT/CN2018/086205
Authority: WO
Inventors: 白剑
Original assignee: Oppo广东移动通信有限公司
Priority date: 2017-05-31
Filing date: 2018-05-09
Publication date: 2018-12-06
Also published as: CN107274885A; CN107274885B

Abstract

Disclosed in the embodiments of the invention are a voice recognition method and a related product. The method comprises the following steps: acquiring a geographical location of a mobile terminal, and determining a dialect type corresponding to the geographical location (101); acquiring a recognition algorithm corresponding to the dialect type as a target algorithm (102); and after voice data is collected, using the target algorithm to recognize the voice data to get a recognition result (103). The type of dialect used in an area to which a mobile terminal belongs is determined according to the geographical location of the mobile terminal, so that a corresponding algorithm can be used to improve the accuracy of voice recognition. Therefore, the recognition accuracy of nonstandard voice is improved.

Description

Speech recognition method and related products

The present application claims priority to application Serial No. 201710401786.1, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present invention relates to the field of computer technology, and in particular to a voice recognition method and related products.

Background art

Communicating with machines and having them understand what is said is something people have long dreamed of. The China Internet of Things School-Enterprise Alliance vividly likens speech recognition to the auditory system of a machine. Speech recognition technology allows a machine to transform a speech signal into corresponding text or commands through a process of recognition and understanding.

Speech recognition technology mainly includes three aspects: feature extraction, pattern matching criteria, and model training. Speech recognition has also been widely applied in the Internet of Vehicles; for example, a destination can be set by dictation alone and navigation started directly, which is safe and convenient.

Speech recognition is an interdisciplinary subject. In the past two decades, speech recognition technology has made significant progress and has begun to move from the laboratory to the market. It is expected that in the next 10 years, speech recognition technology will enter various fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The areas covered by speech recognition technology include: signal processing, pattern recognition, probability theory and information theory, vocal mechanism and auditory mechanism, artificial intelligence, and so on.

How to improve the accuracy and speed of speech recognition is a direction pursued by technicians in the field. At present, because people speak with accents and even dialects that differ greatly, speech recognition faces great difficulties, so a solution needs to be proposed.

Summary of the invention

The embodiment of the invention provides a speech recognition method and related products for improving the accuracy of recognition of non-standard speech.

In a first aspect, an embodiment of the present invention provides a voice recognition method, including:

Obtaining a geographic location of the mobile terminal, and determining a dialect type corresponding to the geographic location;

Obtaining a recognition algorithm corresponding to the dialect type as a target algorithm;

After the voice data is collected, the target algorithm is used to perform voice recognition on the voice data to obtain a recognition result.

In a possible implementation, the acquiring the geographic location of the mobile terminal includes:

After the mobile terminal is started, acquiring a history set, where the history set is obtained by the mobile terminal recording its location information each time it is started; and analyzing the history set to obtain a geographic area to which the mobile terminal belongs as the geographic location.

In a possible implementation manner, before the determining the dialect type corresponding to the geographic location, the method further includes:

Establishing a database of the correspondence between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.

In a possible implementation, before the acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:

Establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.

In a possible implementation manner, acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:

When the number of determined dialect types is greater than one, a recognition algorithm corresponding to each dialect type is obtained as a target algorithm.

In a possible implementation manner, the using the target algorithm to perform voice recognition on the voice data to obtain a recognition result includes:

Voice recognition is performed on the voice data using each of the acquired target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.

In a possible implementation, after the voice data is voice-recognized to obtain the recognition result by using the target algorithm, the method further includes:

Sorting the recognition results by accuracy probability from largest to smallest;

Outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold;

Receiving a selection instruction;

After the selection instruction specifies an accurate recognition result in the at least two recognition results, the target algorithm is modified to a recognition algorithm corresponding to the recognition result.

In a possible implementation, the method further includes: recording the recognition result in a recognition result set, determining the type of recognition result with the highest accuracy in the recognition result set, and using the recognition algorithm corresponding to that type of recognition result as the recognition algorithm for subsequent speech recognition.

In this embodiment, the speech recognition algorithm can be adjusted dynamically. On the one hand, it is adjusted according to the geographic location; more importantly, based on the recognition results obtained after the recognition algorithm has been adjusted multiple times, a better-optimized recognition algorithm can be determined as the final recognition algorithm, which for a private device gives higher accuracy and a higher recognition speed. Subsequently, the foregoing “acquiring the geographical location of the mobile terminal, determining the dialect type corresponding to the geographical location, and acquiring the recognition algorithm corresponding to the dialect type as the target algorithm” may be performed.
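The recognition result set described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class name `ResultSet`, the string algorithm identifiers, and the use of the mean accuracy probability per algorithm as the measure of "highest accuracy" are all hypothetical choices, since the text does not specify how accuracy is aggregated.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ResultSet:
    """Hypothetical recognition result set keyed by recognition algorithm."""

    def __init__(self) -> None:
        # algorithm identifier -> accuracy probabilities of its recorded results
        self._probabilities: Dict[str, List[float]] = defaultdict(list)

    def record(self, algorithm: str, probability: float) -> None:
        """Record one recognition result produced by the given algorithm."""
        self._probabilities[algorithm].append(probability)

    def preferred_algorithm(self) -> Optional[str]:
        """Return the algorithm with the highest mean accuracy probability.

        Using the mean is an assumption made for this sketch.
        """
        if not self._probabilities:
            return None
        return max(
            self._probabilities,
            key=lambda a: sum(self._probabilities[a]) / len(self._probabilities[a]),
        )
```

The algorithm returned by `preferred_algorithm` could then be used for subsequent speech recognition on the same device.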

In a second aspect, an embodiment of the present invention further provides a mobile terminal, including a processing unit and an input/output unit.

The input/output unit is configured to receive input data and output data;

The processing unit is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographic location, acquire a recognition algorithm corresponding to the dialect type as a target algorithm, and, after the voice data is collected, use the target algorithm to perform speech recognition on the voice data to obtain a recognition result.

In a possible implementation, the processing unit is further configured to: after the mobile terminal is started, collect location information of the mobile terminal to obtain a history set; and analyze the history set to obtain the geographic area to which the mobile terminal belongs as the geographic location.

In a third aspect, an embodiment of the present invention further provides a mobile terminal, including one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the steps of any of the methods provided by embodiments of the present invention.

The present invention further provides a computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any one of claims 1-6, and the computer includes a mobile terminal.

It can be seen that, in the embodiment of the present invention, the type of dialect used in the area to which the mobile terminal belongs is determined from the geographic location of the mobile terminal, so that the corresponding recognition algorithm can be used to improve the accuracy of voice recognition, thereby improving the recognition accuracy of non-standard voice.

Brief description of the drawings

The drawings referred to in the embodiments of the present invention will be briefly described below.

FIG. 1 is a schematic flowchart of a method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of an interface according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.

The terms "first", "second" and the like in the specification and claims of the present invention and the above drawings are used to distinguish different objects, and are not intended to describe a specific order. Furthermore, the terms "comprising" and "including", and any variants thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or other steps or units inherent to such a process, method, product, or device.

References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

The mobile terminal involved in the embodiments of the present invention may include various mobile handheld devices, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to the wireless modem, and various forms of user equipment (User Equipment, UE), mobile station (MS), terminal device, and the like. For convenience of description, the devices mentioned above are collectively referred to as mobile terminals.

The accuracy of speech recognition has always been a big problem in speech recognition. At present, various algorithms are used to improve the accuracy of speech recognition. However, for mobile terminals, users vary widely, and language types are easy to distinguish, but local dialects cause great trouble.

In the embodiment of the present invention, the non-standard voice is relative to the standard voice, and the standard voice may be: Mandarin pronunciation of Chinese, or some dialect pronunciations that are included in the standard. This will not be repeated hereafter.

The embodiments of the present invention are described below with reference to the accompanying drawings.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a voice recognition method according to an embodiment of the present invention, which is applied to a mobile terminal. As shown in the figure, the voice recognition method includes:

101. Obtain a geographic location of the mobile terminal, and determine a dialect type corresponding to the geographical location.

In this embodiment, the geographic location may be represented by means of latitude and longitude, or administrative division, etc.; it may also be represented by a preset dialect area division, and is not limited to the latitude and longitude manner to represent the geographical location.
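As an illustration of representing a location by a preset dialect area division, the following minimal sketch maps a latitude/longitude pair to a dialect area. The `DialectArea` type, the rectangular bounds, and the area names used in the usage note are hypothetical; real dialect areas are not rectangular, and the text does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DialectArea:
    """Hypothetical preset dialect area described by a bounding box."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        """Return True if the point lies inside this area's bounding box."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def locate(lat: float, lon: float, areas: List[DialectArea]) -> Optional[str]:
    """Return the name of the first preset area containing the point, if any."""
    for area in areas:
        if area.contains(lat, lon):
            return area.name
    return None
```

For example, with two artificial areas `DialectArea("area_a", 0, 10, 0, 10)` and `DialectArea("area_b", 10, 20, 0, 10)`, the call `locate(5, 5, areas)` selects `"area_a"`.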

The dialect type refers to the category to which a dialect belongs. At present, there are mainly seven types in China, namely:

1. Northern dialect (abbreviation: Northern);

2. Cantonese (abbreviation: Yue);

3. Jiangsu and Zhejiang dialects (abbreviation: Wu);

4. Fujian dialect (abbreviation: Min);

5. Hunan dialect (abbreviation: Xiang);

6. Jiangxi dialect (abbreviation: Gan);

7. Hakka dialect (abbreviation: Hakka).

There are many other dialect types in addition to this, which are not listed here.

102. Obtain a recognition algorithm corresponding to the dialect type as the target algorithm.

In the research and development of speech recognition, researchers have designed and produced speech databases for various languages, such as Chinese (including different dialects) and English, according to the pronunciation characteristics of each language. Examples include the MIT Media Lab Speech Dataset, Pitch and Voicing Estimates for Aurora 2 (pitch period and voicing estimates for the Aurora 2 speech corpus), Congressional Speech Data, Mandarin Speech Frame Data, voice data used to test blind source separation algorithms, and the like.

Therefore, different dialect types may have different recognition algorithms corresponding to them; in particular, different recognition algorithms may correspond to speech databases of the standard pronunciation of different dialect types. For the determined dialect type, recognition speed and accuracy can thus be improved in a targeted manner.

103. After the voice data is collected, perform voice recognition on the voice data by using the target algorithm to obtain a recognition result.

The voice data collected above may come from a person speaking to the terminal device: the voice pickup device of the terminal device, for example a microphone, collects the voice data input by the user. Once the speech recognition algorithm, that is, the target algorithm, is determined, the specific recognition process is not described in detail in the embodiments of the present invention. It can be understood that, for different dialects, a voice database of the corresponding dialect can be used in conjunction with the recognition algorithm.

In this embodiment, the type of dialect used in the area to which the mobile terminal belongs is determined from the geographic location of the mobile terminal, so that the corresponding recognition algorithm can be used to improve the accuracy of voice recognition, thereby improving the recognition accuracy of non-standard voice.
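Steps 101 to 103 can be sketched as a small pipeline. The following is a minimal illustration under stated assumptions, not the implementation of the embodiment: the lookup tables, the region and dialect labels, the `Recognizer` signature, and the stub recognizers are all hypothetical placeholders, and a real recognizer would decode the audio instead of returning a fixed probability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical recognizer signature: raw audio bytes -> (text, probability).
Recognizer = Callable[[bytes], Tuple[str, float]]

# Illustrative table only; the entries are placeholders, not patent data.
REGION_TO_DIALECTS: Dict[str, List[str]] = {
    "Guangdong": ["Cantonese", "Hakka"],
    "Shanghai": ["Wu"],
}


def _stub(label: str) -> Recognizer:
    """Build a stand-in recognizer that only tags its output with a label."""
    def run(audio: bytes) -> Tuple[str, float]:
        return (f"<{label} transcript of {len(audio)} bytes>", 0.5)
    return run


DIALECT_TO_ALGORITHM: Dict[str, Recognizer] = {
    "Cantonese": _stub("Cantonese"),
    "Hakka": _stub("Hakka"),
    "Wu": _stub("Wu"),
}


def determine_dialects(region: str) -> List[str]:
    """Step 101: map the terminal's geographic location to dialect types."""
    return REGION_TO_DIALECTS.get(region, [])


def get_target_algorithms(dialects: List[str]) -> List[Recognizer]:
    """Step 102: fetch the recognition algorithm for each dialect type."""
    return [DIALECT_TO_ALGORITHM[d] for d in dialects if d in DIALECT_TO_ALGORITHM]


def recognize(region: str, audio: bytes) -> List[Tuple[str, float]]:
    """Step 103: run the target algorithm(s) on the collected voice data."""
    algorithms = get_target_algorithms(determine_dialects(region))
    return [algorithm(audio) for algorithm in algorithms]
```

A region with no known dialect type yields an empty result list here; a practical system would presumably fall back to a standard-speech recognizer in that case, which is outside this sketch.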

In an optional implementation manner, because the location information obtained by the terminal device is not necessarily the usual or real location of the terminal device, for example in the case of the mobile terminal of a traveling user, the embodiment of the present invention provides the following solution. Acquiring the geographic location of the mobile terminal includes:

After the mobile terminal is started, acquiring a history set, where the history set is obtained by the mobile terminal recording its location information each time it is started; and analyzing the history set to obtain the geographic area to which the mobile terminal belongs as the geographic location.

In this embodiment, the history set is used to determine the area to which the terminal device belongs. This can avoid the problem that the mobile terminal frequently moves in various dialect areas to cause inaccurate judgment.

The history set may be analyzed as follows: determine the geographic area in which the terminal device stays for a long time, which is most likely the real geographic area of the mobile terminal. Examples are the location where a car is parked most often, or the location of a phone at night.
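One way to analyze the history set is to total the time spent in each area and pick the largest. The sketch below assumes, hypothetically, that each history entry is an `(area, hours)` pair; the text does not specify the record format or the exact criterion for "a long time".

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def home_area(history: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Return the area with the largest total dwell time in the history set.

    Each entry is an assumed (area name, hours spent) pair.
    """
    totals: Dict[str, float] = defaultdict(float)
    for area, hours in history:
        totals[area] += hours
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With this approach a short trip through another dialect area adds little dwell time and so does not change the area attributed to the terminal.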

In an optional implementation manner, the embodiment of the present invention further provides a solution in which a database is established in advance to improve recognition speed and accuracy, as follows. Before the determining the dialect type corresponding to the geographic location, the method further includes:

Establishing a database of the correspondence between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.

In this embodiment, by establishing the database of geographic areas and dialect types, more finely divided dialects can be recognized more accurately, for example:

Wu is also known as the Jiangsu-Zhejiang dialect or the Jiangnan dialect. In the past, the Suzhou dialect was taken as its representative. Nowadays, with the economic development of Shanghai, the number of Shanghai dialect speakers has gradually increased, so the representative of Wu today is the Shanghai dialect. It is mainly spoken in the part of Jiangsu Province south of the Yangtze River and east of Zhenjiang, a small part of Nantong, and in Shanghai and most of Zhejiang. It can be divided into five subgroups:

(1) The Taihu subgroup, represented by the Shanghai dialect and covering Shanghai, Changzhou, Hangzhou and Ningbo.

(2) The Taizhou subgroup, represented by the Linhai dialect.

(3) The Dong'ou subgroup, represented by the Wenzhou dialect.

(4) The Wuzhou subgroup, represented by the Jinhua dialect.

(5) The Liqu subgroup, represented by the Lishui dialect.

It can be seen that the same dialect type is also divided into a variety of more detailed branches, so the establishment of the corresponding database can further improve the accuracy of speech recognition.
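The correspondence database between geographic areas and dialect types could be sketched with an in-memory SQLite table. The table name, column names, and rows below are hypothetical; in particular `border_area` is an invented area used only to show one area mapping to several dialect types.

```python
import sqlite3
from typing import List


def build_area_dialect_db() -> sqlite3.Connection:
    """Create an in-memory table mapping geographic areas to dialect types."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE area_dialect ("
        "area TEXT NOT NULL, dialect TEXT NOT NULL, "
        "PRIMARY KEY (area, dialect))"
    )
    rows = [
        # Finer-grained branches of one dialect type get their own rows.
        ("Shanghai", "Wu/Shanghai dialect"),
        ("Wenzhou", "Wu/Wenzhou dialect"),
        ("Jinhua", "Wu/Jinhua dialect"),
        # Hypothetical area where two dialect types are in use.
        ("border_area", "Hakka"),
        ("border_area", "Cantonese"),
    ]
    conn.executemany("INSERT INTO area_dialect VALUES (?, ?)", rows)
    conn.commit()
    return conn


def dialects_for(conn: sqlite3.Connection, area: str) -> List[str]:
    """Return all dialect types recorded for the area, sorted by name."""
    cursor = conn.execute(
        "SELECT dialect FROM area_dialect WHERE area = ? ORDER BY dialect",
        (area,),
    )
    return [row[0] for row in cursor.fetchall()]
```

The composite primary key allows one area to appear with several dialect types while preventing duplicate pairs.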

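In an optional implementation manner, before the acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes: establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.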

In an optional implementation manner, since the dialects of the region where a geographic location lies may be complex, a plurality of dialect types may be determined. This embodiment provides the following solution: acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes: when the number of determined dialect types is greater than one, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.

In this embodiment, multiple recognition algorithms may correspond to different dialect types; it is also possible that multiple dialect types correspond to one recognition algorithm, so the number of recognition algorithms may be less than the number of dialect types.
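Because several dialect types may share one recognition algorithm, the set of target algorithms can be smaller than the set of dialect types. A minimal sketch, with hypothetical dialect labels and algorithm identifiers:

```python
from typing import Dict, List

# Hypothetical mapping; two dialect types share the identifier "algo_wu".
DIALECT_TO_ALGORITHM: Dict[str, str] = {
    "Cantonese": "algo_yue",
    "Hakka": "algo_hakka",
    "Shanghai dialect": "algo_wu",
    "Suzhou dialect": "algo_wu",
}


def target_algorithms(dialects: List[str]) -> List[str]:
    """Return the distinct algorithms for the dialect types, in first-seen order."""
    seen = set()
    targets: List[str] = []
    for dialect in dialects:
        algorithm = DIALECT_TO_ALGORITHM.get(dialect)
        if algorithm is not None and algorithm not in seen:
            seen.add(algorithm)
            targets.append(algorithm)
    return targets
```

Deduplicating avoids running the same recognition algorithm twice on one utterance when two determined dialect types map to it.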

In an optional implementation, when a plurality of different recognition algorithms are used, multiple different recognition results may occur. This embodiment provides the following solution: using the target algorithm to perform voice recognition on the voice data to obtain a recognition result includes:

Voice recognition is performed on the voice data using each of the acquired target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.

According to probability theory, each recognition result corresponds to an accuracy probability. Since the recognition result obtained by each recognition algorithm has such a probability, the recognition result with the largest probability value can be used as the final recognition result.
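Selecting the final result among several target algorithms amounts to taking the maximum over the reported probabilities. In this minimal sketch, the `Recognizer` signature returning a `(text, probability)` pair is an assumption made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumed recognizer signature: raw audio bytes -> (text, probability).
Recognizer = Callable[[bytes], Tuple[str, float]]


def best_result(audio: bytes,
                algorithms: Iterable[Recognizer]) -> Optional[Tuple[str, float]]:
    """Run every target algorithm and keep the most probable result."""
    results = [algorithm(audio) for algorithm in algorithms]
    if not results:
        return None
    return max(results, key=lambda result: result[1])
```

Note that probabilities reported by different algorithms are compared directly here, which presumes they are calibrated on a comparable scale.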

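In an optional implementation manner, the embodiment of the present invention further provides a selection scheme for further correcting the recognition algorithm, as shown in FIG. 2, specifically as follows. After the voice recognition is performed on the voice data by using the target algorithm to obtain the recognition result, the method further includes:

Sorting the recognition results by accuracy probability from largest to smallest;

Outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold;

Receiving a selection instruction;

After the selection instruction specifies the accurate recognition result among the at least two recognition results, modifying the target algorithm to the recognition algorithm corresponding to that recognition result.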

In FIG. 2, two kinds of recognition results are displayed; the two recognition results can be displayed in the form of text, or can be played by using a voice, and if the voice is played, the corresponding dialect can be further played.

In this embodiment, after the voice data is collected, two or more recognition results are obtained by using one or more algorithms, and the recognition result confirmed by the user as more accurate then indicates which algorithm is better. The solution is well suited to devices such as mobile phones, which are relatively private or whose users have similar accents, and can improve the recognition accuracy of non-standard voice while maintaining recognition speed.
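The selection scheme can be sketched as two small functions: one that prepares the candidates to display and one that applies the user's choice. The `Candidate` type, the function names, and the index-based selection instruction are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One recognition result together with the algorithm that produced it."""
    algorithm: str
    text: str
    probability: float


def candidates_to_show(results: List[Candidate], threshold: float) -> List[Candidate]:
    """Sort by probability (descending) and keep those at or above the threshold.

    Returns an empty list unless at least two candidates qualify, since a
    choice is only offered between two or more results.
    """
    ranked = sorted(results, key=lambda c: c.probability, reverse=True)
    shown = [c for c in ranked if c.probability >= threshold]
    return shown if len(shown) >= 2 else []


def apply_selection(shown: List[Candidate], selected_index: int,
                    current_algorithm: str) -> str:
    """Return the new target algorithm after the user's selection instruction."""
    if 0 <= selected_index < len(shown):
        return shown[selected_index].algorithm
    # An invalid selection leaves the target algorithm unchanged.
    return current_algorithm
```

The value returned by `apply_selection` would replace the current target algorithm for later recognition on the same device.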

As shown in FIG. 3, a voice recognition device is provided in the embodiment of the present invention. The voice recognition device may be a mobile terminal, and specifically includes:

a location obtaining unit 301, configured to acquire a geographic location of the mobile terminal;

a type determining unit 302, configured to determine a dialect type corresponding to the geographical location;

An algorithm obtaining unit 303, configured to acquire a recognition algorithm corresponding to the dialect type as a target algorithm;

The identifying unit 304 is configured to perform voice recognition on the voice data to obtain a recognition result by using the target algorithm after the voice data is collected.

In an optional implementation manner, since the dialects of the region where a geographic location lies may be complex, a plurality of dialect types may be determined. This embodiment provides the following solution: the location obtaining unit 301 being configured to acquire the geographic location of the mobile terminal includes:

In an optional implementation manner, a plurality of different recognition results may occur because multiple recognition algorithms are used. This embodiment provides the following solution:

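The voice recognition device further includes a data establishing unit 305, configured to, before the dialect type corresponding to the geographic location is determined, establish a database of the correspondence between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.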

In this embodiment, by establishing the database of geographic areas and dialect types, more finely divided dialects can be recognized more accurately. The same dialect type is also divided into a variety of more detailed branches, so establishing the corresponding database can further improve the accuracy of speech recognition.

The data establishing unit 305 is further configured to establish a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.

In an optional implementation manner, since the dialects of the region where a geographic location lies may be complex, a plurality of dialect types may be determined. This embodiment provides the following solution: the type determining unit 302 being configured to acquire the recognition algorithm corresponding to the dialect type as the target algorithm includes:

The identification unit 304 is configured to perform voice recognition on the voice data by using the target algorithm to obtain a recognition result, including:

In an optional implementation manner, the embodiment of the present invention further provides a selection scheme for further correcting the identification algorithm, as shown in FIG. 2, specifically as follows: the foregoing voice recognition device further includes:

The algorithm modifying unit 306 is configured to: after the voice recognition is performed on the voice data by using the target algorithm to obtain the recognition result, sort the recognition results by accuracy probability from largest to smallest; output at least two recognition results whose accuracy probability is greater than or equal to the preset threshold; receive the selection instruction; and, after the selection instruction specifies the accurate recognition result among the at least two recognition results, modify the target algorithm to the recognition algorithm corresponding to that recognition result.

As shown in FIG. 4, an embodiment of the present invention further provides a mobile terminal, including a processing unit 402 and an input/output unit 403. The processing unit 402 is configured to control and manage actions of the terminal device. For example, the processing unit 402 is configured to support the terminal device in performing steps 101-103 of FIG. 1 or other processes of the techniques described herein. The input/output unit 403 supports data input and output. The terminal device may further include a storage unit 401 for storing program code and data of the terminal device.

The processing unit 402 can be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like. The input/output unit 403 may be a microphone, an earpiece, a speaker, etc., and the storage unit 401 may be a memory.

The input/output unit 403 is configured to receive input data and output data.

The processing unit 402 is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographic location, and acquire a recognition algorithm corresponding to the dialect type as a target algorithm; and, after the voice data is collected, use the target algorithm to perform speech recognition on the voice data to obtain the recognition result.

In an optional implementation, the processing unit 402 is further configured to: after the mobile terminal is started, acquire a history set, where the history set is obtained by the mobile terminal recording its location information each time it is started; and analyze the history set to obtain the geographic area to which the mobile terminal belongs as the geographic location.

For other processes that the processing unit 402 is configured to execute, reference may be made to the foregoing method embodiments, and details are not described herein again.

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention. As shown, the mobile terminal includes one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the following steps:

Obtaining a geographic location of the mobile terminal, and determining a dialect type corresponding to the geographic location; acquiring a recognition algorithm corresponding to the dialect type as a target algorithm; and, after the voice data is collected, using the target algorithm to perform voice recognition on the voice data to obtain a recognition result.

In this embodiment, the geographic location may be represented by means of latitude and longitude, or administrative division, etc.; it may also be represented by a preset dialect area division, and is not limited to the latitude and longitude manner to represent the geographical location. The dialect type refers to the kind to which the dialect belongs. At present, there are mainly seven types in China.

In this embodiment, by establishing the database of geographic areas and dialect types, more finely divided dialects can be recognized more accurately. The same dialect type is also divided into a plurality of more detailed branches, so establishing the corresponding database can further improve the accuracy of speech recognition.

In an optional implementation manner, the embodiment of the present invention further provides a solution in which a database of the correspondence between dialect types and recognition algorithms is established in advance to improve recognition speed and accuracy, as follows. Before the recognition algorithm corresponding to the dialect type is acquired as the target algorithm, the method further includes: establishing a database of the correspondence between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.

In an optional implementation manner, the embodiment of the present invention further provides a selection scheme for further correcting the identification algorithm, which is specifically as follows: after performing the voice recognition on the voice data by using the foregoing target algorithm to obtain the recognition result, the method further includes:

Sorting the recognition results by accuracy probability from largest to smallest; outputting at least two recognition results whose accuracy probability is greater than or equal to the preset threshold; receiving the selection instruction; and, after the selection instruction specifies the accurate recognition result among the at least two recognition results, modifying the target algorithm to the recognition algorithm corresponding to that recognition result.

The above description mainly introduces the solution of the embodiment of the present invention from the perspective of the method-side execution process. It can be understood that, in order to implement the above functions, the mobile terminal includes corresponding hardware structures and/or software modules for performing the various functions. Those skilled in the art will readily appreciate that, in combination with the elements and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or in computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

The embodiment of the present invention may divide the mobile terminal into functional units according to the foregoing method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiment of the present invention is schematic and is only a logical function division; the actual implementation may use another division manner.

The embodiment of the present invention further provides another mobile terminal. As shown in FIG. 6, for convenience of description, only the parts related to the embodiment of the present invention are shown. For specific technical details that are not disclosed, refer to the method section of the embodiments of the present invention. The mobile terminal can be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer. The following takes a mobile phone as an example:

FIG. 6 is a block diagram showing a partial structure of a mobile phone related to the mobile terminal provided by an embodiment of the present invention. Referring to FIG. 6, the mobile phone includes: a radio frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, a power supply 990, and other components. It will be understood by those skilled in the art that the structure of the mobile phone shown in FIG. 6 does not constitute a limitation on the mobile phone, which may include more or fewer components than those illustrated, may combine some components, or may use a different arrangement of components.

The following describes the components of the mobile phone in detail with reference to FIG. 6:

The RF circuit 910 can be used for receiving and transmitting information. Generally, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 can also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Message Service (SMS), and the like.

The memory 920 can be used to store software programs and modules, and the processor 980 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system, an application required for at least one function, and the like; the data storage area can store data created according to the use of the mobile phone (such as application usage parameters) and the like. Further, the memory 920 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.

The input unit 930 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the handset. Specifically, the input unit 930 may include a fingerprint sensor 931 and other input devices 932. The fingerprint sensor 931 can collect fingerprint data of the user. In addition to the fingerprint sensor 931, the input unit 930 may also include other input devices 932. Specifically, the other input device 932 may include, but is not limited to, one or more of a touch screen, a physical button, a function key (such as a volume control button, a switch button, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 940 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 940 can include a display screen 941. Optionally, the display screen 941 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Although in FIG. 6 the fingerprint sensor 931 and the display screen 941 are two separate components that implement the input and output functions of the mobile phone, in some embodiments the fingerprint sensor 931 can be integrated with the display screen 941 to implement the input and playback functions of the mobile phone.

The mobile phone may also include at least one type of sensor 950, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen 941 according to the brightness of the ambient light, and the proximity sensor may turn off the display screen 941 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity. It can be used for applications that identify the attitude of the mobile phone (such as switching between landscape and portrait orientation, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). The mobile phone can also be configured with a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and other sensors, which are not described here.

The audio circuit 960, a speaker 961, and a microphone 962 can provide an audio interface between the user and the mobile phone. The audio circuit 960 can transmit the electrical signal converted from received audio data to the speaker 961, which converts it into a sound signal for playback. Conversely, the microphone 962 converts the collected sound signal into an electrical signal, which the audio circuit 960 receives and converts into audio data. The audio data is then processed by the processor 980 and either sent to another mobile phone via the RF circuit 910 or output to the memory 920 for further processing.

WiFi is a short-range wireless transmission technology, and the mobile phone can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 970, which provides users with wireless broadband Internet access. Although FIG. 6 shows the WiFi module 970, it can be understood that it does not belong to the essential configuration of the mobile phone, and can be omitted as needed within the scope of not changing the essence of the invention.

The processor 980 is the control center of the mobile phone. It connects the various parts of the entire mobile phone using various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 920 and invoking the data stored in the memory 920, thereby monitoring the mobile phone as a whole. Optionally, the processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 980.

The mobile phone also includes a power supply 990 (such as a battery) that supplies power to the various components. Preferably, the power supply can be logically connected to the processor 980 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.

Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.

In the foregoing embodiment shown in FIG. 1, the method flow of each step can be implemented based on the structure of the mobile phone.
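
The overall flow (determine the dialect type from the geographic location, obtain the corresponding recognition algorithms, then recognize the collected voice data) can be sketched as below. This is a sketch under stated assumptions: all table contents, area and dialect names, and the stub recognizers are hypothetical, and picking the most frequently recorded area is only one possible way to analyze the location history, which the embodiments do not specify here.

```python
from collections import Counter
from typing import Callable, Dict, List

Recognizer = Callable[[bytes], str]


def make_stub(name: str) -> Recognizer:
    """Build a placeholder recognizer that only labels its output."""
    return lambda audio: f"{name}({len(audio)} bytes)"


# Hypothetical table: one geographic area maps to one or more dialect types.
AREA_TO_DIALECTS: Dict[str, List[str]] = {
    "area_1": ["dialect_a"],
    "area_2": ["dialect_a", "dialect_b"],
}

# Hypothetical table: one dialect type maps to one recognition algorithm.
DIALECT_TO_ALGORITHM: Dict[str, Recognizer] = {
    "dialect_a": make_stub("algo_a"),
    "dialect_b": make_stub("algo_b"),
}


def geographic_area(history: List[str]) -> str:
    """Assumed analysis: take the most frequently recorded area."""
    if not history:
        raise ValueError("empty location history")
    return Counter(history).most_common(1)[0][0]


def recognize(history: List[str], audio: bytes) -> List[str]:
    area = geographic_area(history)                        # geographic location
    dialects = AREA_TO_DIALECTS[area]                      # dialect type(s)
    algorithms = [DIALECT_TO_ALGORITHM[d] for d in dialects]  # target algorithm(s)
    return [algorithm(audio) for algorithm in algorithms]  # recognition result(s)
```

When an area maps to several dialect types, the sketch returns one result per algorithm; choosing among them (for example by highest accuracy probability) is left to the caller.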

In the foregoing embodiments shown in FIGS. 3 to 4, the function of each unit can be implemented based on the structure of the mobile phone.

The embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, the computer program causing the computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments. The above computer includes a mobile terminal.

The embodiment of the present invention further provides a computer program product, the computer program product comprising a non-transitory computer readable storage medium storing a computer program, the computer program being operable to cause a computer to execute part or all of the steps of any one of the methods described in the foregoing method embodiments. The computer program product can be a software installation package, and the computer includes a mobile terminal.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described action sequence, because certain steps may be performed in other sequences or concurrently in accordance with the present invention. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis; for details not described in a given embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided herein, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the above units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical or in other forms.

The units described above as separate components may or may not be physically separated. The components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The above integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, can be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention. The foregoing memory includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

A person skilled in the art can understand that all or part of the steps of the foregoing embodiments can be completed by a program instructing related hardware. The program can be stored in a computer readable memory, and the memory can include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments of the present invention have been described in detail above, and the principles and implementations of the present invention are explained herein. The description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. Those skilled in the art should understand that the content of this specification should not be construed as limiting the present invention.

Claims

A speech recognition method, comprising:

obtaining a geographic location of a mobile terminal, and determining a dialect type corresponding to the geographic location;

obtaining a recognition algorithm corresponding to the dialect type as a target algorithm; and

after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
The method of claim 1, wherein the obtaining the geographic location of the mobile terminal comprises:

after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by the mobile terminal collecting statistics on its location information after each startup; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
The method according to claim 2, further comprising, before the determining the dialect type corresponding to the geographic location:

establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
The method according to claim 3, further comprising, before the obtaining the recognition algorithm corresponding to the dialect type as the target algorithm:

establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
The method according to claim 4, wherein the obtaining the recognition algorithm corresponding to the dialect type as the target algorithm comprises:

when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
The method according to claim 5, wherein the using the target algorithm to perform speech recognition on the speech data to obtain a recognition result comprises:

performing speech recognition on the voice data using each of the acquired target algorithms, and using the recognition result with the highest accuracy probability as the final recognition result.
The method according to claim 5, wherein after speech recognition is performed on the voice data using the target algorithm to obtain a recognition result, the method further comprises:

sorting the recognition results by accuracy probability in descending order;

outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold;

receiving a selection instruction; and

after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
A mobile terminal, comprising: one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the following operations:

obtaining a geographic location of the mobile terminal, and determining a dialect type corresponding to the geographic location;

obtaining a recognition algorithm corresponding to the dialect type as a target algorithm; and

after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
The mobile terminal according to claim 8, wherein, for obtaining the geographic location of the mobile terminal, the instructions in the program are specifically configured to perform the following operations:

after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by the mobile terminal collecting statistics on its location information after each startup; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
The mobile terminal of claim 9, wherein the program further comprises instructions for performing the following operations:

establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
The mobile terminal of claim 10, wherein the program further comprises instructions for performing the following operations:

establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
The mobile terminal according to claim 11, wherein, for obtaining the recognition algorithm corresponding to the dialect type as the target algorithm, the instructions in the program are specifically configured to perform the following operations:

when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
The mobile terminal according to claim 12, wherein, for performing speech recognition on the voice data using the target algorithm to obtain a recognition result, the instructions in the program are specifically configured to perform the following operations:

performing speech recognition on the voice data using each of the acquired target algorithms, and using the recognition result with the highest accuracy probability as the final recognition result.
The mobile terminal of claim 12, wherein the program further comprises instructions for performing the following operations:

sorting the recognition results by accuracy probability in descending order;

outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold;

receiving a selection instruction; and

after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
A computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the following operations:

obtaining a geographic location of a mobile terminal, and determining a dialect type corresponding to the geographic location;

obtaining a recognition algorithm corresponding to the dialect type as a target algorithm; and

after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
The computer readable storage medium according to claim 15, wherein the computer specifically performs the following operations when obtaining the geographic location of the mobile terminal:

after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by the mobile terminal collecting statistics on its location information after each startup; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
The computer readable storage medium according to claim 16, wherein the computer further performs the following operations before determining the dialect type corresponding to the geographic location:

establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
The computer readable storage medium according to claim 17, wherein the computer further performs the following operations before obtaining the recognition algorithm corresponding to the dialect type as the target algorithm:

when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
The computer readable storage medium according to claim 18, wherein the computer specifically performs the following operations when performing speech recognition on the speech data using the target algorithm to obtain a recognition result:

performing speech recognition on the voice data using each of the acquired target algorithms, and using the recognition result with the highest accuracy probability as the final recognition result.
The computer readable storage medium according to claim 18, wherein after performing speech recognition of the speech data using the target algorithm to obtain a recognition result, the computer further performs the following operations:

sorting the recognition results by accuracy probability in descending order;

outputting at least two recognition results whose accuracy probability is greater than or equal to a preset threshold;

receiving a selection instruction; and

after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.