WO2018219105A1 - Speech recognition and related products - Google Patents

Speech recognition and related products

Info

Publication number
WO2018219105A1
WO2018219105A1 PCT/CN2018/086205 CN2018086205W WO2018219105A1 WO 2018219105 A1 WO2018219105 A1 WO 2018219105A1 CN 2018086205 W CN2018086205 W CN 2018086205W WO 2018219105 A1 WO2018219105 A1 WO 2018219105A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition
dialect
mobile terminal
algorithm
recognition result
Prior art date
Application number
PCT/CN2018/086205
Other languages
French (fr)
Chinese (zh)
Inventor
白剑
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2018219105A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present invention relates to the field of computer technology, and in particular to a voice recognition method and related products.
  • Speech recognition technology is a technique that allows a machine to transform a speech signal into a corresponding text or command through an identification and understanding process.
  • Speech recognition technology mainly includes three aspects: feature extraction technology, pattern matching criterion and model training technology.
  • speech recognition technology has also been widely applied in connected-vehicle systems; for example, a driver can set a destination and start navigation simply by dictating it, which is safe and convenient.
  • Speech recognition is an interdisciplinary subject. In the past two decades, speech recognition technology has made significant progress and has begun to move from the laboratory to the market. It is expected that in the next 10 years, speech recognition technology will enter various fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The areas covered by speech recognition technology include: signal processing, pattern recognition, probability theory and information theory, vocal mechanism and auditory mechanism, artificial intelligence, and so on.
  • the embodiment of the invention provides a speech recognition method and related products for improving the accuracy of recognition of non-standard speech.
  • an embodiment of the present invention provides a voice recognition method, including:
  • after voice data is collected, the target algorithm is used to perform voice recognition on the voice data to obtain a recognition result.
  • the acquiring the geographic location of the mobile terminal includes:
  • after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by the mobile terminal collecting statistics on its location information after each start; and analyzing the history record set to obtain the geographical area to which the mobile terminal belongs as the geographic location.
  • the method before the determining the dialect type corresponding to the geographic location, the method further includes:
  • a database of the correspondence between geographic areas and dialect types is established, in which one geographic area corresponds to one or more dialect types.
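A minimal sketch of such a correspondence, assuming a plain in-memory table rather than any particular database. The name `REGION_TO_DIALECTS`, the function `dialect_types_for`, the table entries, and the Mandarin fallback are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Sequence

# Hypothetical table: one geographic area maps to one or more dialect types.
REGION_TO_DIALECTS: Dict[str, List[str]] = {
    "Guangdong": ["Yue", "Hakka"],
    "Hunan": ["Xiang"],
    "Shanghai": ["Wu"],
}


def dialect_types_for(region: str, default: Sequence[str] = ("Mandarin",)) -> List[str]:
    """Return the dialect types recorded for a region, or a fallback list."""
    return list(REGION_TO_DIALECTS.get(region, default))


print(dialect_types_for("Guangdong"))
print(dialect_types_for("Unknown region"))
```

Returning a list even for a single dialect keeps the caller's logic the same whether the area maps to one dialect type or several.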
  • the method before the acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
  • a database of the correspondence between dialect types and recognition algorithms is established, in which each dialect type corresponds to one recognition algorithm.
  • acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
  • the recognition algorithm corresponding to each dialect type is obtained as a target algorithm.
  • using the target algorithm to perform voice recognition on the voice data to obtain a recognition result includes:
  • the voice data is recognized using each of the obtained target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.
  • the method further includes:
  • the target algorithm is modified to a recognition algorithm corresponding to the recognition result.
  • the method further includes: recording the recognition result in a recognition result set, determining the type of recognition result with the highest accuracy in the recognition result set, and using the recognition algorithm corresponding to that type of recognition result as the recognition algorithm for subsequent speech recognition.
  • the speech recognition algorithm can thus be adjusted dynamically: on the one hand according to the geographic location and, more importantly, according to the recognition results accumulated over multiple adjustments, so that a better-suited recognition algorithm can be determined as the final recognition algorithm. For private devices, this yields higher accuracy and faster recognition.
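The recording step can be illustrated with a small sketch. The text does not say how "highest accuracy" is measured over the result set; the code below assumes, purely for illustration, that it means the algorithm whose results were most often confirmed as accurate. The class name `RecognitionResultSet` and its methods are hypothetical.

```python
from collections import Counter
from typing import Optional


class RecognitionResultSet:
    """Hypothetical result set that tallies confirmed-accurate results per algorithm."""

    def __init__(self) -> None:
        self._confirmed = Counter()

    def record(self, algorithm: str, confirmed_accurate: bool) -> None:
        # Only results confirmed as accurate count toward an algorithm's tally.
        if confirmed_accurate:
            self._confirmed[algorithm] += 1

    def preferred_algorithm(self) -> Optional[str]:
        """Return the algorithm with the most confirmed results, or None if empty."""
        if not self._confirmed:
            return None
        return self._confirmed.most_common(1)[0][0]


log = RecognitionResultSet()
for algorithm, ok in [("yue", True), ("hakka", False), ("yue", True), ("hakka", True)]:
    log.record(algorithm, ok)
print(log.preferred_algorithm())
```

On a private device with one main speaker, the tally would be expected to settle on a single algorithm, which can then be used directly for subsequent recognition.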
  • “acquiring the geographical location of the mobile terminal, determining the dialect type corresponding to the geographical location, and acquiring the recognition algorithm corresponding to the dialect type as the target algorithm” may be performed.
  • the second embodiment of the present invention further provides a mobile terminal, including a processing unit and an input and output unit.
  • the input/output unit is configured to receive input data and output data
  • the processing unit is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographic location, acquire a recognition algorithm corresponding to the dialect type as a target algorithm, and, after voice data is collected, use the target algorithm to perform speech recognition on the voice data to obtain a recognition result.
  • the processing unit is further configured to: after the mobile terminal is started, collect location information of the mobile terminal to obtain a history record set; and analyze the history record set to obtain the geographical area to which the mobile terminal belongs as the geographical location.
  • the third embodiment of the present invention further provides a mobile terminal, including one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory. And configured to be executed by the one or more processors, the program comprising instructions for performing the steps of any of the methods provided by embodiments of the present invention.
  • the present invention further provides a computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any one of claims 1-6, and the computer includes a mobile terminal.
  • the dialect types used in the area to which the mobile terminal belongs are determined from the geographic location of the mobile device, so that the corresponding recognition algorithm can be used, thereby improving the recognition accuracy for non-standard speech.
  • FIG. 1 is a schematic flow chart of a method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of an interface according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • references to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention.
  • appearances of this phrase in various places in the specification do not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
  • the mobile terminal involved in the embodiments of the present invention may include various handheld devices, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like.
  • the devices mentioned above are collectively referred to as mobile terminals.
  • non-standard speech is defined relative to standard speech. The standard speech may be, for example, the Mandarin pronunciation of Chinese, or certain dialect pronunciations that are accepted as standard. This will not be repeated hereafter.
  • FIG. 1 is a schematic flowchart of a voice recognition method according to an embodiment of the present invention, which is applied to a mobile terminal.
  • the voice recognition method includes:
  • the geographic location may be represented by means of latitude and longitude, or administrative division, etc.; it may also be represented by a preset dialect area division, and is not limited to the latitude and longitude manner to represent the geographical location.
  • the dialect type refers to the category to which a dialect belongs. At present, there are seven main types in China, including:
  • Guangdong dialect (abbreviation: Yue, i.e. Cantonese);
  • Hunan dialect (abbreviation: Xiang);
  • Hakka dialect (abbreviation: Hakka).
  • Step 102: Obtain the recognition algorithm corresponding to the dialect type as the target algorithm.
  • MIT Media Lab Speech Dataset;
  • Pitch and Voicing Estimates for Aurora 2;
  • Congressional speech data and Mandarin speech frame data;
  • voice data used to test blind source separation algorithms, and the like.
  • different dialect types may have different recognition algorithms corresponding to them; in particular, different recognition algorithms may correspond to speech databases of the standard pronunciation of different dialect types. For a determined dialect type, recognition speed and accuracy can therefore be improved in a targeted way.
  • the voice data may be collected when a person speaks to the terminal device and the voice pickup device of the terminal device, for example a microphone, captures the voice data input by the user.
  • once the speech recognition algorithm, that is, the target algorithm, has been determined, the specific recognition process is not described in detail in the embodiments of the present invention. It can be understood that, for different dialects, a speech database of the corresponding dialect can be used together with the recognition algorithm.
  • the dialect types used in the area to which the mobile terminal belongs are determined from the geographic location of the mobile device, so that the corresponding recognition algorithm can be used to improve the accuracy of voice recognition, thereby improving the recognition accuracy for non-standard speech.
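The overall flow (location, then dialect type, then target algorithm, then recognition) can be sketched as below. This is only an illustration under stated assumptions: the two lookup tables, the stub recognizers `recognize_yue` and `recognize_xiang`, and the function `recognize` are hypothetical, and a real recognizer would replace the stubs.

```python
from typing import Callable, Dict

# Hypothetical lookup tables; the contents are illustrative only.
REGION_TO_DIALECT: Dict[str, str] = {"Guangdong": "Yue", "Hunan": "Xiang"}


def recognize_yue(voice_data: bytes) -> str:
    # Stub standing in for a recognizer backed by a Yue speech database.
    return f"<Yue transcript of {len(voice_data)} bytes>"


def recognize_xiang(voice_data: bytes) -> str:
    # Stub standing in for a recognizer backed by a Xiang speech database.
    return f"<Xiang transcript of {len(voice_data)} bytes>"


DIALECT_TO_ALGORITHM: Dict[str, Callable[[bytes], str]] = {
    "Yue": recognize_yue,
    "Xiang": recognize_xiang,
}


def recognize(region: str, voice_data: bytes) -> str:
    dialect = REGION_TO_DIALECT[region]               # location -> dialect type
    target_algorithm = DIALECT_TO_ALGORITHM[dialect]  # dialect type -> target algorithm
    return target_algorithm(voice_data)               # recognize the collected voice data


print(recognize("Guangdong", b"\x00\x01\x02"))
```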
  • because the location information obtained by the terminal device is not necessarily the usual or real location of the terminal device, for example when the mobile terminal belongs to a user who is travelling, the embodiment of the present invention provides the following solution.
  • acquiring the geographical location of the mobile terminal includes:
  • after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by collecting statistics on the location information of the mobile terminal after each start; and analyzing the history record set to obtain the geographical area to which the mobile terminal belongs as the geographical location.
  • the history set is used to determine the area to which the terminal device belongs. This can avoid the problem that the mobile terminal frequently moves in various dialect areas to cause inaccurate judgment.
  • the history record set may be analyzed as follows: determine the geographical area in which the terminal device stays for the longest time; that area is the most likely real geographical area of the mobile terminal. Examples include the location where the car is parked most often, or the location where the phone is at night.
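One way to read this analysis is as a longest-total-dwell-time rule. The sketch below assumes each history entry carries a region label and a start and end time in hours; the `LocationRecord` type, its field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LocationRecord:
    region: str        # hypothetical region label, e.g. an administrative division
    start_hour: float  # hours since an arbitrary reference time
    end_hour: float


def home_region(history: Iterable[LocationRecord]) -> Optional[str]:
    """Return the region with the longest total dwell time, or None if there is no history."""
    dwell = defaultdict(float)
    for record in history:
        dwell[record.region] += max(0.0, record.end_hour - record.start_hour)
    if not dwell:
        return None
    return max(dwell, key=dwell.get)


history = [
    LocationRecord("Guangzhou", 0, 30),
    LocationRecord("Shanghai", 30, 40),  # a short trip to another dialect area
    LocationRecord("Guangzhou", 40, 100),
]
print(home_region(history))
```

Summing dwell time over the whole history means a short trip does not change the inferred home area, which is the problem the history record set is meant to avoid.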
  • the embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows: before determining the dialect type corresponding to the geographical location, the method further includes:
  • more accurate recognition can be performed for more finely divided dialects, for example:
  • Wu language is also known as Jiangsu-Zhejiang dialect or Jiangnan dialect.
  • historically, the Suzhou dialect was regarded as its representative.
  • as the number of Shanghai dialect speakers has grown, the Shanghai dialect has become the representative of the Wu language today.
  • the main areas are the part of Jiangsu Province south of the Yangtze River and east of Zhenjiang, a small part of Nantong, and most of Shanghai and Zhejiang. It can be divided into five subgroups:
  • the Jinhua dialect is the representative of Zhangzhou.
  • the method before the acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
  • a database of the correspondence between dialect types and recognition algorithms is established, in which each dialect type corresponds to one recognition algorithm.
  • because the dialect situation in the region of a geographical location may be complicated, more than one dialect type may be determined.
  • for this case, the embodiment provides the following solution: acquiring the recognition algorithm corresponding to the dialect type as a target algorithm includes:
  • the recognition algorithm corresponding to each dialect type is obtained as a target algorithm.
  • different dialect types may correspond to different recognition algorithms; it is also possible that several dialect types correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
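A short sketch of this lookup, assuming a simple dictionary in place of the database. The table contents (in particular, Hakka and Gan sharing one algorithm) and the names `DIALECT_TO_ALGORITHM` and `target_algorithms` are hypothetical and chosen only to show that several dialect types can resolve to fewer algorithms.

```python
from typing import Dict, Iterable, List

# Hypothetical table: each dialect type maps to exactly one recognition algorithm,
# but several dialect types may share the same algorithm.
DIALECT_TO_ALGORITHM: Dict[str, str] = {
    "Yue": "recognizer_yue",
    "Hakka": "recognizer_south",
    "Gan": "recognizer_south",
    "Wu": "recognizer_wu",
}


def target_algorithms(dialect_types: Iterable[str]) -> List[str]:
    """Return the distinct algorithms for the given dialect types, in first-seen order."""
    selected: List[str] = []
    for dialect in dialect_types:
        algorithm = DIALECT_TO_ALGORITHM.get(dialect)
        if algorithm is not None and algorithm not in selected:
            selected.append(algorithm)
    return selected


print(target_algorithms(["Hakka", "Gan", "Yue"]))
```

Removing duplicates before recognition avoids running the same algorithm twice on the same voice data.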
  • when several different recognition algorithms are used, several different recognition results may be produced.
  • for this case, the embodiment provides the following solution: using the target algorithm to perform voice recognition on the voice data to obtain the recognition result includes:
  • the collected voice data is recognized using each of the obtained target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.
  • each recognition result corresponds to an accuracy probability; since the result obtained by each recognition algorithm has such a probability, the recognition result with the largest probability value can be used as the final recognition result.
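Selecting the final result then reduces to taking the maximum over the reported probabilities. In the sketch below, each recognizer is assumed to return a `(text, probability)` pair; that interface, the function name `recognize_best`, and the stub recognizers are illustrative assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed interface: a recognizer returns (recognized text, accuracy probability).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_best(
    voice_data: bytes, recognizers: Dict[str, Recognizer]
) -> Optional[Tuple[str, str, float]]:
    """Run every target algorithm and keep the result with the largest probability."""
    best: Optional[Tuple[str, str, float]] = None
    for name, run in recognizers.items():
        text, probability = run(voice_data)
        if best is None or probability > best[2]:
            best = (name, text, probability)
    return best


# Stub recognizers with fixed outputs, standing in for real dialect recognizers.
stub_recognizers: Dict[str, Recognizer] = {
    "yue": lambda data: ("nei hou", 0.91),
    "hakka": lambda data: ("ngi ho", 0.64),
}
print(recognize_best(b"...", stub_recognizers))
```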
  • the embodiment of the present invention further provides an optional scheme for further correcting the recognition algorithm, as shown in FIG. 2, specifically as follows: after voice recognition is performed on the voice data using the target algorithm to obtain the recognition result, the method further includes:
  • the target algorithm is modified to the recognition algorithm corresponding to the recognition result.
  • the two recognition results may be displayed as text or played back as audio; if played back as audio, they may further be played in the corresponding dialect.
  • after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate indicates which algorithm is better;
  • this scheme is well suited to relatively private devices such as mobile phones, whose users are few or have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
  • the voice recognition device may be a mobile terminal, and specifically includes:
  • a location obtaining unit 301 configured to acquire a geographic location of the mobile terminal
  • a type determining unit 302 configured to determine a dialect type corresponding to the geographical location
  • An algorithm obtaining unit 303 configured to acquire a recognition algorithm corresponding to the dialect type as a target algorithm
  • a recognition unit 304, configured to, after voice data is collected, use the target algorithm to perform voice recognition on the voice data to obtain a recognition result.
  • the location obtaining unit 301 being configured to acquire the geographic location of the mobile terminal includes:
  • after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by collecting statistics on the location information of the mobile terminal after each start; and analyzing the history record set to obtain the geographical area to which the mobile terminal belongs as the geographical location.
  • different dialect types may correspond to different recognition algorithms; it is also possible that several dialect types correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
  • when several recognition algorithms are used, several different recognition results may be produced.
  • the embodiment provides the solution as follows:
  • the voice recognition device further includes a data establishing unit 305, which, before the dialect type corresponding to the geographical location is determined, is configured to:
  • the data establishing unit 305 is further configured to establish a database of the correspondence between dialect types and recognition algorithms, in which each dialect type corresponds to one recognition algorithm.
  • the algorithm obtaining unit 303 being configured to acquire the recognition algorithm corresponding to the dialect type as a target algorithm includes:
  • an identification algorithm corresponding to each type of dialect is obtained as the target algorithm.
  • the identification unit 304 is configured to perform voice recognition on the voice data by using the target algorithm to obtain a recognition result, including:
  • the collected voice data is recognized using each of the obtained target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.
  • each recognition result corresponds to an accuracy probability; since the result obtained by each recognition algorithm has such a probability, the recognition result with the largest probability value can be used as the final recognition result.
  • the embodiment of the present invention further provides an optional scheme for further correcting the recognition algorithm, as shown in FIG. 2, specifically as follows: the voice recognition device further includes:
  • an algorithm modifying unit 306, configured to: after voice recognition is performed on the voice data using the target algorithm to obtain recognition results, sort the recognition results by accuracy probability; output the recognition results whose accuracy probability is greater than or equal to a preset threshold, and receive a selection instruction; and, after the selection instruction specifies the accurate recognition result among at least two recognition results, modify the target algorithm to the recognition algorithm corresponding to that recognition result.
  • in FIG. 2, two recognition results are displayed; the two recognition results may be displayed as text or played back as audio, and if played back as audio, they may further be played in the corresponding dialect.
  • after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate indicates which algorithm is better;
  • this scheme is well suited to relatively private devices such as mobile phones, whose users are few or have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
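The correction scheme (sort by probability, show results at or above a preset threshold, then switch to the algorithm behind the result the user selects) might be sketched as follows. The tuple layout `(algorithm, text, probability)`, the function names, and the sample values are assumptions for illustration; the user's selection is represented simply as an index into the displayed list.

```python
from typing import List, Tuple

# Assumed layout: (algorithm name, recognized text, accuracy probability).
Result = Tuple[str, str, float]


def candidates_for_user(results: List[Result], threshold: float) -> List[Result]:
    """Sort by probability (highest first) and keep results at or above the threshold."""
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    return [r for r in ranked if r[2] >= threshold]


def updated_target_algorithm(current: str, candidates: List[Result], selected_index: int) -> str:
    """Switch to the algorithm behind the selected result when at least two were shown."""
    if len(candidates) < 2:
        return current
    return candidates[selected_index][0]


results = [("wu", "text A", 0.72), ("yue", "text B", 0.81), ("xiang", "text C", 0.30)]
shown = candidates_for_user(results, threshold=0.6)
print(shown)
print(updated_target_algorithm("yue", shown, selected_index=1))
```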
  • an embodiment of the present invention further provides a mobile terminal, including a processing unit 402 and an input and output unit 403.
  • the processing unit 402 is configured to control and manage the actions of the terminal device.
  • for example, the processing unit 402 is configured to support the terminal device in performing steps 101-103 of FIG. 1 or other processes for the techniques described herein.
  • the input and output unit 403 is for supporting data input and output.
  • the terminal device may further include a storage unit 401 for storing program codes and data of the terminal device.
  • the processing unit 402 can be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the input and output unit 403 may be a microphone, an earpiece, a speaker, etc., and the storage unit 401 may be a memory.
  • the input/output unit 403 is configured to receive input data and output data.
  • the processing unit 402 is configured to acquire a geographic location of the mobile terminal, determine a dialect type corresponding to the geographical location, and obtain a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, use the target algorithm to perform speech recognition on the voice data to obtain a recognition result.
  • the processing unit 402 is further configured to: after the mobile terminal is started, acquire a history record set, where the history record set is obtained by the mobile terminal collecting statistics on its location information after each start; and analyze the history record set to obtain the geographical area to which the mobile terminal belongs as the geographical location.
  • for other processes executed by the processing unit 402, reference may be made to the foregoing method embodiments, and details are not described herein again.
  • FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • the mobile terminal includes one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the following steps:
  • acquiring a geographic location of the mobile terminal; determining a dialect type corresponding to the geographical location; acquiring a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, using the target algorithm to perform voice recognition on the voice data to obtain a recognition result.
  • the geographic location may be represented by means of latitude and longitude, or administrative division, etc.; it may also be represented by a preset dialect area division, and is not limited to the latitude and longitude manner to represent the geographical location.
  • the dialect type refers to the kind to which the dialect belongs. At present, there are mainly seven types in China.
  • different dialect types may have different recognition algorithms corresponding to them; in particular, different recognition algorithms may correspond to speech databases of the standard pronunciation of different dialect types. For a determined dialect type, recognition speed and accuracy can therefore be improved in a targeted way.
  • the voice data collected above may be a person speaking to the terminal device, and the voice pickup device of the terminal device, for example, a microphone, collects voice data input by the user.
  • once the speech recognition algorithm, that is, the target algorithm, has been determined, the specific recognition process is not described in detail in the embodiments of the present invention. It can be understood that, for different dialects, a speech database of the corresponding dialect can be used together with the recognition algorithm.
  • the dialect types used in the area to which the mobile terminal belongs are determined from the geographic location of the mobile device, so that the corresponding recognition algorithm can be used to improve the accuracy of voice recognition, thereby improving the recognition accuracy for non-standard speech.
  • because the location information obtained by the terminal device is not necessarily the usual or real location of the terminal device, for example when the mobile terminal belongs to a user who is travelling, the embodiment of the present invention provides the following solution.
  • acquiring the geographical location of the mobile terminal includes:
  • after the mobile terminal is started, acquiring a history record set, where the history record set is obtained by collecting statistics on the location information of the mobile terminal after each start; and analyzing the history record set to obtain the geographical area to which the mobile terminal belongs as the geographical location.
  • the history set is used to determine the area to which the terminal device belongs. This can avoid the problem that the mobile terminal frequently moves in various dialect areas to cause inaccurate judgment.
  • the history record set may be analyzed as follows: determine the geographical area in which the terminal device stays for the longest time; that area is the most likely real geographical area of the mobile terminal. Examples include the location where the car is parked most often, or the location where the phone is at night.
  • the embodiment of the present invention further provides an implementation in which a database is established in advance to improve recognition speed and accuracy, as follows: before determining the dialect type corresponding to the geographical location, the method further includes:
  • more finely divided dialects can be recognized more accurately; because a single dialect type is itself divided into several more detailed branches, establishing a corresponding database can further improve the accuracy of speech recognition.
  • the embodiment of the present invention further provides an implementation in which a database of the correspondence between dialect types and recognition algorithms is established in advance to improve recognition speed and accuracy, as follows: before the recognition algorithm corresponding to the dialect type is acquired as the target algorithm, the method further includes:
  • a database of the correspondence between dialect types and recognition algorithms is established, in which each dialect type corresponds to one recognition algorithm.
  • because the dialect situation in the region of a geographical location may be complicated, more than one dialect type may be determined.
  • for this case, the embodiment provides the following solution: acquiring the recognition algorithm corresponding to the dialect type as a target algorithm includes:
  • the recognition algorithm corresponding to each dialect type is obtained as a target algorithm.
  • different dialect types may correspond to different recognition algorithms; it is also possible that several dialect types correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
  • when several different recognition algorithms are used, several different recognition results may be produced.
  • for this case, the embodiment provides the following solution: using the target algorithm to perform voice recognition on the voice data to obtain the recognition result includes:
  • the collected voice data is recognized using each of the obtained target algorithms, and the recognition result with the highest accuracy probability is used as the final recognition result.
  • each recognition result corresponds to an accuracy probability; since the result obtained by each recognition algorithm has such a probability, the recognition result with the largest probability value can be used as the final recognition result.
  • the embodiment of the present invention further provides an optional scheme for further correcting the recognition algorithm, specifically as follows: after voice recognition is performed on the voice data using the target algorithm to obtain the recognition result, the method further includes:
  • after the voice data is collected, two or more recognition results are obtained using one or more algorithms, and the result that the user confirms as more accurate indicates which algorithm is better;
  • this scheme is well suited to relatively private devices such as mobile phones, whose users are few or have similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
  • the mobile terminal includes corresponding hardware structures and/or software modules for performing various functions.
  • the present invention can be implemented in hardware or in a combination of hardware and computer software, in combination with the elements and algorithm steps of the various examples described in the embodiments disclosed herein. Whether a function is implemented by hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
  • the embodiment of the present invention may divide the mobile terminal into functional units according to the foregoing method example.
  • each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • the embodiment of the present invention further provides another mobile terminal.
  • the mobile terminal can be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer; the following takes a mobile phone as an example of the mobile terminal:
  • FIG. 6 is a block diagram showing a partial structure of a mobile phone related to a mobile terminal provided by an embodiment of the present invention.
  • the mobile phone includes: a radio frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, a power supply 990, and other components.
  • the RF circuit 910 can be used for receiving and transmitting information.
  • RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
  • RF circuitry 910 can also communicate with the network and other devices via wireless communication.
  • the above wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS), and the like.
  • the memory 920 can be used to store software programs and modules, and the processor 980 executes various functional applications and data processing of the mobile phone by running software programs and modules stored in the memory 920.
  • the memory 920 can mainly include a storage program area and a storage data area, wherein the storage program area can store an operating system, an application required for at least one function, and the like; the storage data area can store data created according to the use of the mobile phone (such as application usage parameters), and the like.
  • the memory 920 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the input unit 930 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the handset.
  • the input unit 930 may include a fingerprint sensor 931 and other input devices 932.
  • the fingerprint sensor 931 can collect fingerprint data of the user.
  • the input unit 930 may also include other input devices 932.
  • the other input device 932 may include, but is not limited to, one or more of a touch screen, a physical button, a function key (such as a volume control button, a switch button, etc.), a trackball, a mouse, a joystick, and the like.
  • the display unit 940 can be used to display information input by the user or information provided to the user as well as various menus of the mobile phone.
  • the display unit 940 can include a display screen 941.
  • the display screen 941 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • although the fingerprint sensor 931 and the display screen 941 are shown as two separate components implementing the input and output functions of the mobile phone, in some embodiments the fingerprint sensor 931 can be integrated with the display screen 941 to implement the input and playback functions of the mobile phone.
  • the handset may also include at least one type of sensor 950, such as a light sensor, motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen 941 according to the brightness of the ambient light, and the proximity sensor may turn off the display screen 941 and/or the backlight when the mobile phone moves to the ear.
  • the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), and when stationary it can detect the magnitude and direction of gravity.
  • it can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection); other sensors with which the mobile phone can also be configured, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described here.
  • An audio circuit 960, a speaker 961, and a microphone 962 can provide an audio interface between the user and the handset.
  • on the one hand, the audio circuit 960 can convert received audio data into an electrical signal and transmit it to the speaker 961, which converts it into a sound signal for playback; on the other hand, the microphone 962 converts the collected sound signal into an electrical signal, which the audio circuit 960 receives and converts into audio data; the audio data is then processed by the processor 980 and either sent to another mobile phone via the RF circuit 910 or output to the memory 920 for further processing.
  • WiFi is a short-range wireless transmission technology
  • the mobile phone can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 970, which provides users with wireless broadband Internet access.
  • although FIG. 6 shows the WiFi module 970, it can be understood that it is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.
  • the processor 980 is the control center of the mobile phone; it connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 920 and invoking the data stored in the memory 920, thereby monitoring the mobile phone as a whole.
  • the processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 980.
  • the handset also includes a power source 990 (such as a battery) that supplies power to the various components.
  • the power source can be logically coupled to the processor 980 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.
  • the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • each step method flow can be implemented based on the structure of the mobile phone.
  • each unit function can be implemented based on the structure of the mobile phone.
  • the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, the computer program causing the computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments.
  • the above computer includes a mobile terminal.
  • the embodiment of the present invention further provides a computer program product, the computer program product comprising a non-transitory computer readable storage medium storing a computer program, the computer program being operative to cause the computer to execute part or all of the steps of any one of the methods described in the foregoing method embodiments.
  • the computer program product can be a software installation package, and the computer includes a mobile terminal.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the above units is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
  • the units described above as separate components may or may not be physically separated.
  • the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • if implemented in the form of a software functional unit and sold or used as a standalone product, the above integrated unit can be stored in a computer readable memory. Based on such understanding, the part of the technical solution of the present invention that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes a number of instructions that cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the above-described methods of various embodiments of the present invention.
  • the foregoing memory includes any medium that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed in the embodiments of the invention are a voice recognition method and a related product. The method comprises the following steps: acquiring a geographical location of a mobile terminal, and determining a dialect type corresponding to the geographical location (101); acquiring a recognition algorithm corresponding to the dialect type as a target algorithm (102); and after voice data is collected, using the target algorithm to recognize the voice data to get a recognition result (103). The type of dialect used in an area to which a mobile terminal belongs is determined according to the geographical location of the mobile terminal, so that a corresponding algorithm can be used to improve the accuracy of voice recognition. Therefore, the recognition accuracy of nonstandard voice is improved.

Description

Speech recognition method and related products
The present invention claims priority to prior application No. 201710401786.1, entitled "Speech recognition method and related products" and filed on May 31, 2017, the content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a speech recognition method and related products.
Background
Communicating with machines by voice and having them understand what you say is something people have long dreamed of. The China Internet of Things School-Enterprise Alliance vividly likens speech recognition to a machine's auditory system. Speech recognition technology enables a machine to convert a speech signal into the corresponding text or command through a process of recognition and understanding.
Speech recognition technology mainly comprises three aspects: feature extraction, pattern matching criteria, and model training. Speech recognition has also been widely applied in the Internet of Vehicles; for example, a destination can be set for direct navigation simply by speaking, which is safe and convenient.
Speech recognition is an interdisciplinary subject. Over the past two decades, speech recognition technology has made significant progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years speech recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The fields involved in speech recognition technology include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and so on.
Improving the accuracy and speed of speech recognition is the goal that those skilled in the art strive for. At present, people speak with accents, and some dialects differ greatly from one another, which makes speech recognition considerably more difficult; a solution is therefore needed.
Summary of the invention
The embodiments of the present invention provide a speech recognition method and related products for improving the recognition accuracy of non-standard speech.
In a first aspect, an embodiment of the present invention provides a speech recognition method, including:
acquiring a geographic location of a mobile terminal, and determining a dialect type corresponding to the geographic location;
acquiring a recognition algorithm corresponding to the dialect type as a target algorithm;
after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
In a possible implementation, acquiring the geographic location of the mobile terminal includes:
after the mobile terminal is started, acquiring a history record set, the history record set being obtained by the mobile terminal recording its location information each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
In a possible implementation, before determining the dialect type corresponding to the geographic location, the method further includes:
establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
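As a minimal sketch of such a database, the following Python example stores region-to-dialect correspondences in an in-memory SQLite table. The table name, column names, function names, and sample rows are illustrative assumptions and are not taken from this disclosure.

```python
import sqlite3


def build_region_dialect_db(rows):
    """Create an in-memory table mapping a geographic area to its dialect types."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE region_dialect ("
        "region TEXT NOT NULL, dialect TEXT NOT NULL, "
        "PRIMARY KEY (region, dialect))"
    )
    conn.executemany("INSERT INTO region_dialect VALUES (?, ?)", rows)
    return conn


def dialects_for_region(conn, region):
    """Return every dialect type recorded for the region (possibly more than one)."""
    cursor = conn.execute(
        "SELECT dialect FROM region_dialect WHERE region = ? ORDER BY dialect",
        (region,),
    )
    return [row[0] for row in cursor]


# Illustrative rows only: one area may map to one or more dialect types.
conn = build_region_dialect_db(
    [("Guangdong", "Cantonese"), ("Guangdong", "Hakka"), ("Hunan", "Xiang")]
)
print(dialects_for_region(conn, "Guangdong"))
```

A composite primary key is used here so that the same area can appear in several rows, one per dialect type, without duplicates.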
In a possible implementation, before acquiring the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In a possible implementation, acquiring the recognition algorithm corresponding to the dialect type as the target algorithm includes:
when more than one dialect type is determined, acquiring the recognition algorithm corresponding to each dialect type as a target algorithm.
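The selection of target algorithms for several dialect types can be sketched as follows. This is only an illustration: the mapping, the dialect keys, and the algorithm names are hypothetical, and the example assumes that two dialect types may share one recognition algorithm, so the result is de-duplicated.

```python
# Hypothetical dialect-to-algorithm mapping; every name here is illustrative only.
DIALECT_TO_ALGORITHM = {
    "cantonese": "recognizer_yue",
    "hakka": "recognizer_hakka",
    "min": "recognizer_min",
    "hokkien": "recognizer_min",  # assumed: two dialect types share one algorithm
}


def select_target_algorithms(dialect_types):
    """Return the distinct target algorithms for the dialect types, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeated algorithm names.
    return list(dict.fromkeys(DIALECT_TO_ALGORITHM[d] for d in dialect_types))


print(select_target_algorithms(["min", "cantonese", "hokkien"]))
```

With this mapping, three dialect types yield only two target algorithms, because `min` and `hokkien` resolve to the same entry.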
In a possible implementation, performing speech recognition on the voice data using the target algorithm to obtain the recognition result includes:
performing speech recognition on the voice data using each of the acquired target algorithms, and taking the recognition result with the highest probability of being accurate as the final recognition result.
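A minimal sketch of this selection step is shown below. It assumes, purely for illustration, that each recognizer is a callable returning a `(text, probability)` pair; the stub recognizers, their names, and their outputs are hypothetical stand-ins for real recognition algorithms.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class RecognitionResult(NamedTuple):
    algorithm: str
    text: str
    probability: float  # probability that the result is accurate


# Assumed recognizer interface: voice data in, (text, probability) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_best(voice_data: bytes, recognizers: Dict[str, Recognizer]) -> RecognitionResult:
    """Run every target algorithm and keep the result with the highest probability."""
    results = [
        RecognitionResult(name, *recognize(voice_data))
        for name, recognize in recognizers.items()
    ]
    # max() raises ValueError if no recognizer was supplied.
    return max(results, key=lambda result: result.probability)


# Stub recognizers standing in for real dialect-specific algorithms.
stub_recognizers = {
    "recognizer_yue": lambda data: ("text A", 0.62),
    "recognizer_hakka": lambda data: ("text B", 0.87),
}
best = recognize_best(b"\x00\x01", stub_recognizers)
print(best.algorithm, best.text)
```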
In a possible implementation, after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the method further includes:
sorting the recognition results by probability of being accurate, from largest to smallest;
outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold;
receiving a selection instruction;
after the selection instruction designates the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
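The correction steps above can be sketched as follows. The tuple layout `(algorithm, text, probability)`, the function names, and the sample values are assumptions made for this example; the selection instruction is modelled simply as an index into the displayed candidates.

```python
def shortlist(results, threshold):
    """Sort results by probability (largest first) and keep those at or above the threshold."""
    ranked = sorted(results, key=lambda result: result[2], reverse=True)
    return [result for result in ranked if result[2] >= threshold]


def correct_target_algorithm(current_algorithm, candidates, selected_index):
    """Return the algorithm behind the user-selected result.

    If fewer than two candidates were output, there is nothing to choose
    between, so the current target algorithm is kept.
    """
    if len(candidates) < 2:
        return current_algorithm
    return candidates[selected_index][0]


# Each result is (algorithm, text, probability); values are illustrative.
results = [
    ("recognizer_yue", "text A", 0.81),
    ("recognizer_hakka", "text B", 0.86),
    ("recognizer_min", "text C", 0.40),
]
candidates = shortlist(results, threshold=0.8)
# Suppose the user picks the second displayed candidate.
print(correct_target_algorithm("recognizer_hakka", candidates, selected_index=1))
```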
In a possible implementation, the method further includes: recording recognition results in a recognition result set, determining the class of recognition results with the highest accuracy in the recognition result set, and using the recognition algorithm corresponding to that class as the recognition algorithm for subsequent speech recognition.
In this embodiment, the speech recognition algorithm can be adjusted dynamically. On the one hand, it is adjusted according to the geographic location; more importantly, based on the recognition results obtained after the recognition algorithm has been dynamically adjusted multiple times, a better-optimized recognition algorithm can be determined as the final recognition algorithm, which for a private device gives higher accuracy and a high recognition speed. Subsequently, it may no longer be necessary to perform the aforementioned steps of "acquiring the geographic location of the mobile terminal, and determining the dialect type corresponding to the geographic location; acquiring the recognition algorithm corresponding to the dialect type as the target algorithm".
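One possible reading of the recognition result set is sketched below. The disclosure does not specify how the accuracy of a class of results is measured; this example assumes, as an illustration only, that results are grouped by algorithm and compared by their mean probability. The class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class RecognitionResultSet:
    """Accumulates per-algorithm probabilities across many recognitions."""

    def __init__(self):
        self._probabilities = defaultdict(list)

    def record(self, algorithm, probability):
        """Add one recognition result to the set."""
        self._probabilities[algorithm].append(probability)

    def best_algorithm(self):
        """Return the algorithm with the highest mean probability, or None if empty.

        Using the mean is an assumption of this sketch, not a rule from the source.
        """
        if not self._probabilities:
            return None
        return max(
            self._probabilities,
            key=lambda algorithm: mean(self._probabilities[algorithm]),
        )


result_set = RecognitionResultSet()
for algorithm, probability in [
    ("recognizer_yue", 0.9),
    ("recognizer_hakka", 0.95),
    ("recognizer_yue", 0.8),
    ("recognizer_hakka", 0.5),
]:
    result_set.record(algorithm, probability)
print(result_set.best_algorithm())
```

Once such an algorithm has been settled on, a device could reuse it directly for later recognitions instead of repeating the location lookup.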
In a second aspect, an embodiment of the present invention further provides a mobile terminal, including a processing unit and an input/output unit,
the input/output unit being configured to receive input data and to output data;
the processing unit being configured to acquire a geographic location of the mobile terminal and determine a dialect type corresponding to the geographic location; acquire a recognition algorithm corresponding to the dialect type as a target algorithm; and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
In a possible implementation, the processing unit is further configured to: after the mobile terminal is started, record the location information of the mobile terminal to obtain a history record set; and analyze the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
In a third aspect, an embodiment of the present invention further provides a mobile terminal, including one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the steps of any of the methods provided by the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method of any one of claims 1-6, the computer including a mobile terminal.
It can be seen that, in the embodiments of the present invention, the geographic location of the mobile device is used to determine which types of dialect are used in the area to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used to improve the accuracy of speech recognition, thereby improving the recognition accuracy of non-standard speech.
Brief description of the drawings
The drawings used in the embodiments of the present invention are briefly introduced below.
FIG. 1 is a schematic flowchart of a method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an interface according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a speech recognition device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
Detailed description
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present invention are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
A reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The mobile terminal involved in the embodiments of the present invention may include various mobile handheld devices, in-vehicle devices, wearable devices, computing devices, or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like. For convenience of description, the devices mentioned above are collectively referred to as mobile terminals.
The accuracy of speech recognition has always been a major challenge. Various algorithms are currently used to improve it, but the users of mobile terminals vary enormously: language types are easy to distinguish, yet regional dialects cause great difficulty.
In the embodiments of the present invention, non-standard speech is defined relative to standard speech. Standard speech may be the Mandarin (Putonghua) pronunciation of Chinese, or the pronunciation of certain dialects that have been standardized. This is not repeated in the subsequent embodiments.
The embodiments of the present invention are described below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a speech recognition method provided by an embodiment of the present invention and applied to a mobile terminal. As shown in the figure, the speech recognition method includes:
101: acquiring a geographic location of the mobile terminal, and determining a dialect type corresponding to the geographic location;
In this embodiment, the geographic location may be expressed as latitude and longitude, as an administrative division, or in terms of a preset division into dialect regions; it is not limited to latitude and longitude.
The dialect type refers to the category to which a dialect belongs. At present there are mainly the following seven in China:
1. Northern dialect (abbreviation: Northern);
2. Guangdong dialect (abbreviation: Yue, i.e. Cantonese);
3. Jiangsu-Zhejiang dialect (abbreviation: Wu);
4. Fujian dialect (abbreviation: Min);
5. Hunan dialect (abbreviation: Xiang);
6. Jiangxi dialect (abbreviation: Gan);
7. Hakka dialect (abbreviation: Hakka).
There are many other dialect types besides these, which are not listed here one by one.
102: acquiring a recognition algorithm corresponding to the dialect type as a target algorithm;
In the course of speech recognition research, researchers have designed and built speech databases for various languages, such as Chinese (including different dialects) and English, according to the pronunciation characteristics of each language. Examples of such speech databases include: MIT Media lab Speech Dataset, Pitch and Voicing Estimates for Aurora 2 (pitch period and voicing estimates for the Aurora 2 speech corpus), Congressional speech data, Mandarin Speech Frame Data, and speech data for testing blind source separation algorithms.
Therefore, different dialect types may have different recognition algorithms corresponding to them; in particular, different recognition algorithms may correspond to speech databases of the standard speech of different dialect types. For a determined dialect type, recognition speed and accuracy can thus be improved in a targeted manner.
103:在采集到语音数据后,使用上述目标算法对上述语音数据进行语音识别得到识别结果。103: After collecting the voice data, performing voice recognition on the voice data by using the target algorithm to obtain a recognition result.
上述采集语音数据,可以是人对着终端设备说话,由终端设备的语音拾取设备,例如:话筒,采集用户输入的语音数据。在语音识别的算法,即目标算法确定后,具体的识别过程本发明实施例不作赘述。可以理解的是,对于不同的方言,可以有不同方言的语音数据库与识别算法配套使用。The voice data collected above may be a person speaking to the terminal device, and the voice pickup device of the terminal device, for example, a microphone, collects voice data input by the user. After the algorithm of the speech recognition, that is, the target algorithm is determined, the specific identification process is not described in detail in the embodiments of the present invention. It can be understood that for different dialects, a voice database with different dialects can be used in conjunction with the recognition algorithm.
在本实施例中,通过移动设备的地理位置来确定移动终端所属的区域使用哪些类型的方言,这样可以使用相应的识别算法来提高语音识别的准确性,因此提高了非标准语音的识别的准确率。In this embodiment, the types of dialects used by the area to which the mobile terminal belongs are determined by the geographic location of the mobile device, so that the corresponding recognition algorithm can be used to improve the accuracy of the voice recognition, thereby improving the accuracy of the recognition of the non-standard voice. rate.
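The three steps above can be sketched as a small pipeline. This is only an illustrative sketch: the function name `recognize` and the two lookup callables `dialect_for` and `algorithm_for` are hypothetical names introduced here, and the patent text does not prescribe any particular interface.

```python
from typing import Callable

# A recognizer takes raw audio bytes and returns recognized text.
Recognizer = Callable[[bytes], str]


def recognize(
    location: str,
    audio: bytes,
    dialect_for: Callable[[str], str],
    algorithm_for: Callable[[str], Recognizer],
) -> str:
    """Run steps 101-103 using caller-supplied lookups (hypothetical interface)."""
    dialect = dialect_for(location)        # step 101: location -> dialect type
    target = algorithm_for(dialect)        # step 102: dialect type -> target algorithm
    return target(audio)                   # step 103: recognize the collected audio
```

Passing the lookups in as callables keeps the sketch independent of how the location-to-dialect and dialect-to-algorithm correspondences are actually stored.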
In an optional implementation, the geographic location obtained at a given moment is not necessarily the usual or true location that reflects the terminal device's dialect region, for example in the case of the mobile terminal of a user on a business trip. The embodiment of the present invention therefore provides the following solution. Obtaining the geographic location of the mobile terminal includes:
After the mobile terminal is started, obtaining a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographic region to which the mobile terminal belongs as the geographic location.
In this embodiment, the history record set is used to determine the region to which the terminal device truly belongs, which avoids inaccurate determinations caused by the mobile terminal frequently moving between different dialect regions.
The history record set may be analyzed, for example, as follows: if the terminal device stays in a certain geographic region for the longest time, that region may be taken as the most likely true geographic region of the mobile terminal. Examples are the location where a car is parked most often, or the location where a mobile phone is most often found at night.
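The longest-stay rule described above can be sketched as follows. The record layout (a region name paired with the hours spent there) and the function name `home_region` are assumptions made for illustration; the text does not specify how the history record set is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def home_region(history: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Return the region with the largest accumulated stay time, or None if empty.

    Each history record is assumed to be (region, hours_spent).
    """
    totals: Dict[str, float] = defaultdict(float)
    for region, hours in history:
        totals[region] += hours
    if not totals:
        return None
    # Pick the region whose accumulated time is largest.
    return max(totals, key=lambda region: totals[region])
```

A variant closer to the "phone at night" example would count only records whose timestamp falls in night hours before accumulating.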
In an optional implementation, the embodiment of the present invention further provides a solution that builds a database in advance to improve recognition speed and accuracy, as follows. Before determining the dialect type corresponding to the geographic location, the method further includes:
Establishing a database of correspondences between geographic regions and dialect types, in which one geographic region corresponds to one or more dialect types.
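One possible shape for such a database is a two-column table in which a region may appear in several rows. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table name `region_dialect` and the sample rows are illustrative assumptions, not data from the patent.

```python
import sqlite3
from typing import List


def build_region_dialect_db() -> sqlite3.Connection:
    """Create an in-memory table mapping one region to one or more dialect types."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE region_dialect ("
        " region TEXT NOT NULL,"
        " dialect TEXT NOT NULL,"
        " PRIMARY KEY (region, dialect))"
    )
    # Illustrative rows only; a region with two rows has two dialect types.
    rows = [
        ("Shanghai", "Wu"),
        ("Hunan", "Xiang"),
        ("Jiangxi", "Gan"),
        ("Jiangxi", "Hakka"),
    ]
    conn.executemany("INSERT INTO region_dialect VALUES (?, ?)", rows)
    conn.commit()
    return conn


def dialects_for(conn: sqlite3.Connection, region: str) -> List[str]:
    """Return all dialect types recorded for a region (empty list if unknown)."""
    cursor = conn.execute(
        "SELECT dialect FROM region_dialect WHERE region = ? ORDER BY dialect",
        (region,),
    )
    return [row[0] for row in cursor.fetchall()]
```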
In this embodiment, establishing the dialect types and the database allows more accurate recognition of more finely divided dialects. For example:
Wu is also known as the Jiangsu-Zhejiang dialect or the Jiangnan dialect. It was formerly represented by the Suzhou dialect; with the economic development of Shanghai, the number of people who use and understand the Shanghai dialect has steadily grown, so the Shanghai dialect is now the representative of Wu. Wu is spoken mainly in Jiangsu Province south of the Yangtze River and east of Zhenjiang, in a small part of Nantong, in Shanghai, and in most of Zhejiang. It can be divided into five subgroups:
(1) The Taihu subgroup, represented by the Shanghai dialect, spoken in Shanghai and the Changzhou, Hangzhou, and Ningbo areas.
(2) The Taizhou subgroup, represented by the Linhai dialect.
(3) The Dong'ou subgroup, represented by the Wenzhou dialect.
(4) The Wuzhou subgroup, represented by the Jinhua dialect.
(5) The Liqu subgroup, represented by the Lishui dialect.
As can be seen, even a single dialect type is divided into several finer branches, so establishing a corresponding database can further improve the accuracy of speech recognition.
In an optional implementation, before obtaining the recognition algorithm corresponding to the dialect type as the target algorithm, the method further includes:
Establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region containing a geographic location may be complex, more than one dialect type may be determined. This embodiment provides the following solution. Obtaining the recognition algorithm corresponding to the dialect type as the target algorithm includes:
When more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
In this embodiment, multiple recognition algorithms may be obtained, one for each of the different dialect types. Several dialect types may also correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
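The point that several dialect types may share one algorithm can be illustrated with a small lookup that removes duplicates. The mapping contents and the algorithm names (such as `gan_hakka_recognizer`) are hypothetical placeholders chosen for this sketch.

```python
from typing import Dict, List

# Hypothetical mapping: each dialect type corresponds to exactly one algorithm,
# but two dialect types may point at the same algorithm.
DIALECT_TO_ALGORITHM: Dict[str, str] = {
    "Wu": "wu_recognizer",
    "Gan": "gan_hakka_recognizer",
    "Hakka": "gan_hakka_recognizer",
}


def target_algorithms(dialects: List[str]) -> List[str]:
    """Return the distinct algorithms for the given dialect types, in first-seen order."""
    selected: List[str] = []
    for dialect in dialects:
        algorithm = DIALECT_TO_ALGORITHM.get(dialect)
        # Skip unknown dialects and algorithms that were already selected.
        if algorithm is not None and algorithm not in selected:
            selected.append(algorithm)
    return selected
```

With this mapping, two determined dialect types (Gan and Hakka) yield a single target algorithm, which is the case where the algorithm count is smaller than the dialect count.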
In an optional implementation, because multiple recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution. Performing speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
Performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
In probabilistic terms, each recognition result has an associated probability of being accurate, so the result produced by each recognition algorithm carries a probability, and the result with the largest probability value can be taken as the final recognition result.
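Selecting the highest-probability result across several target algorithms might look like the sketch below. It assumes, purely for illustration, that each algorithm is a callable returning a `(text, probability)` pair; how a real recognizer reports its confidence is not specified in the text.

```python
from typing import Callable, Dict, Tuple

# Assumed interface: audio bytes in, (recognized text, probability of being accurate) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


def best_result(algorithms: Dict[str, Recognizer], audio: bytes) -> Tuple[str, str, float]:
    """Run every target algorithm and return (algorithm name, text, probability)
    for the result with the largest probability.

    Raises ValueError if no algorithms are supplied.
    """
    candidates = [(name, *algorithm(audio)) for name, algorithm in algorithms.items()]
    return max(candidates, key=lambda candidate: candidate[2])
```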
In an optional implementation, the embodiment of the present invention further provides a solution for correcting the choice of recognition algorithm, as shown in FIG. 2. After performing speech recognition on the voice data using the target algorithm to obtain a recognition result, the method further includes:
Sorting the recognition results in descending order of their probability of being accurate;
Outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold;
Receiving a selection instruction;
After the selection instruction designates the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
FIG. 2 shows two recognition results. The two results may be displayed as text or played back as speech; if played back as speech, they may further be played in the corresponding dialect.
In this embodiment, after voice data is collected, one or more algorithms are used to obtain two or more recognition results, and the result that the user confirms as more accurate indicates which algorithm is better. This solution is well suited to devices such as mobile phones that are relatively personal or are shared by users with similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
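The correction flow above (sort, filter by threshold, let the user pick, switch the target algorithm) can be sketched as two small functions. The tuple layout, the function names, and the choice to show nothing when fewer than two results pass the threshold are assumptions of this sketch rather than details stated in the text.

```python
from typing import List, Tuple

# Assumed layout: (algorithm name, recognized text, probability of being accurate).
Candidate = Tuple[str, str, float]


def candidates_to_show(results: List[Candidate], threshold: float) -> List[Candidate]:
    """Sort results by descending probability and keep those at or above the threshold.

    Returns an empty list when fewer than two results qualify, since the user
    is only asked to choose when there are at least two options.
    """
    ranked = sorted(results, key=lambda candidate: candidate[2], reverse=True)
    shown = [candidate for candidate in ranked if candidate[2] >= threshold]
    return shown if len(shown) >= 2 else []


def corrected_algorithm(shown: List[Candidate], selected_index: int, current: str) -> str:
    """Return the algorithm behind the result the user selected, else keep the current one."""
    if 0 <= selected_index < len(shown):
        return shown[selected_index][0]
    return current
```

Here `selected_index` stands in for the selection instruction: it is the position of the result the user confirmed in the displayed list.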
As shown in FIG. 3, an embodiment of the present invention provides a speech recognition device, which may be a mobile terminal and specifically includes:
a location acquisition unit 301, configured to obtain the geographic location of the mobile terminal;
a type determination unit 302, configured to determine the dialect type corresponding to the geographic location;
an algorithm acquisition unit 303, configured to obtain the recognition algorithm corresponding to the dialect type as the target algorithm;
a recognition unit 304, configured to perform speech recognition on the voice data using the target algorithm after the voice data is collected, to obtain a recognition result.
In an optional implementation, because the dialects of the region containing a geographic location may be complex, more than one dialect type may be determined. This embodiment provides the following solution. The location acquisition unit 301 being configured to obtain the geographic location of the mobile terminal includes:
After the mobile terminal is started, obtaining a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographic region to which the mobile terminal belongs as the geographic location.
In this embodiment, multiple recognition algorithms may be obtained, one for each of the different dialect types. Several dialect types may also correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
In an optional implementation, because multiple recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution:
The speech recognition device further includes a data establishing unit 305, configured to, before the dialect type corresponding to the geographic location is determined:
establish a database of correspondences between geographic regions and dialect types, in which one geographic region corresponds to one or more dialect types.
In this embodiment, establishing the dialect types and the database allows more accurate recognition of more finely divided dialects. A single dialect type is itself divided into several finer branches, so establishing a corresponding database can further improve the accuracy of speech recognition.
The data establishing unit 305 is further configured to establish a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region containing a geographic location may be complex, more than one dialect type may be determined. This embodiment provides the following solution. The type determination unit 302 being configured to obtain the recognition algorithm corresponding to the dialect type as the target algorithm includes:
When more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
The recognition unit 304 being configured to perform speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
Performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
In probabilistic terms, each recognition result has an associated probability of being accurate, so the result produced by each recognition algorithm carries a probability, and the result with the largest probability value can be taken as the final recognition result.
In an optional implementation, the embodiment of the present invention further provides a solution for correcting the choice of recognition algorithm, as shown in FIG. 2. The speech recognition device further includes:
an algorithm correction unit 306, configured to, after speech recognition is performed on the voice data using the target algorithm to obtain a recognition result: sort the recognition results in descending order of their probability of being accurate; output at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold; receive a selection instruction; and, after the selection instruction designates the accurate recognition result among the at least two recognition results, correct the target algorithm to the recognition algorithm corresponding to that recognition result.
FIG. 2 shows two recognition results. The two results may be displayed as text or played back as speech; if played back as speech, they may further be played in the corresponding dialect.
In this embodiment, after voice data is collected, one or more algorithms are used to obtain two or more recognition results, and the result that the user confirms as more accurate indicates which algorithm is better. This solution is well suited to devices such as mobile phones that are relatively personal or are shared by users with similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
As shown in FIG. 4, an embodiment of the present invention further provides a mobile terminal including a processing unit 401 and an input/output unit 403. The processing unit 402 is configured to control and manage the actions of the terminal device; for example, the processing unit 402 is configured to support the terminal device in performing steps 101-103 in FIG. 1 or other processes of the techniques described herein. The input/output unit 403 is configured to support data input and output. The terminal device may further include a storage unit 401 for storing program code and data of the terminal device.
The processing unit 402 may be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The input/output unit 403 may be a microphone, an earpiece, a speaker, or the like, and the storage unit 401 may be a memory.
The input/output unit 403 is configured to receive input data and to output data.
The processing unit 401 is configured to obtain the geographic location of the mobile terminal and determine the dialect type corresponding to the geographic location; obtain the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, perform speech recognition on the voice data using the target algorithm to obtain a recognition result.
In an optional implementation, the processing unit 401 is further configured to: after the mobile terminal is started, obtain a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyze the history record set to obtain the geographic region to which the mobile terminal belongs as the geographic location.
For other procedures that the processor 401 is further configured to perform, refer to the foregoing method embodiments; they are not repeated here.
Refer to FIG. 5, which is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention. As shown in the figure, the mobile terminal includes one or more processors, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the following steps:
Obtaining the geographic location of the mobile terminal and determining the dialect type corresponding to the geographic location; obtaining the recognition algorithm corresponding to the dialect type as the target algorithm; and, after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
In this embodiment, the geographic location may be expressed as latitude and longitude or as an administrative division; it may also be expressed using a preset division into dialect regions, and is not limited to latitude and longitude. The dialect type refers to the category to which a dialect belongs. At present there are seven main types in China.
Therefore, different dialect types may have different recognition algorithms corresponding to them; in particular, different recognition algorithms may correspond to speech databases of standard speech for different dialect types. For a determined dialect type, recognition speed and accuracy can thus be improved in a targeted manner.
The voice data may be collected when a person speaks to the terminal device and the terminal device's voice pickup device, for example a microphone, captures the voice data input by the user. Once the speech recognition algorithm, that is, the target algorithm, has been determined, the specific recognition process is not described further in the embodiments of the present invention. It can be understood that different dialects may each have their own speech database used together with the recognition algorithm.
In this embodiment, the geographic location of the mobile terminal is used to determine which dialect types are used in the region to which the mobile terminal belongs, so that the corresponding recognition algorithm can be used, thereby improving the recognition accuracy for non-standard speech.
In an optional implementation, the geographic location obtained at a given moment is not necessarily the usual or true location that reflects the terminal device's dialect region, for example in the case of the mobile terminal of a user on a business trip. The embodiment of the present invention therefore provides the following solution. Obtaining the geographic location of the mobile terminal includes:
After the mobile terminal is started, obtaining a history record set, where the history record set is obtained by recording the location information of the mobile terminal each time the mobile terminal is started; and analyzing the history record set to obtain the geographic region to which the mobile terminal belongs as the geographic location.
In this embodiment, the history record set is used to determine the region to which the terminal device truly belongs, which avoids inaccurate determinations caused by the mobile terminal frequently moving between different dialect regions.
The history record set may be analyzed, for example, as follows: if the terminal device stays in a certain geographic region for the longest time, that region may be taken as the most likely true geographic region of the mobile terminal. Examples are the location where a car is parked most often, or the location where a mobile phone is most often found at night.
In an optional implementation, the embodiment of the present invention further provides a solution that builds a database in advance to improve recognition speed and accuracy, as follows. Before determining the dialect type corresponding to the geographic location, the steps further include:
Establishing a database of correspondences between geographic regions and dialect types, in which one geographic region corresponds to one or more dialect types.
In this embodiment, establishing the dialect types and the database allows more accurate recognition of more finely divided dialects. A single dialect type is itself divided into several finer branches, so establishing a corresponding database can further improve the accuracy of speech recognition.
In an optional implementation, the embodiment of the present invention further provides a solution that builds in advance a database of correspondences between dialect types and recognition algorithms to improve recognition speed and accuracy, as follows. Before obtaining the recognition algorithm corresponding to the dialect type as the target algorithm, the steps further include:
Establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
In an optional implementation, because the dialects of the region containing a geographic location may be complex, more than one dialect type may be determined. This embodiment provides the following solution. Obtaining the recognition algorithm corresponding to the dialect type as the target algorithm includes:
When more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
In this embodiment, multiple recognition algorithms may be obtained, one for each of the different dialect types. Several dialect types may also correspond to the same recognition algorithm, so the number of recognition algorithms may be smaller than the number of dialect types.
In an optional implementation, because multiple recognition algorithms are used, several different recognition results may be produced. This embodiment provides the following solution. Performing speech recognition on the voice data using the target algorithm to obtain a recognition result includes:
Performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
In probabilistic terms, each recognition result has an associated probability of being accurate, so the result produced by each recognition algorithm carries a probability, and the result with the largest probability value can be taken as the final recognition result.
In an optional implementation, the embodiment of the present invention further provides a solution for correcting the choice of recognition algorithm, as follows. After performing speech recognition on the voice data using the target algorithm to obtain a recognition result, the steps further include:
Sorting the recognition results in descending order of their probability of being accurate; outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold; receiving a selection instruction; and, after the selection instruction designates the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
In this embodiment, after voice data is collected, one or more algorithms are used to obtain two or more recognition results, and the result that the user confirms as more accurate indicates which algorithm is better. This solution is well suited to devices such as mobile phones that are relatively personal or are shared by users with similar accents, and it can improve the recognition accuracy for non-standard speech while maintaining recognition speed.
The foregoing describes the solutions of the embodiments of the present invention mainly from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the mobile terminal includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments of the present invention, the mobile terminal may be divided into functional units according to the above method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division into units in the embodiments of the present invention is schematic and is only a division by logical function; other divisions are possible in an actual implementation.
An embodiment of the present invention further provides another mobile terminal, as shown in FIG. 6. For ease of description, only the parts related to the embodiment are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The mobile terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer. The following takes a mobile phone as an example of the mobile terminal:
FIG. 6 is a block diagram of part of the structure of a mobile phone related to the mobile terminal provided by an embodiment of the present invention. Referring to FIG. 6, the mobile phone includes a radio frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, a power supply 990, and other components. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the mobile phone, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The components of the mobile phone are described in detail below with reference to FIG. 6:
The RF circuit 910 can be used to receive and transmit information. Generally, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 can communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 920 can be used to store software programs and modules. The processor 980 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application required for at least one function, and the like; the data storage area may store data created through use of the mobile phone (such as application usage parameters). In addition, the memory 920 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
The input unit 930 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 930 may include a fingerprint sensor 931 and other input devices 932. The fingerprint sensor 931 can collect the fingerprint data of a user touching it. In addition to the fingerprint sensor 931, the input unit 930 may also include other input devices 932. Specifically, the other input devices 932 may include, but are not limited to, one or more of a touch screen, physical buttons, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.
The display unit 940 can be used to display information input by the user, information provided to the user, and the various menus of the mobile phone. The display unit 940 may include a display screen 941. Optionally, the display screen 941 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Although in FIG. 6 the fingerprint sensor 931 and the display screen 941 are shown as two separate components that implement the input and output functions of the mobile phone, in some embodiments the fingerprint sensor 931 may be integrated with the display screen 941 to implement the input and playback functions of the mobile phone.
The mobile phone may also include at least one sensor 950, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display screen 941 according to the brightness of the ambient light, and the proximity sensor can turn off the display screen 941 and/or the backlight when the mobile phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors with which the mobile phone may be equipped, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.
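As a rough illustration of how a stationary accelerometer reading could drive landscape/portrait switching, the following Python sketch computes the magnitude of the measured gravity vector and picks an orientation from its in-plane components. The function names and the axis convention (x along the short edge of the screen, y along the long edge, values in m/s^2) are assumptions for illustration only; the source does not specify them.

```python
import math


def gravity_magnitude(ax: float, ay: float, az: float) -> float:
    """Magnitude of the acceleration vector; close to 9.81 m/s^2 at rest."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def screen_orientation(ax: float, ay: float) -> str:
    """Choose an orientation from the in-plane gravity components.

    Assumed convention: x runs along the short edge of the screen and
    y along the long edge, so gravity mostly along y means the phone
    is held upright.
    """
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

A production implementation would typically also filter the signal and add hysteresis so that the screen does not flip when the phone is held near the diagonal.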
The audio circuit 960, a speaker 961, and a microphone 962 provide an audio interface between the user and the mobile phone. The audio circuit 960 can convert received audio data into an electrical signal and transmit it to the speaker 961, which converts it into a sound signal for playback. Conversely, the microphone 962 converts collected sound signals into electrical signals, which the audio circuit 960 receives and converts into audio data. The audio data is then processed by the processor 980 and either sent via the RF circuit 910 to, for example, another mobile phone, or output to the memory 920 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 970, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although FIG. 6 shows the WiFi module 970, it can be understood that the module is not an essential part of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 980 is the control center of the mobile phone. It connects the various parts of the entire phone through various interfaces and lines, and performs the various functions of the phone and processes data by running or executing the software programs and/or modules stored in the memory 920 and by invoking the data stored in the memory 920, thereby monitoring the mobile phone as a whole. Optionally, the processor 980 may include one or more processing units. Preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 980.
The mobile phone also includes a power supply 990 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 980 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, which are not described here.
In the embodiment shown in FIG. 1, the method flow of each step may be implemented based on the structure of this mobile phone.
In the embodiments shown in FIGS. 3 and 4, the function of each unit may be implemented based on the structure of this mobile phone.
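As an illustration of the method flow that such a handset would carry out, the following minimal Python sketch chains the steps recited in the claims: derive a home region from a history set of locations, look up the dialect types for that region, look up one recognition algorithm per dialect type, and run each on the collected voice data. The table contents, the function names (`home_region`, `target_algorithms`, `recognize`), the most-frequent-region analysis of the history set, and the stub recognizers are all assumptions made for illustration; the source does not specify them.

```python
from collections import Counter
from typing import Callable, Dict, List

Recognizer = Callable[[bytes], str]


def make_stub_recognizer(dialect: str) -> Recognizer:
    """Return a placeholder recognizer; a real one would decode the audio."""
    def recognize_audio(audio: bytes) -> str:
        return f"<{dialect} transcript of {len(audio)} bytes>"
    return recognize_audio


# Hypothetical databases: one region maps to one or more dialect types,
# and each dialect type maps to exactly one recognition algorithm.
REGION_TO_DIALECTS: Dict[str, List[str]] = {
    "Guangdong": ["Cantonese", "Hakka"],
    "Sichuan": ["Sichuanese"],
}
DIALECT_TO_ALGORITHM: Dict[str, Recognizer] = {
    d: make_stub_recognizer(d) for d in ("Cantonese", "Hakka", "Sichuanese")
}


def home_region(history: List[str]) -> str:
    """Analyse the history set; here, simply the most frequently recorded region."""
    if not history:
        raise ValueError("history set is empty")
    return Counter(history).most_common(1)[0][0]


def target_algorithms(region: str) -> List[Recognizer]:
    """Look up the recognition algorithm for every dialect type of the region."""
    return [DIALECT_TO_ALGORITHM[d] for d in REGION_TO_DIALECTS[region]]


def recognize(history: List[str], audio: bytes) -> List[str]:
    """Run every target algorithm on the collected voice data."""
    return [algorithm(audio) for algorithm in target_algorithms(home_region(history))]
```

Keeping the two lookups as separate tables mirrors the two databases described in the claims, so a region can gain or lose a dialect type without touching the recognizers themselves.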
An embodiment of the present invention further provides a computer storage medium that stores a computer program for electronic data exchange. The computer program causes a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments, where the computer includes a mobile terminal.
An embodiment of the present invention further provides a computer program product that includes a non-transitory computer-readable storage medium storing a computer program. The computer program is operable to cause a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes a mobile terminal.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in a different order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned memory includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (20)

  1. A speech recognition method, comprising:
    obtaining a geographic location of a mobile terminal, and determining a dialect type corresponding to the geographic location;
    obtaining a recognition algorithm corresponding to the dialect type as a target algorithm;
    after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
  2. The method according to claim 1, wherein obtaining the geographic location of the mobile terminal comprises:
    after the mobile terminal is started, obtaining a history record set, the history record set being obtained by the mobile terminal collecting statistics on its location information each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
  3. The method according to claim 2, further comprising, before determining the dialect type corresponding to the geographic location:
    establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
  4. The method according to claim 3, further comprising, before obtaining the recognition algorithm corresponding to the dialect type as the target algorithm:
    establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
  5. The method according to claim 4, wherein obtaining the recognition algorithm corresponding to the dialect type as the target algorithm comprises:
    when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
  6. The method according to claim 5, wherein performing speech recognition on the voice data using the target algorithm to obtain the recognition result comprises:
    performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
  7. The method according to claim 5, wherein after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the method further comprises:
    sorting the recognition results in descending order of probability of being accurate;
    outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold;
    receiving a selection instruction;
    after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
  8. A mobile terminal, comprising one or more processors, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the following operations:
    obtaining a geographic location of the mobile terminal, and determining a dialect type corresponding to the geographic location;
    obtaining a recognition algorithm corresponding to the dialect type as a target algorithm;
    after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
  9. The mobile terminal according to claim 8, wherein, in obtaining the geographic location of the mobile terminal, the instructions in the programs are specifically used to perform the following operations:
    after the mobile terminal is started, obtaining a history record set, the history record set being obtained by the mobile terminal collecting statistics on its location information each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
  10. The mobile terminal according to claim 9, wherein the programs further comprise instructions for performing the following operation:
    establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
  11. The mobile terminal according to claim 10, wherein the programs further comprise instructions for performing the following operation:
    establishing a database of correspondences between dialect types and recognition algorithms, in which one dialect type corresponds to one recognition algorithm.
  12. The mobile terminal according to claim 11, wherein, in obtaining the recognition algorithm corresponding to the dialect type as the target algorithm, the instructions in the programs are specifically used to perform the following operation:
    when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
  13. The mobile terminal according to claim 12, wherein, in performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the instructions in the programs are specifically used to perform the following operation:
    performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
  14. The mobile terminal according to claim 12, wherein the programs further comprise instructions for performing the following operations:
    sorting the recognition results in descending order of probability of being accurate;
    outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold;
    receiving a selection instruction;
    after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
  15. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform the following operations:
    obtaining a geographic location of a mobile terminal, and determining a dialect type corresponding to the geographic location;
    obtaining a recognition algorithm corresponding to the dialect type as a target algorithm;
    after voice data is collected, performing speech recognition on the voice data using the target algorithm to obtain a recognition result.
  16. The computer-readable storage medium according to claim 15, wherein, when obtaining the geographic location of the mobile terminal, the computer is specifically configured to perform the following operations:
    after the mobile terminal is started, obtaining a history record set, the history record set being obtained by the mobile terminal collecting statistics on its location information each time it is started; and analyzing the history record set to obtain the geographic area to which the mobile terminal belongs as the geographic location.
  17. The computer-readable storage medium according to claim 16, wherein, before determining the dialect type corresponding to the geographic location, the computer further performs the following operation:
    establishing a database of correspondences between geographic areas and dialect types, in which one geographic area corresponds to one or more dialect types.
  18. The computer-readable storage medium according to claim 17, wherein, before obtaining the recognition algorithm corresponding to the dialect type as the target algorithm, the computer further performs the following operation:
    when more than one dialect type is determined, obtaining the recognition algorithm corresponding to each dialect type as a target algorithm.
  19. The computer-readable storage medium according to claim 18, wherein, when performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the computer specifically performs the following operation:
    performing speech recognition on the voice data using each obtained target algorithm, and taking the recognition result with the highest probability of being accurate as the final recognition result.
  20. The computer-readable storage medium according to claim 18, wherein, after performing speech recognition on the voice data using the target algorithm to obtain the recognition result, the computer further performs the following operations:
    sorting the recognition results in descending order of probability of being accurate;
    outputting at least two recognition results whose probability of being accurate is greater than or equal to a preset threshold;
    receiving a selection instruction;
    after the selection instruction specifies the accurate recognition result among the at least two recognition results, correcting the target algorithm to the recognition algorithm corresponding to that recognition result.
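The selection and correction steps recited in the claims above (choosing the most probable result, then ranking, thresholding, and adopting the algorithm behind the result the user picks) can be sketched as follows. This is only an illustrative Python sketch: the `Candidate` record, the function names, the example probabilities, and the choice to return an empty shortlist when fewer than two results reach the threshold are assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    algorithm: str      # name of the dialect recognizer that produced the text
    text: str           # recognized text
    probability: float  # estimated probability that the text is accurate


def final_result(candidates: List[Candidate]) -> Candidate:
    """Take the result with the highest probability of being accurate."""
    return max(candidates, key=lambda c: c.probability)


def shortlist(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Sort in descending order of probability and keep results at or above the threshold.

    Assumption: if fewer than two results qualify, nothing is offered to the user.
    """
    ranked = sorted(candidates, key=lambda c: c.probability, reverse=True)
    chosen = [c for c in ranked if c.probability >= threshold]
    return chosen if len(chosen) >= 2 else []


def corrected_target(shortlisted: List[Candidate], selected_index: int) -> str:
    """Return the algorithm behind the result named by the selection instruction."""
    return shortlisted[selected_index].algorithm
```

For example, with hypothetical candidates from Cantonese (0.62), Hakka (0.81), and Teochew (0.30) recognizers and a threshold of 0.6, the shortlist holds the Hakka and Cantonese results in that order; if the user then selects the second entry, the target algorithm becomes the Cantonese recognizer.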
PCT/CN2018/086205 2017-05-31 2018-05-09 Speech recognition and related products WO2018219105A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710401786.1 2017-05-31
CN201710401786.1A CN107274885B (en) 2017-05-31 2017-05-31 Speech recognition method and related product

Publications (1)

Publication Number Publication Date
WO2018219105A1 true WO2018219105A1 (en) 2018-12-06

Family

ID=60064910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/086205 WO2018219105A1 (en) 2017-05-31 2018-05-09 Speech recognition and related products

Country Status (2)

Country Link
CN (1) CN107274885B (en)
WO (1) WO2018219105A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274885B (en) * 2017-05-31 2020-05-26 Oppo广东移动通信有限公司 Speech recognition method and related product
CN108417203A (en) * 2018-01-31 2018-08-17 广东聚晨知识产权代理有限公司 A kind of human body speech recognition transmission method and system
CN108346426B (en) * 2018-02-01 2020-12-08 威盛电子(深圳)有限公司 Speech recognition device and speech recognition method
CN110797014B (en) * 2018-07-17 2024-06-07 中兴通讯股份有限公司 Speech recognition method, device and computer storage medium
CN110909134A (en) * 2018-09-18 2020-03-24 奇酷互联网络科技(深圳)有限公司 Voice conversion method, mobile terminal and readable storage medium
CN109377990A (en) * 2018-09-30 2019-02-22 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN109410935A (en) * 2018-11-01 2019-03-01 平安科技(深圳)有限公司 A kind of destination searching method and device based on speech recognition
CN109493848A (en) * 2018-12-17 2019-03-19 深圳市沃特沃德股份有限公司 Audio recognition method, system and electronic device
CN111951808B (en) * 2019-04-30 2023-09-08 深圳市优必选科技有限公司 Voice interaction method, device, terminal equipment and medium
CN112116909A (en) * 2019-06-20 2020-12-22 杭州海康威视数字技术股份有限公司 Voice recognition method, device and system
CN110491368B (en) * 2019-07-23 2023-06-16 平安科技(深圳)有限公司 Dialect background-based voice recognition method, device, computer equipment and storage medium
CN110570837B (en) * 2019-08-28 2022-03-11 卓尔智联(武汉)研究院有限公司 Voice interaction method and device and storage medium
CN111142999A (en) * 2019-12-24 2020-05-12 深圳市元征科技股份有限公司 Equipment language selection method, system, device and computer storage medium
CN111291154B (en) * 2020-01-17 2022-08-23 厦门快商通科技股份有限公司 Dialect sample data extraction method, device and equipment and storage medium
CN112749543B (en) * 2020-12-22 2022-08-05 浙江吉利控股集团有限公司 Matching method, device, equipment and storage medium for information analysis process
CN114165819A (en) * 2021-11-26 2022-03-11 珠海格力电器股份有限公司 Range hood, control method and module thereof and computer readable medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005331608A (en) * 2004-05-18 2005-12-02 Matsushita Electric Ind Co Ltd Device and method for processing information
CN103037117A (en) * 2011-09-29 2013-04-10 中国电信股份有限公司 Method and system of voice recognition and voice access platform
CN103903611A (en) * 2012-12-24 2014-07-02 联想(北京)有限公司 Speech information identifying method and equipment
CN104575493A (en) * 2010-05-26 2015-04-29 谷歌公司 Acoustic model adaptation using geographic information
CN105225665A (en) * 2015-10-15 2016-01-06 桂林电子科技大学 Speech recognition method and speech recognition device
CN105931643A (en) * 2016-06-30 2016-09-07 北京海尔广科数字技术有限公司 Speech recognition method and apparatus
CN106057204A (en) * 2016-05-05 2016-10-26 刘世超 Online calling service method and system
CN106228974A (en) * 2016-08-19 2016-12-14 镇江惠通电子有限公司 Control method based on speech recognition, apparatus and system
CN107274885A (en) * 2017-05-31 2017-10-20 广东欧珀移动通信有限公司 Speech recognition method and related product
CN107316637A (en) * 2017-05-31 2017-11-03 广东欧珀移动通信有限公司 Speech recognition method and related product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106128462A (en) * 2016-06-21 2016-11-16 东莞酷派软件技术有限公司 Speech recognition method and system

Also Published As

Publication number Publication date
CN107274885A (en) 2017-10-20
CN107274885B (en) 2020-05-26

Similar Documents

Publication Publication Date Title
WO2018219105A1 (en) Speech recognition and related products
CN107170454B (en) Speech recognition method and related product
JP5996783B2 (en) Method and terminal for updating voiceprint feature model
WO2020029906A1 (en) Multi-person voice separation method and apparatus
WO2021135611A1 (en) Method and device for speech recognition, terminal and storage medium
CN108538320B (en) Recording control method and device, readable storage medium and terminal
WO2018072543A1 (en) Model generation method, speech synthesis method and apparatus
US11274932B2 (en) Navigation method, navigation device, and storage medium
CN109903773B (en) Audio processing method, device and storage medium
CN111798821B (en) Sound conversion method, device, readable storage medium and electronic equipment
KR20160106075A (en) Method and device for identifying a piece of music in an audio stream
CN110097895B (en) Pure music detection method, pure music detection device and storage medium
CN107316637A (en) Speech recognition method and related product
CN112751648A (en) Packet loss data recovery method and related device
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
WO2018214760A1 (en) Focusing method and related product
CN106791010B (en) Information processing method and device and mobile terminal
CN112242143B (en) Voice interaction method and device, terminal equipment and storage medium
CN109684501B (en) Lyric information generation method and device
CN112259076A (en) Voice interaction method and device, electronic equipment and computer readable storage medium
CN112948763B (en) Piece quantity prediction method and device, electronic equipment and storage medium
WO2020102979A1 (en) Method and apparatus for processing voice information, storage medium and electronic device
CN116597828B (en) Model determination method, model application method and related device
CN117012202B (en) Voice channel recognition method and device, storage medium and electronic equipment
CN106847280A (en) Audio information processing method, intelligent terminal and voice command terminal

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18810404; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18810404; Country of ref document: EP; Kind code of ref document: A1)