WO2018149209A1 - Speech recognition method, electronic device, and computer storage medium - Google Patents

Speech recognition method, electronic device, and computer storage medium

Info

Publication number
WO2018149209A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
voiceprint
voice
matched
semantic
Prior art date
Application number
PCT/CN2017/113154
Other languages
English (en)
French (fr)
Inventor
万秋生
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2019539928A (JP6771805B2)
Priority to KR1020197016994A (KR102222317B1)
Priority to EP17897119.8A (EP3584786B1)
Publication of WO2018149209A1
Priority to US16/442,193 (US11043211B2)
Priority to US17/244,737 (US11562736B2)

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L2015/223 Execution procedure of a spoken command
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating

Definitions

  • The present invention relates to the field of computer information processing technologies, and in particular to a speech recognition method, an electronic device, and a computer storage medium.
  • Speech recognition technology is applied in a variety of intelligent products to achieve intelligent control. As smart products proliferate and ever higher accuracy is demanded of speech recognition, new speech recognition techniques emerge one after another.
  • The commonly used speech recognition approach extracts features from the to-be-recognized voice information uttered by a user and then recognizes that voice information with a recognition algorithm.
  • However, when the speech recognition function is used where several people are talking (for example, inside a vehicle), the captured to-be-recognized voice information may contain the speech of multiple people, of which only one person's speech is valid; the speech uttered by the others is noise, the correct semantics cannot be recognized, and speech recognition accuracy is insufficient.
  • According to various embodiments of the present application, a speech recognition method, an electronic device, and a computer storage medium are provided.
  • A speech recognition method includes the following steps:
  • acquiring collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information;
  • when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment;
  • when voiceprint information that has not yet been matched exists in a local voiceprint database, obtaining one piece of not-yet-matched voiceprint information from the local voiceprint database as to-be-matched voiceprint information;
  • matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information;
  • combining the voice segments corresponding to the filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information;
  • when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
  • An electronic device includes a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the steps of the method described above.
  • A computer storage medium stores a computer program that, when executed by a processor, implements the steps of the speech recognition method described in any of the above.
  • FIG. 1 is a schematic diagram of an application environment of a speech recognition method according to an embodiment;
  • FIG. 2 is a schematic diagram of the internal structure of an electronic device according to an embodiment;
  • FIG. 3 is a schematic flowchart of a speech recognition method according to an embodiment;
  • FIG. 4 is a schematic flowchart of a speech recognition method according to another embodiment;
  • FIG. 5 is a schematic flowchart of a speech recognition method according to a specific embodiment;
  • FIG. 6 is a structural block diagram of an electronic device according to an embodiment;
  • FIG. 7 is a structural block diagram of an electronic device according to another embodiment;
  • FIG. 8 is a structural block diagram of a storage module in an electronic device according to another embodiment.
  • FIG. 1 is a schematic diagram of the application environment of a speech recognition method according to an embodiment.
  • Referring to FIG. 1, the speech recognition method is applied to a speech recognition system.
  • The speech recognition system includes a terminal 10 and a server 20, and the terminal 10 and the server 20 can communicate via a network.
  • The terminal 10 can recognize voice information to obtain semantic information and further process the semantic information to determine a speech recognition result; it can also upload the collected voice information to the corresponding server 20 through the network. The server 20 can recognize the voice information uploaded by the terminal 10 and send the recognition result to the terminal 10 through the network; the terminal 10 takes the received recognition result as the semantic information and determines the speech recognition result according to it.
  • The terminal 10 can generate corresponding instructions according to the speech recognition result to perform subsequent related operations, implementing intelligent voice control.
  • The terminal 10 can be any device capable of intelligent input and output and of recognizing voice, for example a desktop terminal or a mobile terminal; the mobile terminal can be a smart phone, a tablet computer, a vehicle-mounted computer, a wearable smart device, or the like.
  • The server 20 may be the server hosting the platform that receives the voice information and performs speech recognition, and may be implemented as an independent server or as a server cluster composed of multiple servers.
  • As shown in FIG. 2, in one embodiment an electronic device is provided, which may be the terminal 10 of FIG. 1.
  • The electronic device includes a processor, a non-volatile storage medium, an internal memory, and a communication interface connected through a system bus.
  • The non-volatile storage medium of the electronic device stores an operating system, a local voiceprint database, and computer readable instructions; the local voiceprint database stores voiceprint information, and the computer readable instructions can be used to implement a speech recognition method.
  • The processor of the electronic device provides computing and control capabilities and supports the operation of the entire device.
  • Computer readable instructions may be stored in the internal memory of the electronic device; when executed by the processor, they cause the processor to perform a speech recognition method.
  • The communication interface is used to communicate with the server 20.
  • A person skilled in the art can understand that the structure shown in FIG. 2 is only a block diagram of the partial structure related to the solution of the present application and does not limit the electronic device to which the solution is applied; a specific electronic device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • As shown in FIG. 3, in one embodiment a speech recognition method is provided; this embodiment is illustrated by applying the method to the terminal 10 of FIG. 1.
  • The method specifically includes the following steps S310 to S360:
  • S310: Acquire the collected to-be-recognized voice information, and determine semantic information of the to-be-recognized voice information.
  • In this embodiment, the voice information may be audio input by the user through the voice input device of the terminal; that is, the user's voice information can be collected through the voice input device, and once collection is complete the collected to-be-recognized voice information can be acquired. The voice input device can include, but is not limited to, a microphone.
  • The to-be-recognized voice information is the voice information that needs to be recognized to obtain semantic information.
  • The semantic information may be text information; performing speech recognition on the to-be-recognized voice information yields its semantic information, that is, the meaning expressed by the user who produced the to-be-recognized voice information can be determined.
  • S320: When the semantic information does not satisfy a preset rule, segment the to-be-recognized voice information to obtain voice segments, and extract the voiceprint information of each voice segment.
  • After the semantic information of the to-be-recognized voice information is determined, whether it satisfies the preset rule must be judged. In this embodiment, the preset rule may be a preset requirement on the semantic information; when the semantic information does not satisfy the preset rule, the voice information fails that requirement and is considered inaccurate.
  • For example, since audio is being recognized, a user who accurately expresses the intended content will generally produce voice information that satisfies the grammar of ordinary speech; thus the preset rule may be that the semantic information conforms to a preset grammar rule.
  • On the other hand, the terminal implementing the speech recognition method of this embodiment may have multiple working modes, which may include but are not limited to a navigation mode, a music mode, a broadcast mode, and a program mode; working in different modes, the terminal can meet different user needs.
  • Each working mode has a corresponding lexicon containing the vocabulary likely to be used in that mode.
  • After the semantic information is determined, it can further be judged whether the keywords obtained by segmenting the semantic information are in the lexicon; if so, the semantic information of the user's to-be-recognized voice information consists of words likely to be used in the terminal's working mode.
  • Accordingly, in this embodiment the preset rule may be that the semantic information conforms to the preset grammar rule and its keywords all fall within a single lexicon; when the semantic information does not satisfy the preset rule, the semantic information obtained by recognizing the to-be-recognized voice information cannot be accurately interpreted by the terminal and therefore cannot be converted into a corresponding instruction to perform the corresponding operation.
  • Alternatively, the preset rule may be that the semantic information conforms to the preset grammar rule, its keywords fall within a single lexicon, and the voice information has a corresponding instruction.
  • When the semantic information conforms to the preset grammar rule and falls within a single lexicon but cannot be converted into a valid instruction, that is, when no corresponding instruction exists, it is still considered not to satisfy the preset rule.
  • In a specific example, the determined semantic information is "I want hello to play music": user A actually said "I want to play music", but while user A was speaking, user B inserted "hello" after user A's "I want". Although "play music" is in the lexicon corresponding to the music mode,
  • the grammar of the whole sentence does not conform to normal human grammar, so it can be considered not to satisfy the preset rule.
  • As another example, the semantic information "hello" both conforms to the preset grammar rule and is in the lexicon, but it is essentially a greeting rather than a control statement; the terminal has no instruction corresponding to "hello",
  • that is, no instruction to perform an operation can be generated, so it can also be considered not to satisfy the preset rule.
  • When the semantic information does not satisfy the preset rule, the to-be-recognized voice information is segmented to obtain voice segments, and the voiceprint information of each segment is extracted. Every person's voiceprint is distinct, while different utterances of the same person carry the same voiceprint: for example, user A may speak different voice information, but the voiceprint information of the same user A is identical across utterances. To improve accuracy, the voice information of a single person can therefore be isolated by judging the voiceprint information.
  • S330: When voiceprint information that has not yet been matched exists in the local voiceprint database, obtain one piece of not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  • The local voiceprint database stores voiceprint information; the stored entries may be the voiceprints of users who have previously performed voice interaction with the terminal and whose semantic information satisfied the preset rule at least once.
  • To match the voiceprint information of each voice segment against the not-yet-matched voiceprint information in the local voiceprint database, one piece of not-yet-matched voiceprint information is first obtained from the database as the to-be-matched voiceprint information. Matching the voiceprint of each voice segment against one to-be-matched voiceprint at a time makes it possible to filter out the voice information of a single user.
  • S340: Match the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determine, from the voiceprint information of the voice segments, the filtered voiceprint information that successfully matches the to-be-matched voiceprint information.
  • The to-be-recognized voice information may contain the voices of multiple users. After one not-yet-matched voiceprint, that is, one user's voiceprint, is selected from the local voiceprint database, the voiceprint of each voice segment is matched against it; the voiceprints of the same user are identical.
  • The segment voiceprints that successfully match the to-be-matched voiceprint therefore all belong to the same user: the filtered voiceprint information is the voiceprint information of the user corresponding to the to-be-matched voiceprint information.
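  • The patent does not prescribe a particular voiceprint comparison technique. As a purely illustrative sketch of step S340, the following assumes voiceprints are fixed-length embedding vectors compared by cosine similarity against a tunable threshold; the helper names and the 0.75 threshold are assumptions, not taken from the patent.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_segments(segment_voiceprints, candidate_voiceprint, threshold=0.75):
    """S340 (sketch): return the indices of the voice segments whose
    voiceprint matches the to-be-matched voiceprint."""
    return [i for i, vp in enumerate(segment_voiceprints)
            if cosine_similarity(vp, candidate_voiceprint) >= threshold]
```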
  • S350: Combine the voice segments corresponding to the filtered voiceprint information to obtain combined voice information, and determine the combined semantic information of the combined voice information.
  • After the filtered voiceprint information is determined, the voice segments corresponding to it can be combined; that is, the voice segments of the same user are combined, so the resulting combined voice information is the voice data of a single user. The combined semantic information of the combined voice information is then determined; it represents the meaning actually expressed by that user in the to-be-recognized voice information.
  • S360: When the combined semantic information satisfies the preset rule, use the combined semantic information as the speech recognition result.
  • The to-be-recognized voice information may contain the voices of multiple users, and the combined semantic information of the combined voice information is obtained through the above steps. When the combined semantic information satisfies the preset rule, it can be used as the speech recognition result, completing recognition; a corresponding instruction can subsequently be generated from the result and the corresponding operation performed.
  • Continuing the earlier example, the semantic information is "I want hello to play music". If the selected to-be-matched voiceprint is user A's voiceprint, the voiceprints of the voice segments "I want" and "play music" match user A's
  • voiceprint; that is, the filtered voiceprint information is the voiceprint information of the segments "I want" and "play music", and these segments can be combined into the final combined voice information.
  • The combined semantic information is determined to be "I want to play music", which meets the preset grammar requirement, lies within a single lexicon, and requires a music-playing operation for which a corresponding instruction exists.
  • The combined semantic information is therefore deemed to satisfy the preset rule, and "I want to play music" is used as the speech recognition result; a corresponding play-music instruction can subsequently be generated and executed.
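  • Reading steps S310 to S360 together, the overall control flow can be sketched as below. This is a minimal illustration under stated assumptions: the recognizer, segmenter, voiceprint extractor, matcher, and rule predicate are caller-supplied stand-ins for components the patent leaves unspecified, and audio is assumed to be a NumPy array.

```python
import numpy as np

class RecognitionError(Exception):
    """Raised when no single-speaker interpretation satisfies the preset rule."""

def speech_recognition(audio, unmatched_voiceprints, recognize, segment,
                       extract_voiceprint, matches, satisfies_preset_rule):
    # S310: recognize the raw audio and accept it if the rule already holds.
    semantics = recognize(audio)
    if satisfies_preset_rule(semantics):
        return semantics

    # S320: split the audio into segments and extract one voiceprint each.
    segments = segment(audio)
    seg_prints = [extract_voiceprint(s) for s in segments]

    # S330: take one not-yet-matched stored voiceprint at a time.
    for candidate in unmatched_voiceprints:
        # S340: keep only the segments belonging to this user.
        kept = [seg for seg, vp in zip(segments, seg_prints)
                if matches(vp, candidate)]
        if not kept:
            continue
        # S350: rebuild the single user's utterance and recognize it.
        combined_semantics = recognize(np.concatenate(kept))
        # S360: accept the combined semantics once the preset rule holds.
        if satisfies_preset_rule(combined_semantics):
            return combined_semantics

    raise RecognitionError("no candidate voiceprint yielded valid semantics")
```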
  • The above speech recognition method first determines the semantic information of the to-be-recognized voice information. When the semantic information does not satisfy the preset rule, the semantics recognized at this point may be inaccurate,
  • so the to-be-recognized voice information is segmented into voice segments and the voiceprint information of each segment is extracted.
  • When not-yet-matched voiceprint information exists in the local voiceprint database, one piece of it is obtained as the to-be-matched voiceprint information, providing a single user's voiceprint as the basis for subsequent matching.
  • The voiceprint of each voice segment is then matched against the to-be-matched voiceprint, and the filtered voiceprint information that successfully matches is determined from the segment voiceprints;
  • that is, the voiceprint information of the single user matching the to-be-matched voiceprint is filtered out. The voice segments corresponding to the filtered voiceprint information are combined into combined voice information, which is recognized to obtain the combined semantic information.
  • To improve accuracy, it is further judged whether the combined semantic information satisfies the preset rule. When it does, the accurately recognized semantics, exactly what the user wanted to express, has been obtained; the combined semantic information is then used as the speech recognition result, improving speech recognition accuracy.
  • In one embodiment, the speech recognition method further includes the following step: when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when not-yet-matched voiceprint information exists in the local voiceprint database, one piece of not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  • Continuing the example in which the semantic information is "I want hello to play music": if the selected to-be-matched voiceprint is user B's voiceprint, the voiceprint of the voice segment "hello" matches user B's voiceprint, so the filtered voiceprint information is that of the segment "hello", and the segment "hello" becomes the final combined voice information, whose combined semantic information is determined to be "hello".
  • Although "hello" conforms to the preset grammar requirement and is in the lexicon, it has no corresponding instruction, that is, no instruction to perform an operation can be generated, so it does not satisfy the preset rule.
  • In this case, the method returns to the step of obtaining a not-yet-matched voiceprint from the local voiceprint database as the to-be-matched voiceprint information, takes the next not-yet-matched voiceprint as the to-be-matched voiceprint information, and continues the voiceprint matching process.
  • In one embodiment, the speech recognition method further includes the following step: when the semantic information satisfies the preset rule, using the semantic information as the speech recognition result.
  • When the semantic information satisfies the preset rule, the voice information meets the requirement and the semantic information is considered accurate, so it is used directly as the speech recognition result, yielding a more accurate result and improving speech recognition accuracy.
  • In one embodiment, after the semantic information satisfies the preset rule and is used as the speech recognition result, the method may further include the following steps:
  • extracting the voiceprint information of the to-be-recognized voice information, and comparing the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database, that is, checking whether a stored voiceprint matching the extracted one exists;
  • when the extracted voiceprint information fails to match any voiceprint information stored in the local voiceprint database, which indicates that the corresponding user is performing voice interaction with the terminal for the first time while the corresponding semantic information satisfies the preset rule, storing the extracted voiceprint information in the local voiceprint database.
  • In one embodiment, storing the voiceprint information in the local voiceprint database includes: establishing a user identifier for the extracted voiceprint information; storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier; and initializing the priority level of the user identifier to an initial level.
  • The user identifier uniquely identifies the user and may be a character string including at least one of digits, letters, and punctuation marks; it associates the voiceprint information with the speaker.
  • When the extracted voiceprint information fails to match any voiceprint information stored in the local voiceprint database, the corresponding user is performing voice interaction for the first time and the semantic information satisfies the preset requirement. A user identifier is therefore established for the extracted voiceprint information, the extracted voiceprint information is stored in the local voiceprint database in association with that identifier, and the identifier's priority level is initialized to the initial level.
  • For example, an initial level of 1 is the lowest level; the higher the priority level, the more times the user has performed voice interaction, and the more important the corresponding voiceprint information.
  • In one embodiment, the speech recognition method may further include the following step: when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, increasing the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level.
  • A successful match indicates that the extracted voiceprint information has been stored before: the corresponding user has previously performed voice interaction with the terminal with semantic information satisfying the preset rule, so this is not the first interaction.
  • The priority level of the corresponding user identifier is therefore increased by a preset level to raise the importance of that user's voiceprint information.
  • The preset level may be 1; for example, if the priority level of the user identifier corresponding to the extracted voiceprint information was originally 1, increasing it by the preset level changes the priority level to 2.
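  • A minimal in-memory sketch of the bookkeeping described above (user identifiers, the initial priority level, the preset-level boost on a repeat match, and the enroll-or-boost decision); the class, method, and field names are illustrative assumptions, as is the `matches` predicate.

```python
import uuid

class LocalVoiceprintDB:
    """Illustrative stand-in for the local voiceprint database."""

    INITIAL_LEVEL = 1  # lowest priority level, per the example above

    def __init__(self):
        # user_id -> {"voiceprint": <embedding>, "priority": <int>}
        self.records = {}

    def enroll(self, voiceprint):
        """Store a voiceprint that matched nothing in the database:
        establish a user identifier and initialize its priority level."""
        user_id = uuid.uuid4().hex
        self.records[user_id] = {"voiceprint": voiceprint,
                                 "priority": self.INITIAL_LEVEL}
        return user_id

    def boost(self, user_id, preset_level=1):
        """On a successful match, raise the user's priority by the preset level."""
        self.records[user_id]["priority"] += preset_level

def update_voiceprint_db(extracted_vp, db, matches):
    """After the semantic information satisfies the preset rule: boost the
    priority of a matching stored user, or enroll the voiceprint as a new
    user when nothing in the local voiceprint database matches."""
    for user_id, record in db.records.items():
        if matches(extracted_vp, record["voiceprint"]):
            db.boost(user_id)       # repeat interaction: raise priority
            return user_id
    return db.enroll(extracted_vp)  # first interaction: store new voiceprint
```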
  • In one embodiment, user identifiers are established in the local voiceprint database, and the voiceprint information stored there corresponds to the user identifiers.
  • Obtaining a not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information may include: obtaining, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to a user identifier that has not yet been matched from the local voiceprint database as the to-be-matched voiceprint information.
  • In this way, the to-be-matched voiceprint information is taken from the local voiceprint database in an orderly rather than arbitrary fashion, effectively preventing errors.
  • The preset order may be an order of priority levels from high to low, or from low to high.
  • For example, in a vehicle where the terminal is the on-board computer, the owner generally performs voice control most frequently; the higher the priority level, the higher the importance, and the more likely it is that the user who produced the to-be-recognized voice information is the owner. Selecting the to-be-matched voiceprint information in descending order of priority therefore not only performs voiceprint matching sequentially and prevents errors, but can also improve overall recognition efficiency. When the preset order is ascending, the to-be-matched voiceprint information can likewise be selected sequentially for effective matching and error prevention.
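  • Building on the database sketch above, selecting to-be-matched voiceprints in a preset priority order might look like the following (descending by default, matching the vehicle-owner example; ascending is equally permitted by the text).

```python
def iter_unmatched(db, already_matched, descending=True):
    """Yield (user_id, voiceprint) pairs for user identifiers that have
    not been matched yet, ordered by priority level."""
    candidates = [(rec["priority"], uid, rec["voiceprint"])
                  for uid, rec in db.records.items()
                  if uid not in already_matched]
    candidates.sort(key=lambda t: t[0], reverse=descending)
    for _, uid, voiceprint in candidates:
        yield uid, voiceprint
```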
  • In one embodiment, the speech recognition method may further include the following step: when the combined semantic information does not satisfy the preset rule and no not-yet-matched voiceprint information exists in the local voiceprint database, giving a recognition-error prompt.
  • When the combined semantic information does not satisfy the preset rule, it is inaccurate, and the next not-yet-matched voiceprint would have to be selected.
  • If the local voiceprint database contains no not-yet-matched voiceprint information, every voiceprint in the database has already been matched, and the matching process terminates.
  • The recognition-error prompt is then given to remind the user that this speech recognition attempt was invalid, so that the user can quickly start the next voice control attempt.
  • In one embodiment, after determining the semantic information of the to-be-recognized voice information and before segmenting the to-be-recognized voice information, the method further includes the step of extracting the keywords of the semantic information.
  • The semantic information is judged not to satisfy the preset rule when the semantic information does not conform to the preset grammar rule, when the keywords of the semantic information do not all exist in a single lexicon, or when no instruction corresponding to the keywords of the semantic information exists in the local instruction library.
  • Each working mode corresponds to one lexicon.
  • Judging whether the semantic information satisfies the preset rule first checks whether it conforms to the preset grammar rule; if it does, it is then checked whether the
  • keywords of the semantic information all lie in the same lexicon. Because multiple working modes have multiple lexicons, the keywords may be distributed across several of them; since one utterance can only drive
  • the operation corresponding to one working mode, keywords scattered across lexicons indicate that the semantic information does not satisfy the preset rule.
  • The local instruction library stores the instructions that control execution of the related operations.
  • The keywords of semantic information are stored in association with instructions, so the corresponding instruction can be looked up by keyword and the corresponding operation subsequently performed according to that instruction. If the semantic information conforms to the preset grammar rule and its keywords all lie in one lexicon, but no corresponding instruction exists in the local instruction library, the voice information is still invalid: no corresponding instruction can be obtained, and voice control is impossible.
  • For example, the semantic information "hello" satisfies the preset grammar rule and exists in a lexicon, but it is a simple greeting rather than a control statement; no instruction corresponding to "hello" exists in the local instruction library, so no corresponding operation can be executed.
  • In one embodiment, after determining the combined semantic information of the combined voice information and before using the combined semantic information as the speech recognition result, the method further includes the step of extracting the keywords of the combined semantic information.
  • When the combined semantic information conforms to the preset grammar rule, the keywords of the combined semantic information all exist in a single lexicon, and an instruction corresponding to those keywords exists in the local instruction library, the combined semantic information is judged to satisfy the preset rule.
  • Since the instruction corresponding to the keywords of the combined semantic information can be found in the local instruction library, the subsequent operation can be performed according to that instruction.
  • For example, the combined voice information "I want to play music" conforms to the preset grammar rule;
  • its keywords "play" and "music" both exist in the lexicon corresponding to the music mode, and
  • an instruction corresponding to "play" exists in the local instruction library, so "I want to play music" is judged to satisfy the preset rule and the corresponding play instruction can be found in the local instruction library to start music playback.
  • In one embodiment, determining the semantic information of the to-be-recognized voice information may include:
  • performing speech recognition on the to-be-recognized voice information to obtain the semantic information.
  • That is, speech recognition can be performed locally: the terminal itself recognizes the to-be-recognized voice information and obtains the semantic information, which improves the efficiency of determining the semantic information and thus the overall efficiency of speech recognition.
  • In another embodiment, determining the semantic information of the to-be-recognized voice information may include:
  • sending the to-be-recognized voice information to a cloud server, and receiving the recognition result obtained by the cloud server performing speech recognition on the voice information, the recognition result being used as the semantic information.
  • Because the cloud server can store a large amount of recognition data, recognition accuracy can be improved.
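  • A hedged sketch of the cloud path: the terminal posts the audio and uses the returned text as the semantic information. The endpoint URL, request format, and response field are hypothetical; the patent specifies only that audio is uploaded and a recognition result comes back.

```python
import json
import urllib.request

def recognize_via_cloud(audio_bytes: bytes,
                        url: str = "https://example.com/asr") -> str:
    """Send to-be-recognized audio to a cloud server and return its
    recognition result as the semantic information."""
    req = urllib.request.Request(
        url, data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]  # hypothetical response field
```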
  • FIG. 5 is a flowchart of a speech recognition method according to a specific embodiment.
  • First, the collected to-be-recognized voice information is acquired and recognized to obtain the semantic information, or it is sent to the cloud server and the cloud server's recognition result is used as the semantic information. It is then judged whether the semantic information satisfies the preset rule. If not, the to-be-recognized voice information is segmented into voice segments and the voiceprint information of each segment is extracted; the local voiceprint database is searched for
  • not-yet-matched voiceprint information and, if such information exists, one piece of it is obtained as the to-be-matched voiceprint information. The voiceprint of each voice segment is matched against the to-be-matched voiceprint, and
  • the filtered voiceprint information that successfully matches is determined from the segment voiceprints; the voice segments corresponding to the filtered voiceprint information are combined to obtain combined voice information, and its combined semantic information is determined.
  • When the combined semantic information does not satisfy the preset rule, the process returns to searching the local voiceprint database for not-yet-matched voiceprint information and, when such information exists, obtaining one piece of it as the to-be-matched voiceprint information.
  • When the combined semantic information does not satisfy the preset rule and no not-yet-matched voiceprint information remains, the recognition-error prompt is given.
  • When the semantic information satisfies the preset rule, the semantic information is used as the speech recognition result.
  • In one embodiment, the present application further provides an electronic device 600, whose internal structure may correspond to the structure shown in FIG. 2.
  • Each of the following modules may be implemented in whole or in part by software, hardware, or a combination thereof.
  • The electronic device 600 includes a semantic information determining module 601, a segmented voiceprint acquiring module 602, a to-be-matched voiceprint information obtaining module 603, a matching and filtering module 604, a combining module 605, and a recognition result determining module 606.
  • The semantic information determining module 601 is configured to acquire the collected to-be-recognized voice information and determine its semantic information.
  • The segmented voiceprint acquiring module 602 is configured to, when the semantic information does not satisfy the preset rule, segment the to-be-recognized voice information to obtain voice segments and extract the voiceprint information of each segment.
  • The to-be-matched voiceprint information obtaining module 603 is configured to, when not-yet-matched voiceprint information exists in the local voiceprint database, obtain one piece of not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  • The matching and filtering module 604 is configured to match the voiceprint information of each voice segment against the to-be-matched voiceprint information and determine, from the segment voiceprints, the filtered voiceprint information that successfully matches.
  • The combining module 605 is configured to combine the voice segments corresponding to the filtered voiceprint information to obtain combined voice information and determine its combined semantic information.
  • The recognition result determining module 606 is configured to use the combined semantic information as the speech recognition result when the combined semantic information satisfies the preset rule.
  • With this electronic device, the semantic information of the to-be-recognized voice information is determined first. When the semantic information does not satisfy the preset rule, the semantics recognized at this point may be inaccurate, so the to-be-recognized voice information is segmented into voice segments and the voiceprint information of each segment is extracted. When not-yet-matched voiceprint information exists in the local voiceprint database, one piece of it is obtained as the to-be-matched voiceprint information,
  • providing a single user's voiceprint as the basis for subsequent matching. The voiceprint of each voice segment is then matched against the to-be-matched voiceprint, and the filtered voiceprint information
  • that successfully matches is determined from the segment voiceprints; that is, the voiceprint information of the single user matching the to-be-matched voiceprint is selected.
  • The voice segments corresponding to the filtered voiceprint information are combined to obtain the combined voice information of that single user, which is recognized to obtain the combined semantic information, i.e., the meaning expressed by the single user. To improve recognition accuracy, it is further judged whether the combined semantic information satisfies the preset rule; when it does, the accurately recognized semantics, exactly what the user wanted to express, has been obtained, and the combined semantic information is used as the speech recognition result, improving speech recognition accuracy.
  • In one embodiment, the recognition result determining module 606 is further configured to, when the combined semantic information does not satisfy the preset rule, return to the to-be-matched voiceprint information obtaining module 603 to execute the operation of obtaining, when not-yet-matched voiceprint information exists in the local voiceprint database, one piece of not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  • In one embodiment, the recognition result determining module 606 is further configured to use the semantic information as the speech recognition result when the semantic information satisfies the preset rule.
  • In one embodiment, the electronic device 600 further includes:
  • a voiceprint extraction module 607, configured to extract the voiceprint information of the to-be-recognized voice information after the semantic information satisfies the preset rule and is used as the speech recognition result;
  • a voiceprint matching module 608, configured to compare the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and
  • a storage module 609, configured to store the extracted voiceprint information in the local voiceprint database when the extracted voiceprint information fails to match any voiceprint information stored there.
  • In one embodiment, the storage module 609 includes:
  • an identifier establishing module 6091, configured to establish a user identifier for the extracted voiceprint information; and
  • an initialization module 6092, configured to store the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and to initialize the priority level of the user identifier to an initial level.
  • In one embodiment, the electronic device further includes a level increasing module,
  • configured to increase the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database.
  • In one embodiment, the voiceprint information stored in the local voiceprint database corresponds to user identifiers.
  • The to-be-matched voiceprint information obtaining module 603 is configured to obtain, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to a not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
  • The preset order includes an order of priority levels from high to low or an order of priority levels from low to high.
  • In one embodiment, the electronic device further includes a prompting module,
  • configured to give the recognition-error prompt when the combined semantic information does not satisfy the preset rule and no not-yet-matched voiceprint information exists in the local voiceprint database.
  • In one embodiment, the electronic device further includes an information keyword extraction module,
  • configured to extract the keywords of the semantic information and the keywords of the combined semantic information.
  • The segmented voiceprint acquiring module 602 judges that the semantic information does not satisfy the preset rule when the semantic information does not conform to the preset grammar rule, when the keywords of the semantic information do not all exist in a single lexicon, or when no instruction corresponding to the keywords of the semantic information exists in the local instruction library.
  • The recognition result determining module 606 judges that the combined semantic information satisfies the preset rule when the combined semantic information conforms to the preset grammar rule, the keywords of the combined semantic information all exist in a single lexicon, and an instruction corresponding to those keywords exists in the local instruction library.
  • In one embodiment, the semantic information determining module 601 includes:
  • a recognition module, configured to perform speech recognition on the to-be-recognized voice information to obtain the semantic information; or
  • an information sending module, configured to send the to-be-recognized voice information to the cloud server; and
  • a semantic information obtaining module, configured to receive the recognition result obtained by the cloud server performing speech recognition on the voice information, and to use the recognition result as the semantic information.
  • In one embodiment, an electronic device includes a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the following steps: acquiring the collected to-be-recognized voice information and determining its semantic information; when the semantic information does not satisfy the preset rule, segmenting the to-be-recognized voice information to obtain voice segments and extracting the voiceprint information of each segment; when not-yet-matched voiceprint information exists in the local voiceprint database, obtaining one piece of not-yet-matched voiceprint information from the database as the to-be-matched voiceprint information; matching the voiceprint information of each segment against the to-be-matched voiceprint information and determining, from the segment voiceprints, the filtered voiceprint information that successfully matches; combining the voice segments corresponding to the filtered voiceprint information to obtain combined voice information and determining its combined semantic information; and, when the combined semantic information satisfies the preset rule, using the combined semantic information as the speech recognition result.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when not-yet-matched voiceprint information exists in the local voiceprint database, one piece of not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: when the semantic information satisfies the preset rule, using the semantic information as the speech recognition result.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following steps: after the semantic information satisfies the preset rule and is used as the speech recognition result, extracting the voiceprint information of the to-be-recognized voice information; comparing the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and, when the extracted voiceprint information fails to match any voiceprint information stored in the local voiceprint database, storing the extracted voiceprint information in the local voiceprint database.
  • In one embodiment, storing the voiceprint information in the local voiceprint database includes: establishing a user identifier for the extracted voiceprint information; storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier; and initializing the priority level of the user identifier to the initial level.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, increasing the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level.
  • In one embodiment, the voiceprint information stored in the local voiceprint database corresponds to user identifiers, and
  • obtaining a not-yet-matched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information includes: obtaining, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to a not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
  • The preset order may include an order of priority levels from high to low or an order of priority levels from low to high.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: giving the recognition-error prompt when the combined semantic information does not satisfy the preset rule and no not-yet-matched voiceprint information exists in the local voiceprint database.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: after the semantic information of the to-be-recognized voice information is determined and before the voice information is segmented, extracting the keywords of the semantic information.
  • The semantic information is judged not to satisfy the preset rule when it does not conform to the preset grammar rule, when its keywords do not all exist in a single lexicon, or when no instruction corresponding to its keywords exists in the local instruction library.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step: after the combined semantic information of the combined voice information is determined and before it is used as the speech recognition result, extracting the keywords of the combined semantic information.
  • The combined semantic information is judged to satisfy the preset rule when it conforms to the preset grammar rule, its keywords all exist in a single lexicon, and an instruction corresponding to those keywords exists in the local instruction library.
  • In one embodiment, the computer readable instructions further cause the processor to perform the following step:
  • determining the semantic information of the to-be-recognized voice information by performing speech recognition on the to-be-recognized voice information to obtain the semantic information.
  • With this electronic device, the semantic information of the to-be-recognized voice information is determined first. When the semantic information does not satisfy the preset rule, the semantics recognized at this point may be inaccurate, so the to-be-recognized voice information is segmented into voice segments and the voiceprint information of each segment is extracted. When not-yet-matched voiceprint information exists in the local voiceprint database, one piece of it is obtained as the to-be-matched voiceprint information,
  • providing a single user's voiceprint as the basis for subsequent matching. The voiceprint information of each voice segment is then matched against the to-be-matched voiceprint information, and
  • the filtered voiceprint information is determined from the segment voiceprints; that is, the voiceprint information of the single user matching the to-be-matched voiceprint is selected, and
  • the voice segments corresponding to the filtered voiceprint information are combined to obtain the combined voice information of that single user, which is recognized to obtain the combined semantic information, i.e., the meaning expressed by the single user.
  • To improve recognition accuracy, it is further judged whether the combined semantic information satisfies the preset rule; when it does, the accurately
  • recognized semantics, exactly what the user wanted to express, has been obtained, and the combined semantic information is used as the speech recognition result, improving the accuracy of speech recognition.
  • A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware.
  • The computer program can be stored in a non-volatile computer readable storage medium and, when executed by at least one processor in the computer system, implements a process including the embodiments of the methods described above.
  • The computer storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A speech recognition method includes: acquiring collected to-be-recognized voice information and determining semantic information of the to-be-recognized voice information (S310); when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments and extracting voiceprint information of each voice segment (S320); when voiceprint information that has not yet been matched exists in a local voiceprint database, obtaining one piece of not-yet-matched voiceprint information from the local voiceprint database as to-be-matched voiceprint information (S330); matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information (S340); combining the voice segments corresponding to the filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information (S350); and, when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result (S360).

Description

Speech recognition method, electronic device, and computer storage medium
This application claims priority to Chinese Patent Application No. 2017100821115, entitled "Speech recognition method and speech recognition apparatus" and filed with the Chinese Patent Office on February 15, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer information processing technologies, and in particular to a speech recognition method, an electronic device, and a computer storage medium.
Background
With the development of intelligent technology, performing speech recognition and controlling devices according to the recognized speech has become an important part of intelligent applications. Speech recognition technology is applied in a variety of intelligent products to achieve intelligent control; as smart products proliferate and ever higher accuracy is demanded of speech recognition, new speech recognition techniques emerge one after another.
The commonly used speech recognition approach extracts features from the to-be-recognized voice information uttered by a user and then recognizes that voice information with a recognition algorithm. However, when the speech recognition function is used where several people are talking (for example, inside a vehicle), the captured to-be-recognized voice information may contain the speech of multiple people, of which only one person's speech is valid; the speech uttered by the others is noise, the correct semantics cannot be recognized, and speech recognition accuracy is insufficient.
Summary
According to various embodiments of this application, a speech recognition method, an electronic device, and a computer storage medium are provided.
A speech recognition method includes the following steps:
obtaining collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information;
when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment;
when unmatched voiceprint information exists in a local voiceprint database, obtaining one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information;
matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information;
combining the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information; and
when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
An electronic device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
obtaining collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information;
when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment;
when unmatched voiceprint information exists in a local voiceprint database, obtaining one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information;
matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information;
combining the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information; and
when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
A computer storage medium stores a computer program that, when executed by a processor, implements the steps of the speech recognition method according to any one of the above.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objectives, and advantages of the present disclosure will become apparent from the specification, the accompanying drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the related art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the related art. Apparently, the accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a speech recognition method according to an embodiment;
FIG. 2 is a schematic diagram of the internal structure of an electronic device according to an embodiment;
FIG. 3 is a schematic flowchart of a speech recognition method according to an embodiment;
FIG. 4 is a schematic flowchart of a speech recognition method according to another embodiment;
FIG. 5 is a schematic flowchart of a speech recognition method according to a specific embodiment;
FIG. 6 is a structural block diagram of an electronic device according to an embodiment;
FIG. 7 is a structural block diagram of an electronic device according to another embodiment;
FIG. 8 is a structural block diagram of a storage module in an electronic device according to another embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present disclosure rather than to limit it.
FIG. 1 is a schematic diagram of an application environment of a speech recognition method according to an embodiment. Referring to FIG. 1, the speech recognition method is applied to a speech recognition system. The speech recognition system includes a terminal 10 and a server 20, which can communicate over a network. The terminal 10 may recognize voice information to obtain semantic information and further process the semantic information to determine a speech recognition result, or it may upload the collected voice information to the corresponding server 20 over the network; the server 20 may recognize the voice information uploaded by the terminal 10 and send the recognition result to the terminal 10 over the network, and the terminal 10 uses the received recognition result as the semantic information and determines the speech recognition result from it. Based on the speech recognition result, the terminal 10 may generate a corresponding instruction to perform subsequent related operations, implementing intelligent voice control. The terminal 10 may be any device capable of intelligent input and output and of recognizing speech, for example, a desktop terminal or a mobile terminal; the mobile terminal may be a smartphone, a tablet computer, an in-vehicle computer, a wearable smart device, or the like. The server 20 may be the server hosting the platform that receives voice information and performs speech recognition, and may be implemented as an independent server or as a server cluster composed of multiple servers.
As shown in FIG. 2, in one embodiment, an electronic device is provided, which may be the terminal 10 in FIG. 1. The electronic device includes a processor, a non-volatile storage medium, an internal memory, and a communication interface connected through a system bus. The non-volatile storage medium of the electronic device stores an operating system, a local voiceprint database, and computer-readable instructions; the local voiceprint database stores voiceprint information, and the computer-readable instructions may be used to implement a speech recognition method. The processor of the electronic device provides computing and control capabilities and supports the operation of the entire electronic device. The internal memory of the electronic device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a speech recognition method. The communication interface is used for communicating with the server 20. A person skilled in the art can understand that the structure shown in FIG. 2 is merely a block diagram of a partial structure related to the solution of this application and does not limit the electronic device to which the solution is applied; a specific electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
As shown in FIG. 3, in one embodiment, a speech recognition method is provided. This embodiment is illustrated by applying the method to the terminal 10 in FIG. 1. The method specifically includes the following steps S310 to S360:
S310: Obtain collected to-be-recognized voice information, and determine semantic information of the to-be-recognized voice information.
In this embodiment, the voice information may be audio information input by a user through a voice input apparatus of the terminal; that is, the user's voice information can be collected through the voice input apparatus, and after the collection of the to-be-recognized voice information is completed, the collected to-be-recognized voice information can be obtained. The voice input apparatus may include, but is not limited to, a microphone. The to-be-recognized voice information is voice information that needs to be recognized to obtain semantic information, and the semantic information may be text information. Performing speech recognition on the to-be-recognized voice information yields the corresponding semantic information; that is, the semantics expressed by the user who input the to-be-recognized voice information can be determined.
S320: When the semantic information does not satisfy a preset rule, segment the to-be-recognized voice information to obtain voice segments, and extract voiceprint information of each voice segment.
After the semantic information of the to-be-recognized voice information is determined, whether it satisfies the preset rule needs to be determined. In this embodiment, the preset rule may be a preset requirement on the semantic information; that is, when the semantic information does not satisfy the preset rule, the voice information does not meet the requirement and is considered inaccurate. For example, since audio is being recognized, when a user accurately expresses the intended content through audio, the corresponding voice information should generally conform to the grammar of human speech; thus, the preset rule may be that the semantic information conforms to a preset grammatical rule.
On the other hand, the terminal implementing the speech recognition method of this embodiment may have multiple work modes, which may include, but are not limited to, a navigation mode, a music mode, a radio mode, and a program mode. Working in different modes, the terminal can satisfy different user needs, and each work mode has a corresponding lexicon containing the vocabulary likely to be used in that mode. After the semantic information is determined, whether the keywords obtained by segmenting the semantic information are in the lexicon may also be determined; if so, the semantic information of the user's to-be-recognized voice information contains vocabulary that may be used in the terminal's current work mode.
Accordingly, in this embodiment, the preset rule may be that the semantic information conforms to the preset grammatical rule and falls within a single lexicon. When the semantic information does not satisfy the preset rule, the semantic information obtained by recognizing the to-be-recognized voice information cannot be accurately interpreted by the terminal and therefore cannot be converted into a corresponding instruction to perform an operation. Alternatively, the preset rule may be that the semantic information conforms to the preset grammatical rule, falls within a single lexicon, and has a corresponding instruction; when the semantic information conforms to the grammatical rule and falls within a single lexicon but cannot be converted into a valid instruction, that is, when no instruction corresponds to it, it is still regarded as not satisfying the preset rule.
In a specific example, suppose the determined semantic information is "我要你好播放音乐" ("I want hello to play music"). User A actually said "我要播放音乐" ("I want to play music"), but while A was speaking, user B inserted "你好" ("hello") after A's "我要" ("I want"). Although "播放音乐" ("play music") is in the lexicon corresponding to the music mode, the grammar of the whole sentence does not conform to normal human grammar, so it can be regarded as not satisfying the preset rule. As another example, the semantic information "你好" ("hello") conforms to the preset grammatical rule and is in a lexicon, but it is essentially a greeting rather than a control phrase; the terminal has no instruction corresponding to "你好", that is, no instruction for performing an operation can be generated, so it can also be regarded as not satisfying the preset rule.
When the semantic information is determined not to satisfy the preset rule, it is considered inaccurate. To improve recognition accuracy, the to-be-recognized voice information needs to be segmented to obtain voice segments, and the voiceprint information of each voice segment is extracted. Each person's voiceprint information is distinct, while different voice information from the same person corresponds to the same voiceprint information; for example, user A may utter different voice information, but since it is uttered by the same user A, its voiceprint information is the same. Accuracy can therefore be improved by using voiceprint information to pick out a single person's voice information.
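The description does not fix a particular segmentation or voiceprint algorithm. Purely as an illustration, the following Python sketch segments audio on silence with a crude energy-based voice activity detector and summarizes each segment with a mean-MFCC vector as a stand-in voiceprint; the frame length, the silence threshold, and the use of librosa are assumptions of the sketch, not part of the claimed method.

    import numpy as np
    import librosa  # assumed available; any MFCC implementation would serve

    def segment_on_silence(samples, sr, frame_ms=30, silence_db=-35.0):
        """Split audio into voiced segments using a crude energy threshold (illustrative VAD)."""
        frame = int(sr * frame_ms / 1000)
        segments, start = [], None
        for i in range(0, len(samples) - frame, frame):
            chunk = samples[i:i + frame].astype(float)
            level_db = 20 * np.log10(np.sqrt(np.mean(chunk ** 2)) + 1e-10)
            if level_db > silence_db and start is None:
                start = i                      # a voiced run begins
            elif level_db <= silence_db and start is not None:
                segments.append(samples[start:i])
                start = None                   # the voiced run ends at a silent frame
        if start is not None:
            segments.append(samples[start:])
        return segments

    def voiceprint(segment, sr):
        """Stand-in voiceprint: the mean MFCC vector of a segment."""
        mfcc = librosa.feature.mfcc(y=segment.astype(float), sr=sr, n_mfcc=20)
        return mfcc.mean(axis=1)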
S330: When unmatched voiceprint information exists in the local voiceprint database, obtain one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information.
The local voiceprint database may store voiceprint information. The voiceprint information stored in the local voiceprint database may belong to users who have had voice interactions with the terminal and whose corresponding semantic information has satisfied the preset rule at least once. When matching the voiceprint information of the voice segments against the not-yet-matched voiceprint information stored in the local voiceprint database, one piece of unmatched voiceprint information is first obtained from the local voiceprint database as the to-be-matched voiceprint information. That is, each time the voiceprint information of the voice segments is matched against the local voiceprint database, a single piece of to-be-matched voiceprint information from the database is matched against the voiceprint information of all the segments, so that a single user's voice information can be filtered out.
S340: Match the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determine, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information.
The voice information may include the voices of multiple users. After one piece of unmatched to-be-matched voiceprint information is selected from the local voiceprint database, that is, after one user's voiceprint information is selected, the voiceprint information of each voice segment is matched against the to-be-matched voiceprint information. The same user's voiceprint information is identical, so the voiceprint information of the voice segments that successfully matches the to-be-matched voiceprint information belongs to one user; in other words, the filtered voiceprint information is the voiceprint information of the user corresponding to the to-be-matched voiceprint information.
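How "successfully matches" is scored is likewise left open by the description. A minimal sketch building on the voiceprint embeddings above, assuming cosine similarity with an arbitrary 0.85 threshold (step S340):

    def matches(candidate, segment_print, threshold=0.85):
        """Return True if a segment's voiceprint is close enough to the candidate voiceprint."""
        cos = np.dot(candidate, segment_print) / (
            np.linalg.norm(candidate) * np.linalg.norm(segment_print) + 1e-10)
        return cos >= threshold

    def filter_segments(segments, prints, candidate):
        """Keep only the voice segments whose voiceprint matches the to-be-matched voiceprint."""
        return [seg for seg, p in zip(segments, prints) if matches(candidate, p)]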
S350: Combine the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determine combined semantic information of the combined voice information.
Since the pieces of filtered voiceprint information come from the same user, the voice segments corresponding to them can be combined; that is, the voice segments of the same user are combined, and the resulting combined voice information is that user's voice, that user's voice data. Then, the combined semantic information of the combined voice information is determined; the combined semantic information is the accurately expressed semantics corresponding to that user's to-be-recognized voice information.
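Combining can be as simple as concatenating the matched segments in their original temporal order and recognizing the result; a sketch under that assumption, where recognize stands for whatever ASR front end (local or cloud) is available and is not defined by this description:

    def combine_and_recognize(matched_segments, sr, recognize):
        """Concatenate one user's segments (S350) and recognize the combined voice information."""
        if not matched_segments:
            return ""                       # nothing matched: no combined semantics
        combined = np.concatenate(matched_segments)
        return recognize(combined, sr)      # returns the combined semantic information (text)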
S360: When the combined semantic information satisfies the preset rule, use the combined semantic information as the speech recognition result.
After the combined semantic information is obtained, although it is the accurately expressed semantics of the above user's to-be-recognized voice information, the voice information may contain the voices of multiple users, and the combined semantic information obtained through the above steps may still fail to satisfy the preset rule. To further improve accuracy, whether the combined semantic information satisfies the preset rule is determined; when it does, the semantic information is confirmed to be accurate, and the combined semantic information can be used as the speech recognition result, achieving the purpose of speech recognition. A corresponding instruction can subsequently be generated from the speech recognition result, and a corresponding operation can be performed according to the instruction. For example, suppose the semantic information is "我要你好播放音乐" ("I want hello to play music") and the selected to-be-matched voiceprint information is user A's voiceprint information. The voiceprints of the segments "我要" ("I want") and "播放音乐" ("play music") match user A's voiceprint successfully, so the filtered voiceprint information is the voiceprint information of those two segments. The segments "我要" and "播放音乐" are combined into the final combined voice information, and the combined semantic information is determined to be "我要播放音乐" ("I want to play music"). It conforms to the preset grammatical requirement, falls within a lexicon, and is semantic information requiring a music-playing operation with a corresponding instruction; that is, the music-playing operation can be performed according to that instruction, so the combined semantic information satisfies the preset rule, and "我要播放音乐" is used as the speech recognition result. A corresponding music-playing instruction can subsequently be generated to play music.
In the above speech recognition method, the semantic information of the to-be-recognized voice information is determined first. When the semantic information does not satisfy the preset rule, the semantic information recognized at this point may be inaccurate. In that case, the to-be-recognized voice information is segmented into voice segments, and the voiceprint information of each voice segment is extracted. When unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information is obtained from the local voiceprint database as the to-be-matched voiceprint information, providing a basis for subsequent voiceprint matching, that is, providing a single user's to-be-matched voiceprint information. Then, the voiceprint information of each voice segment is matched against the to-be-matched voiceprint information, and the filtered voiceprint information that successfully matches is determined from the voiceprint information of the voice segments; that is, the voiceprint information of the single user matching the to-be-matched voiceprint information is filtered out. The voice segments corresponding to the filtered voiceprint information are combined to obtain combined voice information, which is recognized to obtain combined semantic information, namely the semantics expressed by the single user. To improve recognition accuracy, whether the combined semantic information satisfies the preset rule is further determined; when it does, the accurately recognized semantics, that is, the semantics the user intended to express, has been obtained, and the combined semantic information is used as the speech recognition result, improving speech recognition accuracy.
In one embodiment, the speech recognition method further includes the step of: when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
When the combined semantic information does not satisfy the preset rule, the combined voice information does not meet the requirement and is considered inaccurate, and the next user's voiceprint information needs to be matched. For example, suppose the semantic information is "我要你好播放音乐" and the selected to-be-matched voiceprint information is user B's voiceprint information. The voiceprint of the segment "你好" ("hello") matches user B's voiceprint successfully, so the filtered voiceprint information is the voiceprint information of the segment "你好", and the segment "你好" becomes the final combined voice information, whose combined semantic information is determined to be "你好". Although it conforms to the preset grammatical requirement and is in a lexicon, it has no corresponding instruction, that is, no instruction for performing an operation can be generated, so it can be regarded as not satisfying the preset rule. In that case, the method returns to the step of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information, obtains the next unmatched voiceprint information as the to-be-matched voiceprint information, and continues the voiceprint matching process.
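Putting steps S320 to S360 and this fallback together, the control flow is a loop over the not-yet-matched voiceprints in the local database. A compact sketch using the illustrative helpers above (all names are assumptions of this sketch, not defined by the patent):

    def recognize_multi_speaker(audio, sr, db_prints, recognize, satisfies_rule):
        """Try each enrolled voiceprint in turn until a rule-satisfying reading is found."""
        segments = segment_on_silence(audio, sr)
        prints = [voiceprint(s, sr) for s in segments]
        for candidate in db_prints:              # each iteration picks one unmatched voiceprint
            matched = filter_segments(segments, prints, candidate)
            if not matched:
                continue
            combined_semantics = combine_and_recognize(matched, sr, recognize)
            if satisfies_rule(combined_semantics):
                return combined_semantics        # used as the speech recognition result
        return None                              # database exhausted: give an error prompt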
As shown in FIG. 4, in one embodiment, the speech recognition method further includes the following step:
S370: When the semantic information satisfies the preset rule, use the semantic information as the speech recognition result.
When the semantic information satisfies the preset rule, the voice information meets the requirement and is considered accurate, and it is used as the speech recognition result; a relatively accurate speech recognition result can thus be obtained, improving speech recognition accuracy.
Still referring to FIG. 4, in one embodiment, after the semantic information satisfying the preset rule is used as the speech recognition result, the method may further include the following steps:
S381: Extract voiceprint information of the to-be-recognized voice information.
S382: Compare the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database.
S383: When the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database, store the extracted voiceprint information in the local voiceprint database.
When the semantic information satisfies the preset rule, it is considered relatively accurate. After it is used as the speech recognition result, the voiceprint information of the to-be-recognized voice information may also be extracted and stored in the local voiceprint database. Specifically, before storing, the voiceprint information is compared with each piece of voiceprint information stored in the local voiceprint database, that is, whether any stored voiceprint information matches the extracted voiceprint information is checked. If none matches, that is, if the extracted voiceprint information fails to match every stored piece, the user corresponding to the extracted voiceprint information is interacting with the terminal by voice for the first time with semantic information satisfying the preset rule, so the extracted voiceprint information is stored in the local voiceprint database.
In one embodiment, storing the voiceprint information in the local voiceprint database includes: establishing a user identifier for the extracted voiceprint information; and storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and initializing the priority level of the user identifier to an initial level.
The user identifier uniquely specifies a user's identity and may be a character string including at least one of digits, letters, and punctuation marks. A user identifier corresponds to a piece of voiceprint information, thereby associating the voiceprint information with the speaker. When the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database, the user corresponding to the extracted voiceprint information is interacting by voice for the first time with semantic information satisfying the preset requirement. A user identifier is established for the extracted voiceprint information, the extracted voiceprint information is stored in the local voiceprint database in association with the corresponding user identifier, and the priority level of the user identifier is initialized to an initial level, for example, level 1, the lowest level. A higher priority level indicates that the user has had more voice interactions and that the corresponding voiceprint information is more important.
In one embodiment, the speech recognition method may further include the step of: when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, increasing the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level.
When the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, the extracted voiceprint information has been stored before; the corresponding user has interacted with the terminal by voice before with semantic information satisfying the preset rule, so this is not a first interaction. In that case, the priority level of the user identifier corresponding to the extracted voiceprint information is increased by a preset level to raise the importance of that user's voiceprint information. Specifically, the preset level may be 1; for example, if the priority level of the user identifier corresponding to the extracted voiceprint information was originally 1, increasing it by the preset level of 1 raises it to 2.
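A minimal in-memory version of this bookkeeping, reusing the illustrative matches helper and assuming a UUID string as the user identifier; the description only requires a unique identifier and an adjustable priority level:

    import uuid

    class LocalVoiceprintDB:
        """Toy local voiceprint database: voiceprint, user identifier, priority level."""

        def __init__(self):
            self.entries = []  # each entry: {"user_id", "print", "priority"}

        def record(self, extracted_print):
            """Steps S381 to S383: bump the priority of a matching entry, or enroll a new one."""
            for entry in self.entries:
                if matches(entry["print"], extracted_print):
                    entry["priority"] += 1          # preset level assumed to be 1
                    return entry["user_id"]
            user_id = str(uuid.uuid4())             # establish a user identifier
            self.entries.append({"user_id": user_id,
                                 "print": extracted_print,
                                 "priority": 1})    # initial level
            return user_id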
In one embodiment, user identifiers may also be established in the local voiceprint database, with the voiceprint information stored in the local voiceprint database corresponding to the user identifiers.
Accordingly, obtaining one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information may include: obtaining, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to one not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
In this way, the voiceprint information corresponding to a not-yet-matched user identifier is obtained from the local voiceprint database in an orderly fashion as the to-be-matched voiceprint information, rather than selected haphazardly, effectively preventing errors.
In one embodiment, the preset order may be descending order of priority level or ascending order of priority level.
A higher priority level of a user identifier means the corresponding voiceprint information is more important, that is, the user controls the terminal by voice more often. For example, in a vehicle where the terminal is an in-vehicle computer, the owner typically performs voice control most frequently; the higher the priority level, the more important the user, and the more likely the to-be-recognized voice information belongs to the owner. Selecting the to-be-matched voiceprint information in descending order of priority therefore not only performs voiceprint matching in an orderly way and prevents errors, but can also improve overall recognition efficiency. Alternatively, when the preset order is ascending order of priority, the to-be-matched voiceprint information is still selected in an orderly way, and voiceprint matching is carried out effectively while preventing errors.
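The ordered selection reads naturally as a sorted iterator over the database; a sketch with descending priority as the default, matching the vehicle example above:

    def candidates_by_priority(db, descending=True):
        """Yield enrolled voiceprints in the preset priority order for step S330."""
        ordered = sorted(db.entries, key=lambda e: e["priority"], reverse=descending)
        for entry in ordered:
            yield entry["print"]  # each print is yielded once, i.e. tried as "not yet matched"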
In one embodiment, the speech recognition method may further include the step of: when the combined semantic information does not satisfy the preset rule and no unmatched voiceprint information exists in the local voiceprint database, giving a recognition-error prompt.
When the combined semantic information does not satisfy the preset rule, the combined semantic information is inaccurate, and the next piece of unmatched voiceprint information needs to be selected for accurate recognition. If no unmatched voiceprint information exists in the local voiceprint database, all voiceprint information in the local voiceprint database has been matched and voiceprint matching terminates. In that case, a recognition-error prompt is given to inform the user that this round of speech recognition has failed, so that the user can quickly start the next voice-control session.
In one embodiment, after determining the semantic information of the to-be-recognized voice information and before segmenting the to-be-recognized voice information, the method further includes the step of: extracting keywords from the semantic information.
When the semantic information does not conform to the preset grammatical rule, when the keywords of the semantic information do not all fall within one lexicon, or when no instruction corresponding to the keywords of the semantic information exists in a local instruction library, the semantic information is determined not to satisfy the preset rule.
Each work mode corresponds to a lexicon; that is, work modes and lexicons correspond. When determining whether the semantic information satisfies the preset rule, whether it conforms to the preset grammatical rule is determined first. If it does, whether the keywords of the semantic information all fall within one lexicon is checked. Because multiple work modes bring multiple lexicons, the keywords may be scattered across different lexicons; since one utterance can only trigger operations for one work mode, keywords distributed across several lexicons mean the semantic information does not satisfy the preset rule. In addition, the local instruction library stores the instructions that control the execution of related operations. Specifically, keywords of semantic information are stored in association with instructions, so the corresponding instruction can be found through the keywords and the corresponding operation performed accordingly. If the semantic information conforms to the preset grammatical rule and its keywords all fall within one lexicon, but no corresponding instruction exists in the local instruction library, the voice information is still invalid: no corresponding instruction can be obtained, so voice control cannot be achieved. For example, the semantic information "你好" ("hello") satisfies the preset grammatical rule and exists in a lexicon, but it is a simple greeting rather than a control statement, and the local instruction library contains no instruction corresponding to "你好" according to which an operation could be performed.
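The three-part test can be summarized as a short predicate. A sketch, assuming the grammar check is supplied as a callable, that lexicons map mode names to keyword sets, and that the instruction library maps keywords to instructions; the whitespace keyword split is a placeholder, since real Chinese text would need a word segmenter:

    def satisfies_preset_rule(semantics, grammar_ok, lexicons, instruction_library):
        """Preset rule: grammatical, all keywords in one lexicon, and an instruction exists."""
        if not grammar_ok(semantics):                     # preset grammatical rule
            return False
        keywords = semantics.split()                      # placeholder keyword extraction
        if not any(all(k in lex for k in keywords)        # keywords must share one lexicon
                   for lex in lexicons.values()):
            return False
        return any(k in instruction_library for k in keywords)  # a control instruction exists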
In this embodiment, after determining the combined semantic information of the combined voice information and before using the combined semantic information as the speech recognition result, the method further includes the step of: extracting keywords from the combined semantic information.
When the combined semantic information conforms to the preset grammatical rule, the keywords of the combined semantic information all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library, the combined semantic information is determined to satisfy the preset rule.
When the combined semantic information conforms to the preset grammatical rule, its keywords all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library, the combined semantic information is considered to satisfy the preset rule and is used as the speech recognition result. Because the instruction corresponding to the keywords of the combined semantics can be found in the local instruction library, the related operation can subsequently be performed according to that instruction. For example, the combined semantic information "我要播放音乐" ("I want to play music") conforms to the preset grammatical rule; if its keywords are "播放" ("play") and "音乐" ("music"), these keywords both exist in the lexicon corresponding to the music mode, and an instruction corresponding to "播放" exists in the local instruction library, then "我要播放音乐" is considered to satisfy the preset rule, and the corresponding play instruction can be found in the local instruction library to play music.
In one embodiment, determining the semantic information of the to-be-recognized voice information may include:
performing speech recognition on the to-be-recognized voice information to obtain the semantic information. When recognizing the to-be-recognized voice information, recognition can be performed locally on the terminal; that is, speech recognition is performed on the to-be-recognized voice information to obtain the semantic information. This improves the efficiency of determining the semantic information and thus the overall speech recognition efficiency.
In one embodiment, determining the semantic information of the to-be-recognized voice information may include:
sending the to-be-recognized voice information to a cloud server; and
receiving a recognition result obtained by the cloud server performing speech recognition on the to-be-recognized voice information, and using the recognition result as the semantic information.
Recognition can thus also be performed on a cloud server: the to-be-recognized voice information is sent to the cloud server, the cloud server performs speech recognition on it, and the recognition result is used as the semantic information. Since the cloud server can store a large amount of data on which recognition is based, recognition accuracy can be improved.
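As a sketch of the cloud variant, assuming a hypothetical HTTP endpoint; the URL, payload shape, and response field are illustrative, since the description does not specify a protocol:

    import requests  # assumed available

    def recognize_via_cloud(audio_bytes, url="https://example.com/asr"):  # hypothetical endpoint
        """Send raw audio to a cloud ASR service and use its transcript as semantic information."""
        resp = requests.post(url, data=audio_bytes,
                             headers={"Content-Type": "application/octet-stream"},
                             timeout=10)
        resp.raise_for_status()
        return resp.json()["text"]  # assumed response field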
The above speech recognition method is described in detail below using a specific embodiment. Refer to FIG. 5, which is a flowchart of the speech recognition method of a specific embodiment.
First, the collected to-be-recognized voice information is obtained and recognized to obtain semantic information, or it is sent to a cloud server and the recognition result returned by the cloud server is used as the semantic information. Then, whether the semantic information satisfies the preset rule is determined. If it does not, the to-be-recognized voice information is segmented to obtain voice segments, and the voiceprint information of each voice segment is extracted; whether unmatched voiceprint information exists in the local voiceprint database is checked, and if so, one piece of unmatched voiceprint information is obtained from the local voiceprint database as the to-be-matched voiceprint information. The voiceprint information of each voice segment is matched against the to-be-matched voiceprint information, and the filtered voiceprint information that successfully matches the to-be-matched voiceprint information is determined from the voiceprint information of the voice segments. The voice segments corresponding to the filtered voiceprint information are combined to obtain combined voice information, and the combined semantic information of the combined voice information is determined. Whether the combined semantic information satisfies the preset rule is then determined; if so, the combined semantic information is used as the speech recognition result. Afterwards, the voiceprint information of the to-be-recognized voice information can be extracted and compared with each piece of voiceprint information stored in the local voiceprint database; when the extracted voiceprint information fails to match every stored piece, it is stored in the local voiceprint database. In addition, when the extracted voiceprint information successfully matches stored voiceprint information, the priority level of the user identifier corresponding to the extracted voiceprint information is increased by the preset level.
Furthermore, when the combined semantic information does not satisfy the preset rule, the method returns to the step of checking whether unmatched voiceprint information exists in the local voiceprint database; when unmatched voiceprint information exists, one piece of unmatched voiceprint information is again obtained from the local voiceprint database as the to-be-matched voiceprint information. When no unmatched voiceprint information exists in the local voiceprint database, a recognition-error prompt is given. When the semantic information satisfies the preset rule, the semantic information is used as the speech recognition result.
As shown in FIG. 6, in one embodiment, this application further provides an electronic device 600, whose internal structure may correspond to the structure shown in FIG. 2. Each of the following modules may be implemented wholly or partly by software, hardware, or a combination thereof. The electronic device 600 includes a semantic information determining module 601, a segment voiceprint obtaining module 602, a to-be-matched voiceprint information obtaining module 603, a matching and filtering module 604, a combining module 605, and a recognition result determining module 606.
The semantic information determining module 601 is configured to obtain collected to-be-recognized voice information and determine semantic information of the to-be-recognized voice information.
The segment voiceprint obtaining module 602 is configured to: when the semantic information does not satisfy a preset rule, segment the to-be-recognized voice information to obtain voice segments, and extract voiceprint information of each voice segment.
The to-be-matched voiceprint information obtaining module 603 is configured to: when unmatched voiceprint information exists in a local voiceprint database, obtain one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information.
The matching and filtering module 604 is configured to match the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determine, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information.
The combining module 605 is configured to combine the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determine combined semantic information of the combined voice information.
The recognition result determining module 606 is configured to use the combined semantic information as the speech recognition result when the combined semantic information satisfies the preset rule.
In the above electronic device, the semantic information of the to-be-recognized voice information is determined first. When the semantic information does not satisfy the preset rule, the semantic information recognized at this point may be inaccurate. In that case, the to-be-recognized voice information is segmented into voice segments, and the voiceprint information of each voice segment is extracted. When unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information is obtained from the local voiceprint database as the to-be-matched voiceprint information, providing a basis for subsequent voiceprint matching, that is, providing a single user's to-be-matched voiceprint information. Then, the voiceprint information of each voice segment is matched against the to-be-matched voiceprint information, and the filtered voiceprint information that successfully matches is determined from the voiceprint information of the voice segments; that is, the voiceprint information of the single user matching the to-be-matched voiceprint information is filtered out. The voice segments corresponding to the filtered voiceprint information are combined to obtain combined voice information, that is, the single user's combined voice information, which is recognized to obtain combined semantic information, namely the semantics expressed by the single user. To improve recognition accuracy, whether the combined semantic information satisfies the preset rule is further determined; when it does, the accurately recognized semantics, that is, the semantics the user intended to express, has been obtained, and the combined semantic information is used as the speech recognition result, improving speech recognition accuracy.
In one embodiment, the recognition result determining module 606 is further configured to: when the combined semantic information does not satisfy the preset rule, return to the to-be-matched voiceprint information obtaining module 603, which performs the operation of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
In one embodiment, the recognition result determining module 606 is configured to use the semantic information as the speech recognition result when the semantic information satisfies the preset rule.
As shown in FIG. 7, in one embodiment, the electronic device 600 further includes:
a voice voiceprint extracting module 607, configured to extract the voiceprint information of the to-be-recognized voice information after the recognition result determining module 606 uses the semantic information as the speech recognition result when the semantic information satisfies the preset rule;
a voiceprint comparing module 608, configured to compare the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and
a storage module 609, configured to store the extracted voiceprint information in the local voiceprint database when the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database.
Referring to FIG. 8, in one embodiment, the storage module 609 includes:
an identifier establishing module 6091, configured to establish a user identifier for the extracted voiceprint information; and
an initializing module 6092, configured to store the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and to initialize the priority level of the user identifier to an initial level.
In one embodiment, the electronic device further includes a level increasing module.
The level increasing module is configured to increase the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database.
In one embodiment, the voiceprint information stored in the local voiceprint database corresponds to user identifiers.
The to-be-matched voiceprint information obtaining module 603 is configured to obtain, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to one not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
In one embodiment, the preset order includes descending order of priority level or ascending order of priority level.
In one embodiment, the electronic device further includes a prompting module.
The prompting module is configured to give a recognition-error prompt when the combined semantic information does not satisfy the preset rule and no unmatched voiceprint information exists in the local voiceprint database.
In one embodiment, the electronic device further includes an information keyword extracting module.
The information keyword extracting module is configured to extract keywords from the semantic information and to extract keywords from the combined semantic information.
The segment voiceprint obtaining module 602 determines that the semantic information does not satisfy the preset rule when the semantic information does not conform to the preset grammatical rule, when the keywords of the semantic information do not all fall within one lexicon, or when no instruction corresponding to the keywords of the semantic information exists in the local instruction library.
The recognition result determining module 606 determines that the combined semantic information satisfies the preset rule when the combined semantic information conforms to the preset grammatical rule, the keywords of the combined semantic information all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library.
In one embodiment, the semantic information determining module 601 includes:
a recognition module, configured to perform speech recognition on the to-be-recognized voice information to obtain the semantic information;
or
an information sending module, configured to send the to-be-recognized voice information to a cloud server; and
a semantic information obtaining module, configured to receive a recognition result obtained by the cloud server performing speech recognition on the to-be-recognized voice information, and to use the recognition result as the semantic information.
In one embodiment, an electronic device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps: obtaining collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information; when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment; when unmatched voiceprint information exists in a local voiceprint database, obtaining one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information; matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information; combining the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information; and when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step: when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step: when the semantic information satisfies the preset rule, using the semantic information as the speech recognition result.
In one embodiment, the computer-readable instructions further cause the processor to perform the following steps: after the semantic information satisfying the preset rule is used as the speech recognition result, extracting the voiceprint information of the to-be-recognized voice information; comparing the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and when the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database, storing the extracted voiceprint information in the local voiceprint database.
In one embodiment, the computer-readable instructions further cause the processor to perform the following steps, wherein storing the voiceprint information in the local voiceprint database includes: establishing a user identifier for the extracted voiceprint information; and storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and initializing the priority level of the user identifier to an initial level.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step: when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, increasing the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level.
In one embodiment, the voiceprint information stored in the local voiceprint database corresponds to user identifiers; and
the computer-readable instructions further cause the processor to perform the following step, wherein obtaining one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information includes: obtaining, in a preset order of the priority levels of the user identifiers, the voiceprint information corresponding to one not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
In one embodiment, the preset order may include descending order of priority level or ascending order of priority level.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step: when the combined semantic information does not satisfy the preset rule and no unmatched voiceprint information exists in the local voiceprint database, giving a recognition-error prompt.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step: after determining the semantic information of the to-be-recognized voice information and before segmenting the to-be-recognized voice information, extracting keywords from the semantic information.
When the semantic information does not conform to the preset grammatical rule, when the keywords of the semantic information do not all fall within one lexicon, or when no instruction corresponding to the keywords of the semantic information exists in the local instruction library, the semantic information is determined not to satisfy the preset rule.
In this embodiment, the computer-readable instructions further cause the processor to perform the following step: after determining the combined semantic information of the combined voice information and before using the combined semantic information as the speech recognition result, extracting keywords from the combined semantic information;
when the combined semantic information conforms to the preset grammatical rule, the keywords of the combined semantic information all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library, the combined semantic information is determined to satisfy the preset rule.
In one embodiment, the computer-readable instructions further cause the processor to perform the following step, wherein determining the semantic information of the to-be-recognized voice information may include: performing speech recognition on the to-be-recognized voice information to obtain the semantic information.
In one embodiment, the computer-readable instructions further cause the processor to perform the following steps, wherein determining the semantic information of the to-be-recognized voice information may include:
sending the to-be-recognized voice information to a cloud server; and
receiving a recognition result obtained by the cloud server performing speech recognition on the to-be-recognized voice information, and using the recognition result as the semantic information.
In the above electronic device, the semantic information of the to-be-recognized voice information is determined first. When the semantic information does not satisfy the preset rule, the semantic information recognized at this point may be inaccurate. In that case, the to-be-recognized voice information is segmented into voice segments, and the voiceprint information of each voice segment is extracted. When unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information is obtained from the local voiceprint database as the to-be-matched voiceprint information, providing a basis for subsequent voiceprint matching, that is, providing a single user's to-be-matched voiceprint information. Then, the voiceprint information of each voice segment is matched against the to-be-matched voiceprint information, and the filtered voiceprint information that successfully matches is determined from the voiceprint information of the voice segments; that is, the voiceprint information of the single user matching the to-be-matched voiceprint information is filtered out. The voice segments corresponding to the filtered voiceprint information are combined to obtain combined voice information, that is, the single user's combined voice information, which is recognized to obtain combined semantic information, namely the semantics expressed by the single user. To improve recognition accuracy, whether the combined semantic information satisfies the preset rule is further determined; when it does, the accurately recognized semantics, that is, the semantics the user intended to express, has been obtained, and the combined semantic information is used as the speech recognition result, improving speech recognition accuracy.
A person of ordinary skill in the art can understand that all or some of the procedures in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium; for example, in the embodiments of the present disclosure, the computer program may be stored in a computer storage medium and executed by at least one processor in the computer system to implement procedures including the above method embodiments. The computer storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it shall be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of the present disclosure, and their descriptions are relatively specific and detailed, but they shall not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present disclosure, all of which fall within the protection scope of the present disclosure. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (17)

  1. A speech recognition method, applied to a terminal, the method comprising the following steps:
    obtaining collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information;
    when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment;
    when unmatched voiceprint information exists in a local voiceprint database, obtaining one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information;
    matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information;
    combining the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information; and
    when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
  2. The speech recognition method according to claim 1, further comprising the step of:
    when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  3. The speech recognition method according to claim 1, further comprising the step of:
    when the semantic information satisfies the preset rule, using the semantic information as the speech recognition result.
  4. The speech recognition method according to claim 3, further comprising, after the semantic information satisfying the preset rule is used as the speech recognition result, the steps of:
    extracting voiceprint information of the to-be-recognized voice information;
    comparing the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and
    when the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database, storing the extracted voiceprint information in the local voiceprint database.
  5. The speech recognition method according to claim 4, wherein storing the voiceprint information in the local voiceprint database comprises:
    establishing a user identifier for the extracted voiceprint information; and
    storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and initializing a priority level of the user identifier to an initial level.
  6. The speech recognition method according to claim 5, further comprising the step of:
    when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, increasing the priority level of the user identifier corresponding to the extracted voiceprint information by a preset level.
  7. The speech recognition method according to claim 1, wherein the voiceprint information stored in the local voiceprint database corresponds to user identifiers; and
    obtaining one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information comprises:
    obtaining, in a preset order of priority levels of the user identifiers, voiceprint information corresponding to one not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
  8. The speech recognition method according to claim 1, wherein
    after determining the semantic information of the to-be-recognized voice information and before segmenting the to-be-recognized voice information, the method further comprises the step of: extracting keywords from the semantic information;
    when the semantic information does not conform to a preset grammatical rule, the keywords of the semantic information do not all fall within one lexicon, or no instruction corresponding to the keywords of the semantic information exists in a local instruction library, the semantic information is determined not to satisfy the preset rule;
    after determining the combined semantic information of the combined voice information and before using the combined semantic information as the speech recognition result, the method further comprises the step of: extracting keywords from the combined semantic information; and
    when the combined semantic information conforms to the preset grammatical rule, the keywords of the combined semantic information all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library, the combined semantic information is determined to satisfy the preset rule.
  9. An electronic device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    obtaining collected to-be-recognized voice information, and determining semantic information of the to-be-recognized voice information;
    when the semantic information does not satisfy a preset rule, segmenting the to-be-recognized voice information to obtain voice segments, and extracting voiceprint information of each voice segment;
    when unmatched voiceprint information exists in a local voiceprint database, obtaining one piece of unmatched voiceprint information from the local voiceprint database as to-be-matched voiceprint information;
    matching the voiceprint information of each voice segment against the to-be-matched voiceprint information, and determining, from the voiceprint information of the voice segments, filtered voiceprint information that successfully matches the to-be-matched voiceprint information;
    combining the voice segments corresponding to the pieces of filtered voiceprint information to obtain combined voice information, and determining combined semantic information of the combined voice information; and
    when the combined semantic information satisfies the preset rule, using the combined semantic information as a speech recognition result.
  10. The electronic device according to claim 9, wherein the computer-readable instructions further cause the processor to perform the following step:
    when the combined semantic information does not satisfy the preset rule, returning to the step of obtaining, when unmatched voiceprint information exists in the local voiceprint database, one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information.
  11. The electronic device according to claim 9, wherein the computer-readable instructions further cause the processor to perform the following step:
    when the semantic information satisfies the preset rule, using the semantic information as the speech recognition result.
  12. The electronic device according to claim 11, wherein the computer-readable instructions further cause the processor to perform the following steps:
    after the semantic information satisfying the preset rule is used as the speech recognition result, extracting voiceprint information of the to-be-recognized voice information;
    comparing the extracted voiceprint information with each piece of voiceprint information stored in the local voiceprint database; and
    when the extracted voiceprint information fails to match every piece of voiceprint information stored in the local voiceprint database, storing the extracted voiceprint information in the local voiceprint database.
  13. The electronic device according to claim 12, wherein the computer-readable instructions further cause the processor to perform the following steps, and storing the voiceprint information in the local voiceprint database comprises:
    establishing a user identifier for the extracted voiceprint information; and
    storing the extracted voiceprint information in the local voiceprint database in association with the corresponding user identifier, and initializing a priority level of the user identifier to an initial level.
  14. The electronic device according to claim 13, wherein when the extracted voiceprint information successfully matches voiceprint information stored in the local voiceprint database, the priority level of the user identifier corresponding to the extracted voiceprint information is increased by a preset level.
  15. The electronic device according to claim 9, wherein the voiceprint information stored in the local voiceprint database corresponds to user identifiers; and
    the computer-readable instructions further cause the processor to perform the following step, wherein obtaining one piece of unmatched voiceprint information from the local voiceprint database as the to-be-matched voiceprint information comprises:
    obtaining, in a preset order of priority levels of the user identifiers, voiceprint information corresponding to one not-yet-matched user identifier from the local voiceprint database as the to-be-matched voiceprint information.
  16. The electronic device according to claim 9, wherein the computer-readable instructions further cause the processor to perform the following steps: after determining the semantic information of the to-be-recognized voice information and before segmenting the to-be-recognized voice information, extracting keywords from the semantic information;
    when the semantic information does not conform to a preset grammatical rule, the keywords of the semantic information do not all fall within one lexicon, or no instruction corresponding to the keywords of the semantic information exists in a local instruction library, determining that the semantic information does not satisfy the preset rule;
    after determining the combined semantic information of the combined voice information and before using the combined semantic information as the speech recognition result, extracting keywords from the combined semantic information; and
    when the combined semantic information conforms to the preset grammatical rule, the keywords of the combined semantic information all fall within one lexicon, and an instruction corresponding to the keywords of the combined semantics exists in the local instruction library, determining that the combined semantic information satisfies the preset rule.
  17. A computer storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the speech recognition method according to any one of claims 1-8.
PCT/CN2017/113154 2017-02-15 2017-11-27 Speech recognition method, electronic device, and computer storage medium WO2018149209A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2019539928A JP6771805B2 (ja) 2017-02-15 2017-11-27 Speech recognition method, electronic device, and computer storage medium
KR1020197016994A KR102222317B1 (ko) 2017-02-15 2017-11-27 Speech recognition method, electronic device, and computer storage medium
EP17897119.8A EP3584786B1 (en) 2017-02-15 2017-11-27 Voice recognition method, electronic device, and computer storage medium
US16/442,193 US11043211B2 (en) 2017-02-15 2019-06-14 Speech recognition method, electronic device, and computer storage medium
US17/244,737 US11562736B2 (en) 2017-02-15 2021-04-29 Speech recognition method, electronic device, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710082111.5 2017-02-15
CN201710082111.5A CN108447471B (zh) 2017-02-15 2017-02-15 Speech recognition method and speech recognition apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/442,193 Continuation US11043211B2 (en) 2017-02-15 2019-06-14 Speech recognition method, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2018149209A1 true WO2018149209A1 (zh) 2018-08-23

Family

ID=63169147

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/113154 WO2018149209A1 (zh) Speech recognition method, electronic device, and computer storage medium 2017-02-15 2017-11-27

Country Status (6)

Country Link
US (2) US11043211B2 (zh)
EP (1) EP3584786B1 (zh)
JP (1) JP6771805B2 (zh)
KR (1) KR102222317B1 (zh)
CN (1) CN108447471B (zh)
WO (1) WO2018149209A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163630A (zh) * 2019-04-15 2019-08-23 中国平安人寿保险股份有限公司 Product supervision method and apparatus, computer device, and storage medium
CN111756603A (zh) * 2019-03-26 2020-10-09 北京京东尚科信息技术有限公司 Control method and apparatus for a smart home system, electronic device, and readable medium
CN112218412A (zh) * 2019-07-10 2021-01-12 上汽通用汽车有限公司 In-vehicle ambient light control system and control method based on speech recognition
WO2024114303A1 (zh) * 2022-11-30 2024-06-06 腾讯科技(深圳)有限公司 Phoneme recognition method and apparatus, electronic device, and storage medium

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447471B (zh) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Speech recognition method and speech recognition apparatus
CN107919130B (zh) * 2017-11-06 2021-12-17 百度在线网络技术(北京)有限公司 Cloud-based speech processing method and apparatus
CN110770820A (zh) * 2018-08-30 2020-02-07 深圳市大疆创新科技有限公司 Speech recognition method, apparatus, photographing system, and computer-readable storage medium
CN110970020A (zh) * 2018-09-29 2020-04-07 成都启英泰伦科技有限公司 Method for extracting a valid speech signal using a voiceprint
CN109841216B (zh) * 2018-12-26 2020-12-15 珠海格力电器股份有限公司 Voice data processing method and apparatus, and intelligent terminal
CN110335612A (zh) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Meeting-minutes generation method and apparatus based on speech recognition, and storage medium
CN110853666B (zh) * 2019-12-17 2022-10-04 科大讯飞股份有限公司 Speaker separation method, apparatus, device, and storage medium
CN110970027B (zh) * 2019-12-25 2023-07-25 博泰车联网科技(上海)股份有限公司 Speech recognition method, apparatus, computer storage medium, and system
CN112102840B (zh) * 2020-09-09 2024-05-03 中移(杭州)信息技术有限公司 Semantic recognition method, apparatus, terminal, and storage medium
CN112164402B (zh) * 2020-09-18 2022-07-12 广州小鹏汽车科技有限公司 Vehicle voice interaction method, apparatus, server, and computer-readable storage medium
CN112599136A (zh) * 2020-12-15 2021-04-02 江苏惠通集团有限责任公司 Speech recognition method and apparatus based on voiceprint recognition, storage medium, and terminal
CN112908299B (zh) * 2020-12-29 2023-08-29 平安银行股份有限公司 Customer demand information recognition method and apparatus, electronic device, and storage medium
CN112784734A (zh) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 Video recognition method and apparatus, electronic device, and storage medium
CN113643700B (zh) * 2021-07-27 2024-02-27 广州市威士丹利智能科技有限公司 Control method and system for an intelligent voice switch
CN114611523A (zh) * 2022-01-25 2022-06-10 北京探境科技有限公司 Command collection method and apparatus, and intelligent device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004094158A (ja) * 2002-09-04 2004-03-25 Ntt Comware Corp Voiceprint authentication apparatus using vowel search
CN103888606A (zh) * 2014-03-11 2014-06-25 上海乐今通信技术有限公司 Mobile terminal and unlocking method thereof
CN104217152A (zh) * 2014-09-23 2014-12-17 陈包容 Method and apparatus for entering an application program on a mobile terminal in standby state
CN105931644A (zh) * 2016-04-15 2016-09-07 广东欧珀移动通信有限公司 Speech recognition method and mobile terminal
CN106297775A (zh) * 2015-06-02 2017-01-04 富泰华工业(深圳)有限公司 Speech recognition apparatus and method

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5384892A (en) * 1992-12-31 1995-01-24 Apple Computer, Inc. Dynamic language model for speech recognition
US6424946B1 (en) * 1999-04-09 2002-07-23 International Business Machines Corporation Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
US6434520B1 (en) * 1999-04-16 2002-08-13 International Business Machines Corporation System and method for indexing and querying audio archives
JP3662780B2 (ja) * 1999-07-16 2005-06-22 日本電気株式会社 Dialogue system using natural language
US6748356B1 (en) * 2000-06-07 2004-06-08 International Business Machines Corporation Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
GB2407657B (en) * 2003-10-30 2006-08-23 Vox Generation Ltd Automated grammar generator (AGG)
JP4346571B2 (ja) * 2005-03-16 2009-10-21 富士通株式会社 Speech recognition system, speech recognition method, and computer program
US20150381801A1 (en) * 2005-04-21 2015-12-31 Verint Americas Inc. Systems, methods, and media for disambiguating call data to determine fraud
JP2009086132A (ja) * 2007-09-28 2009-04-23 Pioneer Electronic Corp Speech recognition apparatus, navigation apparatus including the speech recognition apparatus, electronic device including the speech recognition apparatus, speech recognition method, speech recognition program, and recording medium
CA2717992C (en) * 2008-03-12 2018-01-16 E-Lane Systems Inc. Speech understanding method and system
US8537978B2 (en) * 2008-10-06 2013-09-17 International Business Machines Corporation Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
US8315866B2 (en) * 2009-05-28 2012-11-20 International Business Machines Corporation Generating representations of group interactions
KR20110036385A (ko) * 2009-10-01 2011-04-07 삼성전자주식회사 Apparatus and method for analyzing user intent
DE102009051508B4 (de) * 2009-10-30 2020-12-03 Continental Automotive Gmbh Device, system, and method for voice dialogue activation and guidance
GB2489489B (en) * 2011-03-30 2013-08-21 Toshiba Res Europ Ltd A speech processing system and method
JP2013005195A (ja) * 2011-06-16 2013-01-07 Konica Minolta Holdings Inc Information processing system
JP5677901B2 (ja) * 2011-06-29 2015-02-25 みずほ情報総研株式会社 Minutes creation system and minutes creation method
JP6023434B2 (ja) * 2012-02-09 2016-11-09 岑生 藤岡 Communication device and authentication method
CN102760434A (zh) * 2012-07-09 2012-10-31 华为终端有限公司 Voiceprint feature model updating method and terminal
US9098467B1 (en) * 2012-12-19 2015-08-04 Rawles Llc Accepting voice commands based on user identity
US9460722B2 (en) * 2013-07-17 2016-10-04 Verint Systems Ltd. Blind diarization of recorded calls with arbitrary number of speakers
KR20150093482A (ko) * 2014-02-07 2015-08-18 한국전자통신연구원 System and method for operating multi-party automatic interpretation and translation based on speaker segmentation, and apparatus supporting the same
KR102097710B1 (ko) * 2014-11-20 2020-05-27 에스케이텔레콤 주식회사 Conversation separation apparatus and conversation separation method therein
JP6669162B2 (ja) * 2015-03-31 2020-03-18 ソニー株式会社 Information processing apparatus, control method, and program
JP6739907B2 (ja) * 2015-06-18 2020-08-12 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Device identification method, device identification apparatus, and program
CN106487514A (zh) * 2015-09-01 2017-03-08 北京三星通信技术研究有限公司 Voice communication encryption method, decryption method, and apparatus thereof
US10269372B1 (en) * 2015-09-24 2019-04-23 United Services Automobile Association (Usaa) System for sound analysis and recognition
US10049666B2 (en) * 2016-01-06 2018-08-14 Google Llc Voice recognition system
CN106098068B (zh) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 Voiceprint recognition method and apparatus
CN108447471B (zh) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Speech recognition method and speech recognition apparatus
US10147438B2 (en) * 2017-03-02 2018-12-04 International Business Machines Corporation Role modeling in call centers and work centers
US10347244B2 (en) * 2017-04-21 2019-07-09 Go-Vivace Inc. Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response
US10403288B2 (en) * 2017-10-17 2019-09-03 Google Llc Speaker diarization
US10636427B2 (en) * 2018-06-22 2020-04-28 Microsoft Technology Licensing, Llc Use of voice recognition to generate a transcript of conversation(s)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004094158A (ja) * 2002-09-04 2004-03-25 Ntt Comware Corp Voiceprint authentication apparatus using vowel search
CN103888606A (zh) * 2014-03-11 2014-06-25 上海乐今通信技术有限公司 Mobile terminal and unlocking method thereof
CN104217152A (zh) * 2014-09-23 2014-12-17 陈包容 Method and apparatus for entering an application program on a mobile terminal in standby state
CN106297775A (zh) * 2015-06-02 2017-01-04 富泰华工业(深圳)有限公司 Speech recognition apparatus and method
CN105931644A (zh) * 2016-04-15 2016-09-07 广东欧珀移动通信有限公司 Speech recognition method and mobile terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3584786A4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111756603A (zh) * 2019-03-26 2020-10-09 北京京东尚科信息技术有限公司 Control method and apparatus for a smart home system, electronic device, and readable medium
CN111756603B (zh) * 2019-03-26 2023-05-26 北京京东尚科信息技术有限公司 Control method and apparatus for a smart home system, electronic device, and readable medium
CN110163630A (zh) * 2019-04-15 2019-08-23 中国平安人寿保险股份有限公司 Product supervision method and apparatus, computer device, and storage medium
CN110163630B (zh) * 2019-04-15 2024-04-05 中国平安人寿保险股份有限公司 Product supervision method and apparatus, computer device, and storage medium
CN112218412A (zh) * 2019-07-10 2021-01-12 上汽通用汽车有限公司 In-vehicle ambient light control system and control method based on speech recognition
WO2024114303A1 (zh) * 2022-11-30 2024-06-06 腾讯科技(深圳)有限公司 Phoneme recognition method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
US20190295534A1 (en) 2019-09-26
EP3584786B1 (en) 2021-02-24
EP3584786A1 (en) 2019-12-25
US20210249000A1 (en) 2021-08-12
JP6771805B2 (ja) 2020-10-21
US11562736B2 (en) 2023-01-24
CN108447471A (zh) 2018-08-24
KR20190082900A (ko) 2019-07-10
CN108447471B (zh) 2021-09-10
JP2020505643A (ja) 2020-02-20
US11043211B2 (en) 2021-06-22
KR102222317B1 (ko) 2021-03-03
EP3584786A4 (en) 2019-12-25

Similar Documents

Publication Publication Date Title
WO2018149209A1 (zh) Speech recognition method, electronic device, and computer storage medium
CN107301860B (zh) Speech recognition method and apparatus based on a mixed Chinese-English dictionary
US11823678B2 (en) Proactive command framework
CN111797632B (zh) Information processing method and apparatus, and electronic device
WO2019000832A1 (zh) Voiceprint creation and registration method and apparatus
US9589563B2 (en) Speech recognition of partial proper names by natural language processing
WO2017127296A1 (en) Analyzing textual data
WO2014117645A1 (zh) Information recognition method and apparatus
TWI536183B (zh) Language ambiguity elimination system and method
CN110047467B (zh) Speech recognition method and apparatus, storage medium, and control terminal
TW201606750A (zh) Speech recognition using foreign-word grammars
US20130030794A1 (en) Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof
CN111768769A (zh) Voice interaction method, apparatus, device, and storage medium
WO2022143349A1 (zh) Method and apparatus for determining user intent
CN113112992A (zh) Speech recognition method and apparatus, storage medium, and server
CN110945514B (zh) System and method for segmenting sentences
CN113051384A (zh) Dialogue-based user profile extraction method and related apparatus
TWI713870B (zh) System and method for segmenting text
US20230169988A1 (en) Method and apparatus for performing speaker diarization based on language identification
US20230113883A1 (en) Digital Signal Processor-Based Continued Conversation
US11069341B2 (en) Speech correction system and speech correction method
CN111382322B (zh) Method and apparatus for determining character string similarity
CN112037772A (zh) Multimodal response obligation detection method, system, and apparatus
CN117334201A (zh) Sound recognition method, apparatus, device, and medium
CN115712699A (zh) Voice information extraction method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17897119

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20197016994

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2019539928

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017897119

Country of ref document: EP

Effective date: 20190916