WO2017000489A1 - In-vehicle voice command recognition method, apparatus, and storage medium - Google Patents

In-vehicle voice command recognition method, apparatus, and storage medium (车载语音指令识别方法、装置和存储介质)

Info

Publication number
WO2017000489A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
intention
voice
determining
confidence
Prior art date
Application number
PCT/CN2015/095269
Other languages
English (en)
French (fr)
Inventor
旬丽辉
欧阳能钧
穆向禹
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority to US15/738,946 priority Critical patent/US10446150B2/en
Priority to JP2017530131A priority patent/JP6458149B2/ja
Priority to KR1020177014756A priority patent/KR101955958B1/ko
Priority to EP15897016.0A priority patent/EP3319081A4/en
Publication of WO2017000489A1 publication Critical patent/WO2017000489A1/zh

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/16 — Speech classification or search using artificial neural networks
    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60R — VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R 16/00 — Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R 16/02 — Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R 16/037 — Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R 16/0373 — Voice control
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/18 — Speech classification or search using natural language modelling
    • G10L 15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/26 — Speech to text systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 — Execution procedure of a spoken command
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 — Feedback of the input speech
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 — Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L 2015/227 — Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/09 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/21 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Definitions

  • Embodiments of the present invention relate to the field of computer data processing technologies, and in particular to an in-vehicle voice command recognition method, apparatus, and storage medium.
  • With the development of the automobile industry and the maturing of the electronics market, in-vehicle intelligent terminals have gradually become important supporting equipment for automobiles.
  • Economic prosperity has also led to a sharp increase in the number of cars in China, people's travel habits have changed accordingly, and people are spending ever longer periods in their vehicles. Therefore, the functions of the in-vehicle intelligent terminal have also evolved from simple driving navigation toward multiple functions.
  • Among the many newly developed functions, the recognition and execution of voice commands stands out.
  • However, the existing in-vehicle intelligent terminal often cannot accurately recognize the user's voice commands because the instruction set it is equipped with is limited.
  • For Mandarin voice commands, the recognition rate of current in-vehicle intelligent terminals is relatively high, but the recognition rate for the various dialects is low. Because their adaptability to users' different voices is weak, the recognition rate is not high, which creates obstacles to use. As a result, the proportion of users who actually use the voice command recognition function of the in-vehicle intelligent terminal is very low.
  • In view of this, an embodiment of the present invention provides an in-vehicle voice command recognition method, apparatus, and storage medium to improve the correct recognition rate of voice commands.
  • In a first aspect, an embodiment of the present invention provides an in-vehicle voice command recognition method, and the method includes:
  • an embodiment of the present invention further provides an in-vehicle voice instruction recognition apparatus, where the apparatus includes:
  • an instruction acquisition module configured to acquire a voice command input by a user;
  • a basic information determination module configured to determine basic information of the user according to a pre-trained deep neural network (DNN) model;
  • an intention identification module configured to perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
  • a confidence determination module configured to determine a confidence level of each possible user intention according to the DNN model;
  • an intention determination module configured to determine the user's true intention from the possible user intentions according to the confidence levels;
  • an action execution module configured to perform a corresponding action according to the user's true intention.
  • In a third aspect, embodiments of the present invention provide one or more storage media containing computer-executable instructions which, when executed by a computer processor, perform an in-vehicle voice command recognition method, the method including the following steps:
  • The in-vehicle voice command recognition method, apparatus, and storage medium provided by the embodiments of the present invention obtain the user's basic information by using a deep neural network (DNN) model, judge the user's possible intentions according to the scene page context when the user inputs a voice command, calculate the confidence of the possible intentions with the DNN model, and finally confirm the user's true intention according to the confidence and perform the corresponding operation, effectively improving the correct recognition rate of the user's voice commands.
  • FIG. 1 is a flowchart of an in-vehicle voice command recognition method according to a first embodiment of the present invention;
  • FIG. 2 is a flowchart of determining basic information in an in-vehicle voice command recognition method according to a second embodiment of the present invention;
  • FIG. 3 is a flowchart of an in-vehicle voice command recognition method according to a third embodiment of the present invention;
  • FIG. 4 is a flowchart of an in-vehicle voice command recognition method according to a fourth embodiment of the present invention;
  • FIG. 5 is a flowchart of determining a confidence level in an in-vehicle voice command recognition method according to a fifth embodiment of the present invention;
  • FIG. 6 is a flowchart of intention determination in an in-vehicle voice command recognition method according to a sixth embodiment of the present invention;
  • FIG. 7 is a flowchart of action execution in an in-vehicle voice command recognition method according to a seventh embodiment of the present invention;
  • FIG. 8 is a schematic flowchart of an in-vehicle voice command recognition method according to an eighth embodiment of the present invention;
  • FIG. 9 is a structural diagram of an in-vehicle voice command recognition apparatus according to a ninth embodiment of the present invention.
  • the technical solution can be performed by an in-vehicle voice command recognition device.
  • the in-vehicle voice command recognition device may be integrated in a server on the network side.
  • the server receives the voice command input by the user on the vehicle through the Internet, processes the received voice command, and instructs the vehicle to perform the next action according to the result of the processing.
  • the in-vehicle voice command recognition device may also be integrated in the computing device on the terminal side. At this time, the acquisition of the voice instruction by the computing device does not need to pass through the Internet.
  • the method for identifying a vehicle voice command includes:
  • The voice command may indicate the next operation the user wants the vehicle to perform. For example, if the voice command is "Replay Jay Chou's songs", the head unit should next perform the action of playing all of Jay Chou's songs.
  • some basic information of the user needs to be determined according to the input voice of the user.
  • The basic information includes: the time at which the voice command is input, the place where it is input, and the age, gender, place of origin, and even occupation of the user who performs the input action.
  • To standardize the storage and parsing of the basic information, a "portrait" of the user is defined.
  • The "portrait" is a profile-style data structure for storing the above basic information.
  • Each attribute of the user's basic information is stored as a field of the "portrait".
  • In order to determine the basic information of the user according to the user's input speech, a DNN model needs to be trained in advance. In the process of training, features such as the zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency of the training speech can be extracted as feature parameters of the training speech and input to the DNN model as input parameters, and the model parameters of the DNN model are determined according to the difference between the output parameters of the DNN model and the annotation parameters of the training speech. After the training is completed, upon receiving a segment of speech input by the user, the DNN model can accurately judge basic information such as the user's age, gender, place of origin, and occupation according to the features of the input speech.
  • S13: Perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command.
  • the content recognition performed on the voice instruction is a voice recognition of the voice command.
  • the voice recognition of the voice command is voice recognition performed with reference to the basic information of the user.
  • For example, the user's voice command can be recognized with reference to the user's place-of-origin attribute and the accent characteristics of the corresponding region.
  • After the content recognition of the voice command is completed, possible user intentions are further determined for the command.
  • A possible user intention is a possible purpose of the user when inputting the voice command.
  • A possible user intention corresponds to at least one operation that the vehicle should perform next.
  • For example, the possible user intention obtained by performing intention recognition on the voice command "Replay Jay Chou's songs" may correspond to the vehicle's operations of selecting Jay Chou's songs and playing the selected songs.
  • A confidence level of each possible user intention is determined based on the DNN model. Further, the result of the content recognition of the voice command may be analyzed and input to the DNN model to obtain the confidence levels of the different possible user intentions.
  • an action corresponding to the true intention is performed.
  • the action may be playing a voice, playing a video, displaying a picture, opening a web page, or the like.
  • In this embodiment, a voice command input by the user is acquired; the user's basic information is determined according to the pre-trained deep neural network (DNN) model; content recognition is performed on the voice command according to the basic information; at least one possible user intention is determined according to the recognized content and the scene page context in which the user input the voice command; the confidence of each possible user intention is determined according to the DNN model; the user's true intention is determined from the possible user intentions according to the confidence; and the corresponding action is performed according to the true intention, effectively improving the correct recognition rate of voice commands.
  • Determining the basic information of the user according to the pre-trained deep neural network (DNN) model includes: extracting speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency; and using the speech feature parameters, the location, and the time as input parameters of the DNN, and determining the user's basic information according to the output parameters of the DNN, where the basic information includes at least one of the following: the user's gender, age, place of origin, and occupation.
  • determining basic information of the user according to the pre-trained DNN model includes:
  • the voice feature parameter includes at least one of a zero crossing rate, a short time energy, a cepstral coefficient, and a fundamental frequency.
  • a speech feature parameter extracted from the speech instruction may be input to the DNN model as a feature of the speech instruction.
  • the voice feature parameter, location, and time are used as input parameters of the DNN, and basic information of the user is determined according to an output parameter of the DNN.
  • the DNN is a model that is pre-trained according to the DNN theory and used to determine the basic information of the user.
  • the basic information includes: gender, age, place of origin, and occupation of the user.
  • the DNN has an input layer, a hidden layer, and an output layer.
  • The input layer is for receiving the input parameters; the output layer is for outputting the operation result; and the hidden layer is for computing the operation result according to the values of the input parameters.
  • the input parameters include: a voice feature parameter, a location where the user is located when the voice command is input, and a time when the voice command is input.
  • the input parameter may further include: a called user identification number (CUID).
  • CUID has an important reference value in determining basic information such as the gender, age, and the like of the user.
  • In this embodiment, the speech feature parameters are extracted from the voice command, the speech feature parameters, the location, and the time are used as input parameters of the DNN, and the user's basic information is determined according to the output parameters of the DNN, thereby realizing the judgment of the user's basic information through the DNN.
  • the present embodiment further provides a technical solution of the vehicle voice command recognition method based on the above embodiment of the present invention.
  • Performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command, may include: acquiring the pages that appeared within a time period of predetermined length before the user input the voice command, and judging the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command.
  • the method for identifying a vehicle voice command includes:
  • A Session object may be set up to store the pages that appeared within a time period of predetermined length before the user input the voice command, together with the user's dwell time on each of those pages.
  • When the user's possible intention needs to be judged, the pages that appeared within the predetermined period before the voice command and the user's dwell time on each page are obtained from the Session object and combined with the recognition corpus of the voice command to judge the user's possible intention comprehensively.
  • For example, empirically, if the page that appeared within a predetermined 3-minute period is the map navigation page, the user's dwell time on that page is 3 minutes, and the recognition corpus contains the keyword "navigation", then re-planning the navigation route can be determined to be the user's possible intention.
  • In this embodiment, the pages that appeared within a time period of predetermined length before the user input the voice command are acquired, and the user's possible intention is judged according to those pages, the dwell time on each page, and the key recognition corpus in the voice command, achieving an accurate judgment of the user's possible intention.
  • the present embodiment further provides a technical solution of the vehicle voice command recognition method based on the above embodiment of the present invention.
  • Performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command, may include: acquiring a predetermined number of pages that appeared before the user input the voice command, and judging the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
  • the method for identifying a vehicle voice command includes:
  • A Session object can be set up to store a predetermined number of pages that appeared before the voice command was input, together with the user's dwell time on each of those pages.
  • When the user's possible intention needs to be judged, the previously stored pages and dwell-time parameters are obtained from the Session object and combined with the recognition corpus of the voice command to judge the user's possible intention comprehensively.
  • For example, the two pages that appeared before the voice command was input are the music playback page and the map navigation page.
  • The user's dwell times on the music playback page and the map navigation page are 3 minutes and 2 to 10 minutes respectively, and the recognition corpus contains the keyword "navigation".
  • Empirically, the user's actual intention in this case is very likely to be re-planning the navigation route.
  • In this embodiment, a predetermined number of pages that appeared before the voice command are acquired, and the user's possible intention is judged according to those pages, the dwell time on each page, and the key recognition corpus in the voice command, achieving an accurate judgment of the user's possible intention.
  • Determining the confidence level of a possible user intention according to the DNN model includes: using the speech feature parameters of the voice command as input parameters and evaluating, with the DNN model, the user's emotional state when inputting the voice command; and obtaining the confidence level of the possible user intention according to the emotional state.
  • the confidence level of determining the user's possible intention according to the DNN model includes:
  • the voice feature parameter of the voice instruction is used as an input parameter, and the DNN model is used to evaluate an emotional state when the user inputs the voice command.
  • the DNN model can be used not only to determine the basic information of the user, but also to evaluate the emotional state when the user inputs the voice instruction when determining the confidence of the possible intent.
  • a plurality of possible emotional states of the user may be defined in advance.
  • the user's emotional state may include: happy, sad, angry, and so on.
  • output units corresponding to different emotional states are set on the output layer of the DNN model.
  • the DNN can be used for the evaluation of the emotional state.
  • the value of the confidence corresponding to the different emotional states of the user may be specified according to the empirical value.
  • For example, it may be specified according to experience that the confidence takes its highest value in the happy emotional state and its lowest value in the sad emotional state.
  • In this embodiment, the speech feature parameters of the voice command are used as input parameters, the DNN model evaluates the user's emotional state when inputting the voice command, and the confidence of the possible intention is obtained according to that emotional state; the DNN model is thus used to evaluate the emotional state, from which the confidence of the user's possible intention is further determined.
  • Determining the user's true intention from the possible user intentions according to the confidence includes: matching the confidence against the confidence intervals corresponding to the possible user intentions; and taking the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
  • determining the true intention of the user from the possible intentions of the user according to the confidence includes:
  • Different possible intents correspond to corresponding confidence intervals.
  • For example, the confidence interval of the possible intention "re-plan the navigation route" may be between 0.45 and 0.6.
  • the confidence interval corresponding to each possible intent is pre-acquired, and after obtaining the possible intent corresponding to the voice instruction and the confidence of the possible intent, the confidence is matched with each collected confidence interval.
  • the possible intent may also be accompanied by its corresponding parameters.
  • For example, the parameters attached to the intention "change the playback mode" may include target playback modes such as loop playback, sequential playback, and shuffle playback.
  • In this case, each attached parameter should be treated as an independent scheme whose corresponding confidence interval is collected separately, and after the confidence is obtained, it is matched against the separately collected confidence intervals.
  • The possible user intention corresponding to the best-matching confidence interval is taken as the user's true intention.
  • That is, after the confidence intervals have been matched, the possible intention corresponding to the best-matching interval is taken as the user's true intention.
  • Based on the above embodiments of the present invention, the present embodiment further provides a technical solution for action execution in the in-vehicle voice command recognition method.
  • Performing the corresponding action according to the user's true intention includes: if the execution condition of the user's true intention is established, performing the action corresponding to the user's true intention; if the execution condition is not established, terminating the execution of the action corresponding to the user's true intention and prompting the user; and if the execution condition is uncertain, performing an action similar to the user's true intention.
  • performing corresponding actions according to the user's true intention includes:
  • After the user's true intention has been determined, whether the action corresponding to it is performed depends on whether its execution condition is established. For example, if the true intention is "view WeChat", the corresponding execution condition should be that the vehicle is parked. If a voice command is received and the true intention "view WeChat" is recognized while the vehicle is parked, the corresponding action is performed, that is, WeChat is viewed.
  • When the user's emotional state is poor or the true intention cannot be judged clearly, the recognition of the execution condition of the user's true intention may be uncertain.
  • In this case, an action similar to the user's true intention should be performed, but it must be ensured that the similar action is a safe one.
  • In this embodiment, when the execution condition of the user's true intention is established, the action corresponding to the true intention is performed; when the execution condition is not established, the execution of the action is terminated and the user is prompted; and when the execution condition is uncertain, an action similar to the user's true intention is performed, so that re-confirming the execution condition ensures the safety of the executed action.
  • The in-vehicle voice command recognition method includes: judging the user's basic information; obtaining the user's possible intentions through Session processing; obtaining the confidence of the user's different possible intentions through intention confidence processing; determining the action that should be performed through safety processing; and determining, according to the result of comprehensive judgment, whether to perform the corresponding action.
  • the method for identifying a vehicle voice command includes:
  • the basic information of the user is identified by the pre-trained DNN.
  • the basic information includes the user's age, gender, place of origin, occupation, and the like.
  • The user's possible intentions are obtained by using the Session object to store the pages the user used before issuing the voice command.
  • the confidence of different possible intents is also identified based on the pre-trained DNN.
  • The in-vehicle voice command recognition apparatus includes an instruction acquisition module 91, a basic information determination module 92, an intention identification module 93, a confidence determination module 94, an intention determination module 95, and an action execution module 96.
  • the instruction acquisition module 91 is configured to acquire a voice instruction input by a user.
  • the basic information determining module 92 is configured to determine basic information of the user according to the pre-trained deep neural network DNN model.
  • The intention identification module 93 is configured to perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command.
  • the confidence determination module 94 is configured to determine a confidence level of a user's possible intent based on the DNN model.
  • the intent determination module 95 is configured to determine a user's true intent from the user's possible intent based on the confidence.
  • the action execution module 96 is configured to perform a corresponding action according to the user's true intention.
  • the basic information determining module 92 includes: a feature extracting unit and a DNN identifying unit.
  • The feature extraction unit is configured to extract speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency.
  • the DNN identification unit is configured to use the voice feature parameter, the location, and the time as input parameters of the DNN, and determine basic information of the user according to an output parameter of the DNN, where the basic information includes At least one of the following: gender, age, place of origin, and occupation of the user.
  • the intent identification module 93 includes: a first intent identification unit or a second intent identification unit.
  • The first intention identification unit is configured to acquire the pages that appeared within a time period of predetermined length before the user input the voice command, and judge the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command.
  • The second intention identification unit is configured to acquire a predetermined number of pages that appeared before the user input the voice command, and judge the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
  • the confidence determination module 94 includes: an emotion evaluation unit and a confidence acquisition unit.
  • the emotion evaluation unit is configured to use the voice feature parameter of the voice instruction as an input parameter, and use the DNN model to evaluate an emotional state when the user inputs the voice instruction.
  • the confidence acquisition unit is configured to acquire a confidence level of the user's possible intention according to the emotional state.
  • the intent determination module 95 includes: a matching unit and a real intent acquisition unit.
  • the matching unit is configured to match the confidence level with a confidence interval corresponding to the user's possible intention.
  • the real intention acquisition unit is configured to use the user's possible intention corresponding to the confidence interval with the highest degree of confidence matching as the user's true intention.
  • the action execution module 96 includes: a first action execution unit, a second action execution unit, and a third action execution unit.
  • the first action execution unit is configured to perform an action corresponding to the user's true intention when the execution condition of the user's true intention is established.
  • The second action execution unit is configured to terminate the execution of the action corresponding to the user's true intention and prompt the user when the execution condition of the user's true intention is not established.
  • the third action execution unit is configured to perform an action similar to the user's true intention when the execution condition of the user's real intention is uncertain.
  • The above in-vehicle voice command recognition apparatus can execute the in-vehicle voice command recognition method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method.
  • The above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • One or more storage media containing computer executable instructions for performing an in-vehicle voice instruction recognition method when executed by a computer processor, the method comprising the steps of:
  • When the foregoing storage media execute the method, determining the basic information of the user according to the pre-trained deep neural network (DNN) model includes:
  • extracting speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency;
  • using the speech feature parameters, the location, and the time as input parameters of the DNN, and determining the user's basic information according to the output parameters of the DNN, where the basic information includes at least one of the following: the user's gender, age, place of origin, and occupation.
  • When the storage media execute the method, performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command, includes: acquiring the pages that appeared within a time period of predetermined length before the user input the voice command, and judging the user's possible intention according to those pages, the dwell time on each page, and the key recognition corpus in the voice command; or acquiring a predetermined number of pages that appeared before the user input the voice command, and judging the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
  • determining a confidence level of the user's possible intention according to the DNN model includes:
  • using the speech feature parameters of the voice command as input parameters and evaluating, with the DNN model, the user's emotional state when inputting the voice command; and obtaining the confidence level of the user's possible intention according to the emotional state.
  • determining the true intention of the user from the possible intentions of the user according to the confidence includes:
  • matching the confidence against the confidence intervals corresponding to the possible user intentions; and taking the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
  • Performing the corresponding action according to the user's true intention includes: if the execution condition of the user's true intention is established, performing the action corresponding to the user's true intention; if the execution condition is not established, terminating the execution of the action and prompting the user; and if the execution condition is uncertain, performing an action similar to the user's true intention.
  • The parts of the technical solutions of the present invention that in essence contribute to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
  • The units and modules included are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be implemented;
  • the specific names of the functional units are also only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Mechanical Engineering (AREA)
  • User Interface Of Digital Computer (AREA)
  • Navigation (AREA)

Abstract

An in-vehicle voice command recognition method, apparatus, and storage medium. The method includes: acquiring a voice command input by a user (S11); determining basic information of the user according to a pre-trained deep neural network (DNN) model (S12); performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command (S13); determining a confidence level of each possible user intention according to the DNN model (S14); determining the user's true intention from the possible user intentions according to the confidence levels (S15); and performing a corresponding action according to the user's true intention (S16). The in-vehicle voice command recognition method, apparatus, and storage medium can effectively improve the correct recognition rate of voice commands.

Description

In-vehicle voice command recognition method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 201510382215.9, filed on July 2, 2015 by Baidu Online Network Technology (Beijing) Co., Ltd. and entitled "In-vehicle Voice Command Recognition Method and Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of computer data processing technologies, and in particular to an in-vehicle voice command recognition method, apparatus, and storage medium.
Background
With the development of the automobile industry and the maturing of the electronics market, in-vehicle intelligent terminals have gradually become important supporting equipment for automobiles. In recent years, economic prosperity has led to a sharp increase in the number of cars in China, people's travel habits have changed accordingly, and people are spending more and more time in their vehicles. The functions of in-vehicle intelligent terminals have therefore evolved from simple driving navigation toward multiple functions.
Among the many newly developed functions, the recognition and execution of voice commands stands out. However, because the instruction sets they are equipped with are limited, existing in-vehicle intelligent terminals often fail to recognize users' voice commands accurately. For example, current in-vehicle intelligent terminals achieve a relatively high recognition rate for Mandarin voice commands, but a low one for the various dialects. Their weak adaptability to users' different voices and the resulting low recognition rate create obstacles to use, so the proportion of users who actually use the voice command recognition function of an in-vehicle intelligent terminal is very low.
Summary
In view of the above technical problems, embodiments of the present invention provide an in-vehicle voice command recognition method, apparatus, and storage medium to improve the correct recognition rate of voice commands.
In a first aspect, an embodiment of the present invention provides an in-vehicle voice command recognition method, the method including:
acquiring a voice command input by a user;
determining basic information of the user according to a pre-trained deep neural network (DNN) model;
performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
determining a confidence level of each possible user intention according to the DNN model;
determining the user's true intention from the possible user intentions according to the confidence levels; and
performing a corresponding action according to the user's true intention.
In a second aspect, an embodiment of the present invention further provides an in-vehicle voice command recognition apparatus, the apparatus including:
an instruction acquisition module configured to acquire a voice command input by a user;
a basic information determination module configured to determine basic information of the user according to a pre-trained deep neural network (DNN) model;
an intention identification module configured to perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
a confidence determination module configured to determine a confidence level of each possible user intention according to the DNN model;
an intention determination module configured to determine the user's true intention from the possible user intentions according to the confidence levels; and
an action execution module configured to perform a corresponding action according to the user's true intention.
In a third aspect, an embodiment of the present invention provides one or more storage media containing computer-executable instructions which, when executed by a computer processor, perform an in-vehicle voice command recognition method including the following steps:
acquiring a voice command input by a user;
determining basic information of the user according to a pre-trained deep neural network (DNN) model;
performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
determining a confidence level of each possible user intention according to the DNN model;
determining the user's true intention from the possible user intentions according to the confidence levels; and
performing a corresponding action according to the user's true intention.
The in-vehicle voice command recognition method, apparatus, and storage medium provided by the embodiments of the present invention obtain the user's basic information by using a deep neural network (DNN) model, judge the user's possible intentions according to the scene page context when the user inputs a voice command, calculate the confidence of the possible intentions with the DNN model, and finally confirm the user's true intention according to the confidence and perform the corresponding operation, effectively improving the correct recognition rate of the user's voice commands.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Naturally, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art may modify and replace them without creative effort.
FIG. 1 is a flowchart of an in-vehicle voice command recognition method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of determining basic information in an in-vehicle voice command recognition method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of an in-vehicle voice command recognition method according to a third embodiment of the present invention;
FIG. 4 is a flowchart of an in-vehicle voice command recognition method according to a fourth embodiment of the present invention;
FIG. 5 is a flowchart of determining a confidence level in an in-vehicle voice command recognition method according to a fifth embodiment of the present invention;
FIG. 6 is a flowchart of intention determination in an in-vehicle voice command recognition method according to a sixth embodiment of the present invention;
FIG. 7 is a flowchart of action execution in an in-vehicle voice command recognition method according to a seventh embodiment of the present invention;
FIG. 8 is a schematic flowchart of an in-vehicle voice command recognition method according to an eighth embodiment of the present invention;
FIG. 9 is a structural diagram of an in-vehicle voice command recognition apparatus according to a ninth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention; they are intended to illustrate the principles of the present invention, not to limit the present invention to these specific embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
First Embodiment
This embodiment provides a technical solution of the in-vehicle voice command recognition method. The technical solution may be executed by an in-vehicle voice command recognition apparatus. The apparatus may be integrated in a server on the network side; the server receives, over the Internet, the voice command the user inputs on the in-vehicle head unit, processes the received voice command, and, according to the processing result, instructs the head unit over the Internet to perform the next action. The apparatus may also be integrated in a computing device on the terminal side, in which case the computing device acquires the voice command without going through the Internet.
Specifically, referring to FIG. 1, the in-vehicle voice command recognition method includes:
S11: Acquire a voice command input by a user.
With the rise of the Internet of Vehicles, cars are generally equipped with a network-connected head unit, through which the user can input voice commands. A voice command may indicate the next operation the user wants the head unit to perform. For example, if the voice command is "Replay Jay Chou's songs", the head unit should next perform the action of playing all of Jay Chou's songs.
S12: Determine basic information of the user according to a pre-trained deep neural network (DNN) model.
In this embodiment, some basic information of the user needs to be determined from the user's input speech. The basic information includes: the time at which the voice command is input, the place where it is input, and the age, gender, place of origin, and even occupation of the user who performs the input action.
To standardize the storage and parsing of the above basic information, a user "portrait" is defined. The "portrait" is a profile-style data structure for storing the basic information; each attribute of the user's basic information is stored as a field of the "portrait".
To determine the user's basic information from the input speech, a DNN model needs to be trained in advance. During training, features such as the zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency of the training speech can be extracted as its feature parameters and fed to the DNN model as input parameters, and the model parameters of the DNN model are determined according to the difference between the model's output parameters and the annotation parameters of the training speech. After training is completed, upon receiving a segment of speech input by the user, the DNN model can accurately judge basic information such as the user's age, gender, place of origin, and occupation from the features of the input speech.
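To make this concrete, the following is a minimal sketch of the feature extraction step, assuming Python with the librosa audio library; the exact feature set, frame pooling, and vector layout are illustrative assumptions rather than the patent's prescribed implementation.

```python
# Sketch: extract the four speech feature parameters named above and pool
# them into one fixed-length vector suitable as DNN input.
import numpy as np
import librosa

def extract_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)          # zero-crossing rate, (1, frames)
    energy = librosa.feature.rms(y=y)                    # short-time energy proxy, (1, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # cepstral coefficients, (13, frames)
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)        # fundamental frequency per frame
    frames = np.vstack([zcr, energy, mfcc])              # (15, frames)
    # Mean/std pooling turns the variable-length frame sequence into a
    # fixed-size vector: 15 * 2 frame statistics + 2 pitch statistics = 32 dims.
    stats = np.concatenate([frames.mean(axis=1), frames.std(axis=1)])
    return np.concatenate([stats, [np.nanmean(f0), np.nanstd(f0)]])
```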
S13: Perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command.
The content recognition performed on the voice command is speech recognition of the command, and this speech recognition is carried out with reference to the user's basic information. For example, the user's voice command can be recognized with reference to the user's place-of-origin attribute and the accent characteristics of the corresponding region.
After the content recognition of the voice command is completed, possible user intentions are further determined for the command. A possible user intention is a possible purpose of the user when inputting the voice command, and corresponds to at least one operation the head unit should perform next. For example, the possible user intention obtained by performing intention recognition on the voice command "Replay Jay Chou's songs" corresponds to the head unit's operations of selecting Jay Chou's songs and playing the selected songs.
S14: Determine a confidence level of each possible user intention according to the DNN model.
After at least one possible user intention has been determined for the user's input speech, the confidence level of each possible user intention is determined according to the DNN model. Further, the result of the content recognition of the voice command may be analyzed and input to the DNN model to obtain the confidence levels of the different possible user intentions.
S15: Determine the user's true intention from the possible user intentions according to the confidence levels.
It can be understood that after the confidence determination, different possible user intentions correspond to different confidence levels. The possible user intention whose confidence level best matches a predetermined confidence interval is then selected from the possible user intentions as the user's true intention corresponding to the voice command.
S16: Perform a corresponding action according to the user's true intention.
After the user's true intention is determined, the action corresponding to it is performed. The action may be playing audio, playing a video, displaying a picture, opening a web page, and so on.
In this embodiment, a voice command input by the user is acquired, the user's basic information is determined according to a pre-trained deep neural network (DNN) model, content recognition is performed on the voice command according to the basic information, at least one possible user intention is determined according to the recognized content and the scene page context in which the user input the command, the confidence of each possible user intention is determined according to the DNN model, the user's true intention is determined from the possible intentions according to the confidence, and the corresponding action is performed according to the true intention, effectively improving the correct recognition rate of voice commands.
Second Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution for determining the basic information in the in-vehicle voice command recognition method. In this technical solution, determining the basic information of the user according to the pre-trained deep neural network (DNN) model includes: extracting speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency; and using the speech feature parameters, the location, and the time as input parameters of the DNN, and determining the user's basic information according to the output parameters of the DNN, where the basic information includes at least one of the following: the user's gender, age, place of origin, and occupation.
Referring to FIG. 2, determining the basic information of the user according to the pre-trained DNN model includes:
S21: Extract speech feature parameters from the voice command.
Several speech feature parameters can be extracted from the voice command input by the user, including at least one of: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency. The speech feature parameters extracted from the voice command can be input to the DNN model as features of the command.
S22: Use the speech feature parameters, the location, and the time as input parameters of the DNN, and determine the user's basic information according to the output parameters of the DNN.
The DNN is a model pre-trained according to DNN theory for judging the user's basic information, which includes the user's gender, age, place of origin, and occupation.
The DNN consists of an input layer, hidden layers, and an output layer. The input layer receives the input parameters; the output layer outputs the computation result; and the hidden layers compute that result from the values of the input parameters.
The input parameters include: the speech feature parameters, the location of the user when the voice command is input, and the time at which the voice command is input. After the input parameters are fed to the DNN, the judgment result about the user's basic information can be obtained through the computation of the input, hidden, and output layers.
Further preferably, the input parameters may also include a called user identification number (CUID). The CUID has important reference value in determining basic information such as the user's gender and age.
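The DNN described here could, for instance, be realized as one shared trunk with one classification head per field of the user "portrait". Below is a minimal PyTorch sketch; the layer sizes, the way location and time are encoded into `context_feats`, and the class counts are illustrative assumptions.

```python
# Sketch: a multi-output DNN mapping speech features plus encoded
# location/time (and optionally a CUID embedding) to portrait fields.
import torch
import torch.nn as nn

class UserPortraitDNN(nn.Module):
    def __init__(self, speech_dim=32, context_dim=8,
                 n_genders=2, n_age_groups=6, n_regions=34, n_jobs=20):
        super().__init__()
        # Input layer receives the concatenated input parameters; the
        # hidden layers compute the result, as the text describes.
        self.trunk = nn.Sequential(
            nn.Linear(speech_dim + context_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # One output head per basic-information attribute.
        self.heads = nn.ModuleDict({
            "gender": nn.Linear(128, n_genders),
            "age": nn.Linear(128, n_age_groups),
            "region": nn.Linear(128, n_regions),
            "occupation": nn.Linear(128, n_jobs),
        })

    def forward(self, speech_feats, context_feats):
        h = self.trunk(torch.cat([speech_feats, context_feats], dim=-1))
        return {name: head(h) for name, head in self.heads.items()}
```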
In this embodiment, speech feature parameters are extracted from the voice command, the speech feature parameters, the location, and the time are used as input parameters of the DNN, and the user's basic information is determined according to the output parameters of the DNN, thereby realizing the judgment of the user's basic information through the DNN.
Third Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution of the in-vehicle voice command recognition method. In this technical solution, performing content recognition on the voice command according to the basic information of the user and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command includes: acquiring the pages that appeared within a time period of predetermined length before the user input the voice command, and judging the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command.
Referring to FIG. 3, the in-vehicle voice command recognition method includes:
S31: Acquire a voice command input by a user.
S32: Determine basic information of the user according to a pre-trained deep neural network (DNN) model.
S33: Acquire the pages that appeared within a time period of predetermined length before the user input the voice command, and judge the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command.
A session (Session) object can be set up to store the pages that appeared within a time period of predetermined length before the user input the voice command, together with the user's dwell time on each of those pages. When the user's possible intention needs to be judged, those pages and the user's dwell time on each of them are obtained from the Session object and combined with the recognition corpus of the voice command to judge the user's possible intention comprehensively.
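As an illustration of the Session bookkeeping just described, a minimal Python sketch follows; the class and method names are assumptions, not the patent's actual data structure.

```python
# Sketch: record each page the user visits and the dwell time on it, so
# intent judgment can later query "pages seen in the last N seconds" or
# "the last N pages".
import time
from dataclasses import dataclass, field

@dataclass
class PageVisit:
    page: str                 # e.g. "map_navigation", "music_player"
    entered_at: float
    dwell_seconds: float = 0.0

@dataclass
class Session:
    visits: list = field(default_factory=list)

    def enter_page(self, page: str) -> None:
        now = time.time()
        if self.visits:                          # close out the previous page
            prev = self.visits[-1]
            prev.dwell_seconds = now - prev.entered_at
        self.visits.append(PageVisit(page, now))

    def pages_within(self, seconds: float) -> list:
        cutoff = time.time() - seconds
        return [v for v in self.visits if v.entered_at >= cutoff]

    def last_pages(self, count: int) -> list:
        return self.visits[-count:]
```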
For example, empirically, if the page that appeared within a predetermined 3-minute period is the map navigation page, the user's dwell time on that page is 3 minutes, and the recognition corpus contains the keyword "navigation", the user's actual intention is very likely to be re-planning the navigation route. When this situation arises, re-planning the navigation route can be judged to be the user's possible intention.
S34: Determine a confidence level of each possible user intention according to the DNN model.
S35: Determine the user's true intention from the possible user intentions according to the confidence levels.
S36: Perform a corresponding action according to the user's true intention.
In this embodiment, after the user's basic information is determined, the pages that appeared within a time period of predetermined length before the user input the voice command are acquired, and the user's possible intention is judged according to those pages, the dwell time on each page, and the key recognition corpus in the voice command, achieving an accurate judgment of the user's possible intention.
Fourth Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution of the in-vehicle voice command recognition method. In this technical solution, performing content recognition on the voice command according to the basic information of the user and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command includes: acquiring a predetermined number of pages that appeared before the user input the voice command, and judging the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
Referring to FIG. 4, the in-vehicle voice command recognition method includes:
S41: Acquire a voice command input by a user.
S42: Determine basic information of the user according to a pre-trained deep neural network (DNN) model.
S43: Acquire a predetermined number of pages that appeared before the user input the voice command, and judge the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
As in the third embodiment of the present invention, a Session object can be set up to store the predetermined number of pages that appeared before the voice command was input, together with the user's dwell time on each of those pages. When the user's possible intention needs to be judged, the previously stored pages and dwell-time parameters are obtained from the Session object and combined with the recognition corpus of the voice command to judge the user's possible intention comprehensively.
For example, suppose the two pages that appeared before the voice command was input are the music playback page and the map navigation page, the user's dwell times on them are 3 minutes and 2 to 10 minutes respectively, and the recognition corpus contains the keyword "navigation". Empirically, the user's actual intention in this situation is very likely to be re-planning the navigation route, so when it arises, the user's possible intention can be judged to be re-planning the navigation route.
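Reusing the Session sketch above, the comprehensive judgment from pages, dwell times, and recognized keywords could look like the following; the concrete rules, keywords, and thresholds are illustrative assumptions rather than the patent's actual rule set.

```python
# Sketch: rule-based intent judgment combining recent pages, dwell times,
# and keywords found in the recognized command text.
def possible_intents(recent_visits: list, recognized_text: str) -> list:
    intents = []
    lingered_on_map = any(v.page == "map_navigation" and v.dwell_seconds >= 120
                          for v in recent_visits)
    if "navigation" in recognized_text:
        if lingered_on_map:
            # User lingered on the navigation page and says "navigation":
            # re-planning the route is a plausible intention.
            intents.append("replan_route")
        else:
            intents.append("start_navigation")
    if "play" in recognized_text and \
            any(v.page == "music_player" for v in recent_visits):
        intents.append("change_playback")
    return intents

# Usage (assuming a populated Session object `session`):
# candidates = possible_intents(session.last_pages(2), "re-plan the navigation")
```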
S44: Determine a confidence level of each possible user intention according to the DNN model.
S45: Determine the user's true intention from the possible user intentions according to the confidence levels.
S46: Perform a corresponding action according to the user's true intention.
In this embodiment, after the user's basic information is determined, a predetermined number of pages that appeared before the user input the voice command are acquired, and the user's possible intention is judged according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command, achieving an accurate judgment of the user's possible intention.
Fifth Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution for determining the confidence level in the in-vehicle voice command recognition method. In this technical solution, determining the confidence level of each possible user intention according to the DNN model includes: using the speech feature parameters of the voice command as input parameters, and evaluating the user's emotional state when inputting the voice command by using the DNN model; and obtaining the confidence level of the user's possible intention according to the emotional state.
Referring to FIG. 5, determining the confidence level of each possible user intention according to the DNN model includes:
S51: Use the speech feature parameters of the voice command as input parameters, and evaluate the user's emotional state when inputting the voice command by using the DNN model.
The DNN model can be used not only to determine the user's basic information; when determining the confidence of the possible intentions, it can also be used to evaluate the user's emotional state when inputting the voice command.
Specifically, several possible emotional states of the user can be defined in advance. For example, the user's emotional states may include: happy, sad, angry, and so on. Once the user's emotional states have been determined, output units corresponding to the different emotional states are set on the output layer of the DNN model. In this way, after the training of the DNN is completed, the DNN can be used to evaluate the emotional state.
S52: Obtain the confidence level of the user's possible intention according to the emotional state.
Specifically, the confidence values corresponding to the user's different emotional states can be specified according to empirical values. For example, it can be specified according to experience that the confidence takes its highest value in the happy emotional state and its lowest value in the sad emotional state.
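As an illustration of S51 and S52, the sketch below maps the emotion output of the DNN to an empirical confidence value; the set of emotional states and the values assigned to them are assumptions.

```python
# Sketch: one output unit per predefined emotional state; the argmax state
# is looked up in an empirical confidence table (happy highest, sad lowest).
import torch

EMOTION_CONFIDENCE = {"happy": 0.9, "angry": 0.6, "sad": 0.3}  # empirical values
STATES = ("happy", "angry", "sad")

def intent_confidence(emotion_logits: torch.Tensor) -> float:
    """Map the DNN's emotion output to a confidence for a possible intention."""
    probs = torch.softmax(emotion_logits, dim=-1)
    state = STATES[int(probs.argmax())]        # evaluated emotional state
    return EMOTION_CONFIDENCE[state]
```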
In this embodiment, the speech feature parameters of the voice command are used as input parameters, the user's emotional state when inputting the voice command is evaluated by using the DNN model, and the confidence of the possible intention is obtained according to the emotional state; thus the DNN model is used to evaluate the user's emotional state when inputting the voice command, and the confidence of the user's possible intention is further determined from that emotional state.
Sixth Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution for intention determination in the in-vehicle voice command recognition method. In this technical solution, determining the user's true intention from the possible user intentions according to the confidence includes: matching the confidence against the confidence intervals corresponding to the possible user intentions; and taking the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
Referring to FIG. 6, determining the user's true intention from the possible user intentions according to the confidence includes:
S61: Match the confidence against the confidence intervals corresponding to the possible user intentions.
Different possible intentions correspond to their own confidence intervals. For example, the confidence interval of the intention "re-plan the navigation route" may be between 0.45 and 0.6. The confidence interval corresponding to each possible intention is collected in advance, and after the possible intentions corresponding to the voice command and their confidence are obtained, the confidence is matched against each collected confidence interval.
Further preferably, a possible intention, that is, a possible user intention, may also carry its corresponding parameters. For example, the parameters attached to the intention "change the playback mode" may include target playback modes such as loop playback, sequential playback, and shuffle playback. In this case, each attached parameter should be treated as an independent scheme whose corresponding confidence interval is collected separately, and after the confidence is obtained, it is matched against the separately collected confidence intervals.
S62: Take the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
After the confidence intervals have been matched, the possible intention corresponding to the best-matching confidence interval is taken as the user's true intention.
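The interval matching of S61 and S62 could be sketched as follows; the interval values, the tuple encoding of an intention with its attached parameter, and the distance-based match score are illustrative assumptions.

```python
# Sketch: pre-collected confidence intervals per intention (or per
# intention-plus-parameter, treated as an independent scheme); the
# candidate whose interval best matches the confidence wins.
INTERVALS = {
    "replan_route": (0.45, 0.60),
    ("change_playback", "shuffle"): (0.55, 0.70),
    ("change_playback", "loop"): (0.30, 0.50),
}

def match_score(conf: float, interval: tuple) -> float:
    lo, hi = interval
    if lo <= conf <= hi:
        return 1.0                               # confidence falls inside the interval
    return 1.0 / (1.0 + min(abs(conf - lo), abs(conf - hi)))

def true_intent(candidates: list, conf: float):
    # candidates: intentions, optionally with attached parameters as tuples
    return max(candidates,
               key=lambda c: match_score(conf, INTERVALS.get(c, (0.0, 1.0))))
```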
In this embodiment, the confidence is matched against the confidence intervals corresponding to the possible user intentions, and the possible user intention corresponding to the best-matching confidence interval is taken as the user's true intention, thereby realizing the recognition of the user's true intention based on the confidence parameter.
Seventh Embodiment
Based on the above embodiment of the present invention, this embodiment further provides a technical solution for action execution in the in-vehicle voice command recognition method. In this technical solution, performing the corresponding action according to the user's true intention includes: if the execution condition of the user's true intention is established, performing the action corresponding to the user's true intention; if the execution condition of the user's true intention is not established, terminating the execution of the action corresponding to the user's true intention and prompting the user; and if the execution condition of the user's true intention is uncertain, performing an action similar to the user's true intention.
Referring to FIG. 7, performing the corresponding action according to the user's true intention includes:
S71: If the execution condition of the user's true intention is established, perform the action corresponding to the user's true intention.
After the user's true intention has been determined, whether the action corresponding to it is performed depends on whether its execution condition is established. For example, if the true intention is "view WeChat", the corresponding execution condition should be that the vehicle is parked. If a voice command is received and the true intention "view WeChat" is recognized while the vehicle is parked, the action corresponding to the true intention is performed, that is, WeChat is viewed.
S72: If the execution condition of the user's true intention is not established, terminate the execution of the action corresponding to the user's true intention and prompt the user.
Taking the true intention "view WeChat" as an example again, if a voice command is received and the true intention "view WeChat" is recognized while the vehicle is being driven, the action of viewing WeChat is not performed, and a message prompts the user about the danger of performing that action in the current state.
S73: If the execution condition of the user's true intention is uncertain, perform an action similar to the user's true intention.
When the user's emotional state is poor or the user's true intention cannot be judged clearly, the recognition of the execution condition of the user's true intention may be uncertain. In this case, an action similar to the user's true intention should be performed, but it must be ensured that the similar action is a safe one.
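The three-way rule of S71 to S73 could be sketched as a small dispatch table keyed on the vehicle state; the condition table, the state values, and the "similar safe action" fallback are illustrative assumptions.

```python
# Sketch: gate action execution on the vehicle state, with a safe fallback
# when the execution condition cannot be confirmed.
SAFE_CONDITIONS = {"view_wechat": "parked"}            # intent -> required state
SIMILAR_SAFE_ACTION = {"view_wechat": "read_wechat_aloud"}

def dispatch(intent: str, vehicle_state: str) -> tuple:
    required = SAFE_CONDITIONS.get(intent)
    if required is None or vehicle_state == required:
        return ("execute", intent)                     # condition established
    if vehicle_state == "unknown":
        # Condition uncertain: perform a similar action known to be safe.
        return ("execute", SIMILAR_SAFE_ACTION.get(intent, "prompt_user"))
    # Condition not established (e.g. driving): abort and warn the user.
    return ("abort_and_prompt", intent)
```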
In this embodiment, when the execution condition of the user's true intention is established, the action corresponding to the true intention is performed; when the execution condition is not established, the execution of the action is terminated and the user is prompted; and when the execution condition is uncertain, an action similar to the user's true intention is performed. By re-confirming the execution condition in this way, the safety of the executed action is ensured.
Eighth Embodiment
This embodiment provides a technical solution of the in-vehicle voice command recognition method. In this technical solution, the method includes: judging the user's basic information; obtaining the user's possible intentions through Session processing; obtaining the confidence of the user's different possible intentions through intention confidence processing; determining the action that should be performed through safety processing; and determining, according to the result of comprehensive judgment, whether to perform the corresponding action.
Referring to FIG. 8, the in-vehicle voice command recognition method includes:
S81: Judge the user's basic information.
In this embodiment, the user's basic information is recognized by a pre-trained DNN. The basic information includes the user's age, gender, place of origin, occupation, and the like.
S82: Obtain the user's possible intentions through Session processing.
The user's possible intentions are obtained by using the Session object to store the pages the user used before issuing the voice command.
S83: Obtain the confidence of the user's different possible intentions through intention confidence processing.
In this embodiment, the confidence of the different possible intentions is likewise recognized by the pre-trained DNN.
S84: Determine the action that should be performed through safety processing.
By recognizing the current state of the vehicle, it is determined whether the action to be performed is a safe one, thereby further determining the action that should be performed.
S85: Determine, according to the result of comprehensive judgment, whether to perform the corresponding action.
Whether the corresponding action should be performed is determined by comprehensively judging the results of the preceding steps.
In this embodiment, the user's basic information is judged; the user's possible intentions are obtained through Session processing; the confidence of the user's different possible intentions is obtained through intention confidence processing; the action that should be performed is determined through safety processing; and whether to perform the corresponding action is determined according to the result of comprehensive judgment, thereby completing the whole process from acquiring the voice command to executing the corresponding action.
Ninth Embodiment
This embodiment provides a technical solution of the in-vehicle voice command recognition apparatus. In this technical solution, the apparatus includes: an instruction acquisition module 91, a basic information determination module 92, an intention identification module 93, a confidence determination module 94, an intention determination module 95, and an action execution module 96.
The instruction acquisition module 91 is configured to acquire a voice command input by a user.
The basic information determination module 92 is configured to determine basic information of the user according to a pre-trained deep neural network (DNN) model.
The intention identification module 93 is configured to perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command.
The confidence determination module 94 is configured to determine the confidence level of each possible user intention according to the DNN model.
The intention determination module 95 is configured to determine the user's true intention from the possible user intentions according to the confidence.
The action execution module 96 is configured to perform a corresponding action according to the user's true intention.
Further, the basic information determination module 92 includes: a feature extraction unit and a DNN recognition unit.
The feature extraction unit is configured to extract speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency.
The DNN recognition unit is configured to use the speech feature parameters, the location, and the time as input parameters of the DNN, and determine the user's basic information according to the output parameters of the DNN, where the basic information includes at least one of the following: the user's gender, age, place of origin, and occupation.
Further, the intention identification module 93 includes: a first intention identification unit or a second intention identification unit.
The first intention identification unit is configured to acquire the pages that appeared within a time period of predetermined length before the user input the voice command, and judge the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command.
The second intention identification unit is configured to acquire a predetermined number of pages that appeared before the user input the voice command, and judge the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
Further, the confidence determination module 94 includes: an emotion evaluation unit and a confidence acquisition unit.
The emotion evaluation unit is configured to use the speech feature parameters of the voice command as input parameters, and evaluate the user's emotional state when inputting the voice command by using the DNN model.
The confidence acquisition unit is configured to obtain the confidence level of the user's possible intention according to the emotional state.
Further, the intention determination module 95 includes: a matching unit and a true intention acquisition unit.
The matching unit is configured to match the confidence against the confidence intervals corresponding to the possible user intentions.
The true intention acquisition unit is configured to take the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
Further, the action execution module 96 includes: a first action execution unit, a second action execution unit, and a third action execution unit.
The first action execution unit is configured to perform the action corresponding to the user's true intention when the execution condition of the user's true intention is established.
The second action execution unit is configured to terminate the execution of the action corresponding to the user's true intention and prompt the user when the execution condition of the user's true intention is not established.
The third action execution unit is configured to perform an action similar to the user's true intention when the execution condition of the user's true intention is uncertain.
The above in-vehicle voice command recognition apparatus can execute the in-vehicle voice command recognition method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method.
Those of ordinary skill in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the present invention is not limited to any specific combination of hardware and software.
Tenth Embodiment
One or more storage media containing computer-executable instructions which, when executed by a computer processor, perform an in-vehicle voice command recognition method, the method including the following steps:
acquiring a voice command input by a user;
determining basic information of the user according to a pre-trained deep neural network (DNN) model;
performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
determining a confidence level of each possible user intention according to the DNN model;
determining the user's true intention from the possible user intentions according to the confidence levels; and
performing a corresponding action according to the user's true intention.
When the above storage media execute the method, determining the basic information of the user according to the pre-trained deep neural network (DNN) model includes:
extracting speech feature parameters from the voice command, where the speech feature parameters include at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency; and
using the speech feature parameters, the location, and the time as input parameters of the DNN, and determining the user's basic information according to the output parameters of the DNN, where the basic information includes at least one of the following: the user's gender, age, place of origin, and occupation.
When the above storage media execute the method, performing content recognition on the voice command according to the basic information of the user and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command includes:
acquiring the pages that appeared within a time period of predetermined length before the user input the voice command, and judging the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command; or
acquiring a predetermined number of pages that appeared before the user input the voice command, and judging the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
When the above storage media execute the method, determining the confidence level of each possible user intention according to the DNN model includes:
using the speech feature parameters of the voice command as input parameters, and evaluating the user's emotional state when inputting the voice command by using the DNN model; and
obtaining the confidence level of the user's possible intention according to the emotional state.
When the above storage media execute the method, determining the user's true intention from the possible user intentions according to the confidence includes:
matching the confidence against the confidence intervals corresponding to the possible user intentions; and
taking the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
When the above storage media execute the method, performing the corresponding action according to the user's true intention includes:
if the execution condition of the user's true intention is established, performing the action corresponding to the user's true intention;
if the execution condition of the user's true intention is not established, terminating the execution of the action corresponding to the user's true intention and prompting the user; and
if the execution condition of the user's true intention is uncertain, performing an action similar to the user's true intention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus the necessary general-purpose hardware; it can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solutions of the present invention that in essence contributes to the prior art can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM), and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
It is worth noting that in the above embodiment of the in-vehicle voice command recognition apparatus, the units and modules included are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
The above are only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts among the embodiments may be referred to one another.

Claims (13)

  1. An in-vehicle voice command recognition method, comprising:
    acquiring a voice command input by a user;
    determining basic information of the user according to a pre-trained deep neural network (DNN) model;
    performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
    determining a confidence level of each possible user intention according to the DNN model;
    determining the user's true intention from the possible user intentions according to the confidence levels; and
    performing a corresponding action according to the user's true intention.
  2. The method according to claim 1, wherein determining the basic information of the user according to the pre-trained deep neural network (DNN) model comprises:
    extracting speech feature parameters from the voice command, wherein the speech feature parameters comprise at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency; and
    using the speech feature parameters, the location, and the time as input parameters of the DNN, and determining the basic information of the user according to the output parameters of the DNN, wherein the basic information comprises at least one of the following: the user's gender, age, place of origin, and occupation.
  3. The method according to claim 1, wherein performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command comprises:
    acquiring the pages that appeared within a time period of predetermined length before the user input the voice command, and judging the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command; or
    acquiring a predetermined number of pages that appeared before the user input the voice command, and judging the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
  4. The method according to claim 1, wherein determining the confidence level of each possible user intention according to the DNN model comprises:
    using the speech feature parameters of the voice command as input parameters, and evaluating the user's emotional state when inputting the voice command by using the DNN model; and
    obtaining the confidence level of the user's possible intention according to the emotional state.
  5. The method according to claim 1, wherein determining the user's true intention from the possible user intentions according to the confidence comprises:
    matching the confidence against the confidence intervals corresponding to the possible user intentions; and
    taking the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
  6. The method according to claim 1, wherein performing the corresponding action according to the user's true intention comprises:
    if the execution condition of the user's true intention is established, performing the action corresponding to the user's true intention;
    if the execution condition of the user's true intention is not established, terminating the execution of the action corresponding to the user's true intention and prompting the user; and
    if the execution condition of the user's true intention is uncertain, performing an action similar to the user's true intention.
  7. An in-vehicle voice command recognition apparatus, comprising:
    an instruction acquisition module configured to acquire a voice command input by a user;
    a basic information determination module configured to determine basic information of the user according to a pre-trained deep neural network (DNN) model;
    an intention identification module configured to perform content recognition on the voice command according to the basic information of the user, and determine at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
    a confidence determination module configured to determine a confidence level of each possible user intention according to the DNN model;
    an intention determination module configured to determine the user's true intention from the possible user intentions according to the confidence levels; and
    an action execution module configured to perform a corresponding action according to the user's true intention.
  8. The apparatus according to claim 7, wherein the basic information determination module comprises:
    a feature extraction unit configured to extract speech feature parameters from the voice command, wherein the speech feature parameters comprise at least one of the following: zero-crossing rate, short-time energy, cepstral coefficients, and fundamental frequency; and
    a DNN recognition unit configured to use the speech feature parameters, the location, and the time as input parameters of the DNN, and determine the basic information of the user according to the output parameters of the DNN, wherein the basic information comprises at least one of the following: the user's gender, age, place of origin, and occupation.
  9. The apparatus according to claim 7, wherein the intention identification module comprises:
    a first intention identification unit configured to acquire the pages that appeared within a time period of predetermined length before the user input the voice command, and judge the user's possible intention according to the pages that appeared within that period, the dwell time on each page, and the key recognition corpus in the voice command; or
    a second intention identification unit configured to acquire a predetermined number of pages that appeared before the user input the voice command, and judge the user's possible intention according to the predetermined number of pages, the dwell time on each page, and the key recognition corpus in the voice command.
  10. The apparatus according to claim 7, wherein the confidence determination module comprises:
    an emotion evaluation unit configured to use the speech feature parameters of the voice command as input parameters, and evaluate the user's emotional state when inputting the voice command by using the DNN model; and
    a confidence acquisition unit configured to obtain the confidence level of the user's possible intention according to the emotional state.
  11. The apparatus according to claim 7, wherein the intention determination module comprises:
    a matching unit configured to match the confidence against the confidence intervals corresponding to the possible user intentions; and
    a true intention acquisition unit configured to take the possible user intention corresponding to the best-matching confidence interval as the user's true intention.
  12. The apparatus according to claim 7, wherein the action execution module comprises:
    a first action execution unit configured to perform the action corresponding to the user's true intention when the execution condition of the user's true intention is established;
    a second action execution unit configured to terminate the execution of the action corresponding to the user's true intention and prompt the user when the execution condition of the user's true intention is not established; and
    a third action execution unit configured to perform an action similar to the user's true intention when the execution condition of the user's true intention is uncertain.
  13. One or more storage media containing computer-executable instructions which, when executed by a computer processor, perform an in-vehicle voice command recognition method, the method comprising the following steps:
    acquiring a voice command input by a user;
    determining basic information of the user according to a pre-trained deep neural network (DNN) model;
    performing content recognition on the voice command according to the basic information of the user, and determining at least one possible user intention according to the recognized content and the scene page context in which the user input the voice command;
    determining a confidence level of each possible user intention according to the DNN model;
    determining the user's true intention from the possible user intentions according to the confidence levels; and
    performing a corresponding action according to the user's true intention.
PCT/CN2015/095269 2015-07-02 2015-11-23 In-vehicle voice command recognition method, apparatus and storage medium WO2017000489A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US15/738,946 US10446150B2 (en) 2015-07-02 2015-11-23 In-vehicle voice command recognition method and apparatus, and storage medium
JP2017530131A JP6458149B2 (ja) 2015-07-02 2015-11-23 車載音声命令の認識方法、装置及び記憶媒体
KR1020177014756A KR101955958B1 (ko) 2015-07-02 2015-11-23 차량용 음성 명령어 인식 방법, 장치 및 저장 매체
EP15897016.0A EP3319081A4 (en) 2015-07-02 2015-11-23 On-board voice command identification method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510382215.9 2015-07-02
CN201510382215.9A CN105070288B (zh) 2015-07-02 2015-07-02 In-vehicle voice command recognition method and apparatus

Publications (1)

Publication Number Publication Date
WO2017000489A1 true WO2017000489A1 (zh) 2017-01-05

Family ID=54499641

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095269 WO2017000489A1 (zh) 2015-07-02 2015-11-23 In-vehicle voice command recognition method, apparatus and storage medium

Country Status (6)

Country Link
US (1) US10446150B2 (zh)
EP (1) EP3319081A4 (zh)
JP (1) JP6458149B2 (zh)
KR (1) KR101955958B1 (zh)
CN (1) CN105070288B (zh)
WO (1) WO2017000489A1 (zh)

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105070288B (zh) 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 In-vehicle voice command recognition method and apparatus
CN105376416A * 2015-12-04 2016-03-02 广东小天才科技有限公司 Control method and apparatus for a call terminal
CN106910513A * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotionally intelligent chat engine
CN105529030B * 2015-12-29 2020-03-03 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
CN106940998B * 2015-12-31 2021-04-16 阿里巴巴集团控股有限公司 Execution method and apparatus for a setting operation
CN105931642B * 2016-05-31 2020-11-10 北京京东尚科信息技术有限公司 Speech recognition method, device and system
CN106228989A * 2016-08-05 2016-12-14 易晓阳 Voice interaction recognition control method
CN106601231A * 2016-12-22 2017-04-26 深圳市元征科技股份有限公司 Vehicle control method and apparatus
US11100384B2 (en) 2017-02-14 2021-08-24 Microsoft Technology Licensing, Llc Intelligent device user interactions
US11010601B2 (en) 2017-02-14 2021-05-18 Microsoft Technology Licensing, Llc Intelligent assistant device communicating non-verbal cues
US10467510B2 (en) 2017-02-14 2019-11-05 Microsoft Technology Licensing, Llc Intelligent assistant
CN107424607B * 2017-07-04 2023-06-06 珠海格力电器股份有限公司 Voice control mode switching method and apparatus, and device having the apparatus
CN107316643B * 2017-07-04 2021-08-17 科大讯飞股份有限公司 Voice interaction method and apparatus
CN107464115A * 2017-07-20 2017-12-12 北京小米移动软件有限公司 Personal feature information verification method and apparatus
CN107507621B * 2017-07-28 2021-06-22 维沃移动通信有限公司 Noise suppression method and mobile terminal
CN107590123B * 2017-08-07 2022-07-05 大众问问(北京)信息科技有限公司 In-vehicle location context reference resolution method and apparatus
CN107945796B * 2017-11-13 2021-05-25 百度在线网络技术(北京)有限公司 Speech recognition method, apparatus, device and computer-readable medium
CN108564374A * 2018-04-12 2018-09-21 出门问问信息科技有限公司 Payment authentication method, apparatus, device and storage medium
CN108648752A * 2018-04-17 2018-10-12 重庆物奇科技有限公司 Cloud-processing-based intelligent voice control system and control method thereof
CN110390938A * 2018-04-20 2019-10-29 比亚迪股份有限公司 Voiceprint-based voice processing method, apparatus and terminal device
CN110019740B * 2018-05-23 2021-10-01 京东方科技集团股份有限公司 Interaction method for an in-vehicle terminal, in-vehicle terminal, server and storage medium
CN109263649B * 2018-08-21 2021-09-17 北京汽车股份有限公司 Vehicle, and object recognition method and object recognition system in its autonomous driving mode
CN110875038A * 2018-09-03 2020-03-10 蔚来汽车有限公司 Method for defining intention-behavior relationships and execution method for converting intentions into behaviors
KR20200042627A 2018-10-16 2020-04-24 삼성전자주식회사 Electronic device and control method thereof
CN109618204B * 2018-12-12 2021-04-23 百度在线网络技术(北京)有限公司 Multimedia resource playback method and apparatus
CN111508482A * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, apparatus, device and storage medium
KR102041617B1 * 2019-03-07 2019-11-27 주식회사 다이얼로그디자인에이전시 Method and apparatus for providing artificial-intelligence responses in a variety of styles
CN109948537A * 2019-03-19 2019-06-28 苏州宏裕千智能设备科技有限公司 In-vehicle device control method based on user intention recognition, and system thereof
CN113460070B * 2019-03-21 2022-12-16 百度在线网络技术(北京)有限公司 Vehicle control method and apparatus
KR102017229B1 * 2019-04-15 2019-09-02 미디어젠(주) Deep-learning-based automatic text sentence generation system for improving the unboundedness of utterance patterns
CN110276072B * 2019-06-10 2021-07-23 湖北亿咖通科技有限公司 Electronic device, storage medium and neural-network-based semantic intention recognition method
CN110400563A * 2019-07-18 2019-11-01 平安科技(深圳)有限公司 In-vehicle voice command recognition method, apparatus, computer device and storage medium
US11568239B2 (en) * 2019-08-13 2023-01-31 Lg Electronics Inc. Artificial intelligence server and method for providing information to user
CN110534093A * 2019-08-26 2019-12-03 河北微幼趣教育科技有限公司 Leave-request method based on speech recognition for young children, server and client
CN110534098A * 2019-10-09 2019-12-03 国家电网有限公司客户服务中心 Age-enhanced speech recognition enhancement method and apparatus
CN110648654A * 2019-10-09 2020-01-03 国家电网有限公司客户服务中心 Speech recognition enhancement method and apparatus incorporating language vectors
CN110853621B * 2019-10-09 2024-02-13 科大讯飞股份有限公司 Speech smoothing method, apparatus, electronic device and computer storage medium
CN110795532A * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method, apparatus, intelligent terminal and storage medium
US11676586B2 (en) * 2019-12-10 2023-06-13 Rovi Guides, Inc. Systems and methods for providing voice command recommendations
CN111081225B * 2019-12-31 2022-04-01 思必驰科技股份有限公司 Skill voice wake-up method and apparatus
CN111261196A * 2020-01-17 2020-06-09 厦门快商通科技股份有限公司 Age estimation method, apparatus and device
CN111210821A * 2020-02-07 2020-05-29 普强时代(珠海横琴)信息技术有限公司 Intelligent speech recognition system based on Internet applications
US11722324B2 (en) * 2020-03-11 2023-08-08 Pricewaterhousecoopers Llp Secure and accountable execution of robotic process automation
CN111737544A * 2020-05-13 2020-10-02 北京三快在线科技有限公司 Search intention recognition method, apparatus, electronic device and storage medium
CN111767021A * 2020-06-28 2020-10-13 广州小鹏车联网科技有限公司 Voice interaction method, vehicle, server, system and storage medium
KR102491119B1 * 2020-09-17 2023-01-25 주식회사 인텔로이드 Voice recognition apparatus for vehicles, vehicle problem-situation handling method using the same, and computer program
CN112489639A * 2020-11-26 2021-03-12 北京百度网讯科技有限公司 Audio signal processing method and apparatus, system, electronic device and readable medium
CN112466280B * 2020-12-01 2021-12-24 北京百度网讯科技有限公司 Voice interaction method, apparatus, electronic device and readable storage medium
DE102021129535A1 * 2021-11-12 2023-05-17 Ford Global Technologies, Llc System and method for controlling autonomously controllable vehicle functions of an autonomous vehicle cooperating with partner subjects, computer program product, computer-readable data carrier, and vehicle
CN114120972B * 2022-01-28 2022-04-12 科大讯飞华南有限公司 Scenario-based intelligent speech recognition method and system
CN115056746A * 2022-06-10 2022-09-16 浙江吉利控股集团有限公司 User intention recognition method, apparatus and device applied to a vehicle
CN115294976A * 2022-06-23 2022-11-04 中国第一汽车股份有限公司 Error-correction interaction method and system based on in-vehicle voice scenarios, and vehicle thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149922A * 2006-09-21 2008-03-26 株式会社东芝 Speech recognition apparatus and speech recognition method
CN102411931A * 2010-09-15 2012-04-11 微软公司 Deep belief network for large vocabulary continuous speech recognition
CN103024530A * 2012-12-18 2013-04-03 天津三星电子有限公司 Smart television voice response system and method
US20140163977A1 (en) * 2012-12-12 2014-06-12 Amazon Technologies, Inc. Speech model retrieval in distributed speech recognition systems
US20140222436A1 (en) * 2013-02-07 2014-08-07 Apple Inc. Voice trigger for a digital assistant
CN104021373A * 2014-05-27 2014-09-03 江苏大学 Semi-supervised speech feature variable factor decomposition method
US20140257803A1 (en) * 2013-03-06 2014-09-11 Microsoft Corporation Conservatively adapting a deep neural network in a recognition system
CN104751842A * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Deep neural network optimization method and system
CN105070288A * 2015-07-02 2015-11-18 百度在线网络技术(北京)有限公司 In-vehicle voice command recognition method and apparatus

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05157311A * 1991-12-03 1993-06-22 Matsushita Electric Ind Co Ltd Air conditioning control device
KR100775006B1 * 2005-11-30 2007-11-08 한국정보통신대학교 산학협력단 Terminal device providing context-aware mobile services and method therefor, and server system providing indoor mobile services in cooperation with the terminal device
KR100764174B1 * 2006-03-03 2007-10-08 삼성전자주식회사 Apparatus and method for voice dialogue service
WO2011059997A1 (en) * 2009-11-10 2011-05-19 Voicebox Technologies, Inc. System and method for providing a natural language content dedication service
JP2016508271A (ja) * 2013-01-04 2016-03-17 コピン コーポレーション 制御可能なヘッドセットコンピュータディスプレイ
US9666188B2 (en) * 2013-10-29 2017-05-30 Nuance Communications, Inc. System and method of performing automatic speech recognition using local private data
DE112014005354T5 * 2013-11-25 2016-08-04 Mitsubishi Electric Corporation Dialogue management system and dialogue management method
US9338493B2 (en) * 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
EP3335133A4 (en) * 2015-10-27 2018-07-04 Beijing Didi Infinity Technology and Development Co., Ltd. Systems and methods for delivering a message
US20170357521A1 (en) * 2016-06-13 2017-12-14 Microsoft Technology Licensing, Llc Virtual keyboard with intent-based, dynamically generated task icons
US20180046470A1 (en) * 2016-08-11 2018-02-15 Google Inc. Methods, systems, and media for presenting a user interface customized for a predicted user activity
US10176808B1 (en) * 2017-06-20 2019-01-08 Microsoft Technology Licensing, Llc Utilizing spoken cues to influence response rendering for virtual assistants

Also Published As

Publication number Publication date
KR20170078788A (ko) 2017-07-07
JP6458149B2 (ja) 2019-01-23
CN105070288A (zh) 2015-11-18
EP3319081A4 (en) 2018-07-04
KR101955958B1 (ko) 2019-03-08
JP2018503857A (ja) 2018-02-08
EP3319081A1 (en) 2018-05-09
US10446150B2 (en) 2019-10-15
US20180190283A1 (en) 2018-07-05
CN105070288B (zh) 2018-08-07

Similar Documents

Publication Publication Date Title
WO2017000489A1 (zh) In-vehicle voice command recognition method, apparatus and storage medium
CN109086329B (zh) Method and apparatus for conducting multi-turn dialogue guided by topic keywords
DE112020004504T5 (de) Account linking with a device
CN109920410B (zh) Apparatus and method for determining the reliability of a recommendation based on a vehicle environment
CN110085261A (zh) Pronunciation correction method, apparatus, device and computer-readable storage medium
CN105096941A (zh) Speech recognition method and apparatus
CN108255934A (zh) Voice control method and apparatus
CN109741734B (zh) Speech evaluation method, apparatus and readable medium
CN110222841A (zh) Neural network training method and apparatus based on a margin loss function
CN110490592A (zh) In-vehicle purchase payment method based on face recognition, and cloud server
US20220130389A1 (en) Contextual content for voice user interfaces
EP4285358A1 (en) Instantaneous learning in text-to-speech during dialog
CN116226344A (zh) Dialogue generation method, dialogue generation apparatus and storage medium
US11508372B1 (en) Natural language input routing
US11996081B2 (en) Visual responses to user inputs
CN111859008B (zh) Music recommendation method and terminal
CN110297617B (zh) Method and apparatus for initiating a proactive dialogue
CN110232928B (zh) Text-independent speaker verification method and apparatus
CN113724693B (zh) Speech discrimination method, apparatus, electronic device and storage medium
Giri et al. Enhancing Safety in Vehicles using Emotion Recognition with Artificial Intelligence
CN113573136A (zh) Video processing method, apparatus, computer device and storage medium
CN110516043A (zh) Answer generation method and apparatus for question-answering systems
US11908463B1 (en) Multi-session context
US20230178071A1 (en) Method for determining a vehicle domain and a speech recognition system for a vehicle
Khan et al. Robust Feature Extraction Techniques in Speech Recognition: A Comparative Analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15897016

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20177014756

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017530131

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2015897016

Country of ref document: EP