WO2023116523A1 - Voice interaction method and apparatus, server, and readable storage medium - Google Patents

Voice interaction method and apparatus, server, and readable storage medium

Info

Publication number
WO2023116523A1
WO2023116523A1 PCT/CN2022/138930 CN2022138930W WO2023116523A1 WO 2023116523 A1 WO2023116523 A1 WO 2023116523A1 CN 2022138930 W CN2022138930 W CN 2022138930W WO 2023116523 A1 WO2023116523 A1 WO 2023116523A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
voice
intent
intention
accuracy
Prior art date
Application number
PCT/CN2022/138930
Other languages
French (fr)
Chinese (zh)
Inventor
王亭玉
张天宇
宁洪珂
潘晓彤
赵恒艺
赵群
樊骏锋
Original Assignee
广州小鹏汽车科技有限公司
Priority date
Filing date
Publication date
Application filed by 广州小鹏汽车科技有限公司
Publication of WO2023116523A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00 Input parameters relating to occupants
    • B60W2540/21 Voice

Definitions

  • the present application relates to the field of voice technology, and in particular to a voice interaction method and apparatus, a server, and a readable storage medium.
  • the present application provides a voice interaction method and its device, server and readable storage medium.
  • the present application provides a voice interaction method.
  • the voice interaction method includes: performing voice recognition on a voice request for adjusting a preset vehicle function to obtain text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component; performing intent recognition on the text to be recognized using an intent recognition model; performing accuracy recognition on the text to be recognized using an accuracy recognition model; determining the target intent corresponding to the voice request according to the intent recognition result, and determining the target scale adjustment accuracy value corresponding to the voice request according to the accuracy recognition result; modifying a default value according to the target intent and the target scale adjustment accuracy value, where the default value is the adjustment value corresponding to the target intent in a preset voice request; and fusing the target intent with the modified default value to generate a control instruction that controls the corresponding vehicle component.
  • the voice interaction method of the present application first performs voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component.
  • it then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform accuracy recognition on it, identifies the target intent and the target scale adjustment accuracy value corresponding to the voice request, and modifies the default value accordingly. In this way, while remaining integrated with the traditional voice request logic, the method accurately adjusts the scale of the vehicle component corresponding to the voice request according to the user's simplified voice request, improving the user experience.
  • the voice interaction method includes: obtaining the intent recognition model by training on intent training data, where the intent training data relates to the vehicle components and the adjustable ranges of the vehicle components.
  • the voice interaction method of the present application can obtain an intent recognition model by training on the intent training data and perform intent recognition with that model, so as to accurately recognize the intent of the user instruction.
  • the voice interaction method includes: obtaining the accuracy recognition model by training on accuracy training data, where the accuracy training data relates to the vehicle components, the adjustable ranges of the vehicle components, and the scale adjustment accuracy ranges of the vehicle components.
  • in this way, the scale adjustment accuracy corresponding to the voice request can be determined.
  • the voice interaction method includes: determining the control range and non-control range of the vehicle components.
  • the functions that can be scale-adjusted through a vehicle component are identified, which determines the control range of the vehicle components, that is, the range that can be scale-adjusted by voice interaction.
  • the voice interaction method includes: determining a default adjustment range of each of the vehicle components.
  • the voice interaction method of the present application can determine the default adjustment range of each vehicle component, thereby laying a foundation for realizing precise adjustment of vehicle components.
  • the voice interaction method includes: determining the adjustable range of the vehicle component; and correcting the intention of the preset voice request according to the adjustable range of the vehicle component.
  • the voice interaction method of the present application can correct the intention of the preset voice request according to the adjustable range of the vehicle component after determining the adjustable range of the vehicle component, so as to achieve the purpose of real precise adjustment in user instructions.
  • the voice interaction method includes: mapping the control range and the adjustable range to preset intentions and corresponding preset scale adjustment accuracy values.
  • the voice interaction method of the present application can map the control range and adjustable range to preset intentions and corresponding preset scale adjustment accuracy values, so as to achieve precise adjustment of vehicle component accuracy.
  • the voice interaction method includes: establishing an intent-default value mapping table according to the preset intent and the default adjustment range.
  • the present application establishes a mapping table of intentions and default values, so that the intentions of the voice request can be in one-to-one correspondence with the default values, which facilitates subsequent modification of the default values.
  • modifying the default value according to the target intent and the target scale adjustment accuracy value includes: determining the default value according to the target intent and the intent-default value mapping table; and modifying the default value according to the target scale adjustment accuracy value.
  • the present application can determine the default value according to the target intention and the intention-default value mapping table, thereby modifying the default value according to the target scale adjustment accuracy value, so as to achieve the effect of correcting the user's intention to precisely adjust the scale of the vehicle parts.
  • mapping the control range and the adjustable range to a preset intent and a corresponding preset scale adjustment accuracy value includes: mapping each adjustable range in the control range to one preset intent, where each preset intent corresponds to a plurality of preset scale adjustment accuracy values.
  • the voice requests for different adjustment scales of the same vehicle component all correspond to the same preset intention, thereby laying the foundation for subsequent identification of the scale that the user intends to adjust.
  • mapping the control range and the adjustable range to the preset intent and the corresponding preset scale adjustment accuracy value includes: setting the simplified words as slots and performing slot extraction on the preset recognition text corresponding to the vehicle component to obtain repeated fields; counting the repetitions of the slot values of the repeated fields to obtain a repetition count; and mapping the repetition count, according to the adjustable range of the simplified words, to the preset scale adjustment accuracy value corresponding to the preset intent.
  • in this way, the repetitions of the slot values of the extracted repeated fields can be counted to obtain a repetition count, and the repetition count can be mapped to the preset intent and the preset scale adjustment accuracy value, so that the vehicle component is adjusted to the scale the user requires according to the simplified words.
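  • the slot extraction and repetition counting described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the English word list `SIMPLIFIED_WORDS`, the regular-expression tokenizer, and the choice to return `None` for counts outside the adjustable range are all assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical simplified words treated as slot values (not taken from the source).
SIMPLIFIED_WORDS = ("brighter", "darker", "louder", "quieter")


def count_simplified_words(text: str) -> Dict[str, int]:
    """Extract simplified-word slots from the text and count how often each repeats."""
    counts: Dict[str, int] = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in SIMPLIFIED_WORDS:
            counts[token] = counts.get(token, 0) + 1
    return counts


def to_accuracy_value(repetitions: int, adjustable_range: Tuple[int, int]) -> Optional[int]:
    """Map a repetition count to a preset accuracy value, or None if it is out of range."""
    low, high = adjustable_range
    return repetitions if low <= repetitions <= high else None


counts = count_simplified_words("screen brighter brighter brighter")
print(counts, to_accuracy_value(counts["brighter"], (1, 5)))
```

  • here the adjustable range (1, 5) matches the screen brightness example of 1 to 5 gears; how out-of-range counts are handled is left to the caller.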
  • determining the target intent corresponding to the voice request according to the intent recognition result includes: obtaining, from the intent recognition result, the intent discrimination probability corresponding to each preset intent; and determining one preset intent whose intent discrimination probability is greater than a first probability threshold as the target intent corresponding to the voice request.
  • the preset intents include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen darker, screen slide up, screen slide down, instrument panel brighter, instrument panel dimmer, ambient light brighter, ambient light dimmer, seat forward, seat backward, seat up, seat down, seat back forward, seat back backward, window up, and window down.
  • determining the target scale adjustment accuracy value corresponding to the voice request according to the accuracy recognition result includes: obtaining, from the accuracy recognition result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value; and determining one preset scale adjustment accuracy value whose accuracy discrimination probability is greater than a second probability threshold as the target scale adjustment accuracy value corresponding to the voice request.
  • the voice interaction method of the present application can obtain, from the accuracy recognition result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value, and determine the preset scale adjustment accuracy value whose accuracy discrimination probability is greater than the second probability threshold as the target scale adjustment accuracy value. This allows precise scale adjustment.
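  • the two threshold checks above can be sketched with one helper. This is an illustration under stated assumptions: the probabilities and thresholds are made-up values, and picking the highest-probability candidate when several exceed the threshold is an assumption, since the text only requires "one" candidate above the threshold.

```python
from typing import Dict, Hashable, Optional, TypeVar

Label = TypeVar("Label", bound=Hashable)


def pick_above_threshold(probabilities: Dict[Label, float], threshold: float) -> Optional[Label]:
    """Return the most probable label if its probability exceeds the threshold, else None."""
    if not probabilities:
        return None
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    return label if probability > threshold else None


# Hypothetical model outputs for one voice request.
intent_probs = {"system_volume_up": 0.91, "system_volume_down": 0.05}
accuracy_probs = {1: 0.10, 2: 0.20, 3: 0.70}

target_intent = pick_above_threshold(intent_probs, threshold=0.5)      # first probability threshold
target_accuracy = pick_above_threshold(accuracy_probs, threshold=0.6)  # second probability threshold
print(target_intent, target_accuracy)
```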
  • the present application also provides a voice interaction device.
  • the voice interaction device includes a voice recognition module, an intent recognition module, an accuracy recognition module, a determination module, a modification module and an instruction generation module.
  • the voice recognition module is used to perform voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized.
  • the preset function is a function that simulates scale adjustment performed by operating a vehicle component;
  • the accuracy recognition module is used to perform accuracy recognition on the text to be recognized using the accuracy recognition model;
  • the determination module is used to determine the target intent corresponding to the voice request according to the intent recognition result, and to determine the target scale adjustment accuracy value corresponding to the voice request according to the accuracy recognition result;
  • the modification module is used to modify the default value according to the target intent and the target scale adjustment accuracy value, where the default value is the adjustment value corresponding to the target intent in the preset voice request;
  • the instruction generation module is used to fuse the target intent with the modified default value to generate a control instruction that controls the corresponding vehicle component.
  • the voice interaction device of the present application first performs voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component.
  • it then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform accuracy recognition on it, identifies the target intent and the target scale adjustment accuracy value corresponding to the voice request, and modifies the default value accordingly. In this way, while remaining integrated with the traditional voice request logic, the device accurately adjusts the scale of the vehicle component corresponding to the voice request according to the user's simplified voice request, improving the user experience.
  • the application also provides a server.
  • the server includes a processor and a memory, and a computer program is stored in the memory. When the computer program is executed by the processor, the voice interaction method described in any one of the above implementations is realized.
  • the server of the present application first performs voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component.
  • it then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform accuracy recognition on it, identifies the target intent and the target scale adjustment accuracy value corresponding to the voice request, and modifies the default value accordingly. In this way, while remaining integrated with the traditional voice request logic, it accurately adjusts the scale of the vehicle component corresponding to the voice request according to the user's simplified voice request, improving the user experience.
  • the present application also provides a non-volatile computer-readable storage medium containing a computer program. When the computer program is executed by one or more processors, the voice interaction method described in any one of the above implementations is realized.
  • the computer-readable storage medium of the present application makes it possible to perform voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component.
  • the intent recognition model is then used to perform intent recognition on the text to be recognized and the accuracy recognition model is used to perform accuracy recognition on it, the target intent and the target scale adjustment accuracy value corresponding to the voice request are identified, and the default value is modified accordingly. In this way, while remaining integrated with the traditional voice request logic, the scale of the vehicle component corresponding to the voice request is accurately adjusted according to the user's simplified voice request, improving the user experience.
  • Fig. 1 is a first schematic flowchart of the voice interaction method of the present application.
  • Fig. 2 is a first schematic structural diagram of the voice interaction device of the present application.
  • Fig. 3 is a second schematic flowchart of the voice interaction method of the present application.
  • Fig. 4 is a second schematic structural diagram of the voice interaction device of the present application.
  • Fig. 5 is a third schematic flowchart of the voice interaction method of the present application.
  • Fig. 6 is a fourth schematic flowchart of the voice interaction method of the present application.
  • Fig. 7 is a fifth schematic flowchart of the voice interaction method of the present application.
  • Fig. 8 is a sixth schematic flowchart of the voice interaction method of the present application.
  • Fig. 9 is a third schematic structural diagram of the voice interaction device of the present application.
  • Fig. 10 is a seventh schematic flowchart of the voice interaction method of the present application.
  • Fig. 11 is a fourth schematic structural diagram of the voice interaction device of the present application.
  • Fig. 12 is an eighth schematic flowchart of the voice interaction method of the present application.
  • Fig. 13 is a first schematic structural diagram of the first determination module in the voice interaction device of the present application.
  • Fig. 14 is a ninth schematic flowchart of the voice interaction method of the present application.
  • Fig. 15 is a second schematic structural diagram of the first determination module in the voice interaction device of the present application.
  • Fig. 16 is a tenth schematic flowchart of the voice interaction method of the present application.
  • Fig. 17 is a schematic structural diagram of the modification module in the voice interaction device of the present application.
  • Fig. 18 is a schematic structural diagram of the server of the present application.
  • Fig. 19 is a schematic structural diagram of the computer-readable storage medium of the present application.
  • the voice interaction method includes:
  • the preset function refers to the function of simulating the scale adjustment of the operation of vehicle parts
  • the default value is the adjustment value corresponding to the target intent in the preset voice request
  • the voice interaction device 10 includes: a voice recognition module 11, an intent recognition module 12, an accuracy recognition module 13, a first determination module 14, a modification module 15 and an instruction generation module 16.
  • step 01 can be realized by the voice recognition module 11, step 02 by the intent recognition module 12, step 03 by the accuracy recognition module 13, step 04 by the first determination module 14, step 05 by the modification module 15, and step 06 by the instruction generation module 16.
  • that is to say, the voice recognition module 11 is used to perform voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component;
  • the accuracy recognition module 13 is used to perform accuracy recognition on the text to be recognized using the accuracy recognition model;
  • the first determination module 14 is used to determine the target intent corresponding to the voice request according to the intent recognition result, and to determine the target scale adjustment accuracy value corresponding to the voice request according to the accuracy recognition result;
  • the modification module 15 is used to modify the default value according to the target intent and the target scale adjustment accuracy value, where the default value is the adjustment value corresponding to the target intent in the preset voice request;
  • the instruction generation module 16 is used to fuse the target intent with the modified default value to generate a control instruction that controls the corresponding vehicle component.
  • the voice request for adjusting a preset vehicle function can be, for example, "the screen is bright", "the volume is louder", "the screen is brighter", "the air volume of the air conditioner is louder" or "rear behind the seat", that is, a voice request that uses simplified words.
  • the preset function is a function that simulates scale adjustment performed by operating a vehicle component, where the vehicle component may be a mechanical knob, a button or a similar component whose scale can be adjusted.
  • voice recognition is performed on the voice request using voice recognition technology, and the text to be recognized is obtained for subsequent processing, for example the text to be recognized "the screen is bright bright".
  • the user's intent and the desired adjustment accuracy can be determined from the text to be recognized through intent recognition and accuracy recognition.
  • the target intent and the target scale adjustment accuracy value corresponding to the voice request are determined according to the recognition results. For example, according to the intent recognition result for the voice request "bright screen", the corresponding target intent is determined to be increasing the display brightness of the in-vehicle screen, and the target scale adjustment accuracy value corresponding to the voice request "bright screen" is 3, indicating that the screen is to be brightened by 3 levels.
  • the default value is the adjustment value corresponding to the target intent in the preset voice request, as determined by the original logic.
  • the preset voice request may refer to user voice requests such as "volume up" and "volume down".
  • under the original logic, the target intent "volume up" increases the volume once; that is, the default value can correspond to the specific scale value of each adjustment, for example 3 small scales.
  • for a simplified voice request to increase the volume, the user's intent is "increase the volume"; if the user wants the volume adjusted three times and the default value of each adjustment is 3, the user actually wants to adjust the volume by 9 scales.
  • that is to say, under the accuracy logic that precisely recognizes simplified requests, the voice request "very louder" means that the user wants to increase the volume by 9 scales.
  • the scale adjustment accuracy value is 7, that is, the volume is adjusted 7 times, and the user ultimately wants to turn the volume up by 27 scales.
  • the target scale adjustment accuracy value identified by the accuracy logic is used to modify the default value of the traditional logic, so as to realize precise control of vehicle components under the joint action of the traditional logic and the accuracy logic.
  • the target intention corresponding to the user instruction "Volume up greatly” is to increase the volume.
  • the voice interaction method of the present application does not break the original implementation logic for non-precise voice requests at all. Within the traditional logic framework, it provides the function of controlling vehicle components and making precise adjustments according to voice requests that use simplified words.
  • the target intent and the modified default values are fused to generate control instructions to control the corresponding vehicle components.
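  • the modification and fusion steps can be sketched as follows. This is a hedged illustration: the `ControlInstruction` type and both function names are hypothetical, and multiplying the default value by the accuracy value is inferred from the worked example above (three adjustments of 3 scales giving 9 scales), not stated as a formula in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInstruction:
    """Hypothetical control instruction: which intent to execute and by how many scales."""
    intent: str
    scales: int


def modify_default(default_value: int, accuracy_value: int) -> int:
    """Scale the traditional per-adjustment default by the recognized accuracy value."""
    if default_value <= 0 or accuracy_value <= 0:
        raise ValueError("default_value and accuracy_value must be positive")
    return default_value * accuracy_value


def fuse(intent: str, default_value: int, accuracy_value: int) -> ControlInstruction:
    """Fuse the target intent with the modified default value into one instruction."""
    return ControlInstruction(intent=intent, scales=modify_default(default_value, accuracy_value))


# Default of 3 scales per adjustment, accuracy value 3.
print(fuse("system_volume_up", default_value=3, accuracy_value=3))
```

  • with an accuracy value of 1 the instruction reduces to the traditional default, which is one way to keep non-precise requests unchanged.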
  • the voice interaction method and device of the present application perform voice recognition on the voice request for adjusting a preset vehicle function to obtain the text to be recognized, where the preset function is a function that simulates scale adjustment performed by operating a vehicle component.
  • the intent recognition model is then used to perform intent recognition on the text to be recognized and the accuracy recognition model is used to perform accuracy recognition on it, the target intent and the target scale adjustment accuracy value corresponding to the voice request are identified, and the default value is modified accordingly. In this way, while remaining integrated with the traditional voice request logic, the scale of the vehicle component corresponding to the voice request is accurately adjusted according to the user's simplified voice request, improving the user experience.
  • voice interaction methods include:
  • the voice interaction device 10 further includes a second determination module 101.
  • step 001 can be implemented by the second determination module 101. That is, the second determination module 101 is used to determine the control range and non-control range of vehicle components.
  • the movement of the seat in various directions can be adjusted by vehicle components.
  • the car door does not have vehicle components such as knobs and buttons to achieve scale adjustment, but is usually only opened and closed through the door handle. Therefore, the seat adjustment belongs to the control range of vehicle components, while the door adjustment belongs to the non-control range of vehicle components.
  • the vehicle components on the vehicle that can be scale-adjusted are determined, such as the "volume knob", "screen brightness button", "air conditioning air volume knob/button" and "seat adjustment knob/button".
  • the control range of vehicle components may include: car audio, in-vehicle screens, vehicle air conditioners, vehicle seats, interior ambient lights, exterior lights, windows, etc.
  • the non-control range of vehicle components may include: doors, rearview mirrors, trunks, etc.
  • a voice prompt can be given when a voice request targets the non-control range of vehicle components.
  • the control range of vehicle components can be determined, that is, the control range that can be scaled through voice interaction.
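  • a minimal sketch of the control-range check follows. The component names are taken from the lists above, but the set representation, the function name `route_request` and the return strings are assumptions made for illustration.

```python
# Components whose scale can be adjusted by voice (from the control range listed above).
CONTROL_RANGE = {
    "car audio", "screen", "air conditioner", "seat",
    "ambient light", "exterior light", "window",
}
# Components without scale adjustment (from the non-control range listed above).
NON_CONTROL_RANGE = {"door", "rearview mirror", "trunk"}


def route_request(component: str) -> str:
    """Decide how to handle a voice request that targets the given component."""
    if component in CONTROL_RANGE:
        return "adjust"        # continue with intent and accuracy recognition
    if component in NON_CONTROL_RANGE:
        return "voice_prompt"  # tell the user that scale adjustment is not available
    return "unknown"           # component not covered by either range


print(route_request("seat"), route_request("door"))
```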
  • Voice interaction methods include:
  • step 002 can be implemented by the second determination module 101. That is, the second determination module 101 is used to determine the default adjustment range of each vehicle component.
  • for example, when a voice request simulates operating the vehicle component that controls the volume, the default value of each adjustment can be 3; if the volume adjustment component has 60 scales in total, the default adjustment range is 1 to 20.
  • Voice interaction methods include:
  • the voice interaction device 10 also includes a correction module 102.
  • step 003 can be implemented by the second determination module 101, and step 004 can be implemented by the correction module 102. That is, the second determination module 101 is used to determine the adjustable range of the vehicle component, and the correction module 102 is used to correct the intent of the preset voice request according to the adjustable range of the vehicle component.
  • an adjustable range needs to be determined for each vehicle component in the control range.
  • the adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component.
  • the adjustable range can be gear position or range. For example, if the screen brightness button is continuously pressed 5 times in total, and the screen brightness is sequentially adjusted from 1 to 5 gears to the maximum brightness, then the adjustable range of the screen brightness button is 1 to 5 gears. As another example, if the total scale value of the knob for adjusting the seat forward and backward is 90, then the adjustable range of the seat adjustment knob is scale value 1-90.
  • step 003 includes:
  • step 0031 can be implemented by the second determination module 101. That is, the second determination module 101 is used to determine the adjustable range of the vehicle component corresponding to each simplified word.
  • a simplified word is a shortened word that the user is accustomed to using and that can accurately represent the degree of adjustment.
  • reduplicated words can be used as simplified words, so that the user only needs to speak a simplified voice request.
  • the brightness adjustment of the in-vehicle display can be simplified as "the screen is bright", "the screen is bright", "the screen is dark" and "the screen is dark".
  • the volume adjustment of the car audio can be simplified as "the volume is louder", "volume is very large", "volume is small" and "volume is small"...
  • the air volume adjustment of the air conditioner can be simplified as "air volume is large", "air volume is large and large", "air volume is small" and "air volume is small"...
  • the simplified words can be repeated words that users are used to, such as "brighter", "darker", "bigger" and "smaller", as in "the screen is darker and darker", "the volume is louder and louder" and "the volume is lower and smaller"; this is not specifically limited here.
  • the adjustable range corresponding to the simplified words can be determined according to the adjustable range of the vehicle component. For example, when adjusting an in-vehicle screen whose brightness has an adjustable range of 1 to 5 gears, up to 5 simplified words can be recognized in each brightness-related voice request during voice recognition, so the adjustable range of the simplified words can be 1 to 5. When the voice request includes multiple simplified words, each simplified word adjusts the brightness of the screen by 1 gear.
  • when adjusting the car audio, the volume can be adjusted using the simplified words "larger", "bigger", "smaller" or "smaller".
  • the total adjustment range of the volume is 60 scales.
  • up to 10 simplified words can be recognized in a volume-related voice request.
  • the adjustable range of the simplified words can therefore be 1 to 10, and each simplified word adjusts the volume of the car audio by 3 scales. If voice recognition detects a voice request with more than 10 simplified words, the volume can be set directly to the maximum or minimum.
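  • the volume rule above can be sketched as follows, using the numbers from the text (60 scales in total, 3 scales per simplified word, at most 10 words). The function name, the `louder` flag, and the assumption that the minimum volume is scale 0 are hypothetical.

```python
def apply_volume_request(current: int, word_count: int, louder: bool,
                         step: int = 3, max_words: int = 10, total: int = 60) -> int:
    """Return the new volume scale after a request containing `word_count` simplified words."""
    if word_count > max_words:
        # More simplified words than the adjustable range: jump straight to the limit.
        return total if louder else 0
    delta = step * word_count
    target = current + delta if louder else current - delta
    # Keep the result inside the physical scale range (lower bound 0 is an assumption).
    return max(0, min(total, target))


print(apply_volume_request(20, 2, louder=True))    # two words, 3 scales each
print(apply_volume_request(20, 11, louder=False))  # over the limit, go to minimum
```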
  • Voice interaction methods include:
  • the voice interaction device 10 also includes a mapping module 103.
  • step 005 can be implemented by the mapping module 103. That is to say, the mapping module 103 is used to map the control range and the adjustable range to preset intents and corresponding preset scale adjustment accuracy values.
  • control range of vehicle components and the adjustable range of each vehicle component are mapped to the intent system that the intent recognition model can understand.
  • a corresponding preset intention is formulated for the objects in the control range of the vehicle component and the corresponding adjustable range of the vehicle component.
  • system_volume_up represents the preset intent "volume up"
  • system_volume_down represents the preset intent "volume down". In this way, a specific intent mapping system is formulated for the control range of the vehicle components and the adjustable range of each vehicle component.
  • as for the preset scale adjustment accuracy: for example, if the volume is adjusted by 3 scales at a time when voice interaction simulates operating the vehicle component, and the total scale is 60, then the preset scale adjustment accuracy can range from 1 to 20.
  • similarly, if the seat is adjusted by 18 scales at a time when voice interaction simulates operating the vehicle component, and the total scale is 90, then the preset scale adjustment accuracy ranges from 1 to 5.
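The two ranges above follow from dividing the total scale by the per-adjustment step. A minimal sketch of that arithmetic, with a hypothetical function name and the assumption that any remainder is simply dropped:

```python
def accuracy_value_range(total_scale: int, step: int) -> range:
    """Preset scale adjustment accuracy values for one vehicle component.

    total_scale: total number of scales (e.g. 60 for volume, 90 for the seat).
    step: scales moved per adjustment (e.g. 3 for volume, 18 for the seat).
    """
    if total_scale <= 0 or step <= 0:
        raise ValueError("total_scale and step must be positive")
    # Assumption: a partial final step is ignored (floor division).
    return range(1, total_scale // step + 1)


volume_values = accuracy_value_range(60, 3)   # values 1 through 20
seat_values = accuracy_value_range(90, 18)    # values 1 through 5
```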
  • step 005 includes:
  • Step 0051 can be implemented by the mapping module 103 . That is to say, the mapping module 103 is configured to establish an intent-default value mapping table according to the preset intent and the default adjustment range.
  • a mapping table of intent and default value can be established for use in the online process and for downstream operations.
  • for example, if a voice request that simulates operating the vehicle part adjusts the volume of the car audio by 3 scales each time (the default value is 3), then, under the accuracy requirement, the preset intents corresponding to the volume are system_volume_up and system_volume_down.
  • the intent and default value mapping table established by adjusting the volume of the vehicle components of the car audio can be:
  • the intent and default value mapping table established by the vehicle parts of the vehicle air conditioner to adjust the air volume of the air conditioner is:
  • the voice interaction method of this application also covers multiple other preset intents, such as screen brightness adjustment and vehicle seat height and front/back adjustment.
  • the mapping between these preset intents and their default values can be determined in the same way, and the mappings are stored in a database to be loaded and read by the online process.
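An intent and default value mapping table of this kind can be sketched as a simple dictionary. Only the two volume intents and their default value of 3 come from the text; the JSON serialization stands in for the unspecified database, and the function names are hypothetical.

```python
import json
from typing import Dict, Optional

# Entries taken from the volume example; other intents would be added similarly.
INTENT_DEFAULTS: Dict[str, int] = {
    "system_volume_up": 3,
    "system_volume_down": 3,
}


def save_table(table: Dict[str, int]) -> str:
    """Serialize the table offline (stand-in for writing to a database)."""
    return json.dumps(table, sort_keys=True)


def load_table(serialized: str) -> Dict[str, int]:
    """Load the table in the online process."""
    return {str(k): int(v) for k, v in json.loads(serialized).items()}


def default_value_for(intent: str, table: Dict[str, int]) -> Optional[int]:
    """Look up the default adjustment value; None if the intent is unknown."""
    return table.get(intent)
```

The online process would call `load_table` once at start-up and then use `default_value_for` for each recognized target intent.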
  • step 005 includes:
  • Step 0052 can be implemented by the mapping module 103 . That is, the mapping module 103 is used to map the adjustable range of each vehicle component within the control range to a preset intention, and each preset intention corresponds to multiple preset scale adjustment accuracy values.
  • the adjustable range of each vehicle component includes multiple gear positions or multiple scale values.
  • for example, the adjustable range of the air conditioner's air volume button includes 5 gears, and the voice requests for increasing the air volume can include 5 levels, from "air volume big" (one "big") to "air volume big big big big big" (five).
  • all of these repetitions of "big" map to the same preset intent, namely increasing the air volume.
  • One preset intent corresponds to multiple preset scale adjustment accuracy values.
  • for example, the preset intent "increase the volume of the car audio" can correspond to 20 preset scale adjustment accuracy values.
  • the adjustable range of the volume knob is 60, that is, the total scale for adjusting the volume is 60, and each preset scale adjustment accuracy value corresponds to 3 scales, so each step of the preset scale adjustment accuracy value represents an adjustment of 3 scales.
  • the 20 preset scale adjustment accuracy values are as follows: increase the volume by 3 scales, for which the corresponding voice request is "volume big"; increase the volume by 6 scales, for which the corresponding voice request is "volume big big"; increase the volume by 9 scales, for which the corresponding voice request is "volume big big big"; and so on.
  • different user phrasings can be collected for the same preset intent. For a request such as "the volume is very loud", users may vary the wording freely, for example "volume increase", "volume rises" or "volume high high", and the intent recognized for all of these variants is to increase the volume.
  • step 005 includes:
  • the mapping module 103 includes an extraction unit 1033 , a statistical unit 1034 and a mapping unit 1035 .
  • Step 0053 can be implemented by the extracting unit 1033
  • step 0054 can be implemented by the statistical unit 1034
  • step 0055 can be implemented by the mapping unit 1035 . That is to say, the extraction unit 1033 is used to set the simplified word as a slot and extract the slot from the preset recognition text corresponding to the vehicle part to obtain a repeated field; the statistical unit 1034 is used to count the repetitions of the slot value in the repeated field to obtain the number of repetitions; and the mapping unit 1035 is used to map the number of repetitions to the preset scale adjustment accuracy value corresponding to the preset intent according to the adjustable range of the simplified word.
  • the number of repetitions of the shortened words may represent the number of calibration adjustments to the vehicle components. Therefore, the shortened words can be set as slots.
  • for example, the adjustable range of the simplified words for the volume knob is 1 to 10, and the preset scale adjustment accuracy corresponding to the volume knob ranges from 1 to 20.
  • for example, the preset recognition text corresponding to the voice request is "Volume greatly greatly"
  • the slot values of the extracted repeated field are counted, and the number of repetitions is mapped to the corresponding preset scale adjustment accuracy value.
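The extraction and counting steps can be illustrated with a regular expression. This is a rule-based sketch, not the trained slot-extraction model the application describes; the function names are hypothetical, and the identity mapping from repetition count to accuracy value (limited to the simplified-word range) is an assumption, since the text does not spell out the mapping.

```python
import re
from typing import Optional


def count_repeats(text: str, simplified_word: str) -> int:
    """Length of the longest consecutive run of simplified_word in text."""
    pattern = re.compile(f"(?:{re.escape(simplified_word)})+")
    runs = [len(m.group(0)) // len(simplified_word)
            for m in pattern.finditer(text)]
    return max(runs, default=0)


def to_accuracy_value(repeats: int, word_range_max: int = 10) -> Optional[int]:
    """Map a repetition count to a preset scale adjustment accuracy value.

    Assumption: one repetition equals one accuracy step, capped at the
    adjustable range of the simplified word. Returns None if no slot was found.
    """
    if repeats < 1:
        return None
    return min(repeats, word_range_max)


# "音量大大大" is the recognized text for "volume" followed by "大" ("big") x3.
repeats = count_repeats("音量大大大", "大")
accuracy_value = to_accuracy_value(repeats)
```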
  • voice interaction methods include:
  • the intention recognition model is obtained through training the intention training data, and the intention training data is related to the vehicle parts and the adjustable range of the vehicle parts.
  • the voice interaction device 10 includes an intention training module 104 .
  • Step 006 can be implemented by the intention training module 104, that is, the intention training module 104 is used to train the intention recognition model through intention training data, and the intention training data is related to the vehicle components and the adjustable range of the vehicle components.
  • the intent recognition model is obtained by machine learning on training data corresponding to the scale-adjustable vehicle parts and their adjustable ranges, and is then used to perform intent recognition on voice requests, so that user intents are recognized accurately.
  • the intent training data is related to the scale-adjustable vehicle parts and their adjustable ranges.
  • Vehicle parts refer to the parts that can be adjusted on the smart car, such as: “volume knob”, “screen brightness button”, “air conditioning air volume knob/button”, “seat adjustment knob/button” and so on.
  • the adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component.
  • the adjustable range can be gear position or range.
  • the intent training data can be obtained by collecting a certain amount of historical user voice requests, subject to the relevant user permissions, and applying a simple filter to the collected requests to obtain voice requests with clear semantics and a specific purpose. Specifically, the filter removes voice requests whose semantics are obviously unclear, as well as short requests that contain only modal particles such as "ah" and "oh", and keeps the voice requests with clear semantics and a specific purpose.
  • the remaining voice requests are then labeled with intents. For example, if the voice request is "brighten the screen",
  • the corresponding intent can be labeled as "brighten the screen".
  • a quality inspection is then performed on the labeled data to filter out labeled data that does not match any preset intent, leaving the labeled data that can be used for training the intent model.
  • for example, if the voice request is "open the car door",
  • the corresponding labeled intent is "open the car door",
  • but the car door is not adjusted by a scale-adjustable part,
  • so this voice request can be removed by filtering.
  • the labeled data that can be used for intent model training is used as the intent training data and divided into an intent training set and an intent verification set.
  • the division ratio can be set according to requirements, and is not limited here.
  • the intention training set is 80%
  • the intention verification set is 20%.
  • Model training can use models such as BERT, ALBERT, XLNet, and RoBERTa.
  • for the established intent recognition model, at least part of the data in the intent training set is used to train the model, and then at least part of the data in the intent verification set is used to verify the accuracy of the trained model.
  • if the intent verification accuracy does not reach the intent accuracy threshold, the model is trained again with another part of the intent training set,
  • and its accuracy is verified again with another part of the intent verification set. Training and verification are repeated until the intent verification accuracy reaches the intent accuracy threshold, at which point the intent recognition model is considered to have reached the standard and its training is complete.
  • each item of data in the intent training set and intent verification set is used only once. If the intent recognition model still fails to reach the training standard after all the data in both sets has been used, more voice requests can be collected, again with the user's permission, and filtered and labeled to obtain more intent training data for training the model, so as to ensure that the intent recognition model can accurately recognize the intent corresponding to an input voice request.
  • the above intent recognition model can be trained offline, and after the offline trained intent recognition model is deployed to the server or vehicle, the server or vehicle can use the intent recognition model to perform intent recognition on the received voice request.
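The split and the repeated train-and-verify procedure described above can be outlined as follows. This is a sketch under assumptions: `train_step` and `evaluate` are hypothetical callables standing in for whichever model (for example a BERT-style classifier) is actually trained, and the number of rounds is arbitrary. Each data item is consumed once, matching the description.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle and split labeled data into training and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def chunks(items: List[T], n_chunks: int) -> List[List[T]]:
    """Cut a list into at most n_chunks consecutive parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def train_until_threshold(train_set: List[T], val_set: List[T],
                          train_step: Callable[[List[T]], None],
                          evaluate: Callable[[List[T]], float],
                          threshold: float, rounds: int = 4) -> bool:
    """Train on one part, verify on one part, repeat until the threshold is met.

    Returns False if all data has been used without reaching the threshold,
    in which case more voice requests would need to be collected and labeled.
    """
    for train_part, val_part in zip(chunks(train_set, rounds),
                                    chunks(val_set, rounds)):
        train_step(train_part)
        if evaluate(val_part) >= threshold:
            return True
    return False
```

The same outline would apply to the accuracy recognition model, with its own training data and threshold.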
  • Voice interaction methods include:
  • the accuracy recognition model is obtained through the training of the accuracy training data.
  • the accuracy training data is related to the vehicle parts, the adjustable range of the vehicle parts, and the scale adjustment accuracy range of the vehicle parts.
  • the speech interaction device 10 includes an accuracy training module 105 .
  • Step 007 can be implemented by the accuracy training module 105 . That is to say, the accuracy training module 105 is used to obtain an accuracy recognition model through training on accuracy training data, and the accuracy training data is related to the vehicle parts, the adjustable range of the vehicle parts, and the scale adjustment accuracy range of the vehicle parts.
  • the application uses machine learning on training data corresponding to the scale-adjustable vehicle parts, the adjustable range of the vehicle parts, and the scale adjustment accuracy range of the parts to obtain an accuracy recognition model, and then performs accuracy recognition on voice requests, so that the scale adjustment accuracy requested by the user is accurately identified.
  • the accuracy training data is related to the scale-adjustable vehicle parts and their adjustable ranges; that is, the accuracy training data covers all the scale-adjustable vehicle parts in the vehicle, such as the "volume knob", "screen brightness button", "air conditioner air volume knob/button", "seat adjustment knob/button", etc.
  • the adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component.
  • the adjustable range can be gear position or range
  • the scale adjustment accuracy range can be the scale value of each adjustment.
  • the accuracy training data can be obtained by collecting a certain amount of historical user voice requests, subject to the relevant user permissions, and applying a simple filter to obtain voice requests with clear semantics and a specific purpose.
  • the filter removes voice requests whose semantics are obviously unclear, as well as short requests that contain only modal particles such as "ah" and "oh", leaving voice requests with clear semantics and a specific purpose.
  • the history of user voice requests collected for accuracy training can be the same as that collected for intent training, and the step of filtering the collected user voice requests for accuracy training
  • can be the same as the corresponding filtering step for intent training.
  • for example, the corresponding labeled scale adjustment accuracy value for adjusting the in-vehicle screen brightness is 3.
  • an accuracy recognition model is established based on slot extraction. Algorithms that can be used for slot extraction include RNN slot filling, CRF, etc. The labeled data is used as the accuracy training data and divided into an accuracy training set and an accuracy verification set.
  • the division ratio can be set according to requirements and is not limited here. For example, the accuracy training set is 80% and the accuracy verification set is 20%.
  • for the established accuracy recognition model, at least part of the data in the accuracy training set is used to train the model, and then at least part of the data in the accuracy verification set is used to verify the accuracy of the trained model.
  • if the verification accuracy does not reach the accuracy threshold, the accuracy recognition model is trained again with at least another part of the accuracy training set, and the retrained model
  • is verified again with another part of the accuracy verification set. Training and verification are repeated until the verification accuracy reaches the accuracy threshold, at which point the accuracy recognition model is considered to have reached the standard and its training is complete.
  • each item of data in the accuracy training set and accuracy verification set is used only once.
  • if the accuracy recognition model still fails to meet the training standard after all the data in both sets has been used, more voice requests can be collected, again with the user's permission, and filtered and labeled to obtain more accuracy training data for training the model, so as to ensure that the accuracy recognition model can accurately recognize the scale adjustment accuracy corresponding to an input voice request.
  • the precision recognition model can be pre-trained through the precision training data to perform precision recognition on the text to be recognized, thereby identifying the adjustment precision of a certain vehicle component, obtaining the precision recognition result, and finally determining the target scale adjustment precision value.
  • step 04 includes:
  • the first determination module 14 may further include a first acquisition unit 141 and an intention determination unit 142 .
  • Step 041 can be implemented by the first acquiring unit 141
  • step 042 can be implemented by the intention determining unit 142 . That is, the first acquisition unit 141 is used to obtain, from the intent recognition result, the intent discrimination probability corresponding to each preset intent; the intention determining unit 142 is used to determine a preset intent whose intent discrimination probability is greater than the first probability threshold as the target intent corresponding to the voice request.
  • the intent recognition result includes the probability that the text to be recognized matches each preset intent, so multiple intent discrimination probabilities are obtained. For example, if the first probability threshold is 0.9 and the intent discrimination probability of a certain preset intent exceeds 0.9, the server takes that preset intent as the target intent of the current user's voice request.
  • the first probability threshold may also be other values.
  • the first probability threshold may be a default value, or may be set according to user needs, and no limitation is set here.
  • the preset intentions of this application may include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen darker, screen slide up, screen slide down, gauge brighter, gauge dimmer, ambient light brighter, ambient light dimmer, seat forward, seat rearward, seat up, seat down, seat back forward, seat back rearward, window up, and window down.
  • Step 04 also includes:
  • Step 043 can be realized by the intention determination unit 142, that is, the intention determination unit 142 is used to determine that the intent of the voice request is a non-scale adjustment intent when the intent discrimination probability of every preset intent is not greater than the first probability threshold.
  • that is, when the discrimination probabilities of all the preset intent categories are not greater than the first probability threshold, the probability that the intent recognition result for the user's voice request matches any of the preset intents is relatively low, not exceeding
  • the first probability threshold (for example, 0.9), and the intent of the voice request is determined to be a non-scale adjustment intent.
  • a non-scale adjustment intent is a user intent that does not involve adjusting a preset function of the vehicle through a scale-adjustable vehicle part. For example, if the voice request input by the user is "open the door", then, because the door cannot be adjusted by a scale-adjustable vehicle part, the voice request "open the door" is a non-scale adjustment intent.
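The threshold decision in steps 041 to 043 can be sketched as below. The function name and the `NON_SCALE_ADJUSTMENT` label are hypothetical; the 0.9 default is the example threshold from the text, and the strict "greater than" comparison follows the wording of step 042.

```python
from typing import Mapping

# Hypothetical label for requests that match no preset intent.
NON_SCALE_ADJUSTMENT = "non_scale_adjustment"


def pick_target_intent(probabilities: Mapping[str, float],
                       first_threshold: float = 0.9) -> str:
    """Return the preset intent whose probability exceeds the threshold.

    If no preset intent is above the threshold, the request is treated as a
    non-scale adjustment intent (for example "open the door").
    """
    if not probabilities:
        return NON_SCALE_ADJUSTMENT
    best = max(probabilities, key=probabilities.get)
    if probabilities[best] > first_threshold:
        return best
    return NON_SCALE_ADJUSTMENT
```

With probabilities such as `{"system_volume_up": 0.97, "system_volume_down": 0.02}` this sketch selects `system_volume_up`; when every probability is 0.9 or lower it falls back to the non-scale adjustment label.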
  • step 04 also includes:
  • the first determination module 14 includes a second acquisition unit 143 and an accuracy determination unit 144 .
  • Step 044 can be implemented by the second acquiring unit 143
  • step 045 can be implemented by the accuracy determining unit 144 . That is to say, the second acquisition unit 143 is used to obtain, from the accuracy recognition result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value; the accuracy determining unit 144 is used to determine a preset scale adjustment accuracy value whose accuracy discrimination probability is greater than the second probability threshold as the target scale adjustment accuracy value corresponding to the voice request.
  • the accuracy discrimination probability refers to the probability that the accuracy of recognizing the voice request matches the adjustment accuracy value of each preset scale.
  • the second probability threshold may be, for example, 0.7, 0.8, 0.9 or other numerical values, which are not limited here.
  • for example, if the accuracy discrimination probability is 1 and the second probability threshold is 0.9, the accuracy discrimination probability exceeds the second probability threshold, and the target scale adjustment accuracy value for the volume adjustment corresponding to the voice request "Volume is louder is louder" is determined to be 5.
  • Step 04 also includes:
  • Step 046 can be implemented by the accuracy determination unit 144 . That is to say, the accuracy determining unit 144 is configured to determine that the accuracy of the speech request is incorrectly recognized when the accuracy discrimination probabilities of each preset scale adjustment accuracy value are not greater than the second probability threshold.
  • step 05 includes:
  • the modifying module 15 includes a default value determining unit 151 and a modifying unit 152 .
  • Step 051 can be implemented by the default value determining unit 151
  • step 052 can be implemented by the modifying unit 152 . That is, the default value determination unit 151 is used to determine the default value according to the target intention and the mapping table between the intention and the default value; the modification unit 152 is used to modify the default value according to the target scale adjustment precision value.
  • the default value is determined according to the target intent and the intent-default value mapping table. For example, if the target intent of the user's voice request "Volume up" is to increase the volume, then according to the intent-default value mapping table the default value can be 3; that is, when a voice request simulates operating the vehicle part to adjust the volume, the volume is adjusted by 3 scales each time.
  • the scale value corresponding to the user's voice request "Volume is louder” is 9.
  • control instructions are generated according to the target intention and the modified default value.
  • the effect of accurately adjusting the scale of the vehicle parts corresponding to the voice request according to the user's streamlined voice request is realized.
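Steps 044 to 052 and the final fusion can be sketched end to end. Several details here are assumptions rather than statements from the application: the modified default is computed as the default value multiplied by the target scale adjustment accuracy value (which reproduces the 3 × 3 = 9 volume example), the control instruction is shown as a plain dictionary, and all function and key names are hypothetical.

```python
from typing import Dict, Mapping, Optional

# Intent and default value mapping table (volume entries from the example).
INTENT_DEFAULTS: Dict[str, int] = {
    "system_volume_up": 3,
    "system_volume_down": 3,
}


def pick_accuracy_value(probabilities: Mapping[int, float],
                        second_threshold: float = 0.9) -> Optional[int]:
    """Return the accuracy value whose probability exceeds the threshold.

    None signals that accuracy recognition did not yield a usable value.
    """
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > second_threshold else None


def build_control_instruction(target_intent: str, accuracy_value: int,
                              defaults: Mapping[str, int] = INTENT_DEFAULTS
                              ) -> Dict[str, object]:
    """Modify the default value and fuse it with the target intent."""
    default_value = defaults[target_intent]
    # Assumption: each accuracy step applies the default adjustment once.
    modified_value = default_value * accuracy_value
    return {"intent": target_intent, "value": modified_value}


accuracy = pick_accuracy_value({3: 1.0, 2: 0.0})
instruction = build_control_instruction("system_volume_up", accuracy)
```

Under these assumptions, an accuracy value of 3 with the default of 3 yields an instruction to raise the volume by 9 scales.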
  • the present application also provides a server 20 .
  • the server 20 includes a processor 21 and a memory 22.
  • a computer program 221 is stored on the memory 22.
  • when the computer program 221 is executed by the processor 21, the voice interaction method described in any one of the above embodiments is implemented.
  • the server 20 of the present application can perform voice recognition on the voice request for adjusting the preset function of the vehicle to obtain the text to be recognized.
  • the preset function refers to a function that simulates operating a vehicle part to perform scale adjustment. The server then uses the intent recognition model to perform intent recognition on the text to be recognized and the accuracy recognition model to perform accuracy recognition on it, identifies the target intent and the target scale adjustment accuracy value corresponding to the voice request, and modifies the default value accordingly. In this way, while incorporating the conventional voice-request logic, the scale of the vehicle part corresponding to the voice request is adjusted accurately according to the user's simplified voice request, and the user experience is improved.
  • the present application also provides a non-volatile computer-readable storage medium 30 containing a computer program.
  • when the computer program 31 is executed by one or more processors 40, the voice interaction method of any of the above embodiments is implemented.
  • the preset function refers to the function of simulating the scale adjustment of the operation of vehicle parts
  • the computer program 31 includes computer program codes.
  • the computer program code may be in source code form, object code form, executable file or some intermediate form, etc.
  • the computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), software distribution media, etc.
  • the computer-readable storage medium 30 of the present application can first perform voice recognition on the voice request for vehicle preset function adjustment to obtain the text to be recognized.
  • the preset function refers to the function of simulating the scale adjustment of the operation of the vehicle parts.
  • the intent recognition model is then used to perform intent recognition on the text to be recognized, and the accuracy recognition model to perform accuracy recognition on it; the target intent and the target scale adjustment accuracy value corresponding to the voice request are identified, and the default value is modified accordingly, so that, while incorporating the conventional voice-request logic,
  • the scale of the vehicle part corresponding to the voice request is adjusted accurately according to the user's simplified voice request, and the user experience is improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses a voice interaction method and apparatus, a server, and a readable storage medium. The voice interaction method comprises: performing voice recognition on a voice request for adjusting a preset function of a vehicle to obtain text to be recognized, the preset function referring to a function of simulating an operation on a vehicle part for scale adjustment; using an intention recognition model to perform intention recognition on said text; using a precision recognition model to perform precision recognition on said text; determining, according to the intention recognition result and the precision recognition result, a target intention and a target scale adjustment precision value corresponding to the voice request; modifying a default value according to the target intention and the target scale adjustment precision value, the default value being an adjustment value corresponding to the target intention in a preset voice request; and fusing the target intention and the modified default value to generate a control instruction so as to control the corresponding vehicle part. In the present application, the scale of the vehicle part corresponding to the voice request can be accurately adjusted according to the simplified voice request of a user, and user experience is improved.

Description

Voice interaction method and apparatus, server, and readable storage medium
This application claims priority to Chinese patent application No. 202111593401.9, entitled "Voice interaction method and apparatus, server, and readable storage medium", filed with the China National Intellectual Property Administration on December 24, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of voice technology, and in particular to a voice interaction method and apparatus, a server, and a readable storage medium.
Background
In current smart-car scenarios, voice interaction allows users to control vehicle parts and devices.
When users want to issue simplified instructions, the logic of current technical solutions handles the voice request "volume big big big" (the word "big" repeated three times) by recognizing the intent as "system_volume_up", whose default value is 1 gear (for example, corresponding to 3 small scales). The vehicle therefore executes the command "increase the volume by 1 scale". This is the same logic that is applied to non-precise voice requests such as "increase the volume" or "make the volume a bit louder", and it clearly does not match the user's expectation of raising the volume by three gears. For a simplified instruction with many repetitions, such as "volume big big big big big big big" (seven repetitions), the intent is recognized as "system_volume_max", whose default value is the highest gear (maximum scale), so the vehicle executes the command "set the volume to the maximum scale", which clearly does not match the user's expectation of raising the volume by 7 gears.
With current technical solutions, neither of these voice requests results in a precise control instruction matching the simplified-word voice request issued by the user, and the user experience is poor.
发明内容Contents of the invention
为解决或部分解决相关技术中存在的问题,本申请提供一种语音交互方法及其装置、服务器和可读存储介质。In order to solve or partly solve the problems existing in related technologies, the present application provides a voice interaction method and its device, server and readable storage medium.
本申请提供一种语音交互方法。语音交互方法包括:对车辆预设功能调节的语音请求进行语音识别得到待识别文本,所述预设功能指模拟对车辆零部件的操作进行刻度调节的功能;利用意图识别模型对所述待识别文本进行意图识别;利用精度识别模型对所述待识别文本进行精度识别;根据所述意图识别的结果确定所述语音请求对应的目标意图,和根据所述精度识别的结果确定所述语音请求对应的目标刻度调节精度值;根据所述目标意图和所述目标刻度调节精度值修改默认值,所述默认值为预设语音请求中所述目标意图对应的调节值;将所述目标意图和修改后的所述默认值融合生成控制指令,以控制对应的车辆零部件。The present application provides a voice interaction method. The voice interaction method includes: performing voice recognition on the voice request adjusted by the vehicle preset function to obtain the text to be recognized, the preset function refers to the function of simulating the scale adjustment of the operation of the vehicle parts; performing intent recognition on the text; using an accuracy recognition model to perform accuracy recognition on the text to be recognized; determining the target intent corresponding to the voice request according to the result of the intent recognition, and determining the corresponding target intent of the voice request according to the result of the accuracy recognition adjust the accuracy value of the target scale; modify the default value according to the target intent and the target scale adjustment accuracy value, and the default value is the adjustment value corresponding to the target intent in the preset voice request; combine the target intent and the modified The latter default values are fused to generate control instructions to control corresponding vehicle components.
如此,本申请的语音交互方法可以先对车辆预设功能调节的语音请求进行语音识别得到待识别文本,预设功能指模拟对车辆零部件的操作进行刻度调节的功能。然后利用意图识别模型对待识别文本进行意图识别,且利用精度识别模型对待识别文本进行精度识别,识别出语音请求对应的目标意图和目标刻度调节精度值,然后对默认值进行修改,从而在融合语音请求传统逻辑的情况下,实现根据用户精简语音请求精准调节与语音请求相对应的车辆零部件的刻度的效果,提升用户体验。In this way, the voice interaction method of the present application can first perform voice recognition on the voice request for vehicle preset function adjustment to obtain the text to be recognized. The preset function refers to the function of simulating the scale adjustment of the operation of the vehicle parts. Then use the intent recognition model to recognize the intent of the text to be recognized, and use the accuracy recognition model to perform precision recognition on the text to be recognized, recognize the target intent corresponding to the voice request and adjust the precision value of the target scale, and then modify the default value, so that the fusion voice In the case of requesting traditional logic, the effect of accurately adjusting the scale of the vehicle parts corresponding to the voice request according to the user's streamlined voice request is realized, and the user experience is improved.
The voice interaction method includes: training the intent recognition model on intent training data, where the intent training data relates to the vehicle components and the adjustable ranges of the vehicle components.
In this way, the voice interaction method of the present application obtains the intent recognition model by training on the intent training data and performs intent recognition with that model, so that the intent of a user instruction is recognized accurately.
The voice interaction method includes: training the precision recognition model on precision training data, where the precision training data relates to the vehicle components, the adjustable ranges of the vehicle components, and the scale adjustment precision ranges of the vehicle components.
In this way, performing precision recognition on the text to be recognized with the precision recognition model determines the scale adjustment precision corresponding to the voice request.
The voice interaction method includes: determining the control range and the non-control range of the vehicle components.
In this way, the functions whose scale can be adjusted through vehicle components are identified, which determines the control range of the vehicle components, that is, the range within which scale adjustment can be performed through voice interaction.
The voice interaction method includes: determining a default adjustment range for each of the vehicle components.
In this way, the voice interaction method of the present application determines the default adjustment range of each vehicle component, which lays the foundation for precise adjustment of the vehicle components.
The voice interaction method includes: determining the adjustable range of the vehicle component; and correcting the intent of the preset voice request according to the adjustable range of the vehicle component.
In this way, after determining the adjustable range of the vehicle component, the voice interaction method of the present application corrects the intent of the preset voice request according to that range, so that the precise adjustment the user actually asked for is carried out.
The voice interaction method includes: mapping the control range and the adjustable ranges to preset intents and corresponding preset scale adjustment precision values.
In this way, the voice interaction method of the present application maps the control range and the adjustable ranges to preset intents and corresponding preset scale adjustment precision values, which makes it possible to adjust vehicle components precisely.
The voice interaction method includes: establishing an intent-to-default-value mapping table according to the preset intents and the default adjustment ranges.
In this way, by establishing a mapping table between intents and default values, the present application puts the intent of a voice request in one-to-one correspondence with a default value, which makes the default value easy to modify later.
Modifying the default value according to the target intent and the target scale adjustment precision value includes: determining the default value according to the target intent and the intent-to-default-value mapping table; and modifying the default value according to the target scale adjustment precision value.
In this way, the present application determines the default value from the target intent and the intent-to-default-value mapping table, and then modifies that default value according to the target scale adjustment precision value. This corrects the recognized user intent so that the scale of the vehicle component is adjusted precisely.
Mapping the control range and the adjustable ranges to preset intents and corresponding preset scale adjustment precision values includes: mapping each adjustable range within the control range to one preset intent, where each preset intent corresponds to multiple preset scale adjustment precision values.
In this way, during voice interaction, voice requests that ask for different adjustment amounts on the same vehicle component all correspond to the same preset intent, which lays the foundation for subsequently identifying the adjustment amount the user intends.
Mapping the control range and the adjustable ranges to preset intents and corresponding preset scale adjustment precision values includes: setting a simplified word as a slot, and performing slot extraction on the preset recognition text corresponding to the vehicle component to obtain a repeated field; counting the repetitions of the slot value in the repeated field to obtain a repetition count; and mapping the repetition count to the preset scale adjustment precision value corresponding to the preset intent, according to the range over which the simplified word can adjust.
In this way, the repetitions of the slot value in the extracted repeated field are counted, and the resulting repetition count is mapped to the preset intent and the preset scale adjustment precision value, so that the scale of the vehicle component the user wants is adjusted precisely on the basis of the simplified word.
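The slot extraction and repetition counting described above can be illustrated with a minimal sketch. It assumes that each simplified word is a single character and that the "range over which the simplified word can adjust" acts as an upper bound on the precision value; the table `MAX_PRECISION`, its entries, and the function name `precision_from_text` are hypothetical and do not come from the application.

```python
import re
from typing import Optional, Tuple

# Hypothetical slot table: each simplified word and the largest precision value
# it may map to. Both the words and the limits are illustrative assumptions.
MAX_PRECISION = {"大": 10, "小": 10, "亮": 10, "暗": 10, "后": 10}

# One alternative per simplified word, each matching a run of that character.
_SLOT_PATTERN = re.compile("|".join(f"{re.escape(w)}+" for w in MAX_PRECISION))


def precision_from_text(text: str) -> Optional[Tuple[str, int]]:
    """Return (simplified word, precision value) for the first repeated field."""
    match = _SLOT_PATTERN.search(text)
    if match is None:
        return None
    field = match.group(0)   # repeated field, e.g. "大大大"
    word = field[0]          # slot value
    count = len(field)       # repetition count (each word is one character)
    # Assumed reading of "adjustable range": clamp the count to a per-word limit.
    return word, min(count, MAX_PRECISION[word])
```

A trained precision recognition model would replace this pattern matching in practice; the sketch only shows how a repetition count could be turned into a precision value.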
There are multiple preset intents, and determining the target intent corresponding to the voice request according to the result of the intent recognition includes: obtaining, from the result of the intent recognition, an intent discrimination probability for each preset intent; and determining the one preset intent whose intent discrimination probability is greater than a first probability threshold as the target intent corresponding to the voice request.
In this way, the intent discrimination probability of each preset intent is obtained from the result of the intent recognition, and the preset intent whose probability exceeds the first probability threshold is taken as the target intent of the voice request, so that the user's intent to adjust a vehicle component is identified accurately.
The preset intents include at least one of: volume up, volume down, airflow up, airflow down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen dimmer, screen scroll up, screen scroll down, instrument panel brighter, instrument panel dimmer, ambient light brighter, ambient light dimmer, seat forward, seat backward, seat up, seat down, seat back forward, seat back backward, window up, and window down.
In this way, providing a variety of preset intents further lays the foundation for recognizing the user's voice interaction intent and covers more of the voice interaction scenarios that may arise.
Determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition includes: obtaining, from the result of the precision recognition, a precision discrimination probability for each preset scale adjustment precision value; and determining the one preset scale adjustment precision value whose precision discrimination probability is greater than a second probability threshold as the target scale adjustment precision value corresponding to the voice request.
In this way, the voice interaction method of the present application obtains the precision discrimination probability of each preset scale adjustment precision value from the result of the precision recognition, and takes the preset scale adjustment precision value whose probability exceeds the second probability threshold as the target scale adjustment precision value, so that the scale is adjusted precisely.
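Both threshold rules, for the intent and for the precision value, have the same shape, which the following sketch tries to capture. It assumes the models output one probability per candidate label and that, if several labels exceed the threshold, the most probable one is chosen; the function name `select_label` and the sample probabilities are made up for illustration.

```python
from typing import Dict, Hashable, Optional


def select_label(probabilities: Dict[Hashable, float],
                 threshold: float) -> Optional[Hashable]:
    """Return the most probable label if its probability exceeds the threshold."""
    if not probabilities:
        return None
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    # Strictly "greater than", matching the wording of the text.
    return label if probability > threshold else None


# Illustrative values only: first threshold for intents, second for precision.
intent = select_label({"volume_up": 0.91, "volume_down": 0.06}, threshold=0.5)
precision = select_label({1: 0.10, 2: 0.15, 3: 0.75}, threshold=0.6)
```

Returning `None` when no label clears the threshold leaves room for a fallback, such as asking the user to repeat the request; the application does not specify what happens in that case.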
The present application also provides a voice interaction apparatus. The voice interaction apparatus includes a speech recognition module, an intent recognition module, a precision recognition module, a determination module, a modification module, and an instruction generation module. The speech recognition module is configured to perform speech recognition on a voice request for adjusting a preset function of a vehicle to obtain text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component. The intent recognition module is configured to perform intent recognition on the text to be recognized using an intent recognition model. The precision recognition module is configured to perform precision recognition on the text to be recognized using a precision recognition model. The determination module is configured to determine a target intent corresponding to the voice request according to the result of the intent recognition, and to determine a target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition. The modification module is configured to modify a default value according to the target intent and the target scale adjustment precision value, where the default value is the adjustment value corresponding to the target intent in a preset voice request. The instruction generation module is configured to fuse the target intent with the modified default value to generate a control instruction for controlling the corresponding vehicle component.
In this way, the voice interaction apparatus of the present application first performs speech recognition on the voice request for adjusting a preset function of the vehicle to obtain the text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component. The apparatus then performs intent recognition on the text with the intent recognition model and precision recognition with the precision recognition model, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value. While still incorporating the traditional logic for voice requests, the apparatus thereby precisely adjusts the scale of the vehicle component named in the user's simplified voice request, which improves the user experience.
The present application also provides a server. The server includes a processor and a memory storing a computer program. When the computer program is executed by the processor, the voice interaction method of any of the above embodiments is implemented.
In this way, the server of the present application first performs speech recognition on the voice request for adjusting a preset function of the vehicle to obtain the text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component. The server then performs intent recognition on the text with the intent recognition model and precision recognition with the precision recognition model, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value. While still incorporating the traditional logic for voice requests, the server thereby precisely adjusts the scale of the vehicle component named in the user's simplified voice request, which improves the user experience.
The present application also provides a non-volatile computer-readable storage medium containing a computer program. When the computer program is executed by one or more processors, the voice interaction method of any of the above embodiments is implemented.
In this way, the computer-readable storage medium of the present application makes it possible to first perform speech recognition on the voice request for adjusting a preset function of the vehicle to obtain the text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component. Intent recognition is then performed on the text with the intent recognition model and precision recognition with the precision recognition model, the target intent and the target scale adjustment precision value corresponding to the voice request are identified, and the default value is modified. While still incorporating the traditional logic for voice requests, this precisely adjusts the scale of the vehicle component named in the user's simplified voice request, which improves the user experience.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Description of the Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following more detailed description of exemplary embodiments with reference to the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a first schematic flowchart of the voice interaction method of the present application;
FIG. 2 is a first schematic structural diagram of the voice interaction apparatus of the present application;
FIG. 3 is a second schematic flowchart of the voice interaction method of the present application;
FIG. 4 is a second schematic structural diagram of the voice interaction apparatus of the present application;
FIG. 5 is a third schematic flowchart of the voice interaction method of the present application;
FIG. 6 is a fourth schematic flowchart of the voice interaction method of the present application;
FIG. 7 is a fifth schematic flowchart of the voice interaction method of the present application;
FIG. 8 is a sixth schematic flowchart of the voice interaction method of the present application;
FIG. 9 is a third schematic structural diagram of the voice interaction apparatus of the present application;
FIG. 10 is a seventh schematic flowchart of the voice interaction method of the present application;
FIG. 11 is a fourth schematic structural diagram of the voice interaction apparatus of the present application;
FIG. 12 is an eighth schematic flowchart of the voice interaction method of the present application;
FIG. 13 is a first schematic structural diagram of the first determination module in the voice interaction apparatus of the present application;
FIG. 14 is a ninth schematic flowchart of the voice interaction method of the present application;
FIG. 15 is a second schematic structural diagram of the first determination module in the voice interaction apparatus of the present application;
FIG. 16 is a tenth schematic flowchart of the voice interaction method of the present application;
FIG. 17 is a schematic structural diagram of the modification module in the voice interaction apparatus of the present application;
FIG. 18 is a schematic structural diagram of the server of the present application;
FIG. 19 is a schematic structural diagram of the computer-readable storage medium of the present application.
Detailed Description
The present application is described in detail below, and examples of it are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present application and should not be construed as limiting it.
Referring to FIG. 1, the present application provides a voice interaction method. The voice interaction method includes:
01: performing speech recognition on a voice request for adjusting a preset function of the vehicle to obtain text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component;
02: performing intent recognition on the text to be recognized using an intent recognition model;
03: performing precision recognition on the text to be recognized using a precision recognition model;
04: determining the target intent corresponding to the voice request according to the result of the intent recognition, and determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;
05: modifying a default value according to the target intent and the target scale adjustment precision value, where the default value is the adjustment value corresponding to the target intent in a preset voice request;
06: fusing the target intent with the modified default value to generate a control instruction for controlling the corresponding vehicle component.
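As a rough illustration of how steps 02 to 06 fit together, the sketch below chains them in Python. It assumes step 01 has already produced the text, and it replaces both trained models with simple hypothetical rules; `ControlCommand`, `DEFAULTS`, the intent names, and the default values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ControlCommand:
    intent: str   # target intent
    amount: int   # modified default value, in scale marks


# Hypothetical intent-to-default-value table (values are assumptions).
DEFAULTS = {"volume_up": 3, "screen_brighter": 1}


def recognize_intent(text: str) -> str:
    """Stand-in for the intent recognition model (steps 02 and 04)."""
    if "音量" in text and "大" in text:
        return "volume_up"
    if "屏幕" in text and "亮" in text:
        return "screen_brighter"
    raise ValueError(f"no preset intent matches {text!r}")


def recognize_precision(text: str) -> int:
    """Stand-in for the precision recognition model (steps 03 and 04).

    Counts how many times the final character is repeated at the end of the text.
    """
    last = text[-1]
    return len(text) - len(text.rstrip(last))


def handle_request(text: str) -> ControlCommand:
    intent = recognize_intent(text)
    precision = recognize_precision(text)
    amount = precision * DEFAULTS[intent]   # step 05: modify the default value
    return ControlCommand(intent, amount)   # step 06: fuse into one instruction
```

With these assumed tables, `handle_request("音量大大大")` yields a command to raise the volume by 9 scale marks.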
Referring to FIG. 2, the present application also provides a voice interaction apparatus 10. The voice interaction apparatus 10 includes a speech recognition module 11, an intent recognition module 12, a precision recognition module 13, a first determination module 14, a modification module 15, and an instruction generation module 16.
Step 01 may be implemented by the speech recognition module 11, step 02 by the intent recognition module 12, step 03 by the precision recognition module 13, step 04 by the first determination module 14, step 05 by the modification module 15, and step 06 by the instruction generation module 16. That is, the speech recognition module 11 is configured to perform speech recognition on a voice request for adjusting a preset function of the vehicle to obtain text to be recognized, where the preset function is a function that performs scale adjustment by simulating the operation of a vehicle component; the intent recognition module 12 is configured to perform intent recognition on the text to be recognized using the intent recognition model; the precision recognition module 13 is configured to perform precision recognition on the text to be recognized using the precision recognition model; the first determination module 14 is configured to determine the target intent corresponding to the voice request according to the result of the intent recognition, and to determine the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition; the modification module 15 is configured to modify the default value according to the target intent and the target scale adjustment precision value, where the default value is the adjustment value corresponding to the target intent in a preset voice request; and the instruction generation module 16 is configured to fuse the target intent with the modified default value to generate a control instruction for controlling the corresponding vehicle component.
A voice request for adjusting a preset function of the vehicle may be, for example, "屏幕亮亮亮" ("screen bright-bright-bright"), "音量大大大" ("volume up-up-up"), "屏幕亮亮亮亮" ("screen bright-bright-bright-bright"), "空调风量大大大" ("air-conditioner airflow up-up-up"), or "座椅后后后" ("seat back-back-back"), that is, a voice request containing a simplified word. The preset function is a function that performs scale adjustment by simulating the operation of a vehicle component. The vehicle component here may be a mechanical knob, a button, or a similar part whose scale can be adjusted.
First, after a user's voice request for adjusting a preset function of the vehicle is received, speech recognition is performed on it to obtain the text to be recognized for subsequent processing. For example, speech recognition on the user's voice request "屏幕亮亮亮" yields the text to be recognized "屏幕亮亮亮".
It can be understood that, in a real interaction environment, the text obtained from speech recognition may not be clear or accurate enough, owing to limitations of the vehicle hardware, network instability, or colloquial or dialectal phrasing by the user. Preprocessing is therefore needed to apply routine text corrections, for example correcting "音量深深深深深" (where "深", "deep", is a misrecognition) to "音量增增增增增" ("增", "increase"), and to remove meaningless words such as "啊" ("ah") and "请" ("please").
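A minimal sketch of this preprocessing step is shown below, assuming simple lookup tables. The tables `CORRECTIONS` and `FILLERS` contain only the examples from the text, and the blanket character replacement is deliberately naive; a production system would need context-aware correction.

```python
# Hypothetical correction and filler tables; the entries mirror the examples
# in the text, everything else about them is an assumption.
CORRECTIONS = {"深": "增"}   # misrecognized character -> intended character
FILLERS = ("啊", "请")       # meaningless words to drop


def preprocess(text: str) -> str:
    """Remove filler words, then apply character-level corrections."""
    for filler in FILLERS:
        text = text.replace(filler, "")
    for wrong, right in CORRECTIONS.items():
        # Naive: replaces every occurrence regardless of context.
        text = text.replace(wrong, right)
    return text.strip()
```

For instance, `preprocess("请音量深深深深深啊")` returns `"音量增增增增增"`.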
Next, intent recognition is performed on the text to be recognized using the intent recognition model, and precision recognition is performed on it using the precision recognition model. Together, intent recognition and precision recognition determine the user's intent and the requested precision.
Then, the target intent and the target scale adjustment precision value corresponding to the voice request are determined according to the result of the intent recognition and the result of the precision recognition. For example, for the voice request "屏幕亮亮亮" ("screen bright-bright-bright"), the result of the intent recognition indicates that the target intent is to increase the display brightness of the in-vehicle screen, and the target scale adjustment precision value is 3, meaning that the brightness is raised by 3 levels.
Next, the default value is modified according to the target intent and the target scale adjustment precision value, where the default value is the adjustment value corresponding to the target intent in a preset voice request.
It can be understood that, under the traditional logic of current technical solutions, the voice request "音量大大大" ("volume up-up-up") is recognized as the intent "system_volume_up". This intent adjusts by 3 scale marks each time by default, corresponding to a default value of 3, so the vehicle executes the command "increase the volume by 3 scale marks". This is the same implementation logic as for non-precision voice requests such as "音量增大" ("increase the volume") and "音量大一点" ("a little louder"). For a simplified voice request with many repetitions of "大", such as "音量大大大大大大大", the intent is recognized as "system_volume_max". The default value for this intent is the highest level or the maximum scale mark, so the vehicle executes the command "set the volume to the maximum scale mark".
That is, the default value is the adjustment value corresponding to the target intent in a preset voice request, as determined by the original logic. A preset voice request here may be a user voice request such as "音量增大" ("increase the volume") or "音量减小" ("decrease the volume"). Under the traditional recognition logic, the adjustment corresponding to the target intent of "increase the volume" is one upward adjustment, and the default value may correspond to the specific scale amount of each adjustment, for example 3 small scale marks. Likewise, the adjustment corresponding to the target intent of "decrease the volume" is one downward adjustment, and the default value may again correspond to 3 small scale marks. In this case the default value is: default_value = 3.
Under the precision logic, which performs precision recognition on simplified instructions, the voice request "音量大大大" ("volume up-up-up") means that the user's intent is "turn the volume up" and that the user expects the volume to be adjusted 3 times. With a default value of 3 per adjustment, the user actually wants the volume adjusted by 9 scale marks, which differs from the result of the traditional logic. In other words, under the precision logic the voice request "音量大大大" means that the user wants to turn the volume up by 9 scale marks. Correspondingly, for a simplified voice request with many repetitions of "大", such as "音量大大大大大大大", the user's intent is to turn the volume up and the scale adjustment precision value is 7, that is, the volume is adjusted 7 times, so the user ultimately wants to turn the volume up by 21 scale marks.
In this way, while the traditional logic continues to adjust by the default value, the target scale adjustment precision value identified by the precision logic is used to modify that default value, so that the traditional logic and the precision logic together achieve precise control of the vehicle component.
For example, the target intent corresponding to the user instruction "音量大大大" ("volume up-up-up") is to turn the volume up. When the target scale adjustment precision is recognized, the target scale adjustment precision value is found to be 3, that is, the volume is to be turned up 3 times. The default value is then modified to obtain the modified adjustment amount: default_value' = scale value * default_value = 3 * 3 = 9. In other words, the default value is modified to 9 to satisfy the user's spoken request for three increases. Thus, when the new requirement of precisely adjusting vehicle components from voice requests containing simplified words is added, the voice interaction method of the present application leaves the implementation logic of the original non-precision voice requests intact, and realizes this precise adjustment within the traditional logic framework.
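The arithmetic above, and the way it preserves the original behavior, can be checked with a short sketch. The intent name "system_volume_up" and the default value of 3 come from the text; the table layout, the function name `modified_default`, and the choice of 1 as the precision value for non-precision requests are assumptions.

```python
# Default values of the traditional logic, keyed by intent.
DEFAULT_VALUES = {"system_volume_up": 3}


def modified_default(intent: str, precision_value: int = 1) -> int:
    """Scale the traditional default value by the recognized precision value.

    A precision value of 1 (assumed for requests without simplified words)
    leaves the traditional result unchanged.
    """
    return precision_value * DEFAULT_VALUES[intent]


legacy = modified_default("system_volume_up")       # e.g. "音量增大"
precise = modified_default("system_volume_up", 3)   # e.g. "音量大大大"
```

Because the precision value only multiplies the existing default, requests that carry no simplified word go through the same table and produce the same result as before.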
最后,将目标意图和修改后的默认值融合生成控制指令,以控制对应的车辆零部件。Finally, the target intent and the modified default values are fused to generate control instructions to control the corresponding vehicle components.
In this way, the voice interaction method and apparatus of the present application first perform speech recognition on a voice request for adjusting a preset vehicle function to obtain text to be recognized, where a preset function is a function that simulates the scale adjustment performed by operating a vehicle component. The intent recognition model then performs intent recognition on the text, and the precision recognition model performs precision recognition on it, yielding the target intent and the target scale adjustment precision value corresponding to the voice request. The default value is then modified, so that, while remaining integrated with the conventional logic for voice requests, the scale of the vehicle component corresponding to the voice request is adjusted precisely according to the user's reduced voice request, improving the user experience.
Referring to FIG. 3, the voice interaction method includes:
001: determining the control range and the non-control range of vehicle components.
Referring also to FIG. 4, the voice interaction apparatus 10 further includes a second determining module 101.
Step 001 may be implemented by the second determining module 101. That is, the second determining module 101 is configured to determine the control range and the non-control range of vehicle components.
It will be understood that not every vehicle function can be, or needs to be, adjusted on a precise scale. For example, movement of a seat in each direction can be adjusted through a vehicle component. A door, by contrast, has no knob, button, or similar vehicle component for scale adjustment and is normally just opened and closed by its handle. Seat adjustment therefore falls within the control range of vehicle components, whereas door adjustment falls within the non-control range.
Information about the vehicle components is obtained. Based on this information, hardware that can be scale-adjusted through a vehicle component is assigned to the control range of vehicle components, and hardware that cannot be adjusted through a vehicle component is assigned to the non-control range.
First, the vehicle components on the vehicle that support scale adjustment are identified, for example the "volume knob", "screen brightness button", "air-conditioner airflow knob/button", and "seat adjustment knob/button". The control range of vehicle components may then be determined to include the in-vehicle audio system, the in-vehicle screens, the air conditioner, the seats, the interior ambient lights, the exterior lights, the windows, and so on. The non-control range of vehicle components may include the doors, the rearview mirrors, the trunk, and so on.
During subsequent voice interaction, a voice prompt may be given when a voice request targets the non-control range of vehicle components.
In this way, by collecting vehicle component information and confirming which functions can be scale-adjusted through vehicle components, the control range of vehicle components is determined, which is also the range within which scale adjustment can be performed through voice interaction.
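Step 001 can be illustrated with a small sketch that splits components into the two ranges and decides how to respond to a request. The component names, the boolean "supports scale adjustment" flag, and the returned action strings are assumptions made for illustration; the text does not specify how component information is represented.

```python
def split_control_range(components):
    """Split components into (control_range, non_control_range).

    `components` maps a component name to True when it supports scale adjustment.
    """
    control = {name for name, scalable in components.items() if scalable}
    non_control = set(components) - control
    return control, non_control


def handle_request(target, control, non_control):
    """Return the action for a voice request aimed at `target`."""
    if target in control:
        return "adjust"
    if target in non_control:
        return "voice_prompt"  # tell the user this target has no scale adjustment
    return "unknown_target"


# Hypothetical component inventory based on the examples in the text.
COMPONENTS = {
    "audio": True, "screen": True, "air_conditioner": True, "seat": True,
    "door": False, "rearview_mirror": False, "trunk": False,
}
control, non_control = split_control_range(COMPONENTS)
print(handle_request("seat", control, non_control))  # adjust
print(handle_request("door", control, non_control))  # voice_prompt
```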
The voice interaction method includes:
002: determining the default adjustment range of each vehicle component.
Step 002 may be implemented by the second determining module 101. That is, the second determining module 101 is configured to determine the default adjustment range of each vehicle component.
The default adjustment range of a given vehicle component is determined. For example, when the device to be adjusted is the in-vehicle audio system, the default value by which a voice request, simulating the vehicle component, adjusts the volume each time may be 3. If the corresponding volume-adjusting vehicle component has 60 scale marks in total, the default adjustment range is 1 to 20.
The voice interaction method includes:
003: determining the adjustable range of the vehicle component;
004: correcting the intent of a preset voice request according to the adjustable range of the vehicle component.
The voice interaction apparatus 10 further includes a correction module 102.
Step 003 may be implemented by the second determining module 101, and step 004 may be implemented by the correction module 102. That is, the second determining module 101 is configured to determine the adjustable range of the vehicle component, and the correction module 102 is configured to correct the intent of the preset voice request according to the adjustable range of the vehicle component.
It will be understood that, after the control range and non-control range of vehicle components are determined, an adjustable range needs to be determined for each vehicle component in the control range. The adjustable range of a vehicle component corresponds to the range of scale adjustment available by operating that component. Depending on the component, the adjustable range may be expressed in levels or as a measurement range. For example, if pressing the screen brightness button five times in succession steps the screen brightness through levels 1 to 5 up to maximum brightness, the adjustable range of the screen brightness button is levels 1 to 5. As another example, if the knob that moves the seat forward and backward has a total scale value of 90, the adjustable range of that seat adjustment knob is scale values 1 to 90.
Based on the adjustable range of the vehicle component, intent correction is applied to reduced voice requests such as "volume louder louder louder" that the conventional logic would recognize as a "maximum" or "minimum" intent. When the reduced words satisfy the conditions, the intent is corrected to the corresponding turn-up or turn-down intent.
In this way, the precise adjustment actually meant by the user instruction can be achieved on the basis of the original conventional logic.
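The intent correction of step 004 might look like the sketch below. The text does not state the exact condition under which a reduced word qualifies, so the check used here (the reduced-word count lies within a per-component limit) is an assumption, and the `system_volume_max` and `system_volume_min` intent names are hypothetical; only the up and down intent names appear in the source.

```python
# The *_max / *_min intent names are hypothetical placeholders.
CORRECTIONS = {
    "system_volume_max": "system_volume_up",
    "system_volume_min": "system_volume_down",
}


def correct_intent(intent, reduced_word_count, max_reduced_words):
    """Map a max/min intent back to up/down when the reduced-word count is in range."""
    in_range = 1 <= reduced_word_count <= max_reduced_words
    if in_range and intent in CORRECTIONS:
        return CORRECTIONS[intent]
    return intent


# "volume louder louder louder" was recognized as "maximum" by the conventional logic.
print(correct_intent("system_volume_max", reduced_word_count=3, max_reduced_words=10))
# system_volume_up
```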
Referring to FIG. 5, step 003 includes:
0031: determining the range that can be adjusted by reduced words for the vehicle component.
Step 0031 may be implemented by the second determining module 101. That is, the second determining module 101 is configured to determine the range that each reduced word can adjust for the vehicle component.
A reduced word is a shortened word used by the user that nevertheless indicates the degree of adjustment precisely. For example, reduplicated words may serve as reduced words, so that the user only needs to utter a condensed voice request. Brightness adjustment of the in-vehicle display may be condensed to "screen brighter brighter", "screen brighter brighter brighter", "screen darker darker", "screen darker darker darker", and so on; volume adjustment of the in-vehicle audio system may be condensed to "volume louder louder", "volume louder louder louder", "volume quieter quieter", "volume quieter quieter quieter", and so on; airflow adjustment of the air conditioner may be condensed to "airflow stronger stronger", "airflow stronger stronger stronger", "airflow weaker weaker", "airflow weaker weaker weaker", and so on. A reduced word may also be a repeated phrase that the user habitually uses, such as "a bit brighter", "a bit darker", "a bit louder", or "a bit quieter", in which case the voice request may be condensed to "screen a bit brighter a bit brighter", "screen a bit darker a bit darker", "volume a bit louder a bit louder", "volume a bit quieter a bit quieter", and so on. No specific limitation is imposed here.
The range adjustable by reduced words can be determined from the adjustable range of the vehicle component. For example, when an in-vehicle screen is adjusted, the adjustable range of the screen brightness is levels 1 to 5; if at most 5 reduced words can be recognized in each brightness-related voice request, the range adjustable by reduced words may be 1 to 5. When a voice request contains multiple reduced words, each reduced word adjusts the screen brightness by 1 level.
As another example, when the in-vehicle audio system is adjusted, the volume can be turned up or down, i.e., the reduced words "louder", "a bit louder", "a bit quieter", or "quieter" can be used. The total adjustment range of the volume is 60 scale marks, and at most 10 reduced words can be recognized in a volume-related voice request. In this case the range adjustable by reduced words may be 1 to 10, and each reduced word adjusts the audio volume by 3 scale marks. If speech recognition yields a voice request with more than 10 reduced words, the volume may be set directly to the maximum or minimum.
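The two examples above can be expressed as one small helper. This is a sketch under stated assumptions: the function name and the use of `None` to signal "jump to the maximum or minimum" are illustrative, and the text describes the saturation rule only for the volume example, so applying it to other components is an assumption.

```python
def adjustment_for(reduced_word_count, marks_per_word, max_reduced_words):
    """Return the number of scale marks (or levels) to adjust.

    Returns None when the count exceeds the reduced-word range, meaning the
    caller should set the component directly to its maximum or minimum.
    """
    if reduced_word_count < 1:
        raise ValueError("at least one reduced word is required")
    if reduced_word_count > max_reduced_words:
        return None
    return reduced_word_count * marks_per_word


# Volume: up to 10 reduced words, 3 scale marks per word.
print(adjustment_for(3, marks_per_word=3, max_reduced_words=10))   # 9
print(adjustment_for(11, marks_per_word=3, max_reduced_words=10))  # None
# Screen brightness: up to 5 reduced words, 1 level per word.
print(adjustment_for(2, marks_per_word=1, max_reduced_words=5))    # 2
```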
The voice interaction method includes:
005: mapping the control range and the adjustable range to preset intents and corresponding preset scale adjustment precision values.
The voice interaction apparatus 10 further includes a mapping module 103.
Step 005 may be implemented by the mapping module 103. That is, the mapping module 103 is configured to map the control range and the adjustable range to preset intents and corresponding preset scale adjustment precision values.
In this way, the control range of vehicle components and the adjustable range of each vehicle component are mapped to an intent system that the intent recognition model can understand. A corresponding preset intent is defined for each object in the control range and for the adjustable range of the corresponding vehicle component. For example, system_volume_up represents the preset intent "volume up" and system_volume_down represents the preset intent "volume down". A concrete intent mapping system is thereby defined for the control range and the adjustable ranges of vehicle components.
As for the preset scale adjustment precision: if, for example, voice interaction simulating operation of the vehicle component adjusts the volume by 3 scale values each time and the total scale value is 60, the preset scale adjustment precision range may be 1 to 20. As another example, if voice interaction simulating operation of the vehicle component moves the seat forward or backward by 18 scale marks each time and the total scale value is 90, the preset scale adjustment precision range is 1 to 5.
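Both examples follow from dividing the total scale by the per-adjustment step. The helper below is a minimal sketch; rounding down when the total is not an exact multiple of the step is an assumption, since the text gives only exact cases.

```python
def precision_range(total_scale, step):
    """Return the preset scale adjustment precision range as (low, high)."""
    if step <= 0 or total_scale < step:
        raise ValueError("step must be positive and no larger than total_scale")
    return 1, total_scale // step


print(precision_range(60, 3))   # (1, 20)  volume
print(precision_range(90, 18))  # (1, 5)   seat forward/backward
```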
Referring to FIG. 6, step 005 includes:
0051: building an intent-to-default-value mapping table from the preset intents and the default adjustment ranges.
Step 0051 may be implemented by the mapping module 103. That is, the mapping module 103 is configured to build the intent-to-default-value mapping table from the preset intents and the default adjustment ranges.
From the preset intents and the previously determined default adjustment ranges, a mapping table of intents to default values can be built for the online process to use in downstream operations.
For example, if a voice request, simulating the vehicle component, adjusts the volume of the in-vehicle audio system by 3 scale marks each time (default value 3), then under the precision requirement the preset intents for volume are system_volume_up and system_volume_down. Correspondingly, the intent-to-default-value mapping table built for volume adjustment by the audio system's vehicle component may be:
{system_volume_up: 3; system_volume_down: 3}.
If a voice request, simulating the vehicle component, adjusts the air-conditioner airflow by 1 level each time, the intent-to-default-value mapping table built for airflow adjustment by the air conditioner's vehicle component is:
{ac_wind_up: 1; ac_wind_down: 1}.
Similarly, the voice interaction method of the present application also covers further preset intents such as screen brightness adjustment and seat height and forward/backward adjustment. The mapping between each of these preset intents and its default value can be determined in the same way, and the mappings are stored in a database for the online process to load and read.
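A sketch of such a mapping table, with a save and load round trip, is given below. The intent names and default values are the ones listed above; using a JSON string as a stand-in for the database record, and the validation in `load_table`, are assumptions for illustration.

```python
import json

# Intent names and defaults are taken from the examples in the text.
INTENT_DEFAULTS = {
    "system_volume_up": 3,
    "system_volume_down": 3,
    "ac_wind_up": 1,
    "ac_wind_down": 1,
}


def save_table(table):
    """Serialize the table; a JSON string stands in for the database record."""
    return json.dumps(table, sort_keys=True)


def load_table(payload):
    """Load the table and reject non-integer or non-positive defaults."""
    table = json.loads(payload)
    for intent, default in table.items():
        if not isinstance(default, int) or default <= 0:
            raise ValueError(f"invalid default value for {intent!r}: {default!r}")
    return table


table = load_table(save_table(INTENT_DEFAULTS))
print(table["system_volume_up"])  # 3
```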
Referring to FIG. 7, step 005 includes:
0052: mapping the adjustable range of each vehicle component in the control range to one preset intent, each preset intent corresponding to multiple preset scale adjustment precision values.
Step 0052 may be implemented by the mapping module 103. That is, the mapping module 103 is configured to map the adjustable range of each vehicle component in the control range to one preset intent, each preset intent corresponding to multiple preset scale adjustment precision values.
The adjustable range of each vehicle component includes multiple levels or multiple scale values, and when the mapping is built, the whole adjustable range of a given vehicle component must be mapped to the same preset intent. For example, the adjustable range of the air-conditioner airflow button includes 5 levels, so the voice requests for increasing airflow may take 5 forms, from "airflow stronger" to "airflow stronger stronger stronger stronger stronger". All 5 forms must be mapped to the same preset intent, namely airflow up.
In this way, during voice interaction, voice requests asking for different adjustment amounts on the same vehicle component all correspond to the same preset intent.
One preset intent corresponds to multiple preset scale adjustment precision values. For example, the preset intent "turn up the volume of the in-vehicle audio system" may correspond to 20 preset scale adjustment precision values. If the adjustable range of the volume knob is 60, i.e., the total volume scale is 60, each preset scale adjustment precision value corresponds to 3 scale marks; that is, each increment of the precision value represents an adjustment of 3 scale marks. The 20 preset scale adjustment precision values are, respectively: volume up by 3 scale marks, corresponding to the voice request "volume louder"; volume up by 6 scale marks, corresponding to the voice request "volume louder louder"; volume up by 9 scale marks, corresponding to the voice request "volume louder louder louder"; and so on.
In other embodiments of the present application, with the user's permission, different user instructions may be collected for the same preset intent. For example, users may phrase "volume louder louder louder" with varying freedom, such as "volume increase increase increase", "volume raise raise raise", or "volume higher higher higher". The intent recognized for each of these variants is to turn the volume up.
Referring to FIG. 8, step 005 includes:
0053: setting the reduced word as a slot, and performing slot extraction on the preset recognition text corresponding to the vehicle component to obtain a repeated field;
0054: counting repetitions in the slot value of the repeated field to obtain a repetition count;
0055: mapping the repetition count to the preset scale adjustment precision value corresponding to the preset intent, according to the range adjustable by reduced words.
Referring also to FIG. 9, the mapping module 103 includes an extraction unit 1033, a counting unit 1034, and a mapping unit 1035.
Step 0053 may be implemented by the extraction unit 1033, step 0054 by the counting unit 1034, and step 0055 by the mapping unit 1035. That is, the extraction unit 1033 is configured to set the reduced word as a slot and perform slot extraction on the preset recognition text corresponding to the vehicle component to obtain a repeated field; the counting unit 1034 is configured to count repetitions in the slot value of the repeated field to obtain a repetition count; and the mapping unit 1035 is configured to map the repetition count to the preset scale adjustment precision value corresponding to the preset intent, according to the range adjustable by reduced words.
The repetition count of a reduced word can represent the number of times the vehicle component is to be scale-adjusted, so the reduced word can be set as a slot. For example, suppose the range adjustable by reduced words for the volume knob is 1 to 10 and the preset scale adjustment precision range for the volume knob is 1 to 20. Within the range adjustable by reduced words, if the preset recognition text corresponding to the voice request is "volume louder louder louder louder", then "louder louder louder louder" can be extracted as a slot and that slot set as the repeated field. Repetitions in the slot value of the extracted repeated field are then counted, and the repetition count is mapped to the corresponding preset scale adjustment precision. For the extracted slot "louder louder louder louder", "louder" is repeated 4 times, which maps to the preset scale adjustment precision value 4.
In other embodiments of the present application, with the user's permission, different user voice requests may be collected for the same scale adjustment precision. For example, users may phrase "volume louder louder louder" with varying freedom, such as "volume increase increase increase", "volume raise raise raise", or "volume higher higher higher". The scale adjustment precision recognized for each of these variants is "adjust the volume 3 times".
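Steps 0053 to 0055 can be sketched with a regular expression that finds a run of one reduced word, counts its repetitions, and maps the count to a precision value. This is an illustrative rule-based version, not the trained model described later; the function names, the English glosses used as reduced words, and the choice to return `None` for out-of-range counts are assumptions.

```python
import re


def count_repetitions(text, reduced_words):
    """Return (reduced_word, repeat_count) for the longest run found, or None."""
    best = None
    for word in reduced_words:
        # A run is the same reduced word repeated, with optional whitespace between
        # repetitions, so unsegmented text with back-to-back repeats also matches.
        run = re.compile(rf"(?:{re.escape(word)}\s*)+")
        for match in run.finditer(text):
            count = len(re.findall(re.escape(word), match.group()))
            if best is None or count > best[1]:
                best = (word, count)
    return best


def to_precision(repeat_count, max_reduced_words, max_precision):
    """Map a repetition count to a preset precision value, or None when out of range."""
    if 1 <= repeat_count <= min(max_reduced_words, max_precision):
        return repeat_count
    return None


# "increase", "raise", "higher" stand in for alternative wordings of the same request.
UP_WORDS = ["louder", "increase", "raise", "higher"]
word, count = count_repetitions("volume louder louder louder louder", UP_WORDS)
print(word, count, to_precision(count, max_reduced_words=10, max_precision=20))
# louder 4 4
```

Listing several reduced words for one direction lets "volume raise raise raise" and "volume louder louder louder" resolve to the same precision value, which matches the alternative wordings described above.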
Referring to FIG. 10, the voice interaction method includes:
006: training on intent training data to obtain the intent recognition model, the intent training data being related to the vehicle components and the adjustable ranges of the vehicle components.
Referring also to FIG. 11, the voice interaction apparatus 10 includes an intent training module 104.
Step 006 may be implemented by the intent training module 104. That is, the intent training module 104 is configured to train on intent training data to obtain the intent recognition model, the intent training data being related to the vehicle components and the adjustable ranges of the vehicle components.
The present application uses machine learning: the intent recognition model is trained on training data corresponding to the scale-adjustable vehicle components and their adjustable ranges, and is then used to perform intent recognition on voice requests, so that the user's intent is recognized accurately.
Here, the intent training data relate to the scale-adjustable vehicle components and their adjustable ranges. Vehicle components are the parts of a smart vehicle that support scale adjustment, for example the "volume knob", "screen brightness button", "air-conditioner airflow knob/button", and "seat adjustment knob/button". The adjustable range of a vehicle component corresponds to the range of scale adjustment available by operating that component. Depending on the component, the adjustable range may be expressed in levels or as a measurement range.
The intent recognition model of the present application is trained before use. For the intent training data, provided the relevant user permissions have been obtained, a certain number of historical user voice requests may be collected and lightly filtered to obtain voice requests that are semantically clear and have a specific purpose. Specifically, the filtering removes voice requests whose meaning is plainly unclear, as well as short requests consisting only of modal particles such as "ah" or "oh", leaving requests that are semantically clear and have a specific purpose.
The filtered voice requests are then labeled against the defined preset intents. For example, the voice request "screen brighter brighter brighter" may be labeled with the intent "brighten screen". The labeled data then undergo a quality check in which labeled data that do not match any preset intent are filtered out, leaving labeled data usable for training the intent model. For example, the voice request "door open" would be labeled with the intent "open the door", but no scale-adjustable component is used to adjust the door, so this voice request can be removed by the filtering.
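The two filtering passes described above can be sketched as follows. The filler list, the whitespace tokenization of the English glosses, and the intent names `screen_brightness_up`, `screen_brightness_down`, and `door_open` are hypothetical; detecting requests that are semantically unclear in a broader sense is outside what this simple check can do.

```python
FILLERS = {"ah", "oh"}  # modal particles mentioned in the text


def is_meaningful(utterance):
    """Keep an utterance unless it is empty or consists only of filler words."""
    tokens = utterance.lower().split()
    return bool(tokens) and any(token not in FILLERS for token in tokens)


def quality_check(labeled, preset_intents):
    """Keep only (text, intent) pairs whose intent is one of the preset intents."""
    return [(text, intent) for text, intent in labeled if intent in preset_intents]


raw = ["oh", "screen brighter brighter brighter", "ah oh", "door open"]
kept = [u for u in raw if is_meaningful(u)]
print(kept)  # ['screen brighter brighter brighter', 'door open']

# Hypothetical labels; "door_open" is not a preset intent, so it is dropped.
labeled = [("screen brighter brighter brighter", "screen_brightness_up"),
           ("door open", "door_open")]
print(quality_check(labeled, {"screen_brightness_up", "screen_brightness_down"}))
# [('screen brighter brighter brighter', 'screen_brightness_up')]
```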
During training, the labeled data usable for intent model training serve as the intent training data and are divided into an intent training set and an intent validation set. The split ratio can be set as required and is not limited here; for example, the intent training set may be 80% and the intent validation set 20%. The data in the intent training set are used to train the intent recognition model. Models such as BERT, ALBERT, XLNet, or RoBERTa may be used for model training.
For example, for an intent recognition model that has been set up, at least part of the data in the intent training set is first used to train the model, and at least part of the data in the intent validation set is then used to validate the accuracy of the trained model. If the validation accuracy does not reach the intent accuracy threshold, the model is trained again on at least another part of the intent training set, and the accuracy of the retrained model is validated again on another part of the intent validation set. Training and validation are repeated in this way until the validation accuracy reaches the intent accuracy threshold, at which point the intent recognition model is considered to meet the standard and its training is complete.
It should be noted that each item in the intent training set and the intent validation set is used only once. If the intent recognition model has gone through all the data in both sets without meeting the standard, more voice requests may be collected, again with the user's permission, and then filtered and labeled to obtain more intent training data for training the model, so as to ensure that the intent recognition model can accurately recognize the intent corresponding to an input voice request.
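The split and the train-then-validate loop can be outlined as below. This is a schematic sketch only: `train_step` and `evaluate` are caller-supplied placeholders for whatever model is used, the number of rounds and the chunking scheme are assumptions, and the only constraints taken from the text are the example 80/20 split, the accuracy threshold, and the rule that each item is used once.

```python
import random


def split(data, train_fraction=0.8, seed=0):
    """Shuffle and split data into (training set, validation set)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def chunks(items, n):
    """Split items into at most n consecutive chunks so each item is used once."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def train_until_threshold(train_set, val_set, train_step, evaluate, threshold, rounds):
    """Alternate training and validation; return True once accuracy meets the threshold.

    Returns False when the data are exhausted first, in which case the caller
    would collect and label more voice requests.
    """
    for train_chunk, val_chunk in zip(chunks(train_set, rounds), chunks(val_set, rounds)):
        train_step(train_chunk)
        if evaluate(val_chunk) >= threshold:
            return True
    return False


# Toy stand-in for a model: accuracy grows with the number of examples seen.
seen = {"count": 0}
train_set, val_set = split(range(100))
ok = train_until_threshold(
    train_set, val_set,
    train_step=lambda batch: seen.update(count=seen["count"] + len(batch)),
    evaluate=lambda batch: seen["count"] / 80,
    threshold=0.5, rounds=4,
)
print(ok, seen["count"])  # True 40
```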
It will be understood that the intent recognition model may be trained offline. After the offline-trained model is deployed to a server or a vehicle, the server or vehicle can use it to perform intent recognition on received voice requests.
The voice interaction method includes:
007: training on precision training data to obtain the precision recognition model, the precision training data being related to the vehicle components, the adjustable ranges of the vehicle components, and the scale adjustment precision ranges of the vehicle components.
The voice interaction apparatus 10 includes a precision training module 105.
Step 007 may be implemented by the precision training module 105. That is, the precision training module 105 is configured to train on precision training data to obtain the precision recognition model, the precision training data being related to the vehicle components, the adjustable ranges of the vehicle components, and the scale adjustment precision ranges of the vehicle components.
In this way, the present application uses machine learning: the precision recognition model is trained on training data corresponding to the scale-adjustable vehicle components, their adjustable ranges, and their scale adjustment precision ranges, and is then used to perform precision recognition on voice requests, so that the scale adjustment precision requested by the user is recognized accurately.
Here, the statement that the precision training data relate to the scale-adjustable vehicle components and their adjustable ranges means that the precision training data cover all vehicle components in the vehicle that support scale adjustment, for example the "volume knob", "screen brightness button", "air-conditioner airflow knob/button", and "seat adjustment knob/button". The adjustable range of a vehicle component corresponds to the range of scale adjustment available by operating that component. Depending on the component, the adjustable range may be expressed in levels or as a measurement range, and the scale adjustment precision range may be the scale value of each adjustment.
For the precision training data, provided the relevant user permissions have been obtained, a certain number of historical user voice requests may be collected and lightly filtered to obtain voice requests that are semantically clear and have a specific purpose. Specifically, the filtering removes voice requests whose meaning is plainly unclear, as well as short requests consisting only of modal particles such as "ah" or "oh", leaving requests that are semantically clear and have a specific purpose. The historical user voice requests collected for precision training may be the same as those collected for intent training, and the filtering step applied to them may be the same as the one used for intent training.
The filtered voice requests are then manually labeled with the scale adjustment precision value that the user wants. For example, for the voice request "屏幕亮亮亮" ("screen bright bright bright"), the label is a scale adjustment precision value of 3 for adjusting the in-vehicle screen brightness. A precision recognition model is then built on the basis of slot extraction; usable slot extraction algorithms include RNN slot filling, CRF, and the like. The labeled data serves as the precision training data and is split into a precision training set and a precision validation set. The split ratio can be set as required and is not limited here; for example, 80% for the precision training set and 20% for the precision validation set.
The precision recognition model is trained with data from the precision training set. At least part of the data in the precision training set is first used to train the model, and at least part of the data in the precision validation set is then used to verify the accuracy of the trained model. If the verified accuracy does not reach the precision accuracy threshold, the model is trained again with at least another part of the precision training set and verified again with another part of the precision validation set. Training and verification are repeated in this way until the verified accuracy reaches the precision accuracy threshold, at which point the precision recognition model is considered to meet the standard and its training is complete.
It should be noted that each item of data in the precision training set and the precision validation set is used only once. If the precision recognition model has traversed all data in both sets and still fails to meet the standard, more voice information can be collected, again with the user's permission, and then filtered and labeled to obtain more precision training data for training the model, so that the precision recognition model can accurately recognize the scale adjustment precision corresponding to an input voice request.
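The split-train-verify procedure described above can be sketched as follows. This is only an illustrative sketch: the names `split_dataset`, `chunks`, `train_until_threshold`, `train_step`, `evaluate`, and `n_rounds` are hypothetical, the model itself is abstracted behind two callbacks, and splitting the data into equal consecutive parts is an assumption made so that each item is used only once.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]  # (request text, labeled scale adjustment precision value)


def split_dataset(samples: Sequence, train_ratio: float = 0.8) -> Tuple[list, list]:
    """Split labeled data into a training set and a validation set (80/20 by default)."""
    cut = int(len(samples) * train_ratio)
    return list(samples[:cut]), list(samples[cut:])


def chunks(items: Sequence, n_chunks: int) -> List[list]:
    """Cut items into consecutive, non-overlapping parts so each item is used once."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def train_until_threshold(
    train_set: Sequence,
    val_set: Sequence,
    train_step: Callable[[list], None],   # hypothetical: trains the model on one part
    evaluate: Callable[[list], float],    # hypothetical: returns accuracy on one part
    accuracy_threshold: float,
    n_rounds: int = 4,
) -> bool:
    """Alternate training and verification until the accuracy threshold is reached.

    Returns False when all parts are exhausted without reaching the threshold,
    which is the case where more data would have to be collected.
    """
    for train_part, val_part in zip(chunks(train_set, n_rounds), chunks(val_set, n_rounds)):
        train_step(train_part)
        if evaluate(val_part) >= accuracy_threshold:
            return True
    return False
```

A `False` return corresponds to the situation in which all data has been traversed without meeting the standard, so that further voice requests need to be collected, filtered, and labeled.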
In this way, the precision recognition model pre-trained on the precision training data can perform precision recognition on the text to be recognized, identify the adjustment precision for a given vehicle component, produce a precision recognition result, and finally determine the target scale adjustment precision value.
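To make concrete what the labeled precision value represents, the following sketch counts how often the final character of a request is repeated. It is a simple rule-based stand-in chosen for illustration, not the learned slot-extraction model (RNN slot filling, CRF) that the application describes, and the function name `count_trailing_repeats` is hypothetical.

```python
def count_trailing_repeats(text: str) -> int:
    """Count how many times the last character is repeated at the end of text."""
    if not text:
        return 0
    last = text[-1]
    count = 0
    for ch in reversed(text):
        if ch != last:
            break
        count += 1
    return count


# The example request from the description: the character "亮" appears three times.
print(count_trailing_repeats("屏幕亮亮亮"))  # 3
```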
Referring to FIG. 12, there are multiple preset intents, and step 04 includes:
041: obtaining, from the result of the intent recognition, the intent discrimination probability of each preset intent;
042: determining a preset intent whose intent discrimination probability is greater than a first probability threshold as the target intent corresponding to the voice request.
With reference to FIG. 13, the first determination module 14 may further include a first acquisition unit 141 and an intent determination unit 142.
Step 041 may be implemented by the first acquisition unit 141, and step 042 may be implemented by the intent determination unit 142. That is, the first acquisition unit 141 is configured to obtain, from the result of the intent recognition, the intent discrimination probability of each preset intent; the intent determination unit 142 is configured to determine a preset intent whose intent discrimination probability is greater than the first probability threshold as the target intent corresponding to the voice request.
The trained model performs intent recognition on the text to be recognized to obtain the intent recognition result. The result includes the probability that the text to be recognized matches each preset intent, that is, multiple intent discrimination probabilities. If the first probability threshold is 0.9 and the intent discrimination probability of a preset intent of some category exceeds 0.9, the server takes the preset intent of that category as the target intent of the current user's voice request. The first probability threshold may also take other values; it may be a default value or may be set by the user as needed, and is not limited here.
The preset intents of the present application may include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen dimmer, screen slide up, screen slide down, instrument panel brighter, instrument panel dimmer, ambient light brighter, ambient light dimmer, seat forward, seat backward, seat up, seat down, seat back forward, seat back backward, window up, and window down.
It should be understood that the preset intents in the present application are merely illustrative; for any object in the vehicle that can be scale-adjusted, a corresponding preset intent can be set according to its actual operation.
In this way, multiple preset intents can be defined according to the specific conditions of the vehicle, covering the voice interaction scenarios that may be encountered.
Step 04 further includes:
043: when none of the intent discrimination probabilities of the preset intents is greater than the first probability threshold, determining that the intent of the voice request is a non-scale-adjustment intent.
Step 043 may be implemented by the intent determination unit 142. That is, the intent determination unit 142 is configured to determine that the intent of the voice request is a non-scale-adjustment intent when none of the intent discrimination probabilities of the preset intents is greater than the first probability threshold.
For example, when the discrimination probabilities of the preset intents of all categories are not greater than the first probability threshold (for example 0.9), that is, when the probability that the intent recognized from the voice request matches any of the preset intents is low, the intent of the voice request is determined to be a non-scale-adjustment intent. A non-scale-adjustment intent is a user intent that does not adjust a preset vehicle function through a scale-adjustable vehicle component. For example, if the user's voice request is "车门开开开" ("door open open open"), the door cannot be adjusted by a vehicle component with a scale, so the intent of this request is a non-scale-adjustment intent.
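Steps 041 to 043 can be sketched as a threshold check over the intent discrimination probabilities. The function name, the intent labels, and the `NON_SCALE_ADJUSTMENT` marker are hypothetical; taking the highest-probability intent when checking against the threshold is an assumption, since the application only requires that the selected intent exceed the first probability threshold.

```python
from typing import Dict

NON_SCALE_ADJUSTMENT = "non_scale_adjustment"  # hypothetical marker for step 043


def select_target_intent(probs: Dict[str, float], first_threshold: float = 0.9) -> str:
    """Return the preset intent whose probability exceeds the threshold, if any."""
    if probs:
        best_intent = max(probs, key=probs.get)
        if probs[best_intent] > first_threshold:  # strictly greater, as in step 042
            return best_intent
    # Step 043: no preset intent is above the threshold.
    return NON_SCALE_ADJUSTMENT
```

With a threshold of 0.9, a result such as `{"volume_up": 0.95, "volume_down": 0.03}` selects `"volume_up"`, whereas a request like "车门开开开" that matches no preset intent strongly falls through to the non-scale-adjustment case.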
Referring to FIG. 14, step 04 further includes:
044: obtaining, from the result of the precision recognition, the precision discrimination probability of each preset scale adjustment precision value;
045: determining a preset scale adjustment precision value whose precision discrimination probability is greater than a second probability threshold as the target scale adjustment precision value corresponding to the voice request.
With reference to FIG. 15, the first determination module 14 includes a second acquisition unit 143 and a precision determination unit 144.
Step 044 may be implemented by the second acquisition unit 143, and step 045 may be implemented by the precision determination unit 144. That is, the second acquisition unit 143 is configured to obtain, from the result of the precision recognition, the precision discrimination probability of each preset scale adjustment precision value; the precision determination unit 144 is configured to determine a preset scale adjustment precision value whose precision discrimination probability is greater than the second probability threshold as the target scale adjustment precision value corresponding to the voice request.
The precision discrimination probability is the probability that the precision recognized for the voice request matches a given preset scale adjustment precision value. The second probability threshold may be, for example, 0.7, 0.8, 0.9, or another value, and is not limited here.
For example, for the voice request "音量大大大大大" ("volume loud loud loud loud loud"), if the precision discrimination probability is 1 and the second probability threshold is 0.9, the probability exceeds the threshold, and the target scale adjustment precision value for the volume adjustment is determined to be 5.
Step 04 further includes:
046: when none of the precision discrimination probabilities of the preset scale adjustment precision values is greater than the second probability threshold, determining that the precision recognition of the voice request is erroneous.
Step 046 may be implemented by the precision determination unit 144. That is, the precision determination unit 144 is configured to determine that the precision recognition of the voice request is erroneous when none of the precision discrimination probabilities of the preset scale adjustment precision values is greater than the second probability threshold.
When none of the precision discrimination probabilities of the preset scale adjustment precision values is greater than the second probability threshold, the precision of the input voice request has been recognized incorrectly; this makes it possible to exclude voice requests that are unrelated to scale adjustment precision.
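Steps 044 to 046 follow the same pattern for the precision values and can be sketched as below. The function name is hypothetical, and returning `None` to signal an erroneous precision recognition is an assumption; the application does not specify how the error is represented.

```python
from typing import Dict, Optional


def select_target_precision(
    probs: Dict[int, float], second_threshold: float = 0.9
) -> Optional[int]:
    """Return the preset precision value whose probability exceeds the threshold.

    None stands for step 046: the precision recognition is treated as erroneous.
    """
    candidates = [value for value, p in probs.items() if p > second_threshold]
    if not candidates:
        return None
    # If several values pass (possible only for low thresholds), keep the most probable.
    return max(candidates, key=lambda value: probs[value])
```

For the request "音量大大大大大", a result such as `{5: 1.0}` with a threshold of 0.9 yields the target scale adjustment precision value 5.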
Referring to FIG. 16, step 05 includes:
051: determining the default value according to the target intent and an intent-to-default-value mapping table;
052: modifying the default value according to the target scale adjustment precision value.
Referring to FIG. 17, the modification module 15 includes a default value determination unit 151 and a modification unit 152.
Step 051 may be implemented by the default value determination unit 151, and step 052 may be implemented by the modification unit 152. That is, the default value determination unit 151 is configured to determine the default value according to the target intent and the intent-to-default-value mapping table; the modification unit 152 is configured to modify the default value according to the target scale adjustment precision value.
The default value is determined from the target intent and the intent-to-default-value mapping table. That is, if the target intent of the user's voice request "音量大大大" ("volume loud loud loud") is to turn the volume up, the mapping table may give a default value of 3, meaning that when the voice request simulates the vehicle component to adjust the volume, each adjustment is 3 scale units.
Precision recognition on the user's voice request "音量大大大" may yield a target scale adjustment precision value of 3. The default value is then modified according to this value to 3 × 3 = 9; that is, after modification, the scale value adjusted in response to the request "音量大大大" is 9. A control instruction is then generated from the target intent and the modified default value, so that, while the conventional voice request logic is integrated, the scale of the vehicle component corresponding to the voice request is adjusted precisely according to the user's concise voice request.
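The worked example above (default value 3, precision value 3, result 9) can be sketched as a table lookup followed by a multiplication. The table contents, the `ControlInstruction` type, and the function name are hypothetical; the multiplication rule is taken from the 3 × 3 = 9 example and may not be the only way the default value can be modified.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical intent-to-default-value mapping table; only "volume_up" -> 3
# mirrors the example in the description.
INTENT_DEFAULTS: Mapping[str, int] = {"volume_up": 3, "volume_down": 3}


@dataclass(frozen=True)
class ControlInstruction:
    intent: str        # target intent, e.g. "volume_up"
    scale_value: int   # modified default value, in scale units


def build_instruction(
    target_intent: str,
    target_precision: int,
    defaults: Mapping[str, int] = INTENT_DEFAULTS,
) -> ControlInstruction:
    """Step 051: look up the default value; step 052: scale it by the precision value."""
    default_value = defaults[target_intent]
    return ControlInstruction(target_intent, default_value * target_precision)


print(build_instruction("volume_up", 3))
# ControlInstruction(intent='volume_up', scale_value=9)
```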
Referring to FIG. 18, the present application further provides a server 20. The server 20 includes a processor 21 and a memory 22. A computer program 221 is stored in the memory 22, and when the computer program 221 is executed by the processor 21, the voice interaction method of any one of the above embodiments is implemented.
The server 20 of the present application first performs speech recognition on a voice request for adjusting a preset vehicle function to obtain the text to be recognized, the preset function being a function that performs scale adjustment by simulating the operation of a vehicle component. It then performs intent recognition on the text to be recognized using the intent recognition model and precision recognition using the precision recognition model, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value. In this way, while the conventional voice request logic is integrated, the scale of the vehicle component corresponding to the voice request is adjusted precisely according to the user's concise voice request, which improves the user experience.
Referring to FIG. 19, the present application further provides a non-volatile computer-readable storage medium 30 containing a computer program. When the computer program 31 is executed by one or more processors 40, the voice interaction method of any one of the above embodiments is implemented.
For example, when the computer program 31 is executed by the processor 40, the following steps of the voice interaction method are implemented:
01: performing speech recognition on a voice request for adjusting a preset vehicle function to obtain the text to be recognized, the preset function being a function that performs scale adjustment by simulating the operation of a vehicle component;
02: performing intent recognition on the text to be recognized using the intent recognition model;
03: performing precision recognition on the text to be recognized using the precision recognition model;
04: determining the target intent and the target scale adjustment precision value corresponding to the voice request according to the result of the intent recognition and the result of the precision recognition;
05: determining the target intent corresponding to the voice request according to the result of the intent recognition, and determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;
06: fusing the target intent and the modified default value to generate a control instruction for controlling the corresponding vehicle component.
It can be understood that the computer program 31 includes computer program code. The computer program code may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable storage medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), a software distribution medium, and the like.
The computer-readable storage medium 30 of the present application first performs speech recognition on a voice request for adjusting a preset vehicle function to obtain the text to be recognized, the preset function being a function that performs scale adjustment by simulating the operation of a vehicle component. It then performs intent recognition on the text to be recognized using the intent recognition model and precision recognition using the precision recognition model, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value. In this way, while the conventional voice request logic is integrated, the scale of the vehicle component corresponding to the voice request is adjusted precisely according to the user's concise voice request, which improves the user experience.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technology found in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

  1. A voice interaction method, characterized by comprising:
    performing speech recognition on a voice request for adjusting a preset vehicle function to obtain text to be recognized, the preset function being a function that performs scale adjustment by simulating the operation of a vehicle component;
    performing intent recognition on the text to be recognized using an intent recognition model;
    performing precision recognition on the text to be recognized using a precision recognition model;
    determining a target intent corresponding to the voice request according to the result of the intent recognition, and determining a target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;
    modifying a default value according to the target intent and the target scale adjustment precision value, the default value being the adjustment value corresponding to the target intent in a preset voice request;
    fusing the target intent and the modified default value to generate a control instruction for controlling the corresponding vehicle component.
  2. The voice interaction method according to claim 1, characterized in that the voice interaction method comprises:
    obtaining the intent recognition model by training with intent training data, the intent training data being related to the vehicle component and the adjustable range of the vehicle component.
  3. The voice interaction method according to claim 1, characterized in that the voice interaction method comprises:
    obtaining the precision recognition model by training with precision training data, the precision training data being related to the vehicle component, the adjustable range of the vehicle component, and the scale adjustment precision range of the vehicle component.
  4. The voice interaction method according to claim 1, characterized in that the voice interaction method comprises: determining a control range and a non-control range of the vehicle components.
  5. The voice interaction method according to claim 4, characterized in that the voice interaction method comprises: determining a default adjustment range of each vehicle component.
  6. The voice interaction method according to claim 5, characterized in that the voice interaction method comprises:
    determining the adjustable range of the vehicle component;
    correcting the intent of the preset voice request according to the adjustable range of the vehicle component.
  7. The voice interaction method according to claim 6, characterized in that the voice interaction method comprises:
    mapping the control range and the adjustable range to preset intents and corresponding preset scale adjustment precision values.
  8. The voice interaction method according to claim 7, characterized in that the voice interaction method comprises:
    establishing an intent-to-default-value mapping table according to the preset intents and the default adjustment range.
  9. The voice interaction method according to claim 8, characterized in that modifying the default value according to the target intent and the target scale adjustment precision value comprises:
    determining the default value according to the target intent and the intent-to-default-value mapping table;
    modifying the default value according to the target scale adjustment precision value.
  10. The voice interaction method according to claim 8, characterized in that mapping the control range and the adjustable range to preset intents and corresponding preset scale adjustment precision values comprises:
    mapping each adjustable range within the control range to one preset intent, each preset intent corresponding to multiple preset scale adjustment precision values.
  11. The voice interaction method according to claim 10, characterized in that mapping the control range and the adjustable range to preset intents and corresponding preset scale adjustment precision values comprises:
    setting a condensed word as a slot, and performing slot extraction on the preset recognition text corresponding to the vehicle component to obtain a repeated field;
    counting the repetitions of the slot value of the repeated field to obtain a repetition count;
    mapping the repetition count to the preset scale adjustment precision value according to the adjustable range of the condensed word.
  12. The voice interaction method according to claim 11, characterized in that there are multiple preset intents, and determining the target intent corresponding to the voice request according to the result of the intent recognition comprises:
    obtaining, from the result of the intent recognition, the intent discrimination probability of each preset intent;
    determining a preset intent whose intent discrimination probability is greater than a first probability threshold as the target intent corresponding to the voice request.
  13. The voice interaction method according to claim 12, characterized in that the preset intents include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen dimmer, screen slide up, screen slide down, instrument panel brighter, instrument panel dimmer, ambient light brighter, ambient light dimmer, seat forward, seat backward, seat up, seat down, seat back forward, seat back backward, window up, and window down.
  14. The voice interaction method according to claim 12, characterized in that determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition comprises:
    obtaining, from the result of the precision recognition, the precision discrimination probability of each preset scale adjustment precision value;
    determining a preset scale adjustment precision value whose precision discrimination probability is greater than a second probability threshold as the target scale adjustment precision value corresponding to the voice request.
  15. A voice interaction apparatus, characterized in that the voice interaction apparatus comprises:
    a speech recognition module, configured to perform speech recognition on a voice request for adjusting a preset vehicle function to obtain text to be recognized, the preset function being a function that performs scale adjustment by simulating the operation of a vehicle component;
    an intent recognition module, configured to perform intent recognition on the text to be recognized using an intent recognition model;
    a precision recognition module, configured to perform precision recognition on the text to be recognized using a precision recognition model;
    a determination module, configured to determine a target intent corresponding to the voice request according to the result of the intent recognition, and to determine a target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;
    a modification module, configured to modify a default value according to the target intent and the target scale adjustment precision value, the default value being the adjustment value corresponding to the target intent in a preset voice request;
    an instruction generation module, configured to fuse the target intent and the modified default value to generate a control instruction for controlling the corresponding vehicle component.
  16. A server, characterized in that the server comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the voice interaction method according to any one of claims 1-14.
  17. A non-volatile computer-readable storage medium containing a computer program, characterized in that, when the computer program is executed by one or more processors, the voice interaction method according to any one of claims 1-14 is implemented.
PCT/CN2022/138930 2021-12-24 2022-12-14 Voice interaction method and apparatus, server, and readable storage medium WO2023116523A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111593401.9A CN113990298B (en) 2021-12-24 2021-12-24 Voice interaction method and device, server and readable storage medium
CN202111593401.9 2021-12-24

Publications (1)

Publication Number Publication Date
WO2023116523A1 true WO2023116523A1 (en) 2023-06-29

Family

ID=80081347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138930 WO2023116523A1 (en) 2021-12-24 2022-12-14 Voice interaction method and apparatus, server, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113990298B (en)
WO (1) WO2023116523A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990298B (en) * 2021-12-24 2022-05-13 广州小鹏汽车科技有限公司 Voice interaction method and device, server and readable storage medium
CN115268324A (en) * 2022-07-25 2022-11-01 青岛海尔科技有限公司 Instruction correction method and apparatus, storage medium, and electronic apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020036766A1 (en) * 2018-08-14 2020-02-20 Reading Research Associates, Inc. Methods and systems for improving mastery of phonics skills
CN112185369A (en) * 2019-07-04 2021-01-05 百度在线网络技术(北京)有限公司 Volume adjusting method, device, equipment and medium based on voice control
CN113220839A (en) * 2021-05-13 2021-08-06 湖北亿咖通科技有限公司 Intention identification method, electronic equipment and computer readable storage medium
CN113990298A (en) * 2021-12-24 2022-01-28 广州小鹏汽车科技有限公司 Voice interaction method and device, server and readable storage medium thereof

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7610011B2 (en) * 2004-09-19 2009-10-27 Adam Albrett Providing alternative programming on a radio in response to user input
CN102087782A (en) * 2010-11-29 2011-06-08 青岛海信信芯科技有限公司 Method for enabling remote controller to transmit wireless signals continuously and remote controller
CN102831894B (en) * 2012-08-09 2014-07-09 华为终端有限公司 Command processing method, command processing device and command processing system
US10074367B2 (en) * 2014-03-28 2018-09-11 Panasonic Intellectual Property Management Co., Ltd. Voice command input device and voice command input method
CN103941686B (en) * 2014-04-14 2017-06-13 广东美的制冷设备有限公司 Sound control method and system
CN105578274A (en) * 2015-12-23 2016-05-11 Tcl集团股份有限公司 Smart television volume adjusting method and apparatus
JP6767796B2 (en) * 2016-07-08 2020-10-14 株式会社日立情報通信エンジニアリング Call management system and its voice recognition control method
CN107672547B (en) * 2017-10-10 2020-09-18 新昌县捷庭科技有限公司 New energy automobile voice control method and device, mobile terminal and storage medium
CN108040171A (en) * 2017-11-30 2018-05-15 北京小米移动软件有限公司 Voice operating method, apparatus and computer-readable recording medium
CN109920427A (en) * 2019-04-23 2019-06-21 上海天诚通信技术股份有限公司 Volume adjusting method based on voice control
CN110047486A (en) * 2019-05-20 2019-07-23 合肥美的电冰箱有限公司 Sound control method, device, server, system and storage medium
CN110265015A (en) * 2019-06-24 2019-09-20 付金龙 A kind of method, system and translator by voice control volume
US20230093165A1 (en) * 2020-03-23 2023-03-23 Sony Group Corporation Information processing apparatus, information processing method, and program
CN112786039B (en) * 2020-10-21 2022-12-13 青岛经济技术开发区海尔热水器有限公司 Voice control method and device, electronic equipment and storage medium
CN113990299B (en) * 2021-12-24 2022-05-13 广州小鹏汽车科技有限公司 Voice interaction method and device, server and readable storage medium thereof

Also Published As

Publication number Publication date
CN113990298B (en) 2022-05-13
CN113990298A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
WO2023116523A1 (en) Voice interaction method and apparatus, server, and readable storage medium
WO2023116500A1 (en) Speech interaction method and apparatus, server, and readable storage medium
WO2023124957A1 (en) Voice interaction method and apparatus, and server and readable storage medium
WO2023125002A1 (en) Voice interaction method and apparatus, model training method, vehicle and storage medium
CN109583501A (en) Picture classification, the generation method of Classification and Identification model, device, equipment and medium
WO2022057152A1 (en) Voice interaction method, server, and computer-readable storage medium
KR20120012919A (en) Apparatus for voice command recognition and method thereof
US20200210770A1 (en) Image realism predictor
CN113111968B (en) Image recognition model training method, device, electronic equipment and readable storage medium
WO2023130951A1 (en) Speech sentence segmentation method and apparatus, electronic device, and storage medium
CN116028821B (en) Pre-training model training method integrating domain knowledge and data processing method
CN114049894A (en) Voice interaction method and device, vehicle and storage medium
CN110309727B (en) Building identification model establishing method, building identification method and building identification device
CN114360518A (en) Voice interaction method and device, server and readable storage medium thereof
CN114299929A (en) Voice interaction method and device, server and storage medium
CN115994225B (en) Text classification method and device, storage medium and electronic equipment
CN116705018A (en) Voice control method, voice control device, electronic equipment and readable storage medium
CN115904075A (en) Vehicle configuration improvement method, system, device and storage medium
CN114005448A (en) Voice interaction method and device, model training method, vehicle and storage medium
CN115512696A (en) Simulation training method and vehicle
CN114360519A (en) Voice interaction method and device, server and readable storage medium thereof
CN111179284B (en) Interactive image segmentation method, system and terminal
CN114341867B (en) Translation method, translation device, translation client, translation server and translation storage medium
CN114048296A (en) Semantic gate-based chatting type multi-round conversation method, system, medium and equipment
CN114299931A (en) Voice interaction method and device, server and readable storage medium thereof

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22909837

Country of ref document: EP

Kind code of ref document: A1