WO2023116523A1

WO2023116523A1 - Voice interaction method and apparatus, server, and readable storage medium

Info

Publication number: WO2023116523A1
Application number: PCT/CN2022/138930
Authority: WO
Inventors: 王亭玉; 张天宇; 宁洪珂; 潘晓彤; 赵恒艺; 赵群; 樊骏锋
Original assignee: 广州小鹏汽车科技有限公司
Priority date: 2021-12-24
Filing date: 2022-12-14
Publication date: 2023-06-29
Also published as: CN113990298B; CN113990298A

Abstract

The present application discloses a voice interaction method and apparatus, a server, and a readable storage medium. The voice interaction method comprises: performing voice recognition on a voice request for adjusting a preset function of a vehicle to obtain text to be recognized, the preset function referring to a function of simulating an operation on a vehicle part for scale adjustment; using an intention recognition model to perform intention recognition on said text; using a precision recognition model to perform precision recognition on said text; determining, according to the intention recognition result and the precision recognition result, a target intention and a target scale adjustment precision value corresponding to the voice request; modifying a default value according to the target intention and the target scale adjustment precision value, the default value being an adjustment value corresponding to the target intention in a preset voice request; and fusing the target intention and the modified default value to generate a control instruction so as to control the corresponding vehicle part. In the present application, the scale of the vehicle part corresponding to the voice request can be accurately adjusted according to the simplified voice request of a user, and user experience is improved.

Description

Voice interaction method and its device, server and readable storage medium

This application claims priority to the Chinese patent application No. 202111593401.9, entitled "Voice Interaction Method and Its Device, Server and Readable Storage Medium", filed with the State Intellectual Property Office on December 24, 2021, the entire content of which is incorporated into this application by reference.

Technical field

The present application relates to the field of voice technology, in particular to a voice interaction method and its device, server and readable storage medium.

Background

At present, in smart-car scenarios, voice interaction allows the user to control vehicle components and equipment.

Users often want to issue simplified instructions. In the logic of the current technical solution, a voice request such as "great volume" is recognized as the intention "system_volume_up". The default value corresponding to this intention is 1 gear (corresponding, for example, to 3 small scales), so the vehicle executes the command "increase the volume by 1 gear". This is the same execution logic used for non-precision voice requests such as "increase the volume" and "make the volume louder", and it clearly does not match the user's expectation of an increase of three gears. Likewise, a simplified instruction in which "large" is repeated many times, such as "Volume is big, big, big, big", is recognized as the intention "system_volume_max". The default value corresponding to this intention is the highest gear (maximum scale), so the vehicle executes the command "set the volume to the maximum scale", which clearly does not match the 7 gears that the user expects.

With the current technical solution, neither of these two kinds of voice request leads to a precise control instruction that matches the shortened-word request issued by the user, and the user experience is poor.

Summary of the invention

In order to solve or partly solve the problems existing in related technologies, the present application provides a voice interaction method and its device, server and readable storage medium.

The present application provides a voice interaction method. The voice interaction method includes: performing voice recognition on a voice request for adjusting a preset function of a vehicle to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment; using an intent recognition model to perform intent recognition on the text to be recognized; using an accuracy recognition model to perform accuracy recognition on the text to be recognized; determining the target intent corresponding to the voice request according to the result of the intent recognition, and determining the target scale adjustment accuracy value corresponding to the voice request according to the result of the accuracy recognition; modifying a default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in a preset voice request; and fusing the target intent and the modified default value to generate a control instruction to control the corresponding vehicle component.

In this way, the voice interaction method of the present application first performs voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment. It then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform precision recognition on the text to be recognized, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value accordingly. While integrating with the traditional voice-request logic, the method thus accurately adjusts the scale of the vehicle component corresponding to the user's simplified voice request and improves the user experience.

The voice interaction method includes: obtaining the intention recognition model by training on intention training data, the intention training data being related to the vehicle components and the adjustable ranges of the vehicle components.

In this way, the voice interaction method of the present application can obtain an intention recognition model through training the intention training data, and perform intention recognition according to the intention recognition model, so as to accurately recognize the intention of the user instruction.

The voice interaction method includes: obtaining the precision recognition model by training on precision training data, the precision training data being related to the vehicle parts, the adjustable ranges of the vehicle parts, and the scale adjustment accuracy ranges of the vehicle parts.

In this way, by performing precision recognition on the text to be recognized according to the precision recognition model, the scale adjustment precision corresponding to the voice request can be determined.

The voice interaction method includes: determining the control range and non-control range of the vehicle components.

In this way, the functions that can be scale-adjusted through vehicle components are confirmed, thereby determining the control range of the vehicle components, that is, the range that can be scale-adjusted through voice interaction.

The voice interaction method includes: determining a default adjustment range of each of the vehicle components.

In this way, the voice interaction method of the present application can determine the default adjustment range of each vehicle component, thereby laying a foundation for realizing precise adjustment of vehicle components.

The voice interaction method includes: determining the adjustable range of the vehicle component; and correcting the intention of the preset voice request according to the adjustable range of the vehicle component.

In this way, after determining the adjustable range of the vehicle component, the voice interaction method of the present application can correct the intention of the preset voice request according to that adjustable range, so that user instructions result in genuinely precise adjustment.

The voice interaction method includes: mapping the control range and the adjustable range to preset intentions and corresponding preset scale adjustment accuracy values.

In this way, the voice interaction method of the present application can map the control range and adjustable range to preset intentions and corresponding preset scale adjustment accuracy values, so as to achieve precise adjustment of vehicle component accuracy.

The voice interaction method includes: establishing an intent-default value mapping table according to the preset intent and the default adjustment range.

In this way, the present application establishes a mapping table of intentions and default values, so that the intentions of the voice request can be in one-to-one correspondence with the default values, which facilitates subsequent modification of the default values.

The modifying the default value according to the target intent and the target scale adjustment accuracy value includes: determining the default value according to the target intent and the intent-default value mapping table; and modifying the default value according to the target scale adjustment accuracy value.

In this way, the present application can determine the default value according to the target intention and the intention-default value mapping table, thereby modifying the default value according to the target scale adjustment accuracy value, so as to achieve the effect of correcting the user's intention to precisely adjust the scale of the vehicle parts.
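As a minimal sketch of this lookup-and-modify step, the following Python snippet assumes that the intent-default value mapping table is a small dictionary and that the default value is multiplied by the target scale adjustment accuracy value, following the multiplication used in the volume example of the detailed description. The table entries, the intent name "system_volume_down" and the function name `modify_default_value` are illustrative assumptions and are not taken from the application.

```python
# Hypothetical intent-default value mapping table; the entries are illustrative only.
INTENT_DEFAULTS = {
    "system_volume_up": 3,    # one default adjustment = 3 small scales
    "system_volume_down": 3,  # assumed name for the opposite intent
}


def modify_default_value(target_intent: str, precision_value: int) -> int:
    """Look up the intent's default value and scale it by the precision value."""
    default_value = INTENT_DEFAULTS[target_intent]
    return precision_value * default_value


# Three adjustments of 3 scales each.
print(modify_default_value("system_volume_up", 3))
```

Keeping the table separate from the modification step means that a request without any precision information can still fall back to the unmodified default value by passing a precision value of 1.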

The mapping of the control range and the adjustable range to a preset intention and a corresponding preset scale adjustment accuracy value includes: mapping each of the adjustable ranges in the control range to one of the preset intentions, each preset intention corresponding to a plurality of preset scale adjustment accuracy values.

In this way, during the voice interaction process, the voice requests for different adjustment scales of the same vehicle component all correspond to the same preset intention, thereby laying the foundation for subsequent identification of the scale that the user intends to adjust.

The mapping of the control range and the adjustable range to the preset intent and the corresponding preset scale adjustment accuracy value includes: setting the simplified words as slots, and performing slot extraction on the preset recognition text corresponding to the vehicle parts to obtain repeated fields; counting the repetitions of the slot values of the repeated fields to obtain the repetition count; and mapping the repetition count, according to the adjustable range of the simplified word, to the preset scale adjustment accuracy value corresponding to the preset intention.

In this way, the repetitions of the slot values of the extracted repeated fields can be counted to obtain the repetition count, and the repetition count can be mapped to the preset intention and the preset scale adjustment accuracy value, so that the vehicle parts are adjusted to the scale the user requires according to the simplified words.
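The slot extraction and repetition counting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slot vocabulary `SLOT_WORDS`, the whitespace-and-punctuation tokenization, and the rule that caps the repetition count at a maximum number of adjustments are all hypothetical choices, not details given in the application.

```python
import re
from typing import Optional, Tuple

# Hypothetical slot vocabulary of simplified adjustment words.
SLOT_WORDS = ("bright", "dark", "big", "small")


def count_repetitions(text: str) -> Optional[Tuple[str, int]]:
    """Return (slot word, repetition count) for the longest run of a repeated slot word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    best = None
    i = 0
    while i < len(tokens):
        j = i
        # Advance j to the end of the run of identical tokens starting at i.
        while j < len(tokens) and tokens[j] == tokens[i]:
            j += 1
        if tokens[i] in SLOT_WORDS and (best is None or j - i > best[1]):
            best = (tokens[i], j - i)
        i = j
    return best


def to_precision_value(repetitions: int, max_adjustments: int) -> int:
    """Map a repetition count to a precision value, capped by an assumed adjustable range."""
    return min(repetitions, max_adjustments)


print(count_repetitions("the screen is bright bright"))
print(count_repetitions("Volume is big, big, big, big"))
```

A request that contains no slot word yields `None`, which a caller could treat as a non-precision request handled by the default value alone.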

There are multiple preset intentions, and the determining the target intention corresponding to the voice request according to the result of the intention recognition includes: obtaining, from the result of the intention recognition, the intention discrimination probability corresponding to each preset intention; and determining one of the preset intentions whose intention discrimination probability is greater than the first probability threshold as the target intention corresponding to the voice request.

In this way, it is possible to obtain the intention discrimination probability corresponding to each preset intention from the result of intention recognition, and determine a preset intention whose intention discrimination probability is greater than the first probability threshold as the target intention corresponding to the voice request, thereby accurately identifying the user's intention to adjust vehicle components.

The preset intentions include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen darker, screen slide up, screen slide down, gauges brighter, gauges dimmer, ambient lights brighter, ambient lights dimmer, seat forward, seat back, seat up, seat down, seat backrest forward, seat backrest back, car window up, and car window down.

In this way, setting a variety of preset intentions can further lay a foundation for recognizing the user's voice interaction intention and improve possible voice interaction scenarios.

The determining the target scale adjustment accuracy value corresponding to the voice request according to the accuracy identification result includes: obtaining, from the accuracy identification result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value; and determining one of the preset scale adjustment accuracy values whose accuracy discrimination probability is greater than the second probability threshold as the target scale adjustment accuracy value corresponding to the voice request.

In this way, the voice interaction method of the present application can obtain the precision discrimination probability corresponding to each preset scale adjustment precision value of the precision recognition result, and determine the preset scale adjustment precision value whose precision discrimination probability is greater than the second probability threshold value as the target scale adjustment precision value, This allows for precise scale adjustments.
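The two threshold-based selections (the first probability threshold for intentions, the second for scale adjustment precision values) follow the same pattern and can be sketched with one helper. The application only requires that the chosen candidate exceed the threshold; picking the highest-probability candidate among them, the threshold values, and the example probabilities below are assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional


def pick_above_threshold(
    probabilities: Dict[Hashable, float], threshold: float
) -> Optional[Hashable]:
    """Return the most probable label if its probability exceeds the threshold, else None."""
    if not probabilities:
        return None
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    return label if prob > threshold else None


# Illustrative model outputs and thresholds (assumed values).
FIRST_THRESHOLD = 0.5
SECOND_THRESHOLD = 0.5
intent_probs = {"system_volume_up": 0.91, "system_volume_max": 0.09}
precision_probs = {1: 0.05, 2: 0.10, 3: 0.85}

target_intent = pick_above_threshold(intent_probs, FIRST_THRESHOLD)
target_precision = pick_above_threshold(precision_probs, SECOND_THRESHOLD)
print(target_intent, target_precision)
```

Returning `None` when nothing clears the threshold leaves room for a fallback, such as keeping the unmodified default value.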

The present application also provides a voice interaction device. The voice interaction device includes a voice recognition module, an intent recognition module, an accuracy recognition module, a determination module, a modification module and an instruction generation module. The voice recognition module is used to perform voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment; the intent recognition module is used to perform intent recognition on the text to be recognized using the intent recognition model; the accuracy recognition module is used to perform precision recognition on the text to be recognized using the precision recognition model; the determination module is used to determine the target intent corresponding to the voice request according to the result of the intent recognition, and to determine the target scale adjustment accuracy value corresponding to the voice request according to the result of the accuracy recognition; the modification module is used to modify the default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request; and the instruction generation module is used to fuse the target intent and the modified default value to generate a control instruction to control the corresponding vehicle component.

In this way, the voice interaction device of the present application first performs voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment. It then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform precision recognition on the text to be recognized, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value accordingly. While integrating with the traditional voice-request logic, the device thus accurately adjusts the scale of the vehicle component corresponding to the user's simplified voice request and improves the user experience.

The application also provides a server. The server includes a processor and a memory, and a computer program is stored in the memory. When the computer program is executed by the processor, the voice interaction method described in any one of the above-mentioned implementation manners is realized.

In this way, the server of the present application first performs voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment. It then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform precision recognition on the text to be recognized, identifies the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value accordingly. While integrating with the traditional voice-request logic, the server thus accurately adjusts the scale of the vehicle component corresponding to the user's simplified voice request and improves the user experience.

The present application also provides a non-volatile computer-readable storage medium containing the computer program. When the computer program is executed by one or more processors, the voice interaction method described in any one of the above implementation manners is realized.

In this way, the computer-readable storage medium of the present application makes it possible to perform voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment. The intent recognition model is then used to perform intent recognition on the text to be recognized and the accuracy recognition model is used to perform precision recognition on the text to be recognized, the target intent and the target scale adjustment precision value corresponding to the voice request are identified, and the default value is modified accordingly. While integrating with the traditional voice-request logic, this accurately adjusts the scale of the vehicle component corresponding to the user's simplified voice request and improves the user experience.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Brief description of the drawings

The above and other objects, features and advantages of the present application will become more apparent by describing the exemplary embodiments of the present application in more detail with reference to the accompanying drawings, wherein, in the exemplary embodiments of the present application, the same reference numerals generally represent same parts.

FIG. 1 is a first schematic flowchart of the voice interaction method of the present application;

FIG. 2 is a first schematic structural diagram of the voice interaction device of the present application;

FIG. 3 is a second schematic flowchart of the voice interaction method of the present application;

FIG. 4 is a second schematic structural diagram of the voice interaction device of the present application;

FIG. 5 is a third schematic flowchart of the voice interaction method of the present application;

FIG. 6 is a fourth schematic flowchart of the voice interaction method of the present application;

FIG. 7 is a fifth schematic flowchart of the voice interaction method of the present application;

FIG. 8 is a sixth schematic flowchart of the voice interaction method of the present application;

FIG. 9 is a third schematic structural diagram of the voice interaction device of the present application;

FIG. 10 is a seventh schematic flowchart of the voice interaction method of the present application;

FIG. 11 is a fourth schematic structural diagram of the voice interaction device of the present application;

FIG. 12 is an eighth schematic flowchart of the voice interaction method of the present application;

FIG. 13 is a first schematic structural diagram of the first determination module in the voice interaction device of the present application;

FIG. 14 is a ninth schematic flowchart of the voice interaction method of the present application;

FIG. 15 is a second schematic structural diagram of the first determination module in the voice interaction device of the present application;

FIG. 16 is a tenth schematic flowchart of the voice interaction method of the present application;

FIG. 17 is a schematic structural diagram of the modification module in the voice interaction device of the present application;

FIG. 18 is a schematic structural diagram of the server of the present application;

FIG. 19 is a schematic structural diagram of a computer-readable storage medium of the present application.

Detailed description

The present application is described in detail below, and examples of the present application are shown in the accompanying drawings, wherein the same or similar reference numerals represent the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, are only for explaining the present application, and should not be construed as limiting the present application.

Referring to FIG. 1, the present application provides a voice interaction method. The voice interaction method includes:

01: Perform voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment;

02: Use the intent recognition model to perform intent recognition on the text to be recognized;

03: Use the precision recognition model to perform precision recognition on the text to be recognized;

04: Determine the target intent corresponding to the voice request according to the result of intent recognition, and determine the target scale adjustment accuracy value corresponding to the voice request according to the result of precision recognition;

05: Modify the default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request;

06: Fuse the target intent with the modified default value to generate a control instruction to control the corresponding vehicle component.
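Steps 01 to 06 can be sketched end to end as follows. This is a rough illustration rather than the application's implementation: the speech recognizer and both models are passed in as stub callables, step 04 uses a simple argmax in place of the threshold-based selection, step 05 uses the multiplication from the volume example given later in this description, and the names `ControlInstruction` and `handle_voice_request` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ControlInstruction:
    intent: str      # target intent, e.g. "system_volume_up"
    adjustment: int  # modified default value, in scales


def handle_voice_request(
    audio: bytes,
    recognize_speech: Callable[[bytes], str],
    intent_model: Callable[[str], Dict[str, float]],
    precision_model: Callable[[str], Dict[int, float]],
    intent_defaults: Dict[str, int],
) -> ControlInstruction:
    text = recognize_speech(audio)           # step 01
    intent_probs = intent_model(text)        # step 02
    precision_probs = precision_model(text)  # step 03
    # Step 04: argmax is used here as a simple stand-in for the selection rule.
    target_intent = max(intent_probs, key=intent_probs.get)
    target_precision = max(precision_probs, key=precision_probs.get)
    # Step 05: scale the intent's default value by the precision value.
    modified_default = target_precision * intent_defaults[target_intent]
    # Step 06: fuse intent and modified default value into one instruction.
    return ControlInstruction(target_intent, modified_default)


# Stubbed components so the sketch runs without real models.
instruction = handle_voice_request(
    audio=b"",
    recognize_speech=lambda _audio: "great volume",
    intent_model=lambda _text: {"system_volume_up": 0.9, "system_volume_max": 0.1},
    precision_model=lambda _text: {1: 0.1, 3: 0.9},
    intent_defaults={"system_volume_up": 3},
)
print(instruction)
```

Passing the recognizer and models as callables keeps the sketch independent of any particular speech or classification library.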

Referring to FIG. 2, the present application also provides a voice interaction device 10. The voice interaction device 10 includes: a voice recognition module 11, an intention recognition module 12, an accuracy recognition module 13, a first determination module 14, a modification module 15 and an instruction generation module 16.

Step 01 can be realized by the speech recognition module 11, step 02 by the intent recognition module 12, step 03 by the accuracy recognition module 13, step 04 by the first determination module 14, step 05 by the modification module 15, and step 06 by the instruction generation module 16. That is to say, the speech recognition module 11 is used to perform speech recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment; the intent recognition module 12 is used to perform intent recognition on the text to be recognized using the intent recognition model; the accuracy recognition module 13 is used to perform accuracy recognition on the text to be recognized using the accuracy recognition model; the first determination module 14 is used to determine the target intent corresponding to the voice request according to the result of the intent recognition, and to determine the target scale adjustment accuracy value corresponding to the voice request according to the result of the accuracy recognition; the modification module 15 is used to modify the default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request; and the instruction generation module 16 is used to fuse the target intent and the modified default value to generate a control instruction to control the corresponding vehicle component.

The voice request for adjusting a vehicle preset function may be, for example, "the screen is bright", "the volume is louder", "the screen is brighter", "the air volume of the air conditioner is louder", or "rear behind the seat", that is, a voice request with shortened words. The preset function refers to a function that simulates operating a vehicle component for scale adjustment, where the vehicle component may be a component such as a mechanical knob or button, that is, a vehicle component whose scale can be adjusted.

First, after the user's voice request for adjusting a vehicle preset function is received, voice recognition is performed using voice recognition technology, and the text to be recognized is obtained for subsequent processing. For example, the text to be recognized "the screen is bright bright" is obtained.

It is understandable that, in an actual interactive environment, the text obtained by speech recognition may not be clear and accurate because of limitations of the vehicle hardware, network instability, colloquial or dialect-based user expressions, and so on. Some routine text error correction can therefore be applied, such as correcting "the volume is deep and deep" to "the volume is increasing and increasing", and removing some meaningless words, such as "ah" and "please".
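A routine correction pass of this kind could look like the following sketch, which reuses the correction and filler examples given above. The word-level correction table, the filler list and the function name `normalize` are assumptions made for illustration; a production system would likely derive such corrections from data rather than from a fixed table.

```python
import re

# Hypothetical correction table and filler list, seeded with the examples in the text.
CORRECTIONS = {"deep": "increasing"}
FILLERS = {"ah", "please"}


def normalize(text: str) -> str:
    """Drop filler words and apply word-level corrections to recognized text."""
    # Split on whitespace and common punctuation, lowercasing for lookup.
    words = re.findall(r"[^\s,.!?]+", text.lower())
    kept = [CORRECTIONS.get(word, word) for word in words if word not in FILLERS]
    return " ".join(kept)


print(normalize("Ah, please, the volume is deep and deep"))
```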

Next, the intent recognition model is used to perform intent recognition on the text to be recognized, and the precision recognition model is used to perform precision recognition on the text to be recognized. Through intent recognition and precision recognition, the user's intention and the desired precision can be determined from the text to be recognized.

Then, the target intention and the target scale adjustment precision value corresponding to the voice request are determined according to the result of the intention recognition and the result of the precision recognition. For example, according to the result of the intention recognition for the voice request "bright screen", the corresponding target intent is determined to be brightening the display of the in-vehicle screen, and the target scale adjustment accuracy value corresponding to the voice request "bright screen" is determined to be 3, indicating that the screen is to be brightened by 3 levels.

Next, modify the default value according to the target intention and the target scale adjustment accuracy value, and the default value is the adjustment value corresponding to the target intention in the preset voice request.

It is understandable that, in the traditional logic of the current technical solution, the voice request "great volume" is recognized as the intention "system_volume_up". This intention adjusts 3 scales by default each time, corresponding to a default value of 3, so the vehicle executes the command "increase the volume by 3 scales", which is the same execution logic used for non-precision voice requests such as "increase the volume" and "make the volume louder". For a simplified voice request in which "large" is repeated many times, such as "Volume greatly greatly greatly greatly", the intention is recognized as "system_volume_max", and the default value corresponding to this intention is the highest gear or the maximum scale, so the vehicle executes the command "set the volume to the maximum scale".

That is, the default value is the adjustment value corresponding to the target intention in the preset voice request, as determined by the original logic. The preset voice request may refer to user voice requests such as "volume up" and "volume down". Under the traditional recognition logic, the adjustment value corresponding to the target intention "volume increase" is a single increase, that is, the default value corresponds to the specific scale value of each adjustment, for example 3 small scales. Likewise, the adjustment value corresponding to the target intention "volume down" is a single decrease, and the default value can again correspond to 3 small scales. That is, in this case: default value = 3.

However, under the precision logic that accurately recognizes simplified instructions, the voice request "increasing the volume" means that the user intends to "increase the volume" and wants the volume adjusted three times, with a default value of 3 for each adjustment. In this case, the user actually wants to adjust the volume by 9 scales. That is to say, under the precision logic, the voice request "very louder" means that the user wants to increase the volume by 9 scales. Correspondingly, for the simplified voice request "volume greatly greatly greatly greatly" with multiple repetitions of "large", the user's intention is to increase the volume and the scale adjustment accuracy value is 7, that is, the volume is adjusted 7 times, so the user ultimately wants to turn the volume up by 21 scales (7 * 3).

In this way, the traditional logic of adjusting by the default value is retained, and the target scale adjustment precision value identified by the precision logic is used to modify the default value of the traditional logic, so that vehicle components are precisely controlled through the combined action of the traditional logic and the precision logic.

For example, the target intention corresponding to the user instruction "Volume up greatly" is to increase the volume. When the target scale adjustment accuracy is identified, the target scale adjustment accuracy value is recognized as 3, that is, the volume is to be increased 3 times, and the default value is modified accordingly to obtain the modified adjustment scale: default_value' = scale value * default_value = 3 * 3 = 9. In other words, the default value is modified to 9 according to the user's voice request to increase 3 times. That is to say, under the new requirement of controlling vehicle components and making precise adjustments based on voice requests with simplified words, the voice interaction method of this application does not disturb the original execution logic of non-precision voice requests at all, and provides, within the traditional logic framework, the function of controlling vehicle components and making precise adjustments according to voice requests with shortened words.

Finally, the target intent and the modified default values are fused to generate control instructions to control the corresponding vehicle components.

In this way, the voice interaction method and device of the present application perform voice recognition on the voice request for adjusting a vehicle preset function to obtain the text to be recognized, the preset function being a function that simulates operating a vehicle component for scale adjustment. The intent recognition model is then used to perform intent recognition on the text to be recognized and the accuracy recognition model is used to perform precision recognition on the text to be recognized, the target intent and the target scale adjustment precision value corresponding to the voice request are identified, and the default value is modified accordingly. While integrating with the traditional voice-request logic, this accurately adjusts the scale of the vehicle component corresponding to the user's simplified voice request and improves the user experience.

Referring to Figure 3, the voice interaction method includes:

001: Determine the control range and non-control range of vehicle parts.

Referring to FIG. 4, the voice interaction device 10 further includes a second determination module 101.

Step 001 can be implemented by the second determination module 101. It can be understood that the second determination module 101 is used to determine the control range and non-control range of vehicle components.

It is understandable that not every function of the vehicle can or needs to be adjusted on a precise scale. For example, the movement of the seat in various directions can be adjusted through vehicle components, whereas the car door has no vehicle components such as knobs or buttons for scale adjustment and is usually only opened and closed through the door handle. Therefore, seat adjustment belongs to the control range of vehicle components, while door adjustment belongs to the non-control range of vehicle components.

The information of the vehicle parts is obtained. According to this information, the hardware that can be adjusted through vehicle parts is determined as the control range of the vehicle parts, and the hardware that cannot be adjusted through vehicle parts is determined as the non-control range.

First, the vehicle parts on the vehicle that allow scale adjustment are determined, such as the "volume knob", "screen brightness button", "air conditioning air volume knob/button" and "seat adjustment knob/button". Further, the control range of vehicle components may include the car audio, the screens in the vehicle, the vehicle air conditioner, the vehicle seats, the ambient lights in the car, the lights outside the vehicle, the windows, and so on. The non-control range of vehicle components may include the doors, rearview mirrors, trunk, and so on.

In the subsequent voice interaction process, a voice prompt can be given when a voice request is directed at the non-control range of the vehicle components.

In this way, by collecting vehicle component information and confirming the functions that can be scaled through vehicle components, the control range of vehicle components can be determined, that is, the control range that can be scaled through voice interaction.
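The division into control range and non-control range can be illustrated with a short sketch. It assumes that the collected vehicle part information lists, for each piece of hardware, the knobs or buttons that adjust it by scale; the Hardware class, its fields and the sample inventory are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hardware:
    """Hypothetical record of one piece of vehicle hardware."""
    name: str
    scale_controls: Tuple[str, ...]  # knobs/buttons that adjust it by scale


def split_control_range(items: List[Hardware]) -> Tuple[List[str], List[str]]:
    """Hardware with at least one scale control forms the control range."""
    control = [h.name for h in items if h.scale_controls]
    non_control = [h.name for h in items if not h.scale_controls]
    return control, non_control


# Sample inventory based on the examples in the text.
inventory = [
    Hardware("car audio", ("volume knob",)),
    Hardware("seat", ("seat adjustment knob/button",)),
    Hardware("door", ()),  # opened and closed via the handle only
]
control_range, non_control_range = split_control_range(inventory)
print(control_range, non_control_range)
```

A voice request that targets an entry of non_control_range would then trigger the voice prompt mentioned above instead of a scale adjustment.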

The voice interaction method includes:

002: Determine the default adjustment range for each vehicle component.

Step 002 can be implemented by the second determination module 101. That is, the second determination module 101 is used to determine the default adjustment range of each vehicle component.

The default adjustment range of each vehicle component is determined. For example, when the device to be adjusted is a car stereo, the default value by which the volume is adjusted each time a voice request simulates operating the volume control may be 3. If the corresponding volume control has a total of 60 scales, the default adjustment range is 1 to 20.

The voice interaction method includes:

003: Determine the adjustable range of vehicle components;

004: Correct the intention of the preset voice request according to the adjustable range of the vehicle components.

The voice interaction device 10 also includes a correction module 102.

Step 003 can be implemented by the second determination module 101, and step 004 can be implemented by the correction module 102. That is, the second determination module 101 is used to determine the adjustable range of the vehicle component; the correction module 102 is used to correct the intention of the preset voice request according to the adjustable range of the vehicle component.

It can be understood that after the control range and non-control range of the vehicle components are determined, an adjustable range needs to be determined for each vehicle component in the control range. The adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component. Corresponding to different vehicle components, the adjustable range can be gear position or range. For example, if the screen brightness button is continuously pressed 5 times in total, and the screen brightness is sequentially adjusted from 1 to 5 gears to the maximum brightness, then the adjustable range of the screen brightness button is 1 to 5 gears. As another example, if the total scale value of the knob for adjusting the seat forward and backward is 90, then the adjustable range of the seat adjustment knob is scale value 1-90.

Under the traditional logic, a simplified voice request such as "loud volume" is recognized as a "maximum" or "minimum" intent. According to the adjustable range of the vehicle components, the intent of such a preset voice request is corrected to the corresponding increase or decrease intent.

In this way, the precise adjustment that the user instruction actually intends can be achieved on the basis of the original traditional logic.

Referring to Fig. 5, step 003 includes:

0031: Determine the adjustable range of the vehicle components corresponding to each simplified word.

Step 0031 can be implemented by the second determination module 101. That is, the second determination module 101 is used to determine the adjustable range of the vehicle components corresponding to each simplified word.

A simplified word is a concise word used by the user that accurately represents the degree of adjustment. For example, reduplicated words can be used as simplified words, so that the user only needs to input a simplified voice request. The brightness adjustment of the car display screen can be expressed concisely as "the screen is bright" or "the screen is dark", the volume adjustment of the car audio as "the volume is louder", "the volume is very large" or "the volume is small", and the air volume adjustment of the air conditioner as "air volume is large", "air volume is large and large" or "air volume is small", with the word repeated to indicate a larger adjustment. Of course, the simplified words can be any repeated words that users are accustomed to, such as "brighter", "darker", "bigger" and "smaller", as in "the screen is darker and darker", "the volume is louder and louder" and "the volume is lower and smaller", which is not specifically limited here.

The adjustable range corresponding to the simplified words can be determined according to the adjustable range of the vehicle component. For example, when adjusting the screen in a vehicle, the screen brightness has an adjustable range of gears 1 to 5. During speech recognition, up to 5 simplified words can be recognized in each brightness-related voice request, so the adjustable range of the simplified words can be 1 to 5. When the voice request includes multiple simplified words, each simplified word adjusts the screen brightness by 1 gear.

For another example, when adjusting the car audio, the volume can be adjusted with simplified words such as "larger", "bigger" or "smaller". The total adjustment range of the volume is 60 scales. During speech recognition, up to 10 simplified words can be recognized in a volume-related voice request, so the adjustable range of the simplified words can be 1 to 10, and each simplified word adjusts the volume of the car audio by 3 scales. If a voice request with more than 10 simplified words is recognized, the volume can be adjusted directly to the maximum or minimum.
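The volume example can be sketched as a small function. The function name, its parameters and the assumption that the volume scale runs from 0 to 60 are illustrative only; the text specifies only the 60 total scales, the limit of 10 simplified words, and the 3 scales per simplified word.

```python
def apply_volume_request(current: int, repetitions: int, direction: int,
                         step: int = 3, max_words: int = 10,
                         total_scale: int = 60) -> int:
    """Return the new volume after a request with `repetitions` simplified words.

    direction is +1 for "bigger" style words and -1 for "smaller" style words.
    The lower bound 0 is an assumption made for this sketch.
    """
    if repetitions < 1 or direction not in (1, -1):
        raise ValueError("need at least one simplified word and a direction of +1 or -1")
    if repetitions > max_words:
        # More simplified words than the adjustable range: go straight to the limit.
        return total_scale if direction > 0 else 0
    new_volume = current + direction * repetitions * step
    # Never leave the physical range of the control.
    return max(0, min(total_scale, new_volume))


print(apply_volume_request(20, 2, +1))   # 26
print(apply_volume_request(20, 11, -1))  # 0
```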

The voice interaction method includes:

005: Map the control range and adjustable range to the preset intention and the corresponding preset scale adjustment accuracy value.

The voice interaction device 10 also includes a mapping module 103.

Step 005 can be implemented by the mapping module 103. That is to say, the mapping module 103 is used to map the control range and the adjustable range to preset intentions and corresponding preset scale adjustment accuracy values.

In this way, the control range of vehicle components and the adjustable range of each vehicle component are mapped to an intent system that the intent recognition model can understand. A corresponding preset intention is formulated for each object in the control range of the vehicle components and for its adjustable range. For example, system_volume_up represents the preset intent "volume up" and system_volume_down represents the preset intent "volume down". A specific intent mapping system is thereby formulated for the control range and the adjustable range of the vehicle components.

For the preset scale adjustment accuracy, for example, when the voice interaction simulates the operation of vehicle parts, the volume is adjusted by 3 scale values at a time, and the total scale value is 60, then the preset scale adjustment accuracy range can be 1-20. For another example, when the voice interaction simulates the operation of vehicle parts, the seat is adjusted 18 scales each time, the total scale value is 90, and the preset scale adjustment accuracy ranges from 1 to 5.
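The two ranges above follow from dividing the total scale by the scale adjusted per step. A minimal sketch, in which the function name is an assumption:

```python
def preset_accuracy_range(total_scale: int, step: int) -> range:
    """Preset scale adjustment accuracy values 1..(total_scale // step)."""
    if step <= 0 or total_scale < step:
        raise ValueError("step must be positive and not exceed the total scale")
    return range(1, total_scale // step + 1)


volume_range = preset_accuracy_range(60, 3)   # volume: 3 scales per adjustment
seat_range = preset_accuracy_range(90, 18)    # seat: 18 scales per adjustment
print(volume_range[0], volume_range[-1])      # 1 20
print(seat_range[0], seat_range[-1])          # 1 5
```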

Referring to Fig. 6, step 005 includes:

0051: Establish a mapping table of intent and default value according to the preset intent and the default adjustment range.

Step 0051 can be implemented by the mapping module 103. That is to say, the mapping module 103 is configured to establish an intent-default value mapping table according to the preset intent and the default adjustment range.

According to the preset intent and the previously confirmed default adjustment range, a mapping table of intent and default value can be established for use in the online process and for downstream operations.

For example, if a voice request simulating the operation of vehicle parts adjusts the volume of the car audio by 3 scales each time (the default value is 3), then under the precision requirement the preset intentions corresponding to the volume are system_volume_up and system_volume_down. Correspondingly, the intent and default value mapping table established for adjusting the volume of the car audio can be:

{system_volume_up: 3; system_volume_down: 3}.

If a voice request simulating the operation of vehicle parts adjusts the air volume of the air conditioner by one gear each time, the intent and default value mapping table established for adjusting the air volume of the vehicle air conditioner is:

{ac_wind_up: 1; ac_wind_down: 1}.

Similarly, the voice interaction method of this application also covers multiple other preset intentions, such as screen brightness adjustment and vehicle seat height and front-back adjustment. The mapping relationship between these preset intentions and their default values can be determined according to the above method, and the mapping relationships are stored in a database to be loaded and read by the online process.
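The intent and default value mapping table, together with its storage and later loading, can be sketched as follows. A JSON string stands in for the database here, which is an assumption made only to keep the example self-contained; the function names are likewise hypothetical.

```python
import json
from typing import Dict

# Mapping table built from the two examples in the text.
INTENT_DEFAULTS: Dict[str, int] = {
    "system_volume_up": 3,
    "system_volume_down": 3,
    "ac_wind_up": 1,
    "ac_wind_down": 1,
}


def serialize_table(table: Dict[str, int]) -> str:
    """Offline step: turn the mapping table into a storable string."""
    return json.dumps(table, sort_keys=True)


def load_table(serialized: str) -> Dict[str, int]:
    """Online step: read the table back and check every default value."""
    table = json.loads(serialized)
    for intent, default in table.items():
        if not isinstance(default, int) or default < 1:
            raise ValueError(f"invalid default value for {intent}: {default!r}")
    return table


stored = serialize_table(INTENT_DEFAULTS)
online_table = load_table(stored)
print(online_table["system_volume_up"])  # 3
```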

Referring to Fig. 7, step 005 includes:

0052: Map the adjustable range of each vehicle component within the control range to a preset intention, where each preset intention corresponds to multiple preset scale adjustment accuracy values.

Step 0052 can be implemented by the mapping module 103. That is, the mapping module 103 is used to map the adjustable range of each vehicle component within the control range to a preset intention, and each preset intention corresponds to multiple preset scale adjustment accuracy values.

The adjustable range of each vehicle component includes multiple gear positions or multiple scale values. When establishing the mapping, the whole adjustable range of a vehicle component is mapped to the same preset intention. For example, the adjustable range of the air volume adjustment button of the air conditioner includes 5 gears, and the voice requests for increasing the air volume can cover 5 levels, from "large air volume" with the word "large" said once up to the word repeated five times. All of these requests map to the same preset intention, that is, to increase the air volume.

In this way, during the voice interaction process, voice requests for different adjustment scales of the same vehicle component all correspond to the same preset intention.

One preset intent corresponds to multiple preset scale adjustment accuracy values. For example, the preset intent "increase the volume of the car audio" can correspond to 20 preset scale adjustment accuracy values. If the adjustable range of the volume knob is 60, that is, the total scale for adjusting the volume is 60, then each preset scale adjustment accuracy value corresponds to an adjustment of 3 scales, that is, each increment of the preset scale adjustment accuracy value represents an adjustment of 3 scales. The 20 preset scale adjustment accuracy values are as follows: increase the volume by 3 scales, for which the corresponding voice request is "loud volume"; increase the volume by 6 scales, for which the corresponding voice request is "too loud"; increase the volume by 9 scales, for which the corresponding voice request is "the volume is greatly increased"; and so on.

In other embodiments of the present application, with the permission of the user, different user instructions corresponding to the same preset intention can be collected. For example, starting from "the volume is very loud", users may freely vary the wording, such as "volume increase", "volume rises" and "volume is high and high", and the intention recognized for each of these variants is to increase the volume.

Referring to Fig. 8, step 005 includes:

0053: Set the simplified word as a slot, and perform slot extraction on the preset recognition text corresponding to the vehicle parts to obtain a repeated field;

0054: Perform repetition statistics on the slot value of the repeated field to obtain the number of repetitions;

0055: According to the adjustable range of the simplified word, map the number of repetitions to the preset scale adjustment accuracy value corresponding to the preset intent.

Referring to FIG. 9, the mapping module 103 includes an extraction unit 1033, a statistical unit 1034 and a mapping unit 1035.

Step 0053 can be implemented by the extraction unit 1033, step 0054 by the statistical unit 1034, and step 0055 by the mapping unit 1035. That is to say, the extraction unit 1033 is used to set the simplified word as a slot and perform slot extraction on the preset recognition text corresponding to the vehicle parts to obtain a repeated field; the statistical unit 1034 is used to perform repetition statistics on the slot value of the repeated field to obtain the number of repetitions; and the mapping unit 1035 is used to map the number of repetitions to the preset scale adjustment accuracy value corresponding to the preset intention according to the adjustable range of the simplified word.

The number of repetitions of the simplified word can represent the number of scale adjustments applied to the vehicle component, so the simplified word can be set as a slot. For example, the adjustable range of the simplified words for the volume knob is 1 to 10, and the preset scale adjustment accuracy corresponding to the volume knob ranges from 1 to 20. Within the adjustable range of the simplified words, if the preset recognition text corresponding to the voice request is a volume request in which the word "big" is repeated, the run of repeated words can be extracted as a slot, and this slot is set as a repeated field. Repetition statistics are then performed on the slot value of the extracted repeated field, and the number of repetitions is mapped to the corresponding preset scale adjustment accuracy value. For example, if "big" is repeated 4 times in the extracted slot, the request is mapped to the preset scale adjustment accuracy value 4.
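Steps 0053 to 0055 can be illustrated with a rule-based sketch. It assumes a whitespace-tokenized English rendering of the recognized text and counts the longest run of the simplified word; the function names are hypothetical, and this simple counting stands in for the slot extraction described above rather than reproducing it.

```python
from typing import Optional


def count_repetitions(text: str, simplified_word: str) -> int:
    """Length of the longest consecutive run of the simplified word."""
    longest = run = 0
    for token in text.lower().split():
        run = run + 1 if token == simplified_word else 0
        longest = max(longest, run)
    return longest


def to_accuracy_value(repetitions: int, max_words: int) -> Optional[int]:
    """Map the repetition count to a preset scale adjustment accuracy value.

    Counts outside the adjustable range of the simplified word (1..max_words)
    return None so that the caller can handle them separately.
    """
    if 1 <= repetitions <= max_words:
        return repetitions
    return None


repetitions = count_repetitions("volume big big big big", "big")
print(repetitions, to_accuracy_value(repetitions, max_words=10))  # 4 4
```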

In other embodiments of the present application, with the permission of the user, different user voice requests corresponding to the same scale adjustment accuracy can be collected. For example, starting from "the volume is very loud", users may freely vary the wording, such as "volume increases", "volume rises" and "volume is high, high", and the scale adjustment accuracy recognized for each of these variants is "volume adjustment 3 times".

Referring to Figure 10, the voice interaction method includes:

006: Obtain the intention recognition model by training on intention training data, where the intention training data is related to the vehicle parts and the adjustable range of the vehicle parts.

Referring to FIG. 11, the voice interaction device 10 includes an intention training module 104.

Step 006 can be implemented by the intention training module 104. That is, the intention training module 104 is used to obtain the intention recognition model by training on intention training data, and the intention training data is related to the vehicle components and the adjustable range of the vehicle components.

In this application, the intention recognition model is obtained through machine learning from training data corresponding to the vehicle parts that allow scale adjustment and the adjustable range of those parts. The model then performs intention recognition on voice requests, so that user intentions are recognized accurately.

The intention training data is related to the vehicle components that allow scale adjustment and to their adjustable range. Vehicle parts here refer to the parts on the smart car through which scale adjustment can be performed, such as the "volume knob", "screen brightness button", "air conditioning air volume knob/button" and "seat adjustment knob/button". The adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component. Depending on the vehicle component, the adjustable range can be gear positions or a scale range.

The intent recognition model in this application is trained before use. To build the intention training data, a certain amount of historical user voice requests can be collected once the relevant user permissions have been obtained, and the collected requests are given a simple screening to obtain voice requests with clear semantics and a specific purpose. Specifically, voice requests with obviously unclear semantics and short voice requests containing only modal particles, such as "ah" and "oh", are removed in the screening, leaving the voice requests with clear semantics and a specific purpose.

Then, the screened voice requests are labeled with reference to the preset intents. For example, if the voice request is "brighten the screen", the corresponding intent can be labeled as "brighten the screen". Quality inspection is then performed on the labeled data to filter out labeled data that does not match any preset intent, leaving the labeled data that can be used for training the intent model. For example, if the voice request is "open the car door", the labeled intent is "open the car door"; since the car door is not adjusted through parts that allow scale adjustment, this voice request can be removed by the filtering.
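The screening and quality inspection steps can be sketched as two simple filters. The particle list, the intent identifiers and the function names are assumptions for illustration, and the sketch only covers the modal-particle case; detecting requests with unclear semantics in general is outside its scope.

```python
from typing import List, Tuple

MODAL_PARTICLES = {"ah", "oh"}                            # examples from the text
PRESET_INTENTS = {"screen_brighter", "system_volume_up"}  # hypothetical subset


def is_clear_request(text: str) -> bool:
    """Reject empty requests and requests made only of modal particles."""
    tokens = text.lower().split()
    return bool(tokens) and any(t not in MODAL_PARTICLES for t in tokens)


def quality_check(labeled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only labeled requests whose intent is one of the preset intents."""
    return [(text, intent) for text, intent in labeled if intent in PRESET_INTENTS]


raw_requests = ["ah", "brighten the screen", "oh oh", "open the car door"]
screened = [text for text in raw_requests if is_clear_request(text)]

# Labels would come from annotation; these two are written by hand here.
labeled = [("brighten the screen", "screen_brighter"),
           ("open the car door", "door_open")]
training_data = quality_check(labeled)
print(screened)
print(training_data)
```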

During the training process, the labeled data that can be used for intent model training is used as intent training data and divided into an intent training set and an intent verification set. The division ratio can be set according to requirements and is not limited here; for example, the intent training set may be 80% and the intent verification set 20%. The data in the intent training set is used to train the intent recognition model. Models such as BERT, ALBERT, XLNet and RoBERTa can be used for model training.

For example, for the established intent recognition model, at least part of the data in the intent training set is used to train the model, and then at least part of the data in the intent verification set is used to verify the accuracy of the trained model. If the verification accuracy does not reach the intent accuracy threshold, the model is trained again with at least another part of the data in the intent training set and verified again with another part of the data in the intent verification set. This process of training and verification is repeated until the verification accuracy reaches the intent accuracy threshold, at which point the intent recognition model is considered to have reached the standard and its training is complete.

It should be noted that each piece of data in the intent training set and intent verification set is used only once. If the intent recognition model still fails to reach the training standard after all the data in the intent training set and intent verification set has been traversed, more voice requests can be collected, again with the user's permission, so that more intent training data can be screened and labeled to train the intent recognition model. This ensures that the intent recognition model can accurately recognize the intent corresponding to the input voice request.
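The division of the data and the train-then-verify loop can be outlined as below. The model itself is left abstract: train_step and evaluate are hypothetical callables supplied by the caller, and the fixed random seed and the batch structure are assumptions of this sketch, not details given in the text.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_dataset(samples: Sequence, train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle and split into a training set and a verification set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def train_until_threshold(train_batches: Sequence, verify_batches: Sequence,
                          train_step: Callable, evaluate: Callable,
                          threshold: float) -> bool:
    """Train on one batch, verify on one batch, and stop once the threshold is met.

    Each batch is consumed exactly once. Returning False means all data was
    traversed without reaching the threshold, so more data must be collected.
    """
    for train_batch, verify_batch in zip(train_batches, verify_batches):
        train_step(train_batch)
        if evaluate(verify_batch) >= threshold:
            return True
    return False


train_set, verify_set = split_dataset(list(range(10)))
print(len(train_set), len(verify_set))  # 8 2
```

In practice train_step would update a model such as one of those named above, and evaluate would return its accuracy on the verification batch.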

It can be understood that the above intent recognition model can be trained offline, and after the offline trained intent recognition model is deployed to the server or vehicle, the server or vehicle can use the intent recognition model to perform intent recognition on the received voice request.

The voice interaction method includes:

007: Obtain the accuracy recognition model by training on accuracy training data, where the accuracy training data is related to the vehicle parts, the adjustable range of the vehicle parts, and the scale adjustment accuracy range of the vehicle parts.

The voice interaction device 10 includes an accuracy training module 105.

Step 007 can be implemented by the accuracy training module 105. That is to say, the accuracy training module 105 is used to obtain an accuracy recognition model by training on accuracy training data, and the accuracy training data is related to the vehicle parts, the adjustable range of the vehicle parts, and the scale adjustment accuracy range of the vehicle parts.

In this way, the application uses machine learning to obtain an accuracy recognition model from training data corresponding to the vehicle parts that allow scale adjustment, the adjustable range of those parts, and their scale adjustment accuracy range. The model then performs accuracy recognition on voice requests, so that the scale adjustment accuracy requested by the user is recognized accurately.

That the accuracy training data is related to the vehicle parts that allow scale adjustment and to their adjustable range means that the accuracy training data covers all the vehicle parts in the vehicle that allow scale adjustment, such as the "volume knob", "screen brightness button", "air conditioner air volume knob/button" and "seat adjustment knob/button". The adjustable range of a vehicle component corresponds to the scale range adjusted by operating the vehicle component. Depending on the vehicle component, the adjustable range can be gear positions or a scale range, and the scale adjustment accuracy range can be the scale value of each adjustment.

To build the accuracy training data, a certain amount of historical user voice requests can be collected once the relevant user permissions have been obtained, and the collected requests are given a simple screening to obtain voice requests with clear semantics and a specific purpose. Specifically, voice requests with obviously unclear semantics and short voice requests containing only modal particles, such as "ah" and "oh", are removed in the screening, leaving the voice requests with clear semantics and a specific purpose. The historical user voice requests collected for accuracy training can be the same as those collected for intention training, and the screening step for accuracy training can be the same as the screening step for intention training.

Then the screened voice requests are labeled manually with the scale adjustment accuracy value that the user wants. For example, if the voice request is "the screen is bright and bright", the labeled scale adjustment accuracy value for adjusting the brightness of the in-vehicle screen is 3. Next, an accuracy recognition model is established based on slot extraction. Algorithms that can be used for slot extraction include RNN slot filling, CRF, and so on. The labeled data is used as accuracy training data and divided into an accuracy training set and an accuracy verification set. The division ratio can be set according to requirements and is not limited here; for example, the accuracy training set may be 80% and the accuracy verification set 20%. The data in the accuracy training set is used to train the accuracy recognition model. For the established accuracy recognition model, at least part of the data in the accuracy training set is used to train the model, and then at least part of the data in the accuracy verification set is used to verify the trained model. If the verification result does not reach the accuracy threshold, the model is trained again with at least another part of the data in the accuracy training set and verified again with another part of the data in the accuracy verification set. This process of training and verification is repeated until the verification result reaches the accuracy threshold, at which point the accuracy recognition model is considered to have reached the standard and its training is complete.

It should be noted that each piece of data in the accuracy training set and accuracy verification set is used only once. If the accuracy recognition model still fails to meet the training standard after all the data in the accuracy training set and accuracy verification set has been traversed, more voice requests can be collected, again with the user's permission, so that more accuracy training data can be screened and labeled to train the accuracy recognition model. This ensures that the accuracy recognition model can accurately recognize the scale adjustment accuracy corresponding to the input voice request.

In this way, the accuracy recognition model can be trained in advance on the accuracy training data and then used to perform accuracy recognition on the text to be recognized, thereby identifying the adjustment accuracy for a given vehicle component, obtaining the accuracy recognition result, and finally determining the target scale adjustment accuracy value.

Referring to Figure 12, there are multiple preset intentions, and step 04 includes:

041: Obtain the intent discrimination probability corresponding to each preset intent from the result of intent recognition;

042: Determine a preset intent with an intent discrimination probability greater than a first probability threshold as a target intent corresponding to the voice request.

Referring to FIG. 13, the first determination module 14 may further include a first acquisition unit 141 and an intention determination unit 142.

Step 041 can be implemented by the first acquisition unit 141, and step 042 can be implemented by the intention determination unit 142. That is, the first acquisition unit 141 is used to obtain the intent discrimination probability corresponding to each preset intent from the result of intent recognition; the intention determination unit 142 is used to determine a preset intent whose intent discrimination probability is greater than the first probability threshold as the target intent corresponding to the voice request.

The trained intent recognition model is used to perform intent recognition on the text to be recognized to obtain the result of intent recognition. The result includes the probability that the text to be recognized matches each preset intent, that is, multiple intent discrimination probabilities are obtained. If the first probability threshold is 0.9 and, in the result of intent recognition, the intent discrimination probability of a certain preset intent exceeds 0.9, the server takes that preset intent as the target intent of the current user's voice request. The first probability threshold may also take other values. It may be a default value or may be set according to user needs, and is not limited here.

The preset intentions of this application may include at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighter, screen darker, screen slide up, screen slide down, gauge brighter, gauge dimmer, ambient light brighter, ambient light dimmer, seat forward, seat rearward, seat up, seat down, seat back forward, seat back rearward, window up and window down.

It should be understood that the preset intentions in this application are only illustrative, and the corresponding preset intentions can be set according to the actual operation of the objects in the vehicle that can be scaled.

In this way, multiple preset intentions can be formulated according to the specific conditions of the vehicle to cover the possible voice interaction scenarios more completely.

Step 04 also includes:

043: When none of the intention discrimination probabilities of the preset intentions is greater than the first probability threshold, determine that the intention of the voice request is a non-scale adjustment intention.

Step 043 can be implemented by the intention determination unit 142. That is, the intention determination unit 142 is used to determine that the intention of the voice request is a non-scale adjustment intention when none of the intention discrimination probabilities of the preset intentions is greater than the first probability threshold.

For example, when none of the discrimination probabilities corresponding to the preset intentions of the multiple categories is greater than the first probability threshold (for example, 0.9), that is, the probability that the intent recognition result of the user's voice request matches any of the preset intentions is relatively low, the intention of the voice request is determined to be a non-scale adjustment intention. A non-scale adjustment intention is a user intention that does not involve adjusting a preset function of the vehicle through vehicle parts that allow scale adjustment. For example, if the voice request input by the user is "open the door", the request is a non-scale adjustment intention because the door cannot be adjusted through vehicle parts with scales.
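Steps 041 to 043 amount to a threshold decision over the intent discrimination probabilities. A minimal sketch, in which the function name, the dictionary representation of the recognition result and the NON_SCALE_ADJUSTMENT marker are assumptions:

```python
from typing import Dict

NON_SCALE_ADJUSTMENT = "non_scale_adjustment"  # hypothetical marker value


def determine_target_intent(probabilities: Dict[str, float],
                            first_threshold: float = 0.9) -> str:
    """Pick the preset intent whose probability exceeds the first threshold.

    If no probability is greater than the threshold, the request is treated
    as a non-scale adjustment intention.
    """
    if not probabilities:
        return NON_SCALE_ADJUSTMENT
    best_intent = max(probabilities, key=probabilities.get)
    if probabilities[best_intent] > first_threshold:
        return best_intent
    return NON_SCALE_ADJUSTMENT


print(determine_target_intent({"system_volume_up": 0.97, "ac_wind_up": 0.02}))
print(determine_target_intent({"system_volume_up": 0.30, "ac_wind_up": 0.25}))
```

The second call models a request such as "open the door", for which no preset intent is a confident match.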

Referring to Figure 14, step 04 also includes:

044: Obtain, from the accuracy recognition result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value;

045: Determine a preset scale adjustment accuracy value whose accuracy discrimination probability is greater than the second probability threshold as a target scale adjustment accuracy value corresponding to the voice request.

Please refer to FIG. 15, the first determination module 14 includes a second acquisition unit 143 and an accuracy determination unit 144.

Step 044 can be implemented by the second acquisition unit 143, and step 045 can be implemented by the accuracy determination unit 144. That is to say, the second acquisition unit 143 is used to acquire, from the accuracy identification result, the accuracy discrimination probability corresponding to each preset scale adjustment accuracy value; the accuracy determination unit 144 is used to determine a preset scale adjustment accuracy value whose accuracy discrimination probability is greater than the second probability threshold as the target scale adjustment accuracy value corresponding to the voice request.

The accuracy discrimination probability refers to the probability that the accuracy recognized from the voice request matches a given preset scale adjustment accuracy value. The second probability threshold may be, for example, 0.7, 0.8, 0.9 or another value, which is not limited here.

For example, if the accuracy discrimination probability of the preset scale adjustment accuracy value 5 is 1 and the second probability threshold is 0.9, the probability exceeds the threshold, so the target scale adjustment accuracy value for the volume adjustment corresponding to the voice request "Volume is louder is louder" is determined to be 5.

Step 04 also includes:

046: In a case where the accuracy discrimination probability of each preset scale adjustment accuracy value is not greater than the second probability threshold, determine that the accuracy recognition of the voice request is wrong.

Step 046 can be implemented by the accuracy determination unit 144. That is to say, the accuracy determination unit 144 is used to determine that the accuracy recognition of the voice request is wrong when the accuracy discrimination probability of each preset scale adjustment accuracy value is not greater than the second probability threshold.

If the accuracy discrimination probability of each preset scale adjustment accuracy value is not greater than the second probability threshold, the accuracy recognition of the input voice request is incorrect. In this way, voice requests that are not related to scale adjustment accuracy can be excluded.
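Steps 044 to 046 can be sketched in the same way. The function name `determine_target_precision`, the use of `None` to signal a failed recognition, and the sample probabilities are assumptions made for illustration only.

```python
from typing import Dict, Optional


def determine_target_precision(precision_probs: Dict[int, float],
                               second_threshold: float = 0.9) -> Optional[int]:
    """Pick the preset scale adjustment accuracy value above the threshold.

    precision_probs maps each preset scale adjustment accuracy value to its
    accuracy discrimination probability (step 044). Returns the value whose
    probability exceeds the threshold (step 045), or None when no value
    does, meaning the accuracy recognition is wrong (step 046).
    """
    if not precision_probs:
        return None
    best_value = max(precision_probs, key=precision_probs.get)
    if precision_probs[best_value] > second_threshold:
        return best_value
    return None


# Illustrative probabilities only.
print(determine_target_precision({1: 0.0, 3: 0.0, 5: 1.0}))
print(determine_target_precision({1: 0.4, 3: 0.3, 5: 0.3}))
```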

Please refer to Figure 16, step 05 includes:

051: Determine the default value according to the target intent and the intent-default value mapping table;

052: Modify the default value according to the target scale adjustment accuracy value.

Referring to FIG. 17, the modifying module 15 includes a default value determining unit 151 and a modifying unit 152.

Step 051 can be implemented by the default value determining unit 151, and step 052 can be implemented by the modifying unit 152. That is, the default value determining unit 151 is used to determine the default value according to the target intent and the intent-default value mapping table; the modifying unit 152 is used to modify the default value according to the target scale adjustment accuracy value.

The default value is determined according to the target intent and the intent-default value mapping table. For example, if the target intent of the user's voice request "Volume up" is to increase the volume, the default value obtained from the intent-default value mapping table can be 3, that is, when a voice request simulates operating the vehicle part to adjust the volume, the volume is adjusted by 3 scales each time.

The result of precision recognition based on the user's voice request "Volume is very loud" can be that the target scale adjustment precision value is recognized as 3. The default value is then modified according to the target scale adjustment precision value to 3*3=9, that is, after modification, the scale value corresponding to the user's voice request is 9. Control instructions are then generated according to the target intent and the modified default value. In this way, while integrating the traditional logic of voice requests, the scale of the vehicle part corresponding to the voice request can be accurately adjusted according to the user's simplified voice request.
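The arithmetic of steps 051 and 052 can be sketched as below. The table `INTENT_DEFAULTS`, the key `"volume_up"` and the function name `modify_default` are hypothetical; the multiplication rule is inferred from the 3*3=9 example above and is an assumption about how the modification is computed in general.

```python
from typing import Dict

# Hypothetical intent-default value mapping table. Only the default of 3
# for increasing the volume comes from the example in the description.
INTENT_DEFAULTS: Dict[str, int] = {"volume_up": 3}


def modify_default(target_intent: str, target_precision: int,
                   table: Dict[str, int] = INTENT_DEFAULTS) -> int:
    """Look up the default value and scale it by the precision value."""
    default_value = table[target_intent]        # step 051: table lookup
    return default_value * target_precision     # step 052: e.g. 3 * 3 = 9


print(modify_default("volume_up", 3))
```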

Please refer to FIG. 18, the present application also provides a server 20. The server 20 includes a processor 21 and a memory 22. A computer program 221 is stored on the memory 22. When the computer program 221 is executed by the processor 21, the voice interaction method described in any one of the above-mentioned embodiments is realized.

The server 20 of the present application can perform voice recognition on the voice request for adjusting the preset function of the vehicle to obtain the text to be recognized, where the preset function refers to a function of simulating an operation on a vehicle part for scale adjustment. The server 20 then uses the intent recognition model to perform intent recognition on the text to be recognized and uses the accuracy recognition model to perform precision recognition on the text to be recognized, determines the target intent and the target scale adjustment precision value corresponding to the voice request, and modifies the default value accordingly. In this way, while integrating the traditional logic of voice requests, the scale of the vehicle part corresponding to the voice request can be accurately adjusted according to the user's simplified voice request, and the user experience is improved.
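The overall flow on the server can be sketched end to end as follows. This is a simplified illustration under stated assumptions: speech recognition is assumed to have already produced the text, the two models are replaced by stub callables with fixed outputs, and the names `generate_control_instruction` and `INTENT_DEFAULTS`, the dictionary shape of the control instruction, and the choice to return `None` on failure are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical intent-default value mapping table; only the default of 3
# for increasing the volume is taken from the description.
INTENT_DEFAULTS: Dict[str, int] = {"volume_up": 3}


def generate_control_instruction(
    text: str,
    intent_model: Callable[[str], Dict[str, float]],
    precision_model: Callable[[str], Dict[int, float]],
    first_threshold: float = 0.9,
    second_threshold: float = 0.9,
) -> Optional[Dict[str, object]]:
    """Turn recognized text into a control instruction, or None on failure."""
    intent_probs = intent_model(text)
    precision_probs = precision_model(text)

    # Target intent: the preset intent whose probability exceeds the threshold.
    intent = max(intent_probs, key=intent_probs.get, default=None)
    if intent is None or intent_probs[intent] <= first_threshold:
        return None  # non-scale adjustment intention

    # Target precision value: same rule with the second threshold.
    precision = max(precision_probs, key=precision_probs.get, default=None)
    if precision is None or precision_probs[precision] <= second_threshold:
        return None  # precision recognition is wrong

    # Modify the default value, then fuse it with the target intent.
    value = INTENT_DEFAULTS[intent] * precision
    return {"intent": intent, "value": value}


# Stub models with fixed outputs, standing in for the trained models.
instruction = generate_control_instruction(
    "volume up",
    intent_model=lambda _: {"volume_up": 0.98, "volume_down": 0.01},
    precision_model=lambda _: {1: 0.02, 3: 0.95},
)
print(instruction)
```

Passing the models in as callables keeps the sketch independent of any particular model framework, which the description does not specify.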

Referring to FIG. 19, the present application also provides a non-volatile computer-readable storage medium 30 containing a computer program 31. When the computer program 31 is executed by one or more processors 40, the voice interaction method of any of the above embodiments is realized.

For example, when the computer program 31 is executed by the processor 40, the steps of the following voice interaction method are realized:

04: According to the results of intent recognition and accuracy recognition, determine the target intent and target scale adjustment accuracy value corresponding to the voice request;

05: Modify the default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request;

It can be understood that the computer program 31 includes computer program code. The computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a software distribution medium, and the like.

The computer-readable storage medium 30 of the present application can first perform voice recognition on the voice request for adjusting the preset function of the vehicle to obtain the text to be recognized, where the preset function refers to a function of simulating an operation on a vehicle part for scale adjustment. The intent recognition model is then used to perform intent recognition on the text to be recognized, and the accuracy recognition model is used to perform precision recognition on the text to be recognized, so as to determine the target intent and the target scale adjustment precision value corresponding to the voice request, and the default value is modified accordingly. In this way, while integrating the traditional logic of voice requests, the scale of the vehicle part corresponding to the voice request can be accurately adjusted according to the user's simplified voice request, and the user experience is improved.

Various embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A voice interaction method, characterized by comprising:

performing speech recognition on a voice request for adjusting a preset function of a vehicle to obtain the text to be recognized, the preset function being a function of simulating an operation on a vehicle part for scale adjustment;

performing intent recognition on the text to be recognized by using an intent recognition model;

performing precision recognition on the text to be recognized by using a precision recognition model;

determining the target intent corresponding to the voice request according to the result of the intent recognition, and determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;

modifying a default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request; and

fusing the target intent and the modified default value to generate a control instruction so as to control the corresponding vehicle component.
The voice interaction method according to claim 1, wherein the intention recognition model is obtained by training on intention training data, and the intention training data is related to the vehicle component and the adjustable range of the vehicle component.
The voice interaction method according to claim 1, wherein the accuracy identification model is obtained by training on accuracy training data, and the accuracy training data is related to the vehicle component, the adjustable range of the vehicle component, and the scale adjustment accuracy range of the vehicle component.
The voice interaction method according to claim 1, characterized in that the voice interaction method comprises: determining a control range and a non-control range of the vehicle components.
The voice interaction method according to claim 4, characterized in that the voice interaction method comprises: determining a default adjustment range of each of the vehicle components.
The voice interaction method according to claim 5, wherein the voice interaction method comprises:

determining the adjustable range of said vehicle component;

Correcting the intention of the preset voice request according to the adjustable range of the vehicle component.
The voice interaction method according to claim 6, wherein the voice interaction method comprises:

The control range and the adjustable range are mapped to preset intentions and corresponding preset scale adjustment accuracy values.
The voice interaction method according to claim 7, wherein the voice interaction method comprises:

An intent-default value mapping table is established according to the preset intent and the default adjustment range.
The voice interaction method according to claim 8, wherein said modifying a default value according to said target intent and said target scale adjustment accuracy value comprises:

determining the default value according to the target intent and the intent-default value mapping table;

The default value is modified according to the target scale adjustment accuracy value.
The voice interaction method according to claim 8, wherein the mapping the control range and the adjustable range to preset intentions and corresponding preset scale adjustment accuracy values includes:

Each of the adjustable ranges within the control range is mapped to one of the preset intentions, and each of the preset intentions corresponds to a plurality of preset scale adjustment accuracy values.
The voice interaction method according to claim 10, wherein the mapping of the control range and the adjustable range to preset intentions and corresponding preset scale adjustment accuracy values includes:

setting simplified words as slots, and extracting the slots from the preset recognition text corresponding to the vehicle parts to obtain repeated fields;

counting repetitions of the slot value in the repeated field to obtain a repetition quantity;

mapping the repetition quantity to the preset scale adjustment accuracy value according to the adjustable range of the simplified word.
The voice interaction method according to claim 11, wherein there are multiple preset intentions, and determining the target intention corresponding to the voice request according to the result of the intention recognition includes:

Obtaining the intention discrimination probability corresponding to each preset intention from the result of the intention recognition;

Determining one of the preset intentions whose intention discrimination probability is greater than a first probability threshold as the target intention corresponding to the voice request.
The voice interaction method according to claim 12, wherein the preset intention includes at least one of: volume up, volume down, air volume up, air volume down, temperature up, temperature down, map zoom in, map zoom out, screen brighten, screen dim, screen slide up, screen slide down, instrument brighten, instrument dim, ambient light brighten, ambient light dim, seat forward, seat backward, seat up, seat down, seat back forward, seat back rearward, window up, and window down.
The voice interaction method according to claim 12, wherein said determining the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition comprises:

Acquiring the accuracy identification probability corresponding to each preset scale adjustment accuracy value of the accuracy identification result;

A preset scale adjustment accuracy value whose accuracy discrimination probability is greater than a second probability threshold is determined as a target scale adjustment accuracy value corresponding to the voice request.
A voice interaction device, characterized in that the voice interaction device includes:

A speech recognition module, the speech recognition module is used to perform speech recognition on the voice request for vehicle preset function adjustment to obtain the text to be recognized, and the preset function refers to the function of simulating the scale adjustment of the operation of the vehicle parts;

An intent recognition module, configured to use an intent recognition model to perform intent recognition on the text to be recognized;

an accuracy identification module, the accuracy identification module is used to perform accuracy identification on the text to be identified by using an accuracy identification model;

A determination module, configured to determine the target intent corresponding to the voice request according to the result of the intention recognition, and determine the target scale adjustment precision value corresponding to the voice request according to the result of the precision recognition;

A modification module, configured to modify a default value according to the target intent and the target scale adjustment accuracy value, the default value being the adjustment value corresponding to the target intent in the preset voice request;

An instruction generating module, the instruction generating module is used to fuse the target intention and the modified default value to generate a control instruction to control corresponding vehicle components.
A server, characterized in that the server includes a processor and a memory, a computer program is stored on the memory, and when the computer program is executed by the processor, the voice interaction method according to any one of claims 1-14 is realized.
A non-volatile computer-readable storage medium containing a computer program, characterized in that, when the computer program is executed by one or more processors, the voice interaction method according to any one of claims 1-14 is realized.