CN114005449B - Voice interaction method and device, model training method, vehicle and storage medium - Google Patents


Info

Publication number
CN114005449B
CN114005449B (application CN202111628094.3A)
Authority
CN
China
Prior art keywords
scale
intention
precision
navigation map
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111628094.3A
Other languages
Chinese (zh)
Other versions
CN114005449A (en)
Inventor
王亭玉
赵群
樊骏锋
潘晓彤
宁洪珂
赵恒艺
Current Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202111628094.3A
Publication of CN114005449A
Application granted
Publication of CN114005449B
Priority to PCT/CN2022/138924 (WO2023125002A1)
Legal status: Active

Classifications

    • G PHYSICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/26 Speech to text systems
    • G10L2015/223 Execution procedure of a spoken command
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/08 Interaction between the driver and the control system
    • B60W2540/21 Voice (input parameters relating to occupants)

Abstract

The invention discloses a voice interaction method and device, a model training method, a vehicle and a readable storage medium. The voice interaction method comprises the following steps: receiving a voice request for adjusting a vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating the operation of a vehicle component, and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than a preset frequency; when the network connection state of the vehicle is abnormal, performing intention recognition on the voice request with an intention recognition model on the vehicle; performing precision recognition on the voice request with a precision recognition model on the vehicle; generating a first control instruction according to the intention recognition result and the precision recognition result; and adjusting the display state of the navigation map according to the scale specified by the first control instruction. Under abnormal network conditions, the invention can control the navigation map according to a simplified voice request from the user and respond to that request quickly.

Description

Voice interaction method and device, model training method, vehicle and storage medium
Technical Field
The invention relates to the field of voice technology, and in particular to a voice interaction method and device, a model training method, a vehicle and a storage medium.
Background
At present, in intelligent vehicle scenarios, there is a demand for voice interaction to support user navigation. The navigation scenario differs from other vehicle control scenarios: the user typically uses navigation while driving and, influenced by road conditions, lighting and the surrounding environment, needs to adjust the zoom of the navigation map in real time so that the target stays within view.
In related navigation scenarios, although voice interaction for map zoom-in and zoom-out can be realized, simplified voice requests that express both the direction and the magnitude of the zoom, such as "map big big" and "map big big big", cannot be effectively recognized as precise scale adjustments, so the correct vehicle-side command cannot be issued. In addition, such simplified voice requests cannot be effectively recognized when the network signal is poor, which degrades the user experience.
Disclosure of Invention
The embodiment of the invention provides a voice interaction method and device, a model training method, a vehicle and a storage medium.
The embodiment of the invention provides a voice interaction method. The voice interaction method comprises the following steps: receiving a voice request for adjusting a vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating the operation of a vehicle component, and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than a preset frequency; when the network connection state of the vehicle is abnormal, performing intention recognition on the voice request with an intention recognition model on the vehicle; performing precision recognition on the voice request with a precision recognition model on the vehicle; generating a first control instruction according to the intention recognition result and the precision recognition result; and adjusting the display state of the navigation map according to the scale specified by the first control instruction.
In this way, under abnormal network conditions, the voice interaction method can control the navigation map according to a simplified voice request from the user and respond to that request quickly.
The word-overlap range is smaller than the adjustable range of the scale.
In this way, a larger range of scale adjustment can be achieved with fewer overlapped words from the user.
Adjusting the display state of the navigation map according to the scale of the first control instruction comprises: when the scale of the first control instruction exceeds a preset threshold, adjusting the display state of the navigation map according to the preset threshold and feeding back first prompt information to the user.
In this way, when the scale of the first control instruction exceeds the preset threshold, the display state is clamped to the preset threshold and the first prompt information is fed back, so that the vehicle side still adjusts the navigation map display accurately.
Adjusting the display state of the navigation map according to the scale of the first control instruction comprises: when the scale of the first control instruction does not exceed the preset threshold, adjusting the display state of the navigation map according to the scale of the first control instruction and feeding back second prompt information to the user.
In this way, when the scale of the first control instruction does not exceed the preset threshold, the display state is adjusted exactly as instructed and the second prompt information is fed back, so that the vehicle side adjusts the navigation map display accurately.
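The two threshold cases above can be sketched as follows. This is a hypothetical Python sketch: the level bounds and prompt wording are illustrative assumptions, not values taken from the patent.

```python
def apply_scale(requested_level: int, max_level: int = 20, min_level: int = 13):
    """Clamp the requested scale level to the supported range and pick feedback.

    Hypothetical sketch: the bounds 13..20 and the prompt strings are
    illustrative assumptions.
    """
    if requested_level > max_level or requested_level < min_level:
        # Scale exceeds the preset threshold: clamp and use the first prompt.
        applied = max(min(requested_level, max_level), min_level)
        prompt = "Adjusted to the map's limit."
    else:
        # Within the threshold: apply as instructed and use the second prompt.
        applied = requested_level
        prompt = "Map adjusted."
    return applied, prompt
```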
Generating the first control instruction according to the intention recognition result and the precision recognition result comprises: determining a target intention according to the intention recognition result; determining a target scale-adjustment precision value according to the precision recognition result; modifying a default value according to the target intention and the target scale-adjustment precision value; and fusing the target intention with the modified default value to generate the first control instruction.
In this way, after the target intention and the target scale-adjustment precision value are determined, the default value is modified accordingly, which satisfies the requirement of controlling the navigation map through a simplified voice request while ensuring that the vehicle side correctly receives the first control instruction for zooming the map scale.
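A minimal sketch of this fusion step, assuming a simple instruction record and a hypothetical `DEFAULT_SPAN` default value; the patent does not specify the data layout, and the intent identifiers follow those quoted later in the text.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SPAN = 1  # assumed default adjustment span when no precision value is recognized

@dataclass
class ControlInstruction:
    intent: str  # e.g. "navigation_map_zoom" (zoom in) or "navigation_map_zoomout"
    span: int    # number of scale levels to cross

def build_instruction(target_intent: str,
                      precision_value: Optional[int]) -> ControlInstruction:
    # Modify the default value according to the recognized target
    # scale-adjustment precision value, then fuse it with the target
    # intention into a single control instruction.
    span = precision_value if precision_value is not None else DEFAULT_SPAN
    return ControlInstruction(intent=target_intent, span=span)
```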
Determining the target intention according to the intention recognition result comprises: obtaining the discrimination probability of each preset intention corresponding to the intention recognition result; and determining the preset intention whose discrimination probability is greater than a first probability threshold as the target intention.
In this way, the discrimination probability of each preset intention can be obtained, and the preset intention whose probability exceeds the first probability threshold is taken as the target intention of the voice request, satisfying the requirement of controlling the navigation map through a simplified voice request.
Determining the target scale-adjustment precision value according to the precision recognition result comprises: obtaining the discrimination probability of each preset scale-adjustment precision value corresponding to the precision recognition result; and determining the preset scale-adjustment precision value whose discrimination probability is greater than a second probability threshold as the target scale-adjustment precision value.
In this way, the preset scale-adjustment precision value whose discrimination probability exceeds the second probability threshold is taken as the target value, so that the scale of the navigation map can be adjusted precisely.
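Both threshold selections follow the same pattern: the target intention is chosen against the first probability threshold, and the target scale-adjustment precision value against the second. A hypothetical sketch, with made-up model outputs; the intent identifiers follow those quoted in the text.

```python
def pick_by_threshold(probs, threshold):
    # Return the candidate whose discrimination probability exceeds the
    # threshold, or None if no candidate clears it.
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p > threshold else None

# Hypothetical model outputs for the request "map big big big":
intent_probs = {"navigation_map_zoom": 0.92, "navigation_map_zoomout": 0.08}
precision_probs = {2: 0.10, 3: 0.85, 4: 0.05}

target_intent = pick_by_threshold(intent_probs, 0.8)        # first probability threshold
target_precision = pick_by_threshold(precision_probs, 0.8)  # second probability threshold
```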
Adjusting the display state of the navigation map according to the scale of the first control instruction comprises: determining the adjustment direction of the scale according to the target intention; determining the adjustment span of the scale according to the target scale-adjustment precision value; determining the scale of the first control instruction from the current scale, the adjustment direction and the adjustment span; and adjusting the scale of the navigation map to the scale of the first control instruction.
In this way, the adjustment direction is determined by the target intention and the adjustment span by the target scale-adjustment precision value; the scale of the first control instruction follows from the current scale, direction and span, and the navigation map is then adjusted to that scale, so the vehicle side correctly receives the first control instruction and the scale is adjusted accurately.
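A sketch of this computation, assuming rendering levels run from 13 to 20 and using the intent identifiers quoted in the text; the clamping to the adjustable range is an illustrative assumption.

```python
MIN_LEVEL, MAX_LEVEL = 13, 20  # assumed adjustable range of rendering levels

def adjust_scale(current_level: int, intent: str, span: int) -> int:
    # Direction comes from the target intention: zooming in raises the
    # rendering level (finer gradation), zooming out lowers it.
    direction = 1 if intent == "navigation_map_zoom" else -1
    # Span comes from the target scale-adjustment precision value.
    target = current_level + direction * span
    # Clamp to the adjustable range of the scale.
    return max(MIN_LEVEL, min(MAX_LEVEL, target))
```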
The voice interaction method comprises the following steps: sending the voice request to a server when the network connection of the vehicle is in a normal state; receiving a second control instruction issued by the server according to the voice request; and adjusting the display state of the navigation map according to the scale of the second control instruction.
In this way, when the network connection is normal, the voice request is sent to the server, the second control instruction issued by the server is received, and the display state of the navigation map is adjusted accordingly, so that the voice request can be answered quickly and in real time whenever a network is available.
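The dispatch between the normal and abnormal network states can be sketched as follows; `server` and `on_vehicle` are hypothetical interfaces standing in for the cloud service and the on-vehicle intention and precision models.

```python
def handle_voice_request(request, network_ok, server, on_vehicle):
    # Normal network state: the server parses the request and issues the
    # second control instruction.
    if network_ok:
        return server.parse(request)
    # Abnormal network state: recognize intention and precision with the
    # on-vehicle models and fuse them into the first control instruction.
    intent = on_vehicle.recognize_intent(request)
    span = on_vehicle.recognize_precision(request)
    return {"intent": intent, "span": span}
```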
The voice interaction method comprises the following steps: determining the scale and the adjustable range of the navigation map that the voice request can adjust.
In this way, the adjustable scale and range of the navigation map provide a basis for subsequently adjusting the scale accurately according to the voice request.
The voice interaction method comprises the following steps: determining the word-overlap range supported by the voice request according to the scale and the voice requests whose frequency of use is higher than the preset frequency.
In this way, determining the word-overlap range supported by the voice request lays the foundation for controlling the navigation map through a simplified voice request from the user.
The invention also provides a model training method for obtaining the intention recognition model and the precision recognition model of any of the above embodiments. The model training method comprises the following steps: training with intention training data to obtain the intention recognition model, wherein the intention training data is related to the scale and the adjustable range of the navigation map; and training with precision training data to obtain the precision recognition model, wherein the precision training data is related to the scale and the adjustable range of the navigation map and to the scale-adjustment precision range of the navigation map.
In this way, the model training method obtains the intention recognition model from the intention training data, so that intention recognition can accurately identify the user's intention, and obtains the precision recognition model from the precision training data, so that precision recognition can determine the scale-adjustment precision of the navigation map corresponding to the voice request.
The invention provides a voice interaction device. The voice interaction device comprises an instruction receiving module, an intention recognition module, a precision recognition module, a control instruction generation module and an adjustment module. The instruction receiving module receives a voice request for adjusting a vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating the operation of a vehicle component and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than a preset frequency; the intention recognition module performs intention recognition on the voice request with an intention recognition model on the vehicle when the network connection state of the vehicle is abnormal; the precision recognition module performs precision recognition on the voice request with a precision recognition model on the vehicle; the control instruction generation module generates a first control instruction according to the intention recognition result and the precision recognition result; and the adjustment module adjusts the display state of the navigation map according to the scale of the first control instruction.
In this way, under abnormal network conditions, the voice interaction device can control the navigation map according to a simplified voice request from the user, respond to the request quickly, and ensure that the vehicle side correctly receives the first control instruction for zooming the map scale.
The invention also provides a vehicle. The vehicle comprises a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the voice interaction method of any of the above embodiments.
In this way, under abnormal network conditions, the vehicle can control the navigation map according to a simplified voice request from the user and respond to the request quickly.
The present invention also provides a non-transitory computer-readable storage medium containing the computer program. The computer program, when executed by one or more processors, implements the method of speech interaction of any of the above embodiments and/or the method of model training of any of the above embodiments.
In this way, under abnormal network conditions, the computer-readable storage medium enables the navigation map to be controlled according to a simplified voice request from the user and the request to be answered quickly.
Additional aspects and advantages of embodiments of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is one of the flow diagrams of the voice interaction method of the present invention;
FIG. 2 is a schematic structural diagram of a voice interaction apparatus according to the present invention;
FIG. 3 is a second flowchart of the voice interaction method of the present invention;
FIG. 4 is a third flowchart of the voice interaction method of the present invention;
FIG. 5 is a fourth flowchart illustrating a voice interaction method according to the present invention;
FIG. 6 is a schematic structural diagram of a first control command generating module in the voice interaction apparatus according to the present invention;
FIG. 7 is a fifth flowchart illustrating a voice interaction method according to the present invention;
FIG. 8 is a schematic diagram of the structure of an intention determining unit in the voice interaction apparatus of the present invention;
FIG. 9 is a sixth flowchart illustrating a voice interaction method of the present invention;
FIG. 10 is a schematic structural diagram of a precision determination unit in the voice interaction apparatus of the present invention;
FIG. 11 is a seventh schematic flow chart of the voice interaction method of the present invention;
FIG. 12 is a schematic structural diagram of a regulation module in the voice interaction apparatus of the present invention;
FIG. 13 is an eighth flowchart illustrating a voice interaction method of the present invention;
FIG. 14 is a second schematic structural diagram of the voice interaction apparatus of the present invention;
FIG. 15 is a schematic flow chart diagram of a model training method of the present invention;
FIG. 16 is a schematic view of the structure of the model training apparatus of the present invention;
FIG. 17 is a schematic structural view of the vehicle of the present invention;
fig. 18 is a schematic structural diagram of a computer-readable storage medium of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for the purpose of illustrating the embodiments of the present invention and are not to be construed as limiting the embodiments of the present invention.
Referring to fig. 1, the present invention provides a voice interaction method. The voice interaction method comprises the following steps:
01: receiving a voice request for adjusting a vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating the operation of a vehicle component, and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than a preset frequency;
03: when the network connection state of the vehicle is abnormal, performing intention recognition on the voice request by using an intention recognition model on the vehicle;
05: performing precision recognition on the voice request by using a precision recognition model on the vehicle;
07: generating a first control instruction according to the intention recognition result and the precision recognition result;
09: adjusting the display state of the navigation map according to the scale of the first control instruction.
Referring to fig. 2, the present invention further provides a voice interaction apparatus 10. The voice interaction apparatus 10 includes: the system comprises a receiving module 11, an intention identification module 13, a precision identification module 15, a first control instruction generation module 17 and an adjustment module 19.
Step 01 may be implemented by the receiving module 11, step 03 by the intention recognition module 13, step 05 by the precision recognition module 15, step 07 by the first control instruction generation module 17, and step 09 by the adjustment module 19. That is, the receiving module 11 receives a voice request for adjusting the vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating the operation of a vehicle component and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than a preset frequency; the intention recognition module 13 performs intention recognition on the voice request with an intention recognition model on the vehicle when the network connection state of the vehicle is abnormal; the precision recognition module 15 performs precision recognition on the voice request with a precision recognition model on the vehicle; the first control instruction generation module 17 generates the first control instruction according to the intention recognition result and the precision recognition result; and the adjustment module 19 adjusts the display state of the navigation map according to the scale of the first control instruction.
Specifically, the voice request for adjusting the vehicle navigation map may be, for example, "map big big" or "map small small", i.e., a voice request with simplified wording in which the number of repetitions of "big" represents the number of layer levels the user wants to zoom in and the number of repetitions of "small" represents the number of layer levels the user wants to zoom out. It should be understood that in the navigation map, zooming in is achieved by reducing the gradation of the scale, and zooming out by enlarging it.
The scale of the navigation map can be adjusted by simulating the operation of a vehicle component, such as a mechanical knob or a button, and the word-overlap range supported by the voice request is determined according to the scale and the voice requests whose frequency of use is higher than the preset frequency. The preset frequency may be a default of the vehicle system or a frequency set by the user. Determining the word-overlap range in this way satisfies the user's map-adjustment needs to the greatest extent.
When the network connection state of the vehicle is abnormal, intention recognition is performed on the voice request by the intention recognition model on the vehicle, precision recognition is performed by the precision recognition model on the vehicle, a first control instruction is generated from the two results, and the display state of the navigation map is adjusted according to the scale of that instruction. High-frequency voice requests can thus be recognized quickly and in real time even without a network, improving the user experience.
After receiving the user's voice request for adjusting a preset function of the vehicle, speech recognition is performed to obtain a text to be recognized for subsequent processing; for example, the voice request "map big big" input by the user is recognized into the text "map big big".
In conclusion, under abnormal network conditions, the voice interaction method can control the navigation map according to a simplified voice request from the user and respond to it quickly, so that the vehicle side correctly receives the first control instruction for zooming the map scale.
The voice interaction method comprises the following steps: determining the scale and the adjustable range of the navigation map that the voice request can adjust.
Specifically, the voice interaction device 10 is configured to determine the scale and the adjustable range of the navigation map that the voice request can adjust.
Determining these values provides a basis for accurately adjusting the scale of the navigation map according to the voice request.
It should be understood that the scale of the navigation map has gradations such as 5 meters, 10 meters, 25 meters, 50 meters, 100 meters, 200 meters, 500 meters, 1 kilometer, 2 kilometers, 5 kilometers and 10 kilometers, with corresponding rendering levels of 20, 19, 18, 17, 16, 15, 14 and 13 in sequence. That is, the adjustable range of the scale may span adjustment values from 5 meters to 10 kilometers or more, and the rendering levels of the scale determine the word-overlap range.
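The enumerated gradation-to-level pairs can be written out as a lookup table. Note that the text lists more gradations than rendering levels, so only the pairs it actually enumerates (5 meters at level 20 down to 1 kilometer at level 13) appear in this sketch.

```python
# Gradation-to-rendering-level pairs enumerated in the text; the remaining
# gradations (2 km, 5 km, 10 km) are listed without levels and are omitted.
SCALE_TO_LEVEL = {
    "5 m": 20, "10 m": 19, "25 m": 18, "50 m": 17,
    "100 m": 16, "200 m": 15, "500 m": 14, "1 km": 13,
}
```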
The voice interaction method comprises the following steps: and determining the range of overlapped words which can be supported by the voice request according to the scale and the voice request with the use frequency higher than the preset frequency.
Specifically, the voice interaction device 10 is configured to determine a word overlap range that can be supported by the voice request according to the scale and the voice request with the usage frequency higher than the preset frequency.
It should be understood that the number of repetitions of "big" in the user's "map big big big" voice request represents the number of layer levels to zoom in, and the number of repetitions of "small" in "map small small small" represents the number of layer levels to zoom out. Under the precision requirement of navigation map adjustment, "map big big big" requires the scale gradation to cross 3 levels in the zoom-in direction, and "map small small small" to cross 3 levels in the zoom-out direction.
Determining the word-overlap range supported by the voice request according to the scale and the voice requests whose frequency of use is higher than the preset frequency lays the foundation for controlling the navigation map through a simplified voice request. For example, if the navigation map of a certain vehicle supports 20 rendering levels, the corresponding word-overlap range could be 2 to 20 overlapped words. In practice, however, a user rarely adjusts from the smallest rendering level to the largest, i.e., a voice request does not use as many as 20 overlapped words. The invention therefore considers the voice requests whose frequency of use exceeds the preset frequency, for example 60%: if the requests used more than 60% of the time contain 2 to 10 overlapped words, the word-overlap range can be determined as 2 to 10.
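A hypothetical helper for counting overlapped words and checking the 2-to-10 range from the example; the patent does not specify how the count is parsed, so both the parsing and the bounds are illustrative assumptions.

```python
import re

# Word-overlap range derived from the high-frequency requests in the example.
MIN_OVERLAP, MAX_OVERLAP = 2, 10

def overlap_count(request: str, word: str) -> int:
    # Count occurrences of the reduplicated word, e.g. "big" appears three
    # times in "map big big big" (a request to zoom in by three levels).
    return len(re.findall(re.escape(word), request))

def in_supported_range(request: str, word: str) -> bool:
    return MIN_OVERLAP <= overlap_count(request, word) <= MAX_OVERLAP
```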
Wherein the range of the overlapped words is smaller than the adjustable range of the scale.
The voice interaction method further comprises the following steps: and correcting the intention of the preset voice request according to the adjustable range of the scale. The voice interaction device 10 is used for correcting the intention of the preset voice request according to the adjustable range of the scale.
According to the adjustable range of the scale, intention correction is performed on preset voice requests: a simplified voice request that would be recognized under conventional logic as the maximum or minimum intention is corrected, when its reduplicated words satisfy the conditions, into the corresponding fine-grained intention of enlarging the scale. Therefore, the accurate adjustment truly intended in the user's instruction can be achieved on the basis of the original conventional logic.
The voice interaction method comprises the following steps: and mapping the adjustable range and the word overlapping range of the scale to a preset intention and a corresponding preset scale adjustment precision value.
The voice interaction device 10 is configured to map the adjustable range and the range of the superimposed words of the scale to a preset intention and a corresponding preset scale adjustment precision value.
In this manner, the adjustable range of the scale is mapped to an intention system that the intention recognition model can understand. For example, "navigation_map_zoomin" represents the preset intention "map zoom-in" and "navigation_map_zoomout" represents the preset intention "map zoom-out". Therefore, a specific intention mapping system is established according to the adjustable range of the scale.
For the preset scale adjustment precision values: for example, if the word-overlap range supportable by simulating operation of the vehicle component through voice interaction is 2 to 10, the range of preset scale adjustment precision values may also be 2 to 10. Each preset intention corresponds to a plurality of preset scale adjustment precision values.
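A minimal sketch of such a mapping, assuming the intention identifiers from the example and the 2-10 precision range (the dictionary layout is a hypothetical illustration):

```python
# Word-overlap range 2-10 from the example above; identifiers and
# structure are illustrative assumptions, not the patented implementation.
OVERLAP_RANGE = list(range(2, 11))  # 2 .. 10

INTENT_MAP = {
    "navigation_map_zoomin":  {"intent": "map zoom-in",
                               "precision_values": OVERLAP_RANGE},
    "navigation_map_zoomout": {"intent": "map zoom-out",
                               "precision_values": OVERLAP_RANGE},
}
```

Each preset intention thus carries the full set of preset scale adjustment precision values.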
In other embodiments of the present invention, with the user's permission, different user utterances may be collected for the same preset intention. For "map bigger", for example, the user may phrase the request more freely, using related generalized expressions such as "turn it up up up" or "enlarge enlarge enlarge".
Referring to fig. 3, step 09 includes:
091: and under the condition that the scale of the first control instruction exceeds a preset threshold, adjusting the display state of the navigation map according to the preset threshold, and feeding back first prompt information to the user.
Referring to fig. 2, step 091 may be implemented by adjustment module 19. That is, the adjusting module 19 is configured to adjust the display state of the navigation map according to the preset threshold value and feed back the first prompt information to the user when the scale of the first control instruction exceeds the preset threshold value.
Specifically, the preset threshold may be a maximum value that a scale of a navigation map set by a default of the vehicle system can be enlarged, or may be a numerical value set by a user, which is not limited herein.
In detail, when the sum of the scale level of the current navigation map and the requested increase exceeds the preset threshold, first prompt information can be fed back to the user. For example, the first prompt information may be a voice broadcast message such as "the scale in the instruction exceeds the threshold", so that the user knows that the scale of the first control instruction exceeds the preset threshold. At the same time, the vehicle can autonomously adjust the display state of the navigation map according to the preset threshold, so that the vehicle end adjusts the display state of the navigation map correctly.
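Step 091's clamping behavior can be sketched as follows; the function name, the level arithmetic, and the prompt string are assumptions for illustration:

```python
def adjust_with_threshold(current_level, requested_increase, preset_threshold=20):
    """Clamp the requested scale level to the preset threshold.

    If the sum of the current scale level and the requested increase
    exceeds the preset threshold, adjust only to the threshold and
    return the first prompt information for the user.
    """
    target = current_level + requested_increase
    if target > preset_threshold:
        return preset_threshold, "the scale in the instruction exceeds the threshold"
    return target, None  # within range: no first prompt needed
```

The preset threshold here stands in for either the system default maximum or a user-set value, as described above.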
Referring to fig. 4, step 09 includes:
092: and under the condition that the scale of the first control instruction does not exceed the preset threshold, adjusting the display state of the navigation map according to the scale of the first control instruction, and feeding back second prompt information to the user.
Referring to fig. 2, step 092 can be implemented by the adjusting module 19. That is, the adjusting module 19 is configured to, when the scale of the first control instruction does not exceed the preset threshold, adjust the display state of the navigation map according to the scale of the first control instruction, and feed back the second prompt information to the user.
Specifically, the preset threshold may be a maximum value or a minimum value that can be enlarged or reduced of a scale of a navigation map set by a default of the vehicle system, or may be a numerical value set by a user, which is not limited herein.
When the scale of the first control instruction does not exceed the preset threshold, the display state of the navigation map is adjusted according to the scale of the first control instruction; that is, the vehicle can automatically adjust the display state of the navigation map according to the user's voice request, improving the user experience.
The second prompt information fed back may be a voice broadcast message such as "adjusted to the target scale", so that the user knows the currently adjusted scale of the vehicle navigation map in a timely manner.
Referring to fig. 5, step 07 includes:
071: determining a target intention according to the result of intention recognition;
072: determining a target scale adjustment precision value according to the precision identification result;
073: adjusting the precision value according to the target intention and the target scale to modify a default value;
074: and fusing the target intention and the modified default value to generate a first control instruction.
Referring to fig. 6, the first control instruction generation module 17 includes an intention determination unit 171, a precision determination unit 172, a modification unit 173, and a first instruction generation unit 174.
Step 071 may be implemented by the intention determining unit 171, step 072 may be implemented by the accuracy determining unit 172, step 073 may be implemented by the modifying unit 173, and step 074 may be implemented by the first instruction generating unit 174. That is, the intention determining unit 171 is configured to determine a target intention corresponding to the voice request according to a result of the intention recognition; the precision determining unit 172 is configured to determine a target scale adjustment precision value corresponding to the voice request according to the result of the precision identification; the modifying unit 173 is configured to modify the default value according to the target intent and the target scale adjustment precision value; the first instruction generating unit 174 is configured to fuse the target intent and the modified default value to generate a first control instruction.
It will be appreciated that, under conventional logic, each request to zoom the map in or out can only move the scale by one level at a time. Taking a current scale of 50 meters as an example, i.e., rendering level 17: if the user wants to magnify the map, the scale becomes 25 meters, corresponding to rendering level 18; if the user then wants to magnify the map again, the user must say "map bigger" once more, bringing the scale to 10 meters and rendering level 19. It is impossible for the scale to cross two levels at once from a single simplified voice request such as "map big big".
That is, the default value is an adjustment value corresponding to the target intention in the predetermined voice request confirmed according to the original conventional logic. The preset voice request may refer to a user voice request such as "map zoom in", "map zoom out", and the like. According to the conventional recognition logic, the adjustment value corresponding to the target intention of map enlargement is 1 scale level of enlargement, namely the default value is 1 scale level. According to the conventional recognition logic, the adjustment value corresponding to the target intention of the map reduction is 1 scale level lower, that is, the default value is 1 scale level. That is, the default values at this time are: default value = 1.
Under the precision logic of performing precision recognition on the simplified instruction, the user's voice request "map big big" corresponds to the target intention of enlarging the navigation map scale, with the rendering level of the scale expected by the user adjusted by 2. When the target scale adjustment precision is recognized, the target scale adjustment precision value is 2, and the default value is modified to obtain the modified adjustment scale: default_value' = scale_value × default_value = 2 × 1 = 2. That is, the default value is raised by 2 levels according to the user's voice request, so the modified default value is 2. While newly adding the function of accurately adjusting vehicle components according to voice requests with reduplicated words, the voice interaction method provided by the invention leaves the implementation logic of the original non-precision instructions completely intact, realizing the accurate adjustment function within the original recognition logic framework.
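The default-value modification and fusion described above can be sketched as follows; the names are hypothetical, and the arithmetic follows default_value' = scale_value × default_value from the text:

```python
DEFAULT_VALUE = 1  # conventional logic: one scale level per request

def fuse_first_control_instruction(target_intent, scale_value):
    # Modify the default value with the target scale adjustment
    # precision value, then fuse it with the target intention.
    modified_default = scale_value * DEFAULT_VALUE
    return {"intent": target_intent, "levels": modified_default}
```

For "map big big", the precision value 2 yields a modified default of 2 levels, as in the worked example.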
And finally, fusing the target intention and the modified default value to generate a first control instruction so as to control the display state of the navigation map.
Referring to fig. 7, step 071 includes:
0711: acquiring intention distinguishing probability of each preset intention corresponding to the result of intention identification;
0712: and determining a preset intention with the intention discrimination probability larger than the first probability threshold value as the target intention.
Referring to fig. 8, intention determining unit 171 includes a first obtaining subunit 1711 and an intention determining subunit 1712.
Step 0711 may be implemented by the first obtaining subunit 1711, and step 0712 may be implemented by the intention determining subunit 1712. That is, the first obtaining subunit 1711 is configured to obtain the intention discrimination probability of each preset intention corresponding to the result of the intention recognition; the intention determining subunit 1712 is configured to determine a preset intention whose intention discrimination probability is greater than the first probability threshold as the target intention.
Specifically, intention recognition is performed on the voice request by using the intention recognition model of the vehicle to obtain an intention recognition result, which includes the probability that the voice request matches each preset intention, i.e., a plurality of intention discrimination probabilities are obtained. If the first probability threshold is 0.9 and the intention discrimination probability of a certain preset intention exceeds 0.9, the current user's voice request is considered to express the preset intention of the corresponding category, namely the target intention. The first probability threshold may take other values, and may be a default value or be set according to the user's needs, which is not limited herein.
The preset intentions of the present invention may include map enlargement and map reduction.
Therefore, the invention can identify different intentions according to the voice request with the simplified words provided by the user, thereby realizing corresponding target intentions.
Step 071 further comprises:
0713: and under the condition that the intention judging probability of each preset intention is not greater than the first probability threshold, determining the intention of the voice request as a non-map scale adjusting intention.
Step 0713 may be implemented by the intention determining subunit 1712, that is, the intention determining subunit 1712 is configured to determine, in a case that the intention discrimination probabilities of the respective preset intentions are not greater than the first probability threshold, that the intention of the voice request is the non-map scale adjustment intention.
For example, when the discrimination probabilities corresponding to the preset intentions of map enlargement and map reduction are both not greater than the first probability threshold (for example, 0.9), i.e., the probability that the user's intention recognition result matches any preset intention is relatively low, the intention of the voice request is determined to be a non-map-scale-adjustment intention. A non-map-scale-adjustment intention refers to a user intention that is not for adjusting the scale of the navigation map. For example, if the voice request input by the user is "volume up up up", since adjusting the volume does not adjust the scale of the navigation map, the intention of this voice request is a non-map-scale-adjustment intention.
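Steps 0711-0713 amount to a thresholded selection over the intention discrimination probabilities; a sketch under assumed input and output conventions:

```python
def determine_target_intent(intent_probs, first_threshold=0.9):
    # intent_probs maps each preset intention to its intention
    # discrimination probability (hypothetical format).
    for intent, prob in intent_probs.items():
        if prob > first_threshold:
            return intent  # target intention found
    # No preset intention exceeds the first probability threshold:
    # the request carries a non-map-scale-adjustment intention.
    return "non_map_scale_adjustment"
```

A request matching "map zoom-in" at 0.95 is accepted, while one matching nothing above 0.9 falls through to the non-map-scale case.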
Referring to fig. 9, step 072 includes:
0721: acquiring precision discrimination probabilities of precision identification results corresponding to the precision values of the preset scales;
0722: and determining a preset scale adjustment precision value with the precision discrimination probability larger than the second probability threshold value as a target scale adjustment precision value.
Referring to fig. 10, the precision determining unit 172 includes a second obtaining sub-unit 1721 and a precision determining sub-unit 1722.
Step 0721 may be implemented by the second obtaining subunit 1721, and step 0722 may be implemented by the precision determining subunit 1722. The second obtaining subunit 1721 is configured to obtain the precision discrimination probability of each preset scale adjustment precision value corresponding to the result of the precision recognition; the precision determining subunit 1722 is configured to determine a preset scale adjustment precision value whose precision discrimination probability is greater than the second probability threshold as the target scale adjustment precision value.
The precision discrimination probability refers to the probability that the precision of the voice request is identified to be matched with each preset scale adjustment precision value. The second probability threshold may be, for example, 0.7, 0.8, 0.9, or other values, and is not limited herein.
For example, for the voice request "map big big big big big", when the precision discrimination probability corresponding to the preset scale adjustment precision value 5 is 1 and the second probability threshold is 0.9, i.e., the precision discrimination probability 1 exceeds the second probability threshold 0.9, the target scale adjustment precision value of the voice request is determined to be 5.
Step 072 further comprises:
0723: and determining the accuracy recognition error of the voice request under the condition that the accuracy discrimination probabilities of the preset scale adjustment accuracy values are not greater than the second probability threshold.
Step 0723 may be implemented by the accuracy determination subunit 1722. That is, the precision determining subunit 1722 is configured to determine that the precision identification of the voice request is incorrect when the precision determination probabilities of the respective preset scale adjustment precision values are not greater than the second probability threshold.
When the precision discrimination probability of each preset scale adjustment precision value is not greater than the second probability threshold, the precision recognition of the input voice request is erroneous, and voice requests unrelated to scale adjustment precision can thereby be eliminated.
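Steps 0721-0723 follow the same thresholding pattern for the precision discrimination probabilities; in this sketch with assumed names, `None` marks a precision recognition error:

```python
def determine_target_precision(precision_probs, second_threshold=0.9):
    # precision_probs maps each preset scale adjustment precision
    # value to its precision discrimination probability (hypothetical).
    for value, prob in precision_probs.items():
        if prob > second_threshold:
            return value  # target scale adjustment precision value
    return None  # precision recognition error: request is eliminated
```

A precision value with probability 1.0 passes the 0.9 threshold; a request whose probabilities all stay below it is rejected.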
Referring to fig. 11, step 09 includes:
093: determining the adjusting direction of the scale of the navigation map according to the target intention;
094: determining the adjusting span of a scale of the navigation map according to the target scale adjusting precision value;
095: determining a scale of a first control instruction according to the current scale, the adjusting direction and the adjusting span;
096: and adjusting the scale of the navigation map to the scale of the first control instruction.
Referring to fig. 12, the adjusting module 19 includes a direction adjusting unit 193, a span adjusting unit 194, a scale determining unit 195, and an adjusting subunit 196.
Step 093 may be implemented by the direction adjusting unit 193, step 094 may be implemented by the span adjusting unit 194, step 095 may be implemented by the scale determining unit 195, and step 096 may be implemented by the adjusting subunit 196.
Specifically, the adjustment direction of the scale of the navigation map is determined according to the target intention. For example, if the target intention is to zoom the navigation map 3 levels in the A direction, the adjustment direction of the scale for the target intention is the A direction; the target scale adjustment precision value is then 3, and the corresponding adjustment span of the scale is 3 levels.
Then, the scale of the first control instruction is determined by combining the current scale, the adjustment direction, and the adjustment span. It is understood that the scale of the navigation map has rulings of 5 meters, 10 meters, 25 meters, 50 meters, 100 meters, 200 meters, 500 meters, 1 kilometer, 2 kilometers, 5 kilometers, 10 kilometers, and the like, and the corresponding rendering levels are 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, and 10 in sequence. If the current scale is 5 meters (corresponding to rendering level 20), the adjustment direction is the A direction, and the adjustment span is 2 levels, then the scale of the first control instruction is 25 meters (corresponding to rendering level 18).
And finally, adjusting the scale of the navigation map to the scale of the first control instruction. And if the current scale is 5 meters and the scale of the first control instruction is 25 meters, adjusting the scale of the navigation map from 5 meters to 25 meters, thereby realizing accurate control according to the simplified voice request.
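Using the ruling-to-rendering-level table from the example, steps 093-096 can be sketched as follows; the direction naming and the clamping behavior are assumptions:

```python
# Scale rulings and rendering levels from the example: 5 m is the
# finest (level 20), 10 km the coarsest (level 10).
SCALES = ["5 m", "10 m", "25 m", "50 m", "100 m", "200 m",
          "500 m", "1 km", "2 km", "5 km", "10 km"]
LEVELS = list(range(20, 9, -1))  # 20, 19, ..., 10

def first_instruction_scale(current_scale, direction, span):
    level = LEVELS[SCALES.index(current_scale)]
    # Zooming in raises the rendering level; zooming out lowers it.
    new_level = level + span if direction == "zoom_in" else level - span
    new_level = max(10, min(20, new_level))  # clamp to supported levels
    return SCALES[LEVELS.index(new_level)]
```

For a current scale of 50 meters (level 17), zooming in by one level yields 25 meters (level 18), matching the worked example under conventional logic.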
Referring to fig. 13, the voice interaction method includes:
02: sending the voice request to a server under the condition that the network connection of the vehicle is in a normal state;
04: receiving a second control instruction issued by the server according to the voice request;
06: and adjusting the display state of the navigation map according to the scale of the second control instruction.
Referring to fig. 14, the voice interaction apparatus 10 includes: a request sending module 12, an instruction receiving module 14 and a regulating module 19.
Step 02 may be implemented by the request sending module 12, step 04 may be implemented by the instruction receiving module 14, and step 06 may be implemented by the adjusting module 19. That is, the request sending module 12 is configured to send the voice request to the server if the network connection state of the vehicle is in a normal state; the instruction receiving module 14 is configured to receive a second control instruction issued by the server according to the voice request; the adjusting module 19 is configured to adjust the display state of the navigation map according to the scale of the second control instruction.
Specifically, when the network connection state of the vehicle is normal, the voice request is sent to the server, the second control instruction issued by the server according to the voice request is received, and the display state of the navigation map is adjusted according to the scale of the second control instruction. In this way, when the network is available, the navigation map scale can be quickly adjusted in real time in response to the voice request.
The invention also provides a model training method, which is used for training the intention recognition model and the precision recognition model. Referring to fig. 15, the model training method includes:
011: training through intention training data to obtain an intention recognition model, wherein the intention training data is related to a scale and an adjustable range of the navigation map;
013: and training by precision training data to obtain a precision recognition model, wherein the precision training data is related to the scale and the adjustable range of the navigation map and the scale adjustment precision range of the navigation map.
Referring to fig. 16, the present invention further provides a model training apparatus 100. The model training apparatus 100 includes an intent training module 110 and an accuracy training module 130.
Step 011 may be implemented by the intention training module 110, and step 013 may be implemented by the precision training module 130. That is, the intention training module 110 is configured to obtain the intention recognition model through training on intention training data, the intention training data being related to the scale and the adjustable range of the navigation map; the precision training module 130 is configured to obtain the precision recognition model through training on precision training data, the precision training data being related to the scale and the adjustable range of the navigation map and to the scale adjustment precision range of the navigation map.
According to the method, the intention recognition model is obtained through training by the aid of the scale of the navigation map and training data corresponding to the adjustable range in a machine learning mode, and further intention recognition is performed on the voice request, so that the intention of the user is accurately recognized. And training by the scale and the adjustable range of the navigation map and training data corresponding to the scale adjustment precision range of the navigation map to obtain a precision recognition model, and then carrying out precision recognition by a voice request to realize accurate recognition of the scale adjustment precision of a user.
The data for intention training and precision training can be obtained by collecting a certain number of historical user voice requests, with the relevant user permissions, and simply screening the collected requests to keep those with clear semantics and a specific purpose. Specifically, the screening removes voice requests with obvious semantic ambiguity and short voice requests containing only filler words such as "oh" and "ah", leaving voice requests with definite semantics and a specific purpose.
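The screening step can be sketched as a simple filter; the filler-word list is an assumption, and real screening of semantic ambiguity would need more than string rules:

```python
FILLER_WORDS = {"oh", "ah", "um"}  # assumed filler vocabulary

def screen_voice_requests(raw_requests):
    """Keep requests with a specific purpose; drop filler-only ones."""
    kept = []
    for text in raw_requests:
        words = text.split()
        if words and all(w in FILLER_WORDS for w in words):
            continue  # filler-only request, removed in screening
        if words:
            kept.append(text)
    return kept
```

The surviving requests then proceed to intention and precision labeling.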
The screened voice requests are labeled according to the formulated preset intentions. For example, the voice request "map big big big" is labeled with the corresponding intention "map zoom-in". Then, quality inspection is performed on the labeled data, and a second screening removes labeled data that does not match any preset intention, leaving labeled data usable for intention model training. For example, if the voice request is "open the door", the corresponding intention is labeled "open door", which does not adjust the scale of the navigation map, so this voice request can be removed in the screening.
In the intention training process, the labeled data usable for intention model training serves as the intention training data and is divided into an intention training set and an intention validation set; the division ratio can be set as required and is not limited herein, for example, 80% intention training set and 20% intention validation set. The data in the intention training set is used to train the intention recognition model. Model training may utilize models such as BERT, ALBERT, XLNet, and RoBERTa.
Specifically, for the established intention recognition model, at least part of the data in the intention training set is used to train it, and then at least part of the data in the intention validation set is used to verify the accuracy of the trained model. If the verification accuracy does not reach the intention accuracy threshold, the model is trained again with at least another part of the intention training set, and its accuracy is verified again with another part of the intention validation set. This training and verification process is repeated until the verification accuracy reaches the intention accuracy threshold, at which point the intention recognition model is considered up to standard and its training is complete.
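The iterate-until-threshold loop can be sketched as follows; `model.fit` and `model.score` stand in for a real intention recognition model (e.g. a BERT-based classifier) and are hypothetical API names:

```python
def train_until_threshold(model, train_set, val_set,
                          accuracy_threshold=0.95, batch_size=100,
                          max_rounds=10):
    # Train on successive portions of the intention training set and
    # verify on the intention validation set until the accuracy
    # threshold is reached (or the data is exhausted).
    for round_no in range(max_rounds):
        part = train_set[round_no * batch_size:(round_no + 1) * batch_size]
        if not part:
            return False  # training data exhausted without reaching standard
        model.fit(part)
        if model.score(val_set) >= accuracy_threshold:
            return True  # verification accuracy reaches the threshold
    return False
```

Each data portion is used only once, mirroring the single-use rule for training and validation data described below.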
Due to the limited computing resources of the vehicle, the trained intention recognition model can be distilled through a distillation technology to obtain a distilled intention recognition model, so that the size of the model can be reduced, for example, the trained intention recognition model is distilled into a small model with the size of 10M. And then, performing intention verification on the distilled intention recognition model by using at least one part of the intention verification set, and if the accuracy of the intention verification reaches an intention accuracy threshold, considering that the distilled intention recognition model reaches the standard, so that the distilled intention recognition model can be subjected to model quantization, for example, from float32 to int8, further compressing the model, thereby reducing the dependence of the model on the vehicle performance, and finally deploying the distilled and quantized intention recognition model on the vehicle.
And in the process of performing intention verification on the distilled intention recognition model, if the accuracy of intention verification does not reach the intention accuracy threshold, continuing to train the trained intention recognition model again through more data of the intention training set, and performing intention verification on the retrained intention recognition model again until the accuracy of intention verification of the distilled intention recognition model reaches the intention accuracy threshold.
It should be noted that each data in the intention training set and the intention verification set is used only once, and when the intention recognition model traverses all the data in the intention training set and the intention verification set, which are not trained to reach the standard, more voice requests can be collected again under the condition that the user allows, so that more intention training data obtained by screening and labeling are used for training the intention recognition model, and therefore, the intention recognition model can be ensured to accurately recognize the intention corresponding to the input voice request.
The screened voice requests are also manually labeled according to the preset scale precision values, i.e., the scale adjustment precision value the user wants to adjust is labeled. For example, the voice request "map big big big" is labeled with a scale adjustment precision value of 3. Then, a precision recognition model is established based on slot extraction; algorithms usable for slot extraction include RNN slot filling, CRF, and the like. The labeled data serves as the precision training data and is divided into a precision training set and a precision validation set; the division ratio can be set as required and is not limited herein, for example, 80% precision training set and 20% precision validation set. The data in the precision training set is used to train the precision recognition model. For the established precision recognition model, at least part of the data in the precision training set is used to train it, and then at least part of the data in the precision validation set is used to verify the accuracy of the trained model. If the verification accuracy does not reach the precision accuracy threshold, the model is trained again with at least another part of the precision training set, and its accuracy is verified again with another part of the precision validation set; this training and verification process is repeated until the verification accuracy reaches the precision accuracy threshold, completing the training of the precision recognition model.
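For illustration only, a precision label for reduplicated requests could be derived by counting keyword repetitions; this is a stand-in for the RNN/CRF slot extraction named above, and the keyword and function name are hypothetical:

```python
import re

def precision_label(request, keyword="big"):
    # The scale adjustment precision value equals the number of
    # repeated zoom keywords, e.g. "map big big big" -> 3.
    return len(re.findall(rf"\b{keyword}\b", request))
```

Such auto-derived labels would still be manually checked before entering the precision training data.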
Correspondingly, the distillation technology can be used for distilling the precision recognition model which reaches the standard after training, so that the distilled precision recognition model is obtained, and the size of the model is reduced. And then, performing precision verification on the distilled precision recognition model by using at least one part of the precision verification set, and if the accuracy of the precision verification reaches a precision accuracy threshold, determining that the distilled precision recognition model reaches the standard, so that the distilled precision recognition model can be subjected to model quantization, further compressing the model, reducing the dependence of the model on the vehicle performance, and finally deploying the distilled and quantized precision recognition model on the vehicle.
And in the process of performing precision verification on the distilled precision recognition model, if the accuracy of the precision verification does not reach the precision accuracy threshold, continuing to train the precision recognition model which reaches the standard again through more data of the precision training set, and performing precision verification on the precision recognition model which reaches the standard again until the accuracy of the precision verification of the distilled precision recognition model reaches the precision accuracy threshold.
It should be noted that each data in the precision training set and the precision verification set is used only once, and under the condition that the precision recognition model traverses all data in the precision training set and the precision verification set, which are not trained to reach the standard, more voice information can be collected again under the condition that the user allows, so that more precision training data obtained by screening and labeling are used for training the precision recognition model, and the precision recognition model can be ensured to accurately recognize the scale adjustment precision corresponding to the input voice request.
In this way, the precision recognition model can be trained in advance on the precision training data to perform precision recognition on a voice request, recognizing the adjustment precision of the map scale, producing a precision recognition result, and finally determining the target scale adjustment precision value.
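Selecting the target scale adjustment precision value from the model's output can be sketched as below, following the probability-threshold rule in claim 7. The threshold value and function name are illustrative assumptions, not taken from the patent.

```python
SECOND_PROBABILITY_THRESHOLD = 0.5  # assumed value for the second probability threshold

def pick_target_precision(precision_probs):
    """precision_probs: mapping of preset scale-adjustment precision value
    -> discrimination probability output by the precision recognition model.
    Returns the preset value whose probability exceeds the threshold, else None."""
    for value, prob in precision_probs.items():
        if prob > SECOND_PROBABILITY_THRESHOLD:
            return value
    return None  # no preset value is confident enough
```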
Referring to fig. 17, the present invention also provides a vehicle 20. The vehicle 20 comprises a processor 21 and a memory 22; the memory 22 stores a computer program 221 which, when executed by the processor 21, implements the voice interaction method described in any of the above embodiments.
The vehicle 20 of the invention can control the navigation map according to a simplified voice request from the user even when the network is abnormal, responding quickly to the voice request so that the vehicle end correctly receives the first control instruction for enlarging the map scale.
Referring to fig. 18, the present invention also provides a non-volatile computer-readable storage medium 30 containing a computer program 31. The computer program 31, when executed by one or more processors 40, implements the voice interaction method and the model training method of any of the above embodiments.
For example, the computer program 31, when executed by the processor 40, implements the steps of the following voice interaction method:
01: receiving a voice request for adjusting a vehicle navigation map, wherein the scale of the navigation map can be adjusted by simulating operations on vehicle components, and the word-overlap range that the voice request can support is determined according to the scale and to voice requests whose frequency of use exceeds a preset frequency;
03: when the network connection state of the vehicle is in an abnormal state, performing intention recognition on the voice request by using an intention recognition model on the vehicle;
05: performing precision identification on the voice request by using a precision identification model on the vehicle;
07: generating a first control instruction according to the intention recognition result and the precision recognition result;
09: and adjusting the display state of the navigation map according to the scale of the first control instruction.
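Steps 01 to 09 above can be sketched end to end as follows. This is a hypothetical outline of the dispatch logic only; the component callables (`intent_model`, `precision_model`, `send_to_server`) stand in for the on-vehicle models and the server path, and are not defined by the patent.

```python
def handle_voice_request(request, network_ok, intent_model, precision_model,
                         current_scale, send_to_server=None):
    """Dispatch a voice request: use the server when the network is normal,
    otherwise run the on-vehicle models (steps 03-09)."""
    if network_ok and send_to_server is not None:
        return send_to_server(request)                  # normal network path
    intent = intent_model(request)                      # 03: intention recognition
    span = precision_model(request)                     # 05: precision recognition
    direction = 1 if intent == "zoom_in" else -1        # 07: build the first control instruction
    return {"action": "set_scale",
            "scale": current_scale + direction * span}  # 09: adjust the map display
```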
It will be appreciated that the computer program 31 comprises computer program code. The computer program code may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
The computer-readable storage medium can likewise control the navigation map according to a simplified voice request from the user even when the network is abnormal, responding quickly to the voice request so that the vehicle end correctly receives the first control instruction for enlarging the map scale.

Claims (14)

1. A method of voice interaction, comprising:
receiving a voice request for adjusting a vehicle navigation map, wherein a scale of the navigation map can be subjected to scale adjustment through simulating operation of vehicle parts, and a word overlapping range which can be supported by the voice request is determined according to the scale and the voice request with the use frequency higher than a preset frequency;
in the case that the network connection state of the vehicle is in an abnormal state, performing intention recognition on the voice request by using an intention recognition model on the vehicle;
performing precision recognition on the voice request by using a precision recognition model on the vehicle;
generating a first control instruction according to the intention recognition result and the precision recognition result;
and adjusting the display state of the navigation map according to the scale of the first control instruction.
2. The method of claim 1, wherein the range of word overlaps is less than the adjustable range of the scale.
3. The voice interaction method according to claim 1, wherein the adjusting the display state of the navigation map according to the scale of the first control instruction comprises:
and under the condition that the scale of the first control instruction exceeds a preset threshold, adjusting the display state of the navigation map according to the preset threshold, and feeding back first prompt information to a user.
4. The voice interaction method according to claim 1, wherein the adjusting the display state of the navigation map according to the scale of the first control instruction comprises:
and under the condition that the scale of the first control instruction does not exceed a preset threshold value, adjusting the display state of the navigation map according to the scale of the first control instruction, and feeding back second prompt information to a user.
5. The voice interaction method according to claim 1, wherein the generating a first control instruction according to the intention recognition result and the precision recognition result comprises:
determining a target intention according to the result of intention recognition;
determining a target scale adjustment precision value according to the precision identification result;
modifying a default value according to the target intention and the target scale adjustment precision value;
and fusing the target intention and the modified default value to generate the first control instruction.
6. The method of claim 5, wherein the determining the target intent according to the result of the intent recognition comprises:
acquiring intention judgment probabilities of the intention identification results corresponding to the preset intentions;
determining one of the preset intents, of which the intention discrimination probability is greater than a first probability threshold, as the target intention.
7. The voice interaction method of claim 5, wherein the determining the target scale adjustment precision value according to the result of the precision recognition comprises:
acquiring precision discrimination probabilities of the precision identification results corresponding to the precision values of the preset scales;
and determining the preset scale adjustment precision value with the precision discrimination probability larger than a second probability threshold value as the target scale adjustment precision value.
8. The voice interaction method according to claim 5, wherein the adjusting the display state of the navigation map according to the scale of the first control instruction comprises:
determining an adjustment direction of a scale of the navigation map according to the target intention;
determining the adjusting span of the scale of the navigation map according to the target scale adjusting precision value;
determining the scale of the first control instruction according to the current scale, the adjusting direction and the adjusting span;
and adjusting the scale of the navigation map to the scale of the first control instruction.
9. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
sending the voice request to a server under the condition that the network connection of the vehicle is in a normal state;
receiving a second control instruction issued by the server according to the voice request;
and adjusting the display state of the navigation map according to the scale of the second control instruction.
10. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
determining that the voice request can adjust a scale and an adjustable range of the navigation map.
11. A model training method for training models to obtain the intention recognition model and the precision recognition model of any one of claims 1 to 10, comprising:
training through intention training data to obtain the intention recognition model, wherein the intention training data is related to a scale and an adjustable range of a navigation map;
and training through precision training data to obtain the precision recognition model, wherein the precision training data is related to the scale and adjustable range of the navigation map and to the scale adjustment precision range of the navigation map.
12. A voice interaction apparatus, comprising:
the instruction receiving module is used for receiving a voice request for adjusting a vehicle navigation map, a scale of the navigation map can be subjected to scale adjustment through simulation of operation on vehicle parts, and a word overlapping range which can be supported by the voice request is determined according to the scale and the voice request with the use frequency higher than a preset frequency;
an intention recognition module for performing intention recognition on the voice request using an intention recognition model on a vehicle if a network connection state of the vehicle is in an abnormal state;
the precision recognition module is used for carrying out precision recognition on the voice request by utilizing a precision recognition model on the vehicle;
the control instruction generation module is used for generating a first control instruction according to the intention recognition result and the precision recognition result;
and the adjusting module is used for adjusting the display state of the navigation map according to the scale of the first control instruction.
13. A vehicle, characterized in that the vehicle comprises a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, carries out the voice interaction method of any one of claims 1-10.
14. A non-transitory computer-readable storage medium containing a computer program, wherein the computer program, when executed by one or more processors, implements the method of speech interaction of any one of claims 1-10 and/or the method of model training of claim 11.
CN202111628094.3A 2021-12-29 2021-12-29 Voice interaction method and device, model training method, vehicle and storage medium Active CN114005449B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111628094.3A CN114005449B (en) 2021-12-29 2021-12-29 Voice interaction method and device, model training method, vehicle and storage medium
PCT/CN2022/138924 WO2023125002A1 (en) 2021-12-29 2022-12-14 Voice interaction method and apparatus, model training method, vehicle and storage medium


Publications (2)

Publication Number Publication Date
CN114005449A CN114005449A (en) 2022-02-01
CN114005449B true CN114005449B (en) 2022-05-13

Family

ID=79932117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111628094.3A Active CN114005449B (en) 2021-12-29 2021-12-29 Voice interaction method and device, model training method, vehicle and storage medium

Country Status (2)

Country Link
CN (1) CN114005449B (en)
WO (1) WO2023125002A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005449B (en) * 2021-12-29 2022-05-13 广州小鹏汽车科技有限公司 Voice interaction method and device, model training method, vehicle and storage medium
CN115064169B (en) * 2022-08-17 2022-12-13 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3627489B2 (en) * 1997-12-24 2005-03-09 トヨタ自動車株式会社 Voice recognition device for navigation and navigation device with voice recognition function
EP3656188B1 (en) * 2017-07-19 2023-07-05 Signify Holding B.V. Speech control
CN111312253A (en) * 2018-12-11 2020-06-19 青岛海尔洗衣机有限公司 Voice control method, cloud server and terminal equipment
CN110444206A (en) * 2019-07-31 2019-11-12 北京百度网讯科技有限公司 Voice interactive method and device, computer equipment and readable medium
CN111753039A (en) * 2020-06-28 2020-10-09 广州小鹏车联网科技有限公司 Adjustment method, information processing method, vehicle and server
CN111833872B (en) * 2020-07-08 2021-04-30 北京声智科技有限公司 Voice control method, device, equipment, system and medium for elevator
CN111965985B (en) * 2020-08-04 2024-01-26 深圳市欧瑞博科技股份有限公司 Smart home equipment control method and device, electronic equipment and storage medium
CN113239178A (en) * 2021-07-09 2021-08-10 肇庆小鹏新能源投资有限公司 Intention generation method, server, voice control system and readable storage medium
CN113436628A (en) * 2021-08-27 2021-09-24 广州小鹏汽车科技有限公司 Voice interaction method, device, system, vehicle and medium
CN114005449B (en) * 2021-12-29 2022-05-13 广州小鹏汽车科技有限公司 Voice interaction method and device, model training method, vehicle and storage medium

Also Published As

Publication number Publication date
CN114005449A (en) 2022-02-01
WO2023125002A1 (en) 2023-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant