CN115512704A - Voice interaction method, server and computer readable storage medium - Google Patents


Info

Publication number
CN115512704A
CN115512704A
Authority
CN
China
Prior art keywords
target
reference point
position information
vehicle
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211400473.1A
Other languages
Chinese (zh)
Other versions
CN115512704B (en)
Inventor
樊骏锋
赵群
巴思然
朱晚贺
童栎滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd filed Critical Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202211400473.1A
Publication of CN115512704A
Application granted
Publication of CN115512704B
Legal status: Active

Classifications

    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 — Execution procedure of a spoken command
    • B60W 40/08 — Estimation or calculation of non-directly measurable driving parameters, related to drivers or passengers
    • B60W 2040/089 — Driver voice
    • B60W 50/08 — Interaction between the driver and the control system
    • B60W 2540/21 — Voice (input parameters relating to occupants)
    • Y02P 90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

The application discloses a voice interaction method, comprising: receiving a voice request, forwarded by a vehicle, for interacting with a user interface of a vehicle-mounted system; extracting intention information and slot information from the voice request; determining a target position of the voice request according to the slot information; determining a target operation object according to the target position and the intention information; generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object; and forwarding the vehicle control instruction to the vehicle to complete the voice interaction. While the user interacts with the vehicle-mounted system user interface by voice, the server extracts the intention information and the slot information of the voice request, determines the target position of the request from the slot information, then determines the target operation object within that position according to the intention information, and finally generates the vehicle control instruction. Because the extracted slot information includes both a target reference point and a target relative position, the target operation object can be located quickly without multiple rounds of clarification from the user.

Description

Voice interaction method, server and computer readable storage medium
Technical Field
The present application relates to the field of vehicle-mounted voice technologies, and in particular, to a voice interaction method, a server, and a computer-readable storage medium.
Background
Currently, in-vehicle voice technology enables a user to interact within the vehicle cabin through speech, for example by controlling vehicle components or operating components in the in-vehicle system user interface, such as opening a music player control by voice. In practice, a user interface of an on-board system often contains many controls or sub user interfaces, and a single voice request may hit several identically expressed controls or sub user interfaces at once. In this case a second round of clarification is usually required, asking the user to choose among multiple candidates to confirm the final target, which reduces the convenience of voice interaction.
Disclosure of Invention
The application provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method comprises the following steps:
receiving a voice request which is transmitted by a vehicle and interacts with a user interface of a vehicle-mounted system;
extracting intention information and slot information from the voice request, wherein the slot information comprises a target reference point and a target relative position;
determining a target position of the voice request according to the slot information;
determining a target operation object according to the target position and the intention information;
generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
forwarding the vehicle control instructions to the vehicle to complete the voice interaction.
Therefore, in the present application, while the user interacts with the vehicle-mounted system user interface by voice, the server can extract, for a received voice request, intention information and slot information, the slot information comprising the target reference point and the target relative position. The server can then determine the target position of the voice request from the slot information, determine a target operation object within that target position according to the intention information, and finally generate a vehicle control instruction. Because the extracted slot information includes both the target reference point and the target relative position, the target operation object can be located quickly without multiple rounds of clarification from the user, improving the fluency and convenience of voice instructions.
The method further comprises the following steps:
selecting an object with a fixed position in the vehicle-mounted system user interface as a reference point to be selected;
preloading the reference point to be selected;
and acquiring the position information of each reference point to be selected and the area position information of each reference point to be selected.
Therefore, objects with fixed positions in the vehicle-mounted system user interface can be chosen as candidate reference points, and each candidate reference point can be preloaded before the target operation object is located. Dividing the user interface into a plurality of regions according to the position information of these reference points facilitates the subsequent selection and locating of a reference point for a specific voice request.
The acquiring the position information of each reference point to be selected and the area position information of each reference point to be selected includes:
selecting one of all reference points to be selected as a reference point;
and determining the position information of each reference point to be selected and the area position information of each reference point to be selected according to the reference point.
In this way, one of the candidate reference points with fixed positions in the vehicle-mounted system user interface can be selected as the base reference point, and the position information of every other candidate reference point can be expressed as position coordinates relative to that base reference point. The in-vehicle system user interface may then be divided into a plurality of regions according to these coordinates. Determining the region position information of each candidate reference point helps locate the target reference point quickly and accurately in the subsequent voice interaction process.
The determining the target position of the voice request according to the slot information includes:
normalizing the slot information so as to match the target reference point to a candidate reference point;
mapping the target relative position to the region position information of that candidate reference point, according to the correspondence between the target reference point and the candidate reference point;
and determining the target position of the voice request from the correspondence between the target reference point and the candidate reference point and the correspondence between the target relative position and the region position information of the candidate reference point.
Therefore, the slot information extracted from the voice request can be normalized: once the target reference point is matched to its candidate reference point, the target relative position is restricted to the region of that candidate reference point, and the target position is finally determined. Excluding invalid regions improves both the accuracy and the efficiency of target position determination.
The determining a target operation object according to the target position and the intention information includes:
and preloading the to-be-selected operation object of the vehicle-mounted system user interface in the target position.
Therefore, within the determined target position, all candidate operation objects possibly related to the user's voice request can be preloaded before the target operation object is searched for, and the subsequent search is carried out among these preloaded candidates. Preloading only the candidate operation objects inside the target position narrows the search range, shortens the search time, and improves the efficiency of voice interaction.
Preloading the to-be-selected operation object of the vehicle-mounted system user interface in the target position, wherein the preloading comprises the following steps:
determining a control whose center point lies within the target position as the operation object to be selected;
preloading the control.
Therefore, the server preloads the controls located within the target position of the user interface and limits the search for the target control to that target position, which narrows the search range of controls in the user interface and improves operation efficiency.
The operation object to be selected comprises a sub user interface, and the preloading of the operation object to be selected of the vehicle-mounted system user interface in the target position comprises the following steps:
determining a sub user interface whose region information lies within the target position as the operation object to be selected;
preloading the sub-user interfaces.
Therefore, the server preloads the sub user interfaces located within the target position of the user interface and limits the search for the target sub user interface to that target position, which narrows the search range of sub user interfaces and improves operation efficiency.
The determining a target operation object according to the target position and the intention information comprises:
and searching the candidate operation objects, according to the intention information, for an operable object, and determining that object as the target operation object.
Therefore, from the preloaded candidate operation objects, the server selects a target control or sub user interface capable of executing the intention information in the user's voice request and determines it as the target operation object, so that an instruction the vehicle-mounted system can recognize and execute is subsequently generated and the voice interaction is finally completed.
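A minimal sketch of these two steps — preloading the candidate operation objects whose center point lies in the target position, then picking the one that can execute the intent. The control names, intent labels, and region encoding below are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A region as (lower-left, upper-right) corner coordinates.
Region = Tuple[Tuple[float, float], Tuple[float, float]]

@dataclass
class Control:
    name: str
    center: Tuple[float, float]
    supported_intents: List[str]

def preload_candidates(controls: List[Control], region: Region) -> List[Control]:
    """Preload the controls whose center point lies within the target position."""
    (x0, y0), (x1, y1) = region
    return [c for c in controls
            if x0 <= c.center[0] <= x1 and y0 <= c.center[1] <= y1]

def find_target(candidates: List[Control], intent: str) -> Optional[Control]:
    """Among the preloaded candidates, pick one able to execute the intent."""
    return next((c for c in candidates if intent in c.supported_intents), None)
```

With a left-half-plane region, only a control centered on the left side survives preloading, so the intent match runs over a much smaller set than the whole interface.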
The server of the present application comprises a processor and a memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the method is realized.
The computer-readable storage medium of the present application stores a computer program that, when executed by one or more processors, implements the method described above.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a voice interaction method according to the present application;
FIG. 2 is a schematic view of a voice interaction method according to the present application;
FIG. 3 is a second flowchart of the voice interaction method of the present application;
FIG. 4 is a third flowchart of the voice interaction method of the present application;
FIG. 5 is a schematic diagram of a state of the voice interaction method of the present application;
FIG. 6 is a second state diagram of the voice interaction method of the present application;
fig. 7 is a third schematic diagram of a state of the voice interaction method of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
Referring to fig. 1, fig. 2 and fig. 3, the present application provides a voice interaction method, including:
01: receiving a voice request which is transmitted by a vehicle and interacts with a user interface of a vehicle-mounted system;
02: extracting intention information and slot information from the voice request;
03: determining a target position of the voice request according to the slot information;
04: determining a target operation object according to the target position and the intention information;
05: generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
06: the vehicle control instructions are forwarded to the vehicle to complete the voice interaction.
The application also provides a server comprising a memory and a processor, by which the voice interaction method can be implemented. Specifically, the memory stores a computer program, and the processor is configured to: receive a voice request, forwarded by a vehicle, for interacting with the vehicle-mounted system user interface; extract intention information and slot information from the voice request; determine a target position of the voice request according to the slot information; determine a target operation object according to the target position and the intention information; generate a vehicle control instruction corresponding to the voice request according to the target position and the target operation object; and finally forward the vehicle control instruction to the vehicle to complete the voice interaction.
The vehicle system user interface is the medium for information exchange between the vehicle system and the user. To facilitate interaction with the vehicle, users may be supported in interacting within the vehicle cabin through speech, such as controlling vehicle components or operating components in the vehicle system user interface; for example, a user may send voice requests to control the vehicle-mounted user interface and the controls and sub user interfaces within it. When multiple areas of the user interface can display the same interface, or a voice request simultaneously hits multiple controls, the voice assistant often has to perform a second round of clarification, and the user has to keep attending to the prompt on the central control display screen and make a secondary selection to confirm the final target operation object. In the example shown in FIG. 2, both the left and right sides of the user interface can display a navigation or music interface. When the user says "switch the left side of the screen to music", the related art cannot extract slots for "screen", "left side" and the like, so the user's requirement cannot be determined directly from the natural language understanding result and the left side of the user interface cannot be switched to the music interface. The voice assistant then queries the user again about which user interface area should be switched to the music interface, and the user has to confirm left or right a second time. This not only reduces the convenience of voice interaction but may also distract the user and affect driving safety.
As shown in fig. 3, for the above scenario, when a user sends a voice request to interact with the vehicle-mounted system user interface, for example "switch the left side of the screen to music" in the above example, the server, after receiving this type of voice request forwarded by the vehicle, performs natural language understanding on it and extracts the intention information and the slot information with an intention classification model and a slot extraction model, respectively. The intention classification model classifies the execution-instruction part of the voice request, i.e., it classifies "switch ... to music" and yields the intention information "switch to music". The intention classification model may be a preset model of the vehicle control system, which reduces the time cost and inconsistency of invocation.
The slot extraction model extracts the position information in the voice request, covering a slot for the target reference point and a slot for the target relative position. The target reference point is the natural language information in the voice request that describes a user interface component or area, such as "screen" or "dashboard"; the target relative position is the information in the voice request that describes a position relative to the target reference point, such as "left side", "right side", or "upper side". For the voice request "switch the left side of the screen to music", the extracted slot information comprises the reference point slot "screen" and the relative position slot "left". The target position of the voice request, i.e., the absolute coordinates or coordinate range of "left of screen" in the reference coordinate system, is then determined from the slot extraction result. A target operation object is then searched for in the user interface according to the target position and the extracted intention information: an object located within the coordinate range corresponding to the slot information that can carry out the control instruction corresponding to the intention information. Finally, the server fuses the obtained target position and target operation object into a vehicle control instruction corresponding to the user's voice request, issues the instruction to the vehicle, and the vehicle executes the corresponding action.
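As an illustration of the shape of this result only — the patent's extractors are trained models, not keyword rules — a toy extractor and instruction-fusion step might look as follows; the vocabularies and field names are assumptions:

```python
# Toy stand-in for the slot extraction model: match known reference-point
# and relative-position phrases in the request text. The real system uses
# trained models; these vocabularies are illustrative only.
REFERENCE_POINTS = ["screen", "dashboard", "status bar"]
RELATIVE_POSITIONS = ["left", "right", "upper", "lower"]

def extract_slots(request: str) -> dict:
    ref = next((r for r in REFERENCE_POINTS if r in request), None)
    pos = next((p for p in RELATIVE_POSITIONS if p in request), None)
    return {"reference_point": ref, "relative_position": pos}

def build_command(intent: str, target_position, target_object: str) -> dict:
    # Fuse the target position and target operation object into one
    # instruction that can be issued to the vehicle.
    return {"action": intent, "position": target_position, "object": target_object}
```

For "switch the left side of the screen to music", `extract_slots` yields the reference point slot "screen" and the relative position slot "left", which downstream steps resolve into coordinates.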
This voice interaction method, based on the position information of fixed reference points on the central control display screen, can locate the multiple elements of a voice request in a single round according to the positions expressed relative to the preset fixed reference points, and directly execute the request. The voice assistant does not need to initiate a second round of confirmation, so the voice interaction flows more smoothly.
In summary, in the present application, while the user interacts with the vehicle-mounted system user interface by voice, the server can extract, for a received voice request, intention information and slot information, the slot information comprising the target reference point and the target relative position. The server can then determine the target position of the voice request from the slot information, determine a target operation object within that target position according to the intention information, and finally generate a vehicle control instruction. Because the extracted slot information includes both the target reference point and the target relative position, the target operation object can be located quickly without multiple rounds of clarification from the user, improving the fluency and convenience of voice instructions.
Referring to fig. 3 and 4, the method further includes:
07: selecting an object with a fixed position in a vehicle-mounted system user interface as a reference point to be selected;
08: preloading a reference point to be selected;
09: and acquiring the position information of each reference point to be selected and the area position information of each reference point to be selected.
The processor is used for selecting an object with a fixed position in a vehicle-mounted system user interface as a reference point to be selected, preloading the reference point to be selected, and acquiring position information of each reference point to be selected and area position information of each reference point to be selected.
Specifically, the candidate reference points are fixed elements on the screen, such as the central control display screen, the instrument panel, the status bar, and the like. The position of a candidate reference point does not change as the displayed content of the user interface changes, so it can serve as a reference for relative positions.
In a specific implementation process, the server loads basic information of all reference points to be selected in the user interface in advance. And then, acquiring the position information of each pre-loaded candidate reference point, and determining the area position information of each candidate reference point. The specific process of acquiring the position information of each candidate reference point and the area position information of each candidate reference point will be described in detail below.
Therefore, objects with fixed positions in the vehicle-mounted system user interface can be chosen as candidate reference points, and each candidate reference point can be preloaded before the target operation object is located. Dividing the user interface into a plurality of regions according to the position information of these reference points facilitates the subsequent selection and locating of a reference point for a specific voice request.
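A minimal sketch of the preloading step (07-09), assuming a hard-coded layout table; the names, coordinates, and the single derived region are illustrative, not taken from the patent:

```python
import math

# Fixed-position candidate reference points, expressed relative to a base
# reference point at (0, 0); illustrative layout only.
CANDIDATE_REFERENCE_POINTS = {
    "center_display": (0.0, 0.0),
    "dashboard": (0.0, 10.0),
    "status_bar": (0.0, -10.0),
}

def preload_reference_points() -> dict:
    """Return each candidate reference point with its position information
    and region position information derived from it (here only the left
    half-plane, as one example of a derived region)."""
    inf = math.inf
    loaded = {}
    for name, (x, y) in CANDIDATE_REFERENCE_POINTS.items():
        loaded[name] = {
            "position": (x, y),
            "regions": {"left": ((-inf, -inf), (x, inf))},
        }
    return loaded
```

In practice the server would query the in-vehicle UI layout rather than a static table; the table stands in for that lookup.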
Referring to fig. 5, step 09 includes:
091: selecting one of all reference points to be selected as a reference point;
092: and determining the position information of each reference point to be selected and the area position information of each reference point to be selected according to the reference point.
The processor is used for selecting one of all the reference points to be selected as a reference point, and determining the position information of each reference point to be selected and the area position information of each reference point to be selected according to the reference point.
Specifically, the base reference point is the preloaded candidate reference point relative to which the positions of the other candidate reference points are expressed; in one example, its coordinates are set to (0, 0).
In the interaction process, the server can select one of the reference points to be selected as a reference point according to a voice request input by a user, and the pre-loaded position information and the area position information of each reference point to be selected are expressed based on the position coordinates of the selected reference point.
In one example, the center of the central control display screen may be selected as the base reference point, with coordinates (0, 0). Taking upward from the screen center as the positive direction of the vertical axis and rightward as the positive direction of the horizontal axis, the coordinates of the instrument panel are (0, 10) and those of the status bar are (0, -10). For a voice request concerning the "left" of the central control display screen, the region to acquire is everything to the left of the vertical axis through the origin at the screen's center point.
In other examples, other positions may also be selected as the reference point, for example, the lower left corner of the center control display screen may be selected as the reference point, and the like, which is not limited herein.
The region position information is calculated, from the position information of each candidate reference point, as the region divided with that candidate reference point as the center. For example, in a practical scenario, the "left side" of the central control display screen may be represented by the coordinates (center point abscissa of the central control display screen, positive infinity) and (negative infinity, negative infinity), i.e., the region between the upper right corner and the lower left corner of the area to the left of the screen. Similarly, the "upper left" region of the status bar is represented by the coordinates (negative infinity, status bar center point ordinate) and (status bar center point abscissa, positive infinity), i.e., the region between the lower left corner and the upper right corner of that area.
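The corner-coordinate representation above can be written as a small helper; a sketch assuming only the relations named in the description:

```python
import math

def region_for(center, relation):
    """Region position information relative to a candidate reference point,
    returned as (lower-left, upper-right) corner coordinates, mirroring the
    unbounded regions in the example above."""
    x, y = center
    inf = math.inf
    if relation == "left":
        return ((-inf, -inf), (x, inf))
    if relation == "right":
        return ((x, -inf), (inf, inf))
    if relation == "upper":
        return ((-inf, y), (inf, inf))
    if relation == "lower":
        return ((-inf, -inf), (inf, y))
    raise ValueError(f"unsupported relation: {relation}")

def upper_left(center):
    """Compound relation such as the status bar's "upper left": everything
    above and to the left of the reference point's center."""
    x, y = center
    inf = math.inf
    return ((-inf, y), (x, inf))
```

With the status bar at (0, -10), `upper_left` reproduces the description's example: lower-left corner (negative infinity, status bar ordinate), upper-right corner (status bar abscissa, positive infinity).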
In this way, one of the candidate reference points with fixed positions in the vehicle-mounted system user interface can be selected as the reference point, and the position information of each candidate reference point in the interface can be expressed as the position coordinates of the candidate reference point relative to the reference point. The in-vehicle system user interface may be divided into a plurality of regions according to the position coordinates of each candidate reference point relative to the reference point. And determining the regional position information of each reference point to be selected, which is favorable for quickly and accurately positioning the target reference point in the subsequent voice interaction process.
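The coordinate scheme above can be sketched as follows. The point names, the helper `region_bbox`, and the division into half-plane regions are illustrative assumptions, not part of the patent:

```python
import math

# Hypothetical candidate reference points with fixed positions in the
# in-vehicle user interface, expressed as coordinates relative to the chosen
# reference point: the centre of the central control display screen at (0, 0).
CANDIDATE_REFERENCE_POINTS = {
    "central_control_screen": (0, 0),
    "instrument_panel": (0, 10),   # above the screen centre, per the example
    "status_bar": (0, -10),        # below the screen centre, per the example
}

def region_bbox(point_name, relation):
    """Return (lower-left, upper-right) corners of the region obtained by
    dividing the plane at the candidate reference point's coordinates."""
    x, y = CANDIDATE_REFERENCE_POINTS[point_name]
    inf = math.inf
    if relation == "left":        # everything left of the vertical axis through the point
        return (-inf, -inf), (x, inf)
    if relation == "right":
        return (x, -inf), (inf, inf)
    if relation == "upper_left":  # abscissa at most x, ordinate at least y
        return (-inf, y), (x, inf)
    raise ValueError(f"unsupported relation: {relation}")
```

With these definitions, the "left side" of the screen comes out as the corners (-∞, -∞) and (0, +∞), and the "upper left" of the status bar as (-∞, -10) and (0, +∞), matching the text above.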
Referring to fig. 6, step 03 includes:
031: carrying out normalization processing on the slot position information so as to correspond the target reference point to the reference point to be selected;
032: according to the corresponding relation between the target reference point and the reference point to be selected, corresponding the target relative position to the regional position information of the reference point to be selected;
033: and determining the target position of the voice request according to the corresponding relation between the target reference point and the reference point to be selected and the corresponding relation between the target relative position and the regional position information of the reference point to be selected.
The processor is used for carrying out normalization processing on the slot position information so as to correspond the target reference point with the reference point to be selected, corresponding the target relative position with the area position information of the reference point to be selected according to the corresponding relation between the target reference point and the reference point to be selected, and determining the target position of the voice request according to the corresponding relation between the target reference point and the reference point to be selected and the corresponding relation between the target relative position and the area position information of the reference point to be selected.
Specifically, the server of the vehicle-mounted system performs normalization on the extracted slot position information; that is, the target reference point in the slot information of the user's voice request is entity-normalized against the preloaded candidate reference points according to predetermined semantic rules. The predetermined semantic rules may include word vectors, edit distances, normalized word lists, and the like, which are not limited herein.
In an actual scenario, when a user inputs the voice request "switch the left side of the screen to music", the normalization covers two kinds of slot information: locating the reference point "screen" and acquiring the "left side" region. First, the term "screen" is entity-normalized to the central control display screen according to the predetermined semantic rules, that is, the target reference point "screen" is matched to the candidate reference point at the center of the central control display screen. After normalization, the center of the central control display screen serves as the reference point, with coordinates set to (0, 0). Further, the term "left side" is matched to the position information "left" according to the predetermined semantic rules, and the effective region is identified as the entire region on the left side of the screen center. The coordinates of the lower left and upper right corners of this region are then obtained by the region information calculation method described above, determining the region's extent.
It is to be understood that the user's voice request may include other kinds of slot information in addition to the target reference point information and the target relative position information. After the various kinds of slot information are extracted, entity normalization can be performed according to the predetermined semantic rules, so that the correspondence between the target reference point and the candidate reference point and the correspondence between the target relative position and the region position information of the candidate reference point are established, and the target position of the voice request is determined.
After the target reference point is matched with a candidate reference point through normalization, the target relative position is matched with the region position information of that candidate reference point. In fig. 6, the shaded area is the determined target position of the voice request, namely the left side of the central control display screen, which can be recorded as: (-∞, -∞), (0, +∞).
In this way, the slot position information extracted from the voice request can be normalized: after the target reference point is matched with the corresponding candidate reference point, the target relative position is confined within the region position of that candidate reference point, and the target position is finally determined. Excluding invalid regions benefits both the accuracy and the efficiency of determining the target position.
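A minimal sketch of the normalization step, under the assumption that the predetermined semantic rules take the form of a hand-written normalized word list (the alias tables and the function `normalize_slots` are hypothetical; a real system would also use word vectors and edit distance, as the text notes):

```python
# Hypothetical normalized word lists mapping raw slot strings onto the
# preloaded candidate reference points and region relations.
TARGET_POINT_ALIASES = {
    "screen": "central_control_screen",
    "display": "central_control_screen",
    "dashboard": "instrument_panel",
}
RELATIVE_POSITION_ALIASES = {
    "left side": "left",
    "left": "left",
    "upper left": "upper_left",
}

def normalize_slots(target_point, relative_position):
    """Match the raw slot strings to a candidate reference point and one of
    its divided regions; return None when no entity match is found."""
    point = TARGET_POINT_ALIASES.get(target_point.lower())
    region = RELATIVE_POSITION_ALIASES.get(relative_position.lower())
    if point is None or region is None:
        return None
    return point, region
```

For the running example, the slots extracted from "switch the left side of the screen to music" ("screen", "left side") normalize to the screen-center candidate reference point and its "left" region.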
Step 04 comprises:
041: preloading the candidate operation objects of the vehicle-mounted system user interface that are located within the target position.
The processor is used for preloading the candidate operation objects of the vehicle-mounted system user interface that are located within the target position.
Specifically, the operation objects of the user interface include controls, sub-user interfaces, and the like. The information to be preloaded by the server mainly comprises control position information and the size and position information of sub-user interfaces. For a control, the coordinate position of its center point relative to the center of the central control display screen is stored; for a sub-user interface, the coordinates of the lower left and upper right corners of its region are stored. After determining the target position of the voice request, the server determines the controls and sub-user interfaces located within the target position range as candidate operation objects and preloads their position information and size information.
In this way, all candidate operation objects possibly related to the user's voice request can be selected within the determined target position range and preloaded before the target operation object is searched for, and the subsequent search for the target operation object is carried out among the preloaded candidates. Preloading the candidate operation objects within the target position of the user interface reduces the search range for the target operation object and the time required by the search process, improving the efficiency of voice interaction.
Step 041 includes:
0411: determining a control whose center point is located within the target position as a candidate operation object;
0412: preloading the control.
The processor is used for determining a control whose center point is located within the target position as a candidate operation object and preloading the control.
Specifically, the server determines controls whose center points are located within the target position as candidate operation objects and preloads the coordinate position information of their center points relative to the center of the central control display screen. The server subsequently restricts the search range for the target operation object to these preloaded candidates. Controls whose center points are not within the target position are not preloaded.
In this way, the server preloads the controls located within the target position of the user interface and limits the search for the target control to the target position, thereby reducing the search range over the controls of the user interface and improving operation efficiency.
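The center-point test of step 0411 might look like the following sketch, where the control names and coordinates are made-up illustrations:

```python
import math

# Illustrative controls, each stored as the coordinates of its centre point
# relative to the screen centre (names and positions are assumptions).
controls = {
    "music_button": (-5, 2),
    "navigation_button": (5, 2),
    "volume_slider": (-3, -4),
}

def preload_controls(controls, lower_left, upper_right):
    """Return the controls whose centre point falls inside the bounding box."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    return {
        name: (cx, cy)
        for name, (cx, cy) in controls.items()
        if x0 <= cx <= x1 and y0 <= cy <= y1
    }
```

For the "left side of the screen" target position of the running example, the bounding box is ((-∞, -∞), (0, +∞)), so only the controls with a non-positive abscissa are preloaded.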
Referring to fig. 7, the candidate operation object includes a sub-user interface, and step 041 further includes:
0413: determining a sub-user interface whose region information is located within the target position as a candidate operation object;
0414: preloading the sub-user interface.
The processor is used for determining a sub-user interface whose region information is located within the target position as a candidate operation object and preloading the sub-user interface.
Specifically, the server sequentially acquires the region position information of the sub-user interfaces. In fig. 7, the entire region of the left screen is contained in the region obtained after normalization of the "left side of the screen" in the voice request, so the server determines the left-screen sub-user interface as a candidate operation object and preloads it. The server subsequently restricts the search range for the target operation object to the preloaded candidates. Sub-user interfaces not contained in the target position are not preloaded.
In this way, the server preloads the sub-user interfaces located within the target position of the user interface and limits the search for the target sub-user interface to the target position area, thereby reducing the search range over sub-user interfaces and improving operation efficiency.
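Steps 0413-0414 can be sketched similarly: a sub-user interface stores the coordinates of the lower-left and upper-right corners of its region and is preloaded only when that region lies entirely inside the target position (names and coordinates below are illustrative assumptions):

```python
import math

# Illustrative sub-user interfaces, each stored as
# (lower-left corner, upper-right corner) relative to the screen centre.
sub_interfaces = {
    "left_screen": ((-20, -10), (-1, 10)),
    "right_screen": ((1, -10), (20, 10)),
}

def preload_sub_interfaces(sub_interfaces, target_ll, target_ur):
    """Return the sub-interfaces whose region is fully contained in the target."""
    return {
        name: (ll, ur)
        for name, (ll, ur) in sub_interfaces.items()
        if target_ll[0] <= ll[0] and target_ll[1] <= ll[1]
        and ur[0] <= target_ur[0] and ur[1] <= target_ur[1]
    }
```

With the "left side" target position ((-∞, -∞), (0, +∞)), only the left-screen sub-interface survives the containment test, as in the fig. 7 example.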
Referring to fig. 3, step 04 further includes:
042: searching the candidate operation objects for an operable object according to the intention information and determining it as the target operation object.
The processor is used for searching the candidate operation objects for an operable object according to the intention information and determining it as the target operation object.
Specifically, the intention information in a user's voice request generally comprises a specific action to be executed, and is predicted using a preset natural language understanding model. In an actual scenario, the intention information of the voice request "switch the left side of the screen to music" is extracted as "switch to music"; controls or sub-user interfaces related to the keyword "music" are searched for among the candidate operation objects screened by the server; and it is finally confirmed that the action is to be executed by the left screen and the music player. The left screen and the music player are determined as the target sub-user interface and the target control respectively, that is, the target operation object is determined.
In this way, the server selects, from the preloaded candidate operation objects, a target control or sub-user interface capable of executing the intention information in the user's voice request and determines it as the target operation object, so that a command recognizable and executable by the vehicle-mounted system can subsequently be generated, finally completing the voice interaction.
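The search in step 042 can be sketched as a keyword match between the intention information and the preloaded candidates; the candidate metadata and keyword sets below are assumptions for illustration, not part of the patent:

```python
# Preloaded candidate operation objects, each annotated with its kind and a
# hypothetical keyword set used to match against the intention information.
candidates = {
    "left_screen": {"kind": "sub_interface", "keywords": {"screen", "display"}},
    "music_player": {"kind": "control", "keywords": {"music", "player", "song"}},
    "volume_slider": {"kind": "control", "keywords": {"volume"}},
}

def find_target_objects(candidates, intent_keywords):
    """Return the names of candidates whose keyword set overlaps the intent."""
    return sorted(
        name
        for name, meta in candidates.items()
        if meta["keywords"] & intent_keywords
    )
```

For the intent "switch to music" extracted in the running example, matching on the keyword "music" selects the music player among the candidates preloaded for the left side of the screen.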
The computer-readable storage medium of the present application stores a computer program that, when executed by one or more processors, implements the method described above.
In the description of the present specification, reference to the description of the terms "above," "specifically," "similarly," "intelligible," or the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternative implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art to which the present application pertains.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application and that variations, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method of voice interaction, comprising:
receiving a voice request which is transmitted by a vehicle and interacts with a user interface of a vehicle-mounted system;
extracting intention information and slot position information of the voice request, wherein the slot position information comprises a target reference point and a target relative position;
determining a target position of the voice request according to the slot position information;
determining a target operation object according to the target position and the intention information;
generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
forwarding the vehicle control instruction to the vehicle to complete the voice interaction.
2. The voice interaction method of claim 1, further comprising:
selecting an object with a fixed position in the vehicle-mounted system user interface as a reference point to be selected;
preloading the reference point to be selected;
and acquiring the position information of each reference point to be selected and the area position information of each reference point to be selected.
3. The voice interaction method according to claim 2, wherein the obtaining of the position information of each candidate reference point and the area position information of each candidate reference point comprises:
selecting one of all the reference points to be selected as a reference point;
and determining the position information of each reference point to be selected and the area position information of each reference point to be selected according to the reference point.
4. The method of claim 2, wherein determining the target location of the voice request according to the slot information comprises:
normalizing the slot position information to correspond the target reference point to the reference point to be selected;
according to the corresponding relation between the target reference point and the reference point to be selected, corresponding the target relative position to the regional position information of the reference point to be selected;
and determining the target position of the voice request according to the corresponding relation between the target reference point and the reference point to be selected and the corresponding relation between the target relative position and the regional position information of the reference point to be selected.
5. The voice interaction method according to claim 1, wherein the determining a target operation object according to the target position and the intention information comprises:
and preloading the to-be-selected operation object of the vehicle-mounted system user interface in the target position.
6. The voice interaction method of claim 5, wherein preloading the in-vehicle system user interface for the object to be selected in the target position comprises:
determining a control with a central point positioned in the target position as the object to be selected;
preloading the control.
7. The voice interaction method of claim 5, wherein the candidate operation objects comprise sub-user interfaces, and the preloading of the candidate operation objects of the in-vehicle system user interface in the target position comprises:
determining a sub user interface with the region information positioned in the target position as the operation object to be selected;
preloading the sub-user interfaces.
8. The method of claim 5, wherein determining a target operation object according to the target location and the intent information comprises:
and searching an operable object in the to-be-selected operation objects according to the intention information to determine the operable object as the target operation object.
9. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, carries out the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by one or more processors, implements the method of any one of claims 1-8.
CN202211400473.1A 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium Active CN115512704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211400473.1A CN115512704B (en) 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115512704A true CN115512704A (en) 2022-12-23
CN115512704B CN115512704B (en) 2023-08-29

Family

ID=84514271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211400473.1A Active CN115512704B (en) 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115512704B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006195637A (en) * 2005-01-12 2006-07-27 Toyota Motor Corp Voice interaction system for vehicle
CN108806684A (en) * 2018-06-27 2018-11-13 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and electronic equipment
CN109029449A (en) * 2018-06-29 2018-12-18 英华达(上海)科技有限公司 It looks for something method, device for searching article and system of looking for something
GB201905974D0 (en) * 2017-02-06 2019-06-12 Toshiba Kk A spoken dialogue system, a spoken dialogue method and a method of adapting a spoken dialogue system
CN111508482A (en) * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, device, equipment and storage medium
CN112242141A (en) * 2020-10-15 2021-01-19 广州小鹏汽车科技有限公司 Voice control method, intelligent cabin, server, vehicle and medium
CN113470649A (en) * 2021-08-18 2021-10-01 三星电子(中国)研发中心 Voice interaction method and device
CN113723528A (en) * 2021-09-01 2021-11-30 斑马网络技术有限公司 Vehicle-mounted voice-video fusion multi-mode interaction method, system, device and storage medium
CN113823280A (en) * 2020-06-19 2021-12-21 华为技术有限公司 Intelligent device control method, electronic device and system
CN114913856A (en) * 2022-07-11 2022-08-16 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium

Also Published As

Publication number Publication date
CN115512704B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN108170859B (en) Voice query method, device, storage medium and terminal equipment
CN110825093B (en) Automatic driving strategy generation method, device, equipment and storage medium
CN111767021A (en) Voice interaction method, vehicle, server, system and storage medium
CN110928409B (en) Vehicle-mounted scene mode control method and device, vehicle and storage medium
US20060155546A1 (en) Method and system for controlling input modalities in a multimodal dialog system
CN110659360A (en) Man-machine conversation method, device and system
US20110135191A1 (en) Apparatus and method for recognizing image based on position information
CN115457959B (en) Voice interaction method, server and computer readable storage medium
CN107832035B (en) Voice input method of intelligent terminal
CN115457960B (en) Voice interaction method, server and computer readable storage medium
CN114880569A (en) Recommendation control method and device for vehicle, electronic equipment, system and storage medium
CN115512704A (en) Voice interaction method, server and computer readable storage medium
CN113012687A (en) Information interaction method and device and electronic equipment
CN115376513B (en) Voice interaction method, server and computer readable storage medium
CN105955698B (en) Voice control method and device
CN112109729A (en) Human-computer interaction method, device and system for vehicle-mounted system
CN115510290A (en) Rapid retrieval method and terminal under digital twin environment
CN112164402B (en) Vehicle voice interaction method and device, server and computer readable storage medium
CN111931702B (en) Target pushing method, system and equipment based on eyeball tracking
KR20220166784A (en) Riding method, device, facility and storage medium based on autonomous driving
DE102005059390A1 (en) Speech recognition method for navigation system of motor vehicle, involves carrying out one of speech recognitions by user to provide one of recognizing results that is function of other recognizing result and/or complete word input
CN102999275B (en) Obtain method and the device of word conversion result
CN115588432B (en) Voice interaction method, server and computer readable storage medium
CN115565532B (en) Voice interaction method, server and computer readable storage medium
CN111104611A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant