CN115512704B - Voice interaction method, server and computer readable storage medium - Google Patents

Voice interaction method, server and computer readable storage medium

Info

Publication number
CN115512704B
CN115512704B (application CN202211400473.1A)
Authority
CN
China
Prior art keywords
target
reference point
position information
operation object
user interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211400473.1A
Other languages
Chinese (zh)
Other versions
CN115512704A (en)
Inventor
樊骏锋
赵群
巴思然
朱晚贺
童栎滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd filed Critical Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202211400473.1A
Publication of CN115512704A
Application granted
Publication of CN115512704B
Active legal status
Anticipated expiration legal status

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • B60W40/08 - Estimation or calculation of such parameters related to drivers or passengers
    • B60W2040/089 - Driver voice
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 - Interaction between the driver and the control system
    • B60W2540/00 - Input parameters relating to occupants
    • B60W2540/21 - Voice
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Navigation (AREA)

Abstract

The application discloses a voice interaction method, which comprises the following steps: receiving a voice request, forwarded by a vehicle, for interacting with the user interface of an in-vehicle system; extracting intent information and slot information from the voice request; determining the target position of the voice request according to the slot information; determining a target operation object according to the target position and the intent information; generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object; and forwarding the vehicle control instruction to the vehicle to complete the voice interaction. In the application, while the user interacts with the user interface of the in-vehicle system by voice, the server can extract the intent information and slot information of the voice request and determine the target position of the request from the slot information; it then determines the target operation object within the target position according to the intent information and finally generates the vehicle control instruction. Because the application can extract slot information that includes a target reference point and a target relative position, the target operation object is located quickly and the user is not required to go through multiple rounds of clarification.

Description

Voice interaction method, server and computer readable storage medium
Technical Field
The present application relates to the field of in-vehicle voice technology, and in particular to a voice interaction method, a server, and a computer-readable storage medium.
Background
Currently, in-vehicle voice technology supports user interaction within the vehicle cabin via voice, such as controlling vehicle components or operating elements in the user interface of the in-vehicle system, for example opening a music player control by voice. In practice, the user interface of an in-vehicle system often contains multiple controls or sub-user-interfaces, and a single voice request may simultaneously hit several identically named ones. In that case a second round of clarification is usually required, asking the user to choose among the candidates and confirm the final target, which reduces the convenience of voice interaction.
Disclosure of Invention
The application provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method of the application comprises the following steps:
receiving a voice request, forwarded by a vehicle, for interacting with the user interface of an in-vehicle system;
extracting intent information and slot information from the voice request, wherein the slot information comprises a target reference point and a target relative position;
determining a target position of the voice request according to the slot information;
determining a target operation object according to the target position and the intent information;
generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
forwarding the vehicle control instruction to the vehicle to complete the voice interaction.
In the application, while the user interacts with the user interface of the in-vehicle system by voice, the server can extract the intent information and slot information of each received voice request, the slot information comprising a target reference point and a target relative position. The server can thus determine the target position of the voice request from the slot information, determine the target operation object within that position according to the intent information, and finally generate the vehicle control instruction. Because slot information containing both a target reference point and a target relative position can be extracted, the target operation object is located quickly, the user is not required to go through multiple rounds of clarification, and the fluency and convenience of voice commands are improved.
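For illustration only, the flow above can be sketched as a small Python program. Everything here is a toy stand-in rather than the patented implementation: the keyword matching, the hard-coded region table, and the placeholder object lookup are assumptions made for exposition, where a real system would use the trained intent classification and slot extraction models described later.

    from dataclasses import dataclass

    @dataclass
    class Slots:
        reference_point: str    # e.g. "screen"
        relative_position: str  # e.g. "left"

    def extract_intent(utterance: str) -> str:
        # stand-in for the intent classification model
        return "switch to music" if "music" in utterance else "unknown"

    def extract_slots(utterance: str) -> Slots:
        # stand-in for the slot extraction model
        ref = "screen" if "screen" in utterance else ""
        pos = "left" if "left" in utterance else ""
        return Slots(ref, pos)

    def resolve_target_position(slots: Slots):
        # map (reference point, relative position) to a coordinate region,
        # given as (lower-left, upper-right) corners
        inf = float("inf")
        regions = {("screen", "left"): ((-inf, -inf), (0.0, inf))}
        return regions[(slots.reference_point, slots.relative_position)]

    def find_target_object(region, intent: str) -> str:
        # placeholder search over preloaded candidate operation objects
        return "left sub-user-interface"

    def handle_voice_request(utterance: str) -> dict:
        intent = extract_intent(utterance)
        slots = extract_slots(utterance)
        region = resolve_target_position(slots)
        target = find_target_object(region, intent)
        # fuse the target position and target object into a control instruction
        return {"intent": intent, "region": region, "target": target}

    print(handle_voice_request("switch left of screen to music"))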
The method further comprises:
selecting objects with fixed positions in the user interface of the in-vehicle system as candidate reference points;
preloading the candidate reference points;
and obtaining the position information of each candidate reference point and the region position information of each candidate reference point.
In this way, objects with fixed positions in the user interface of the in-vehicle system can be selected as candidate reference points, each candidate reference point can be preloaded before the target operation object is located, and the user interface can be divided into several regions according to the position information of the reference points, so that reference points can later be selected and located for a specific voice request.
The obtaining of the position information of each candidate reference point and the region position information of each candidate reference point includes:
selecting one of the candidate reference points as a base reference point;
and determining the position information of each candidate reference point and the region position information of each candidate reference point according to the base reference point.
In this way, one fixed-position candidate reference point in the user interface of the in-vehicle system can be chosen as the base reference point, and the position information of every candidate reference point in the interface can be expressed as coordinates relative to it. The user interface can then be divided into several regions according to the coordinates of each candidate reference point relative to the base reference point, and the region position information of each candidate reference point determined, which helps locate the target reference point quickly and accurately during subsequent voice interaction.
The determining of the target position of the voice request according to the slot information includes:
normalizing the slot information so that the target reference point corresponds to a candidate reference point;
mapping the target relative position to the region position information of the candidate reference point according to the correspondence between the target reference point and the candidate reference point;
and determining the target position of the voice request according to the correspondence between the target reference point and the candidate reference point and the correspondence between the target relative position and the region position information of the candidate reference point.
In this way, the slot information extracted from the voice request can be normalized: once the target reference point has been matched to its candidate reference point, the target relative position is confined to the region of that reference point, and the target position is finally determined. Excluding invalid regions benefits both the accuracy and the efficiency of determining the target position.
The determining of a target operation object according to the target position and the intent information includes:
preloading the candidate operation objects of the user interface of the in-vehicle system that are located within the target position.
In this way, all candidate operation objects possibly related to the user's voice request can be selected within the determined target position and preloaded before the target operation object is searched for, so that the subsequent search runs only over the preloaded candidates. Preloading only the candidate operation objects within the target position narrows the search range for the target operation object, shortens the search, and improves the efficiency of voice interaction.
The preloading of the candidate operation objects of the user interface of the in-vehicle system within the target position includes:
determining controls whose center points are located within the target position as the candidate operation objects;
and preloading the controls.
In this way, the server preloads the controls located within the target position of the user interface and restricts the search for the target control to that position, which narrows the control search range within the user interface and improves runtime efficiency.
The candidate operation objects include sub-user-interfaces, and the preloading of the candidate operation objects of the user interface of the in-vehicle system within the target position includes:
determining sub-user-interfaces whose region information is located within the target position as the candidate operation objects;
and preloading the sub-user-interfaces.
In this way, the server preloads the sub-user-interfaces located within the target position of the user interface and restricts the search for the target sub-user-interface to that region, which narrows the search range and improves runtime efficiency.
The determining of a target operation object according to the target position and the intent information includes:
searching the candidate operation objects for an operable object according to the intent information, so as to determine the operable object as the target operation object.
In this way, the server selects, from the preloaded candidate operation objects, a target control or sub-user-interface capable of executing the intent in the user's voice request and determines it as the target operation object, so that instructions recognizable and executable by the in-vehicle system can later be generated and the voice interaction completed.
The server of the present application comprises a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the method described above.
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the method described above.
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a first flow chart of the voice interaction method of the present application;
FIG. 2 is a schematic diagram of the voice interaction method of the present application;
FIG. 3 is a second flow chart of the voice interaction method of the present application;
FIG. 4 is a third flow chart of the voice interaction method of the present application;
FIG. 5 is a first state diagram of the voice interaction method of the present application;
FIG. 6 is a second state diagram of the voice interaction method of the present application;
FIG. 7 is a third state diagram of the voice interaction method of the present application.
Detailed Description
Embodiments of the present application are described in detail below, with examples illustrated in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements or elements having identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, intended only to explain the embodiments of the present application, and are not to be construed as limiting them.
Referring to FIGS. 1, 2 and 3, the present application provides a voice interaction method, which includes:
01: receiving a voice request, forwarded by a vehicle, for interacting with the user interface of an in-vehicle system;
02: extracting intent information and slot information from the voice request;
03: determining a target position of the voice request according to the slot information;
04: determining a target operation object according to the target position and the intent information;
05: generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
06: forwarding the vehicle control instruction to the vehicle to complete the voice interaction.
The application also provides a server comprising a memory and a processor; the voice interaction method of the application may be implemented by this server. Specifically, the memory stores a computer program, and the processor is configured to receive a voice request, forwarded by the vehicle, for interacting with the user interface of the in-vehicle system; to extract the intent information and slot information of the voice request; to determine the target position of the voice request according to the slot information; to determine a target operation object according to the target position and the intent information; to generate a vehicle control instruction corresponding to the voice request according to the target position and the target operation object; and finally to forward the vehicle control instruction to the vehicle to complete the voice interaction.
The user interface of the in-vehicle system is the medium for information exchange between the system and the user. To facilitate interaction with the vehicle, users can currently interact within the cabin by voice, for example controlling vehicle components or operating elements of the in-vehicle system's user interface: a user issues a voice request to control the in-vehicle user interface and the controls, sub-user-interfaces, and other elements within it. When several areas of the user interface can display the same interface, or a voice request hits several controls in the user interface at once, the voice assistant usually has to make a second round of clarification, and the user must keep watching the prompt on the central control display and make a secondary selection to confirm the final target operation object. In the example shown in FIG. 2, both the left and right sides of the user interface can display a navigation or music interface. When a user says "switch the left of the screen to music", the related art cannot perform slot extraction on terms such as "screen" or "left of the screen", and so cannot determine directly from the natural language understanding result that the left side of the user interface should be switched to the music interface. The voice assistant then has to ask the user which interface area can perform the music switch, and the user has to confirm a second time whether the left or the right side is meant. This not only hurts the convenience of voice interaction but may also compromise driving safety by distracting the user.
As shown in FIG. 3, for the above scenario, when the server receives a voice request, forwarded by the vehicle, that interacts with the user interface of the in-vehicle system, for example "switch the left of the screen to music" in the above example, it performs natural language understanding on the request and extracts the intent information and the slot information using an intent classification model and a slot extraction model, respectively. The intent classification model classifies the instruction part of the request, i.e. the content "switch ... to music", to obtain the intent information "switch to music". The intent classification model used may be a model preset by the vehicle control system, which reduces invocation time cost and inconsistency.
The slot extraction model extracts the positional information in the voice request, including a slot for the target reference point and a slot for the target relative position. The target reference point is natural language in the request that describes a user-interface component or region, such as "screen" or "dashboard"; the target relative position is information in the request that describes a position relative to the target reference point, such as "left", "right", or "upper". For the request "switch the left of the screen to music", slot extraction yields the reference-point slot "screen" and the position slot "left". The target position of the voice request, that is, the absolute coordinates or coordinate range of "left of the screen" in the reference coordinate system, is then determined from the slot extraction result. Next, the target operation object is searched for in the user interface according to the target position and the extracted intent information: it must lie within the coordinate range given by the slot information and be able to complete the control instruction corresponding to the intent. Finally, the server fuses the obtained target position and target operation object into a vehicle control instruction corresponding to the user's voice request and issues it to the vehicle, which executes the action. With this voice interaction method based on fixed reference points on the central control display, the several elements of a voice request can be located in a single round from their positions relative to the preset fixed reference points, and the request can be executed directly; the voice assistant does not need to initiate a second round of confirmation, so the interaction is smoother.
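As a concrete illustration, the combined output of the two models for this example could be shaped as follows; the field names are assumptions for exposition, not the models' actual schema.

    # Hypothetical NLU result for "switch the left of the screen to music":
    # one intent label plus the two position slots described above.
    nlu_result = {
        "intent": "switch to music",        # from the intent classification model
        "slots": {                          # from the slot extraction model
            "reference_point": "screen",    # target reference point slot
            "relative_position": "left",    # target relative position slot
        },
    }
    print(nlu_result["slots"]["reference_point"])  # screen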
In summary, while the user interacts with the user interface of the in-vehicle system by voice, the server can extract the intent information and slot information of each received voice request, the slot information comprising a target reference point and a target relative position. The server can thus determine the target position of the voice request from the slot information, determine the target operation object within that position according to the intent information, and finally generate the vehicle control instruction. Because slot information containing both a target reference point and a target relative position can be extracted, the target operation object is located quickly, no multi-round clarification is needed from the user, and the fluency and convenience of voice commands are improved.
Referring to FIGS. 3 and 4, the method further includes:
07: selecting objects with fixed positions in the user interface of the in-vehicle system as candidate reference points;
08: preloading the candidate reference points;
09: obtaining the position information of each candidate reference point and the region position information of each candidate reference point.
Correspondingly, the processor is configured to select objects with fixed positions in the user interface of the in-vehicle system as candidate reference points, preload the candidate reference points, and obtain the position information of each candidate reference point and the region position information of each candidate reference point.
Specifically, candidate reference points are fixed points on the screen, such as the central control display, the dashboard, or the status bar. Their positions do not change with the content displayed in the user interface, so they can serve as references for relative positions.
During implementation, the server preloads the basic information of all candidate reference points in the user interface. Then, for each preloaded candidate reference point, it acquires the point's position information and determines its region position information. The specific procedure for obtaining this information is described in detail below.
In this way, objects with fixed positions in the user interface of the in-vehicle system can be selected as candidate reference points, each candidate reference point can be preloaded before the target operation object is located, and the user interface can be divided into several regions according to the position information of the reference points, so that reference points can later be selected and located for a specific voice request.
Referring to FIG. 5, step 09 includes:
091: selecting one of the candidate reference points as a base reference point;
092: determining the position information of each candidate reference point and the region position information of each candidate reference point according to the base reference point.
The processor is configured to select one of the candidate reference points as the base reference point, and to determine the position information of each candidate reference point and the region position information of each candidate reference point according to the base reference point.
Specifically, the base reference point is a preloaded reference point used to express the positions of the other candidate reference points; in one example, its coordinates are set to (0, 0).
During interaction, the server can select one of the candidate reference points as the base reference point according to the voice request entered by the user, and the preloaded position information and region position information of each candidate reference point are expressed relative to the coordinates of the selected base reference point.
In one example, the center of the central control display of the user interface may be selected as the base reference point, with its coordinates set to (0, 0). Taking upward from the display's center as the positive vertical axis and rightward as the positive horizontal axis, the dashboard has coordinates (0, 10) and the status bar (0, -10). For the central control display, when a voice request involving "left" is received, the region to acquire is the whole area to the left of the vertical axis through the display's center point.
In other examples, other positions may be selected as the base reference point, for example the lower-left corner of the central control display, and no limitation is intended here.
The region position information is computed from the position information of each candidate reference point, with the regions delimited around the candidate reference point. For example, in a real scenario, the "left side" of the central control display can be represented by the corner coordinates upper right (abscissa of the display's center point, plus infinity) and lower left (minus infinity, minus infinity), i.e. the whole area between those two corners. Similarly, the "upper left" region of the status bar is represented by the lower-left corner (minus infinity, ordinate of the status bar's center point) and the upper-right corner (abscissa of the status bar, plus infinity).
In this way, one fixed-position candidate reference point in the user interface of the in-vehicle system can be chosen as the base reference point, and the position information of every candidate reference point in the interface can be expressed as coordinates relative to it. The user interface can then be divided into several regions according to the coordinates of each candidate reference point relative to the base reference point, and the region position information of each candidate reference point determined, which helps locate the target reference point quickly and accurately during subsequent voice interaction.
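A minimal sketch of this region bookkeeping, assuming the example layout above (base reference point at the center of the central control display, dashboard at (0, 10), status bar at (0, -10)); the relation names and the point table are illustrative only.

    import math

    # Candidate reference points, expressed relative to the base reference
    # point (the center of the central control display at (0, 0)).
    CANDIDATE_REFERENCE_POINTS = {
        "central control display": (0.0, 0.0),
        "dashboard": (0.0, 10.0),
        "status bar": (0.0, -10.0),
    }

    def region(ref_name: str, relation: str):
        """Return a region as (lower-left, upper-right) corner coordinates."""
        x, y = CANDIDATE_REFERENCE_POINTS[ref_name]
        inf = math.inf
        if relation == "left":          # everything left of the point
            return ((-inf, -inf), (x, inf))
        if relation == "right":         # everything right of the point
            return ((x, -inf), (inf, inf))
        if relation == "upper left":    # left of and above the point
            return ((-inf, y), (x, inf))
        raise ValueError(f"unsupported relation: {relation}")

    print(region("central control display", "left"))  # ((-inf, -inf), (0.0, inf))
    print(region("status bar", "upper left"))         # ((-inf, -10.0), (0.0, inf))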
Referring to FIG. 6, step 03 includes:
031: normalizing the slot information so that the target reference point corresponds to a candidate reference point;
032: mapping the target relative position to the region position information of the candidate reference point according to the correspondence between the target reference point and the candidate reference point;
033: determining the target position of the voice request according to the correspondence between the target reference point and the candidate reference point and the correspondence between the target relative position and the region position information of the candidate reference point.
The processor is configured to normalize the slot information so that the target reference point corresponds to a candidate reference point, to map the target relative position to the region position information of the candidate reference point according to the correspondence between the target reference point and the candidate reference point, and to determine the target position of the voice request according to those two correspondences.
Specifically, the server of the in-vehicle system can normalize the extracted slot information, that is, perform entity normalization between the target reference point in the slots of the user's voice request and the candidate reference points the server has finished preloading, according to predetermined semantic rules. The predetermined semantic rules may include word vectors, edit distance, normalized word lists, and the like, and are not limited here.
In a real scenario, when the user says "switch the left of the screen to music", the normalization to perform covers two kinds of slot information: locating the reference point "screen" and acquiring the "left" region. First, "screen" is entity-normalized to "central control display" according to the predetermined semantic rules, that is, the target reference point "screen" is matched to the candidate reference point "central control display". After normalization, the center of the central control display serves as the reference point, with its coordinates set to (0, 0). Then "left" is likewise mapped to the position information "left" according to the predetermined semantic rules, and the valid region is identified as the whole area to the left of the center point of the central control display. After normalization, the coordinates of the region's lower-left and upper-right corners are obtained by the region computation described above, which fixes the extent of the region.
It will be appreciated that a voice request entered by the user may contain other types of slot information besides the target reference point and the target relative position. After the various slots are extracted, entity normalization can likewise be performed according to the predetermined semantic rules to establish the correspondence between the target reference point and a candidate reference point and between the target relative position and the region position information of that reference point, thereby determining the target position of the voice request.
After the target reference point has been normalized to its candidate reference point, the target relative position is mapped to the region position information of that reference point. In FIG. 6, the hatched region is the determined target position of the voice request, "left of the central control display", which can be recorded by its upper-right corner (0, plus infinity) and lower-left corner (minus infinity, minus infinity).
In this way, the slot information extracted from the voice request can be normalized: once the target reference point has been matched to its candidate reference point, the target relative position is confined to the region of that reference point, and the target position is finally determined. Excluding invalid regions benefits both the accuracy and the efficiency of determining the target position.
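A toy sketch of this entity normalization, assuming a normalized word list as the predetermined semantic rule (word vectors or edit distance would be drop-in alternatives); the table entries are illustrative.

    # Illustrative normalized word list mapping spoken mentions to
    # preloaded candidate reference points.
    NORMALIZATION_TABLE = {
        "screen": "central control display",
        "display": "central control display",
        "instrument panel": "dashboard",
    }

    def normalize_reference_point(mention: str) -> str:
        # fall back to the raw mention when no rule matches
        return NORMALIZATION_TABLE.get(mention, mention)

    # "screen" in "switch the left of the screen to music" is normalized
    # to the candidate reference point "central control display".
    print(normalize_reference_point("screen"))  # central control display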
Step 04 includes:
041: preloading the candidate operation objects of the user interface of the in-vehicle system that are located within the target position.
The processor is configured to preload the candidate operation objects of the user interface of the in-vehicle system that are located within the target position.
Specifically, the operation objects of the user interface include controls, sub-user-interfaces, and the like. The information the server needs to load mainly comprises control position information and the size and position information of sub-user-interfaces: a control stores the coordinates of its center point on the central control display, while a sub-user-interface stores the coordinates of the lower-left and upper-right corners of its area. After determining the target position of the voice request, the server determines the controls and sub-user-interfaces located within the target position of the user interface as candidate operation objects and preloads their position and size information.
In this way, all candidate operation objects possibly related to the user's voice request can be selected within the determined target position and preloaded before the target operation object is searched for, so that the subsequent search runs only over the preloaded candidates. Preloading only the candidate operation objects within the target position narrows the search range for the target operation object, shortens the search, and improves the efficiency of voice interaction.
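The two kinds of candidate operation objects can be represented as sketched below, a control by its center point and a sub-user-interface by the corners of its area; the class and field names are assumptions for exposition.

    from dataclasses import dataclass

    @dataclass
    class Control:
        name: str
        center: tuple          # (x, y) center point on the central control display

    @dataclass
    class SubUserInterface:
        name: str
        lower_left: tuple      # corners of the occupied area
        upper_right: tuple

    # Illustrative objects; only those inside the target position are preloaded.
    music_player = Control("music player", (-8.0, 2.0))
    left_screen = SubUserInterface("left screen", (-20.0, -10.0), (0.0, 10.0))
    print(music_player, left_screen)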
Step 041 includes:
0411: determining controls whose center points are located within the target position as the candidate operation objects;
0412: preloading the controls.
The processor is configured to determine controls whose center points are located within the target position as the candidate operation objects and to preload them.
Specifically, the server determines the controls whose center points are located within the target position as candidate operation objects and preloads the coordinates of their center points on the central control display. The later search for the target operation object then runs only over these preloaded candidates; controls whose center points are not within the target position are not preloaded.
In this way, the server preloads the controls located within the target position of the user interface and restricts the search for the target control to that position, which narrows the control search range within the user interface and improves runtime efficiency.
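A sketch of the center-point test under the same illustrative coordinates, with the region given as lower-left and upper-right corners.

    def center_in_region(center, region) -> bool:
        (x0, y0), (x1, y1) = region
        cx, cy = center
        return x0 <= cx <= x1 and y0 <= cy <= y1

    inf = float("inf")
    left_of_display = ((-inf, -inf), (0.0, inf))  # target position "left of the screen"
    controls = {"music player": (-8.0, 2.0), "navigation zoom": (6.0, 1.0)}

    # Only controls whose center point falls inside the target position
    # become candidate operation objects and are preloaded.
    candidates = [n for n, c in controls.items() if center_in_region(c, left_of_display)]
    print(candidates)  # ['music player']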
Referring to FIG. 7, the candidate operation objects include sub-user-interfaces, and step 041 further includes:
0413: determining sub-user-interfaces whose region information is located within the target position as the candidate operation objects;
0414: preloading the sub-user-interfaces.
The processor is configured to determine sub-user-interfaces whose region information is located within the target position as the candidate operation objects and to preload them.
Specifically, the server acquires the region position information of the sub-user-interfaces in turn. In FIG. 7, the whole area of the left screen is contained in the region obtained by normalizing "left of the screen" from the voice request, so the server determines the "left screen" sub-user-interface as a candidate operation object and preloads it. The later search for the target operation object then runs only over these preloaded candidates; sub-user-interfaces not contained in the target position are not preloaded.
In this way, the server preloads the sub-user-interfaces located within the target position of the user interface and restricts the search for the target sub-user-interface to that region, which narrows the search range and improves runtime efficiency.
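A matching sketch of the containment test for sub-user-interfaces: the whole area must lie inside the target position. The "left screen" rectangle is an illustrative assumption.

    def area_in_region(area, region) -> bool:
        (ax0, ay0), (ax1, ay1) = area
        (rx0, ry0), (rx1, ry1) = region
        return rx0 <= ax0 and ry0 <= ay0 and ax1 <= rx1 and ay1 <= ry1

    inf = float("inf")
    left_of_display = ((-inf, -inf), (0.0, inf))  # normalized "left of the screen"
    left_screen = ((-20.0, -10.0), (0.0, 10.0))   # area of the left-screen sub-interface

    print(area_in_region(left_screen, left_of_display))  # True, so preload it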
Referring to FIG. 3, step 04 further includes:
042: searching the candidate operation objects for an operable object according to the intent information, so as to determine the operable object as the target operation object.
The processor is configured to search the candidate operation objects for an operable object according to the intent information, so as to determine the operable object as the target operation object.
Specifically, the intent information of a user's voice request generally contains the concrete action to execute, and it is predicted with a preset natural language understanding model. In a real scenario, the intent extracted from "switch the left of the screen to music" is "switch to music". The server then searches the screened candidate operation objects for controls or sub-user-interfaces related to the keyword "music", and finally confirms that the action is executed by the left screen and the music player. The left screen and the music player are determined as the target sub-user-interface and the target control respectively, that is, the target operation objects.
In this way, the server selects, from the preloaded candidate operation objects, a target control or sub-user-interface capable of executing the intent in the user's voice request and determines it as the target operation object, so that instructions recognizable and executable by the in-vehicle system can later be generated and the voice interaction completed.
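Finally, a toy version of the search step: among the preloaded candidates, keep those able to execute the requested intent. The capability table is an illustrative assumption; in the running example both the left screen and the music player qualify.

    # Hypothetical capability table for the preloaded candidate operation objects.
    CANDIDATES = {
        "left screen": {"switch to music", "switch to navigation"},  # sub-user-interface
        "music player": {"switch to music", "pause"},                # control
    }

    def find_target_operation_objects(intent: str):
        return [name for name, caps in CANDIDATES.items() if intent in caps]

    print(find_target_operation_objects("switch to music"))  # ['left screen', 'music player']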
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the method described above.
In the description of this specification, reference to terms such as "above", "specifically", "similarly", or "understandably" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict each other.
Any process or method description in a flow chart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and are not to be construed as limiting the application; those of ordinary skill in the art may make variations, modifications, substitutions, and alterations to the above embodiments within the scope of the application.

Claims (9)

1. A voice interaction method, comprising:
receiving a voice request, forwarded by a vehicle, for interacting with a user interface element in the user interface of an in-vehicle system;
selecting objects with fixed positions in the user interface of the in-vehicle system as candidate reference points;
preloading the candidate reference points;
obtaining position information of each candidate reference point and region position information of each candidate reference point;
extracting intent information and slot information from the voice request, wherein the slot information comprises a target reference point and a target relative position;
determining a target position of the voice request according to the slot information, the candidate reference points, and the region position information of the candidate reference points;
determining a target operation object according to the target position and the intent information;
generating a vehicle control instruction corresponding to the voice request according to the target position and the target operation object;
and forwarding the vehicle control instruction to the vehicle to complete the voice interaction.
2. The voice interaction method according to claim 1, wherein the obtaining of the position information of each candidate reference point and the region position information of each candidate reference point comprises:
selecting one of the candidate reference points as a base reference point;
and determining the position information of each candidate reference point and the region position information of each candidate reference point according to the base reference point.
3. The voice interaction method according to claim 1, wherein the determining of the target position of the voice request according to the slot information, the candidate reference points, and the region position information of the candidate reference points comprises:
normalizing the slot information so that the target reference point corresponds to a candidate reference point;
mapping the target relative position to the region position information of the candidate reference point according to the correspondence between the target reference point and the candidate reference point;
and determining the target position of the voice request according to the correspondence between the target reference point and the candidate reference point and the correspondence between the target relative position and the region position information of the candidate reference point.
4. The voice interaction method according to claim 1, wherein the determining of a target operation object according to the target position and the intent information comprises:
preloading the candidate operation objects of the user interface of the in-vehicle system that are located within the target position.
5. The voice interaction method according to claim 4, wherein the preloading of the candidate operation objects of the user interface of the in-vehicle system within the target position comprises:
determining controls whose center points are located within the target position as the candidate operation objects;
and preloading the controls.
6. The voice interaction method according to claim 4, wherein the candidate operation objects comprise sub-user-interfaces, and the preloading of the candidate operation objects of the user interface of the in-vehicle system within the target position comprises:
determining sub-user-interfaces whose region information is located within the target position as the candidate operation objects;
and preloading the sub-user-interfaces.
7. The voice interaction method according to claim 4, wherein the determining of a target operation object according to the target position and the intent information comprises:
searching the candidate operation objects for an operable object according to the intent information, so as to determine the operable object as the target operation object.
8. A server comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the method of any one of claims 1-7.
9. A computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the method of any one of claims 1-7.
CN202211400473.1A 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium Active CN115512704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211400473.1A CN115512704B (en) 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211400473.1A CN115512704B (en) 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115512704A (en) 2022-12-23
CN115512704B (en) 2023-08-29

Family

ID=84514271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211400473.1A Active CN115512704B (en) 2022-11-09 2022-11-09 Voice interaction method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115512704B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006195637A (en) * 2005-01-12 2006-07-27 Toyota Motor Corp Voice interaction system for vehicle
GB201905974D0 (en) * 2017-02-06 2019-06-12 Toshiba Kk A spoken dialogue system, a spoken dialogue method and a method of adapting a spoken dialogue system
CN108806684A (en) * 2018-06-27 2018-11-13 Oppo广东移动通信有限公司 Position indicating method, device, storage medium and electronic equipment
CN109029449A (en) * 2018-06-29 2018-12-18 英华达(上海)科技有限公司 It looks for something method, device for searching article and system of looking for something
CN111508482A (en) * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, device, equipment and storage medium
CN113823280A (en) * 2020-06-19 2021-12-21 华为技术有限公司 Intelligent device control method, electronic device and system
CN112242141A (en) * 2020-10-15 2021-01-19 广州小鹏汽车科技有限公司 Voice control method, intelligent cabin, server, vehicle and medium
CN113470649A (en) * 2021-08-18 2021-10-01 三星电子(中国)研发中心 Voice interaction method and device
CN113723528A (en) * 2021-09-01 2021-11-30 斑马网络技术有限公司 Vehicle-mounted voice-video fusion multi-mode interaction method, system, device and storage medium
CN114913856A (en) * 2022-07-11 2022-08-16 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium

Also Published As

Publication number Publication date
CN115512704A (en) 2022-12-23

Similar Documents

Publication Publication Date Title
CN110825093B (en) Automatic driving strategy generation method, device, equipment and storage medium
CN112766468B (en) Trajectory prediction method and device, storage medium and electronic equipment
US20060155546A1 (en) Method and system for controlling input modalities in a multimodal dialog system
CN112595337B (en) Obstacle avoidance path planning method and device, electronic device, vehicle and storage medium
CN112164401B (en) Voice interaction method, server and computer-readable storage medium
JPH11211489A (en) Navigation device
CN115064166B (en) Vehicle voice interaction method, server and storage medium
CN115457959B (en) Voice interaction method, server and computer readable storage medium
CN115064167A (en) Voice interaction method, server and storage medium
CN114880569A (en) Recommendation control method and device for vehicle, electronic equipment, system and storage medium
CN113421561A (en) Voice control method, voice control device, server and storage medium
JP6808064B2 (en) Map information management device, map information management system, and map information management method
CN115457960B (en) Voice interaction method, server and computer readable storage medium
CN115512704B (en) Voice interaction method, server and computer readable storage medium
CN108900973A (en) A kind of vehicle positioning method, device, system, terminal and readable medium
CN115376513B (en) Voice interaction method, server and computer readable storage medium
JP7262526B2 (en) Determining method, device and electronic equipment for focal position
CN105955698B (en) Voice control method and device
CN112164402B (en) Vehicle voice interaction method and device, server and computer readable storage medium
CN111931702B (en) Target pushing method, system and equipment based on eyeball tracking
KR20220166784A (en) Riding method, device, facility and storage medium based on autonomous driving
CN111104611B (en) Data processing method, device, equipment and storage medium
CN109783608B (en) Target hypothesis determination method and device, readable storage medium and electronic equipment
CN115565532B (en) Voice interaction method, server and computer readable storage medium
CN115588432B (en) Voice interaction method, server and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant