WO2022042664A1 - Human-computer interaction method and device - Google Patents


Info

Publication number
WO2022042664A1
Authority: WIPO (PCT)
Prior art keywords: historical, command, decision, target, weight
Application number: PCT/CN2021/114853
Other languages: French (fr), Chinese (zh)
Inventors: 王仁宇 (Wang Renyu), 杨宇庭 (Yang Yuting), 钱莉 (Qian Li), 黄雪妍 (Huang Xueyan)
Original Assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022042664A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems

Definitions

  • the present application relates to the field of artificial intelligence (Artificial Intelligence, AI), and in particular, to a human-computer interaction method and device.
  • electronic devices can use Human-Computer Interaction Techniques (HCI) to communicate with users, so that electronic devices can understand the user's intention and complete the work of the user's intention.
  • human-computer interaction has a wide range of applications in many fields, such as smart home, automatic driving and so on.
  • the interaction of electronic devices with users is not very “natural” and “smart”.
  • the intention obtained by the electronic device through natural language understanding of the received voice command is not very close to the real intention of the user.
  • the decision-making of electronic devices is mechanically rigid and cannot give users optimal decisions.
  • as a result, the user experience of human-computer interaction is poor.
  • the present application provides a human-computer interaction method and device.
  • When performing natural language understanding on a target command issued by a user in the current human-computer interaction task, the semantics of historical commands in historical human-computer interaction tasks are referenced to assist in understanding the target command, making the result of natural language understanding closer to the user's real intention.
  • Historical decisions are referenced when executing system decisions, and the target decision can be optimized according to the historical decisions, effectively improving the user experience of human-computer interaction.
  • the present application provides a human-computer interaction method. The method can be applied to an electronic device, or to a human-computer interaction device that can support an electronic device in implementing the method (for example, the human-computer interaction device includes a chip system).
  • the method includes: after the electronic device receives the target command issued by the user, generating the target decision of the target command by using the historical command, the historical decision of the historical command, and the target command, and outputting the target decision.
  • the historical command is the command of the historical human-computer interaction task
  • the target command is the command of the current human-computer interaction task.
  • the historical command may be one or more historical commands for the user to perform multiple rounds of human-computer interaction tasks with the electronic device.
  • the historical commands may be commands for multiple historical users to perform multiple rounds of human-computer interaction tasks with the electronic device.
  • the historical users may also include the user who issued the target command.
  • the semantics of the historical command in the historical human-computer interaction task are referenced, which assists the natural language understanding of the target command and makes the result of natural language understanding closer to the user's real intention.
  • the historical decision-making is referred to when executing the system decision, and the target decision can be optimized according to the historical decision-making, which effectively improves the user experience of human-computer interaction.
  • generating the target decision of the target command by using the historical command, the historical decision of the historical command, and the target command includes: the electronic device performs weighted coding on the historical command based on the command semantic correlation weight to obtain the coding information of the historical command, where the command semantic correlation weight indicates the degree of semantic correlation between the target command and the historical command; and the target decision is generated according to the target command, the coding information of the historical command, and the historical decision of the historical command.
  • before performing weighted coding on historical commands based on the command semantic correlation weight, the electronic device performs semantic coding on the target command to obtain the semantic vector of the target command; the command semantic correlation weight is then obtained by calculating the similarity between the semantic vector of the target command and the semantic vectors of the historical commands.
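The similarity calculation above can be sketched as follows. The patent only says "similarity calculation"; cosine similarity followed by softmax normalization is one illustrative choice, and all names here are hypothetical.

```python
import numpy as np

def command_semantic_weights(target_vec, history_vecs):
    """Command semantic correlation weights: similarity between the target
    command's semantic vector and each historical command's semantic vector.
    Cosine similarity + softmax are illustrative, not mandated by the patent."""
    target = target_vec / np.linalg.norm(target_vec)
    hist = history_vecs / np.linalg.norm(history_vecs, axis=1, keepdims=True)
    sims = hist @ target                     # one similarity per historical command
    exp = np.exp(sims - sims.max())          # numerically stable softmax
    return exp / exp.sum()

# A historical command pointing the same way as the target gets more weight.
w = command_semantic_weights(
    np.array([1.0, 0.0]),
    np.array([[0.9, 0.1],    # semantically close historical command
              [0.0, 1.0]]),  # unrelated historical command
)
```

The resulting weights sum to one, so they can be used directly for the weighted coding of historical commands described above.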
  • weighted encoding is performed on historical commands based on the command semantic correlation weights to obtain historical command encoding information, including: weighted encoding of the historical commands based on the command semantic correlation weights and the user weights, where the user weight represents the degree of association between the current user and the historical user who issued the historical command.
  • the electronic device may obtain the user weight according to the user's voiceprint to obtain the degree of association between the user and the historical user.
  • weighted coding is performed on historical commands based on the command semantic correlation weights and the user weights to obtain historical command coding information, including: weighted coding of the historical commands based on the command semantic correlation weights, the user weights, and the user relationship correlation weights, where the user relationship correlation weight is a preset relationship strength value among multiple users.
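One plausible way to combine the three weights is multiplicatively, renormalizing and then taking a weighted sum of the historical command vectors; the patent does not fix the exact formula, so this is a sketch under that assumption.

```python
import numpy as np

def weighted_history_encoding(history_vecs, semantic_w, user_w, relation_w):
    """Weighted coding of historical commands: combine the command semantic
    correlation weight, the user weight (voiceprint-based speaker association),
    and the preset user relationship strength, then renormalize."""
    w = np.asarray(semantic_w) * np.asarray(user_w) * np.asarray(relation_w)
    w = w / w.sum()
    return w @ np.asarray(history_vecs, dtype=float)  # weighted sum = coding info

enc = weighted_history_encoding(
    history_vecs=[[1.0, 0.0], [0.0, 1.0]],
    semantic_w=[0.8, 0.2],   # semantic correlation with the target command
    user_w=[1.0, 0.5],       # association with the current speaker
    relation_w=[1.0, 1.0],   # preset relationship strength between users
)
```

A historical command that is both semantically close and issued by a strongly associated user dominates the encoding.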
  • generating the target decision according to the target command, the historical command coding information, and the historical decision of the historical command includes: using an intent understanding model to perform natural language understanding on the word vector of the target command and the historical command coding information to obtain the intent and slot of the target command; and generating the target decision according to the intent and slot of the target command and the historical decision coding vector of the historical command.
  • generating the target decision according to the intent and slot of the target command and the historical decision coding vector of the historical command includes: encoding the intent and slot of the target command to obtain a decision coding vector; performing weighted encoding on the historical decision coding vectors based on the historical decision coding vector weights to obtain the historical decision coding information, where the historical decision coding vector weight represents the degree of correlation between the decision coding vector and the historical decision coding vector; and analyzing the decision coding vector and the historical decision coding information with a decision model to generate the target decision.
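The decision-level weighting works like an attention step over past decisions. The sketch below uses dot-product similarity with softmax as the correlation measure (an assumption; the patent leaves the similarity unspecified) and concatenation as one simple way to hand both vectors to the decision model.

```python
import numpy as np

def historical_decision_context(decision_vec, hist_decision_vecs):
    """Historical decision coding information: each historical decision coding
    vector is weighted by its correlation with the current decision coding
    vector (dot product + softmax here, as one illustrative similarity)."""
    sims = hist_decision_vecs @ decision_vec
    w = np.exp(sims - sims.max())
    w = w / w.sum()
    return w @ hist_decision_vecs

# The decision model would then consume both the decision coding vector and
# this context, e.g. their concatenation.
dec = np.array([1.0, 0.0])
ctx = historical_decision_context(dec, np.array([[1.0, 0.0], [0.0, 1.0]]))
model_input = np.concatenate([dec, ctx])
```

Historical decisions similar to the current one contribute more to the context, which is what lets the decision model "optimize" the target decision against past behavior.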
  • the historical decision of the historical command is referenced, and the decision content can be optimized according to it, which enriches the information available for the electronic device's decision-making and effectively improves the user experience of human-computer interaction.
  • the electronic device performs similarity calculation on the decision coding vector and the historical decision coding vector to obtain the historical decision coding vector weight.
  • weighted encoding is performed on the historical decision encoding vectors based on the historical decision encoding vector weights to obtain historical decision encoding information, including: weighted encoding of the historical decision encoding vectors based on the historical decision encoding vector weights and the user weights to obtain the historical decision encoding information; or weighted encoding of the historical decision encoding vectors based on the historical decision encoding vector weights, the user weights, and the user relationship correlation weights to obtain the historical decision encoding information.
  • the electronic device encodes the intent and slot of the target command to obtain a decision encoding vector, including: the electronic device encodes the intent and slot of the target command together with the occupancy state of the electronic device to obtain the decision coding vector.
  • the semantic vector of historical commands is enhanced by the occupancy status of electronic devices, thereby further improving the accuracy of system decision-making.
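One simple way to fold the device's occupancy status into the decision coding vector is concatenation with a one-hot occupancy encoding; the patent does not specify the encoding scheme, so the names and layout here are hypothetical.

```python
import numpy as np

def decision_coding_vector(intent_slot_vec, occupancy_state):
    """Decision coding vector augmented with device state: concatenate the
    encoded intent/slot vector with a one-hot occupancy encoding
    (illustrative; the patent leaves the scheme open)."""
    return np.concatenate([np.asarray(intent_slot_vec, dtype=float),
                           np.asarray(occupancy_state, dtype=float)])

vec = decision_coding_vector([0.2, 0.7], [1.0, 0.0])  # device currently busy
```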
  • the present application provides a human-computer interaction device applied to an electronic device; the electronic device includes a voice transceiver, which is used to receive the target command issued by the user and to feed the decision back to the user by voice.
  • the human-computer interaction device includes: an acquisition unit, a processing unit and a feedback unit.
  • the acquisition unit is used to receive the target command issued by the user;
  • the processing unit is used to generate the target decision of the target command by using the historical command, the historical decision of the historical command and the target command, the historical command is the command of the historical human-computer interaction task, and the target command is Command for the current human-computer interaction task; feedback unit for outputting target decisions.
  • the semantics of the historical command in the historical human-computer interaction task are referenced, which assists the natural language understanding of the target command and makes the result closer to the user's real intention; the historical decision is referenced when executing the system decision, and the target decision can be optimized according to the historical decision, which effectively improves the user experience of human-computer interaction.
  • These units may perform the corresponding functions in the method examples of the first aspect. For details, refer to the detailed descriptions in the method examples, which will not be repeated here.
  • the present application provides an electronic device comprising at least one processor, a memory, and a voice transceiver, wherein the voice transceiver is used to receive a target command issued by a user and feed the decision back to the user by voice, the memory is used for storing computer programs and instructions, and the processor is used for invoking the computer programs and instructions and cooperating with the voice transceiver to execute the human-computer interaction method according to the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium comprising computer software instructions; when the computer software instructions are executed in an electronic device, the electronic device is caused to perform the human-computer interaction method according to the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product that, when run on a computer, causes the computer to execute the human-computer interaction method according to the first aspect or any possible implementation of the first aspect.
  • the present application provides a chip system applied to an electronic device; the chip system includes an interface circuit and a processor, interconnected by a line; the interface circuit is used for receiving signals from a memory of the electronic device and sending them to the processor, where the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the chip system executes the human-computer interaction method according to the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of the composition of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a speech recognition process provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a human-computer interaction method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an intent understanding model provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the composition of a human-computer interaction device provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the composition of a human-computer interaction device according to an embodiment of the present application.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations, or explanations. Any embodiment or design described in the embodiments of the present application as “exemplary” or “for example” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “for example” is intended to present the related concepts in a specific manner.
  • Multiple refers to two or more than two, and other quantifiers are similar.
  • “And/or” describes the association relationship between related objects, indicating that three relationships may exist; for example, A and/or B can mean: A alone, both A and B, or B alone.
  • the occurrence of an element in the singular forms “a”, “an” and “the” does not mean “one and only one” unless the context clearly dictates otherwise, but rather “one or more than one”.
  • for example, “a device” means one or more such devices.
  • “at least one of ...” means one or any combination of the subsequent associated objects; for example, “at least one of A, B, and C” includes A, B, C, AB, AC, BC, or ABC.
  • the electronic device in this embodiment is a device including a display screen and a camera.
  • the specific form of the electronic device is not particularly limited in the embodiments of the present application.
  • electronic devices may be televisions, tablets, projectors, cell phones, desktop computers, laptop computers, handheld computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), augmented reality (AR) devices, virtual reality (VR) devices, smart speakers, smart TVs, and other Internet of Things (IoT) devices.
  • FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a power management module 140, an antenna, a wireless communication module 160, an audio module 170, a speaker 170A, a speaker interface 170B, a microphone 170C, a sensor module 180, buttons 190, an indicator 191, a display screen 192, a camera 193, and so on.
  • the aforementioned sensor module 180 may include sensors such as a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, and an ambient light sensor.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated in one or more processors.
  • a controller can be the nerve center and command center of an electronic device.
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, they can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thereby improves system efficiency.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a USB interface, etc.
  • the processor 110 when the processor 110 is configured to perform natural language understanding on the target command, the natural language understanding of the target command is performed in combination with the historical command to obtain the intent and slot of the target command.
  • the historical command is the command of the historical human-computer interaction task
  • the target command is the command of the current human-computer interaction task.
  • the historical commands may be commands of multiple historical human-computer interaction tasks.
  • the multiple historical human-computer interaction tasks may be tasks in which multiple historical users have conducted multiple rounds of conversations with the electronic device.
  • the processor 110 determines the target decision in combination with the decision of the historical command and the intent and slot of the target command.
  • the slots may include one or more slot positions; for example, in a taxi-hailing scene, the slots include a departure slot and a destination slot.
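The intent/slot result for such a command might look like the structure below; the intent name, slot names, and values are purely illustrative.

```python
# Hypothetical intent/slot output for a taxi-hailing command such as
# "Book a taxi from the airport to the Grand Hotel".
parsed = {
    "intent": "book_taxi",
    "slots": {
        "departure": "airport",      # departure slot
        "destination": "Grand Hotel" # destination slot
    },
}
```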
  • the power management module 140 is used to connect power.
  • the power management module 140 may also be connected with the processor 110 , the internal memory 121 , the display screen 192 , the camera 193 , the wireless communication module 160 and the like.
  • the power management module 140 receives power input, and supplies power to the processor 110 , the internal memory 121 , the display screen 192 , the camera 193 , the wireless communication module 160 , and the like.
  • the power management module 140 may also be provided in the processor 110 .
  • the wireless communication function of the electronic device can be implemented by the antenna and the wireless communication module 160 and the like.
  • the wireless communication module 160 can provide solutions for wireless communication applied on the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna.
  • the antenna of the electronic device is coupled to the wireless communication module 160 so that the electronic device can communicate with the network and other devices through wireless communication techniques.
  • the electronic device realizes the display function through the GPU, the display screen 192, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 192 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 192 is used to display images, videos, and the like.
  • the display screen 192 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 192 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • an object is projected through the lens to generate an optical image on the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the electronic device may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the electronic device may not include a camera, that is, the above-mentioned camera 193 is not provided in the electronic device (eg, a television).
  • the electronic device can connect to the camera 193 through an interface (eg, the USB interface 130 ).
  • the external camera 193 can be fixed on the electronic device by an external fixing member (such as a camera bracket with a clip).
  • the external camera 193 can be fixed at the edge of the display screen 192 of the electronic device, such as the upper edge, by means of an external fixing member.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device selects the frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, etc.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs. In this way, the electronic device can play or record videos in various encoding formats, such as: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the program storage area can store the operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, etc.) created during the use of the electronic device.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device may implement audio functions through an audio module 170, a speaker 170A, a microphone 170C, a speaker interface 170B, and an application processor. For example, music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into an analog audio signal output, and also for converting an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also referred to as a “loudspeaker”, is used to convert audio electrical signals into sound signals. In this application, the speaker 170A is used to output the voice of the decision.
  • the microphone 170C, also called a “mic”, is used to receive the voice of the target command or the voice of a historical command issued by the user.
  • the speaker interface 170B is used to connect a wired speaker.
  • the speaker interface 170B can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the keys 190 include a power key, volume keys, and the like; the keys 190 may be mechanical keys or touch keys.
  • the electronic device may receive key input and generate key signal input related to user settings and function control of the electronic device.
  • the indicator 191 may be an indicator light, which may be used to indicate that the electronic device is in a power-on state, a standby state, or a power-off state, or the like. For example, if the indicator light is off, it can indicate that the electronic device is in a shutdown state; if the indicator light is green or blue, it can indicate that the electronic device is in a power-on state; if the indicator light is red, it can indicate that the electronic device is in a standby state.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device. It may have more or fewer components than shown in FIG. 1 , may combine two or more components, or may have a different configuration of components.
  • the electronic device may also include components such as speakers.
  • the various components shown in Figure 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing or application specific integrated circuits.
  • the electronic device receives a target command sent by a user.
  • the target command is user-recognizable natural language text.
  • the user can input target commands to the electronic device through an input device (eg, a virtual keyboard or a physical keyboard).
  • the user can speak to the electronic device.
  • the electronic device performs voice recognition on the voice and converts the voice into a target command.
  • Voice refers to the voice of a user who communicates with an electronic device.
  • the electronic device may receive a mixed voice, and the mixed voice includes the voice and the noise of the external environment.
  • the electronic device can utilize the user's voiceprint characteristics to separate speech from the mixed speech.
  • FIG. 3 is a schematic diagram of speech separation and recognition provided by an embodiment of the present application.
  • the mixed speech is analyzed by short-time Fourier transform (STFT) to obtain the mixed speech spectrum; the mixed speech spectrum and the user voiceprint features pre-registered in the system are input into a pre-trained speech separation model, which separates the user's speech spectrum from the mixed speech spectrum; automatic speech recognition is then performed on the separated speech spectrum of the target speech to obtain the target command.
  • the speech separation model is trained from pre-collected multi-user speech data.
  • the speech separation model can be a multi-layer long short-term memory model (LSTM).
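The separation pipeline above can be sketched as follows. This is a minimal numpy illustration only: the `stft` framing parameters, the 64-dimensional voiceprint embedding, and the stand-in `dummy_model` are illustrative assumptions, not the patent's trained multi-layer LSTM.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and take the FFT (a minimal STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)          # (n_frames, n_bins), complex

def separate(mixed_spec, voiceprint, mask_model):
    """Condition the separation model on the registered voiceprint to
    estimate a time-frequency mask, then apply it to the mixed spectrum."""
    mag = np.abs(mixed_spec)                      # magnitude spectrogram
    # broadcast the voiceprint embedding to every frame and concatenate
    cond = np.concatenate([mag, np.tile(voiceprint, (mag.shape[0], 1))], axis=1)
    mask = mask_model(cond)                       # values in [0, 1], shape of mag
    return mask * mixed_spec                      # estimated user speech spectrum

# toy stand-in for the trained separation model and inputs
rng = np.random.default_rng(0)
mixed = rng.standard_normal(4096)                 # mixed voice + noise waveform
spec = stft(mixed)
voiceprint = rng.standard_normal(64)              # pre-registered speaker embedding
dummy_model = lambda x: np.full((x.shape[0], spec.shape[1]), 0.5)
user_spec = separate(spec, voiceprint, dummy_model)
```

The separated spectrum `user_spec` would then be passed to an automatic speech recognition stage to obtain the target command text.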
  • the electronic device generates the target decision of the target command by using the historical command, the historical decision of the historical command, and the target command.
  • FIG. 4 is a schematic flowchart of another human-computer interaction method provided in this embodiment, wherein the method flow shown in FIG. 4 is an elaboration of the specific operation process included in S202 in FIG. 2.
  • the electronic device performs weighted coding on the historical command based on the command semantic correlation weight to obtain historical command coding information.
  • the electronic device generates a target decision according to the target command, the coding information of the historical command, and the historical decision of the historical command.
  • the command semantic correlation weight represents the semantic correlation degree between the target command and the historical command.
  • a historical command related to a target command may be a historical command that has some connection with the intent of the target command. For example, the target command is "the temperature is a little cold", and the historical command is "it's so hot, turn on the air conditioner to 20 degrees". Both the target command and the history command are related to adjusting the temperature of the air conditioner. However, the target command does not clearly indicate that the temperature of the air conditioner is adjusted, and the historical command indicates that the air conditioner is adjusted to a specific temperature.
  • the electronic device semantically encodes the target command to obtain the semantic vector of the target command, and calculates the similarity of the semantic vector of the target command and the semantic vector of the historical command to obtain the command semantic correlation weight.
  • the electronic device first performs Chinese word segmentation on the target command to obtain a word vector of the target command.
  • Chinese word segmentation refers to dividing a continuous sequence of words into individual words.
  • the electronic device inputs the word vector of the target command into the semantic encoding model for encoding, and obtains the semantic vector of the target command.
  • the semantic encoding model can be a recurrent neural network (RNN), and the most commonly used RNN model is a bidirectional long short-term memory (BiLSTM).
  • BiLSTM can be implemented using a network with 3 hidden layers of 600 nodes each. For example, the target command is "the temperature is a little cold", and Chinese word segmentation of the target command yields the word vector of "temperature", the word vector of "a bit", and the word vector of "cold".
  • the electronic device stores the semantic vector of the target command, so that the semantic vector of the target command can be used as the semantic vector of the historical command to assist the electronic device to perform natural language understanding of subsequent commands.
  • the semantic vector of historical commands may be a matrix of M columns, each column representing a semantic vector of commands of a historical human-computer interaction task. Multiply each column in the matrix with the semantic vector of the target command to get the command semantic correlation weight.
  • the command semantic correlation weight satisfies Equation (1):
  • p_m = u^T h_m, for m = 1, …, M    (1)
  • u^T represents the transposed semantic vector of the target command
  • h_m represents the semantic vector of the m-th historical command
  • p_m represents the command semantic correlation weight
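The similarity between the target command's semantic vector and each historical command's semantic vector can be sketched as a dot product per history column; the softmax normalization (so the weights sum to one) and the toy vectors here are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def command_semantic_weights(u, H):
    """u: semantic vector of the target command, shape (d,).
    H: semantic vectors of M historical commands, one per column, shape (d, M).
    Returns one relevance weight per historical command (dot-product
    similarity u^T h_m, followed here by a softmax)."""
    scores = u @ H                        # u^T h_m for each column m
    scores = scores - scores.max()        # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights

u = np.array([0.2, 0.9, 0.1])             # e.g. "the temperature is a little cold"
H = np.array([[0.1, 0.8],
              [0.9, 0.1],
              [0.2, 0.3]])                # two historical commands
p = command_semantic_weights(u, H)         # higher weight for the more related one
```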
  • the electronic device may further perform weighted encoding on the historical command based on the command semantic correlation weight and the user weight to obtain historical command encoding information.
  • the user weight represents the degree of association between the user and the historical user who issued the historical command.
  • the electronic device may prompt the user to provide the user's voiceprint, and the electronic device stores the user's voiceprint. The electronic device compares the user's voiceprint with the historical user's voiceprint to obtain the degree of association between the user and the historical user, that is, the user weight.
  • Historical users refer to users who have performed human-computer interaction with electronic devices.
  • the electronic device obtains the similarity between the user and the historical user according to the user's voiceprint, and obtains the user weight.
  • the degree of similarity may be the likelihood that the user is a given historical user. It is understandable that if the user weight is large, the user is more likely to be the historical user, and a higher weight is set; if the user weight is small, the user is less likely to be the historical user, and a lower weight is set.
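A hedged sketch of the user weight: cosine similarity between the current user's voiceprint embedding and each registered historical voiceprint. The embedding dimension and the toy vectors are illustrative assumptions; the patent does not fix a particular similarity measure.

```python
import numpy as np

def user_weight(voiceprint, historical_voiceprints):
    """voiceprint: current user's embedding, shape (d,).
    historical_voiceprints: one row per historical user, shape (M, d).
    Returns cosine similarity per historical user; a higher value means the
    speaker is more likely to be that historical user."""
    v = voiceprint / np.linalg.norm(voiceprint)
    H = historical_voiceprints / np.linalg.norm(
        historical_voiceprints, axis=1, keepdims=True)
    return H @ v                          # one weight per historical user, in [-1, 1]

v1 = np.array([1.0, 0.0, 0.0])            # current speaker's embedding
H = np.array([[1.0, 0.0, 0.0],            # same speaker registered earlier
              [0.0, 1.0, 0.0]])           # a different family member
weights = user_weight(v1, H)
```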
  • the electronic device may further perform weighted coding on the historical command based on the command semantic correlation weight, the user weight and the user relationship correlation degree weight to obtain historical command coding information.
  • the user relationship relevance weight is a preset relationship strength value of multiple users.
  • the electronic device is a smart home, the users who use the smart home are usually fixed family members, and any member of the family member can set the value of the strength of the relationship with other members. Relationship strength values can include high, medium, low, and no relationship, among others.
  • the electronic device performs weighted encoding on the historical command based on the weighting information to obtain the semantic vector of the weighted encoded historical command, merges the semantic vector of the weighted encoded historical command with the semantic vector of the target command, and encodes the result through a fully connected network to obtain the historical command encoding information.
  • the weighting information includes at least one of a command semantic relevance weight, a user weight, and a user relationship relevance weight.
  • the semantic vector of the weighted encoded historical command satisfies Equation (2):
  • h' = Σ_{m=1}^{M} p_m S_m h_m    (2)
  • h' represents the semantic vector of the weighted encoded historical command
  • p_m represents the command semantic correlation weight
  • h_m represents the semantic vector of the m-th historical command
  • S_m represents, depending on the weighting information used, the user weight, the user relationship relevance weight, or the product of the user weight and the user relationship relevance weight
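The weighted encoding of the history can be sketched as a weighted sum over the historical command vectors; `s` stands in for the optional S_m term (the toy values are assumptions for illustration).

```python
import numpy as np

def weighted_history_encoding(H, p, s=None):
    """h' = sum_m p_m * S_m * h_m (Equation (2) in spirit).
    H: (d, M) historical command semantic vectors, one per column.
    p: (M,) command semantic correlation weights.
    s: optional (M,) user / user-relationship weights (the S_m term)."""
    w = p if s is None else p * s
    return H @ w                          # (d,) weighted-encoded history vector

H = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # two historical command vectors
p = np.array([0.7, 0.3])                 # semantic relevance weights
s = np.array([1.0, 0.5])                 # e.g. user weight per historical command
h_prime = weighted_history_encoding(H, p, s)   # → [0.7, 0.15]
```

This vector would then be merged with the target command's semantic vector and passed through a fully connected network to obtain the historical command encoding information.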
  • FIG. 5 is a schematic flowchart of another human-computer interaction method provided in this embodiment, wherein the method flow described in FIG. 5 is an elaboration of the specific operation process included in S2022 in FIG. 4.
  • the electronic device uses the intent understanding model to perform natural language understanding on the word vector of the target command and historical command coding information, and obtains the intent and slot of the target command.
  • the electronic device generates a target decision according to the intent and slot position of the target command and the historical decision coding vector of the historical command.
  • the electronic device performs Chinese word segmentation on the target command to obtain a word vector of the target command (execute S601 ).
  • the target command is "the temperature is a little cold"
  • the word vector of the target command includes the word vector of "temperature", the word vector of "a bit", and the word vector of "cold".
  • the electronic device uses the semantic encoding model to encode the word vector of the target command to obtain the semantic vector of the target command (go to S602).
  • the electronic device calculates the similarity according to the semantic vector of the target command and the semantic vector of the historical command to obtain the command semantic correlation weight
  • the command semantic correlation weight includes the degree of semantic correlation between the target command "the temperature is a little cold" and the historical command "it's so hot, turn on the air conditioner to 20 degrees". It is worth noting that the command semantic correlation weight includes the degrees of semantic correlation between the target command and the commands of multiple historical human-computer interaction tasks. Weighted encoding is performed on the semantic vector of the historical command based on the first weighting information to obtain historical command encoding information (go to S604).
  • the first weighting information includes command semantic-related weights.
  • the first weighting information further includes user weights and user relationship relevance weights.
  • the electronic device uses the intent understanding model to perform natural language understanding on the word vector of the target command and historical command coding information, and obtains the intent and slot of the target command (go to S605). For example, if the target command is "temperature is a little cold," the intent of the target command may indicate that it is about adjusting the temperature.
  • the slots of the target command can be "temperature", "a bit", and "cold".
  • the intent understanding model can be an RNN; in the following specific case, it is implemented using a BERT model based on the Transformer structure.
  • Intent understanding models can be trained with bidirectional encoder representations from transformers (BERT).
  • BERT is a bidirectional Transformer-based model proposed by Google. It can be pre-trained using a large amount of unsupervised text corpus. The pre-training process includes two techniques: one is to randomly mask some characters in the training sentences and predict the masked characters; the other is to train the model to understand the relationship between sentences, predicting the next sentence given the current text.
  • the intent understanding model includes a deep structure pre-trained for semantic analysis; the BERT model is then fine-tuned on the target-related intent understanding task.
  • the first input of the BERT network uses the weighted semantic encoding vector; as shown in FIG. 7, the word vector of the target command and the historical command encoding information are input into the intent understanding model to obtain the intent and slot of the target command.
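A toy sketch of the joint intent-and-slot prediction described above: a pooled sentence vector feeds an intent classifier and each token vector feeds a slot classifier. The linear heads, random weights, and mean pooling are stand-ins for a fine-tuned BERT encoder, not the patent's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_tokens, n_intents, n_slots = 8, 3, 4, 5      # toy sizes, e.g. 3 tokens:
                                                   # "temperature" / "a bit" / "cold"
token_vecs = rng.standard_normal((n_tokens, d))   # encoder output, one vector per token
pooled = token_vecs.mean(axis=0)                  # stand-in for a [CLS]-style pooled vector

W_intent = rng.standard_normal((n_intents, d))    # intent classification head
W_slot = rng.standard_normal((n_slots, d))        # slot tagging head

intent = int(np.argmax(W_intent @ pooled))        # one intent for the whole command
slots = np.argmax(token_vecs @ W_slot.T, axis=1)  # one slot label per token
```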
  • the decision model is a classification model whose inputs are the intent, the dialogue state, and system database information, and whose output is a specific decision.
  • the electronic device encodes the intent of the target command, the dialog state, and the information obtained in the system database to obtain a decision encoding vector (execute S606 ).
  • the encoding network can be implemented using a multilayer convolutional neural network (CNN).
  • the similarity calculation is performed between the electronic device decision coding vector and the historical decision coding vector to obtain the weight of the historical decision coding vector (go to S607).
  • historical decisions are system actions determined by the electronic device based on historical commands. For example, the historical command is "how is the weather today", and the system action is to output "overcast, temperature".
  • the historical command is "It's so hot, turn on the air conditioner to 20 degrees".
  • the system action is to adjust the temperature of the air conditioner to 20 degrees.
  • the weight of the historical decision coding vector includes the degree of correlation between the decision coding vector of the target command "the temperature is a little cold" and the historical decision coding vector of the historical decision "turn on the air conditioner to 20 degrees". It is worth noting that the weight of the historical decision coding vector includes the degrees of correlation between the decision coding vector of the target command and the historical decision coding vectors of the decisions of multiple historical human-computer interaction tasks.
  • the electronic device performs weighted encoding on the historical decision encoding vector based on the second weighted information to obtain historical decision encoding information (go to S608).
  • the second weighting information includes historical decision coding vector weights.
  • the second weighting information further includes user weights and user relationship relevance weights.
  • the weight of the historical decision coding vector represents the degree of correlation between the decision coding vector and the historical decision coding vector.
  • the electronic device uses the decision model to analyze the decision coding vector and the historical decision coding information to generate a target decision (go to S609).
  • the target command is "the temperature is a little cold"
  • the target decision can be "turn on the air conditioner to 29 degrees”.
  • Decision models can be implemented using shallow classifiers, such as support vector machines, or deep neural networks (DNNs), such as multi-layer fully connected feedforward networks (FNNs).
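A minimal feed-forward sketch of such a decision classifier: the decision coding vector and the historical decision encoding information are concatenated and mapped to a score per candidate system action. The one-hidden-layer FNN, random weights, and toy dimensions are illustrative assumptions.

```python
import numpy as np

def decision_model(decision_vec, history_info, W1, W2):
    """A one-hidden-layer feed-forward classifier: concatenate the target
    decision coding vector with the historical decision encoding information
    and return the index of the highest-scoring system action."""
    x = np.concatenate([decision_vec, history_info])
    h = np.maximum(W1 @ x, 0.0)            # ReLU hidden layer
    return int(np.argmax(W2 @ h))          # index of the chosen action

rng = np.random.default_rng(2)
d, hdim, n_actions = 6, 16, 3
W1 = rng.standard_normal((hdim, 2 * d))
W2 = rng.standard_normal((n_actions, hdim))
action = decision_model(rng.standard_normal(d), rng.standard_normal(d), W1, W2)
```

In practice the chosen action index would be mapped to a concrete system action such as "turn on the air conditioner to 29 degrees".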
  • the electronic device stores the decision coding vector, which assists the electronic device to make decisions on the target command issued by the subsequent user.
  • the historical decision coding vector may be a matrix of M columns, and each column represents the historical decision coding vector of a decision of a historical human-computer interaction task. Multiplying each column of the matrix by the decision coding vector yields the weight of the historical decision coding vector. The weight of the historical decision coding vector satisfies Equation (3):
  • q_m = w^T k_m, for m = 1, …, M    (3)
  • w^T represents the transposed decision coding vector of the target command
  • k_m represents the historical decision coding vector of the m-th historical decision
  • q_m represents the weight of the historical decision coding vector
  • the electronic device performs weighted encoding on the historical decision coding vector based on the second weighting information to obtain the weighted encoded historical decision coding vector, combines the weighted encoded historical decision coding vector with the decision coding vector, and encodes the result through a fully connected network to obtain the historical decision encoding information.
  • the weighted encoded historical decision encoding vector satisfies Equation (4):
  • k' = Σ_{m=1}^{M} q_m S_m k_m    (4)
  • k' represents the weighted encoded historical decision encoding vector
  • q_m represents the weight of the historical decision coding vector
  • k_m represents the historical decision coding vector of the m-th historical decision
  • S_m represents, depending on the weighting information used, the user weight, the user relationship relevance weight, or the product of the user weight and the user relationship relevance weight
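The decision-side weighting can be sketched as a dot-product weighting followed by a weighted sum over the historical decision vectors (the toy vectors are assumptions for illustration).

```python
import numpy as np

def weighted_decision_encoding(w, K, s=None):
    """q_m = w^T k_m, then k' = sum_m q_m * S_m * k_m (Equations (3)-(4) in spirit).
    w: (d,) decision coding vector of the target command.
    K: (d, M) historical decision coding vectors, one per column.
    s: optional (M,) user / user-relationship weights (the S_m term)."""
    q = w @ K                             # (M,) weights of the historical decisions
    if s is not None:
        q = q * s
    return K @ q                          # (d,) weighted-encoded history decisions

w = np.array([1.0, 0.0])                  # target decision coding vector
K = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # two historical decision vectors
k_prime = weighted_decision_encoding(w, K)     # → [1.0, 0.0]
```

The result would then be combined with the decision coding vector and encoded through a fully connected network to obtain the historical decision encoding information.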
  • the electronic device uses the decision model to analyze the decision coding vector, the historical decision coding information and the user portrait of the user to determine the target decision.
  • User portraits are also called user roles.
  • User portraits are virtual representatives of real users. As an effective tool for delineating users, linking user demands and design directions, user portraits have been widely used in various fields.
  • the electronic device outputs a target decision.
  • the electronic device can use the natural language generation (Natural Language Generation, NLG) technology to map the target decision into a natural language expression, that is, to generate the target decision text according to the target decision.
  • Natural language generation refers to converting machine-readable decisions into natural language text.
  • the electronic device can display the target decision text through the display screen, so that the user can obtain the system dialogue statement output by the electronic device.
  • the electronic device may also convert the target decision text into target decision voice, and play it to the user in the form of voice.
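The natural language generation step can be sketched with simple templates mapping a structured target decision to a sentence. The action names and templates here are hypothetical illustrations; a real system may use a trained generator instead.

```python
# Hypothetical template-based NLG: map a structured target decision to text.
def generate_text(decision):
    templates = {
        "set_ac_temperature": "Okay, turning on the air conditioner to {value} degrees.",
        "report_weather": "Today's weather: {value}.",
    }
    return templates[decision["action"]].format(value=decision["value"])

text = generate_text({"action": "set_ac_temperature", "value": 29})
# → "Okay, turning on the air conditioner to 29 degrees."
```

The generated text can then be shown on the display screen or converted to speech and played to the user.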
  • the semantics of the historical commands in historical human-computer interaction tasks are referred to, which assists the natural language understanding of the target command and makes the natural language understanding result closer to the real intention of the user; historical decisions are referred to when executing system decisions, and the target decision can be optimized according to the historical decisions, which effectively improves the user experience of human-computer interaction.
  • the electronic device includes corresponding hardware structures and/or software modules for performing each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.
  • FIG. 8 is a schematic structural diagram of a possible human-computer interaction apparatus provided by an embodiment of the present application.
  • These human-computer interaction apparatuses can be used to implement the functions of the electronic device in the above method embodiments, and thus can also achieve the beneficial effects of the above method embodiments.
  • the human-computer interaction apparatus may be an electronic device as shown in FIG. 1 , or may be a module (eg, a chip) applied to the electronic device.
  • the human-computer interaction apparatus 800 includes an acquisition unit 810 , a processing unit 820 and a feedback unit 830 .
  • the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 2 , FIG. 4 , FIG. 5 or FIG. 6 .
  • when the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 2: the obtaining unit 810 is used to perform S201; the processing unit 820 is used to perform S202; and the feedback unit 830 is used to perform S203.
  • when the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 4: the obtaining unit 810 is used to perform S201; the processing unit 820 is used to perform S2021 and S2022; and the feedback unit 830 is used to perform S203.
  • when the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 5: the obtaining unit 810 is used to execute S201; the processing unit 820 is used to execute S2021, S20221 and S20222; and the feedback unit 830 is used to execute S203.
  • the processing unit 820 is used to execute S601 to S609.
  • processing unit 820 and feedback unit 830 can be obtained directly by referring to the relevant descriptions in the method embodiments shown in FIG. 2 , FIG. 4 , FIG. 5 or FIG. 6 , and details are not repeated here.
  • the functions of the acquiring unit 810 , the processing unit 820 and the feedback unit 830 may be implemented by the processor 110 in FIG. 1 described above.
  • the human-computer interaction device 900 may include a speech recognition unit 910 , a language understanding unit 920 , a dialogue management unit 930 , a language generation unit 940 and a speech synthesis unit 950 .
  • the speech recognition unit 910 is used to realize the function of the acquisition unit 810 .
  • the voice recognition unit 910 is used to recognize the voice issued by the user to obtain the target command.
  • the language understanding unit 920 and the dialogue management unit 930 are used to implement the functions of the processing unit 820 to obtain target decisions.
  • the language understanding unit 920 is configured to use the intent understanding model to perform natural language understanding on the word vector of the target command and historical command coding information, and obtain the intent and slot of the target command.
  • the dialogue management unit 930 is configured to generate the target decision according to the intent and slot of the target command and the historical decision encoding vector of the historical command.
  • the language generation unit 940 and the speech synthesis unit 950 are used to realize the function of the feedback unit 830 .
  • the language generation unit 940 is used to convert the target decision into natural language.
  • the speech synthesis unit 950 is used to feed back the decision language to the user.
  • the processor in the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), and may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field Programmable Gate Array (Field Programmable Gate Array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor may be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage medium may reside in an ASIC.
  • the ASIC may reside in a network device or an electronic device.
  • the processor and storage medium may also exist as discrete components in a network device or an electronic device.
  • the above-mentioned embodiments it may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • software it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer programs or instructions.
  • the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable apparatus.
  • the computer program or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available media that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media.
  • the usable medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).
  • “at least one” means one or more, and “plurality” means two or more.
  • “And/or”, which describes the relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, it can indicate that A exists alone, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship; in the formulas of this application, the character "/" indicates that the associated objects are in a "division" relationship.

Abstract

The present application relates to the field of artificial intelligence, and provides a human-computer interaction method and device. The method comprises: receiving a target command sent by a user; generating a target decision of the target command by using a historical command, a historical decision of the historical command and the target command, the historical command being a command of a historical human-computer interaction task, and the target command being a command of a current human-computer interaction task; and outputting a target decision.

Description

Human-computer interaction method and device
This application claims priority to the Chinese patent application No. 202010886462.3, entitled "Human-computer interaction method and device", filed with the State Intellectual Property Office on August 28, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence (AI), and in particular, to a human-computer interaction method and device.
Background
With the development of artificial intelligence (AI), electronic devices can use human-computer interaction (HCI) techniques to communicate with users, so that the electronic devices understand the users' intentions and complete the work the users intend. At present, human-computer interaction has a wide range of applications in many fields, such as smart homes and automatic driving. However, the interaction of electronic devices with users is not very "natural" or "smart". The intention obtained by an electronic device through natural language understanding of a received voice command is often not very close to the real intention of the user. In addition, the decision-making of electronic devices is mechanical and rigid and cannot give users optimal decisions. The user experience of human-computer interaction is therefore low.
Summary of the Invention
The present application provides a human-computer interaction method and device. When natural language understanding is performed on a target command issued by a user in the current human-computer interaction task, the semantics of historical commands in historical human-computer interaction tasks are referred to, which assists the natural language understanding of the target command and makes the result of natural language understanding closer to the user's real intention; historical decisions are referred to when executing system decisions, and the target decision can be optimized according to the historical decisions, effectively improving the user experience of human-computer interaction.
To achieve the above objective, the present application adopts the following technical solutions:
In a first aspect, the present application provides a human-computer interaction method. The method can be applied to an electronic device, or to a human-computer interaction apparatus that can support an electronic device in implementing the method; for example, the human-computer interaction apparatus includes a chip system. The method includes: after the electronic device receives a target command issued by a user, generating a target decision of the target command by using a historical command, a historical decision of the historical command, and the target command, and outputting the target decision. The historical command is a command of a historical human-computer interaction task, and the target command is a command of the current human-computer interaction task. The historical command may be a command of multiple rounds of human-computer interaction tasks performed by one or more historical users with the electronic device. For example, the historical commands may be commands of multiple rounds of human-computer interaction tasks performed by multiple historical users with the electronic device. The historical users may also include the user who issued the target command.
In this way, when natural language understanding is performed on the target command issued by the user in the current human-computer interaction task, the semantics of the historical commands in historical human-computer interaction tasks are referred to, which assists the natural language understanding of the target command and makes the natural language understanding result closer to the real intention of the user; historical decisions are referred to when executing system decisions, and the target decision can be optimized according to the historical decisions, which effectively improves the user experience of human-computer interaction.
In a possible implementation, generating the target decision for the target command by using the historical commands, their historical decisions, and the target command includes: the electronic device performs weighted encoding on the historical commands based on command semantic correlation weights to obtain historical command encoding information, where a command semantic correlation weight indicates the degree of semantic correlation between the target command and a historical command; and the target decision is generated according to the target command, the historical command encoding information, and the historical decisions of the historical commands.
Before the weighted encoding of the historical commands based on the command semantic correlation weights, the electronic device semantically encodes the target command to obtain a semantic vector of the target command, and computes the similarity between the semantic vector of the target command and the semantic vectors of the historical commands to obtain the command semantic correlation weights.
In another possible implementation, performing weighted encoding on the historical commands based on the command semantic correlation weights to obtain the historical command encoding information includes: performing weighted encoding on the historical commands based on the command semantic correlation weights and user weights, where a user weight represents the degree of association between the user and the historical user who issued the historical command. The electronic device may determine the degree of association between the user and the historical users from the user's voiceprint to obtain the user weights.
In another possible implementation, performing weighted encoding on the historical commands based on the command semantic correlation weights and the user weights to obtain the historical command encoding information includes: performing weighted encoding on the historical commands based on the command semantic correlation weights, the user weights, and user relationship correlation weights, where a user relationship correlation weight is a preset relationship strength value between users.
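The weighted encoding described above can be sketched as an attention-style aggregation: similarity between the target command's semantic vector and each historical command's semantic vector yields the command semantic correlation weights, which may then be modulated by the user weights and the preset user-relationship strengths. The following is a minimal illustration; the cosine-similarity score, the softmax-style normalization, and the multiplicative combination of the three weight factors are assumptions of this sketch rather than details fixed by the application:

```python
import numpy as np

def encode_history(target_vec, history_vecs, user_weights=None, relation_weights=None):
    """Weighted encoding of historical command semantic vectors.

    target_vec:       semantic vector of the target command, shape (d,)
    history_vecs:     semantic vectors of historical commands, shape (n, d)
    user_weights:     per-command association between the current user and the
                      historical user who issued it, shape (n,) (optional)
    relation_weights: preset relationship-strength values, shape (n,) (optional)
    """
    # Command semantic correlation weights: cosine similarity between the
    # target command and each historical command (one assumed choice of score).
    sims = history_vecs @ target_vec / (
        np.linalg.norm(history_vecs, axis=1) * np.linalg.norm(target_vec) + 1e-9)
    weights = np.exp(sims - sims.max())
    # Optionally modulate by user weight and user-relationship weight.
    if user_weights is not None:
        weights = weights * user_weights
    if relation_weights is not None:
        weights = weights * relation_weights
    weights = weights / weights.sum()          # normalize to sum to 1
    # Historical command encoding information: weighted sum of the vectors.
    return weights @ history_vecs

target = np.array([1.0, 0.0, 0.0])
history = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
enc = encode_history(target, history, user_weights=np.array([1.0, 0.5]))
```

In this sketch a historical command that is both semantically closer to the target command and issued by a more strongly associated user contributes more to the historical command encoding information.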
In another possible implementation, generating the target decision according to the target command, the historical command encoding information, and the historical decisions of the historical commands includes: using an intent understanding model to perform natural language understanding on the word vectors of the target command together with the historical command encoding information to obtain the intent and slots of the target command; and generating the target decision according to the intent and slots of the target command and the historical decision encoding vectors of the historical commands.
Specifically, generating the target decision according to the intent and slots of the target command and the historical decision encoding vectors of the historical commands includes: encoding the intent and slots of the target command to obtain a decision encoding vector; performing weighted encoding on the historical decision encoding vectors based on historical decision encoding vector weights to obtain historical decision encoding information, where a historical decision encoding vector weight represents the degree of correlation between the decision encoding vector and a historical decision encoding vector; and analyzing the decision encoding vector and the historical decision encoding information with a decision model to generate the target decision. Because the historical decisions of the historical commands are consulted when making the system decision, the decision content can be optimized accordingly, which enriches the information available to the electronic device for decision-making and effectively improves the user experience of human-computer interaction.
Before the weighted encoding of the historical decision encoding vectors based on the historical decision encoding vector weights, the electronic device computes the similarity between the decision encoding vector and the historical decision encoding vectors to obtain the historical decision encoding vector weights.
In another possible implementation, performing weighted encoding on the historical decision encoding vectors based on the historical decision encoding vector weights to obtain the historical decision encoding information includes: performing the weighted encoding based on the historical decision encoding vector weights and the user weights; or performing the weighted encoding based on the historical decision encoding vector weights, the user weights, and the user relationship correlation weights.
In another possible implementation, encoding the intent and slots of the target command to obtain the decision encoding vector includes: encoding the intent and slots of the target command together with the occupancy state of the electronic device to obtain the decision encoding vector. The occupancy state of the electronic device is used to enhance the semantic vectors of the historical commands, further improving the accuracy of the system decision.
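The decision stage described in these implementations can be sketched in the same attention style: the intent, slots, and (optionally) the device occupancy state are encoded into a decision encoding vector, similarity against the stored historical decision encoding vectors yields the weights, and a decision model then consumes the current vector together with the weighted history. The concatenate-and-project encoder and the dot-product similarity below are illustrative assumptions, not details fixed by the application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed projection standing in for a learned decision encoder.
W = rng.normal(size=(6, 4))

def decision_vector(intent_vec, slot_vec, device_state):
    """Encode intent + slots (+ device occupancy state) into one vector."""
    return np.concatenate([intent_vec, slot_vec, device_state]) @ W

def history_decision_info(dec_vec, history_dec_vecs):
    """Weight historical decision encoding vectors by similarity to dec_vec."""
    scores = history_dec_vecs @ dec_vec        # dot-product similarity
    w = np.exp(scores - scores.max())          # softmax-style normalization
    w /= w.sum()
    return w @ history_dec_vecs

intent = np.array([1.0, 0.0])
slots = np.array([0.0, 1.0])
state = np.array([1.0, 0.0])                   # e.g. a "device busy" flag
dec = decision_vector(intent, slots, state)
hist = rng.normal(size=(3, 4))                 # stored historical decision vectors
info = history_decision_info(dec, hist)
# A decision model would then take np.concatenate([dec, info]) as its input.
```

Including the occupancy state in the encoded vector lets two identical commands produce different decisions depending on whether the device is already busy, which is the effect the implementation above describes.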
In a second aspect, the present application provides a human-computer interaction apparatus applied to an electronic device. The electronic device includes a voice transceiver, which is used to receive the target command issued by the user and to feed the decision back to the user by voice. The apparatus includes an acquisition unit, a processing unit, and a feedback unit. The acquisition unit is configured to receive the target command issued by the user; the processing unit is configured to generate the target decision for the target command by using the historical commands, their historical decisions, and the target command, where a historical command is a command of a historical human-computer interaction task and the target command is the command of the current human-computer interaction task; and the feedback unit is configured to output the target decision. In this way, when natural language understanding is performed on the target command, the semantics of the historical commands are taken into account, which assists the understanding and brings the result closer to the user's real intention; the historical decisions are consulted when making the system decision, so the target decision can be optimized according to them, effectively improving the user experience of human-computer interaction. These units can perform the corresponding functions in the method examples of the first aspect; for details, refer to the detailed descriptions in the method examples, which are not repeated here.
In a third aspect, the present application provides an electronic device including at least one processor, a memory, and a voice transceiver. The voice transceiver is used to receive the target command issued by the user and feed the decision back to the user by voice; the memory is used to store computer programs and instructions; and the processor is used to invoke the computer programs and instructions and, in cooperation with the voice transceiver, execute the human-computer interaction method according to the first aspect or any of its possible implementations.
In a fourth aspect, the present application provides a computer-readable storage medium comprising computer software instructions; when the computer software instructions run in an electronic device, they cause the electronic device to perform the method according to the first aspect or its possible implementations.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method according to the first aspect or its possible implementations.
In a sixth aspect, the present application provides a chip system applied to an electronic device. The chip system includes an interface circuit and a processor interconnected by a line; the interface circuit is used to receive a signal from a memory of the electronic device and send it to the processor, the signal including computer instructions stored in the memory; when the processor executes the computer instructions, the chip system performs the method according to the first aspect or its possible implementations.
It should be understood that the description of technical features, technical solutions, beneficial effects, or similar language in this application does not imply that all of these features and advantages can be realized in any single embodiment. Rather, a description of a feature or beneficial effect means that at least one embodiment includes that specific technical feature, technical solution, or beneficial effect. Therefore, descriptions of technical features, technical solutions, or beneficial effects in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions, and beneficial effects described in the embodiments may be combined in any suitable manner. Those skilled in the art will understand that an embodiment may be implemented without one or more of the specific technical features, technical solutions, or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in specific embodiments that do not embody all of the embodiments.
Description of Drawings
FIG. 1 is a schematic diagram of the composition of an electronic device according to an embodiment of the present application;
FIG. 2 is a flowchart of a human-computer interaction method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a speech recognition process according to an embodiment of the present application;
FIG. 4 is a flowchart of a human-computer interaction method according to an embodiment of the present application;
FIG. 5 is a flowchart of a human-computer interaction method according to an embodiment of the present application;
FIG. 6 is a flowchart of a human-computer interaction method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an intent understanding model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the composition of a human-computer interaction apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the composition of a human-computer interaction apparatus according to an embodiment of the present application.
Detailed Description
The terms "first", "second", and "third" in the specification and claims of the present application and in the above drawings are used to distinguish different objects, not to define a specific order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to serve as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present the related concept in a concrete manner.
"A plurality of" means two or more, and other quantifiers are similar. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, for an element appearing in the singular form "a", "an", or "the", unless the context clearly dictates otherwise, this does not mean "one or only one" but "one or more than one"; for example, "a device" means one or more such devices. Furthermore, "at least one of ..." means one or any combination of the subsequently associated objects; for example, "at least one of A, B, and C" includes A, B, C, AB, AC, BC, or ABC.
The implementation of the embodiments of the present application will be described in detail below with reference to the accompanying drawings.
The electronic device in this embodiment is a device including a display screen and a camera. The specific form of the electronic device is not particularly limited in the embodiments of the present application. For example, the electronic device may be a television, a tablet computer, a projector, a mobile phone, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) or virtual reality (VR) device, or an Internet of things (IoT) device such as a smart speaker or a smart TV.
Please refer to FIG. 1, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 1, the electronic device includes: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a power management module 140, an antenna, a wireless communication module 160, an audio module 170, a speaker 170A, a speaker interface 170B, a microphone 170C, a sensor module 180, buttons 190, an indicator 191, a display screen 192, a camera 193, and so on. The sensor module 180 may include sensors such as a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, and an ambient light sensor.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device. In other embodiments, the electronic device may include more or fewer components than shown, or some components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and so on. Different processing units may be independent devices or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device. The controller can generate an operation control signal according to an instruction operation code and a timing signal, and complete the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a USB interface, among others.
In this embodiment, when performing natural language understanding on the target command, the processor 110 combines the historical commands with the target command to obtain the intent and slots of the target command. A historical command is a command of a historical human-computer interaction task, and the target command is the command of the current human-computer interaction task. Optionally, the historical commands may be commands of multiple historical human-computer interaction tasks, which may be tasks in which multiple historical users conducted multiple rounds of dialogue with the electronic device. The processor 110 then determines the target decision by combining the decisions of the historical commands with the intent and slots of the target command.
The intent and the slots together constitute a "user action". A machine cannot directly understand natural language, so the role of the user action is to map natural language into a structured semantic representation that the machine can understand. Slots have the ability to memorize state across multiple rounds. Slots include slot positions; for example, in a taxi-hailing scenario, the slots include a departure-location slot and a destination slot.
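As an illustration of such a structured semantic representation, a taxi-hailing command and a follow-up turn might be mapped as below; the utterances, the intent name, and the slot names are hypothetical examples chosen for this sketch, not values used by the application:

```python
# Round 1: "Get me a taxi from the airport" -> intent + partially filled slots.
turn1 = {
    "intent": "book_taxi",
    "slots": {"departure": "airport", "destination": None},
}

# Round 2: "To the central station" -- slots memorize state across rounds,
# so the new turn only fills the slot that is still missing.
def merge_turn(state, new_slots):
    merged = dict(state["slots"])
    merged.update({k: v for k, v in new_slots.items() if v is not None})
    return {"intent": state["intent"], "slots": merged}

turn2 = merge_turn(turn1, {"destination": "central station"})
# turn2 == {"intent": "book_taxi",
#           "slots": {"departure": "airport", "destination": "central station"}}
```

The multi-round memory of the slots is what allows the second, elliptical utterance to be interpreted without repeating the departure location.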
The power management module 140 is used to connect to a power supply. The power management module 140 may also be connected to the processor 110, the internal memory 121, the display screen 192, the camera 193, the wireless communication module 160, and so on. The power management module 140 receives input from the power supply and supplies power to the processor 110, the internal memory 121, the display screen 192, the camera 193, the wireless communication module 160, and the like. In some embodiments, the power management module 140 may also be provided in the processor 110.
The wireless communication function of the electronic device may be implemented through the antenna, the wireless communication module 160, and the like. The wireless communication module 160 can provide solutions for wireless communication applied to the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna. In some embodiments, the antenna of the electronic device is coupled to the wireless communication module 160, so that the electronic device can communicate with networks and other devices through wireless communication technologies.
The electronic device implements the display function through the GPU, the display screen 192, the application processor, and so on. The GPU is a microprocessor for image processing and connects the display screen 192 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 192 is used to display images, videos, and the like. The display screen 192 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on.
The electronic device can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 192, the application processor, and so on. The ISP is used to process data fed back by the camera 193. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object is projected through the lens to generate an optical image on the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device may include 1 or N cameras 193, where N is a positive integer greater than 1.
Alternatively, the electronic device may not include a camera; that is, the camera 193 is not provided in the electronic device (for example, a television). The electronic device may connect to an external camera 193 through an interface (such as the USB interface 130). The external camera 193 may be fixed to the electronic device by an external fixing member (such as a camera bracket with a clip). For example, the external camera 193 may be fixed at an edge of the display screen 192 of the electronic device, such as the upper edge, by means of the external fixing member.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy. The video codec is used to compress or decompress digital video. The electronic device may support one or more video codecs. In this way, the electronic device can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes the various functional applications and data processing of the electronic device by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and an application required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data (such as audio data) created during use of the electronic device. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the microphone 170C, the speaker interface 170B, the application processor, and so on.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110. The speaker 170A, also referred to as a "loudspeaker", is used to convert an audio electrical signal into a sound signal. In this application, the speaker 170A is used to output the speech of a decision. The microphone 170C, also called a "mic" or "transducer", is used to convert a sound signal into an electrical signal. In this application, the microphone 170C is used to receive the speech of a target command or a historical command uttered by the user.
The speaker interface 170B is used to connect a wired speaker. The speaker interface 170B may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The keys 190 include a power key, volume keys, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device may receive key input and generate key signal input related to user settings and function control of the electronic device.
The indicator 191 may be an indicator light, which may be used to indicate that the electronic device is in a power-on state, a standby state, or a power-off state. For example, when the indicator light is off, the electronic device is in a power-off state; when the indicator light is green or blue, the electronic device is in a power-on state; and when the indicator light is red, the electronic device is in a standby state.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device. The electronic device may have more or fewer components than shown in FIG. 1, may combine two or more components, or may have a different configuration of components. For example, the electronic device may further include components such as speakers. The various components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software that includes one or more signal-processing or application-specific integrated circuits.
Next, with reference to FIG. 2, the human-computer interaction method provided by an embodiment of this application is described in detail.
S201. The electronic device receives a target command issued by a user.
The target command is natural-language text recognizable to the user. In some embodiments, the user may input the target command to the electronic device through an input device (for example, a virtual keyboard or a physical keyboard). In other embodiments, the user may utter speech to the electronic device; the electronic device performs speech recognition on the speech and converts it into the target command. Here, speech refers to the utterances of the user who is communicating by voice with the electronic device.
Optionally, if the user is in a noisy environment, the electronic device may receive mixed speech, which includes the user's speech and noise from the external environment. The electronic device can use the user's voiceprint features to separate the speech from the mixed speech. Illustratively, FIG. 3 is a schematic diagram of speech separation and recognition provided by an embodiment of this application. The mixed speech is analyzed by a short-time Fourier transform (STFT) to obtain a mixed speech spectrum; the mixed speech spectrum and the user voiceprint features pre-registered in the system are input into a pre-trained speech separation model, which separates the user's speech spectrum from the mixed speech spectrum. Automatic speech recognition is then performed on the separated speech spectrum to obtain the target command. The speech separation model is trained on pre-collected multi-user speech data and may be a multi-layer long short-term memory (LSTM) network.
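As an illustration only, the STFT-then-mask pipeline described above can be sketched as follows. The naive per-frame DFT and the threshold-based `toy_mask` are made-up stand-ins for a real windowed FFT and the trained LSTM separation model; `voiceprint_score` is a hypothetical scalar summarizing the voiceprint match.

```python
import math

def stft_mag(signal, frame_len=8, hop=4):
    """Magnitude spectrogram via a naive per-frame DFT (illustrating the
    STFT analysis step; real systems use windowed FFTs)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

def toy_mask(frame, voiceprint_score):
    """Stand-in for the trained LSTM separation model: keep time-frequency
    bins whose energy exceeds a voiceprint-scaled threshold."""
    peak = max(frame)
    thr = voiceprint_score * peak if peak > 0 else 0.0
    return [1.0 if b >= thr else 0.0 for b in frame]

def separate(mixed_spec, voiceprint_score, mask_model=toy_mask):
    """Apply a speaker-conditioned 0/1 mask to every bin of the mixed spectrum."""
    return [[m * b for m, b in zip(mask_model(frame, voiceprint_score), frame)]
            for frame in mixed_spec]
```

The separated spectrogram would then be handed to the automatic speech recognition stage; in a real system the mask is a continuous 0..1 value predicted per bin by the LSTM conditioned on the registered voiceprint embedding.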
S202. The electronic device generates a target decision for the target command by using historical commands, the historical decisions of those commands, and the target command.
FIG. 4 is a schematic flowchart of another human-computer interaction method provided by this embodiment, where the method flow shown in FIG. 4 elaborates the specific operations included in S202 in FIG. 2, as shown in the figure. S2021. The electronic device performs weighted encoding on the historical commands based on command semantic correlation weights to obtain historical command encoding information. S2022. The electronic device generates the target decision according to the target command, the historical command encoding information, and the historical decisions of the historical commands.
The command semantic correlation weight represents the degree of semantic correlation between the target command and a historical command, where "correlated" means related to each other. A historical command correlated with the target command may be one whose intent has some connection with the intent of the target command. For example, the target command is "the temperature is a bit cold" and the historical command is "it's so hot, set the air conditioner to 20 degrees". Both commands concern adjusting the air-conditioner temperature, but the target command does not state explicitly that the air-conditioner temperature should be adjusted, whereas the historical command sets the air conditioner to a specific temperature. Therefore, in this embodiment, when natural language understanding is performed on the target command issued by the user in the current human-computer interaction task, the semantics of historical commands from past human-computer interaction tasks are consulted to assist the natural language understanding of the target command, so that the result of natural language understanding matches the user's true intent more closely.
In one possible implementation, the electronic device semantically encodes the target command to obtain a semantic vector of the target command, and computes the similarity between the semantic vector of the target command and the semantic vectors of the historical commands to obtain the command semantic correlation weights.
Specifically, the electronic device first performs Chinese word segmentation on the target command to obtain the word vectors of the target command. Chinese word segmentation refers to splitting a continuous character sequence into individual words.
The electronic device inputs the word vectors of the target command into a semantic encoding model for encoding to obtain the semantic vector of the target command. The semantic encoding model may be a recurrent neural network (RNN); the most commonly used RNN model is the bidirectional long short-term memory (BiLSTM) network. The BiLSTM can be implemented as a network with 3 hidden layers of 600 nodes each. For example, for the target command "the temperature is a bit cold", Chinese word segmentation yields the word vectors for "temperature", "a bit", and "cold". These word vectors are input into the semantic encoding model, and through the model's inference the semantic vector of the target command "the temperature is a bit cold" is obtained.
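A minimal sketch of the recurrent encoding step, for illustration only: a single plain recurrent cell stands in for the 3-layer, 600-node BiLSTM described above, and its weights are seeded random placeholders rather than trained parameters.

```python
import math
import random

def rnn_encode(word_vectors, hidden_dim=4, seed=0):
    """Run a minimal recurrent cell over the command's word vectors; the
    final hidden state serves as the command's semantic vector.
    (Stand-in for the BiLSTM; weights are random placeholders.)"""
    rng = random.Random(seed)
    in_dim = len(word_vectors[0])
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    h = [0.0] * hidden_dim
    for x in word_vectors:
        h = [math.tanh(sum(w_in[i][j] * x[j] for j in range(in_dim)) +
                       sum(w_rec[i][j] * h[j] for j in range(hidden_dim)))
             for i in range(hidden_dim)]
    return h
```

With the word vectors for "temperature", "a bit", and "cold" as input, the returned hidden state plays the role of the command's semantic vector u in the formulas below; a bidirectional model would run a second pass right-to-left and concatenate the two final states.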
It should be noted that the electronic device stores the semantic vector of the target command, so that it can later serve as the semantic vector of a historical command and assist the electronic device in natural language understanding of subsequent commands.
The semantic vectors of the historical commands may form a matrix with M columns, where each column is the semantic vector of the command of one historical human-computer interaction task. Multiplying the semantic vector of the target command with each column of the matrix yields the command semantic correlation weights, which satisfy formula (1).
p_m = u^T · h_m    (1)

where u^T = [x_1, …, x_j] is the semantic vector of the target command, h_m is the semantic vector of a historical command, and p_m is the command semantic correlation weight.
Optionally, the electronic device may also perform weighted encoding on the historical commands based on both the command semantic correlation weights and user weights to obtain the historical command encoding information. The user weight represents the degree of association between the current user and the historical user who issued a historical command. In some embodiments, before the user interacts with the electronic device, the electronic device may prompt the user to provide a voiceprint, which the electronic device stores. The electronic device compares the user's voiceprint with the voiceprints of historical users to obtain the degree of association between the user and each historical user, that is, the user weight. A historical user is a user who has previously interacted with the electronic device. For example, the electronic device obtains the degree of similarity between the user and a historical user from the user's voiceprint, yielding the user weight; the degree of similarity may be the likelihood that the user and the historical user are the same person. Understandably, a larger user weight means the user is more likely to be that historical user, so a higher weight is set; a smaller user weight means the user is less likely to be that historical user, so a lower weight is set.
Optionally, the electronic device may also perform weighted encoding on the historical commands based on the command semantic correlation weights, the user weights, and user relationship correlation weights to obtain the historical command encoding information. The user relationship correlation weights are preset relationship strength values among multiple users. For example, if the electronic device is a smart home device, its users are usually fixed family members, and any family member can set the relationship strength values with the other members. Relationship strength values may include high, medium, low, and unrelated.
Specifically, the electronic device performs weighted encoding on the historical commands based on the weighting information to obtain the weighted semantic vector of the historical commands, merges this weighted vector with the semantic vector of the target command, and encodes the result through a fully connected network to obtain the historical command encoding information. The weighting information includes at least one of the command semantic correlation weights, the user weights, and the user relationship correlation weights. The weighted semantic vector of the historical commands satisfies formula (2).
h′ = ∑_m p_m · h_m · S_m    (2)
where h′ is the weighted semantic vector of the historical commands, p_m is the command semantic correlation weight, h_m is the semantic vector of a historical command, and S_m is the user weight. Alternatively, S_m is the user relationship correlation weight, or S_m combines the user weight and the user relationship correlation weight.
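Formulas (1) and (2) can be transcribed directly, for illustration only. Here the history matrix is stored row-wise (one historical semantic vector h_m per row rather than per column), and all vector values are made-up toy numbers.

```python
def semantic_weights(u, history):
    """Formula (1): p_m = u^T h_m for each historical semantic vector h_m."""
    return [sum(ui * hi for ui, hi in zip(u, h)) for h in history]

def weighted_history(history, p, s):
    """Formula (2): h' = sum_m p_m * h_m * S_m, where s[m] is the user
    and/or relationship weight S_m for historical task m."""
    dim = len(history[0])
    return [sum(p[m] * history[m][i] * s[m] for m in range(len(history)))
            for i in range(dim)]
```

The resulting h′ would then be merged with the target command's semantic vector and passed through the fully connected encoding network described above.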
Further, FIG. 5 is a schematic flowchart of another human-computer interaction method provided by this embodiment, where the method flow shown in FIG. 5 elaborates the specific operations included in S2022 in FIG. 4, as shown in the figure. S20221. The electronic device uses an intent understanding model to perform natural language understanding on the word vectors of the target command and the historical command encoding information to obtain the intent and slots of the target command. S20222. The electronic device generates the target decision according to the intent and slots of the target command and the historical decision encoding vectors of the historical commands.
Specifically, as shown in (a) of FIG. 6, the electronic device performs Chinese word segmentation on the target command to obtain its word vectors (S601). For example, for the target command "the temperature is a bit cold", the word vectors include those for "temperature", "a bit", and "cold". The electronic device encodes the word vectors of the target command with the semantic encoding model to obtain the semantic vector of the target command (S602). The electronic device computes the similarity between the semantic vector of the target command and the semantic vectors of the historical commands to obtain the command semantic correlation weights (S603). Suppose the historical command is "it's so hot, set the air conditioner to 20 degrees". The command semantic correlation weight then includes the degree of semantic correlation between the target command "the temperature is a bit cold" and this historical command. Note that the command semantic correlation weights cover the degree of semantic correlation between the target command and the commands of multiple historical human-computer interaction tasks. The semantic vectors of the historical commands are weighted and encoded based on first weighting information to obtain the historical command encoding information (S604). The first weighting information includes the command semantic correlation weights; optionally, it also includes the user weights and the user relationship correlation weights. The electronic device uses the intent understanding model to perform natural language understanding on the word vectors of the target command and the historical command encoding information to obtain the intent and slots of the target command (S605). For example, for the target command "the temperature is a bit cold", the intent may indicate adjusting the temperature, and the slots may be "temperature", "a bit", and "cold". The intent understanding model may be an RNN; the specific case below is implemented with a BERT model based on the TRANSFORMER architecture.
The intent understanding model can be trained in the manner of bidirectional encoder representations from transformers (BERT). BERT is a bidirectional transformer-based model proposed by Google. It can first be pre-trained on a large amount of unsupervised text corpus; the pre-training involves two techniques: one randomly masks some characters in the training sentences and predicts the masked characters, and the other trains the model to understand inter-sentence relations by predicting the next sentence given the current text. After BERT pre-training is complete, the intent understanding model contains a pre-trained deep structure for semantic analysis, and the BERT model is then fine-tuned on the target-related intent understanding task. Note that, to introduce the previously obtained historical semantic information into the model, the first input at the start of the sentence fed to the BERT network is the weighted semantic encoding vector; as shown in FIG. 7, the word vectors of the target command and the historical command encoding information are input into the intent understanding model to obtain the intent and slots of the target command.
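A minimal sketch of placing the weighted history encoding at the first input position, for illustration only: `build_input` shows the sequence layout, while `intent_scores` is a toy mean-pool-plus-linear head standing in for the actual BERT transformer; the intent matrix values are made up.

```python
def build_input(history_code, word_vectors):
    """Sequence fed to the intent model: the weighted history encoding
    occupies the first position (where BERT's sentence-start token would
    normally sit), followed by the command's word vectors."""
    return [list(history_code)] + [list(w) for w in word_vectors]

def intent_scores(sequence, intent_matrix):
    """Toy intent head standing in for the transformer: mean-pool the
    sequence, then score each intent class with a linear layer."""
    dim = len(sequence[0])
    pooled = [sum(tok[i] for tok in sequence) / len(sequence)
              for i in range(dim)]
    return [sum(row[i] * pooled[i] for i in range(dim))
            for row in intent_matrix]
```

Because the history encoding sits in the sequence like any other token, the (real) self-attention layers can relate every command word to it, which is how the historical semantics influence the predicted intent and slots.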
The decision model is a classification model whose inputs are the intent, the dialogue state, and system database information, and whose output is a specific decision. As shown in (b) of FIG. 6, the electronic device encodes the intent of the target command, the dialogue state, and the information obtained from the system database to obtain a decision encoding vector (S606); the encoding network can be implemented with a multi-layer convolutional neural network (CNN). The electronic device computes the similarity between the decision encoding vector and the historical decision encoding vectors to obtain the historical decision encoding vector weights (S607). A historical decision is a system action that the electronic device determined from a historical command. For example, for the historical command "what's the weather like today", the system action is to output "overcast, temperature"; for the historical command "it's so hot, set the air conditioner to 20 degrees", the system action is to set the air-conditioner temperature to 20 degrees. The historical decision encoding vector weight includes the degree of correlation between the decision encoding vector of the target command "the temperature is a bit cold" and the historical decision encoding vector of the historical decision "set the air conditioner to 20 degrees". Note that the historical decision encoding vector weights cover the degree of correlation between the decision encoding vector of the target command and the historical decision encoding vectors of the decisions of multiple historical human-computer interaction tasks. The electronic device performs weighted encoding on the historical decision encoding vectors based on second weighting information to obtain the historical decision encoding information (S608). The second weighting information includes the historical decision encoding vector weights; optionally, it also includes the user weights and the user relationship correlation weights. The historical decision encoding vector weight represents the degree of correlation between the decision encoding vector and a historical decision encoding vector. The electronic device uses the decision model to analyze the decision encoding vector and the historical decision encoding information to generate the target decision (S609). For example, for the target command "the temperature is a bit cold", the target decision may be "set the air conditioner to 29 degrees". The decision model can be implemented with a shallow classifier such as a support vector machine, or with a deep neural network (DNN) such as a multi-layer fully connected feedforward neural network (FNN).
It should be noted that the electronic device stores the decision encoding vector to assist the electronic device in making decisions on target commands subsequently issued by the user.
The historical decision encoding vectors may form a matrix with M columns, where each column is the historical decision encoding vector of the decision of one historical human-computer interaction task. Multiplying the decision encoding vector with each column of the matrix yields the historical decision encoding vector weights, which satisfy formula (3).
q_m = w^T · k_m    (3)

where w^T = [y_1, …, y_j] is the decision encoding vector, k_m is a historical decision encoding vector, and q_m is the historical decision encoding vector weight.
The electronic device performs weighted encoding on the historical decision encoding vectors based on the second weighting information to obtain the weighted historical decision encoding vector, merges the weighted vector with the decision encoding vector, and encodes the result through a fully connected network to obtain the historical decision encoding information. The weighted historical decision encoding vector satisfies formula (4).
k′ = ∑_m q_m · k_m · S_m    (4)
where k′ is the weighted historical decision encoding vector, k_m is a historical decision encoding vector, q_m is the historical decision encoding vector weight, and S_m is the user weight. Alternatively, S_m is the user relationship correlation weight, or S_m combines the user weight and the user relationship correlation weight.
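Formulas (3) and (4) mirror formulas (1) and (2) on the decision side and can be transcribed the same way, for illustration only; the decision vectors are stored row-wise and the numbers are toy values.

```python
def decision_weights(w, decisions):
    """Formula (3): q_m = w^T k_m for each historical decision vector k_m."""
    return [sum(wi * ki for wi, ki in zip(w, k)) for k in decisions]

def weighted_decisions(decisions, q, s):
    """Formula (4): k' = sum_m q_m * k_m * S_m, where s[m] is the user
    and/or relationship weight S_m for historical task m."""
    dim = len(decisions[0])
    return [sum(q[m] * decisions[m][i] * s[m] for m in range(len(decisions)))
            for i in range(dim)]
```

The resulting k′ would then be merged with the current decision encoding vector and passed through the fully connected network to produce the historical decision encoding information consumed by the decision classifier.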
Optionally, the electronic device uses the decision model to analyze the decision encoding vector, the historical decision encoding information, and the user's user profile to determine the target decision. A user profile, also called a persona, is a virtual representation of a real user; as an effective tool for characterizing users and linking user needs to design directions, user profiles have been widely applied in many fields.
S203. The electronic device outputs the target decision.
To communicate with the user, the electronic device can use natural language generation (NLG) technology to map the target decision into a natural-language expression, that is, to generate target decision text from the target decision. Natural language generation refers to converting a machine-readable decision into natural-language text. The electronic device can display the target decision text on the display screen so that the user obtains the system dialogue utterance output by the electronic device. Optionally, the electronic device can also convert the target decision text into target decision speech and play it to the user.
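A minimal template-based sketch of the decision-to-text mapping, for illustration only; the decision structure, action names, and templates below are hypothetical and not part of the described method, which does not prescribe a particular NLG technique.

```python
def generate_text(decision, templates):
    """Map a structured decision to a natural-language sentence by filling
    a per-action template (a minimal template-based NLG stand-in)."""
    return templates[decision["action"]].format(**decision.get("slots", {}))

# Hypothetical action-to-template table.
TEMPLATES = {
    "set_ac_temperature": "OK, setting the air conditioner to {degrees} degrees.",
    "report_weather": "Today it is {condition}, {degrees} degrees.",
}
```

The generated sentence can then be shown on the display screen or handed to the speech synthesis stage for voice playback.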
In this way, when natural language understanding is performed on the target command issued by the user in the current human-computer interaction task, the semantics of historical commands from past human-computer interaction tasks are consulted, assisting the natural language understanding of the target command so that its result matches the user's true intent more closely; and because historical decisions are consulted when executing the system decision, the target decision can be optimized according to those historical decisions, effectively improving the user experience of human-computer interaction.
It can be understood that, to implement the functions in the foregoing embodiments, the electronic device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
FIG. 8 is a schematic structural diagram of a possible human-computer interaction apparatus provided by an embodiment of this application. These human-computer interaction apparatuses can be used to implement the functions of the electronic device in the foregoing method embodiments and can therefore also achieve the beneficial effects of those embodiments. In the embodiments of this application, the human-computer interaction apparatus may be the electronic device shown in FIG. 1, or a module (for example, a chip) applied to the electronic device.
As shown in FIG. 8, the human-computer interaction apparatus 800 includes an acquisition unit 810, a processing unit 820, and a feedback unit 830. The human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiments shown in FIG. 2, FIG. 4, FIG. 5, or FIG. 6.
When the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 2: the acquisition unit 810 performs S201, the processing unit 820 performs S202, and the feedback unit 830 performs S203.
When the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 4: the acquisition unit 810 performs S201, the processing unit 820 performs S2021 and S2022, and the feedback unit 830 performs S203.
When the human-computer interaction apparatus 800 is used to implement the functions of the electronic device in the method embodiment shown in FIG. 5: the acquisition unit 810 performs S201; the processing unit 820 performs S2021, S20221, and S20222; and the feedback unit 830 performs S203.
When the apparatus 800 implements the functions of the electronic device in the method embodiment shown in FIG. 6: the processing unit 820 performs S601 to S609.
More detailed descriptions of the acquisition unit 810, the processing unit 820, and the feedback unit 830 can be obtained directly from the relevant descriptions in the method embodiments shown in FIG. 2, FIG. 4, FIG. 5, or FIG. 6, and are not repeated here. The functions of the acquisition unit 810, the processing unit 820, and the feedback unit 830 may be implemented by the processor 110 in FIG. 1 described above.
Optionally, as shown in FIG. 9, a human-computer interaction apparatus 900 may include a speech recognition unit 910, a language understanding unit 920, a dialogue management unit 930, a language generation unit 940, and a speech synthesis unit 950. The speech recognition unit 910 implements the function of the acquisition unit 810; for example, it recognizes the speech uttered by the user to obtain the target command. The language understanding unit 920 and the dialogue management unit 930 implement the functions of the processing unit 820 to obtain the target decision. For example, the language understanding unit 920 uses an intent understanding model to perform natural language understanding on the word vector of the target command and the historical command encoding information, obtaining the intent and slots of the target command; the dialogue management unit 930 then generates the target decision according to the intent and slots of the target command and the historical decision encoding vectors of the historical commands. The language generation unit 940 and the speech synthesis unit 950 implement the function of the feedback unit 830: the language generation unit 940 converts the target decision into natural language, and the speech synthesis unit 950 feeds the resulting speech back to the user.
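The five-unit pipeline of apparatus 900 (speech recognition, language understanding, dialogue management, language generation, speech synthesis) can be sketched as a chain of function calls. All function bodies below are placeholders for illustration only, not the models described in the application; the example command and reply are invented.

```python
# Hypothetical sketch of the pipeline for apparatus 900:
# ASR (910) -> NLU (920) -> dialogue management (930) -> NLG (940) -> TTS (950).

def speech_recognition(audio: bytes) -> str:
    """Unit 910: transcribe the user's speech into a target command."""
    return "turn on the living room lights"  # placeholder transcript

def language_understanding(command: str, history_encoding=None) -> dict:
    """Unit 920: derive intent and slots from the command plus any
    historical command encoding information."""
    return {"intent": "device_control", "slots": {"device": "lights"}}

def dialogue_management(intent_slots: dict, history_decisions=None) -> dict:
    """Unit 930: produce the target decision from intent/slots and
    historical decision encoding vectors."""
    return {"action": "switch_on", "target": intent_slots["slots"]["device"]}

def language_generation(decision: dict) -> str:
    """Unit 940: render the target decision as natural language."""
    return f"OK, switching on the {decision['target']}."

def speech_synthesis(text: str) -> bytes:
    """Unit 950: synthesize the reply (placeholder returns encoded text)."""
    return text.encode()

def handle_turn(audio: bytes, history_encoding=None, history_decisions=None) -> bytes:
    """One interaction turn through units 910-950."""
    command = speech_recognition(audio)
    intent_slots = language_understanding(command, history_encoding)
    decision = dialogue_management(intent_slots, history_decisions)
    return speech_synthesis(language_generation(decision))
```

In a real implementation each placeholder would be a trained model; the point is only the data flow from units 910 through 950.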
It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of the present application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a network device or an electronic device. Of course, the processor and the storage medium may also exist as discrete components in a network device or an electronic device.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are executed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)).
In the various embodiments of the present application, unless otherwise specified or logically conflicting, the terms and/or descriptions in different embodiments are consistent and may be referenced in one another, and the technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In the text of this application, the character "/" generally indicates an "or" relationship between the associated objects; in the formulas of this application, the character "/" indicates a "division" relationship.
It can be understood that the various numbers and designations in the embodiments of the present application are merely distinctions made for convenience of description and are not intended to limit the scope of the embodiments. The sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic.

Claims (13)

  1. A human-computer interaction method, comprising:
    receiving a target command issued by a user;
    generating a target decision for the target command by using a historical command, a historical decision of the historical command, and the target command, wherein the historical command is a command of a historical human-computer interaction task and the target command is a command of a current human-computer interaction task; and
    outputting the target decision.
  2. The method according to claim 1, wherein generating the target decision for the target command by using the historical command, the historical decision of the historical command, and the target command comprises:
    performing weighted encoding on the historical command based on a command semantic relevance weight to obtain historical command encoding information, wherein the command semantic relevance weight represents a degree of semantic relevance between the target command and the historical command; and
    generating the target decision according to the target command, the historical command encoding information, and the historical decision of the historical command.
  3. The method according to claim 2, wherein before the weighted encoding of the historical command based on the command semantic relevance weight, the method further comprises:
    performing semantic encoding on the target command to obtain a semantic vector of the target command; and
    performing a similarity calculation according to the semantic vector of the target command and a semantic vector of the historical command, to obtain the command semantic relevance weight.
  4. The method according to claim 2 or 3, wherein the weighted encoding of the historical command based on the command semantic relevance weight to obtain the historical command encoding information comprises:
    performing weighted encoding on the historical command based on the command semantic relevance weight and a user weight to obtain the historical command encoding information, wherein the user weight represents a degree of association between the user and a historical user who issued the historical command.
  5. The method according to claim 4, wherein before the weighted encoding of the historical command based on the command semantic relevance weight and the user weight, the method further comprises:
    obtaining the degree of association between the user and the historical user according to a voiceprint of the user, to obtain the user weight.
  6. The method according to claim 4 or 5, wherein the weighted encoding of the historical command based on the command semantic relevance weight and the user weight to obtain the historical command encoding information comprises:
    performing weighted encoding on the historical command based on the command semantic relevance weight, the user weight, and a user relationship relevance weight to obtain the historical command encoding information, wherein the user relationship relevance weight is a preset relationship strength value among a plurality of users.
  7. The method according to any one of claims 2-6, wherein generating the target decision according to the target command, the historical command encoding information, and the historical decision of the historical command comprises:
    performing natural language understanding on a word vector of the target command and the historical command encoding information by using an intent understanding model, to obtain an intent and slots of the target command; and
    generating the target decision according to the intent and slots of the target command and a historical decision encoding vector of the historical command.
  8. The method according to claim 7, wherein generating the target decision according to the intent and slots of the target command and the historical decision encoding vector of the historical command comprises:
    encoding the intent and slots of the target command to obtain a decision encoding vector;
    performing weighted encoding on the historical decision encoding vector based on a historical decision encoding vector weight to obtain historical decision encoding information, wherein the historical decision encoding vector weight represents a degree of correlation between the decision encoding vector and the historical decision encoding vector; and
    analyzing the decision encoding vector and the historical decision encoding information by using a decision model, to generate the target decision.
  9. The method according to claim 8, wherein before the weighted encoding of the historical decision encoding vector based on the historical decision encoding vector weight to obtain the historical decision encoding information, the method further comprises:
    performing a similarity calculation on the decision encoding vector and the historical decision encoding vector, to obtain the historical decision encoding vector weight.
  10. The method according to claim 8 or 9, wherein the weighted encoding of the historical decision encoding vector based on the historical decision encoding vector weight to obtain the historical decision encoding information comprises:
    performing weighted encoding on the historical decision encoding vector based on the historical decision encoding vector weight and a user weight to obtain the historical decision encoding information;
    or, performing weighted encoding on the historical decision encoding vector based on the historical decision encoding vector weight, the user weight, and a user relationship relevance weight to obtain the historical decision encoding information.
  11. A human-computer interaction apparatus, comprising:
    an acquisition unit, configured to receive a target command issued by a user;
    a processing unit, configured to generate a target decision for the target command by using a historical command, a historical decision of the historical command, and the target command, wherein the historical command is a command of a historical human-computer interaction task and the target command is a command of a current human-computer interaction task; and
    a feedback unit, configured to output the target decision.
  12. An electronic device, comprising: at least one processor, a memory, and a voice transceiver, wherein the voice transceiver is configured to receive speech of a target command or feed back speech of a target decision, the memory is configured to store a computer program and instructions, and the processor is configured to invoke the computer program and instructions to execute, in cooperation with the voice transceiver, the human-computer interaction method according to any one of claims 1-10.
  13. A computer-readable storage medium, wherein the storage medium stores a computer program or instructions which, when executed by a human-computer interaction apparatus, implement the method according to any one of claims 1-10.
PCT/CN2021/114853 2020-08-28 2021-08-26 Human-computer interaction method and device WO2022042664A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010886462.3 2020-08-28
CN202010886462.3A CN112183105A (en) 2020-08-28 2020-08-28 Man-machine interaction method and device

Publications (1)

Publication Number Publication Date
WO2022042664A1 true WO2022042664A1 (en) 2022-03-03

Family

ID=73924596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114853 WO2022042664A1 (en) 2020-08-28 2021-08-26 Human-computer interaction method and device

Country Status (2)

Country Link
CN (1) CN112183105A (en)
WO (1) WO2022042664A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183105A (en) * 2020-08-28 2021-01-05 华为技术有限公司 Man-machine interaction method and device
CN113345174B (en) * 2021-05-31 2023-04-18 中国工商银行股份有限公司 Interactive simulation method and device for teller cash recycling machine and terminal platform
CN117557674A (en) * 2024-01-11 2024-02-13 宁波特斯联信息科技有限公司 Picture processing method, device, equipment and storage medium based on man-machine interaction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300310A1 (en) * 2017-04-06 2018-10-18 AIBrain Corporation Adaptive, interactive, and cognitive reasoner of an autonomous robotic system
CN110413752A (en) * 2019-07-22 2019-11-05 中国科学院自动化研究所 More wheel speech understanding methods, system, device based on dialog logic
CN110704588A (en) * 2019-09-04 2020-01-17 平安科技(深圳)有限公司 Multi-round dialogue semantic analysis method and system based on long-term and short-term memory network
CN110781998A (en) * 2019-09-12 2020-02-11 腾讯科技(深圳)有限公司 Recommendation processing method and device based on artificial intelligence
CN110825857A (en) * 2019-09-24 2020-02-21 平安科技(深圳)有限公司 Multi-turn question and answer identification method and device, computer equipment and storage medium
CN112183105A (en) * 2020-08-28 2021-01-05 华为技术有限公司 Man-machine interaction method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975654A (en) * 2023-08-22 2023-10-31 腾讯科技(深圳)有限公司 Object interaction method, device, electronic equipment, storage medium and program product
CN116975654B (en) * 2023-08-22 2024-01-05 腾讯科技(深圳)有限公司 Object interaction method and device, electronic equipment and storage medium
CN117649107A (en) * 2024-01-29 2024-03-05 上海朋熙半导体有限公司 Automatic decision node creation method, device, system and readable medium

Also Published As

Publication number Publication date
CN112183105A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
WO2022042664A1 (en) Human-computer interaction method and device
Zhang et al. Hello edge: Keyword spotting on microcontrollers
US20240038218A1 (en) Speech model personalization via ambient context harvesting
US10956771B2 (en) Image recognition method, terminal, and storage medium
WO2020232860A1 (en) Speech synthesis method and apparatus, and computer readable storage medium
US20180144749A1 (en) Speech recognition apparatus and method
WO2019052293A1 (en) Machine translation method and apparatus, computer device and storage medium
US20180052831A1 (en) Language translation device and language translation method
US20220172737A1 (en) Speech signal processing method and speech separation method
US20240105159A1 (en) Speech processing method and related device
US20200312306A1 (en) System and Method for End-to-End Speech Recognition with Triggered Attention
US20200234713A1 (en) Method and device for speech recognition
JP7324838B2 (en) Encoding method and its device, apparatus and computer program
JP7224447B2 (en) Encoding method, apparatus, equipment and program
US11314951B2 (en) Electronic device for performing translation by sharing context of utterance and operation method therefor
CN112885328A (en) Text data processing method and device
US11532310B2 (en) System and method for recognizing user's speech
WO2023207541A1 (en) Speech processing method and related device
WO2023231676A9 (en) Instruction recognition method and device, training method, and computer readable storage medium
KR20210028041A (en) Electronic device and Method for controlling the electronic device thereof
WO2022147692A1 (en) Voice command recognition method, electronic device and non-transitory computer-readable storage medium
CN110874402A (en) Reply generation method, device and computer readable medium based on personalized information
WO2024046473A1 (en) Data processing method and apparatus
US20230386448A1 (en) Method of training speech recognition model, electronic device and storage medium
US20230154172A1 (en) Emotion recognition in multimedia videos using multi-modal fusion-based deep neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21860498

Country of ref document: EP

Kind code of ref document: A1