CN112182196A

CN112182196A - Service equipment applied to multi-turn conversation and multi-turn conversation method

Info

Publication number: CN112182196A
Application number: CN202011212620.3A
Authority: CN
Inventors: 蒋似尧; 杨善松
Original assignee: Hisense Visual Technology Co Ltd
Current assignee: Hisense Visual Technology Co Ltd
Priority date: 2020-11-03
Filing date: 2020-11-03
Publication date: 2021-01-05

Abstract

The application provides a service device applied to multi-turn conversations and a multi-turn conversation method, which address the problem in the related art that conversation quality in multi-turn conversations still needs improvement. Keywords are expanded on the basis of a knowledge graph, which adds retrieval conditions: the knowledge behind the conversation content is incorporated, the features available for retrieval are enlarged, the relevance of retrieval is strengthened, and the overall effect of the model is improved. The recall rate of the candidate corpus set is thereby improved. In addition, the relevance of information across multiple turns of the user's conversation is fully exploited: by taking the multi-turn conversation history into account, candidate corpora consistent with that history can be screened from the candidate corpus set for the reply, so that the final reply corpus is selected more reasonably.

Description

Service equipment applied to multi-turn conversation and multi-turn conversation method

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to a service device and a multi-turn dialogue method applied to multi-turn dialogue.

Background

With the development of natural language processing technology, its research results have been adopted in many scenarios, such as intelligent customer service and chatbots.

A typical kind of dialog is the task-based dialog, in which the system interacts with a user to perform a specific task in a certain domain (e.g., booking airline tickets or querying maps). In a task-based dialog for ticket booking, for example, the system guides the user through the ticket inquiry and booking operations by understanding the user's language.

In task-based conversations, if the user raises a topic unrelated to the task, the system may fail to respond appropriately, or may prompt that it does not understand the user's intent.

In contrast to task-based conversation is open-domain chat, i.e., chat that is not limited to a topic. Open-domain chat is often used so that the system can respond naturally when the user's query has no explicit intent. In existing man-machine conversation systems, open-domain chat mainly serves to bring the system closer to the user, establish trust, provide emotional companionship, improve user stickiness, and the like.

Open-domain chat is a popular research topic in natural language processing. Generally, open-domain chat shows good relevance when a single-sentence reply is considered in isolation, but when the conversation history is considered, problems such as inconsistent context and disordered logic often arise, making the chat difficult to continue. Replying on the basis of contextual information, with the preceding dialog history treated as a whole, is referred to as multi-turn dialog.

In a multi-turn conversation, there are roughly two ways to determine the reply content for a user query in the related art. One way is to use a search method, i.e. to select a corpus with the highest matching score from an existing corpus as a reply. Another way is a generative method that uses a sequence-to-sequence model, where the encoder reads the chat history and the decoder directly generates the next sentence reply.

The retrieval method extracts keywords from the query, and then selects the reply corpus from the known corpus by taking the keywords as retrieval conditions. In practice, however, the retrieval method still suffers from poor reply quality.

The generation method uses an end-to-end model for generation, which is difficult to control the generation content and to ensure the effect of generating the reply.

In view of the above, how to accurately understand the user's intent and give reasonable replies still needs to be improved in the multi-turn dialog.

Disclosure of Invention

The application aims to provide a service device applied to multi-turn conversations and a multi-turn conversation method, which are used for solving the problem in the related art that automatically replied conversation content is not accurate enough.

In a first aspect, the present application provides a service device for multi-turn dialog, the service device comprising an input interface and a processor, wherein the processor is configured to perform:

extracting keywords from the dialog content of the current dialog;

acquiring knowledge information with a specified association relation with the keyword from a pre-constructed knowledge graph as an expansion word;

retrieving a candidate corpus set from a corpus by using the keywords and the expansion words as retrieval conditions;

and using the multiple turns of the current conversation to screen, from the candidate corpus set, reply information for the current conversation.
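The four operations above form a coarse-to-fine pipeline. The following is a minimal sketch of that data flow only; the function name `reply_to_dialog` and the four callables passed to it are hypothetical placeholders introduced for illustration and are not defined in the application.

```python
from typing import Callable, List, Optional, Sequence


def reply_to_dialog(
    history: Sequence[str],
    extract_keywords: Callable[[str], List[str]],
    expand_keywords: Callable[[List[str]], List[str]],
    retrieve: Callable[[List[str], List[str]], List[str]],
    score: Callable[[Sequence[str], str], float],
) -> Optional[str]:
    """Chain the four steps; each stage is injected (hypothetical interface)."""
    current = history[-1]                        # dialog content of the current turn
    keywords = extract_keywords(current)         # step 1: keyword extraction
    expansions = expand_keywords(keywords)       # step 2: knowledge-graph expansion
    candidates = retrieve(keywords, expansions)  # step 3: coarse retrieval
    if not candidates:
        return None
    # Step 4: fine screening against the whole multi-turn history.
    return max(candidates, key=lambda c: score(history, c))
```

Keeping each stage behind a callable is only one possible design; it lets the retrieval stage and the matching model be replaced independently.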

In some embodiments, when performing the extracting of the keywords from the dialog content of the current dialog, the processor is configured to:

carrying out entity identification on the conversation content to obtain an identified entity as a first keyword;

performing syntactic analysis on the dialog content according to the first keyword to obtain a second keyword associated with the first keyword;

wherein the keywords comprise the first keyword and the second keyword.
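As an illustration of the two-stage keyword extraction, the sketch below uses deliberately simple stand-ins: a hypothetical entity lexicon (`ENTITIES`) replaces a real entity recognizer, and a fixed token window around each entity replaces real syntactic analysis. The lexicon, the stopword list, and the `window` parameter are all assumptions made for this example.

```python
import re
from typing import Dict, List, Set

# Hypothetical entity lexicon standing in for an entity recognition model.
ENTITIES: Set[str] = {"inception", "titanic"}
# Hypothetical stopword list used to discard function words.
STOPWORDS: Set[str] = {"who", "the", "a", "of", "is", "in", "what", "was", "did"}


def extract_keywords(text: str, window: int = 2) -> Dict[str, List[str]]:
    """Return first keywords (entities) and second keywords (nearby content words)."""
    tokens = re.findall(r"\w+", text.lower())
    first = [t for t in tokens if t in ENTITIES]
    second: List[str] = []
    for i, tok in enumerate(tokens):
        if tok not in ENTITIES:
            continue
        # Stand-in for syntactic analysis: inspect tokens within `window`
        # positions of the entity and keep the non-stopword ones.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for neighbor in tokens[lo:hi]:
            if (neighbor not in ENTITIES and neighbor not in STOPWORDS
                    and neighbor not in second):
                second.append(neighbor)
    return {"first": first, "second": second}
```

For the query "Who directed Inception?", this sketch yields "inception" as the first keyword and "directed" as the second keyword. A dependency parser would be the more faithful choice for the second stage.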

In some embodiments, when acquiring, from the pre-constructed knowledge graph, the knowledge information having the specified association relation with the keywords as the expansion words, the processor is configured to:

acquiring a target entity corresponding to the keyword from the knowledge graph;

obtaining attributes of the target entity from the knowledge graph;

associating the target entity and its attributes into a dialog knowledge graph of a historical dialog;

and extracting keywords as the expansion words in a random walk mode in the dialogue knowledge graph.

acquiring knowledge information having a first-level association relation with the keywords from the knowledge graph to serve as the expansion words; or,

acquiring the specified attribute associated with the keyword from the knowledge graph as the expansion word.
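The expansion strategies above can be sketched on a toy graph. The adjacency-list representation, the sample graph `DIALOG_GRAPH`, the function names, and the `steps` and `seed` parameters are assumptions made for illustration; the application does not specify a data structure or a walk length.

```python
import random
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]

# Hypothetical dialog knowledge graph: each node maps to its associated nodes.
DIALOG_GRAPH: Graph = {
    "Inception": ["Christopher Nolan", "science fiction"],
    "Christopher Nolan": ["Inception", "Interstellar"],
    "science fiction": ["Inception"],
    "Interstellar": ["Christopher Nolan"],
}


def random_walk_expansion(graph: Graph, start: str, steps: int = 3,
                          seed: Optional[int] = None) -> List[str]:
    """Walk `steps` hops from `start`, collecting distinct visited nodes."""
    rng = random.Random(seed)  # seeded so that a walk can be reproduced
    visited: List[str] = []
    node = start
    for _ in range(steps):
        neighbors = graph.get(node, [])
        if not neighbors:
            break  # dead end or unknown node: stop the walk
        node = rng.choice(neighbors)
        if node != start and node not in visited:
            visited.append(node)
    return visited


def first_level_expansion(graph: Graph, keyword: str) -> List[str]:
    """Alternative strategy: take only the directly associated nodes."""
    return list(graph.get(keyword, []))
```

The random walk can reach knowledge several hops away from the keyword, whereas the first-level variant stays with directly associated knowledge and is deterministic.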

In some embodiments, when retrieving the candidate corpus set from the corpus using the keywords and the expansion words as retrieval conditions, the processor is configured to:

and taking the keyword as a retrieval necessary condition and the expansion word as a retrieval supplement condition, and retrieving a candidate corpus set from the corpus.
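One way to read "necessary" and "supplementary" conditions is as a filter followed by a ranking: a sentence must contain every keyword to be considered at all, and each expansion word it contains raises its rank. The sketch below assumes simple case-insensitive substring matching and a hypothetical `top_k` cutoff; a production system would more likely use a search index.

```python
from typing import List, Sequence, Tuple


def retrieve_candidates(corpus: Sequence[str], keywords: Sequence[str],
                        expansions: Sequence[str], top_k: int = 3) -> List[str]:
    """Keep sentences containing all keywords; rank them by expansion-word hits."""
    scored: List[Tuple[int, str]] = []
    for sentence in corpus:
        text = sentence.lower()
        # Necessary condition: every keyword must appear in the sentence.
        if not all(k.lower() in text for k in keywords):
            continue
        # Supplementary condition: expansion words only add to the score.
        bonus = sum(1 for e in expansions if e.lower() in text)
        scored.append((bonus, sentence))
    # The sort is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]
```

Because expansion words never exclude a sentence, they can only reorder or enrich the candidate set, which is consistent with the stated goal of improving recall.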

In some embodiments, when using the multiple turns of the current conversation to screen reply information for the current conversation from the candidate corpus set, the processor is configured to:

analyzing and processing the multiple rounds of conversations of the current conversation by adopting a deep learning matching model to obtain the matching degree of each corpus in the candidate corpus set and the current conversation;

and selecting the corpus with the highest matching degree as the reply information of the current conversation.
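The selection step can be sketched independently of the model that produces the matching degree. In the example below, a Jaccard token overlap between the dialog history and each candidate is used purely as a placeholder for the deep learning matching model; `match_score` and `select_reply` are hypothetical names.

```python
import re
from typing import Optional, Sequence, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def match_score(history: Sequence[str], candidate: str) -> float:
    """Placeholder matching degree: Jaccard overlap of history and candidate tokens."""
    context: Set[str] = set().union(*(_tokens(turn) for turn in history))
    cand = _tokens(candidate)
    union = context | cand
    return len(context & cand) / len(union) if union else 0.0


def select_reply(history: Sequence[str], candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate with the highest matching degree, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: match_score(history, c))
```

A learned matching model would replace `match_score` while `select_reply` stays unchanged, since the selection only needs a score per candidate.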

In a second aspect, the present application also provides a multi-turn dialog method, the method comprising:

extracting keywords from the dialog content of the current dialog;

In some embodiments, the extracting keywords from the dialog content of the current dialog includes:

wherein the keywords include the first keyword and the second keyword as the keywords, respectively.

In some embodiments, the obtaining the knowledge information having the specified association relationship with the keyword from a pre-constructed knowledge graph as an augmented word includes:

obtaining attributes of the target entity from the knowledge graph;

In some embodiments, acquiring the knowledge information having the specified association relationship with the keyword from a pre-constructed knowledge graph as an augmented word includes:

In some embodiments, the retrieving the corpus candidate set from the corpus using the keyword and the expansion word as retrieval conditions includes:

In some embodiments, the filtering out, from the corpus candidate, reply information for the current dialog using multiple turns of the current dialog includes:

In the embodiment of the application, the keywords are expanded based on the knowledge graph, the retrieval conditions can be added, namely, the knowledge behind the conversation content can be combined, the available features during retrieval are expanded, the relevance degree of the retrieval is enhanced, and the overall effect of the model is improved. Therefore, the effect of improving the recall rate of the candidate corpus set can be achieved. According to the method and the device, the relevance between the multi-round user conversation and the current conversation is fully utilized, the multi-round conversation history is further combined, the candidate corpora which accord with the conversation history can be screened from the candidate corpus set for response, and the screening of the final reply corpora is more reasonable.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments or related technologies of the present application, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating an operational scenario between a display device and a control device according to some embodiments;

FIG. 2 is a block diagram illustrating a hardware configuration of a display device 200 according to some embodiments;

FIG. 3 is a block diagram illustrating a hardware configuration of the control device 1001 according to some embodiments;

FIG. 4 is a schematic diagram illustrating a software configuration in a display device 200 according to some embodiments;

FIG. 5 is a schematic diagram illustrating an icon control interface display of an application in the display device 200 according to some embodiments;

FIG. 6 is a schematic diagram illustrating a network architecture provided by an embodiment of the present application;

FIG. 7a is a schematic diagram of another network architecture according to an embodiment of the present application;

FIG. 7b is a flowchart illustrating a multi-turn dialog method provided by an embodiment of the present application;

FIG. 8 and FIG. 10 are schematic diagrams illustrating an association relationship between knowledge information of a knowledge graph;

FIG. 9 is a flow chart illustrating one method of extracting augmented words;

FIG. 11 is a schematic diagram illustrating an exemplary structure of a deep learning matching model.

Detailed Description

To make the objects, embodiments and advantages of the present application clearer, the following description of exemplary embodiments of the present application will clearly and completely describe the exemplary embodiments of the present application with reference to the accompanying drawings in the exemplary embodiments of the present application, and it is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive effort are intended to fall within the scope of the claims appended hereto. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also independently constitute a complete embodiment.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first", "second", "third", and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar objects or entities and, unless otherwise indicated, are not necessarily meant to define a particular order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.

Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.

The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

The term "remote control" as used in this application refers to a component of an electronic device, such as the display device disclosed in this application, that can typically control the electronic device wirelessly over a short distance. The component typically connects to the electronic device using infrared and/or radio frequency (RF) signals and/or Bluetooth, and may also include functional modules such as WiFi, wireless USB, and motion sensors. For example, a hand-held touch remote controller replaces most of the physical built-in hard keys of a common remote control device with a user interface on a touch screen.

The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, or result.

In the related art, multi-turn conversations are handled in two ways:

1. The retrieval method extracts the keywords contained in the query, and then selects the reply corpus from the known corpus by taking the keywords as retrieval conditions. This method still suffers from poor reply quality.

2. The generation method adopts an end-to-end model for generation, so that the generated content is difficult to control and the quality of the generated reply is difficult to guarantee.

Thus, accurately understanding the user's intent and giving reasonable replies in multi-turn dialog still needs improvement. The inventors found that the retrieval method needs to pay attention to recall, that is, to screening reasonable corpora out of a huge corpus in order to respond better. In view of this, the present application provides a service device applied to multi-turn dialog and a multi-turn dialog method. The inventive concept of the application is as follows: after keywords are extracted from the user's query, they can be expanded according to the knowledge graph, adding further keywords so that reply corpora better matching the intent of the user's query can be screened. In the embodiments of the application, the corpus is first coarsely screened on the basis of the keyword expansion result to obtain a candidate corpus set; a deep learning matching model then analyzes the dialog history of the multi-turn dialog, and a suitable corpus is further screened from the candidate corpus set for the reply.

Therefore, based on the knowledge graph to expand the keywords, the retrieval conditions can be added, namely, the knowledge behind the conversation content can be combined, and the available features during retrieval are expanded, so that the relevance degree of the retrieval is enhanced, and the overall effect of the model is improved. Therefore, the effect of improving the recall rate of the candidate corpus set can be achieved. In the embodiment of the application, the relevance between the conversation history in the multi-round user conversation and the current conversation content is fully utilized, and the multi-round conversation history is further combined, so that the candidate corpora which accord with the conversation history can be screened from the candidate corpus set for response, and the screening of the final responded corpora is more reasonable.

The service device and method applied to multi-turn conversations provided by the present application are further described below with reference to the accompanying drawings.

In the application, the user query can be voice information or text information. The voice information or the text information can be input to the display device through the control device to be displayed, and the display device can display the reply content of the query of the user.

In this embodiment of the present application, the service device that executes the multi-turn conversation method in this embodiment of the present application may be a server or an intelligent device. The intelligent device is, for example, a smart television, a smart phone, etc. Fig. 1 is a schematic diagram illustrating an operation scenario between an intelligent device and a control device according to an embodiment. As shown in fig. 1, a user may operate the smart device 200 through a mobile terminal 1002 and a control device 1001.

In some embodiments, the control device 1001 may be a remote controller, and the communication between the remote controller and the smart device includes an infrared protocol communication or a bluetooth protocol communication, and other short-range communication methods, etc. to control the smart device 200 wirelessly or in other wired manners. The user may input user instructions via keys on a remote control, voice input, control panel input, etc. to control the smart device 200. Such as: the user can input a corresponding control instruction through a volume up/down key, a channel control key, up/down/left/right moving keys, a voice input key, a menu key, a power on/off key, etc. on the remote controller, to realize the function of controlling the smart device 200. In this embodiment, the user may also select a corresponding text input to the smart device 200 by pressing a key.

In some embodiments, mobile terminals, tablets, computers, laptops, and other smart terminals may also be used to control the smart device 200. For example, the smart device 200 is controlled using an application running on the smart terminal. The application program can be associated with the intelligent terminal through configuration, and various controls are provided for a user in an intuitive User Interface (UI).

In some embodiments, the mobile terminal 1002 may install a software application with the smart device 200, and implement connection communication through a network communication protocol, so as to achieve the purpose of one-to-one control operation and data communication. Such as: the remote control keyboard can be synchronized to the mobile terminal 1002 by establishing a control instruction protocol between the mobile terminal 1002 and the intelligent device 200, and the function of controlling the intelligent device 200 can be realized by controlling the user interface on the mobile terminal 1002. The audio and video content displayed on the mobile terminal 1002 can also be transmitted to the intelligent device 200, so that the synchronous display function is realized.

As also shown in fig. 1, the smart device 200 is also in data communication with the server 400 through a variety of communication means. The smart device 200 may be allowed to communicatively connect through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various content and interactions to the smart device 200. Illustratively, the smart device 200 receives software program updates, or accesses a remotely stored digital media library, by sending and receiving information, as well as Electronic Program Guide (EPG) interactions. The server 400 may be a cluster or a plurality of clusters, and may include one or more types of servers. Other web service contents such as video on demand and advertisement services are provided through the server 400.

The smart device 200 may be a liquid crystal display, an OLED display, or a projection smart device. The specific smart device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the smart device 200 may be modified in performance and configuration as desired.

In addition to the broadcast-receiving television function, the smart device 200 may additionally provide smart network television functions with computer support, including but not limited to network television, smart television, Internet Protocol television (IPTV), and the like.

A block diagram of a hardware configuration of a smart device 200 according to an exemplary embodiment is illustrated in fig. 2.

In some embodiments, the smart device 200 includes at least one of the controller 250, the tuner demodulator 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240.

In some embodiments, a display 275 receives image signals originating from the first processor output and displays video content and images and components of the menu manipulation interface.

In some embodiments, the display 275, includes a display screen assembly for presenting a picture, and a driving assembly that drives the display of an image.

In some embodiments, the video content is displayed from broadcast television content, or alternatively, from various broadcast signals that may be received via wired or wireless communication protocols. Alternatively, various image contents received from the network communication protocol and sent from the network server side can be displayed.

In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the smart device 200 and used to control the smart device 200.

In some embodiments, a driver assembly for driving the display is also included, depending on the type of display 275.

In some embodiments, display 275 is a projection display and may also include a projection device and a projection screen.

In some embodiments, communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi chip, a bluetooth communication protocol chip, a wired ethernet communication protocol chip, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.

In some embodiments, the smart device 200 may establish control signal and data signal transmission and reception with the external control device 1001 or the content providing device through the communicator 220.

In some embodiments, user interface 265 may be configured to receive infrared control signals from a control device 1001 (e.g., an infrared remote control, etc.).

In some embodiments, the detector 230 is a component used by the smart device 200 to collect signals from the external environment or from interaction with the outside.

In some embodiments, the detector 230 includes a light receiver, i.e., a sensor for collecting the intensity of ambient light, so that display parameters can be adaptively changed according to the collected ambient light.

In some embodiments, an image collector 232 in the detector 230, such as a camera, a video camera, etc., may be used to collect external environment scenes, collect attributes of a user or gestures interacted with the user, adaptively change display parameters, and also recognize user gestures, so as to implement a function of interaction with the user.

In some embodiments, the detector 230 may also include a temperature sensor or the like, for example for sensing ambient temperature.

In some embodiments, the smart device 200 may adaptively adjust the display color temperature of the image. For example, when the ambient temperature is higher, the smart device 200 may be adjusted to display the image with a cooler color temperature, and when the ambient temperature is lower, the smart device 200 may be adjusted to display the image with a warmer color temperature.

In some embodiments, the detector 230 may further include a sound collector 231, such as a microphone, for collecting voice data, wherein when the user speaks an instruction by voice, the microphone can collect voice data including the instruction spoken by the user. For example, the sound collector 231 may collect a voice signal including a control instruction of the user to control the smart device 200, or collect an environmental sound for identifying an environmental scene type, so that the smart device 200 may adaptively adapt to the environmental noise.

In some embodiments, as shown in fig. 2, the input/output interface 255 is configured to allow data transfer between the controller 250 and external other devices or other controllers 250. Such as receiving video signal data and audio signal data of an external device, or command instruction data, etc.

In some embodiments, the external device interface 240 may include, but is not limited to, the following: the interface can be any one or more of a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface, a composite video input interface, a USB input interface, an RGB port and the like. The plurality of interfaces may form a composite input/output interface.

In some embodiments, as shown in fig. 2, the tuner demodulator 210 is configured to receive broadcast television signals in a wired or wireless manner, perform processing such as amplification, mixing, and resonance, and demodulate audio and video signals from among a plurality of wireless or wired broadcast television signals, where the audio and video signals may include the television audio and video signals carried on the television channel frequency selected by the user, as well as EPG data signals.

In some embodiments, the frequency demodulated by the tuner demodulator 210 is controlled by the controller 250: the controller 250 can send control signals according to the user's selection, so that the tuner demodulator 210 responds to the television signal frequency selected by the user and demodulates the television signal carried on that frequency.

In some embodiments, broadcast television signals may be classified into terrestrial broadcast signals, cable broadcast signals, satellite broadcast signals, internet broadcast signals, and the like according to the broadcasting system of the television signal; into digitally modulated signals, analog modulated signals, and the like according to the modulation type; or into digital signals, analog signals, and the like according to the signal type.

In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices, that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box. In that case, the set-top box demodulates the received broadcast television signals and outputs the resulting television audio and video signals to the main device, which receives them through the first input/output interface.

In some embodiments, controller 250 controls the operation of the smart device and responds to user actions through various software control programs stored in memory. The controller 250 may control the overall operation of the smart device 200. For example: in response to receiving a user command for selecting a UI object displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.

In some embodiments, the object may be any selectable object, such as a hyperlink or an icon. Operations related to the selected object include, for example, displaying the page, document, or image linked to by a hyperlink, or running the program corresponding to an icon. The user command for selecting the UI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the smart device 200 or a voice command corresponding to a voice spoken by the user.

As shown in fig. 2, the controller 250 includes at least one of a Random Access Memory 251 (RAM), a Read-Only Memory 252 (ROM), a video processor 270, an audio processor 280, other processors 253 (e.g., a Graphics Processing Unit (GPU)), a Central Processing Unit 254 (CPU), a communication interface, and a communication bus 256 that connects the respective components.

In some embodiments, RAM 251 is used to store temporary data for the operating system or other programs that are running.

In some embodiments, ROM252 is used to store instructions for various system boots.

In some embodiments, the ROM252 is used to store a Basic Input Output System (BIOS). The system is used for completing power-on self-test of the system, initialization of each functional module in the system, a driver of basic input/output of the system and booting an operating system.

In some embodiments, when the power-on signal is received, the smart device 200 starts to power up, the CPU executes the system boot instruction in the ROM252, and copies the temporary data of the operating system stored in the memory into the RAM 251 so as to start or run the operating system. After the start of the operating system is completed, the CPU copies the temporary data of the various application programs in the memory to the RAM 251, and then, the various application programs are started or run.

In some embodiments, CPU processor 254 is used to execute operating system and application program instructions stored in memory. And executing various application programs, data and contents according to various interactive instructions received from the outside so as to finally display and play various audio and video contents.

In some example embodiments, the CPU 254 may comprise a plurality of processors, which may include a main processor and one or more sub-processors. The main processor performs some operations of the smart device 200 in a pre-power-up mode and/or operations of displaying a screen in a normal mode. The one or more sub-processors perform operations in a standby mode or the like.

In some embodiments, the graphics processor 253 is used to generate various graphics objects, such as icons, operation menus, and graphics displayed for user input instructions. The graphics processor includes an arithmetic unit, which performs operations on the various interactive instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects obtained by the arithmetic unit for display on the display.

In some embodiments, the video processor 270 is configured to receive an external video signal, and perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, image synthesis, and the like according to a standard codec protocol of the input signal, so as to obtain a signal that can be directly displayed or played on the smart device 200.

In some embodiments, video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.

The demultiplexing module is used for demultiplexing the input audio and video data stream. For example, if an MPEG-2 stream is input, the demultiplexing module demultiplexes it into a video signal and an audio signal.

The video decoding module is used for processing the demultiplexed video signal, including decoding, scaling, and the like.

The image synthesis module is used for superimposing and mixing the GUI signal, which the graphics generator produces in response to user input or generates by itself, with the scaled video image, so as to generate an image signal for display.

The frame rate conversion module is configured to convert the frame rate of the input video, for example converting a 60Hz frame rate into a 120Hz or 240Hz frame rate, which is commonly implemented by frame interpolation.
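As an illustration only, frame interpolation can be sketched as inserting one averaged frame between each pair of adjacent frames. The function name `double_frame_rate` and the flat-list frame representation are assumptions made for this sketch; the application does not specify the interpolation algorithm, and practical frame rate conversion modules often use motion-compensated interpolation instead of a plain average.

```python
def double_frame_rate(frames):
    """Double the frame count (e.g. 60Hz -> 120Hz) by linear interpolation.

    Each frame is a flat list of pixel intensities (an assumption of this sketch).
    """
    if not frames:
        return []
    out = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)
        # The inserted frame is the per-pixel average of its two neighbours.
        out.append([(a + b) / 2 for a, b in zip(current, following)])
    out.append(frames[-1])
    # Repeat the last frame so that N input frames yield exactly 2N output frames.
    out.append(frames[-1])
    return out


frames_60hz = [[0, 0], [10, 20], [20, 40]]
frames_120hz = double_frame_rate(frames_60hz)
```

In this sketch the output holds twice as many frames as the input, so playing it back at twice the original rate preserves the duration of the video.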

The display formatting module is used for converting the video output signal received after frame rate conversion into a signal that conforms to the display format, for example outputting an RGB data signal.

In some embodiments, the graphics processor 253 and the video processor 270 may be integrated or separately configured. When integrated, they jointly process the graphics signals output to the display; when separately configured, they perform different functions, for example in a GPU + FRC (Frame Rate Conversion) architecture.

In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain an audio signal that can be played in a speaker.

In some embodiments, video processor 270 may comprise one or more chips. The audio processor may also comprise one or more chips.

In some embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.

In some embodiments, the audio output receives, under the control of the controller 250, the sound signals output by the audio processor 280. The audio output includes, for example, the speaker 286 carried by the smart device 200 itself and an external sound output terminal that can output to a sound-generating device of an external device, such as an external sound interface or an earphone interface. It may also include a near field communication module in the communication interface, for example a Bluetooth module used for outputting sound through a Bluetooth loudspeaker.

The power supply 290 provides power supply support for the smart device 200 with power input from an external power source under the control of the controller 250. The power supply 290 may include a built-in power circuit installed inside the smart device 200, or may be a power interface installed outside the smart device 200 to provide an external power source in the smart device 200.

The user interface 265 is used for receiving an input signal from a user and then transmitting the received user input signal to the controller 250. The user input signal may be a remote controller signal received through an infrared receiver, or any of various user control signals received through the network communication module.

In some embodiments, the user inputs a user command through the control device 1001 or the mobile terminal 1002, the user input interface receives the user input, and the smart device 200 responds to the user input through the controller 250.

In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.

In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.

The memory 260 includes various software modules for driving the smart device 200. For example, the software modules stored in the first memory include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.

The basic module is a bottom layer software module for signal communication between each hardware in the smart device 200 and sending processing and control signals to the upper layer module. The detection module is used for collecting various information from various sensors or user input interfaces, and the management module is used for performing digital-to-analog conversion and analysis management.

For example, the voice recognition module comprises a voice analysis module and a voice instruction database module. The display control module is used for controlling the display to display image content, and can be used for playing multimedia image content, UI interfaces, and other information. The communication module is used for control and data communication with external equipment. The browser module is used for data communication with browsing servers. The service module is used for providing various services and includes various application programs. Meanwhile, the memory 260 may also store received external data and user data, images of various items in various user interfaces, visual effect maps of the focus object, and the like.

Fig. 3 exemplarily shows a block diagram of a configuration of the control device 1001 according to an exemplary embodiment. As shown in fig. 3, the control device 1001 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory 190, and a power supply 180.

The control device 1001 is configured to control the smart device 200. It may receive an input operation instruction from a user and convert the operation instruction into an instruction that the smart device 200 can recognize and respond to, serving as an interaction intermediary between the user and the smart device 200. For example, when the user operates the channel up/down key on the control device 1001, the smart device 200 responds to the channel up/down operation.

In some embodiments, the control device 1001 may be a smart terminal. For example, the control device 1001 may install various applications that control the smart device 200 according to user needs.

In some embodiments, as shown in fig. 1, a mobile terminal 1002 or other intelligent electronic device may function similarly to the control device 1001 after an application that operates the smart device 200 is installed. For example, by installing such an application, the user may implement the functions of the physical keys of the control device 1001 through various function keys or virtual buttons of a graphical user interface provided on the mobile terminal 1002 or other intelligent electronic device.

The controller 110 includes a processor 112, RAM 113, ROM 114, a communication interface 130, and a communication bus. The controller 110 is used to control the running and operation of the control device 1001, the communication and cooperation among its internal components, and external and internal data processing functions.

The communication interface 130 enables communication of control signals and data signals with the smart device 200 under the control of the controller 110. For example, the received user input signal is transmitted to the smart device 200. The communication interface 130 may include at least one of a WiFi chip 131, a Bluetooth module 132, an NFC module 133, and other near field communication modules.

The user input/output interface 140 includes an input interface and an output interface. The input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. For example, the user can input user instructions through actions such as voice, touch, gestures, and pressing; the input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the smart device 200.

The output interface includes an interface that transmits the received user instruction to the smart device 200. In some embodiments, the interface may be an infrared interface or a radio frequency interface. For example, when the infrared signal interface is used, a user input instruction needs to be converted into an infrared control signal according to an infrared control protocol, and the infrared control signal is sent to the smart device 200 through the infrared sending module. For another example, when the radio frequency signal interface is used, a user input instruction needs to be converted into a digital signal, which is then modulated according to a radio frequency control signal modulation protocol and sent to the smart device 200 through the radio frequency sending terminal.

In some embodiments, the control device 1001 includes at least one of the communication interface 130 and the input/output interface 140. When the control device 1001 is configured with the communication interface 130, such as WiFi, Bluetooth, or NFC modules, the user input command may be encoded according to the WiFi protocol, the Bluetooth protocol, or the NFC protocol and sent to the smart device 200.

The memory 190 is used for storing various operation programs, data, and applications for driving and controlling the control device 1001 under the control of the controller. The memory 190 may store various control signal commands input by a user.

The power supply 180 is used for providing operational power support to the components of the control device 1001 under the control of the controller. It may include a battery and associated control circuitry.

In some embodiments, the system may include a Kernel (Kernel), a command parser (shell), a file system, and an application program. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, kernel space is activated, hardware is abstracted, hardware parameters are initialized, and virtual memory, a scheduler, signals and interprocess communication (IPC) are operated and maintained. And after the kernel is started, loading the Shell and the user application program. The application program is compiled into machine code after being started, and a process is formed.

Fig. 4 is a schematic diagram of a software system of an intelligent device provided in the present application. Referring to fig. 4, in some embodiments, the system is divided into four layers, which are, from top to bottom, an Applications layer (referred to as the "application layer"), an Application Framework layer (referred to as the "framework layer"), an Android runtime and system library layer (referred to as the "system runtime library layer"), and a kernel layer.

In some embodiments, at least one application program runs in the application program layer, and the application programs can be Window (Window) programs carried by an operating system, system setting programs, clock programs, camera applications and the like; or may be an application developed by a third party developer such as a hi program, a karaoke program, a magic mirror program, or the like. In specific implementation, the application packages in the application layer are not limited to the above examples, and may actually include other application packages, which is not limited in this embodiment of the present application.

The framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. The application framework layer acts as a processing center that decides to let the applications in the application layer act. The application program can access the resources in the system and obtain the services of the system in execution through the API interface.

As shown in fig. 4, in the embodiment of the present application, the application framework layer includes managers (Managers), a content provider (Content Provider), and the like, where the managers include at least one of the following modules: an Activity Manager, used for interacting with all activities running in the system; a Location Manager, used for providing system services or applications with access to the system location service; a Package Manager, used for retrieving various information related to the application packages currently installed on the device; a Notification Manager, used for controlling the display and clearing of notification messages; and a Window Manager, used for managing the icons, windows, toolbars, wallpapers, and desktop components on a user interface.

In some embodiments, the activity manager is used for managing the life cycle of each application program and the general back-navigation function, such as controlling the exit of an application program (including switching the user interface currently displayed in the display window to the system desktop), opening, and going back (including switching the user interface currently displayed in the display window to the previous user interface).

In some embodiments, the window manager is configured to manage all window programs, for example obtaining the display size, determining whether a status bar is present, locking the screen, capturing the screen, and controlling display changes (e.g., zooming out, shaking, distorting, etc.).

In some embodiments, the system runtime library layer provides support for the upper layer, i.e., the framework layer. When the framework layer is used, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions to be implemented by the framework layer.

In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, sensor drivers (such as for a fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.

In some embodiments, the kernel layer further comprises a power driver module for power management.

In some embodiments, software programs and/or modules corresponding to the software architecture of fig. 4 are stored in the first memory or the second memory shown in fig. 2 or 3.

In some embodiments, taking the magic mirror application (a photographing application) as an example, when the remote control receiving device receives a remote control input operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including information such as the value of the input operation and the timestamp of the input operation), and the raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event according to the current position of the focus. Taking the case where the input operation is a confirmation operation and the corresponding control is the magic mirror application icon as an example, the magic mirror application calls an interface of the application framework layer to start itself, and then calls the kernel layer to start the camera driver, so that a static image or a video is captured through the camera.

In some embodiments, for an intelligent device with a touch function, taking a split screen operation as an example, the intelligent device receives an input operation (such as a split screen operation) that a user performs on the display screen, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The activity manager of the application framework layer sets the window mode (such as a multi-window mode) corresponding to the input operation, as well as the position and size of the window. The window manager of the application framework layer draws the window according to the settings of the activity manager and then sends the drawn window data to the display driver of the kernel layer, and the display driver displays the corresponding application interfaces in different display areas of the display screen.

In some embodiments, fig. 5 is a schematic diagram of applications that can be provided by the smart device provided in the present application, and as shown in fig. 5, the application layer includes at least one application program that can display a corresponding icon control in a display, such as: the system comprises a live television application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, a game application icon control, a voice control function and the like.

In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may provide television signals using input from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display a video of the live television signal on the smart device 200.

In some embodiments, a video-on-demand application may provide video from different storage sources. Unlike live television applications, video on demand provides a video display from some storage source. For example, the video on demand may come from a server side of cloud storage or from a local hard disk storage containing stored video programs.

In some embodiments, the media center application may provide various applications for multimedia content playback. For example, the media center, as distinct from live television or video on demand, may provide services through which a user can access various images or audio.

In some embodiments, an application center may provide storage for various applications. An application may be a game, a utility, or some other application associated with a computer system or other device that can be run on the smart device 200. The application center may obtain these applications from different sources and store them in local storage, after which they can be run on the smart device 200.

More specifically, in some embodiments, any of the smart devices 200 described in the present application may have a voice interaction function, so as to improve the intelligence degree of the smart device 200 and improve the user experience of the smart device 200.

In some embodiments, fig. 6 is a schematic diagram of an application of a smart device in a voice interaction scenario. A user 1 may speak, by voice, an instruction that the user wants the smart device 200 to execute. The smart device 200 may collect voice data in real time, recognize the instruction of the user 1 included in the voice data, and directly execute the instruction once it is recognized. In the whole process, the user 1 does not actually operate the smart device 200 or other devices, but simply speaks the instruction.

In some embodiments, when the smart device 200 shown in fig. 2 is applied in the scenario shown in fig. 6, the smart device 200 may collect voice data in real time through its sound collector 231, and then the sound collector 231 transmits the collected voice data to the controller 250, and finally the controller 250 recognizes instructions included in the voice data.

In order to accurately understand a user's query in a multi-turn conversation and respond to it reasonably so as to achieve the effect of natural conversation, the embodiment of the application provides a multi-turn conversation method.

For the purpose of clearly illustrating the embodiments of the present application, some explanations of related terms are given below.

Entity: refers to things that exist objectively and can be distinguished from each other, including specific persons, things, organizations, abstract concepts, and the like. In the field of natural language processing, an entity generally refers to a text span having a special meaning or strong reference, and generally includes a person name, a place name, an organization name, a time, proper nouns, and the like. According to the corpus annotation of ACE2003, entities are divided into five major categories, namely persons, organizations, places, administrative regions, and facilities, and each category has several subclasses.

Entity Recognition, also known as NER (Named Entity Recognition), is a technology in the field of natural language processing. The NER extracts the entities from the unstructured text and can identify more categories of entities such as names of people, names of places, names of products, models, prices, etc. according to business needs. Therefore, the concept of entity can be very wide, and any special text segment required by the service can be called an entity.

Knowledge graph: essentially a semantic network that can represent semantic relationships between entities. Entities serve as vertices or nodes in the knowledge graph, and relationships serve as edges. A knowledge graph can be constructed in various ways; the embodiments of the present application are not concerned with how the knowledge graph is constructed, so no detailed description is provided.
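For illustration, a knowledge graph of this kind can be sketched as a set of (entity, relationship, entity) edges. The class name `KnowledgeGraph`, its methods, and the sample triple are assumptions of this sketch and are not part of the application, which leaves the construction of the graph open.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory graph: entities are nodes, relationships are labelled edges."""

    def __init__(self):
        # head entity -> list of (relationship, tail entity)
        self._edges = defaultdict(list)

    def add(self, head, relationship, tail):
        self._edges[head].append((relationship, tail))

    def neighbors(self, head, relationship=None):
        """Return tail entities of `head`, optionally restricted to one relationship."""
        return [
            tail
            for rel, tail in self._edges.get(head, [])
            if relationship is None or rel == relationship
        ]


# Toy data for illustration only.
kg = KnowledgeGraph()
kg.add("Xiaoming", "occupation", "actor")
```

A query such as `kg.neighbors("Xiaoming", "occupation")` then walks the edges that leave the "Xiaoming" node and carry the "occupation" label.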

For clarity of explanation of the embodiments of the present application, a network architecture provided by the embodiments of the present application is described below with reference to fig. 7a.

Referring to fig. 7a, fig. 7a is a schematic diagram of a network architecture according to an embodiment of the present application. The architecture in fig. 7a includes the intelligent device and the service device, and the service device may be a single server or a server cluster. In fig. 7a, the service devices include a voice recognition service device, a semantic service device, and a business service device. Wherein:

the intelligent equipment is used for receiving input information and outputting a processing result of the information; the voice recognition service equipment is electronic equipment with voice recognition service deployed, the semantic service equipment is electronic equipment with semantic service deployed, and the business service equipment is electronic equipment with business service deployed. The electronic device may include a server, a computer, and the like, and the speech recognition service, the semantic service (also referred to as a semantic engine), and the business service are web services that can be deployed on the electronic device, wherein the speech recognition service is used for recognizing audio as text, the semantic service is used for performing semantic parsing on the text, and the business service is used for providing specific services such as a query service, a ticket booking service, a chat service, and the like. In one embodiment, there may be multiple business service devices deployed with different business services in the architecture shown in FIG. 7 a.
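The division of labour among the three services can be sketched as a simple pipeline. Every function below is a hypothetical stand-in written for this sketch (the "audio" is just UTF-8 text and the semantic rule is a toy keyword check); the application does not define these interfaces.

```python
from typing import Callable, Dict


def recognize_speech(audio: bytes) -> str:
    # Stand-in for the voice recognition service: audio in, text out.
    return audio.decode("utf-8")


def parse_semantics(text: str) -> Dict[str, str]:
    # Stand-in for the semantic service: decide which business service should answer.
    if "ticket" in text.lower():
        return {"service": "ticket_booking", "query": text}
    return {"service": "chat", "query": text}


# Stand-ins for business services deployed on one or more business service devices.
BUSINESS_SERVICES: Dict[str, Callable[[str], str]] = {
    "ticket_booking": lambda query: f"[ticket booking] handling: {query}",
    "chat": lambda query: f"[chat] handling: {query}",
}


def handle_input(audio: bytes) -> str:
    """Route input through voice recognition, semantic parsing, and a business service."""
    text = recognize_speech(audio)
    parsed = parse_semantics(text)
    return BUSINESS_SERVICES[parsed["service"]](parsed["query"])
```

Keeping the business services in a lookup table mirrors the statement that several business service devices with different services may coexist in the architecture.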

The following describes an example of a process for processing information input into an intelligent device based on the architecture shown in fig. 7a. Fig. 7b is a schematic flow chart of the multi-turn dialogue method applied to the service device, where the method includes the following steps:

in step 701: extracting keywords from the dialog content of the current dialog;

wherein the user can input the dialog content of the current dialog in text form or in the form of speech. In the latter case, the sound collector collects the voice data, and the controller of the intelligent device obtains the dialog content through voice recognition.

And extracting the keywords by adopting a keyword extraction technology after obtaining the conversation content of the current conversation. In order to accurately extract the keywords and expand the keywords by searching for appropriate knowledge information, the method can be implemented in a keyword extraction stage as follows:

carrying out entity identification on the conversation content to obtain the identified entities as first keywords. As mentioned above, entity identification refers to identifying entities with specific meanings in the text, such as proper nouns like names of people, names of organizations, and names of places, as well as meaningful times. For example, if the user dialog content is "Xiaoming is on vacation in Hawaii.", the named entities may include "Xiaoming: person name" and "Hawaii: place name".

The entity identified in the present application refers to an entity having a specific meaning in the text, mainly including person names, place names, organization names, proper nouns, and the like. Entity identification includes at least the following types of methods: rule-based methods, statistical machine learning-based methods, and deep learning-based methods.

In some embodiments, rule-based methods mostly rely on rule templates constructed by linguistic experts. The selected features include statistical information, punctuation marks, keywords, indicator words, direction words, position words, head words, and the like, and pattern and string matching is the main means; most such systems depend on the establishment of a knowledge base and a dictionary. Rule- and dictionary-based methods are the earliest methods used in named entity recognition and, in general, they perform better than statistics-based methods when the extracted rules accurately reflect the linguistic phenomena.
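A minimal sketch of the dictionary-and-string-matching idea follows. The dictionary `GAZETTEER` and the function `recognize_entities` are hypothetical names introduced here, and the two entries are toy data; a practical rule-based system would combine much larger dictionaries with expert-written templates.

```python
import re

# Hypothetical entity dictionary: surface form -> entity type.
GAZETTEER = {
    "Xiaoming": "person name",
    "Hawaii": "place name",
}


def recognize_entities(text, gazetteer=GAZETTEER):
    """Return (entity, type) pairs in order of appearance, using exact string matching."""
    if not gazetteer:
        return []
    # Try longer names first so that a long entry wins over an entry that is its prefix.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]


first_keywords = recognize_entities("Xiaoming is on vacation in Hawaii.")
```

With the toy dictionary above, the sample sentence yields the person name "Xiaoming" and the place name "Hawaii" as first keywords.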

In some embodiments, statistical machine learning-based methods include at least: hidden Markov models, maximum entropy, support vector machines, conditional random fields, and the like. Among these methods, the maximum entropy model has a compact structure and good generality. Generally speaking, maximum entropy and support vector machines are more accurate than hidden Markov models, but hidden Markov models are faster in training and recognition, mainly because solving for the named entity class sequence with the Viterbi algorithm is efficient. Hidden Markov models are therefore better suited to applications with real-time requirements and to applications that need to process large amounts of text, such as information retrieval and named entity recognition on short text.
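To make the role of the Viterbi algorithm concrete, the following sketch decodes the most probable tag sequence under a hidden Markov model. All probabilities, the tag set (`PER`, `LOC`, `O`), and the smoothing constant `unseen` are made-up values for illustration; in practice they would be estimated from an annotated corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for the observed tokens."""
    if not observations:
        return []
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], unseen), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, unseen), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model parameters (assumptions for this sketch only).
STATES = ["PER", "LOC", "O"]
START_P = {"PER": 0.4, "LOC": 0.1, "O": 0.5}
TRANS_P = {
    "PER": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "LOC": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "O": {"PER": 0.2, "LOC": 0.3, "O": 0.5},
}
EMIT_P = {
    "PER": {"Xiaoming": 0.9},
    "LOC": {"Hawaii": 0.9},
    "O": {"is": 0.3, "on": 0.2, "vacation": 0.2, "in": 0.3},
}

tokens = ["Xiaoming", "is", "on", "vacation", "in", "Hawaii"]
tags = viterbi(tokens, STATES, START_P, TRANS_P, EMIT_P)
entities = [(tok, tag) for tok, tag in zip(tokens, tags) if tag != "O"]
```

The decoder keeps only the best path into each state at every step, so its cost grows linearly with the sentence length and quadratically with the number of tags, which is consistent with the efficiency noted above.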

Statistics-based methods place high demands on feature selection: various features that influence the task need to be selected from the text and added to the feature vector. Depending on the main difficulties and characteristics of recognizing a particular kind of named entity, a feature set that effectively reflects the characteristics of that entity type should be selected. This is mainly done by mining features from a training corpus through counting and analyzing the language information it contains. The relevant features can be divided into specific word features, context features, dictionary and part-of-speech features, stop word features, core word features, semantic features, and the like. Statistics-based methods also depend heavily on the corpus, and few large-scale general corpora are available for building and evaluating a named entity recognition system.

In some embodiments, a deep learning based approach is the one that has emerged in recent years, and a typical deep learning model such as LSTM-CRF may assist in extracting entity keywords in the form of sequence labels.

In practice, entity recognition can be accomplished primarily using rule-based methods, assisted by deep learning methods. For example, a rule-based method is adopted to extract a key entity A from a text, a deep learning method is adopted to extract a key entity B, and keywords contained in a union set of A and B are used as all first extracted keywords.
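The union of the two result sets can be sketched as follows; the function name `merge_first_keywords` and the sample lists are hypothetical, standing in for the outputs of the rule-based extractor (A) and the deep learning extractor (B).

```python
def merge_first_keywords(rule_based, model_based):
    """Union of entity sets A and B, keeping first-seen order and dropping duplicates."""
    merged = []
    for entity in [*rule_based, *model_based]:
        if entity not in merged:
            merged.append(entity)
    return merged


entities_a = ["Xiaoming"]            # e.g. found by the rule-based method
entities_b = ["Xiaoming", "Hawaii"]  # e.g. found by the deep learning method
first_keywords = merge_first_keywords(entities_a, entities_b)
```

An ordered list is used instead of a plain `set` so that the keywords keep a stable order for the later retrieval steps; this is a design choice of the sketch, not a requirement of the method.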

In order to further integrate more effective features for retrieval and thereby improve the recall rate of corpus retrieval, after the first keywords are extracted based on the entity identification technology, the following operation can be executed for each first keyword: performing syntactic analysis on the conversation content according to the first keyword to obtain second keywords related to the first keyword.

In practice, keywords having a specified association with the first keyword may be extracted as the second keyword. The association is, for example, a word that can be described in the knowledge-graph as an attribute of the first keyword. For example, a person's name is extracted from the text, and then specified attributes of the person's name, such as occupation, appearance, etc., are extracted from the text.

In implementation, the dialog content can be analyzed using a syntax tree, that is, a tree built from the text content according to a grammar by a syntactic analysis method. Syntactic analysis may include at least syntactic structure analysis and dependency analysis. Syntactic structure analysis is used to extract the noun phrases, verb phrases, and so on of a sentence. Dependency analysis differs from syntactic structure analysis in that it mainly analyzes the dependencies between the words of a sentence. A dependency can generally be represented as a (word 1, relationship, word 2) triple in which word 2 depends on word 1, for example a predicate structure. Both syntactic structure analysis and dependency analysis yield a corresponding syntax tree, from which the corresponding entities can then be extracted.
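As a rough sketch of how second keywords could be read off dependency triples, the example below uses hand-written triples in the (word 1, relationship, word 2) form. The relation labels, the helper name `second_keywords`, and the triples themselves are illustrative assumptions; in practice the triples would come from a dependency parser.

```python
def second_keywords(triples, first_keyword):
    """Return the words directly linked to first_keyword by a dependency.

    Each triple is (word 1, relationship, word 2), where word 2 depends on word 1.
    """
    related = []
    for head, _relationship, dependent in triples:
        if dependent == first_keyword and head not in related:
            related.append(head)
        elif head == first_keyword and dependent not in related:
            related.append(dependent)
    return related


# Hand-written triples for the sentence "Xiaoming's acting is good" (assumption).
triples = [
    ("good", "subject", "acting"),
    ("acting", "possessor", "Xiaoming"),
]

print(second_keywords(triples, "Xiaoming"))  # ['acting']
```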

In summary, after the first keywords and the second keywords are extracted, both are used as the extracted keywords for subsequent processing. In step 702, knowledge information having a specified association relation with a keyword is acquired from a pre-constructed knowledge graph as an expansion word.

In the embodiments of the present application, knowledge information may be extracted as expansion words in the following ways.

In some embodiments, a simple implementation is to obtain specified attribute information of the keyword from the knowledge graph as an expansion word.

Fig. 8 shows the association relationships between different entities. "Xiaoming" corresponds to a person-name entity whose attributes may include occupation, film and television works, wife, and so on. The specified attribute can be determined from the dialog content of the current round of dialog. For example, in the text "Xiaoming's acting is good", syntactic analysis shows that "acting" is an attribute of "Xiaoming", and "acting" is associated with occupation in the knowledge graph. The keyword "acting" is therefore obtained by syntactic analysis, and the attribute value "actor" is then obtained from the knowledge graph.

In practice, other attributes, such as movie names, can also be expanded, so that the chat content is actively extended and the conversation is guided to continue. For example, when the user says "Xiaoming's acting is good", the reply, based on the expanded movie name, can be "His performance in movie A was wonderful, and the character he portrayed left a deep impression". "Movie A" in the reply is the expanded content; it answers the user's remark naturally and keeps the chat natural.
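The attribute-based expansion can be illustrated with a toy knowledge graph held in a dictionary. The data structure, the attribute names, and the function `expand_keyword` are assumptions made for this sketch; a production knowledge graph would normally be queried through a graph store.

```python
# Toy knowledge graph: entity -> attribute name -> attribute values (assumption).
KNOWLEDGE_GRAPH = {
    "Xiaoming": {
        "occupation": ["actor"],
        "works": ["movie A", "TV series B"],
    },
}


def expand_keyword(keyword, attributes):
    """Collect the values of the specified attributes as expansion words."""
    entity = KNOWLEDGE_GRAPH.get(keyword, {})
    expansion_words = []
    for attribute in attributes:
        expansion_words.extend(entity.get(attribute, []))
    return expansion_words


# "acting" maps to the occupation attribute; "works" extends the chat topic.
print(expand_keyword("Xiaoming", ["occupation"]))           # ['actor']
print(expand_keyword("Xiaoming", ["occupation", "works"]))  # ['actor', 'movie A', 'TV series B']
```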

In other embodiments, to incorporate the chat history so that the retrieved corpora conform to the current chat scenario, the keywords can be further expanded according to the context information of the current round of dialog. As shown in fig. 9, the following steps may be performed for each keyword extracted from the dialog content of the current round of dialog. It should be noted that the step numbers in this embodiment are used only to describe the embodiment by way of example and do not limit the execution order of the steps; in a specific implementation, the execution order may be changed as needed. For example, in one embodiment:

Step 901: acquiring a target entity corresponding to the keyword from the knowledge graph;

For example, "Xiaoming" corresponds to a person-name entity.

Step 902: acquiring attributes of a target entity from a knowledge graph;

Step 903: associating the target entity and its attributes into a dialog knowledge graph of the historical dialog;

For example, for each round of the multi-round dialog, the target entity corresponding to each keyword and the attributes of that entity are extracted. The background knowledge of the current dialog can thus be drawn from the content-rich knowledge graph. In implementation, the dialog knowledge graph of the historical dialog can be built from the associations among the background knowledge of all dialog rounds, so as to describe the background knowledge of the multi-round dialog.

Suppose the previous round of dialog mentioned "TV series B"; the attribute information of TV series B, such as Xiaolan and Xiaojian, can then be extracted from the knowledge graph. The current round of dialog is "Xiaoming's acting is good", so the knowledge information related to "Xiaoming" is further extracted from the knowledge graph. Since "Xiaoming" also appeared in "TV series B", the dialog knowledge graph that can be extracted from these two rounds of dialog is as shown in fig. 10.

Step 904: extracting keywords as expansion words from the dialog knowledge graph by random walk.

Continuing with the example shown in fig. 10, since TV series B was mentioned in the historical dialog, "TV series B" may be used as an expansion word, and an entity among the attributes of "TV series B" may further be added as an expansion word, for example the co-star "Xiaolan".

In this embodiment, expansion words can be found by combining the historical information with semantics. If the intention of the current round of dialog is to evaluate Xiaoming's acting, a comparison with his co-stars is relevant, so "Xiaolan" can be identified as an expansion word.

Expanding on the basis of historical knowledge makes it convenient to subsequently find expansion words that fit the current chat content.
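Steps 901 to 904 can be sketched as follows. This is only an illustration under stated assumptions: the dialog knowledge graph is simplified to an undirected adjacency map with relation labels dropped, the entity data is invented for the example, and the walk length and helper names are hypothetical.

```python
import random


def build_dialog_graph(entity_attributes):
    """Steps 901-903: link each target entity with its attribute values."""
    graph = {}
    for entity, values in entity_attributes.items():
        for value in values:
            graph.setdefault(entity, set()).add(value)
            graph.setdefault(value, set()).add(entity)
    return graph


def random_walk_expansion(graph, start, steps, rng):
    """Step 904: walk the graph from `start` and collect the visited nodes."""
    expansion_words = []
    current = start
    for _ in range(steps):
        neighbors = sorted(graph.get(current, ()))
        if not neighbors:
            break
        current = rng.choice(neighbors)
        if current != start and current not in expansion_words:
            expansion_words.append(current)
    return expansion_words


# Background knowledge gathered over two rounds of dialog (toy data).
dialog_graph = build_dialog_graph({
    "TV series B": ["Xiaolan", "Xiaojian", "Xiaoming"],
    "Xiaoming": ["actor"],
})
print(random_walk_expansion(dialog_graph, "Xiaoming", steps=4, rng=random.Random(0)))
```

Passing the random number generator in explicitly keeps the walk reproducible during testing while still allowing varied expansion words in use.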

After the keywords and the expansion words are obtained, in step 703, a candidate corpus set is retrieved from a corpus using the keywords and the expansion words as search terms.

To search the corpus accurately, the candidate corpus set may be retrieved from the corpus with the keywords as required search conditions and the expansion words as supplementary conditions. For example, "Xiaoming", extracted in the current round of dialog, is taken as a required condition and the expansion word "Xiaolan" as a supplementary condition, and the search is carried out so as to roughly screen out the candidate corpus set.
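One possible reading of "required" and "supplementary" conditions is sketched below: a corpus entry must contain every keyword, and entries containing more expansion words are ranked first. Plain substring matching, the function name `search_candidates`, and the sample corpus are assumptions made for the illustration.

```python
def search_candidates(corpus, keywords, expansion_words):
    """Keep entries containing all keywords; rank by matched expansion words."""
    candidates = []
    for sentence in corpus:
        if all(keyword in sentence for keyword in keywords):
            bonus = sum(1 for word in expansion_words if word in sentence)
            candidates.append((bonus, sentence))
    # The sort is stable, so entries with equal bonus keep their corpus order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in candidates]


corpus = [
    "Xiaoming is a fine actor",
    "Xiaoming and Xiaolan worked well together",
    "Xiaolan sings well",
]
print(search_candidates(corpus, ["Xiaoming"], ["Xiaolan"]))
```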

In implementation, a search term set may be formed from all the keywords and expansion words, and weight information is constructed for each search term in the set. The TF-IDF weight of each search term is calculated according to the following formula (1):

$$\mathrm{TF} = \frac{\text{number of occurrences of the word in the article}}{\text{total number of words in the article}}$$

$$\mathrm{IDF} = \log\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the word} + 1} \qquad (1)$$

$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF}$$

The article in formula (1) may be the text content of the current dialog (the user query) in the embodiments of the present application.
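Formula (1) can be computed as in the following sketch. It assumes that the "+ 1" belongs to the denominator of the IDF term (the usual smoothing), that texts are already tokenized into word lists, and that the function names are free choices for the example.

```python
import math


def tf(term, words):
    """Term frequency: occurrences of the term divided by the article length."""
    return words.count(term) / len(words)


def idf(term, documents):
    """Inverse document frequency with +1 smoothing in the denominator."""
    containing = sum(1 for document in documents if term in document)
    return math.log(len(documents) / (containing + 1))


def tf_idf(term, words, documents):
    """Weight of a search term: TF multiplied by IDF."""
    return tf(term, words) * idf(term, documents)


query_words = ["Xiaoming", "acting", "is", "good"]
documents = [
    ["Xiaoming", "is", "an", "actor"],
    ["the", "weather", "is", "good"],
    ["Xiaolan", "sings"],
    ["hello"],
]
print(tf_idf("Xiaoming", query_words, documents))
```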

A retrieval vector is constructed from the weights of all search terms, and the similarity between the retrieval vector and each corpus in the corpus is calculated. In practice, the similarity measure may be cosine similarity. Assuming that A and B are two n-dimensional vectors, A = [A1, A2, ..., An] and B = [B1, B2, ..., Bn], the cosine similarity of A and B is:

$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

Some corpora can then be roughly screened out on the basis of the similarity to form the candidate corpus set.
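The coarse screening by cosine similarity might look like the sketch below. The `top_k` cutoff, the dictionary of corpus vectors, and the function names are assumptions for the example; the application does not state how many candidates are kept.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_screen(retrieval_vector, corpus_vectors, top_k):
    """Return the identifiers of the top_k corpora most similar to the query."""
    ranked = sorted(
        corpus_vectors.items(),
        key=lambda item: cosine_similarity(retrieval_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


corpus_vectors = {"a": [1, 1], "b": [1, 0], "c": [0, 0]}
print(coarse_screen([1, 1], corpus_vectors, top_k=2))  # ['a', 'b']
```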

In step 704, the reply information for the current dialog is screened from the candidate corpus set using the multiple rounds of dialog of the current dialog.

To improve the accuracy of selecting the reply corpus, in the present application a deep learning matching model may be used to analyze and process the multiple rounds of dialog of the current dialog and obtain the degree of match between each corpus in the candidate corpus set and the current dialog; the corpus with the highest degree of match is then selected as the reply information for the current dialog.

Fig. 11 is a schematic structural diagram of the deep learning matching model. The model includes a word representation layer, a sentence encoding layer, a matching layer, a fusion layer (aggregation layer), and a prediction layer, wherein:

Word representation layer: represents each word of the input sentence as a pre-trained word vector. The goal of this layer is to convert the input sequences P (dialog) and Q (reply) into d-dimensional vector representations, each consisting of two parts: an ordinary word vector and a character-composed vector obtained by feeding all the characters of the word into an LSTM. In effect, this layer produces the vectorized representation (embedding) of the text.

Sentence encoding layer: the vector representation of a sentence is input into a multi-layer RNN encoder, and the weighted sum of the outputs of the layers is used as the encoding feature of the sentence. A two-tower model may be employed, in which the encoders for the query and for the candidate reply share weights. To extract the context information of P and Q, the embeddings obtained from the previous layer are input into a bidirectional long short-term memory network (BiLSTM, a model combining a forward LSTM and a backward LSTM), yielding context embeddings for the two directions at each time step.

Matching layer: this layer mainly performs information interaction between the two sentences through an attention mechanism. Its aim is to compare the context at each time step of one sequence with the context at all time steps of the other sequence, in both directions; a multi-perspective matching method can be used to obtain fine-grained association information between the two sentences. Attention vectors from the query to the candidate reply and from the candidate reply to the query are computed and concatenated, so that the result fully contains the matching information between the query and the reply.
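The bidirectional attention step can be illustrated in plain Python as below. Note that this sketch uses simple dot-product attention as a stand-in; it is not the multi-perspective matching mentioned above, and the function names and toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(source, target):
    """For each source vector, return the attention-weighted sum of target vectors."""
    dim = len(target[0])
    attended = []
    for s in source:
        weights = softmax([sum(a * b for a, b in zip(s, t)) for t in target])
        attended.append(
            [sum(w * t[i] for w, t in zip(weights, target)) for i in range(dim)]
        )
    return attended


def match(query, reply):
    """Concatenate each encoding with its attention vector, in both directions."""
    query_to_reply = attend(query, reply)
    reply_to_query = attend(reply, query)
    query_features = [q + a for q, a in zip(query, query_to_reply)]
    reply_features = [r + a for r, a in zip(reply, reply_to_query)]
    return query_features, reply_features


query_features, reply_features = match([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print(query_features, reply_features)
```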

Fusion layer: the purpose of this layer is to aggregate the results of the previous layer into a fixed-length vector, which can again be implemented with a BiLSTM model. In implementation, RNN and pooling operations are used to fuse the encoding vectors and attention vectors of the query and of the reply respectively, and the two results are then concatenated to obtain the final matching feature vector m.
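The pooling part of the fusion layer can be sketched as follows. The RNN pass is omitted for brevity, and the choice of mean and max pooling, like the function names, is an assumption; the application only states that pooling is used.

```python
def pool(vectors):
    """Reduce a variable-length sequence of vectors to one fixed-length vector.

    Mean pooling and max pooling are applied per dimension and concatenated.
    """
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    peak = [max(v[i] for v in vectors) for i in range(dim)]
    return mean + peak


def fuse(query_features, reply_features):
    """Concatenate the pooled query and reply vectors into the matching vector m."""
    return pool(query_features) + pool(reply_features)


m = fuse([[1.0, 2.0], [3.0, 0.0]], [[0.5, 0.5]])
print(m)  # [2.0, 1.0, 3.0, 2.0, 0.5, 0.5, 0.5, 0.5]
```

Whatever the sentence lengths, m always has four times the per-step feature dimension, which is what allows a fixed-size classifier to follow.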

Prediction layer: this layer may comprise two fully connected layers followed by a softmax layer or a sigmoid layer, chosen according to the business scenario. It inputs the matching feature vector m into a fully connected neural network classifier to predict whether the query and the reply sentence match, and outputs a score representing the degree of match. During training, supervised learning can be used to evaluate whether the output agrees with the label.

After the model is trained, the degree of match between every candidate reply and the user's query can be calculated, and the candidate reply with the highest matching score is selected as the final reply.
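The final selection reduces to taking the highest-scoring candidate, as in this sketch. The scoring function `overlap_score` is a toy word-overlap stand-in for the trained matching model and is purely hypothetical.

```python
def overlap_score(history, candidate):
    """Toy stand-in for the trained model: count words shared with the history."""
    history_words = set(" ".join(history).lower().split())
    return len(history_words & set(candidate.lower().split()))


def select_reply(history, candidates, score):
    """Return the candidate reply with the highest matching score."""
    return max(candidates, key=lambda candidate: score(history, candidate))


history = ["I watched TV series B yesterday", "Xiaoming acting is good"]
candidates = [
    "The weather is nice today",
    "Xiaoming was great in TV series B",
]
print(select_reply(history, candidates, overlap_score))
```

Passing the scorer as a parameter keeps the selection logic independent of the model that produces the scores.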

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A service device for multi-turn dialog, wherein the service device comprises an input interface and a processor, and the processor is configured to:

extracting keywords from the dialog content of the current dialog;

2. The service device of claim 1, wherein in performing the extracting of the keywords from the dialog content of the current dialog, the processor is configured to:

wherein the keywords comprise the first keyword and the second keyword.

3. The service apparatus according to claim 1, wherein, when performing the acquiring of the knowledge information having the specified association with the keyword from the pre-constructed knowledge graph as the augmented word, the processor is configured to:

obtaining attributes of the target entity from the knowledge graph;

4. The service apparatus according to claim 1, wherein, when performing the acquiring of the knowledge information having the specified association with the keyword from the pre-constructed knowledge graph as the augmented word, the processor is configured to:

5. The service apparatus according to claim 1, wherein when performing the search of the corpus candidate corpus using the keyword and the augmented word as search conditions, the processor is configured to:

6. The service device of any one of claims 1-5, wherein, when performing the screening of the reply information for the current dialog from the candidate corpus set using the multiple rounds of dialog of the current dialog, the processor is configured to:

7. A multi-turn dialog method, the method comprising:

extracting keywords from the dialog content of the current dialog;

acquiring knowledge information having a specified association relation with the keywords from a pre-constructed knowledge graph as expansion words;

8. The method of claim 7, wherein extracting keywords from the dialog content of the current dialog comprises:

wherein the keywords comprise the first keyword and the second keyword.

9. The method according to claim 7, wherein the acquiring the knowledge information having the specified association relation with the keyword from the pre-constructed knowledge graph as an augmented word comprises:

obtaining attributes of the target entity from the knowledge graph;

10. The method according to any one of claims 7-9, wherein said retrieving a corpus candidate corpus set from a corpus using said keyword and said augmented word as retrieval conditions comprises: