CN109189897B - Chatting method and chatting device based on data content matching

Info

Publication number: CN109189897B
Application number: CN201810846861.XA
Authority: CN
Inventors: 曲源
Original assignee: Sibyl Shanghai Intelligent Technology Co ltd
Current assignee: Sibyl Shanghai Intelligent Technology Co ltd
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2020-07-31
Anticipated expiration: 2038-07-27
Also published as: CN109189897A

Abstract

The application discloses a chatting method and a chatting device based on data content matching, applied to a robot. The method comprises: a receiving step of receiving user input content; a processing step of processing the user input content to obtain data content; a matching step of matching answer information corresponding to the data content; an output step of outputting the answer information corresponding to the data content; and a storage step of storing the answer information. When a user chats with the robot, the robot processes the user input content into data content, matches the answer information corresponding to the data content, and outputs and stores the answer information. Chat for a specific field can thus be constructed quickly, without customizing Intent in detail. The method is convenient and fast, and can improve chat efficiency and user experience.

Description

Chatting method and chatting device based on data content matching

Technical Field

The application relates to the technical field of robot intelligent interaction, in particular to a chatting method and a chatting device based on data content matching.

Background

With the development of computer and network technologies, the robot chat system has been applied to various fields in daily work, life and learning, and in recent years, the appearance and popularization of various intelligent hardware and Instant Messaging (IM) software greatly expand the application field of the robot chat system.

The related technical scheme is as follows: a closed-domain robot chat system is constructed for a specific field and comprises a mapping between language and behavior (Intent), a set of parameters in the input content (Entity), and a context (Context). The system accepts voice or text input and, according to the Entity and the Context in the system, maps the input content to a behavior through Intent to obtain the reply to the input content.

When a robot chat system for a specific field is constructed, Intent must be defined item by item and its Context and Entity must be set, which places high demands on the user and makes construction time long. In addition, significant delays may be experienced during use.

Disclosure of Invention

The technical problem to be solved is as follows:

The invention aims to provide a chatting method based on data content matching that solves the problems of high demands on the user and long construction time.

The technical scheme is as follows:

In order to solve the technical problem, the present application provides a chat method based on data content matching, which is applied to a robot, and the method includes:

a receiving step: receiving user input content;

a processing step: processing the user input content to obtain data content;

a matching step: matching answer information corresponding to the data content;

an output step: outputting the answer information corresponding to the data content; and

a storage step: storing the answer information.

With this chatting method, when a user chats with the robot, the robot processes the user input content into data content, matches the answer information corresponding to the data content, and outputs and stores the answer information. Chat for a specific field can thus be constructed quickly, without customizing Intent in detail. The method is convenient and fast, and can improve chat efficiency and user experience.
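The five steps above can be sketched as a single chat turn. This is a minimal illustration only: `chat_turn`, `toy_process`, `toy_match`, the `ANSWERS` table, and the default reply are hypothetical stand-ins introduced here, not names or behavior taken from the application.

```python
from typing import Callable, List, Tuple


def chat_turn(
    user_input: str,
    process: Callable[[str], List[str]],
    match: Callable[[List[str]], str],
    output: Callable[[str], None],
    store: Callable[[str, str], None],
) -> str:
    """Run one chat turn: process, match, output, then store."""
    data_content = process(user_input)   # processing step
    answer = match(data_content)         # matching step
    output(answer)                       # output step
    store(user_input, answer)            # storage step
    return answer


# Toy stand-ins (assumptions for illustration only).
ANSWERS = {("name",): "Hello, I am a demo robot."}
DEFAULT_REPLY = "Sorry, I do not have an answer for that."


def toy_process(text: str) -> List[str]:
    # Keep only the words that appear in some stored key.
    known = {word for key in ANSWERS for word in key}
    return [w for w in text.lower().strip("?!. ").split() if w in known]


def toy_match(data_content: List[str]) -> str:
    # Exact lookup; the scoring-based matching is described later in the text.
    return ANSWERS.get(tuple(data_content), DEFAULT_REPLY)


shown: List[str] = []
history: List[Tuple[str, str]] = []
reply = chat_turn(
    "What is your name?",
    toy_process,
    toy_match,
    shown.append,
    lambda question, answer: history.append((question, answer)),
)
```

Passing the steps in as callables keeps the sketch independent of how processing and matching are actually realized.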

In one embodiment, optionally, the user input content is text or speech or emoticons.

In one embodiment, optionally, the processing step includes:

a conversion step: converting the user input content into a character string;

a unifying step: unifying the character strings into the same character string data format;

an acquisition step: acquiring a set of words and parts of speech from the character string data;

a filtering step: performing word filtering, part-of-speech filtering and blacklist filtering on the set to obtain a valid i-item set; and

a generation step: generating a matching object from the valid i-item set.

In one embodiment, optionally, the matching step comprises:

a traversing step: initiating a request to a storage module according to the matching object, traversing the returned content, and acquiring the learning record set corresponding to the specified field;

a scoring step: scoring each item in the learning record set, the scoring comprising a basic score, a proportion score, and a consecutive-hit score;

wherein the valid content word and part-of-speech items of a given learning record form a j-item set, and the word and part-of-speech items of that learning record that match the processed user input are k items;

the basic score is the sum of the part-of-speech benchmark scores of the k matched items divided by the combined part-of-speech benchmark scores of the i-item set, and its value range is 0-1;

the proportion score is the sum of the part-of-speech benchmark scores of the k matched items divided by the combined part-of-speech benchmark scores of the j-item set, and its value range is 0-1;

the consecutive-hit score depends on the offsets of the k matched items within the j-item set: if, among n consecutive matched items, the offset of each item exceeds that of the previous item by 1, the consecutive-hit score is n-1, and its value range is 0-5;

each item in the learning record set thus obtains three values: a basic score, a proportion score, and a consecutive-hit score; for each specific field, the total score of an item is composed of the basic score multiplied by the basic-score base number of the field, the proportion score multiplied by the proportion-score base number of the field, and the value corresponding to the consecutive-hit score, and the sum of the three terms lies in the range 0-1;

a sorting step: sorting the scored learning records into a queue in descending order of total score;

a calculation step: calculating the significance for the sorted queue, wherein the average total score of the first n learning records is calculated first, n being determined by the characteristics of the specific field, and the significance is the normalized total score of a learning record divided by that average, with a value range greater than 0;

a judging step: taking the head item of the sorted queue and judging it against thresholds determined by the characteristics of the specific field, the values judged being mainly the total score and the significance; if the total score and the significance are both greater than the threshold, the learning record is output; otherwise, a matching failure mark is output and a default reply is returned.
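Read literally, the scores defined in the matching step can be written as follows. The notation is introduced here for illustration and does not appear in the application: $w(x)$ is the part-of-speech benchmark score of item $x$; $I$, $J$ and $K$ are the i-item input set, the j-item record set and the k matched items; $n_c$ is the length of a run of consecutive matches; $\alpha$, $\beta$, $\gamma$ are the field-specific base numbers; $T_{(m)}$ is the m-th largest total score. Scaling the consecutive-hit term by its maximum of 5 and requiring the base numbers to sum to 1 are assumptions made so that the total stays within the stated range 0-1.

```latex
\[
S_{\mathrm{basic}} = \frac{\sum_{x \in K} w(x)}{\sum_{x \in I} w(x)}, \qquad
S_{\mathrm{prop}} = \frac{\sum_{x \in K} w(x)}{\sum_{x \in J} w(x)}, \qquad
S_{\mathrm{hit}} = n_c - 1, \quad 0 \le S_{\mathrm{hit}} \le 5
\]
\[
T = \alpha\, S_{\mathrm{basic}} + \beta\, S_{\mathrm{prop}} + \gamma\, \frac{S_{\mathrm{hit}}}{5},
\qquad \alpha + \beta + \gamma = 1
\]
\[
\mathrm{sig}(r) = \frac{T_r}{\dfrac{1}{n} \sum_{m=1}^{n} T_{(m)}}
\]
```

The head of the queue is accepted only when both $T$ and $\mathrm{sig}$ exceed their field-specific thresholds.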

In one embodiment, optionally, before the converting step, the method further comprises a loading step: loading a word bank, wherein the word bank comprises a conventional word bank, a customized word bank for a specific field, and a part-of-speech table.

In one embodiment, optionally, before the receiving step, the method further comprises an aggregating step: aggregating the user input content and the answer information.

In one embodiment, optionally, after the outputting step, the method further comprises:

a screening step: screening the answer information;

an identification step: labeling the user input content and the answer information corresponding to the user input content, and storing the labels in a history record.

According to another aspect of the present application, there is also provided a chat device for a robot, the device including:

a receiving module: configured to receive user input content;

a processing module: configured to process the user input content resulting in data content;

a matching module: configured to match answer information corresponding to the data content;

an output module: configured to output the answer information corresponding to the data content; and

a storage module: configured to store the answer information;

wherein the receiving module is connected with an aggregation module, configured to aggregate the user input content and the answer information;

the output module is connected with a screening module, configured to screen the answer information;

the screening module is connected with an identification module, configured to label the user input content and the answer information corresponding to the user input content, and to store the labels in a history record.

The chat device enables the user to chat with the robot without customizing Intent in detail, can quickly build chat in a specific field, is convenient and quick, can improve chat efficiency, and improves user experience.

In one embodiment, optionally, the processing module includes:

a conversion module: configured to convert the user input content into a character string;

a unification module: configured to unify the character strings into the same character string data format;

an acquisition module: configured to obtain a set of words and parts of speech from the character string data;

a filtering module: configured to perform word filtering, part-of-speech filtering, and blacklist filtering on the set to obtain a valid i-item set;

a generation module: configured to generate a matching object from the valid i-item set;

the conversion module is connected with a loading module, configured to load a word bank comprising a conventional word bank, a customized word bank for a specific field, and a part-of-speech table.

In one embodiment, optionally, the matching module comprises:

a traversing module: configured to initiate a request to the storage module according to the matching object, traverse the returned content, and acquire the learning record set corresponding to the specified field;

a scoring module: configured to score each item in the learning record set, the scoring comprising a basic score, a proportion score, and a consecutive-hit score;

a sorting module: configured to sort the scored learning records into a queue in descending order of total score;

a calculation module: configured to calculate the significance for the sorted queue, wherein the average total score of the first n learning records is calculated first, n being determined by the characteristics of the specific field, and the significance is the normalized total score of a learning record divided by that average, with a value range greater than 0;

a judging module: configured to take the head item of the sorted queue and judge it against thresholds determined by the characteristics of the specific field, the values judged being mainly the total score and the significance; if the total score and the significance are both greater than the threshold, the learning record is output; otherwise, a matching failure mark is output and a default reply is returned.

According to another aspect of the present application, there is also provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements one of the above-mentioned methods when executing the computer program.

According to another aspect of the application, there is also provided a computer-readable storage medium, preferably a non-volatile readable storage medium, having stored therein a computer program which, when executed by a processor, implements one of the above-described methods.

According to another aspect of the present application, there is also provided a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform one of the methods described above.

The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:

FIG. 1 is a flow diagram of a chat method based on data content matching according to one embodiment of the application;

FIG. 2 is a flow chart of the processing steps of a chat method based on data content matching according to one embodiment of the present application;

FIG. 3 is a flow chart of the matching steps of a chat method based on data content matching according to an embodiment of the present application;

FIG. 4 is a block diagram of a chat device based on data content matching according to an embodiment of the present application;

FIG. 5 is a block diagram of the processing module of a chat device based on data content matching according to an embodiment of the present application;

FIG. 6 is a block diagram of the matching module of a chat device based on data content matching according to an embodiment of the present application;

FIG. 7 is an overall workflow diagram of a chat method based on data content matching according to an embodiment of the present application;

FIG. 8 is a workflow diagram of the first data processing module of a chat method based on data content matching according to an embodiment of the present application;

FIG. 9 is a workflow diagram of the data matching module of a chat method based on data content matching according to an embodiment of the present application.

Detailed Description

The present application will now be described in further detail by way of specific examples with reference to the accompanying drawings. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.

Throughout the description of the present application, it is to be noted that, unless expressly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: for example, a connection may be fixed, removable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be internal communication between two elements. The terms "first", "second", "third" and "fourth" do not denote any sequence relationship, but are merely used for convenience of description. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.

The application provides a chatting method based on data content matching, which addresses the problems of high demands on users and long construction time found in conventional chatting methods.

Hereinafter, a product and the like will be described in detail through a basic design, an extended design, and an alternative design.

Fig. 1 is a flow chart of a chat method based on data content matching according to an embodiment of the present application. The chat method may generally include the steps of:

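a receiving step: receiving user input content;

a processing step: processing the user input content to obtain data content;

a matching step: matching answer information corresponding to the data content;

an output step: outputting the answer information corresponding to the data content; and

a storage step: storing the answer information.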

The chat method may be implemented by an application (app) running in a robot. In use, when a user chats with the robot, the robot processes the user input content into data content, matches the answer information corresponding to the data content, and outputs and stores the answer information. Chat for a specific field can thus be constructed quickly, without customizing Intent in detail. The method is convenient and fast, and can improve chat efficiency and user experience.

In an alternative embodiment, the user input content may be text or speech or emoticons or other common content. Wherein the other common content may be an action or the like.

In an alternative embodiment, before the receiving step, the method further comprises an aggregating step: aggregating the user input content and the answer information. Content frequently input by the user can be pre-stored, together with the answers to that content.

For example, the user says "What is your name?" and the robot looks up the corresponding answer in the pre-stored set and replies "Hello, my name is Defender".

In an alternative embodiment, after the outputting step, the method further comprises a screening step: screening the answer information; and an identification step: labeling the user input content and the corresponding answer information, and storing the labels in a history record.

For example, the user input "hello" is labeled "1", the user input "What is your name?" is labeled "2", and so on.

Fig. 2 is a flow chart of the processing step of a chat method based on data content matching according to an embodiment of the present application. In an alternative embodiment, the processing step comprises:

a conversion step: converting the user input content into a character string. A character string is usually operated on as a whole, for example: searching for a substring, extracting a substring, inserting a substring at a given position, or deleting a substring. Two strings are equal only if their lengths are equal and the characters at each corresponding position are equal. Let p and q be two strings; the operation of finding the position where q first appears in p is called pattern matching. The two most basic storage modes for strings are sequential storage and linked storage;

a unifying step, an acquisition step and a filtering step as set out above, which unify the character strings into the same data format, acquire the set of words and parts of speech, and filter it into a valid i-item set; and

a generation step: generating a matching object from the valid i-item set.
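The string equality rule and the pattern matching operation mentioned in the conversion step can be illustrated with a short sketch. The function names `strings_equal` and `first_occurrence` are hypothetical, and the naive scan shown here is only one simple way to do pattern matching; the application does not specify an algorithm.

```python
def strings_equal(a: str, b: str) -> bool:
    """Equal lengths and equal characters at every corresponding position."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def first_occurrence(p: str, q: str) -> int:
    """Return the index where q first appears in p, or -1 if it does not.

    Naive pattern matching: try every start position in p in turn.
    """
    for start in range(len(p) - len(q) + 1):
        if strings_equal(p[start:start + len(q)], q):
            return start
    return -1


position = first_occurrence("what is your name", "your")
```

In everyday Python the built-in `str.find` gives the same result; the explicit loop is shown only to make the definition concrete.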

Fig. 3 is a flow chart of the matching step of a chat method based on data content matching according to an embodiment of the present application. In an alternative embodiment, the matching step comprises the traversing, scoring, sorting, calculation and judging steps set out above.

Fig. 4 is a block diagram of a chat apparatus based on data content matching according to an embodiment of the present application. The chat apparatus may generally include:

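a receiving module: configured to receive user input content;

a processing module: configured to process the user input content to obtain data content;

a matching module: configured to match answer information corresponding to the data content;

an output module: configured to output the answer information corresponding to the data content; and

a storage module: configured to store the answer information.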

The chat device may be implemented by an application (app) running in the robot. In use, when a user chats with the robot, the robot processes the user input content into data content, matches the answer information corresponding to the data content, and outputs and stores the answer information. Chat for a specific field can thus be constructed quickly, without customizing Intent in detail. The device is convenient and fast, and can improve chat efficiency and user experience.

Fig. 5 is a block diagram of the processing module of a chat device based on data content matching according to an embodiment of the present application. In one embodiment, optionally, the processing module includes:

a conversion module: configured to convert the user input content into a character string. A character string is usually operated on as a whole, for example: searching for a substring, extracting a substring, inserting a substring at a given position, or deleting a substring. Two strings are equal only if their lengths are equal and the characters at each corresponding position are equal. Let p and q be two strings; the operation of finding the position where q first appears in p is called pattern matching. The two most basic storage modes for strings are sequential storage and linked storage.

Fig. 6 is a block diagram of the matching module of a chat device based on data content matching according to an embodiment of the present application. In one embodiment, optionally, the matching module comprises the traversing, scoring, sorting, calculation and judging modules set out above.

Fig. 7 is an overall workflow diagram of a chat method based on data content matching according to an embodiment of the present application, in which:

1) System training process:

The training set of the specific field, which is generally a set of questions and answers, is acquired through the first data processing module and stored in the data storage module.

2) Overall workflow of the system:

When the user provides input, the system sends it to the first data processing module to obtain processable data content. The user input content may be text, speech, emoticons, or other common content. Answer matching is then performed on the processed data content; in this process the data matching module and the data storage module work together, and a reply is output. After these processes are complete, the exchange is saved in the history record, and the history records are processed by the second data processing module and then stored in the data storage module, so that the system learns continuously.
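A minimal sketch of how training and continuous learning could share one storage path is given below. `DataStore`, `learn`, and the field name `"greeting"` are hypothetical, and an in-memory dictionary stands in for the data storage module; the application does not describe the storage format.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[List[str], str]  # (processed question items, answer)


class DataStore:
    """In-memory stand-in for the data storage module, keyed by field."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Record]] = defaultdict(list)

    def add(self, field: str, items: List[str], answer: str) -> None:
        self._records[field].append((items, answer))

    def records_for(self, field: str) -> List[Record]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._records.get(field, []))


def learn(
    store: DataStore,
    field: str,
    qa_pairs: Iterable[Tuple[str, str]],
    process: Callable[[str], List[str]],
) -> None:
    """Process each question and store it with its answer as a learning record."""
    for question, answer in qa_pairs:
        store.add(field, process(question), answer)


store = DataStore()
# Initial training from a question-and-answer set.
learn(store, "greeting", [("what is your name", "I am a demo robot")], str.split)
# Continuous learning: labeled history records are fed through the same path.
learn(store, "greeting", [("hello", "Hello!")], str.split)
```

Using the same `learn` function for both the training set and the history records mirrors the text, where both end up as learning records in the data storage module.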

Fig. 8 is a flowchart of a first data processing module of a chat method based on data content matching according to an embodiment of the present application, in which:

3) Workflow of the first data processing module:

When input content enters the first data processing module, it is first serialized into characters, so that different types of input content, such as text, speech and emoticons, are unified into the same character string data format. The processed character strings are sent to a word segmentation module, which must load a word bank in advance; the word bank comprises a conventional word bank (that is, common words), a customized word bank for a specific field, and the corresponding part-of-speech table. The word segmentation module outputs the set of words and parts of speech of the user input content. This output is passed through part-of-speech filtering, word filtering and blacklist filtering in turn, and the filtered content is the output of the first data processing module: a set of i valid content words with their parts of speech.
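The workflow above can be sketched as a small filter chain. Everything named here is an assumption for illustration: the `LEXICON` entries, the part-of-speech tags, the filter sets, and the regular-expression tokenizer that stands in for a real word segmentation module.

```python
import re
from typing import Dict, List, Set, Tuple

Item = Tuple[str, str]  # (word, part of speech)

# Hypothetical word bank with a part-of-speech table.
LEXICON: Dict[str, str] = {
    "what": "r", "is": "v", "your": "r", "name": "n", "spam": "n",
}
KEEP_POS: Set[str] = {"n", "v", "r"}   # parts of speech that carry content
STOP_WORDS: Set[str] = {"is"}          # word filter
BLACKLIST: Set[str] = {"spam"}         # blacklist filter


def segment(text: str) -> List[Item]:
    """Stand-in segmenter: split on word characters and tag via the lexicon."""
    words = re.findall(r"\w+", text.lower())
    return [(word, LEXICON.get(word, "x")) for word in words]  # "x" = unknown


def process(text: str) -> List[Item]:
    """Return the valid (word, part-of-speech) items of the input."""
    items = segment(text)
    items = [it for it in items if it[1] in KEEP_POS]        # part-of-speech filter
    items = [it for it in items if it[0] not in STOP_WORDS]  # word filter
    items = [it for it in items if it[0] not in BLACKLIST]   # blacklist filter
    return items


valid_items = process("What is your spam name, please?")
```

Here `valid_items` plays the role of the i-item set that is handed to the matching stage.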

Fig. 9 is a flowchart of the work of a data matching module of a chat method based on data content matching according to an embodiment of the present application, in which:

4) Workflow of the answer matching process:

The answer matching process requires the data matching module to work together with the data storage module. The output of the first data processing module first enters the traversal module of the data matching module; the traversal module initiates a request to the data storage module and acquires, one by one, the learning records corresponding to the designated field.

The learning record set then enters the scoring module, which computes three types of score: a basic score, a proportion score and a consecutive-hit score. The user input content processed by the first data processing module is matched against, and scored with, each item in the learning record set. Let the valid content word and part-of-speech items of a given learning record number j, and let the items of that learning record whose word and part of speech match the processed user input number k.

The basic score is the sum of the part-of-speech benchmark scores of the k matched items divided by the combined part-of-speech benchmark scores of the i-item set, and its value range is 0-1.

The proportion score is the sum of the part-of-speech benchmark scores of the k matched items divided by the combined part-of-speech benchmark scores of the j-item set, and its value range is 0-1.

The consecutive-hit score depends on the offsets of the k matched items within the j-item set: if, among n consecutive matched items, the offset of each item exceeds that of the previous item by 1, the consecutive-hit score is n-1, and its value range is 0-5.

After the scoring module has processed each item in the learning record set, each record has three values: a basic score, a proportion score and a consecutive-hit score. For each specific field, the total score of a record is composed of the basic score multiplied by the basic-score base number of the field, the proportion score multiplied by the proportion-score base number of the field, and the value corresponding to the consecutive-hit score, and its value range is 0-1.
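The three scores and the total can be sketched as follows. The benchmark values in `POS_BENCHMARK`, the weights `w_basic`, `w_prop` and `w_hit`, and the choice to scale the consecutive-hit score by its maximum of 5 are all assumptions made for illustration; the application only states the value ranges and that the base numbers are field-specific.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (word, part of speech)

POS_BENCHMARK: Dict[str, float] = {"n": 1.0, "v": 0.8, "r": 0.5}  # hypothetical
MAX_HIT = 5  # upper bound of the consecutive-hit score


def weight(items: List[Item]) -> float:
    """Sum of part-of-speech benchmark scores over the given items."""
    return sum(POS_BENCHMARK.get(pos, 0.0) for _, pos in items)


def consecutive_hit(offsets: List[int]) -> int:
    """Longest run of adjacent offsets, minus one, capped at MAX_HIT."""
    best = run = 1 if offsets else 0
    for prev, cur in zip(offsets, offsets[1:]):
        run = run + 1 if cur - prev == 1 else 1
        best = max(best, run)
    return min(max(best - 1, 0), MAX_HIT)


def score(
    input_items: List[Item],
    record_items: List[Item],
    w_basic: float = 0.5,
    w_prop: float = 0.3,
    w_hit: float = 0.2,
) -> Dict[str, float]:
    """Score one learning record (j items) against the processed input (i items)."""
    wanted = set(input_items)
    # Offsets in the record of the k items that also occur in the input.
    offsets = [pos for pos, it in enumerate(record_items) if it in wanted]
    matched = weight([record_items[pos] for pos in offsets])
    input_w, record_w = weight(input_items), weight(record_items)
    # Capped at 1 in case the record repeats a matched word.
    basic = min(1.0, matched / input_w) if input_w else 0.0
    proportion = matched / record_w if record_w else 0.0
    hit = consecutive_hit(offsets)
    total = w_basic * basic + w_prop * proportion + w_hit * hit / MAX_HIT
    return {"basic": basic, "proportion": proportion, "hit": hit, "total": total}


user = [("your", "r"), ("name", "n")]
record = [("what", "r"), ("your", "r"), ("name", "n"), ("call", "v")]
result = score(user, record)
```

In this example both input items occur in the record at adjacent offsets, so the basic score is 1 and the consecutive-hit score is 1, while the proportion score is lower because the record contains two unmatched items.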

The scored learning records are sorted into a queue in descending order of total score, and the significance of each scored learning record is calculated. First, the average total score of the first n learning records is calculated, where n is determined by the characteristics of the specific field. The significance is the normalized total score of the learning record divided by that average, and its value range is greater than 0.

According to the sorted queue, the head item is taken for threshold judgment. The thresholds are determined by the characteristics of the specific field, and the values judged are mainly the total score and the significance: if both are greater than the threshold, the learning record is output; otherwise, a matching failure mark and a default reply are output.
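The sorting, significance and judgment logic can be sketched as below. `ScoredRecord`, `judge`, the default reply and the threshold values in the example are hypothetical; using two separate thresholds, one for the total score and one for the significance, is an assumption, since the text speaks only of "the threshold".

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

DEFAULT_REPLY = "Sorry, I do not have an answer for that."  # hypothetical


@dataclass
class ScoredRecord:
    answer: str
    total: float  # normalized total score in [0, 1]


def judge(
    records: List[ScoredRecord],
    top_n: int,
    total_threshold: float,
    significance_threshold: float,
) -> Tuple[str, bool]:
    """Return (reply, matched) for the head of the sorted queue."""
    if not records:
        return DEFAULT_REPLY, False
    queue = sorted(records, key=lambda r: r.total, reverse=True)
    # Average total score of the first n records (at least one).
    average = mean([r.total for r in queue[:max(1, top_n)]])
    head = queue[0]
    significance = head.total / average if average > 0 else 0.0
    if head.total > total_threshold and significance > significance_threshold:
        return head.answer, True
    return DEFAULT_REPLY, False  # matching failure: fall back to default reply


clear = [ScoredRecord("A", 0.9), ScoredRecord("B", 0.3), ScoredRecord("C", 0.2)]
close = [ScoredRecord("A", 0.9), ScoredRecord("B", 0.88), ScoredRecord("C", 0.87)]
clear_result = judge(clear, top_n=3, total_threshold=0.5, significance_threshold=1.2)
close_result = judge(close, top_n=3, total_threshold=0.5, significance_threshold=1.2)
```

The second example shows why significance is checked in addition to the total score: when several learning records score almost equally, the head item does not stand out, and the sketch falls back to the default reply.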

First, the key points of this application:

1) The application avoids the lengthy process, required when a general robot chat system is built for a specific field, of customizing Intent in detail, and can be used to quickly build a chat robot system for a specific field.

2) The application normalizes the total score calculated from the basic score, the proportion score and the consecutive-hit score, so that the performance of the system in a specific field can be clearly quantified, which makes it convenient to optimize for that field.

3) The application calculates the significance of the candidate output content during answer matching, ensuring that, among the learned content, the output both matches the user input closely and stands out from the other candidates.

Secondly, the technical effects of the application: the technical scheme of the application can be applied to a robot chat system for a specific field, and the corresponding output content can be matched quickly from the user input content. Applying the scheme has the following characteristics:

1) a robot chat system can be constructed quickly; the user does not need to define Intent item by item, and the system can be trained simply by providing a training set of corresponding questions and answers, which saves preparation time before development and use;

2) the system expresses its performance on a specific field in clear numerical terms, and the generated data report makes it simple to optimize for that field;

3) the system calculates the significance of the output content, and can effectively match output content that closely matches the user input even when the training set is large and contains similar training questions;

4) the system can run locally on the intelligent device or at a remote end, can work normally without a network, and has a high response speed;

5) the system can be used as a pre-filter for other robot chat systems; as a specific solution for a specific field, it has good compatibility with other similar systems.

According to another aspect of the present application, there is also provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements any of the methods described above when executing the computer program.

According to another aspect of the application, there is also provided a computer-readable storage medium, preferably a non-volatile readable storage medium, having stored therein a computer program which, when executed by a processor, implements any of the methods described above.

According to another aspect of the application, there is also provided a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform any of the methods described above.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, e.g., from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, and any combination thereof.

The above description covers only the preferred embodiments of the present application, but the scope of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A chatting method based on data content matching, applied to a robot, comprising the following steps:

a receiving step: receiving user input content;

a matching step: matching answer information corresponding to the data content;

the matching step comprises:

a traversing step: initiating a request to a storage module according to the data content, traversing the returned content, and acquiring a learning record set corresponding to the specified field;

the basic score is obtained by dividing the sum of the part-of-speech benchmark scores respectively corresponding to the k items of the record by the sum of the part-of-speech benchmark scores respectively corresponding to the j items of the item set, and its value range is 0 to 1;

a sorting step: queuing the multiple scored learning record sets in descending order of the total score of the learning record sets;

a judging step: according to the result of the queue sorting, taking the queue head item for threshold judgment, wherein the threshold is determined by the characteristics of the specific field, and the values to be judged mainly comprise a total score and a significance degree; if the total score and the significance degree are both greater than the threshold, the learning record is output; otherwise, a matching failure mark is output and a default reply is returned;

a storage step: storing the answer information.
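
As a non-limiting illustration, the basic score, the sorting step, and the judging step recited above can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the part-of-speech benchmark values, the names `POS_BENCHMARK`, `ScoredRecord`, `basic_score`, and `judge`, the use of separate thresholds for the total score and the significance degree, and the reading of "k items" as the items of the input item set that also occur in a learning record are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical part-of-speech benchmark scores; the source does not give values.
POS_BENCHMARK: Dict[str, float] = {"noun": 1.0, "verb": 0.8, "adjective": 0.5}
DEFAULT_POS_SCORE = 0.2  # assumed score for any other part of speech
DEFAULT_REPLY = "Sorry, I did not understand that."  # assumed default reply


@dataclass
class ScoredRecord:
    """A learning record that has already been scored (hypothetical structure)."""
    answer: str
    total_score: float
    significance: float


def basic_score(query_items: Dict[str, str], record_items: Set[str]) -> float:
    """Ratio of matched benchmark scores to all benchmark scores, in [0, 1].

    query_items maps each item of the input item set to its part of speech.
    Assumption: the numerator covers the items that also occur in the record.
    """
    denominator = sum(POS_BENCHMARK.get(pos, DEFAULT_POS_SCORE)
                      for pos in query_items.values())
    if denominator == 0:
        return 0.0
    numerator = sum(POS_BENCHMARK.get(pos, DEFAULT_POS_SCORE)
                    for item, pos in query_items.items()
                    if item in record_items)
    return numerator / denominator


def judge(records: List[ScoredRecord],
          score_threshold: float,
          significance_threshold: float) -> Tuple[bool, str]:
    """Sort by total score (descending) and threshold-check the queue head.

    Returns (matched, reply); matched is False when the default reply is used.
    """
    if not records:
        return False, DEFAULT_REPLY
    ranked = sorted(records, key=lambda r: r.total_score, reverse=True)
    head = ranked[0]
    if (head.total_score > score_threshold
            and head.significance > significance_threshold):
        return True, head.answer
    return False, DEFAULT_REPLY
```

In this sketch only the queue head item is examined, so a lower-ranked record is never returned even if the head fails the threshold judgment; the default reply is returned instead, mirroring the matching failure branch of the judging step.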

2. The method of claim 1, wherein the user input content is text or speech or an expression.

3. The method of claim 1, wherein the processing step comprises:

a conversion step: converting the user input content into a character string;

a generation step: generating a matching object of the valid i item set.

4. The method of claim 3, wherein prior to the converting step, the method further comprises a loading step: loading a word bank, wherein the word bank comprises a conventional word bank, a customized word bank for a specific field, and a part-of-speech table.

5. The method of claim 1, wherein prior to the receiving step, the method further comprises an aggregating step: aggregating the user input content and the answer information.

6. The method of claim 1, wherein after the outputting step, the method further comprises:

a screening step: screening the answer information;

7. A chat apparatus for use with a robot, the apparatus comprising:

a receiving module: configured to receive user input content;

the matching module includes:

a traversing module: configured to initiate a request to a storage module according to the data content, traverse the returned content, and acquire a learning record set corresponding to the specified field;

a sorting module: configured to queue the multiple scored learning record sets in descending order of the total score of the learning record sets;

a judging module: configured to take the queue head item for threshold judgment according to the result of the queue sorting, wherein the threshold is determined by the characteristics of the specific field, and the values to be judged mainly comprise a total score and a significance degree; if the total score and the significance degree are both greater than the threshold, the learning record is output; otherwise, a matching failure mark is output and a default reply is returned;

a storage module: configured to store the answer information;

8. The apparatus of claim 7, wherein the processing module comprises: