WO2019098409A1

WO2019098409A1 - Machine learning based data adding device for chatbot

Info

Publication number: WO2019098409A1
Application number: PCT/KR2017/012984
Authority: WO
Inventors: 유승재; 심병학
Original assignee: (주)페르소나시스템
Priority date: 2017-11-15
Filing date: 2017-11-16
Publication date: 2019-05-23
Also published as: KR20190055425A; KR102033175B1

Abstract

The present invention relates to a device for adding data required for driving a chatbot by machine learning. More particularly, the present invention relates to a device for autonomously generating and increasing the amount of question data as part of a method for increasing the recognition rate of the chatbot with respect to an input user question.

Description

Machine learning based chat bot data adding device

The present invention relates to an apparatus for adding data required for driving a chatbot by machine learning. More particularly, the present invention relates to a device for self-generating and increasing the amount of query data as part of a method for increasing the recognition rate of a chatbot with respect to an incoming user query.

Recently, the use of artificial intelligence (AI) has been increasing in various industries. The chatbot is one example, and chatbots are being used in a variety of industries, including social media, financial companies, and media companies. A chatbot is an AI-based program that provides a variety of information and solutions in a text-based dialogue format and is implemented in messaging apps. Accordingly, companies have begun to provide chat-based consulting services using chatbots in place of existing ARS-based consulting services. Consulting services using chatbots reduce labor costs and also achieve high satisfaction among customers who find ARS-type counseling services burdensome. Chatbots are valuable not only for consulting services but also as effective e-commerce and marketing channels.

Meanwhile, the most important thing in operating such a chatbot service is accurately recognizing the question and finding the corresponding answer. The amount of question data stored in the server operating the chatbot has a decisive influence on how accurately the chatbot can judge a question input by a user.

Inquiries received through a conventional chatbot input or a homepage are written as natural-language sentences rather than as specific word combinations. Because particles and verb conjugations can vary in many ways, a wide variety of questions can be expressed even when they have the same meaning. As a result, the amount of data required to clearly recognize a specific sentence becomes very large. However, the main way to increase the amount of question data has been to enter FAQ information manually. Accordingly, the conventional method is costly and time-consuming, and the amount of question data it can produce is limited and falls short of the amount required.

Meanwhile, a patent related to such chatbots is disclosed in Korean Patent Publication No. 10-2017-0103586 (artificial intelligence learning method and system using messenger service, and answer relay method and system using artificial intelligence).

The present invention can generate the question data required for raising the question recognition rate of the chatbot based on the machine learning. In addition, the present invention generates a question based on various data collected from an external server, so that not only a standard word but also a non-standard word type query can be generated.

The machine learning-based chatbot data adding apparatus according to an embodiment of the present invention includes a storage unit that stores additional questions generated from an analysis question and stores the dictionary data, rule data, and crawl data required for generating additional questions, and a control unit that generates additional questions from the analysis question based on machine learning. The control unit includes a data management unit and a chatbot providing unit. The data management unit generates, for an analysis question, an additional question having the same meaning or expression as the analysis question based on at least one of the dictionary data, grammar data, and crawl data, determines the similarity between the generated additional question and the analysis question, and stores the additional question in the storage unit if the similarity is within a predetermined range. The chatbot providing unit analyzes a user question input to the chatbot and performs analysis and answer matching for that user question based on the question data including the generated additional questions.

The present invention enables a server operator or a chatbot producing client to quickly generate and store additional questions having a meaning similar to a query as a reference even if the client does not input question data directly. Further, the present invention can generate additional questions using data crawled from web sites such as SNS as well as dictionary data, so that it can support not only standard words but also non-standard word queries.

FIG. 1 is a configuration diagram illustrating a network configuration performed by a chatbot device according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a configuration of a chatbot server according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating an operation performed by a configuration of a chatbot server according to an embodiment of the present invention.

FIG. 4 is a flowchart showing a procedure of an additional question generation operation based on dictionary data according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a procedure of an additional query generation operation based on crawl data according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating an operation of collecting question data from an external server according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail.

It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, but other elements may also be present in between. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.

The terminology used in this application is used only to describe a specific embodiment and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" are used to specify the presence of a feature, number, step, operation, element, component, or combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.

Also, in this specification, a device may be a general device (or object) connected to a gateway and applied to the Internet of Things (IoT). For example, the device may be a wireless pager, a smart phone, a tablet PC, a computer, a temperature sensor, a humidity sensor, a sound sensor, a motion sensor, a proximity sensor, a gas sensor, a heat sensor, a refrigerator, a lamp, a fire alarm, and the like. However, it is not limited thereto.

Also, the term "device" herein may be used interchangeably with "apparatus" or "equipment", and all three terms may be used with the same meaning.

Also, in this specification, a service may include various services that can be performed in a device. A service may include a service based on communication with a server or other device, and a service operable in the device. It is desirable that the service applied to the present disclosure be understood in broad terms to include various services that can be performed in the device in addition to the services described by way of example in this disclosure.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Hereinafter, the same reference numerals will be used for the same constituent elements in the drawings, and redundant explanations for the same constituent elements will be omitted.

Referring to FIG. 1, a chatbot device (chatbot server) 100 according to an embodiment of the present invention can communicate with an external server 200 and an external user device 300, and can collect necessary information from the external server 200 or the external user device 300.

The chatbot device 100 may refer to a server that manages chatbots or an electronic device that supports chatbot operations. Hereinafter, the chatbot device 100 is also referred to as the chatbot server.

The external server 200 may be, for example, a server of a business that uses a chatbot service. For example, an online business, a call center, or a public institution may use a chatbot service. Such organizations can request a chatbot service and use the chatbot to carry out consulting and sales work. Hereinafter, to distinguish these organizations from the users who request consultation with them, the organization is referred to as a 'chatbot service client' and the consultation requester as a 'chatbot user'.

The chatbot service client can link with the chatbot server 100 to provide counseling information that it has collected itself. The chatbot service client can also transmit to the chatbot server 100 the business information required for producing the chatbot and other options necessary for its production. The chatbot server 100 may periodically update the data required for operation of the chatbot based on the information received from the client.

The function of the server 100 will be described in more detail with reference to FIG. 2.

First, the server 100 refers to an object capable of transmitting and receiving data to and from at least one other device through a wired or wireless communication environment. According to various embodiments, the server 100 may further include a relay server and/or a client server.

Examples of the server 100 include a cloud server, an IP Multimedia Subsystem (IMS) server, a telephony application server, an Instant Messaging (IM) server, a Media Gateway Control Function (MGCF) server, and a Call Session Control Function (CSCF) server. The server 100 may also be implemented as a device capable of transmitting and receiving data, such as a personal computer (PC), a notebook computer, or a tablet PC.

Referring to FIG. 2, the server 100 may include a communication unit 210, a storage unit 220, and a control unit 230. The storage unit 220 may store question data 221, dictionary data 222, crawl data 223, answer data 224, and the like. The control unit 230 may be designed to include a chatbot providing unit 231 and a data management unit 232 according to an embodiment of the present invention.

The communication unit 210 can use a network for transmitting and receiving data between the client device 120 and the server 110, and the type of the network is not particularly limited. The network may be, for example, an IP (Internet Protocol) network that provides a large-capacity data transmission and reception service over the Internet Protocol, or an All-IP network that integrates different IP networks. The network may also be any one of a wired network, a WiBro (Wireless Broadband) network, a mobile communication network including WCDMA, HSDPA (High Speed Downlink Packet Access), and LTE (Long Term Evolution) networks, a 5G (fifth generation) mobile communication network, a satellite communication network, and a Wi-Fi network, or a combination of at least one of them.

The communication unit 210 may collect crawl data from the external server 200 according to an embodiment of the present invention.

The various information related to the production of the chatbot includes, for example, business information or personal information of the client producing the chatbot, the intended use of the chatbot (for example, consultation, entertainment and games, provision of professional information, provision of content, shopping and advertisement), selectable items (for example, clothing items, household appliances, foods, travel goods, etc.), and items corresponding to the selected items (for example, those applicable to a counseling chatbot of an apparel shopping company). Also, the communication unit 210 can receive from the client device 120 information on messenger interworking settings (e.g., information on which messenger, among various candidates, is selected for interworking).

Also, after the chatbot is completed according to the method requested by the client, the communication unit 210 can receive the text message input in the IM input window when the corresponding chatbot is implemented on the messenger. The communication unit 210 may transmit the response contents output as the result of the input text message to the device driven by the messenger. In addition, the communication unit 210 may communicate with another device or another server to collect information for updating the chatbot after generating the chatbot.

In addition, according to various embodiments of the present invention, the communication unit 210 may receive the content and chat information stored in the client device 120 for later use as update data.

The storage unit 220 may store commands and data required for operation of the chatbot, and may store additional questions generated by the data management unit 232 in the control unit as well as some data received from the external server and the external user device. Specifically, the storage unit 220 may store the question data 221, the dictionary data 222, the crawl data 223, the answer data 224, the rule data 225, and the like.

The question data 221 may be updated based on frequently asked question (FAQ) data received from external servers. In addition, the question data 221 may store additional questions generated by the operation of the data management unit 232 according to the embodiment of the present invention. The question data 221 can be classified and stored according to the type and similarity of each question item. For example, the questions stored in the question data 221 can be grouped into clusters, and coordinate values representing the similarity of each question item within its cluster can be stored together. An artificial-intelligence-based software engine such as TensorFlow can be used to set the coordinate values.

The dictionary data 222 may include information on words synonymous with a specific word. Synonyms here include near-synonyms, and the dictionary data may also include information on similar expressions, antonyms, and the like, even when these are not single words.

The crawl data 223 may be scrap information or a list of URLs collected from the web or the like in order to generate, for each question in the server 100, further additional questions within a certain similarity range. In principle, crawl data may be drawn from a vast range of data. However, according to an embodiment, data can be collected selectively by prioritizing a specific web page or SNS. The crawl data may be, for example, text information scraped from a specific page on the web, and may include data in various formats such as moving images and images. According to various embodiments, the web page from which the crawl data is collected may be determined based on the frequency with which a query previously stored in the storage unit 220 is searched. In addition, the crawl data may be collected from a general search engine, a video or image providing server that offers a large amount of data, and the like. In addition, if the key word in the query is a term used mainly in a particular industry, the crawl data can be collected by preferentially searching the home pages and related web pages of that industry.

The answer data 224 is data to be retrieved when an answer to a user question input to the chatbot is output. Since the answer data 224 needs to be matched with a specific question, each answer data item may include information on the questions it can be matched with. For example, the question and answer data items may be matched to each other by number, so that a question and an answer with the same number correspond (e.g., the first question matches the first answer). Also, the answer data 224 may include pre-stored default answer data. The default answer data may include, for example, "The question is not clear", "Do you want to connect to an agent?", and the like.
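The number-based matching and default answer described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `QUESTIONS`, `ANSWERS`, `DEFAULT_ANSWER`, and `lookup_answer`, the sample sentences, and the simple text normalization are all hypothetical and are not taken from the specification.

```python
# Hypothetical illustration: question and answer items share a number (key).
QUESTIONS = {
    1: "what are your opening hours",
    2: "how do i return an item",
}
ANSWERS = {
    1: "We are open from 9:00 to 18:00.",
    2: "Returns are accepted within 14 days.",
}
# Assumed default answer, combining the two example phrases in the text.
DEFAULT_ANSWER = "The question is not clear. Do you want to connect to an agent?"


def lookup_answer(user_question: str) -> str:
    """Return the answer whose number matches a stored question, else the default."""
    # Assumed normalization: trim, lowercase, drop a trailing question mark.
    normalized = user_question.strip().lower().rstrip("?")
    for number, question in QUESTIONS.items():
        if question == normalized:
            return ANSWERS.get(number, DEFAULT_ANSWER)
    return DEFAULT_ANSWER
```

In this sketch an exact string match stands in for the question analysis performed by the chatbot; a question with no stored counterpart falls through to the default answer.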

The answer data can also be updated. A chat agent can directly answer questions that the chatbot cannot answer. In this case, phrases that the agent frequently uses in answers can be stored as frequently used answers. To this end, the server 100 may suggest registering a specific phrase detected as a frequently used answer as answer data. If the chatbot service client approves, the phrase may be added to the answer data. In this way, the answer data can be updated with frequently used answer content.
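One possible way to detect frequently used agent answers and add them only after client approval is sketched below. The function names, the `min_count` threshold, and the lowercase normalization are assumptions made for illustration; the specification does not state how frequency is measured.

```python
from collections import Counter


def suggest_frequent_answers(agent_replies, min_count=3):
    """Return agent phrases that occur at least `min_count` times, most frequent first.

    `min_count` is an assumed threshold, not a value from the specification.
    """
    counts = Counter(reply.strip().lower() for reply in agent_replies)
    return [phrase for phrase, n in counts.most_common() if n >= min_count]


def add_approved_answers(answer_data, suggestions, approved_by_client):
    """Append only the suggested phrases that the chatbot service client approved."""
    for phrase in suggestions:
        if phrase in approved_by_client and phrase not in answer_data:
            answer_data.append(phrase)
    return answer_data
```

Keeping suggestion and approval as separate steps mirrors the text: the server only proposes a phrase, and the phrase enters the answer data once the client has approved it.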

The rule data 225 may refer to grammar and morphological analysis data required to perform natural language processing. The grammar data may be used to detect grammatical errors in additional questions generated by word substitution using dictionary data and in additional questions generated by other methods.

In addition, the unanswered data (not shown) may be data that collects queries for which no corresponding answer is found among the queries of the chatbot user.

In addition, chat history data (not shown) may mean the chat history between a chatbot user and the chatbot, and the conversation history between a chatbot user and an agent. If the chatbot fails to output an appropriate answer to the question of the chatbot user, the agent can communicate directly with the chatbot user in the instant messenger chat window. The storage unit 220 can store the conversation history between the agent and the chatbot user at this time. The chat history data of chatbot users and chatbots can be used to detect the questions that chatbot users ask most frequently.

In addition to the above-mentioned data, the storage unit 220 may store data in various formats required for chatbot operation and update.

The control unit 230 can control signals transmitted from the communication unit 210 of the server 100 and process data required for the overall operation of the server 100. The control unit 230 may include a chatbot providing unit 231 and a data management unit 232 according to an embodiment of the present invention.

The chatbot providing unit 231 may include a filtering unit 2311, a matching unit 2312, and an answer output unit 2313. The filtering unit 2311 may filter a user query input through a chat app.

The data management unit 232 can perform learning, analysis and update of the chatbot. For example, the data management unit 232 can detect frequently used question lists and answer lists based on the conversation contents performed by the chatbots, and store them in the storage unit 220.

The chatbot providing unit 231 and the data management unit 232 will be described in detail with reference to FIG. 3.

The chatbot providing unit 231 and the data management unit 232 shown in FIG. 3 belong to the control unit 230 and, together with the storage unit 220 shown in the lower part of FIG. 3, are all part of the server 110. The vertical line and the speech bubbles shown in the left area represent a chat between the chatbot user and the chatbot, with time progressing from top to bottom. As the speech bubbles on the left show, when the chatbot user inputs a specific question, the chatbot providing unit 231 filters and matches the input question, computes the corresponding answer, and outputs the computed answer to the chat screen of the messenger.

The function of the chatbot providing unit 231 shown in FIG. 3 will be described in more detail as follows. The chatbot providing unit 231 may include a filtering unit 2311, a matching unit 2312, and an answer output unit 2313.

For example, the filtering unit 2311 can classify the user's question into a question that can be immediately answered according to a predetermined criterion, a question requiring analysis, and the like. For example, the filtering unit 2311 can determine whether the input query is a question about security, a question involving a profanity, an inappropriate word for a query, or an inappropriate query for an answer. The criterion for the query term may be established based on the stored data in the storage unit 220 of the server.

For example, the filtering unit 2311 may classify the query as unanswerable (profanity) if the user query contains a word matching the previously stored profanity data. The filtering unit 2311 can then immediately signal the answer output unit 2313 to output an answer corresponding to the profanity question (e.g., "That is not a proper question").

In addition, the filtering unit 2311 can immediately output an answer through the answer output unit 2313 without a separate question analysis, when a question existing in the pre-stored question data 221 is inputted.

On the other hand, when the filtering unit 2311 determines that the input query requires analysis, it can hand the query to the matching unit 2312 for processing. The matching unit 2312 may process user queries that are not stored in the question data 221. In this case, the input user question needs to be analyzed. Available analysis methods include natural language processing by morphological analysis (system rule) and AI-based analysis (AI rule), and one of these methods can be used to analyze the question. If the filtering unit 2311 corresponds to filtering input in the emotional domain, the matching unit 2312 corresponds to matching input in the expert domain. Because the matching unit 2312 can use the web during question analysis, it can support multilingual translation using a web-based translation system. Since such question analysis methods are commonly used in AI-based chatbots, a detailed description is omitted.
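The routing performed by the filtering unit, as described in the preceding paragraphs, can be sketched roughly as follows. The `Route` enum, the placeholder word list, the placeholder stored question, and the whitespace tokenization are all assumptions for illustration, not the actual criteria stored in the storage unit 220.

```python
from enum import Enum


class Route(Enum):
    REJECT = "reject"            # inappropriate question, answered immediately
    DIRECT_ANSWER = "direct"     # question already present in the stored question data
    NEEDS_ANALYSIS = "analysis"  # handed over to the matching unit


# Placeholder data standing in for the stored profanity data and question data.
PROFANITY = {"darn", "heck"}
STORED_QUESTIONS = {"where is my order"}


def filter_question(user_question: str) -> Route:
    """Classify a user question in the order: profanity, stored question, analysis."""
    text = user_question.strip().lower().rstrip("?")
    if any(word in PROFANITY for word in text.split()):
        return Route.REJECT
    if text in STORED_QUESTIONS:
        return Route.DIRECT_ANSWER
    return Route.NEEDS_ANALYSIS
```

Only questions routed to `NEEDS_ANALYSIS` would require the morphological or AI-based analysis carried out by the matching unit; the other two routes can be answered without it.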

After the question is analyzed, the answer output unit 2313 matches an answer to the question, and the computed answer content can be displayed on the messenger chat screen. In this way, the chatbot user can carry on various conversations with the chatbot, and the chatbot can record the conversation history with the chatbot user in the storage unit 220. At this time, the conversation contents recorded in the storage unit 220 by the server 100 may be extracted and recorded as question data and answer data, and the server 100 may separately store the questions not answered by the chatbot.

The data management unit 232 may collect the contents of conversations between the generated chatbot and chatbot users while in a watching mode, and may store them separately in the storage unit 220. Also, the data management unit 232 may separately store, as unanswered data, questions that the chatbot failed to answer. The data management unit 232 can control machine learning based on the data stored in the storage unit 220 and the data downloaded through communication with web servers. Machine learning refers to a method of causing a computer to create a new algorithm on its own based on data. Deep learning, as one area of machine learning, may also be performed by the data management unit 232. The data management unit 232 may generate new data based on the machine learning and the deep learning.

The data management unit 232 can collect various kinds of information. For example, the data management unit 232 may automatically analyze YouTube videos to find a specific image requested by the user. In addition, the data management unit 232 may process information received from servers of third-party companies and organizations to extract frequently asked question (FAQ) information. In addition, the data management unit 232 may perform image determination and sound determination using content stored in the storage unit 220 and content published on the web, and may store the resulting analysis data back in the storage unit 220.

The data management unit 232 may extract information related to a rule for generating question data. The data management unit 232 can generate an additional question having the same meaning as a specific input question (the analysis question). The analysis question that serves as the source for generating additional questions may be data input by the operator of the server 100, or question data stored in the storage unit 220 may be automatically set as the analysis question when it satisfies a specific condition. For example, when the data management unit 232 determines that new unanswered data has been stored in the storage unit 220, it may set the unanswered data as the analysis question and then generate similar additional questions. Alternatively, when unanswered data is newly stored in the storage unit 220, the data management unit 232 may select, from among the question data that shares at least one common word (including synonyms) with the unanswered data, the question data with the smallest cluster (for example, if questions A and B both contain a common word and the amount of similar question data is 100 for A and 200 for B, A is given priority and selected as the analysis question). In addition, the data management unit 232 can select an analysis question and generate additional questions on its own in response to various other conditions.
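The smallest-cluster selection in the A/B example above can be sketched as follows. The function name `select_analysis_question`, the `cluster_sizes` mapping, the `synonyms` dictionary, and the whitespace tokenization are hypothetical simplifications, not the method defined by the specification.

```python
def select_analysis_question(unanswered, cluster_sizes, synonyms=None):
    """Pick the stored question sharing a word with `unanswered` whose cluster is smallest.

    `cluster_sizes` maps a stored question to the number of similar questions
    already in its cluster. Returns None when no stored question shares a word.
    """
    synonyms = synonyms or {}

    def expand(words):
        # Include synonyms so that a "common word" also covers synonymous words.
        expanded = set(words)
        for word in words:
            expanded.update(synonyms.get(word, ()))
        return expanded

    target = expand(unanswered.lower().split())
    candidates = [
        question
        for question in cluster_sizes
        if target & expand(question.lower().split())
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda q: cluster_sizes[q])
```

With a cluster of 100 similar questions for A and 200 for B, and both sharing a word with the unanswered question, this sketch returns A, which matches the priority rule in the text: the question with the least similar data gains the most from generating additional questions.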

The data management unit 232 may generate an additional question whose similarity to the analysis question is within a certain range. The data management unit 232 can first extract words from the analysis question. The data management unit 232 may replace the extracted words with synonyms or near-synonyms, which may be determined based on the dictionary data 222 in the storage unit 220. The dictionary data may store not only synonyms and near-synonyms but also antonyms and frequently misused expressions. Further, the dictionary data may hold information about the degree of similarity between words as coordinate data (for example, assigning the closest coordinates to words with the same meaning and more distant coordinates as the meaning diverges). The operation of measuring and storing the degree of similarity by setting the coordinate values may be performed by the learning module 2321 in the data management unit 232.

The data management unit 232 may generate a new question by replacing the extracted word with a synonym. Functions related to generation of the additional question can be performed in the analysis module 2322 in the data management unit 232. The analysis module 2322 may perform question analysis and rule analysis. Once an additional question has been generated by word substitution, the data management unit 232 can check it for grammatical errors. If an error is found, the data management unit 232 may change sentence components other than the substituted word accordingly, and it may add sentences in which no grammatical error is found to the question data 221.
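A minimal sketch of generating additional questions by synonym substitution, followed by a grammar check, might look like this. The `dictionary_data` mapping and both function names are hypothetical, and `passes_grammar_check` is only a placeholder (it rejects immediately repeated words) standing in for the check based on the rule data 225.

```python
def generate_additional_questions(analysis_question, dictionary_data):
    """Return variants of the question with one word replaced by a synonym."""
    words = analysis_question.split()
    variants = []
    for index, word in enumerate(words):
        for synonym in dictionary_data.get(word.lower(), []):
            candidate = words[:index] + [synonym] + words[index + 1:]
            variants.append(" ".join(candidate))
    return variants


def passes_grammar_check(sentence):
    # Placeholder for the rule-data-based check: it only rejects
    # immediately repeated words such as "can can".
    tokens = sentence.lower().split()
    return all(a != b for a, b in zip(tokens, tokens[1:]))


def additional_questions_to_store(analysis_question, dictionary_data):
    """Keep only the generated variants that pass the (placeholder) grammar check."""
    return [
        q for q in generate_additional_questions(analysis_question, dictionary_data)
        if passes_grammar_check(q)
    ]
```

The sketch replaces one word at a time. Correcting the remaining sentence components after a substitution, as the text describes, would need real morphological analysis and is not attempted here.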

The data management unit 232 may update the rules by means of the update module 2323. The rules here include grammar data. That is, the update module 2323 can update the rule data, including the grammar, based on external web servers, SNS, and the data stored in the storage unit 220. The update module 2323 can extract common grammatical patterns by analyzing sentences written in papers, books, SNS, and the like through big data analysis. In particular, grammar rules of non-standard language can be extracted based on posts on SNS and community websites.

The data management unit 232 can determine a grammatical error based on grammar data in the rule data updated by the update module 2323 and stored in the rule data 225.

The data management unit 232 may generate an additional question based on crawl data by means of the analysis module 2322. The data management unit 232 extracts some words from the analysis question, determines where those words are searched for or the industry in which they are mainly used, and can preferentially select web sites according to that industry information (for example, clothing shopping, a public agency, etc.). Then, the data management unit 232 can collect crawl data (scrap information, URL list, etc.) from the preferentially selected web sites. Here, crawling means extracting valid data from a vast amount of data. Therefore, the crawl data may include not only entire web pages (including SNS) and web site addresses, but also sentences that, among the many sentences on a web page, have a format similar to the analysis question. The crawl data may include sentences containing words synonymous with words in the analysis question. Also, the crawl data may include sound, image, and moving picture information related to the meaning of the core word in the analysis question.
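The preferential selection of web sites by industry could be sketched as a simple ordering step. The source records, their `industries` tags, and the function name are hypothetical; how the specification actually determines industry information is not stated, and no real crawling is performed in this sketch.

```python
def prioritize_sources(sources, industry):
    """Order crawl sources so that those tagged with `industry` come first.

    Each source is a dict with a "url" string and an "industries" set.
    Ties are broken by URL so that the ordering is deterministic.
    """
    return sorted(sources, key=lambda s: (industry not in s["industries"], s["url"]))
```

Crawling would then visit the sources in the returned order, so sites of the relevant industry are collected before general ones.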

The collected crawl data may be used by the data management unit 232 to generate additional questions. Whereas the dictionary data 222 is used to generate additional questions in the standard language domain, the crawl data 223 can be used to generate additional questions in the non-standard language domain that are similar to the analysis question. That is, the data management unit 232 may generate non-standard-language additional questions for the analysis question using the crawl data 223. Accordingly, the data management unit 232 can generate additional questions in both the standard language domain and the non-standard language domain from the analysis question. In addition, the data management unit 232 can check the additional questions generated from the crawl data 223 for grammatical errors and, if there is a grammatical error, correct it.

In addition, the data management unit 232 may perform similarity determination on the additional questions generated according to various embodiments. At this time, the data management unit 232 can perform artificial intelligence data processing using a machine learning engine (e.g., TensorFlow). The data management unit 232 can analyze the similarity of each sentence based on crawl data and various previously stored data, and can quantify the degree of similarity between sentences (e.g., by setting coordinate values). For example, given two or more question data items, the data management unit 232 may set the separation distance to level 1 when the words are the same and only the verb conjugation differs, so that deriving the same answer poses no problem, and to level 2 when the sentences differ more than that, including in verb usage, but are still similar enough to derive the same answer. In this case, the level-1 distance is shorter than the level-2 distance. The coordinate values given to each sentence by the data management unit 232 may include not only coordinate values on a two-dimensional plane but also three-dimensional coordinate values having X, Y, and Z values.

The coordinate values assigned to each sentence by the data management unit 232 are not fixed; they can change as the data is updated and according to the relationship with newly added sentences. As additional questions are continuously generated, the amount of question data in the storage unit 220 increases, so that the coordinate value of each question sentence can be adjusted more finely.

The data management unit 232 can set clusters based on the coordinate value information assigned to each question sentence and on the user's settings (for example, the user manually enters CS data to group specific sentences together and designate them as having the same meaning).

The data management unit 232 can determine the degree of similarity between the questions based on the coordinate value assigned to each of the question data 221.
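The coordinate-based similarity described above can be illustrated with a minimal sketch. It assumes that similarity is measured as the Euclidean distance between sentence coordinates; the source does not state the distance measure or how the coordinates are computed, so the function names `distance` and `similar_questions` and the coordinate values below are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two sentence coordinates (2-D or 3-D)."""
    if len(a) != len(b):
        raise ValueError("coordinates must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_questions(question, coords, max_distance):
    """Return the other questions whose coordinates lie within max_distance."""
    origin = coords[question]
    return [
        other
        for other, point in coords.items()
        if other != question and distance(origin, point) <= max_distance
    ]

# Hypothetical (X, Y, Z) coordinates, chosen only for illustration.
coords = {
    "Where is my order?": (0.0, 0.0, 0.0),
    "Where's my order at?": (0.3, 0.4, 0.0),
    "How do I return an item?": (3.0, 4.0, 0.0),
}

near = similar_questions("Where is my order?", coords, max_distance=1.0)
```

In this sketch a smaller distance stands for a higher degree of similarity, so questions grouped within a small radius could be treated as one cluster.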

Referring to FIG. 4, the data management unit 232 of the server 100 according to an embodiment of the present invention may perform operation 401 of determining that an analysis question has been input. The analysis question may be data input directly by the operator of the server 100, for example by typing it in or by loading a previously created data file. Alternatively, the analysis question may be set to question items that satisfy predetermined conditions among the question data previously stored in the storage unit 220. In this case, the data management unit 232 may perform an operation of selecting an analysis question from the question data stored in the storage unit 220. If a question item satisfying the predetermined condition is detected, the data management unit 232 may select that item as the analysis question and then perform the operations following operation 401.

The data management unit 232 may perform operation 405 of generating an additional question by extracting a word from the analysis question and replacing the extracted word with a synonym. Here, synonyms include words of identical meaning, words of similar meaning, and the like, and the synonym information may be found in the dictionary data 222 stored in the storage unit 220. According to various embodiments, if it is determined that synonym information corresponding to the extracted word does not exist in the storage unit 220, the data management unit 232 may access the web to search for and use dictionary information such as synonyms.

However, when generating additional questions using the dictionary data, the data management unit 232 may assign weights to the core words among the extracted words and vary the word substitution probability according to the weight of each word, generating the additional question with a lower substitution probability for weighted core words. The core words may be set based on the industry classification of the company using the chatbot (for example, words that appear with more than a preset probability on homepages in that industry). For example, in the case of a chatbot used by a shopping mall company, related words such as 'order', 'delivery', and 'exchange' may be set as core words in the question data of that chatbot.
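The weighted substitution of operation 405 can be sketched as follows. This is an illustrative sketch only: the source does not give a formula, so the rule "substitution probability = base probability × (1 − core weight)", the function name `generate_variant`, and the sample synonym table are assumptions.

```python
import random

def generate_variant(words, synonyms, core_weights, base_prob=0.8, rng=None):
    """Replace words with synonyms, less often for weighted core words.

    words        -- tokens extracted from the analysis question
    synonyms     -- hypothetical stand-in for the dictionary data: word -> synonyms
    core_weights -- word -> weight in [0, 1]; higher means "more core"
    """
    rng = rng or random.Random()
    variant = []
    for word in words:
        candidates = synonyms.get(word, [])
        # Assumed rule: a higher core weight lowers the substitution probability.
        prob = base_prob * (1.0 - core_weights.get(word, 0.0))
        if candidates and rng.random() < prob:
            variant.append(rng.choice(candidates))
        else:
            variant.append(word)
    return variant

words = ["when", "is", "delivery"]
synonyms = {"when": ["what time"], "delivery": ["shipping"]}
core_weights = {"delivery": 1.0}  # core word for a shopping mall chatbot

variant = generate_variant(words, synonyms, core_weights, base_prob=1.0)
```

With a core weight of 1.0 the word 'delivery' is never replaced, while the unweighted word 'when' is replaced whenever a synonym exists and `base_prob` is 1.0. Intermediate weights yield intermediate probabilities.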

The data management unit 232 may then perform operation 411 of determining the grammar for the additional question generated in operation 405. The grammar-related information may be included in the rule data 225 stored in the storage unit 220. Operation 411 may correspond to determining the type of grammar to be applied to the generated additional question.

Thereafter, the data management unit 232 may perform operation 415 of determining whether there is a grammatical error in the generated additional question. If there is no grammatical error, the data management unit 232 may proceed to operation 425. If it is determined that there is a grammatical error, however, the data management unit 232 may perform operation 421 of correcting the grammatical error of the additional question based on the grammar information.

Once the additional question is free of grammatical errors (directly after operation 415, or after the correction in operation 421), the data management unit 232 may perform operation 425 of determining whether the similarity between the generated additional question and the input analysis question is within a predetermined range. If the similarity is within the predetermined range, the data management unit 232 may perform operation 431 of storing the additional question as new question data in the storage unit 220. If the result of operation 425 is that the similarity is outside the predetermined range, the data management unit 232 may perform operation 435 of deleting the generated additional question.
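Operations 415 through 435 can be summarized in a short sketch. The assumptions are significant: `difflib.SequenceMatcher` is used here only as a stand-in for the similarity determination (the source describes a machine learning engine), the grammar check and correction are passed in as hypothetical callables, and the range bounds `low` and `high` are illustrative values.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not the method of the source."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def process_candidate(analysis_q, candidate, store, has_error, correct,
                      low=0.6, high=0.99):
    """Run one additional question through operations 415 to 435."""
    if has_error(candidate):               # operation 415
        candidate = correct(candidate)     # operation 421
    score = similarity(analysis_q, candidate)  # operation 425
    if low <= score <= high:
        store.append(candidate)            # operation 431: store
        return True
    return False                           # operation 435: delete

# Hypothetical grammar rule: a doubled space counts as an error.
has_error = lambda s: "  " in s
correct = lambda s: " ".join(s.split())

store = []
kept = process_candidate("where is my order", "where is  my orders",
                         store, has_error, correct)
```

The upper bound below 1.0 is a design choice of this sketch: it discards candidates identical to the analysis question, which add no new question data.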

The data management unit 232 can perform operation 501 of determining that an analysis question has been input. The analysis question in operation 501 may be entered or selected in the same manner as in operation 401. According to various embodiments, however, the analysis question may correspond to an additional question stored in operation 431 of FIG. 4. That is, the operations of FIG. 5 according to various embodiments may be performed following the operations shown in FIG. 4.

After operation 501, the data management unit 232 may extract a word from the input analysis question and perform operation 505 of connecting to web pages (including SNS) related to that word. The data management unit 232 may then perform operation 511 of collecting crawl data from the related web pages. Next, the data management unit 232 can perform operation 515 of analyzing non-standard words and grammar from the crawl data. According to various embodiments, the crawl data collected by the data management unit 232 may be stored temporarily or permanently in the storage unit 220, and the synonym and grammar information may be updated using the crawl data.

The data management unit 232 may then perform operation 521 of generating an additional question based on the words and grammar analyzed from the crawl data. After operation 521, the data management unit 232 may perform operation 525 of determining whether the similarity between the generated additional question and the analysis question is within a predetermined range. If the result of operation 525 is that the similarity is within the predetermined range, the data management unit 232 may perform operation 531 of storing the additional question. If the similarity is not within the predetermined range, the data management unit 232 may perform operation 535 of deleting the generated additional question.
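The filtering stage of this flow (operations 525 to 535) can be sketched under strong simplifications. The sketch assumes the crawl data has already been collected as a list of sentences, treats each crawled sentence as a candidate additional question, and uses Jaccard word overlap as a stand-in similarity measure. None of these choices come from the source, and the names `jaccard` and `select_from_crawl` are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set of a sentence."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity in [0, 1] (stand-in measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def select_from_crawl(analysis_q, crawled, low=0.4, high=0.99):
    """Split crawled sentences into stored (531) and deleted (535) ones."""
    kept, dropped = [], []
    for sentence in crawled:
        score = jaccard(analysis_q, sentence)   # operation 525
        if low <= score <= high:
            kept.append(sentence)
        else:
            dropped.append(sentence)
    return kept, dropped

# Hypothetical crawl data; no network access is performed in this sketch.
crawled = ["where is my order lol", "nice weather today", "Where is my order?"]
kept, dropped = select_from_crawl("Where is my order?", crawled)
```

A fuller implementation would first derive non-standard synonyms and grammar from the crawl data (operation 515) and generate new sentences from them (operation 521) rather than reusing crawled sentences directly.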

Meanwhile, the questions and analysis questions added to the storage unit 220 according to various embodiments of the present invention may be input in various ways. The operation of interworking with an external server to receive additional questions will be described with reference to FIG.

Reference numerals 610, 620, and 630 illustrate the receipt of a question or complaint from a consumer or a complainant. First, 610 shows questions being received from customers on the homepage of an online business company. The customer questions recorded on the company homepage shown at 610 can be transmitted to the server 100 of the present invention both when they are answered by the chatbot and when they are answered directly by a counselor. To this end, the server 100 of the present invention may be affiliated with the servers of the companies and organizations shown at 610, 620, and 630, or those companies and organizations may be customers who use the chatbot service provided by the server 100 of the present invention. Therefore, according to various embodiments, the servers of the companies and institutions using the chatbot service can automatically transmit received question items to the server 100 of the present invention.

The question information collected from the servers of the external vendors 610, 620, and 630 may be transmitted to the chat server 100 of the present invention. The server 100 may perform operations to collect customer information 645, performance management data 647, and FAQ (Frequently Asked Questions) management data 649 through step 641 of collecting real-time information and step 643 of data analysis. Then, the server 100 may store the customer information 645, the performance management data 647, and the FAQ management data 649 in the storage unit 220 and utilize them when generating a new additional question.

Then, based on the collected information, the server 100 may use the common question type information for each business field and the customer information to select frequently asked questions according to the customer's age, sex, and question registration method, as well as frequently asked questions according to the business type of the company using the chatbot. Thereafter, the server 100 may select the selected frequently asked questions as analysis questions in order to generate additional questions for each type.
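The selection of frequently asked questions by customer attributes can be sketched as a grouped frequency count. The record fields (`age_group`, `sex`, `channel`, `question`) and the function name `frequent_questions` are hypothetical; the source does not define a data layout.

```python
from collections import Counter, defaultdict

def frequent_questions(records, keys=("age_group", "sex", "channel"), top_n=1):
    """Return the top_n most frequent questions for each attribute group."""
    groups = defaultdict(Counter)
    for record in records:
        group = tuple(record[key] for key in keys)
        groups[group][record["question"]] += 1
    return {
        group: [question for question, _ in counts.most_common(top_n)]
        for group, counts in groups.items()
    }

# Hypothetical question log received from external servers.
records = [
    {"age_group": "20s", "sex": "F", "channel": "web", "question": "Where is my order?"},
    {"age_group": "20s", "sex": "F", "channel": "web", "question": "Where is my order?"},
    {"age_group": "20s", "sex": "F", "channel": "web", "question": "Can I exchange this?"},
    {"age_group": "40s", "sex": "M", "channel": "phone", "question": "How do I cancel?"},
]

faq = frequent_questions(records)
```

Each question selected this way could then be fed into the additional-question generation as an analysis question.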

In addition, since the server 100 serves as a chatbot providing server, it can store the history of chats performed by the chatbot. Accordingly, the server 100 can store not only the chat history between the chatbot and the chatbot user but also the history of chats in which an agent answers in place of the chatbot when the chatbot fails to respond. In addition, the server 100 may store the conversation history between the agent and the chatbot user that takes place while the chatbot is in a watching mode (a state in which the chatbot is set not to answer). At this time, the server 100 may store the agent's response to each user question in the storage unit as conversation data based on the conversation history. The server 100 may update the question data and answer data based on the stored chat history data, and may store association weights for at least two consecutive questions in the chat data.
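The watching mode can be pictured as a small state machine. This is a sketch under assumptions: the class name `ChatSession`, its methods, and the exact-match answer lookup are all hypothetical and stand in for the chatbot's actual analysis and response matching.

```python
class ChatSession:
    """Minimal model of a chat in which an agent can take over."""

    def __init__(self, answers):
        self.answers = answers          # hypothetical answer data: question -> answer
        self.watching = False           # True while the chatbot must stay silent
        self.conversation_data = []     # (user question, agent response) pairs

    def on_user_question(self, question):
        """Return the chatbot's answer, or None if the chatbot stays silent."""
        if self.watching:
            return None
        answer = self.answers.get(question)
        if answer is None:
            self.watching = True        # no answer found: an agent is invited
        return answer

    def on_agent_response(self, question, response):
        """Store what the agent answered while the chatbot was watching."""
        if not self.watching:
            raise RuntimeError("agent is not in the conversation")
        self.conversation_data.append((question, response))

    def on_agent_leave(self):
        self.watching = False           # chatbot resumes answering

session = ChatSession({"Where is my order?": "It ships tomorrow."})
first = session.on_user_question("Where is my order?")
second = session.on_user_question("Can I change the address?")
session.on_agent_response("Can I change the address?", "Yes, until it ships.")
silent = session.on_user_question("Where is my order?")
session.on_agent_leave()
resumed = session.on_user_question("Where is my order?")
```

The stored pairs in `conversation_data` correspond to the conversation data from which the question data and answer data could later be updated.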

According to various embodiments, the server 100 may assign association weights to questions for which the chat history input to the chatbot shows that the agent selected the same answer. In addition, the server 100 may assign coordinate data to each item of question data stored in the storage unit 220 and set the coordinate data of at least two questions to which an association weight is assigned close to each other, so that the degree of similarity between the questions can be determined. The degree of similarity can later be used to grasp the meaning of user questions input to the chatbot.
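One way to picture "setting coordinate data close to each other" is to move two associated questions toward their midpoint in proportion to the association weight. The update rule and the function name `pull_together` are assumptions made for illustration; the source does not specify how the coordinates are adjusted.

```python
def pull_together(a, b, weight):
    """Move coordinates a and b toward their midpoint.

    weight is an association weight in [0, 1]: 0 leaves the points
    unchanged, 1 places both on the midpoint.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(a) != len(b):
        raise ValueError("coordinates must have the same dimension")
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    new_a = tuple(x + weight * (m - x) for x, m in zip(a, midpoint))
    new_b = tuple(y + weight * (m - y) for y, m in zip(b, midpoint))
    return new_a, new_b

# Two questions for which the agent selected the same answer.
new_a, new_b = pull_together((0.0, 0.0), (4.0, 0.0), weight=0.5)
```

In this example the two points move from a distance of 4 to a distance of 2, so a subsequent coordinate-based similarity check would treat the questions as more similar.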

Although the present invention has been described in detail with reference to the above examples, those skilled in the art will be able to make adaptations, changes, and modifications to these examples without departing from the scope of the present invention. In other words, to achieve the intended effect of the present invention, it is not necessary to include all of the functional blocks shown in the drawings separately or to follow all of the steps shown in the drawings in the order shown; embodiments that do not do so may still fall within the scope of the present invention.

Claims

1. A machine learning based data adding device for a chatbot, the device performing a chatbot function and comprising:

a storage unit for storing additional questions generated from an analysis question, and for storing the dictionary data, rule data, and crawl data required for generating the additional questions;

a communication unit for receiving, from a web server, information required for the machine learning used to generate an additional question from the analysis question; and

a control unit for generating an additional question from the analysis question based on the machine learning,

wherein the control unit comprises:

a data management unit that generates an additional question having the same meaning or expression as the analysis question based on at least one of dictionary data, grammar data, and crawl data, and stores the generated additional question in the storage unit if the degree of similarity is within a predetermined range; and

a chatbot providing unit that determines whether to analyze a user question input to the chatbot, and performs analysis and response matching for the user question to be analyzed based on question data including the generated additional question.
2. The device according to claim 1, wherein the data management unit extracts a word from the input analysis question, determines based on the dictionary data whether a synonym of the extracted word exists, and, if the synonym exists, generates an additional question by replacing the extracted word with the synonym.
3. The device according to claim 2, wherein the data management unit assigns a weight to a core word among the extracted words, varies the word substitution probability according to the weight of each word, and generates the additional question with a lower word substitution probability set for the weighted core word.
4. The device according to claim 1, wherein the data management unit collects crawl data by interworking with web pages retrieved using at least one word extracted from the input analysis question, extracts non-standard synonyms and non-standard grammar by performing big data analysis on the collected crawl data, and generates an additional question from the analysis question based on the non-standard synonyms and the non-standard grammar.
5. The device according to claim 1, wherein the data management unit determines whether there is a grammatical error in the generated additional question, corrects the grammatical error if there is one, performs similarity determination on both the additional questions whose grammatical errors have been corrected and the additional questions without grammatical errors, and stores the additional questions whose similarity is within a predetermined range.
6. The device according to claim 1, wherein, if no answer corresponding to the user question input to the chatbot is detected, the chatbot providing unit invites an agent to the dialog in which the chat with the chatbot is conducted, and keeps the chatbot in a watching mode, in which the chatbot does not output a response, until the agent leaves the dialog.
7. The device according to claim 6, wherein the chatbot providing unit stores, in the storage unit as conversation data, the agent's response to each user question made while the chatbot maintains the watching mode, and wherein the data management unit updates the question data and the answer data based on the stored conversation data, and assigns and stores association weights for at least two consecutive questions in the conversation data.
8. The device according to claim 1, wherein the data management unit interworks with external servers of a plurality of online business companies, public institutions, and call centers, generates question data based on inquiries received from the external servers, and analyzes the inquiries to classify and generate customer information and performance management information according to the response taken, thereby updating the question data and the answer data.
9. The device according to claim 8, wherein the data management unit selects, based on the common question type information for each business field and the customer information, frequently asked questions according to the customer's age, sex, and question registration method and frequently asked questions according to the business type of the company using the chatbot, stores them in the question data, and selects the selected frequently asked questions as analysis questions to generate additional questions for each type.
10. The device according to claim 1, wherein the data management unit assigns association weights to questions for which the conversation history input to the chatbot shows that an agent selected the same answer, assigns coordinate data to each item of question data stored in the storage unit, sets the coordinate data of at least two questions to which the association weight is assigned close to each other, and determines the similarity based on the coordinate data.