CN109918486B - Corpus construction method and device for intelligent customer service, computer equipment and storage medium - Google Patents

Info

Publication number
CN109918486B
Authority
CN
China
Prior art keywords
corpus
data
response
model
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910065779.8A
Other languages
Chinese (zh)
Other versions
CN109918486A (en)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910065779.8A priority Critical patent/CN109918486B/en
Publication of CN109918486A publication Critical patent/CN109918486A/en
Priority to PCT/CN2019/117698 priority patent/WO2020151318A1/en
Application granted granted Critical
Publication of CN109918486B publication Critical patent/CN109918486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a corpus construction method and device for intelligent customer service, computer equipment and a storage medium. The method comprises the following steps: acquiring a subject term of the corpus data to be constructed; inputting the subject term into a preset question generation model, and acquiring a question list output by the question generation model in response to the subject term; inputting the question list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the question list; and taking the response data as the responses to the question list, the response data being associated with the question list to form question-answer corpus data for the subject term. The response data consists of real customer service responses obtained through the web crawler. The invention improves the efficiency and quality of corpus construction and the question hit rate of intelligent customer service, making the customer service more intelligent.

Description

Corpus construction method and device for intelligent customer service, computer equipment and storage medium
Technical Field
The invention relates to the field of intelligent customer service, in particular to a corpus construction method, a corpus construction device, computer equipment and a storage medium for intelligent customer service.
Background
With the development of artificial intelligence technology, intelligent customer service systems are gradually emerging. Intelligent customer service establishes a convenient, natural-language-based communication platform between an enterprise and a large number of users, effectively improves the efficiency of customer service work, and can provide first-hand customer information to support fine-grained management by the enterprise.
Intelligent customer service provides its service on the basis of an existing question-answer database. When the database is built, knowledge points have to be organized manually and the questions users may ask have to be expanded manually, finally producing the one-question-one-answer data in the database.
However, manually organizing knowledge points and manually expanding user questions is time-consuming and labor-intensive, and often fails to reflect the questions users actually care about most. As a result, the hit rate of the question-answer database is low in use, the intelligent customer service cannot answer user questions effectively, and the user experience suffers.
Disclosure of Invention
The invention provides a corpus construction method and device for intelligent customer service, computer equipment and a storage medium, which are used to solve the problem that constructing a question-answer corpus for intelligent customer service is time-consuming and labor-intensive.
In order to solve the above technical problem, the invention provides a corpus construction method for intelligent customer service, which comprises the following steps:
acquiring a subject term of the corpus data to be constructed;
inputting the subject term into a preset question generation model, and acquiring a question list output by the question generation model in response to the subject term;
inputting the question list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the question list;
and taking the response data as the responses to the question list, wherein the response data is associated with the question list to form question-answer corpus data for the subject term.
Optionally, the step of inputting the subject term into a preset question generation model and acquiring a question list output by the question generation model in response to the subject term specifically includes the following steps:
inputting the subject term into a second web crawler model, and acquiring query candidate data output by the second web crawler model in response to the subject term;
matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;
and taking the query matching data as the question list of the subject term.
Optionally, in the step of matching the query candidate data according to a preset matching rule, a regular expression matching algorithm is used to obtain the query matching data.
Optionally, the step of inputting the subject term into a preset question generation model and acquiring a question list output by the question generation model in response to the subject term specifically includes the following steps:
inputting the subject term into a pre-trained Seq2Seq model;
and acquiring a question list output by the Seq2Seq model in response to the subject term.
Optionally, after the step of inputting the question list into a preset first web crawler model and acquiring response data output by the first web crawler model in response to the question list, the method further includes the following steps:
filtering the response data according to a preset filtering rule to obtain filtered data, wherein the filtering rule at least comprises a query corpus data filtering rule;
and taking the filtered data as the response data of the question list.
Optionally, after the step of inputting the question list into a preset first web crawler model and acquiring response data output by the first web crawler model in response to the question list, the method further includes the following steps:
inputting the response data into a pre-trained deep neural network model, and acquiring classification information of the response data output by the deep neural network model, wherein the classification information at least divides the response data into query corpus data and non-query corpus data;
and taking the non-query corpus data as the response data of the question list.
Optionally, the pre-trained deep neural network model is trained by the following steps:
acquiring training samples labeled with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;
inputting the training samples into a deep convolutional neural network model to obtain a reference corpus category for each training sample;
comparing whether the reference corpus category of each sample in the training samples is consistent with its labeled corpus category;
and when the reference corpus category is inconsistent with the labeled corpus category, iteratively updating the weights in the deep neural network model until the reference corpus category is consistent with the labeled corpus category.
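The compare-and-update loop described in these training steps can be illustrated with a much simpler stand-in. The sketch below swaps the deep neural network for a single-layer perceptron over bag-of-words features, purely to show the idea of updating weights whenever the predicted category disagrees with the labeled category; the function names, the vocabulary, and the label encoding (1 for query corpus, 0 for non-query corpus) are all assumptions made for illustration.

```python
def featurize(text, vocabulary):
    """Binary bag-of-words vector; '?' is split off as its own token."""
    tokens = text.lower().replace("?", " ? ").split()
    return [1.0 if word in tokens else 0.0 for word in vocabulary]


def predict(text, weights, bias, vocabulary):
    """Return 1 (query corpus) or 0 (non-query corpus)."""
    x = featurize(text, vocabulary)
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, vocabulary, max_epochs=100):
    """Perceptron stand-in for the model: update weights on every mismatch."""
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in samples:
            predicted = predict(text, weights, bias, vocabulary)
            if predicted != label:  # reference category differs from labeled category
                mistakes += 1
                step = label - predicted  # +1 or -1
                x = featurize(text, vocabulary)
                weights = [w + step * v for w, v in zip(weights, x)]
                bias += step
        if mistakes == 0:  # every sample is now classified consistently
            break
    return weights, bias


if __name__ == "__main__":
    vocab = ["what", "where", "?", "price", "is"]
    data = [
        ("what is the price ?", 1),
        ("the price is 1999 yuan", 0),
        ("where can i buy it ?", 1),
        ("it is sold online", 0),
    ]
    w, b = train_perceptron(data, vocab)
    print([predict(text, w, b, vocab) for text, _ in data])
```

A real deep network would replace the mismatch test with a differentiable loss and gradient-based updates, but the stopping idea (iterate until predictions agree with the labels) is the same.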
In order to solve the above problem, the invention further provides a device for constructing question-answer corpus data for intelligent customer service, which includes:
an acquisition module, used for acquiring a subject term of the corpus data to be constructed;
a generation module, used for inputting the subject term into a preset question generation model and acquiring a question list output by the question generation model in response to the subject term;
a processing module, used for inputting the question list into a preset first web crawler model and acquiring response data output by the first web crawler model in response to the question list, wherein the first web crawler model captures target data using the question list as a constraint condition;
and an execution module, used for taking the response data as the responses to the question list, the response data being associated with the question list to form question-answer corpus data for the subject term.
Optionally, the generation module further includes:
a first processing sub-module, used for inputting the subject term into a second web crawler model and acquiring query candidate data output by the second web crawler model in response to the subject term;
a first matching sub-module, used for matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;
and a first execution sub-module, used for taking the query matching data as the question list of the subject term.
Optionally, in the first matching sub-module, a regular expression matching algorithm is used to obtain the query matching data.
Optionally, the generation module further includes:
a second processing sub-module, used for inputting the subject term into a pre-trained Seq2Seq model;
and a first acquisition sub-module, used for acquiring a question list output by the Seq2Seq model in response to the subject term.
Optionally, the device for constructing question-answer corpus data for intelligent customer service further includes:
a first filtering sub-module, used for filtering the response data according to a preset filtering rule to obtain filtered data, wherein the filtering rule at least comprises a query corpus data filtering rule;
and a second execution sub-module, used for taking the filtered data as the response data of the question list.
Optionally, the device for constructing question-answer corpus data for intelligent customer service further includes:
a first classification sub-module, used for inputting the response data into a pre-trained deep neural network model and acquiring classification information of the response data output by the deep neural network model, wherein the classification information at least divides the response data into query corpus data and non-query corpus data;
and a third execution sub-module, used for taking the non-query corpus data as the response data of the question list.
Optionally, the device for constructing question-answer corpus data for intelligent customer service further includes:
a second acquisition sub-module, used for acquiring training samples labeled with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;
a third processing sub-module, used for inputting the training samples into a deep convolutional neural network model to obtain a reference corpus category for each training sample;
a first comparison sub-module, used for comparing whether the reference corpus category of each sample in the training samples is consistent with its labeled corpus category;
and a first updating sub-module, used for iteratively updating the weights in the deep neural network model when the reference corpus category is inconsistent with the labeled corpus category, until the reference corpus category is consistent with the labeled corpus category.
In order to solve the above technical problems, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor is caused to execute the steps of the corpus construction method of intelligent customer service.
In order to solve the above technical problems, an embodiment of the present invention further provides a computer readable storage medium, where computer readable instructions are stored on the computer readable storage medium, and when the computer readable instructions are executed by a processor, the processor is caused to execute the steps of the corpus construction method of intelligent customer service.
The embodiment of the invention has the following beneficial effects: a subject term of the corpus data to be constructed is acquired; the subject term is input into a preset question generation model, and a question list output by the question generation model in response to the subject term is acquired; the question list is input into a preset first web crawler model, and response data output by the first web crawler model in response to the question list is acquired; and the response data is taken as the responses to the question list and associated with the question list to form question-answer corpus data for the subject term. The questions for the subject term are either real user questions collected automatically through a web crawler or a question list generated by a model that has learned users' real intentions, and the corresponding response data consists of real customer service responses, also collected through a web crawler. The invention improves the efficiency and quality of constructing question-answer data and also improves the hit rate of intelligent customer service.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a basic flow diagram of a corpus construction method of intelligent customer service according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of generating a question list based on a second web crawler model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of generating a question list based on a Seq2Seq model according to an embodiment of the present invention;
FIG. 4 is a flow chart of obtaining response data based on filtering rules according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of obtaining response data based on a deep neural network model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a training process of a deep neural network model according to an embodiment of the present invention;
FIG. 7 is a basic structure block diagram of a question-answer corpus data construction device of an intelligent customer service according to an embodiment of the invention;
FIG. 8 is a block diagram showing the basic structure of a computer device according to the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
Some of the flows described in the specification and claims of the present invention and in the foregoing figures include a plurality of operations that appear in a particular order, but it should be understood that these operations may be performed out of the order in which they appear herein or in parallel. Operation numbers such as 101 and 102 are used merely to distinguish the operations from one another and do not themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Examples
As will be appreciated by those skilled in the art, a "terminal" as used herein includes both a device that has only a wireless signal receiver with no transmitting capability and a device that has receiving and transmitting hardware capable of bi-directional communication over a bi-directional communication link. Such a device may include: a cellular or other communication device with a single-line display, with a multi-line display, or without a multi-line display; a PCS (Personal Communications Service) device that may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that may include a radio frequency receiver, pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion at any location on earth and/or in space. A "terminal" or "terminal device" as used herein may also be a communication terminal, a network access terminal, or a music/video playing terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with a music/video playing function, or a smart TV, a set-top box, or another such device.
The terminal in this embodiment is the above-described terminal.
Specifically, referring to FIG. 1, FIG. 1 is a basic flow chart of the corpus construction method for intelligent customer service according to this embodiment.
As shown in FIG. 1, a corpus construction method for intelligent customer service includes the following steps:
S101, acquiring a subject term of the corpus data to be constructed;
The subject term defines the subject of the corpus data to be constructed, and the subject term entered by the user is acquired through an interactive page on the terminal. To keep the constructed question-answer corpus data focused, it is recommended that the entered subject term describe a suitably narrow scope. For example, "mobile phone" covers a wide range, and the question-answer corpus constructed from it may be divergent; to focus the question-answer corpus data, the subject term may instead be defined as "xx model mobile phone".
S102, inputting the subject term into a preset question generation model, and acquiring a question list output by the question generation model in response to the subject term;
The acquired subject term is input into a preset question generation model, which generates a question list based on the subject term. The question generation model may be a fixed series of questions set in advance, with the subject term as the variable parameter. For example, a series of questions set in advance is:
[subject term]: When is the release date?
[subject term]: What is the sales price?
[subject term]: Which sales channels are available?
[subject term]: Is fingerprint identification supported?
[subject term]: Is multiple user login supported?
Different question lists can be preset for different types of subject terms.
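A fixed template list with the subject term as its parameter might be sketched as follows. The template wording is an English paraphrase of the example questions, and the names QUESTION_TEMPLATES and generate_question_list are hypothetical; the patent does not prescribe a data format.

```python
# Hypothetical templates; "{topic}" marks where the subject term is substituted.
QUESTION_TEMPLATES = (
    "When was {topic} released?",
    "What is the sales price of {topic}?",
    "Which sales channels carry {topic}?",
    "Does {topic} support fingerprint identification?",
    "Does {topic} support multiple user login?",
)


def generate_question_list(topic, templates=QUESTION_TEMPLATES):
    """Fill each preset template with the subject term."""
    topic = topic.strip()
    if not topic:
        raise ValueError("subject term must not be empty")
    return [template.format(topic=topic) for template in templates]


if __name__ == "__main__":
    for question in generate_question_list("xx model mobile phone"):
        print(question)
```

Keeping a separate template tuple per category of subject term would be one way to give different types of subject terms different question lists.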
In the embodiment of the invention, a web crawler model is used to collect real questions asked by online users, or a question list is generated by a pre-trained Seq2Seq model. See the descriptions of FIG. 2 and FIG. 3 below for details.
S103, inputting the question list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the question list;
The question list for the subject term is obtained and input into a preset web crawler model, referred to herein as the first web crawler model. A web crawler is a program that automatically extracts web pages. Specifically, a Python program simulates a browser and sends a request to a target site, and the target site server responds to the request and returns HTML, pictures, videos and other resources. The first web crawler model uses the question list as the retrieval condition and retrieves data related to the question list from the target site; this data is the response data output by the first web crawler model in response to the question list.
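As a rough sketch of such a crawler using only the Python standard library: the search URL, the "q" query parameter, the User-Agent string, and the choice to treat the text of <p> elements as candidate responses are all assumptions for illustration, since the patent does not name a target site or page structure.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed browser-like header; a real target site may expect something else.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; corpus-builder-sketch)"}


def build_search_request(search_url, question):
    """Build a GET request that uses the question as the retrieval condition."""
    query = urlencode({"q": question})  # "q" is a hypothetical parameter name
    return Request(f"{search_url}?{query}", headers=BROWSER_HEADERS)


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of every <p> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed earlier <p> ends here
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML document."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def crawl_responses(search_url, question, timeout=10):
    """Send the request to the target site and return candidate response texts."""
    request = build_search_request(search_url, question)
    with urlopen(request, timeout=timeout) as reply:
        charset = reply.headers.get_content_charset() or "utf-8"
        html = reply.read().decode(charset, errors="replace")
    return extract_paragraphs(html)
```

Only crawl_responses touches the network; the request builder and the parser can be exercised offline against saved pages.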
S104, taking the response data as the responses to the question list, wherein the response data is associated with the question list to form question-answer corpus data for the subject term.
The response data is taken as the responses to the question list, and the questions and responses are associated one to one. In the database, this is represented as a record with two parts: one part is a question, and the other part is the response to that question.
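The one-question-one-response record and the overall S101 to S104 flow could be sketched as below. QARecord, build_corpus, and the two callables are hypothetical names: generate_questions stands in for the question generation model and fetch_response for the first web crawler model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """One database record: a question and the response associated with it."""
    topic: str
    question: str
    response: str


def build_corpus(topic, generate_questions, fetch_response):
    """Pair every generated question with its crawled response.

    generate_questions(topic) returns a list of question strings.
    fetch_response(question) returns a response string, or None if nothing was found.
    """
    records = []
    for question in generate_questions(topic):
        response = fetch_response(question)
        if response:  # skip questions for which no response was retrieved
            records.append(QARecord(topic, question, response))
    return records
```

Passing the two models in as callables keeps the pairing logic independent of how questions are generated or where responses are crawled from.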
When the intelligent customer service receives a user's question, it can use keyword search to find the question in the question-answer database whose keywords are consistent with those of the user's question, and return the response mapped to that question. In some embodiments, the response is obtained by calculating the similarity between the user's question and the questions in the question-answer database. The similarity may be calculated with an edit distance algorithm. For example, if the question stored in the question-answer database is "how much does the mobile phone sell for" and the question received from the user is a variant of it that contains one additional character, the edit distance between the two is 1, because a single insertion turns the stored question into the user's question. The question in the database with the greatest similarity to the user's question is found, and the response corresponding to that question is returned.
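A minimal sketch of edit-distance retrieval follows. The Levenshtein distance is the standard dynamic-programming formulation; normalizing it by the longer string's length to get a similarity score is an assumption made here, as the text does not say how distance is turned into similarity.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_response(user_question, qa_pairs):
    """Return the response whose stored question is most similar to the user's."""
    if not qa_pairs:
        return None
    _, response = max(qa_pairs, key=lambda pair: similarity(user_question, pair[0]))
    return response


if __name__ == "__main__":
    pairs = [
        ("what is the price of the phone", "1999 yuan"),
        ("where can I buy the phone", "Online store"),
    ]
    print(best_response("what is price of the phone", pairs))
```

In practice a minimum similarity threshold would probably be needed so that unrelated questions do not receive the nearest, but still irrelevant, response.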
As shown in FIG. 2, step S102 specifically further includes the following steps:
S111, inputting the subject term into a second web crawler model, and acquiring query candidate data output by the second web crawler model in response to the subject term;
In the embodiment of the invention, a web crawler model is used to acquire questions related to the input subject term. To distinguish it from the aforementioned web crawler model, it is called the second web crawler model. This model uses the subject term as the search condition to acquire content related to the subject term from a target site; this content is called query candidate data.
S112, matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;
The acquired query candidate data includes both non-query corpus data and query corpus data. To obtain the query corpus, a matching rule is preset, and the query candidate data is processed with the preset matching rule to obtain the query matching data. The matching rule is that the text contains "?" or a word that expresses a question, such as "what", "how much", or "where". In the embodiment of the invention, a regular expression matching algorithm is used. A regular expression is a logical formula that operates on character strings: a "rule string" is formed from predefined specific characters and combinations of them, and expresses a filtering logic for character strings. A regular expression is a text pattern that describes one or more strings to be matched when searching text. For example, a regular expression that combines the subject term with "what" may be used to find any string containing both the subject term and "what".
Because the second web crawler model collects real user questions recorded by the target site, the question list obtained in this way is closer to reality, and the constructed question-answer corpus data has a higher rate of hitting users' actual questions.
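The matching rule of step S112 might be expressed with Python's re module roughly as follows. The English question words, the requirement that the subject term also appear in the sentence, and the function names are assumptions; the original rule is stated for Chinese text.

```python
import re

# Assumed English equivalents of the question words named in the text.
QUESTION_WORDS = ("what", "how much", "where", "when", "which")


def build_question_pattern(topic):
    """Pattern for text that mentions the topic and looks like a question."""
    words = "|".join(re.escape(word) for word in QUESTION_WORDS)
    return re.compile(
        rf"^(?=.*{re.escape(topic)})"        # must contain the subject term
        rf"(?=.*(?:\b(?:{words})\b|[?？]))"  # and a question word or question mark
        r".*$",
        re.IGNORECASE | re.DOTALL,
    )


def match_questions(candidates, topic):
    """Keep only the query candidate data that satisfies the matching rule."""
    pattern = build_question_pattern(topic)
    return [text for text in candidates if pattern.search(text)]


if __name__ == "__main__":
    crawled = [
        "What is the price of the X1 phone?",
        "The X1 phone is great.",
        "Where can I buy a tablet?",
    ]
    print(match_questions(crawled, "X1 phone"))
```

The two lookaheads let the subject term and the question marker appear in either order within the sentence.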
S113, taking the query matching data as the question list of the subject term.
After the query candidate data is processed with the preset matching rule, the resulting query matching data is the question list related to the subject term.
As shown in FIG. 3, in some embodiments, step S102 specifically further includes the following steps:
S121, inputting the subject term into a pre-trained Seq2Seq model;
In some embodiments, the question list is obtained by inputting the subject term into a pre-trained Seq2Seq model. The Seq2Seq model is a network with an Encoder-Decoder structure whose input is a sequence and whose output is a sequence: the Encoder converts a variable-length signal sequence into a fixed-length vector representation, and the Decoder converts the fixed-length vector into a variable-length target signal sequence. Specifically, the received subject term is first one-hot encoded against the vocabulary and then converted into the corresponding word vector by a word2vec word vector model. The word vector is input into the Encoder layer, a multi-layer neuron layer that uses a bidirectional LSTM layer or an RNN (recurrent neural network) as its basic neuron unit, which generates a final_state state layer and a final_output state vector;
then, the final_output state vector output by the Encoder in the above step is input into a global information layer to generate a global state layer context;
and finally, the final_state and final_output state vectors obtained in the above steps and the context vector generated by the global information layer are input into the Decoder layer, which outputs its own final_state vector and output vector; the Decoder layer is likewise a multi-layer neuron layer that uses a bidirectional LSTM layer or an RNN as its basic neuron unit. The output is a basic question list whose subject is the input subject term.
S122, acquiring a question list output by the Seq2Seq model in response to the subject term.
The output of the Seq2Seq model for the subject term is taken as the question list. The Seq2Seq model must be trained before it can output a question list. The training process is as follows: a training corpus is prepared, that is, input sequences and their corresponding output sequences; each input sequence is input into the Seq2Seq model and the probability of its output sequence is calculated; and the parameters of the Seq2Seq model are adjusted so that the probability over the whole sample, that is, the probability that all input sequences yield their corresponding output sequences through the Seq2Seq model, is maximized.
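Assuming, as is common for Seq2Seq models, that the decoder predicts the output sequence one token at a time, the training objective described above can be written as a maximum-likelihood problem. The symbols are illustrative and do not appear in the original text: \theta denotes the Seq2Seq parameters, (X^{(n)}, Y^{(n)}) the n-th of N input/output training pairs, and y_t the t-th token of an output sequence of length T.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{n=1}^{N} \log P\left(Y^{(n)} \mid X^{(n)}; \theta\right),
\qquad
\log P\left(Y \mid X; \theta\right) = \sum_{t=1}^{T} \log P\left(y_{t} \mid y_{1}, \dots, y_{t-1}, X; \theta\right)
```

Maximizing the sum of log-probabilities is equivalent to maximizing the product of the probabilities of all training pairs, which matches the "probability over the whole sample" wording.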
As shown in fig. 4, after step S103, the following steps are further included:
S131, filtering the response data according to a preset filtering rule to obtain filtered data, wherein the filtering rule comprises at least a query corpus data filtering rule;
In some embodiments, the acquired response data is processed further. Because what is needed here is response data, the corpora that express questions must first be filtered out. All corpora containing words that express query semantics, such as "what" or "how much", can likewise be filtered out by a regular expression matching algorithm. In addition, the filtering rule may also cover sensitive words: corpora containing sensitive words are filtered out according to a preset sensitive word list.
S132, taking the filtered data as the response data of the question list.
The data obtained by filtering the response data is the response data of the question list.
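Steps S131 and S132 can be sketched with Python's standard `re` module. This is only an illustrative sketch: the English query-word pattern, the sample sentences, and the sensitive word list are hypothetical, since the source names only "what" and "how much" as examples and does not give a concrete rule set.

```python
import re
from typing import Iterable, List

# Hypothetical query markers: a few English question words plus question marks.
QUERY_PATTERN = re.compile(
    r"\b(?:what|how much|how many|why|when|where|who)\b|[?？]",
    re.IGNORECASE,
)


def filter_responses(responses: Iterable[str], sensitive_words: Iterable[str]) -> List[str]:
    """Keep only responses that are neither queries nor contain sensitive words."""
    sensitive = [word.lower() for word in sensitive_words]
    kept = []
    for text in responses:
        if QUERY_PATTERN.search(text):
            continue  # query corpus: expresses a question, not an answer
        lowered = text.lower()
        if any(word in lowered for word in sensitive):
            continue  # contains a word from the sensitive word list
        kept.append(text)
    return kept


candidates = [
    "What is the annual fee?",
    "The annual fee is waived in the first year.",
    "How much does it cost",
    "Call this scam hotline to get a refund.",
]
filtered = filter_responses(candidates, sensitive_words=["scam"])
```

Here only the second candidate survives both rules and would be used as response data for the question list.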
As shown in fig. 5, after step S103, in some embodiments, the following steps are further included:
S141, inputting the response data into a pre-trained deep neural network model, and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least distinguishes the response data into query corpus data and non-query corpus data;
In some embodiments, the response data is classified by a pre-trained deep neural network model that can identify at least query corpora and non-query corpora. The specific training process of the deep neural network is shown in fig. 6.
S142, taking the non-query corpus data as the response data of the question list.
The non-query corpus data identified by the deep neural network is the response data corresponding to the question list.
As shown in fig. 6, the deep neural network model used in step S141 is trained as follows:
S151, obtaining training samples marked with corpus categories, wherein the corpus categories comprise at least query corpus and non-query corpus;
In the embodiment of the invention, the training target of the deep neural network model is to be able to identify query corpora and non-query corpora. The training samples contain both types of corpus, each marked with its corpus category.
S152, inputting the training sample into a deep convolutional neural network model to obtain a reference corpus class of the training sample;
The samples are input into the deep convolutional neural network model, which outputs the reference corpus category of each training sample, that is, whether the sample corpus is a query corpus or a non-query corpus.
S153, comparing whether the reference corpus categories of different samples in the training samples are consistent with the corpus categories;
Whether the reference corpus category output by the deep convolutional neural network is consistent with the corpus category marked on the sample is judged through a loss function. In the embodiment of the invention, the loss function is the Softmax cross entropy loss function. During training, the weights in the deep convolutional neural network model are adjusted so that the Softmax cross entropy loss function converges as far as possible; that is, the weights are adjusted continuously, and when the value of the loss function no longer decreases and instead begins to increase, training of the deep convolutional neural network is considered complete.
Assume that there are N training samples, that the input feature of the i-th sample at the last layer of the network is Xi, and that its label is Yi, the final classification result (that is, whether sample i is a query or a non-query). h = (h1, h2, ..., hC) is the final output of the network, that is, the prediction result for sample i, where C is the total number of classes.
The Softmax cross entropy loss function, in its standard form with h evaluated on sample i, is
$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{h_{Y_i}}}{\sum_{j=1}^{C} e^{h_j}}$$
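The Softmax cross entropy loss for N samples and C classes, averaged over the samples, can be computed as in the following sketch. The function names and the toy network outputs are hypothetical; labels are assumed to be integer class indices in the range 0 to C-1.

```python
import math
from typing import List, Sequence


def softmax(h: Sequence[float]) -> List[float]:
    """Convert raw network outputs h = (h1, ..., hC) into class probabilities."""
    m = max(h)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(outputs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average negative log-probability assigned to the labeled class."""
    losses = [-math.log(softmax(h)[y]) for h, y in zip(outputs, labels)]
    return sum(losses) / len(losses)


# Two toy samples with C = 2 classes (0 = query, 1 = non-query; an assumed encoding).
outputs = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]
loss = softmax_cross_entropy(outputs, labels)
```

A lower value means the network assigns higher probability to the marked corpus category; a network that outputs equal scores for two classes incurs a loss of log 2 on that sample.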
S154, when the reference corpus category is inconsistent with the corpus category, repeatedly and iteratively updating the weights in the deep neural network model until the reference corpus category is consistent with the corpus category.
While the loss function has not converged, the weights of all nodes in the deep convolutional neural network model are updated as described above. In the embodiment of the invention, the gradient descent method is used for this update; gradient descent is an optimization algorithm commonly used in machine learning and artificial intelligence to iteratively approach the model with minimum deviation.
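The loop of steps S151 to S154 can be illustrated with a deliberately small stand-in. In this sketch a linear softmax classifier replaces the deep convolutional neural network, and each sample is reduced to two hypothetical binary features (contains a query word, contains a question mark); the learning rate, stopping tolerance, and data are likewise assumptions. Only the gradient descent update and the stop-when-the-loss-no-longer-decreases criterion reflect the text.

```python
import math
from typing import List, Sequence, Tuple


def softmax(h: Sequence[float]) -> List[float]:
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, bias, x: Sequence[float]) -> List[float]:
    """Linear scores per class; stands in for the network's final output h."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def train(samples, labels, num_classes: int, lr: float = 0.5,
          max_epochs: int = 1000, tol: float = 1e-6) -> Tuple[list, list]:
    dim, n = len(samples[0]), len(samples)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        loss = 0.0
        for x, y in zip(samples, labels):
            p = softmax(scores(weights, bias, x))
            loss -= math.log(p[y]) / n  # Softmax cross entropy, averaged
            for c in range(num_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # gradient w.r.t. score c
                grad_b[c] += err / n
                for d in range(dim):
                    grad_w[c][d] += err * x[d] / n
        if prev_loss - loss < tol:
            break  # loss no longer decreasing: treat training as finished
        prev_loss = loss
        for c in range(num_classes):  # gradient descent step
            bias[c] -= lr * grad_b[c]
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d]
    return weights, bias


def predict(weights, bias, x: Sequence[float]) -> int:
    h = scores(weights, bias, x)
    return max(range(len(h)), key=h.__getitem__)


# Hypothetical features: [has_query_word, has_question_mark]; 0 = query, 1 = non-query.
X = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [0, 0, 0, 1]
weights, bias = train(X, y, num_classes=2)
```

After training, `predict(weights, bias, x)` returns the reference corpus category for a feature vector, which is what step S153 compares against the marked category.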
In order to solve the technical problems, the embodiment of the invention also provides a question-answer corpus data construction device of the intelligent customer service. Referring specifically to fig. 7, fig. 7 is a basic structural block diagram of a device for constructing question-answer corpus data of intelligent customer service according to this embodiment.
As shown in fig. 7, the device for constructing question-answer corpus data of intelligent customer service includes an acquisition module 210, a generation module 220, a processing module 230 and an execution module 240. The acquisition module 210 is configured to acquire a subject term of the question-answer corpus data to be constructed; the generation module 220 is configured to input the subject term into a preset question generation model and obtain a question list output by the question generation model in response to the subject term; the processing module 230 is configured to input the question list into a preset first web crawler model and obtain response data output by the first web crawler model in response to the question list, where the first web crawler model captures target data with the question list as a constraint condition; and the execution module 240 is configured to use the response data as the response data of the question list, where the response data is associated with the question list to form the question-answer corpus data of the subject term.
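The way the four modules cooperate can be sketched as a small pipeline. Everything named here is hypothetical: the function `build_qa_corpus`, the template-based generator, and the placeholder crawler are stand-ins chosen so the example runs without network access, and they do not reproduce the question generation model or the first web crawler model.

```python
from typing import Callable, Dict, List

QuestionGenerator = Callable[[str], List[str]]
Crawler = Callable[[str], List[str]]


def build_qa_corpus(
    subject_term: str,                      # supplied by the acquisition module (210)
    generate_questions: QuestionGenerator,  # role of the generation module (220)
    crawl_responses: Crawler,               # role of the processing module (230)
) -> Dict[str, List[str]]:
    """Associate each generated question with its crawled responses (module 240)."""
    corpus: Dict[str, List[str]] = {}
    for question in generate_questions(subject_term):
        corpus[question] = crawl_responses(question)
    return corpus


def template_generator(term: str) -> List[str]:
    # Stand-in generator: fixed question templates parameterized by the subject term.
    return [f"What is {term}?", f"How do I apply for {term}?"]


def placeholder_crawler(question: str) -> List[str]:
    # Stand-in crawler: returns a canned string instead of fetching web pages.
    return [f"Placeholder answer to: {question}"]


corpus = build_qa_corpus("credit card", template_generator, placeholder_crawler)
```

Passing the generator and crawler in as callables mirrors the modular split of fig. 7, so that a Seq2Seq generator or a real crawler could be substituted without changing the association step.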
In the embodiment of the invention, the subject term of the question-answer corpus data to be constructed is acquired; the subject term is input into a preset question generation model, and the question list output by the question generation model in response to the subject term is obtained; the question list is input into a preset first web crawler model, and the response data output by the first web crawler model in response to the question list is obtained; and the response data is used as the response data of the question list and is associated with the question list to form the question-answer corpus data of the subject term. The questions for the subject term are either real user questions collected automatically by a web crawler or a question list generated by a model that has learned real user intent, and the corresponding response data are real customer service responses, likewise collected by a web crawler. The invention improves the efficiency and quality of question-answer data construction and also improves the hit rate of the intelligent customer service.
In some embodiments, the generation module 220 further comprises a first processing sub-module, a first matching sub-module and a first execution sub-module. The first processing sub-module is used for inputting the subject term into a second web crawler model and obtaining query candidate data output by the second web crawler model in response to the subject term; the first matching sub-module is used for matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule comprises at least a query corpus matching rule; and the first execution sub-module is used for taking the query matching data as the question list of the subject term.
In some embodiments, the first matching sub-module obtains the query matching data by using a regular expression matching algorithm.
In some embodiments, the generation module 220 further comprises a second processing sub-module and a first acquisition sub-module. The second processing sub-module is used for inputting the subject term into a pre-trained Seq2Seq model; and the first acquisition sub-module is used for acquiring a question list output by the Seq2Seq model in response to the subject term.
In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a first filtering sub-module and a second execution sub-module. The first filtering sub-module is used for filtering the response data according to a preset filtering rule to obtain filtered data, wherein the filtering rule comprises at least a query corpus data filtering rule; and the second execution sub-module is used for taking the filtered data as the response data of the question list.
In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a first classification sub-module and a third execution sub-module. The first classification sub-module is used for inputting the response data into a pre-trained deep neural network model and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least distinguishes the response data into query corpus data and non-query corpus data; and the third execution sub-module is used for taking the non-query corpus data as the response data of the question list.
In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a second acquisition sub-module, a third processing sub-module, a first comparison sub-module and a first updating sub-module. The second acquisition sub-module is used for obtaining training samples marked with corpus categories, wherein the corpus categories comprise at least query corpus and non-query corpus; the third processing sub-module is used for inputting the training sample into a deep convolutional neural network model to obtain a reference corpus category of the training sample; the first comparison sub-module is used for comparing whether the reference corpus categories of different samples in the training samples are consistent with the corpus categories; and the first updating sub-module is used for repeatedly and iteratively updating the weights in the deep neural network model when the reference corpus category is inconsistent with the corpus category, until the reference corpus category is consistent with the corpus category.
In order to solve the technical problems, the embodiment of the invention also provides computer equipment. Referring specifically to fig. 8, fig. 8 is a basic structural block diagram of a computer device according to the present embodiment.
Fig. 8 schematically shows the internal structure of the computer device. As shown in fig. 8, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and when the computer readable instructions are executed by the processor, the processor implements a method for constructing question-answer corpus data of intelligent customer service. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform a method for constructing question-answer corpus data of intelligent customer service. The network interface of the computer device is used to connect to and communicate with a terminal. It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of some of the structures associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The processor in this embodiment is configured to execute the specific contents of the acquisition module 210, the generation module 220, the processing module 230, and the execution module 240 in fig. 7, and the memory stores the program codes and the various types of data required for executing the above modules. The network interface is used for data transmission to and from the user terminal or the server. The memory in this embodiment stores the program codes and data required for executing all sub-modules in the corpus construction method of intelligent customer service, and the server can call these program codes and data to execute the functions of all sub-modules.
The computer equipment acquires the subject term of the corpus data to be constructed; inputs the subject term into a preset question generation model and obtains the question list output by the question generation model in response to the subject term; inputs the question list into a preset first web crawler model and obtains the response data output by the first web crawler model in response to the question list; and uses the response data as the response data of the question list, the response data being associated with the question list to form the question-answer corpus data of the subject term. The questions for the subject term are either real user questions collected automatically by a web crawler or a question list generated by a model that has learned real user intent, and the corresponding response data are real customer service responses, likewise collected by a web crawler. The invention improves the efficiency and quality of question-answer data construction and also improves the hit rate of the intelligent customer service.
The present invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the corpus construction method for intelligent customer service according to any of the above embodiments.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (8)

1. The corpus construction method of the intelligent customer service is characterized by comprising the following steps of:
acquiring a subject term of corpus data to be constructed;
inputting the subject word into a preset problem generation model, and acquiring a preset problem list output by the problem generation model in response to the subject word, wherein the problem generation model is a model which is preset with a fixed series of problems and takes the subject word as a parameter change;
inputting the subject word into a second web crawler model, obtaining query candidate data output by the second web crawler model in response to the subject word, and matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule, and the query matching data is used as a problem list of the subject word;
inputting the problem list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the problem list;
the response data is used as response data of the question list, and the response data and the question list are associated to form question and answer corpus data of the subject term;
wherein after the step of inputting the problem list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the problem list, the method further comprises the following steps:
inputting the response data into a pre-trained deep neural network model, and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least distinguishes the response data into query corpus data and non-query corpus data;
and taking the non-query corpus data as response data of the question list.
2. The method for constructing the corpus of intelligent customer service according to claim 1, wherein in the step of matching the query candidate data according to a preset matching rule, the step of matching specifically adopts a regular expression matching algorithm to obtain query matching data.
3. The method for constructing a corpus of intelligent customer service according to claim 1, wherein in the step of inputting the subject word into a preset question generation model and obtaining a preset question list output by the question generation model in response to the subject word, the method specifically comprises the following steps:
inputting the subject term into a pre-trained Seq2Seq model;
and acquiring a question list output by the Seq2Seq model in response to the subject term.
4. The corpus construction method of intelligent customer service according to claim 1, further comprising the following steps after the step of inputting the question list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the question list:
filtering the response data according to a preset filtering rule to obtain filtering data, wherein the filtering rule at least comprises a query corpus data filtering rule;
and taking the filtered data as response data of the problem list.
5. The intelligent customer service corpus construction method according to claim 1, wherein the pre-trained deep neural network model is trained by the following steps:
obtaining training samples marked with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;
inputting the training sample into a deep convolutional neural network model to obtain a reference corpus category of the training sample;
comparing whether the reference corpus categories of different samples in the training sample are consistent with the corpus categories or not;
and when the reference corpus class is inconsistent with the corpus class, repeatedly and iteratively updating the weight in the deep neural network model until the reference corpus class is consistent with the corpus class.
6. A device for constructing question-answer corpus data of an intelligent customer service, the device being configured to implement the corpus construction method of an intelligent customer service according to any one of claims 1 to 5, the device comprising:
the acquisition module is used for acquiring the subject terms of the corpus data to be constructed;
the generation module is used for inputting the subject terms into a preset problem generation model and acquiring a preset problem list which is output by the problem generation model in response to the subject terms; inputting the subject term into a second web crawler model, obtaining query candidate data output by the second web crawler model in response to the subject term, and matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule, and the query matching data is used as a problem list of the subject term;
the processing module is used for inputting the problem list into a preset first web crawler model, and obtaining response data output by the first web crawler model in response to the problem list, wherein the first web crawler model takes the problem list as a constraint condition to capture target data;
and the execution module is used for taking the response data as response data of the question list, and the response data is associated with the question list to form question and answer corpus data of the subject term.
7. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the steps of the corpus construction method of intelligent customer service according to any of claims 1 to 5.
8. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the corpus construction method of intelligent customer service according to any of claims 1 to 5.
CN201910065779.8A 2019-01-24 2019-01-24 Corpus construction method and device for intelligent customer service, computer equipment and storage medium Active CN109918486B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910065779.8A CN109918486B (en) 2019-01-24 2019-01-24 Corpus construction method and device for intelligent customer service, computer equipment and storage medium
PCT/CN2019/117698 WO2020151318A1 (en) 2019-01-24 2019-11-12 Corpus construction method and apparatus based on crawler model, and computer device

Publications (2)

Publication Number Publication Date
CN109918486A CN109918486A (en) 2019-06-21
CN109918486B true CN109918486B (en) 2024-03-19

Family

ID=66960656

Country Status (2)

Country Link
CN (1) CN109918486B (en)
WO (1) WO2020151318A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918486B (en) * 2019-01-24 2024-03-19 平安科技(深圳)有限公司 Corpus construction method and device for intelligent customer service, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699590A (en) * 2013-12-09 2014-04-02 北京奇虎科技有限公司 Method and server for providing graphic tutorial problem solution
CN108549710A (en) * 2018-04-20 2018-09-18 腾讯科技(深圳)有限公司 Intelligent answer method, apparatus, storage medium and equipment
CN108717433A (en) * 2018-05-14 2018-10-30 南京邮电大学 A kind of construction of knowledge base method and device of programming-oriented field question answering system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573028B (en) * 2015-01-14 2019-01-25 百度在线网络技术(北京)有限公司 Realize the method and system of intelligent answer
JP6520513B2 (en) * 2015-07-17 2019-05-29 富士ゼロックス株式会社 Question and Answer Information Providing System, Information Processing Device, and Program
US10275515B2 (en) * 2017-02-21 2019-04-30 International Business Machines Corporation Question-answer pair generation
CN108345640B (en) * 2018-01-12 2021-10-12 上海大学 Question and answer corpus construction method based on neural network semantic analysis
CN108959559B (en) * 2018-06-29 2021-02-26 北京百度网讯科技有限公司 Question and answer pair generation method and device
CN109190062B (en) * 2018-08-03 2023-04-07 平安科技(深圳)有限公司 Crawling method and device for target corpus data and storage medium
CN109918486B (en) * 2019-01-24 2024-03-19 平安科技(深圳)有限公司 Corpus construction method and device for intelligent customer service, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109918486A (en) 2019-06-21
WO2020151318A1 (en) 2020-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant