CN109918486B - Corpus construction method and device for intelligent customer service, computer equipment and storage medium

Info

Publication number: CN109918486B
Application number: CN201910065779.8A
Authority: CN
Inventors: 吴壮伟
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-01-24
Filing date: 2019-01-24
Publication date: 2024-03-19
Anticipated expiration: 2039-01-24
Also published as: CN109918486A; WO2020151318A1

Abstract

The embodiment of the invention discloses a corpus construction method, a device, computer equipment and a storage medium for intelligent customer service, wherein the method comprises the following steps: acquiring a subject term of corpus data to be constructed; inputting the subject term into a preset problem generation model, and acquiring a question list output by the problem generation model in response to the subject term; inputting the question list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the question list; and taking the response data as the responses to the question list, wherein the response data is associated with the question list to form question-and-answer corpus data of the subject term. The response data consists of real customer service responses obtained through the web crawler. The invention improves the efficiency and quality of corpus construction and raises the question hit rate of intelligent customer service, making the customer service more intelligent.

Description

Corpus construction method and device for intelligent customer service, computer equipment and storage medium

Technical Field

The invention relates to the field of intelligent customer service, in particular to a corpus construction method, a corpus construction device, computer equipment and a storage medium for intelligent customer service.

Background

With the development of artificial intelligence technology, intelligent customer service systems are gradually emerging. Intelligent customer service establishes a convenient, natural-language-based communication platform between an enterprise and a large number of users, effectively improves the efficiency of customer service work, and provides the enterprise with first-hand customer information for fine-grained management.

Intelligent customer service provides its service on the basis of an existing question-answer database. When the question-answer database is built, knowledge points must be organized manually and users' question points must be expanded manually, to finally generate the one-question-one-answer data in the question-answer database.

However, manually organizing knowledge points and manually expanding user question points is time-consuming and labor-intensive, and often fails to reflect the questions users actually care about most. As a result, the hit rate of question points in the question-answer database is low in use, the intelligent customer service cannot answer user questions effectively, and the user experience suffers.

Disclosure of Invention

The invention provides a corpus construction method, device, computer equipment and storage medium for intelligent customer service, which are used for solving the problem that the construction of a question-answer corpus by intelligent customer service is time-consuming and labor-consuming.

In order to solve the technical problems, the invention provides a corpus construction method of intelligent customer service, which comprises the following steps:

acquiring a subject term of corpus data to be constructed;

inputting the subject terms into a preset problem generation model, and acquiring a problem list output by the problem generation model in response to the subject terms;

inputting the problem list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the problem list;

and taking the response data as response data of the question list, wherein the response data is associated with the question list to form question and answer corpus data of the subject term.

Optionally, in the step of inputting the subject term into a preset problem generation model, the step of obtaining a problem list output by the problem generation model in response to the subject term specifically includes the following steps:

inputting the subject term into a second web crawler model, and acquiring query candidate data output by the second web crawler model in response to the subject term;

matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;

and taking the question matching data as a question list of the subject term.

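Optionally, in the step of matching the query candidate data according to a preset matching rule, a regular expression matching algorithm is used to obtain the query matching data.

Optionally, the step of inputting the subject term into a preset problem generation model and obtaining a problem list output by the problem generation model in response to the subject term specifically includes the following steps:

inputting the subject term into a pre-trained Seq2Seq model;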

and acquiring a question list output by the Seq2Seq model in response to the subject term.

Optionally, after the step of inputting the problem list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the problem list, the method further includes the following steps:

filtering the response data according to a preset filtering rule to obtain filtering data, wherein the filtering rule at least comprises a query corpus data filtering rule;

and taking the filtered data as response data of the problem list.

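Optionally, after the step of inputting the problem list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the problem list, the method further includes the following steps:

inputting the response data into a pre-trained deep neural network model, and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least distinguishes the response data into query corpus data and non-query corpus data;

and taking the non-query corpus data as response data of the question list.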

Optionally, the pre-trained deep neural network model is trained by:

obtaining training samples marked with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;

inputting the training sample into a deep convolutional neural network model to obtain a reference corpus category of the training sample;

comparing whether the reference corpus categories of different samples in the training sample are consistent with the corpus categories or not;

and when the reference corpus class is inconsistent with the corpus class, repeatedly and iteratively updating the weight in the deep neural network model until the reference corpus class is consistent with the corpus class.

In order to solve the above problems, the present invention further provides a device for constructing question-answer corpus data of intelligent customer service, including:

the acquisition module is used for acquiring the subject terms of the corpus data to be constructed;

the generation module is used for inputting the subject terms into a preset problem generation model and acquiring a problem list which is output by the problem generation model in response to the subject terms;

the processing module is used for inputting the problem list into a preset first web crawler model, and obtaining response data output by the first web crawler model in response to the problem list, wherein the first web crawler model takes the problem list as a constraint condition to capture target data;

and the execution module is used for taking the response data as response data of the question list, and the response data is associated with the question list to form question and answer corpus data of the subject term.

Optionally, the generating module further includes:

the first processing submodule is used for inputting the subject term into a second web crawler model and obtaining query candidate data which is output by the second web crawler model in response to the subject term;

the first matching sub-module is used for matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;

and the first execution sub-module is used for taking the question matching data as a question list of the subject terms.

Optionally, the first matching sub-module obtains the query matching data by using a regular expression matching algorithm.

Optionally, the generating module further includes:

a second processing sub-module for inputting the subject term into a pre-trained Seq2Seq model;

and the first acquisition submodule is used for acquiring a question list output by the Seq2Seq model in response to the subject term.

Optionally, the device for constructing the question and answer corpus data of the intelligent customer service further comprises:

the first filtering sub-module is used for filtering the response data according to a preset filtering rule to obtain filtering data, wherein the filtering rule at least comprises a query corpus data filtering rule;

and the second execution sub-module is used for taking the filtering data as response data of the problem list.

Optionally, the device for constructing the question and answer corpus data of the intelligent customer service further comprises:

the first classification sub-module is used for inputting the response data into a pre-trained deep neural network model, and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least divides the response data into query corpus data and non-query corpus data;

and the third execution sub-module is used for taking the non-query corpus data as response data of the question list.

Optionally, the device for constructing the question and answer corpus data of the intelligent customer service further comprises:

the second acquisition sub-module is used for acquiring training samples marked with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;

the third processing sub-module is used for inputting the training sample into a deep convolutional neural network model to obtain a reference corpus class of the training sample;

the first comparison sub-module is used for comparing whether the reference corpus categories of different samples in the training samples are consistent with the corpus categories or not;

and the first updating sub-module is used for repeatedly and iteratively updating the weight in the deep neural network model when the reference corpus class is inconsistent with the corpus class until the reference corpus class is consistent with the corpus class.

In order to solve the above technical problems, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor is caused to execute the steps of the corpus construction method of intelligent customer service.

In order to solve the above technical problems, an embodiment of the present invention further provides a computer readable storage medium, where computer readable instructions are stored on the computer readable storage medium, and when the computer readable instructions are executed by a processor, the processor is caused to execute the steps of the corpus construction method of intelligent customer service.

The embodiment of the invention has the following beneficial effects: a subject term of the corpus data to be constructed is acquired; the subject term is input into a preset problem generation model, and a question list output by the problem generation model in response to the subject term is acquired; the question list is input into a preset first web crawler model, and response data output by the first web crawler model in response to the question list is acquired; and the response data is taken as the responses to the question list and associated with the question list to form question-and-answer corpus data of the subject term. The questions for the subject term are either real user questions collected automatically by a web crawler, or a question list generated by a model that has learned users' real intentions, and the corresponding response data consists of real customer service responses, likewise obtained by a web crawler. The invention improves the efficiency and quality of constructing question-answer data and also improves the hit rate of the intelligent customer service.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a basic flow diagram of a corpus construction method of intelligent customer service according to an embodiment of the invention;

FIG. 2 is a flow chart of generating a problem list based on a second web crawler model according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of generating a problem list based on a Seq2Seq model according to an embodiment of the present invention;

FIG. 4 is a flow chart of obtaining response data based on filtering rules according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart of obtaining response data based on a deep neural network model according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a training process of a deep neural network model according to an embodiment of the present invention;

FIG. 7 is a basic structure block diagram of a question-answer corpus data construction device of an intelligent customer service according to an embodiment of the invention;

FIG. 8 is a block diagram showing the basic structure of a computer device according to the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.

In some of the flows described in the specification and claims of the present invention and in the foregoing figures, a plurality of operations occurring in a particular order are included, but it should be understood that the operations may be performed out of order or performed in parallel, with the order of operations such as 101, 102, etc., being merely used to distinguish between the various operations, the order of the operations themselves not representing any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequence, and are not limited to the "first" and the "second" being different types.

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

Examples

As will be appreciated by those skilled in the art, a "terminal" as used herein includes both a device that has only a wireless signal receiver without transmitting capability and a device that has receiving and transmitting hardware capable of bi-directional communication over a bi-directional communication link. Such a device may include: a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display; a PCS (Personal Communications Service) device that may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that can include a radio frequency receiver, pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio frequency receiver. As used herein, a "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "terminal" or "terminal device" used herein may also be a communication terminal, a network access terminal, or a music/video playing terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with a music/video playing function, and may also be a smart TV, a set-top box, or other such device.

The terminal in this embodiment is the above-described terminal.

Specifically, referring to fig. 1, fig. 1 is a basic flow chart of a corpus construction method of intelligent customer service according to the present embodiment.

As shown in fig. 1, a corpus construction method of intelligent customer service includes the following steps:

S101, acquiring a subject term of corpus data to be constructed;

The subject term defines the subject of the corpus data to be constructed, and the subject term input by the user is acquired through an interactive page on the terminal. To keep the constructed question-answer corpus data focused, it is recommended that the scope described by the input subject term be suitably narrow. For example, "mobile phone" covers a wide range, and the question-answer corpus constructed from it may be divergent; to focus the question-answer corpus data, the subject term may instead be defined as "xx model mobile phone".

S102, inputting the subject term into a preset problem generation model, and acquiring a problem list output by the problem generation model in response to the subject term;

The acquired subject term is input into a preset problem generation model, which generates a question list based on the subject term. The problem generation model may be a fixed series of questions set in advance, with the subject term as the variable parameter. For example, the series of questions set in advance may be:

[subject term]	When was it released?
[subject term]	What is its sale price?
[subject term]	What sales channels does it have?
[subject term]	Does it support fingerprint recognition?
[subject term]	Does it support multi-user login?

Different types of subject words can be preset to correspond to different problem lists.
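The fixed-template variant of the problem generation model can be sketched as follows. This is only an illustration: the template wording, the `{topic}` placeholder, and the names `QUESTION_TEMPLATES` and `generate_questions` are assumptions and do not come from the patent text.

```python
# Hypothetical question templates; "{topic}" marks where the subject term goes.
QUESTION_TEMPLATES = [
    "When was {topic} released?",
    "What is the sale price of {topic}?",
    "Through which sales channels is {topic} sold?",
    "Does {topic} support fingerprint recognition?",
    "Does {topic} support multi-user login?",
]


def generate_questions(topic, templates=QUESTION_TEMPLATES):
    """Fill every template with the subject term and return the question list."""
    topic = topic.strip()
    if not topic:
        raise ValueError("subject term must not be empty")
    return [template.format(topic=topic) for template in templates]


questions = generate_questions("xx model mobile phone")
```

Different template lists could be passed through the `templates` parameter to cover different types of subject terms.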

In the embodiment of the invention, either a web crawler model is used to obtain real questions asked by online users, or a question list is generated by a pre-trained Seq2Seq model. For details, refer to the descriptions of fig. 2 and fig. 3 below.

S103, inputting the problem list into a preset first web crawler model, and acquiring response data output by the first web crawler model in response to the problem list;

A question list about the subject term is obtained and input into a preset web crawler model, referred to herein as the first web crawler model. A web crawler is a program that automatically fetches web pages. Specifically, a Python program simulates a browser and sends a request to a target site, and the target site's server responds to the request and returns HTML, pictures, videos and other resources. The first web crawler model takes the question list as the retrieval condition and retrieves data related to the question list from the target site; this data is the response data output by the first web crawler model in response to the question list.
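A minimal sketch of such a crawler, using only the Python standard library, might look like the following. The search endpoint `https://example.com/search`, the `q` query parameter, and the choice to treat every `<p>` element as a candidate response are all assumptions made for illustration; a real target site would need its own URL scheme and extraction rules.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical search endpoint; replace with the real target site.
SEARCH_URL = "https://example.com/search"


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def build_search_url(question, base_url=SEARCH_URL):
    """Use the question as the retrieval condition of the request."""
    return base_url + "?" + urlencode({"q": question})


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def crawl_responses(questions, fetch=None):
    """Map each question to the candidate response texts found for it."""
    if fetch is None:
        def fetch(url):
            # Send a browser-like request to the target site.
            request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
            with urlopen(request, timeout=10) as response:
                return response.read().decode("utf-8", errors="replace")
    return {q: extract_paragraphs(fetch(build_search_url(q))) for q in questions}
```

Passing a custom `fetch` callable makes it possible to exercise the parsing logic on saved pages without touching the network.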

And S104, taking the response data as response data of the question list, wherein the response data is associated with the question list to form question and answer corpus data of the subject term.

The response data is taken as the responses to the question list, and the questions and responses are associated one to one. In the database, this is represented as one record per pair, consisting of two parts: a question, and the response to that question.
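Steps S101 to S104 can be tied together in a short sketch like the one below. The function name `build_corpus`, the record field names, and the decision to skip questions for which no response was found are assumptions for illustration; the question generator and the crawler are passed in as callables so that any implementation can be plugged in.

```python
def build_corpus(topic, generate_questions, crawl_response):
    """Build question-and-answer records for one subject term.

    generate_questions: callable mapping a subject term to a list of questions.
    crawl_response: callable mapping one question to its response text.
    """
    records = []
    for question in generate_questions(topic):
        response = crawl_response(question)
        if response:  # assumption: drop questions without any response
            records.append(
                {"topic": topic, "question": question, "response": response}
            )
    return records
```

Each returned dictionary corresponds to one database record holding a question and the response associated with it.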

When the intelligent customer service receives a question from a user, it can search the question-answer database by keyword for a stored question whose keywords match those of the user's question, and return the response mapped to that question. In some embodiments, the response is obtained by calculating the similarity between the user's question and the questions in the question-answer database. The similarity may be calculated with an edit distance algorithm. For example, if the question stored in the question-answer database and the received user question are two phrasings of "how much does the mobile phone sell for" that differ by a single inserted character, the edit distance between the two is 1, because only one insertion is needed to turn one into the other. The question in the database with the greatest similarity to the user's question is found, and the response corresponding to that question is returned.
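The edit distance lookup can be illustrated with a small sketch. It uses the classic Levenshtein distance (insertions, deletions and substitutions each cost 1) and treats the smallest distance as the greatest similarity; the function names and the list-of-pairs layout are assumptions, not part of the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def best_answer(user_question, qa_pairs):
    """Return the response whose stored question is closest to the user question."""
    _, answer = min(qa_pairs, key=lambda pair: edit_distance(user_question, pair[0]))
    return answer
```

For large databases a full scan like this is slow, so a keyword search would typically narrow the candidates first.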

As shown in fig. 2, step S102 specifically further includes the following steps:

S111, inputting the subject term into a second web crawler model, and acquiring query candidate data output by the second web crawler model in response to the subject term;

In the embodiment of the invention, a web crawler model is used to acquire questions related to the input subject term. To distinguish it from the first web crawler model described above, it is called the second web crawler model. This model takes the subject term as the search condition and acquires content related to the subject term from a target site; this content is called query candidate data.

S112, matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule;

The obtained query candidate data contains both non-query corpus data and query corpus data. To extract the query corpus, a matching rule is preset, and the query candidate data is processed with this rule to obtain the query matching data. The matching rule is that the text contains "?" or a word that expresses a question, such as "what", "how much" or "where". In the embodiment of the invention, a regular expression matching algorithm is used. A regular expression is a logical formula for operating on character strings: a rule string is formed from predefined specific characters and combinations of them, and expresses a filtering logic for strings. In other words, a regular expression is a text pattern describing one or more strings to be matched when searching text. For example, a regular expression that combines the subject term with "what" may be used to find any string containing both the subject term and "what".
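A minimal sketch of this matching rule with Python's `re` module is shown below. The English marker words and the full-width question mark are illustrative assumptions; a deployment on Chinese text would use the corresponding Chinese question words.

```python
import re

# Illustrative question markers: "?", full-width "？", and a few question words.
QUESTION_PATTERN = re.compile(r"\?|？|\b(?:what|how much|where)\b", re.IGNORECASE)


def match_questions(candidates, topic):
    """Keep candidates that mention the subject term and look like questions."""
    topic_lower = topic.lower()
    return [
        text
        for text in candidates
        if topic_lower in text.lower() and QUESTION_PATTERN.search(text)
    ]
```

The word boundaries (`\b`) keep words such as "somewhat" from being mistaken for question words.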

Because the real questions of the user recorded by the target site are acquired through the second web crawler model, the question list acquired through the method is closer to reality, and the hit rate of the constructed question-answer corpus data hitting the actual questions of the user is higher.

S113, taking the question matching data as a question list of the subject term.

After the query candidate data is subjected to a preset matching rule, the obtained query matching data is a question list related to the subject word.

As shown in fig. 3, in some embodiments, step S102 specifically further includes the following steps:

S121, inputting the subject term into a pre-trained Seq2Seq model;

In some embodiments, the question list is obtained by inputting the subject term into a pre-trained Seq2Seq model. The Seq2Seq model is a network with an Encoder-Decoder structure whose input is a sequence and whose output is a sequence: the Encoder converts a variable-length signal sequence into a fixed-length vector representation, and the Decoder converts the fixed-length vector into a variable-length target signal sequence. Specifically, the received subject term is first one-hot encoded against the vocabulary and then converted into the corresponding word vector through a word2vec word vector model. The word vector is input into the Encoder layer, a multi-layer neuron layer that takes a bidirectional LSTM layer or an RNN (recurrent neural network) as its basic neuron unit, which generates the final_state state layer and the final_output state vector;

then, the final_output state vector output by the Encoder in the above step is input into a global information layer to generate the global state layer context;

and finally, the final_state and final_output state vectors obtained in the above steps and the context vector generated by the global information layer are input into the Decoder layer, which outputs its own final_state vector and output vector, wherein the Decoder layer is likewise a multi-layer neuron layer taking a bidirectional LSTM layer or an RNN as its basic neuron unit. The output result is a basic question list with the input subject term as its subject.

S122, acquiring a question list output by the Seq2Seq model in response to the subject term.

The output of the Seq2Seq model in response to the subject term is taken as the question list. The Seq2Seq model must be trained before it can output a question list. The training process is as follows: a training corpus is prepared, that is, input sequences and their corresponding output sequences; each input sequence is input into the Seq2Seq model and the probability of its output sequence is calculated; and the parameters of the Seq2Seq model are adjusted so that, over the whole sample, the probability that every input sequence yields its corresponding output sequence through the Seq2Seq model is maximized.
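This training criterion corresponds to the usual maximum-likelihood objective for sequence models. As a sketch, with notation that is assumed here rather than taken from the patent (K training pairs, input sequence x^(k), output sequence y^(k) with tokens y_t, and model parameters θ):

```latex
\theta^{*} = \arg\max_{\theta} \sum_{k=1}^{K} \log P\left(y^{(k)} \mid x^{(k)}; \theta\right),
\qquad
P\left(y \mid x; \theta\right) = \prod_{t=1}^{T} P\left(y_{t} \mid y_{1}, \dots, y_{t-1}, x; \theta\right)
```

Maximizing the sum of log-probabilities is equivalent to maximizing the product of the per-pair probabilities, that is, the probability of the whole sample.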

As shown in fig. 4, after step S103, the following steps are further included:

S131, filtering the response data according to a preset filtering rule to obtain filtering data, wherein the filtering rule at least comprises a query corpus data filtering rule;

In some embodiments, the acquired response data is processed further. Because what is needed here is response data, corpus entries that express questions must first be filtered out. All corpus entries containing words with query semantics, such as "what" and "how much", can likewise be filtered out with the regular expression matching algorithm. In addition, the filtering rule may also include filtering of sensitive words: corpus entries containing sensitive words are filtered out according to a preset sensitive word list.
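The two filtering rules can be sketched together as follows. The question markers and the one-entry sensitive word list are placeholders chosen for illustration only.

```python
import re

# Illustrative markers of query semantics; adapt to the corpus language.
QUERY_PATTERN = re.compile(r"\?|？|\b(?:what|how much|where)\b", re.IGNORECASE)

# Hypothetical sensitive word list.
SENSITIVE_WORDS = {"badword"}


def filter_responses(responses, sensitive_words=SENSITIVE_WORDS):
    """Drop entries that express questions or contain a sensitive word."""
    kept = []
    for text in responses:
        if QUERY_PATTERN.search(text):
            continue  # query corpus, not a response
        lowered = text.lower()
        if any(word.lower() in lowered for word in sensitive_words):
            continue  # contains a sensitive word
        kept.append(text)
    return kept
```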

And S132, taking the filtered data as response data of the problem list.

And filtering the response data to obtain filtered data, namely response data of the problem list.

As shown in fig. 5, after step S103, in some embodiments, the following steps are further included:

S141, inputting the response data into a pre-trained deep neural network model, and obtaining classification information of the response data output by the deep neural network model, wherein the classification information at least distinguishes the response data into query corpus data and non-query corpus data;

in some embodiments, the response data is classified by a pre-trained deep neural network model, wherein the pre-trained deep neural network model can identify at least a query corpus and a non-query corpus. The specific training process of the deep neural network is shown in fig. 6.

S142, taking the non-query corpus data as response data of the question list.

The non-query corpus data identified by the deep neural network is the response data corresponding to the question list.

As shown in fig. 6, the deep neural network model used in step S141 is trained as follows:

S151, obtaining training samples marked with corpus categories, wherein the corpus categories at least comprise query corpus and non-query corpus;

in the embodiment of the invention, the training target of the deep neural network model is that the query corpus and the non-query corpus can be identified. The training samples contain two types of corpus, each marked with a corpus class.

S152, inputting the training sample into a deep convolutional neural network model to obtain a reference corpus class of the training sample;

The samples are input into the deep convolutional neural network model, which outputs the reference corpus category of each training sample, that is, whether the sample is query corpus or non-query corpus.

S153, comparing whether the reference corpus categories of different samples in the training samples are consistent with the corpus categories;

Whether the reference corpus category output by the deep convolutional neural network is consistent with the corpus category marked on the sample is judged through a loss function. In the embodiment of the invention, the loss function is the Softmax cross entropy loss function. During training, the weights in the deep convolutional neural network model are adjusted so that the Softmax cross entropy loss function converges as far as possible. That is, the weights are adjusted continuously, and when further adjustment no longer reduces the value of the loss function and the value instead begins to increase, training of the deep convolutional neural network is considered finished.

Assume that there are N training samples, that the input feature of the i-th sample at the last layer of the network is Xi, and that its corresponding label is Yi, the final classification result (i.e. whether sample i is a query or a non-query). h = (h1, h2, ..., hC) is the final output of the network, i.e. the prediction result for sample i, where C is the total number of classes.

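The Softmax cross entropy loss function is

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{h_{Y_i}}}{\sum_{j=1}^{C} e^{h_j}}$$

where, for each sample i, h is the network output for that sample and h_{Y_i} is its component for the labelled class Yi.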

And S154, when the reference corpus category is inconsistent with the corpus category, repeatedly and iteratively updating the weight in the deep neural network model until the reference corpus category is consistent with the corpus category.

When the loss function has not converged, the weights of all nodes in the deep convolutional neural network model are updated. In the embodiment of the invention, the gradient descent method is used for this; gradient descent is an optimization algorithm commonly used in machine learning and artificial intelligence to iteratively approach the model with minimum error.
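The interplay of the Softmax cross entropy loss and gradient descent can be illustrated on a deliberately small model. The sketch below trains a single linear layer followed by Softmax in pure Python; it is a stand-in for the last layer of a network, not the deep convolutional network of the embodiment, and the learning rate, step limit and stopping tolerance are assumed values.

```python
import math


def softmax(scores):
    """Turn raw scores h = (h1, ..., hC) into class probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights, features):
    """Linear scores: one weight row per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cross_entropy(weights, samples):
    """Mean Softmax cross entropy over (features, label) samples."""
    loss = 0.0
    for features, label in samples:
        loss -= math.log(softmax(scores_for(weights, features))[label])
    return loss / len(samples)


def gradient_step(weights, samples, learning_rate=0.5):
    """One gradient descent update of the C x D weight matrix."""
    grads = [[0.0] * len(row) for row in weights]
    for features, label in samples:
        probs = softmax(scores_for(weights, features))
        for c, p in enumerate(probs):
            error = p - (1.0 if c == label else 0.0)  # d(loss)/d(score_c)
            for d, x in enumerate(features):
                grads[c][d] += error * x / len(samples)
    return [
        [w - learning_rate * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(weights, grads)
    ]


def train(samples, num_classes=2, max_steps=200, tolerance=1e-6):
    """Update weights until the loss stops decreasing or max_steps is reached."""
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    loss = cross_entropy(weights, samples)
    for _ in range(max_steps):
        candidate = gradient_step(weights, samples)
        new_loss = cross_entropy(candidate, samples)
        if loss - new_loss < tolerance:  # no further decrease: stop training
            break
        weights, loss = candidate, new_loss
    return weights, loss
```

Here each sample is a feature list paired with a label, for example label 1 for query corpus and label 0 for non-query corpus. The stopping test mirrors the rule described above: training ends once a further update no longer lowers the loss.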

In order to solve the technical problems, the embodiment of the invention also provides a question-answer corpus data construction device of the intelligent customer service. Referring specifically to fig. 7, fig. 7 is a basic structural block diagram of a device for constructing question-answer corpus data of intelligent customer service according to this embodiment.

As shown in fig. 7, a device for constructing question-answer corpus data of intelligent customer service includes: the system comprises an acquisition module 210, a generation module 220, a processing module 230 and an execution module 240. The acquiring module 210 is configured to acquire a subject term of the question-answer corpus data to be constructed; a generating module 220, configured to input the subject term into a preset problem generating model, and obtain a problem list output by the problem generating model in response to the subject term; the processing module 230 is configured to input the problem list into a preset first web crawler model, and obtain response data output by the first web crawler model in response to the problem list, where the first web crawler model captures target data with the problem list as a constraint condition; and the execution module 240 is configured to use the response data as response data of the question list, where the response data is associated with the question list to form question and answer corpus data of the subject term.

In the embodiment of the invention, the subject term of the question-answer corpus data to be constructed is acquired; the subject term is input into a preset problem generation model, and a problem list output by the problem generation model in response to the subject term is acquired; the problem list is input into a preset first web crawler model, and response data output by the first web crawler model in response to the problem list is acquired; and the response data is used as the response data of the problem list and is associated with the problem list to form the question-answer corpus data of the subject term. The problem list of the subject term is either collected automatically by a web crawler from real user questions or generated by a model that has learned the real intentions of users, and the corresponding response data is likewise collected by a web crawler from real customer service responses. The invention thereby improves the efficiency and quality of question-answer data construction and also improves the hit rate of the intelligent customer service.

In some embodiments, the generation module 220 further includes a first processing sub-module, a first matching sub-module and a first execution sub-module. The first processing sub-module is configured to input the subject term into a second web crawler model and obtain query candidate data output by the second web crawler model in response to the subject term; the first matching sub-module is configured to match the query candidate data according to a preset matching rule to obtain query matching data, where the matching rule at least includes a query corpus matching rule; and the first execution sub-module is configured to use the query matching data as the problem list of the subject term.

In some embodiments, the first matching sub-module obtains the query matching data by using a regular-expression matching algorithm.
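The regular-expression matching step can be illustrated with a short sketch. The pattern below is an assumed query corpus matching rule, not the one used by the embodiment: a candidate is kept if it starts with a common English interrogative word or ends with an ASCII or full-width question mark. The candidate strings are made up for the example.

```python
import re

# Assumed rule: starts with an interrogative word, or ends with "?" or "？".
QUESTION_PATTERN = re.compile(
    r"^\s*(?:what|how|why|when|where|who|which|can|does|do|is|are)\b|[?？]\s*$",
    re.IGNORECASE,
)


def match_questions(candidates):
    """Keep only the candidates that the pattern identifies as questions."""
    return [text for text in candidates if QUESTION_PATTERN.search(text)]


# Hypothetical query candidate data returned by the second web crawler model.
candidates = [
    "How do I renew my policy",
    "Our office is open on weekdays.",
    "保单可以退保吗？",
]
questions = match_questions(candidates)
```

A real rule set would likely be language-specific and considerably richer; the point here is only that the crawler output is narrowed to question-like text before it becomes the problem list.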

In some embodiments, the generation module 220 further includes a second processing sub-module and a first acquisition sub-module. The second processing sub-module is configured to input the subject term into a pre-trained Seq2Seq model; and the first acquisition sub-module is configured to acquire a problem list output by the Seq2Seq model in response to the subject term.

In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a first filtering sub-module and a second execution sub-module. The first filtering sub-module is configured to filter the response data according to a preset filtering rule to obtain filtered data, where the filtering rule at least includes a query corpus data filtering rule; and the second execution sub-module is configured to use the filtered data as the response data of the problem list.
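One way to picture the filtering rule is at sentence level: split each crawled response into sentences and drop those that are themselves questions, so that only statement-like text remains as response data. The splitting pattern and the "ends with a question mark" test are assumptions made for this sketch, and the sample response is invented.

```python
import re

# Assumed sentence splitter: a run of non-terminator characters plus an
# optional terminator (ASCII or full-width).
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def filter_response(response):
    """Drop sentences that end with a question mark and rejoin the rest."""
    sentences = [s.strip() for s in SENTENCE.findall(response)]
    kept = [s for s in sentences if s and not s.endswith(("?", "？"))]
    return " ".join(kept)


# Hypothetical crawled response containing a clarifying question.
raw = "Hello. Do you have your policy number? You can renew online."
cleaned = filter_response(raw)
```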

In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a first classification sub-module and a third execution sub-module. The first classification sub-module is configured to input the response data into a pre-trained deep neural network model and obtain classification information of the response data output by the deep neural network model, where the classification information at least divides the response data into query corpus data and non-query corpus data; and the third execution sub-module is configured to use the non-query corpus data as the response data of the problem list.

In some embodiments, the device for constructing question-answer corpus data of intelligent customer service further includes a second acquisition sub-module, a third processing sub-module, a first comparison sub-module and a first updating sub-module. The second acquisition sub-module is configured to acquire training samples marked with corpus categories, where the corpus categories at least include query corpus and non-query corpus; the third processing sub-module is configured to input the training samples into a deep convolutional neural network model to obtain a reference corpus category of each training sample; the first comparison sub-module is configured to compare whether the reference corpus categories of the different training samples are consistent with the marked corpus categories; and the first updating sub-module is configured to repeatedly and iteratively update the weights in the deep neural network model when the reference corpus category is inconsistent with the corpus category, until the reference corpus category is consistent with the corpus category.
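The acquire, predict, compare and update cycle carried out by these four sub-modules can be sketched with a much simpler model. In the sketch below a single-layer logistic classifier stands in for the deep convolutional neural network, and the two hand-made features (whether the text ends with a question mark, whether it contains an interrogative word) and the four samples are assumptions chosen only to keep the example self-contained.

```python
import math


def predict(weights, bias, features):
    """Return the probability that a sample belongs to the query corpus category."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_until_consistent(samples, lr=0.5, max_epochs=1000):
    """samples: list of (features, label); label 1 = query corpus, 0 = non-query corpus."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(max_epochs):
        # Compare the predicted (reference) category with the marked category.
        mismatches = [
            (x, y) for x, y in samples
            if (predict(weights, bias, x) >= 0.5) != bool(y)
        ]
        if not mismatches:
            return weights, bias, epoch
        # Inconsistent: update the weights by gradient descent on the cross-entropy loss.
        for x, y in samples:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    raise RuntimeError("did not reach consistency within max_epochs")


# Hypothetical features: [ends_with_question_mark, contains_interrogative_word].
samples = [
    ([1.0, 1.0], 1),
    ([1.0, 0.0], 1),
    ([0.0, 0.0], 0),
    ([0.0, 1.0], 0),
]
weights, bias, epochs_used = train_until_consistent(samples)
```

The `max_epochs` guard is an addition for safety in the sketch; the text itself describes updating until the categories agree.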

In order to solve the above technical problems, the embodiment of the invention also provides computer equipment. Refer to fig. 8, which is a basic structural block diagram of the computer device according to this embodiment.

Fig. 8 schematically shows the internal structure of the computer device. As shown in fig. 8, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and when the computer readable instructions are executed by the processor, the processor implements a method for constructing question-answer corpus data of intelligent customer service. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform a method for constructing question-answer corpus data of intelligent customer service. The network interface of the computer device is used to connect to and communicate with a terminal. It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of the part of the structure related to the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

The processor in this embodiment is configured to execute the specific functions of the acquisition module 210, the generation module 220, the processing module 230, and the execution module 240 in fig. 7, and the memory stores the program codes and the various types of data required for executing the above modules. The network interface is used for data transmission with the user terminal or the server. The memory in this embodiment stores the program codes and data required for executing all sub-modules in the corpus construction method of intelligent customer service, and the server can call the program codes and data of the server to execute the functions of all sub-modules.

The computer equipment acquires the subject term of the corpus data to be constructed; inputs the subject term into a preset problem generation model and acquires a problem list output by the problem generation model in response to the subject term; inputs the problem list into a preset first web crawler model and acquires response data output by the first web crawler model in response to the problem list; and uses the response data as the response data of the problem list, where the response data is associated with the problem list to form the question-answer corpus data of the subject term. The problem list of the subject term is either collected automatically by a web crawler from real user questions or generated by a model that has learned the real intentions of users, and the corresponding response data is likewise collected by a web crawler from real customer service responses. The invention thereby improves the efficiency and quality of question-answer data construction and also improves the hit rate of the intelligent customer service.

The present invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the corpus construction method for intelligent customer service according to any of the above embodiments.

Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages; these are not necessarily performed at the same time and may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of protection of the present invention.

Claims

1. The corpus construction method of the intelligent customer service is characterized by comprising the following steps of:

acquiring a subject term of corpus data to be constructed;

inputting the subject word into a preset problem generation model, and acquiring a preset problem list output by the problem generation model in response to the subject word, wherein the problem generation model is a model which is preset with a fixed series of problems and takes the subject word as a parameter change;

inputting the subject word into a second web crawler model, obtaining query candidate data output by the second web crawler model in response to the subject word, and matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule, and the query matching data is used as a problem list of the subject word;

the response data is used as response data of the question list, and the response data and the question list are associated to form question and answer corpus data of the subject term;

wherein after the step of inputting the problem list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the problem list, the method further comprises the following steps:

and taking the non-doubtful corpus data as response data of the question list.

2. The method for constructing the corpus of intelligent customer service according to claim 1, wherein in the step of matching the query candidate data according to a preset matching rule, a regular-expression matching algorithm is specifically adopted to obtain the query matching data.

3. The method for constructing a corpus of intelligent customer service according to claim 1, wherein in the step of inputting the subject word into a preset question generation model and obtaining a preset question list output by the question generation model in response to the subject word, the method specifically comprises the following steps:

inputting the subject term into a pre-trained Seq2Seq model;

4. The corpus construction method of intelligent customer service according to claim 1, further comprising the following steps after the step of inputting the question list into a preset first web crawler model and obtaining response data output by the first web crawler model in response to the question list:

and taking the filtered data as response data of the problem list.

5. The intelligent customer service corpus construction method according to claim 1, wherein the pre-trained deep neural network model is trained by the following steps:

6. A device for constructing question-answer corpus data of an intelligent customer service, the device being configured to implement the corpus construction method of an intelligent customer service according to any one of claims 1 to 5, the device comprising:

the generation module is used for inputting the subject terms into a preset problem generation model and acquiring a preset problem list which is output by the problem generation model in response to the subject terms; inputting the subject term into a second web crawler model, obtaining query candidate data output by the second web crawler model in response to the subject term, and matching the query candidate data according to a preset matching rule to obtain query matching data, wherein the matching rule at least comprises a query corpus matching rule, and the query matching data is used as a problem list of the subject term;

7. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the steps of the corpus construction method of intelligent customer service according to any of claims 1 to 5.

8. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the corpus construction method of intelligent customer service according to any of claims 1 to 5.