CN112036906B - Data processing method, device and equipment

Info

Publication number: CN112036906B
Application number: CN202010761316.8A
Authority: CN
Inventors: 王岗
Original assignee: Suning Financial Technology Nanjing Co Ltd
Current assignee: Suning Financial Technology Nanjing Co Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2022-08-19
Anticipated expiration: 2040-07-31
Also published as: CN112036906A

Abstract

The embodiments of the application disclose a data processing method, device and equipment. In the method, a user server sends a received question to a cloud; receives from the cloud a word segmentation text obtained by preprocessing the question, together with a first question-answer pair set obtained by recognizing the word segmentation text based on a preset general knowledge base; matches the word segmentation text with a preset local knowledge base to obtain a second question-answer pair set; merges the first question-answer pair set and the second question-answer pair set; and calculates the similarity between the word segmentation text and the questions in the merged question-answer pair set. The calculated similarities are compared with a preset similarity threshold to determine the question-answer pairs to be retained, and the answers in the retained question-answer pairs are sent to an application server, which sends them to the corresponding client for display. The method meets the requirements of enterprises for customized design and private deployment of an intelligent customer service robot and enables the robot to be migrated and reused across different financial service fields.

Description

Data processing method, device and equipment

Technical Field

The invention belongs to the field of artificial intelligence, and particularly relates to a data processing method, device and equipment.

Background

An intelligent customer service system is a large-scale, knowledge-processing automatic question-answering system intended for all fields and industries. It was developed to ease the mismatch between the response capacity of human customer service and the volume of user inquiries. It draws on interdisciplinary technologies such as natural language processing, knowledge graphs, and big data storage and computing, and to a certain extent offers an effective solution for communication between modern enterprises and large numbers of users. Such a system can provide automatic question answering for traditional enterprises, particularly in the traditional financial service industry, across the three channels of APP, WAP and PC, thereby relieving the workload of human customer service, reducing labor costs, improving user experience, and improving the timeliness, stability, accuracy and standardization of enterprise service.

Under modern business models, intelligent customer service is deployed in two modes, cloud deployment and private deployment, and supports all channels, including access from web pages, mobile terminals and call centers. However, the service output and promotion of intelligent customer service in specific fields has consistently been unsatisfactory. Taking the financial service field as an example, intelligent customer service must not only provide basic general conversation functions but also meet the customized requirements of the client enterprise: a professional knowledge base has to be built for the enterprise's existing business model and product market, and a dedicated intelligent customer service system for the enterprise is built on that basis.

However, running the basic services of an intelligent customer service robot depends on a large-scale cluster, high-performance data storage and computing units, and mature research and development support, so SaaS cloud deployment is an ideal way for an enterprise to obtain stable intelligent customer service at low cost. The sensitivity of information in the financial field, however, means that financial enterprises can only accept private deployment in a local environment. The need for information security isolation in the financial industry on the one hand, and the high technical cost and later operation-and-maintenance difficulty of cross-environment private deployment of the intelligent customer service robot on the other, mean that existing deployment modes cannot meet the requirements of the financial industry.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a data processing method, a data processing device and data processing equipment, which can meet the requirements of enterprises on customized design and privatized deployment of intelligent customer service robots, realize migration multiplexing of the intelligent customer service robots in different financial service fields, and realize low-cost shared knowledge transfer, thereby truly meeting the requirements of cross-service and cross-unit migration multiplexing of the intelligent customer service robots in the financial fields.

The embodiment of the invention provides the following specific technical scheme:

a first aspect discloses a data processing method, the method comprising:

the method comprises the steps that a user server sends a received question proposed by a user to a cloud end, receives a word segmentation text which is sent by the cloud end and obtained after preprocessing the question proposed by the user, and a first question and answer pair set which is obtained after recognizing the word segmentation text based on a preset general knowledge base;

matching the word segmentation text with a preset local knowledge base to obtain a second question-answer pair set, merging the first question-answer pair set and the second question-answer pair set, and calculating the similarity between the word segmentation text and the questions in the merged question-answer pair set;

and comparing the calculated similarity with a preset similarity threshold, determining question-answer pairs matched with the comparison result, sending the answers in the determined question-answer pairs to an application server, and sending the answers to the corresponding client side by the application server for displaying.

Preferably, the cloud identifies the segmented text based on a preset general knowledge base to obtain a first question-answer pair set specifically including:

the cloud end judges the text length of the word segmentation text;

when the text length of the word segmentation text is smaller than a first preset value, converting the word segmentation text into word vectors, and inputting the word vectors obtained through conversion into a first recognition model trained in advance to obtain a classification result of a problem proposed by the user;

when the classification result is the same as a preset category, inputting the converted word vector into a preset second recognition model to obtain the first question-answer pair set matched with the question proposed by the user under the classification result; the first recognition model and the second recognition model are obtained by training according to a preset corpus and the universal knowledge base;

when the text length of the word segmentation text is greater than or equal to the first preset value, searching the entire general knowledge base for question-answer pairs matched with the word segmentation text, and determining the matched question-answer pairs as the first question-answer pair set.

Preferably, the method for acquiring the first recognition model and the second recognition model includes:

classifying all question-answer pairs in the general knowledge base to obtain question-answer pairs corresponding to each category;

establishing a sample library based on a preset corpus; wherein the sample library is a collection of pairs of daily customer service questions and answers;

labeling all questions in the sample library based on all categories obtained after classification, and determining question-answer pairs matched with all the questions in the sample library based on the classified question-answer pairs;

training a first basic model according to all the marked problems in the sample library to obtain the first recognition model;

and training a second basic model according to all the questions in the sample library and the classified question-answer pairs matched with each question to obtain the second recognition model.

Preferably, the preprocessing of the question proposed by the user by the cloud to obtain the word segmentation text specifically includes:

the cloud performs word segmentation on the question proposed by the user based on a preset dictionary tree to obtain a word segmentation result;

and judging the word segmentation result, and when the word segmentation result meets a preset judgment condition, correcting the word segmentation result and extracting keywords to obtain the word segmentation text.
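
Dictionary-tree segmentation can be illustrated with a small sketch. The text does not say which matching strategy is applied on the dictionary tree, so the forward maximum matching used here is an assumption, as are the names TrieNode, Trie and segment and the sample vocabulary.

```python
class TrieNode:
    """One node of the dictionary tree: child links plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Minimal dictionary tree (hypothetical helper, not from the source text)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match_end(self, text, start):
        """End index of the longest dictionary word beginning at `start`.

        Falls back to a single character when no dictionary word matches.
        """
        node = self.root
        end = start + 1
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                end = i + 1
        return end


def segment(text, trie):
    """Forward maximum matching: repeatedly take the longest known word."""
    tokens, i = [], 0
    while i < len(text):
        j = trie.longest_match_end(text, i)
        tokens.append(text[i:j])
        i = j
    return tokens


trie = Trie(["贷款", "利息", "贷款利息", "多少"])
print(segment("贷款利息是多少", trie))
```

Because the longest entry wins, the dictionary entry "贷款利息" is preferred over the shorter "贷款"; characters not covered by the dictionary are emitted one at a time.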

Preferably, before the cloud performs word segmentation on the question proposed by the user based on the preset dictionary tree, the method further includes:

the cloud cleans the question proposed by the user based on a preset noise reduction model.
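
As a rough illustration of the cleaning step, the sketch below swaps the pre-trained noise reduction model for a plain regular-expression whitelist. That substitution, the NFKC normalization, the exact Unicode ranges and the name clean_question are all assumptions made for illustration; only the character classes to be kept (CJK unified ideographs, English letters, Arabic script, Greek letters, Arabic numerals) come from the detailed description.

```python
import re
import unicodedata

# Everything outside the whitelisted ranges is treated as noise (assumed ranges):
#   \u4e00-\u9fff  CJK Unified Ideographs
#   A-Za-z, 0-9    English letters and Arabic numerals
#   \u0370-\u03ff  Greek
#   \u0600-\u06ff  Arabic script
_NOISE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\u0370-\u03ff\u0600-\u06ff]+")


def clean_question(text):
    """Normalize the encoding, then replace each run of noise with one space."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width to half-width
    return _NOISE.sub(" ", text).strip()


print(clean_question("你好!!! loan α=5%"))
```

A trained model could make context-dependent decisions that a fixed whitelist cannot; the whitelist only shows what "retaining" certain scripts means in practice.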

Preferably, the method further comprises:

updating the dictionary tree according to a preset updating period, which specifically comprises the following steps:

performing word segmentation on all the linguistic data before the updating period to obtain a candidate new word set;

filtering all candidate new words in the candidate new word set by utilizing mutual information and left-right entropy;

comparing the candidate new words obtained after filtering with the dictionary tree to determine target new words;

and updating the dictionary tree based on the target new words.
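
The filtering by mutual information and left-right entropy can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are taken to be adjacent token pairs, mutual information is computed as pointwise mutual information in bits, the thresholds are arbitrary, and a plain set of known words stands in for the dictionary-tree lookup. The names score_candidates, entropy and filter_new_words are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of the neighbour distribution in `counter`."""
    n = sum(counter.values())
    return -sum((c / n) * math.log2(c / n) for c in counter.values()) if n else 0.0


def score_candidates(corpus):
    """Score every adjacent token pair as a candidate new word.

    `corpus` is a list of token lists. Returns
    {candidate: (pmi, left_entropy, right_entropy)}.
    """
    unigram, bigram = Counter(), Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in corpus:
        unigram.update(tokens)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            bigram[pair] += 1
            if i > 0:
                left[pair][tokens[i - 1]] += 1   # token seen to the left
            if i + 2 < len(tokens):
                right[pair][tokens[i + 2]] += 1  # token seen to the right
    total_tokens = sum(unigram.values())
    total_pairs = sum(bigram.values())
    scores = {}
    for pair, count in bigram.items():
        p_xy = count / total_pairs
        p_x = unigram[pair[0]] / total_tokens
        p_y = unigram[pair[1]] / total_tokens
        pmi = math.log2(p_xy / (p_x * p_y))
        scores["".join(pair)] = (pmi, entropy(left[pair]), entropy(right[pair]))
    return scores


def filter_new_words(scores, known_words, min_pmi, min_entropy):
    """Keep cohesive candidates with varied contexts that are not yet known."""
    return {
        word
        for word, (pmi, left_h, right_h) in scores.items()
        if pmi >= min_pmi and min(left_h, right_h) >= min_entropy
        and word not in known_words
    }


corpus = [
    ["开通", "花", "呗", "额度"],
    ["查询", "花", "呗", "账单"],
    ["关闭", "花", "呗", "服务"],
    ["我", "的", "额度"],
]
print(filter_new_words(score_candidates(corpus), set(), 1.0, 1.0))
```

High mutual information indicates that the two parts occur together more often than chance, while high left and right entropy indicates that the combination appears in many different contexts, which is typical of a free-standing word.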

Preferably, before the cloud sends the first question-answer pair set to the user server, the method further includes:

the cloud calculates the similarity between the word segmentation text and the questions in the first question-answer pair set;

comparing the similarity of the word segmentation text and the questions in the first question-answer pair set with the similarity threshold value, and determining question-answer pairs matched with the comparison result;

the sending, by the cloud, the first question-answer pair set to the user server specifically includes:

the cloud sends the question-answer pairs which are obtained by determination and matched with the comparison result to the user server;

the merging, by the user server, of the first question-answer pair set and the second question-answer pair set and the calculating of the similarity between the word segmentation text and the questions in the merged question-answer pair set specifically include:

and the user server side combines the determined question-answer pairs matched with the comparison result and the second question-answer pair set, and calculates the similarity of the word segmentation text and the questions in the combined question-answer pair set.

Preferably, the comparing the calculated similarity with a preset similarity threshold, determining a question-answer pair matched with the comparison result, and sending the answer in the determined question-answer pair to the application server and sending the answer to the corresponding client by the application server specifically includes:

comparing each calculated similarity with a preset similarity credibility threshold and a preset similarity available threshold;

if the similarity higher than the similarity credibility threshold exists, acquiring a corresponding question-answer pair with the highest similarity in the combined question-answer pair set, sending an answer of the corresponding question-answer pair with the highest similarity to an application server side, and sending the answer to a corresponding client side by the application server side;

if all the similarities are lower than the similarity credibility threshold and the similarities higher than the similarity available threshold exist, performing descending order arrangement on the question-answer pairs corresponding to the similarities higher than the similarity available threshold in the combined question-answer pair set, acquiring a preset number of question-answer pairs in the descending order arranged question-answer pairs according to a preset screening rule, sending answers in the screened preset number of question-answer pairs to an application server and sending the answers to corresponding clients by the application server;

and if all the similarities are lower than the available similarity threshold, matching question-answer pairs corresponding to the word segmentation texts in a preset target rule base, sending answers in the matched question-answer pairs to an application server, and sending the answers to the corresponding client by the application server.
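
The three-way decision above can be sketched as a single function. This is an illustrative sketch only: the "preset screening rule" is assumed to be "take the first top_k pairs", the rule-base lookup is represented by a caller-supplied fallback callable, and the name select_answers and all threshold values are hypothetical.

```python
def select_answers(scored_pairs, credible, available, top_k, fallback):
    """Choose the answers to return from (similarity, question, answer) triples.

    credible  -- similarity credibility threshold
    available -- similarity available threshold (lower than `credible`)
    top_k     -- assumed screening rule: keep at most this many pairs
    fallback  -- callable returning answers matched in the target rule base
    """
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    # Case 1: some similarity exceeds the credibility threshold,
    # so only the single best answer is returned.
    if ranked and ranked[0][0] > credible:
        return [ranked[0][2]]
    # Case 2: nothing is credible, but some pairs are usable;
    # return the screened pairs in descending order of similarity.
    usable = [item for item in ranked if item[0] > available]
    if usable:
        return [item[2] for item in usable[:top_k]]
    # Case 3: nothing is usable; defer to the rule base.
    return fallback()


pairs = [(0.80, "q1", "a1"), (0.70, "q2", "a2"), (0.30, "q3", "a3")]
print(select_answers(pairs, 0.9, 0.5, 3, lambda: ["default reply"]))
```

With the sample values, no pair exceeds the credibility threshold of 0.9, so the two pairs above the available threshold of 0.5 are returned in descending order.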

In a second aspect, a data processing apparatus is disclosed, the apparatus comprising: a user server and a cloud end;

the user server comprises:

a first transmission module, used for sending a received question proposed by a user to the cloud, and receiving a word segmentation text which is sent by the cloud and obtained by preprocessing the question proposed by the user, and a first question-answer pair set obtained by recognizing the word segmentation text based on a preset general knowledge base;

a first matching module, used for matching the word segmentation text with a preset local knowledge base to obtain a second question-answer pair set, merging the first question-answer pair set and the second question-answer pair set, and calculating the similarity between the word segmentation text and the questions in the merged question-answer pair set;

a first returning module, used for comparing the calculated similarity with a preset similarity threshold, determining the question-answer pairs matched with the comparison result, and sending the answers in the determined question-answer pairs to an application server, which sends them to the corresponding client for display;

the cloud comprises:

a second transmission module, used for receiving the question proposed by the user and sent by the user server;

a processing module, used for preprocessing the question proposed by the user to obtain the word segmentation text;

a second matching module, used for recognizing the word segmentation text based on a preset general knowledge base to obtain the first question-answer pair set;

the second transmission module being further used for sending, to the user server, the word segmentation text obtained by preprocessing the question proposed by the user and the first question-answer pair set obtained by recognizing the word segmentation text based on the preset general knowledge base.

In a third aspect, a computer device is disclosed, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the data processing method according to the first aspect when executing the computer program.

The embodiment of the invention has the following beneficial effects:

In the application, to ensure information security in the financial field, the content of a user's knowledge base (including business information, product information, order information and the like) cannot be maintained directly in an open, general-purpose basic customer service robot, and sensitive enterprise information cannot be allowed to flow between networks. The intelligent customer service robot is therefore split into online functions and offline functions. On the one hand, modules that do not involve client privacy and have high computing and storage costs (text preprocessing, intention recognition and the like) are deployed in the cloud and offered as services in a SaaS mode. On the other hand, the construction and use of the local knowledge base search application are deployed locally and distributed and installed in a PaaS mode. The cloud recognition result and the local recognition result are then fused and screened, which ultimately achieves knowledge expansion and service migration and meets the differing requirements of different customers.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic diagram of the network architecture of the cloud provided in embodiment 1 of the present application;

fig. 2 is a flowchart of implementing the PaaS extension service provided by the user server in embodiment 1 of the present application;

fig. 3 is a flowchart illustrating a functional interaction implementation between a cloud and a user server provided in embodiment 1 of the present application;

fig. 4 is a flowchart of a data processing method provided in embodiment 1 of the present application;

fig. 5 is a schematic structural diagram of a computer device provided in embodiment 3 of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As described in the background, in the prior art the operation of an intelligent customer service robot in the financial field depends on a large-scale cluster, high-performance data storage and computing units, and mature research and development support, so SaaS cloud deployment is an ideal way for an enterprise to obtain stable intelligent customer service at low cost; however, the sensitivity of information in the financial field means that financial enterprises can only accept private deployment in a local environment. On this basis, the applicant proposes an application framework combining the SaaS mode and the PaaS mode: on the one hand, basic services (modules that do not involve client privacy and have large computing and storage overhead, such as text preprocessing and intention recognition) are deployed in the cloud and offered as services in the SaaS mode; on the other hand, the construction and use of the local knowledge base search application are deployed locally and distributed and installed in the PaaS mode. In this way the intelligent customer service robot can be deployed.

To this end, a lightweight Web service framework based on Flask is built in the cloud. First, Flask fully satisfies the service scenario of the robot question-answering access mechanism and the high-concurrency performance requirements of general customer service access. Second, Flask is lean and extensible, serves as a simple and effective glue layer, and is written in Python, so it can be embedded naturally into customer service robot scripts developed in Python and makes the network service easy to extend.

In this scheme, the network architecture of the cloud is shown in fig. 1. The cloud can perform text recognition, text extraction, retrieval matching and similar functions. These services are offered externally in the SaaS mode: a tenant only needs to subscribe to the relevant services to obtain access rights, and can then interact with the cloud through the network service interface to obtain the cloud's processing result for the content input by the user.

On the local side, i.e. the user server, as shown in fig. 2, the privacy-related personalized service programs are packaged into an image through a Dockerfile. A user who subscribes to the personalized service downloads the image through the PaaS distribution mechanism, runs it on the user side, and starts a dedicated container. During startup of the Docker container, the user needs to store the local knowledge base for the personalized service according to an agreed Schema structure and place it in a designated folder. The local service then builds a lightweight local search application based on the built-in Whoosh module, builds a knowledge base index from the local knowledge base, and handles information retrieval and matching against the local knowledge base. Docker lets the local personalized services run in a standardized container environment according to the dependencies configured when the image was packaged, independent of the local development environment, which removes the trouble of coordinating and synchronizing development environments.

With this, deployment of the intelligent customer service robot is complete. When a customer visits, referring to fig. 3, the user server sends the visit content to the cloud through a network interface. The cloud performs text preprocessing, word vector conversion, intention recognition and similar work, and returns the intermediate result and the matching result obtained in the cloud to the local server. The local server matches the intermediate result against the local knowledge base, fuses the locally matched result with the matching result returned by the cloud, and returns the result to the application server according to a preset return rule; the application server then returns it to the corresponding front end for display.
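
The fusion of the cloud result with the local result can be sketched as follows. This is an assumption-laden illustration: the text does not state how duplicate questions are resolved (here the local entry overrides the cloud entry), and a token-level Jaccard similarity stands in for whatever similarity measure the user server actually applies. The names jaccard and fuse_and_score are hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Token-set overlap, used here only as a stand-in similarity measure."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def fuse_and_score(tokens, cloud_pairs, local_pairs, tokenize, similarity=jaccard):
    """Merge two lists of (question, answer) pairs and score each question.

    Returns (similarity, question, answer) triples, best match first.
    """
    merged = {}
    for question, answer in list(cloud_pairs) + list(local_pairs):
        # Assumed policy: on duplicate questions the later (local) entry wins.
        merged[question] = answer
    scored = [(similarity(tokens, tokenize(q)), q, a) for q, a in merged.items()]
    return sorted(scored, key=lambda item: item[0], reverse=True)


cloud = [("what is loan interest", "generic answer")]
local = [("loan interest rate", "4.5%"), ("what is loan interest", "local answer")]
for score, question, answer in fuse_and_score(
    ["loan", "interest", "rate"], cloud, local, str.split
):
    print(round(score, 2), question, "->", answer)
```

The ranked triples produced here are the kind of input that a subsequent threshold comparison would consume before answers are passed to the application server.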

Based on the network architecture constructed above, the specific implementation manner of the present application is as follows:

Example 1

Referring to fig. 4, a data processing method includes the steps of:

110. The user server sends the received question proposed by the user to the cloud, and receives a word segmentation text which is sent by the cloud and obtained by preprocessing the question proposed by the user, and a first question-answer pair set obtained by recognizing the word segmentation text based on a preset general knowledge base.

The user server sends the question proposed by the user to the cloud through the network interface, and the cloud preprocesses and recognizes the received question.

The cloud preprocesses the question proposed by the user as follows:

1. the cloud performs parameter parsing, standardized encoding and cleaning on the question proposed by the user;

the cleaning process can be completed by using a pre-trained noise reduction model. The noise reduction model retains CJK unified ideographs, English, Arabic and Greek characters, Arabic numerals and other language forms, so that the question is cleaned and data noise reduction is completed.

2. Performing word segmentation on the cleaned problem based on a preset dictionary tree to obtain a word segmentation result;

in the above steps, the word segmentation is performed on the question posed by the user, so that the granularity conversion from sentence to word of the recognition object can be realized.

The dictionary tree is updated according to a preset updating period, and the specific updating process is as follows:

a. performing word segmentation on all the linguistic data before the updating period to obtain a candidate new word set;

b. filtering all candidate new words in the candidate new word set by utilizing mutual information and left-right entropy;

c. comparing the candidate new words obtained after filtering with the dictionary tree to determine target new words;

d. and updating the dictionary tree based on the target new words and the word frequency of the target new words.

3. And judging the segmentation result, and when the segmentation result meets a preset judgment condition, correcting the segmentation result and extracting keywords to obtain a segmentation text.

A word segmentation result meets the preset judgment condition when it belongs to neither the human-agent scene nor the greeting scene. Only in that case does the word segmentation result need to be recognized; otherwise, a reply is given according to the corresponding rule. Specifically, the two scenes are handled as follows:

when the word segmentation result belongs to the human-agent scene, extracting keywords from the word segmentation result and handing the question over to manual processing;

and when the word segmentation result belongs to the greeting scene, matching a target answer corresponding to the word segmentation result in a preset target rule base.

In addition, the step of correcting the segmentation result specifically comprises the following steps:

a. recognizing the word segmentation result by using an n-gram model to obtain a candidate wrong word set;

specifically, the plausibility of a sentence is abstracted as the conditional probability of the word combination that makes up the sentence. For example, if a sentence S consists of n words, i.e. $S = w_1, w_2, \ldots, w_n$, the language model representing the probability that S is a legitimate sentence can be expressed as:

$$P(S) = P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_n \mid w_1, w_2, \ldots, w_{n-1})$$

considering the data sparsity of the above language model, and based on the Markov assumption, the probability of a word is taken to depend only on the one or few words preceding it. In this scheme it is taken to depend on at most the two preceding words, so two models, Bigram (2-gram) and Trigram (3-gram), are adopted. In their standard form (conditioning words with an index below 1 are omitted):

$$P(S) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}) \quad \text{(Bigram)}$$

$$P(S) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1}) \quad \text{(Trigram)}$$

where each P value is approximated by maximum likelihood estimation. For example, for Bigram (2-gram), the P value is calculated as:

$$P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_i, w_{i-1})}{\mathrm{count}(w_{i-1})}$$

here, count refers to the total number of occurrences of a word or word combination in the corpus.

And determining a candidate wrong word set according to the calculated P value.

b. Determining the word preceding each candidate wrong word in the candidate wrong word set according to the word segmentation result, and querying a preset collocation table to obtain an alternative word set;

c. calculating the edit distance between each candidate wrong word in the candidate wrong word set and each corresponding alternative word in the alternative word set, and acquiring all alternative words whose edit distance is greater than a threshold value;

d. replacing the candidate wrong word with each of the alternative words acquired in step c in turn, inputting each resulting text into the n-gram model, and calculating the probability of each alternative word;

e. comparing the probabilities of the alternative words, and replacing the candidate wrong word with the alternative word having the highest probability, thereby correcting the word segmentation result.
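
Steps a to e can be sketched with a bigram model and a Levenshtein edit distance. This is a minimal sketch, not the patented implementation: only the bigram case is shown, the probability cut-off min_prob is an assumed way of turning P values into candidate wrong words, and the direction of the edit-distance comparison is left to a caller-supplied predicate (distance_ok) because it depends on how the threshold is defined. All function names and sample data are hypothetical.

```python
from collections import Counter


def train_bigram(corpus):
    """Count words and adjacent word pairs in a list of token lists."""
    unigram, bigram = Counter(), Counter()
    for sentence in corpus:
        unigram.update(sentence)
        bigram.update(zip(sentence, sentence[1:]))
    return unigram, bigram


def bigram_prob(prev, word, unigram, bigram):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    return bigram[(prev, word)] / unigram[prev] if unigram[prev] else 0.0


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(tokens, unigram, bigram, collocations, min_prob, distance_ok):
    """Replace improbable words with the most probable collocation alternative."""
    out = list(tokens)
    for i in range(1, len(out)):  # the first token has no preceding word
        prev, word = out[i - 1], out[i]
        # Step a: a low P value marks a candidate wrong word (assumed cut-off).
        if bigram_prob(prev, word, unigram, bigram) >= min_prob:
            continue
        # Steps b and c: collocation alternatives filtered by edit distance.
        alternatives = [
            alt for alt in collocations.get(prev, [])
            if distance_ok(edit_distance(word, alt))
        ]
        # Steps d and e: score each substitution and keep the most probable.
        if alternatives:
            out[i] = max(
                alternatives,
                key=lambda alt: bigram_prob(prev, alt, unigram, bigram),
            )
    return out


corpus = [["查询", "贷款", "利息"], ["查询", "贷款", "额度"], ["贷款", "利息", "多少"]]
unigram, bigram = train_bigram(corpus)
collocations = {"贷款": ["利息", "额度"]}
print(correct(["查询", "贷款", "利西"], unigram, bigram, collocations,
              0.1, lambda d: 0 < d <= 1))
```

In the sample, "利西" never follows "贷款" in the corpus, so it is flagged; of the two collocation alternatives only "利息" passes the illustrative distance predicate, and it replaces the flagged word.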

After the error correction is finished, keywords are extracted from the word segmentation result. In this scheme the extraction is based on the TF-IDF algorithm, and the extracted keywords form the word segmentation text.
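
TF-IDF keyword extraction can be sketched as follows. The smoothed IDF formula, the top_k cut-off and the names build_idf and extract_keywords are assumptions for illustration; the text only states that TF-IDF is used.

```python
import math
from collections import Counter


def build_idf(documents):
    """Smoothed inverse document frequency from a list of token lists.

    Returns the IDF table and the IDF assigned to terms unseen in the corpus.
    """
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    unseen_idf = math.log(1 + n) + 1.0
    return idf, unseen_idf


def extract_keywords(tokens, idf, unseen_idf, top_k):
    """Return the top_k tokens of one question ranked by TF-IDF score."""
    if not tokens:
        return []
    tf = Counter(tokens)
    scores = {t: (c / len(tokens)) * idf.get(t, unseen_idf) for t, c in tf.items()}
    # Sort by descending score; ties are broken by the token itself.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


documents = [
    ["我", "想", "查询", "贷款"],
    ["我", "想", "了解", "基金"],
    ["我", "的", "账单"],
]
idf, unseen_idf = build_idf(documents)
print(extract_keywords(["我", "想", "查询", "贷款", "利息"], idf, unseen_idf, 3))
```

Words that occur in nearly every document, such as "我", receive a low weight, while rarer content words rise to the top of the ranking.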

The specific process by which the cloud recognizes the word segmentation text based on the preset general knowledge base to obtain the first question-answer pair set is as follows:

1. the cloud end judges the text length of the word segmentation text;

2. when the text length of the word segmentation text is smaller than a first preset value, converting the word segmentation text into word vectors, and inputting the word vectors obtained through conversion into a first recognition model trained in advance to obtain a classification result of a problem proposed by a user;

in this scheme, the CBOW model is used to convert the word segmentation text into word vectors.

3. When the classification result is the same as the preset classification, inputting the word vector obtained by conversion into a preset second recognition model to obtain a first question-answer pair set matched with the question proposed by the user under the classification result;

in the financial industry, only the personal question-answer scene is related to determine the question-answer pair in the scene through the knowledge base, and other scenes can obtain the answer in the related scene through the related interface.

The first recognition model and the second recognition model are established according to a preset corpus and a universal knowledge base; the construction method comprises the following steps:

a. classifying all question-answer pairs in the general knowledge base to obtain question-answer pairs corresponding to each category;

b. establishing a sample library based on a preset corpus; the sample library is a set of daily customer service question and answer pairs;

c. labeling all questions in the sample library based on all categories obtained after classification, and determining question-answer pairs matched with all the questions in the sample library based on the classified question-answer pairs;

d. training a first basic model according to all the problems in the labeled sample library to obtain a first recognition model;

e. and training a second basic model according to all the questions in the sample library and the classified question-answer pairs matched with each question to obtain a second recognition model.

In this scheme, first recognition model is used for carrying out categorised discernment to the problem, and its output mainly includes three types: question-answer class, task class, chat class.

The question-answer type is a question type input for acquiring relevant information of the financial field or financial business, such as what stock type fund is and how much loan interest is; the task class has a definite financial business purpose and is used for obtaining personal business information or command type input of services under specific limiting conditions, such as payment order inquiry, financial income inquiry and the like; the chat class is other user input that does not belong to the "task" or "question and answer" classes described above, such as greeting, asking for time, asking for weather conditions, etc.

The second recognition model recognizes questions of the question-answer class to obtain the question-answer pairs matched with them.

Using the two recognition models narrows the retrieval range within the general knowledge base, reduces the computation needed for recall and ranking, speeds up query responses, and improves the accuracy of the recognition result.

In addition, when the recognition result of the first recognition model is the task class, the cloud classifies the user's question according to preset categories and performs slot filling. It then passes the slot-filled question through the external application extension interface to the database query system of the corresponding external system, receives the query result message from the external system, and outputs it to the front-end interactive page, thereby meeting personalized requests such as order inquiry and transaction history review.

When the recognition result of the first recognition model is the chat class, question-answer pairs corresponding to the question are matched in a preset target rule base, and the answers in the matched question-answer pairs are sent to the front-end interactive page.
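The three-way handling described above amounts to a dispatch on the class predicted by the first recognition model. A minimal sketch follows, assuming hypothetical class labels and placeholder handlers; `toy_classify` is a keyword stand-in for the trained model, not the model itself.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], List[str]]


def route(question: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Handler]) -> List[str]:
    """Dispatch a question to the handler registered for its class."""
    label = classify(question)
    if label not in handlers:
        raise ValueError(f"unexpected class: {label}")
    return handlers[label](question)


def toy_classify(question: str) -> str:
    # Hypothetical keyword rules standing in for the first recognition model.
    text = question.lower()
    if "order" in text or "income" in text:
        return "task"
    if "what" in text or "how" in text:
        return "question_answer"
    return "chat"


# Placeholder handlers; real ones would call the second recognition model,
# the external query system, and the target rule base respectively.
HANDLERS: Dict[str, Handler] = {
    "question_answer": lambda q: ["<answer from the second recognition model>"],
    "task": lambda q: ["<result from the external query system>"],
    "chat": lambda q: ["<answer from the target rule base>"],
}
```

Keeping the handlers in a table means a new class can be supported by registering one more entry, without touching the dispatch function.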

4. When the text length of the word segmentation text is greater than or equal to the first preset value, searching the entire general knowledge base for question-answer pairs matched with the converted word vectors, and determining the matched question-answer pairs as the first question-answer pair set.

When question-answer pairs are matched, the similarity between the word segmentation text and the questions in the general knowledge base is calculated based on the WMD (Word Mover's Distance) algorithm; the smaller the calculated distance, the higher the similarity.

In this scheme, the cloud considers two cases when it recognizes the word segmentation text based on the preset general knowledge base. In the first case, the length of the word segmentation text is smaller than the first preset value. The text is short and may carry little effective information, so the pre-trained first recognition model is used to obtain the classification result of the text; when the classification result is the question-answer class, the pre-trained second recognition model is used to obtain the answer to the text. Because both models are trained on a large corpus, their recognition results are accurate and can be used for intention recognition when the word segmentation text is short. In the second case, the length of the word segmentation text is greater than or equal to the first preset value. The text then carries more effective information, so the word segmentation text is used to search the entire general knowledge base for matched question-answer pairs.
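The two cases can be summarized as a single branch on text length. The sketch below is illustrative only: it assumes the text length is measured in tokens, uses a hypothetical default of 5 for the first preset value, and takes the two models and the full search as caller-supplied functions.

```python
def recognize(tokens, first_model, second_model, full_search, first_preset=5):
    """Return the first question-answer pair set for a segmented question.

    tokens        list of words produced by preprocessing
    first_model   callable returning a class label for the tokens
    second_model  callable returning matched question-answer pairs
    full_search   callable searching the whole general knowledge base
    first_preset  hypothetical length threshold (the first preset value)
    """
    if len(tokens) < first_preset:
        # Short text: classify first, then match only within the class.
        if first_model(tokens) == "question_answer":
            return second_model(tokens)
        return []  # task and chat input are handled by other branches
    # Long text: enough information to search the knowledge base in full.
    return full_search(tokens)
```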

120. Matching the word segmentation text with a preset local knowledge base to obtain a second question-answer pair set, merging the first question-answer pair set and the second question-answer pair set, and calculating the similarity of the word segmentation text and the questions in the merged question-answer pair set.

In this scheme, the similarity between the word segmentation text and the questions in the merged question-answer pair set is calculated based on the WMD algorithm; the smaller the calculated distance, the higher the similarity.
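To illustrate how a word-mover-style distance turns into a similarity, the sketch below computes the relaxed variant of WMD, a lower bound in which each word simply travels to its nearest counterpart, rather than solving the full transport problem. This is a swapped-in simplification, not necessarily the computation used in the scheme. The toy word vectors and the mapping `1 / (1 + distance)` are assumptions; the text only states that a smaller distance means a higher similarity.

```python
import math
from collections import Counter


def _weights(tokens, vectors):
    """Normalized bag-of-words weights over tokens that have a vector."""
    counts = Counter(t for t in tokens if t in vectors)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Relaxed Word Mover's Distance (a lower bound on the exact WMD)."""
    wa, wb = _weights(tokens_a, vectors), _weights(tokens_b, vectors)
    if not wa or not wb:
        return math.inf  # nothing comparable

    def one_way(src, dst):
        # Each source word moves all its weight to the nearest target word.
        return sum(w * min(_euclid(vectors[s], vectors[d]) for d in dst)
                   for s, w in src.items())

    return max(one_way(wa, wb), one_way(wb, wa))


def to_similarity(distance):
    """Assumed monotone mapping: smaller distance, higher similarity."""
    return 1.0 / (1.0 + distance)
```

With real embeddings, a library implementation of the exact distance would normally replace `relaxed_wmd`; the conversion to a similarity stays the same.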

130. Comparing the calculated similarity with a preset similarity threshold, determining the question-answer pairs matched with the comparison result, sending the answers in the determined question-answer pairs to an application server, and sending the answers to the corresponding client by the application server for display.

Step 130 specifically includes:

1. comparing each calculated similarity with a preset similarity credibility threshold and a preset similarity availability threshold;

2. if any similarity is higher than the similarity credibility threshold, acquiring the question-answer pair with the highest similarity in the merged question-answer pair set, sending the answer in that question-answer pair to the application server, and sending the answer to the corresponding client by the application server;

3. if all the similarities are lower than the similarity credibility threshold and some are higher than the similarity availability threshold, sorting in descending order of similarity the question-answer pairs in the merged question-answer pair set whose similarities are higher than the similarity availability threshold, selecting a preset number of question-answer pairs from the sorted list according to a preset screening rule, sending the answers in the selected question-answer pairs to the application server, and sending the answers to the corresponding client by the application server;

4. if all the similarities are lower than the similarity availability threshold, matching question-answer pairs corresponding to the word segmentation text in a preset target rule base, sending the answers in the matched question-answer pairs to the application server, and sending the answers to the corresponding client by the application server.
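The four sub-steps of step 130 form a two-threshold decision. A minimal sketch, assuming hypothetical threshold values (0.9 and 0.6) and taking "the preset screening rule" to mean keeping the top `top_k` pairs; a similarity exactly equal to a threshold is treated here as not exceeding it, which the text leaves unspecified.

```python
def select_pairs(scored, credible=0.9, available=0.6, top_k=3):
    """Pick question-answer pairs from (similarity, pair) tuples.

    Returns (mode, pairs) where mode is one of:
      "credible"  -> the single best pair is returned
      "available" -> up to top_k candidate pairs are returned
      "fallback"  -> nothing usable; consult the target rule base
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    if ranked and ranked[0][0] > credible:
        return "credible", [ranked[0][1]]
    usable = [pair for sim, pair in ranked if sim > available]
    if usable:
        return "available", usable[:top_k]
    return "fallback", []
```

Returning the mode alongside the pairs lets the caller decide whether to show one answer, a list of suggestions, or the rule-base reply.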

In order to make the question-answer pair returned by the cloud more accurate and reduce the calculation amount of the user server, the method further comprises the following steps:

210. before the cloud sends the word segmentation text and the first question-answer pair set to the user server, the cloud calculates the similarity between the word segmentation text and the questions in the first question-answer pair set;

220. comparing the similarity between the word segmentation text and the questions in the first question-answer pair set with the similarity threshold, determining the question-answer pairs matched with the comparison result, and sending the determined question-answer pairs to the user server;

Step 220 specifically includes:

1. comparing the similarity between the word segmentation text and the questions in the first question-answer pair set with a preset similarity credibility threshold and a preset similarity availability threshold;

2. if any similarity is higher than the similarity credibility threshold, acquiring the question-answer pair with the highest similarity in the first question-answer pair set, and sending that question-answer pair to the user server;

3. if all the similarities are lower than the similarity credibility threshold and some are higher than the similarity availability threshold, sorting in descending order of similarity the question-answer pairs in the first question-answer pair set whose similarities are higher than the similarity availability threshold, selecting a preset number of question-answer pairs from the sorted list according to a preset screening rule, and sending the selected question-answer pairs to the user server;

4. if all the similarities are lower than the similarity availability threshold, matching question-answer pairs corresponding to the word segmentation text in a preset target rule base and sending the matched question-answer pairs to the user server.

230. the user server merges the determined question-answer pairs with the second question-answer pair set, and calculates the similarity between the word segmentation text and the questions in the merged question-answer pair set;

240. comparing the calculated similarity with a preset similarity threshold, determining the question-answer pairs matched with the comparison result, sending the answers in the determined question-answer pairs to an application server, and sending the answers to the corresponding client by the application server for display.
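Step 230 can be sketched as a merge followed by re-scoring on the user server. The text does not say how duplicate questions are resolved, so the sketch assumes that an entry from the local knowledge base overrides a cloud entry with the same question; the `jaccard` scorer is a simple stand-in for the WMD-based similarity, and all names are hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Stand-in similarity: overlap of the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def merge_and_score(tokens, cloud_pairs, local_pairs, score=jaccard):
    """Merge the cloud and local pair sets, then score every question.

    cloud_pairs, local_pairs: iterables of (question, answer) strings.
    Returns a list of (similarity, (question, answer)) tuples.
    """
    merged = {}
    for question, answer in cloud_pairs:
        merged[question] = answer
    for question, answer in local_pairs:
        merged[question] = answer  # assumption: the local entry wins
    return [(score(tokens, question.split()), (question, answer))
            for question, answer in merged.items()]
```

The returned list has the shape expected by a threshold comparison such as the one in step 240.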

According to the scheme, the online and offline functions of the intelligent customer service robot are decoupled. On the one hand, modules with high computing and storage costs, or content that must be learned from large-scale corpora, are deployed in the cloud and delivered as services in a SaaS mode; on the other hand, the construction and use of the search application for the user's own knowledge base are deployed locally, and distributed and installed in a PaaS mode.

Therefore, the scheme provides a low-cost, reliable, and information-secure solution for the privatized deployment of the intelligent customer service robot. The high-cost technical modules are deployed in the cloud, so the user's front-end application can rely on the hardware resources and technical support of a remote cluster to obtain stable 7 x 24 background services that support high concurrency and low latency and withstand the high-throughput load (evaluated by TPS, concurrency, and response time) during an enterprise's promotional activities. In addition, because the cost is shared by the cloud, the user only bears a service lease fee and need not attend to the software and hardware resources required to develop and operate an intelligent customer service robot system; the user can conveniently obtain low-cost, basic, general-purpose intelligent customer service network services through a basic SaaS service interface. As for the personalized service, on the one hand, it achieves standardized environment deployment by means of Docker, so the user can install it with one click regardless of the local development environment; on the other hand, when the container starts, it automatically builds a local lightweight search application from the local knowledge base file, which meets the need to retrieve private knowledge base information and also satisfies the security requirement that private information stay local. This approach greatly lowers the threshold for implementing an intelligent customer service robot, so that even small and medium-sized financial enterprises with a weak technical foundation and scarce hardware resources can deploy it smoothly.

According to the scheme, locally merging and ranking the retrieval results of the general knowledge base and of the local knowledge base ensures the accuracy and generalization of the customized customer service robot in daily use. This meets both the client's need to consult general knowledge and the user's need for enterprise information or services in the enterprise's specific business context. The programmatic response mode, which relies on the algorithm model and the knowledge base, also avoids the variation in service quality and professional level found in manual customer service and ensures standardized, consistent service output.

In addition, the "network service + local service" application framework adopted by the scheme ensures low coupling between the intelligent customer service robot and the user's customer service front-end applications (APP, WAP, PC). The user's existing front-end application does not need to be changed: as long as it sends a request message to the background intelligent customer service robot service according to the interface protocol, the robot responds quickly and pushes the result back to the front end, which makes the intelligent customer service robot easier to popularize and extend.

Example 2

In one embodiment, there is provided a data processing apparatus comprising: a user server and a cloud end;

the user server side comprises:

the first transmission module is used for sending a received question posed by a user to the cloud, and for receiving, from the cloud, the word segmentation text obtained by preprocessing the question posed by the user and the first question-answer pair set obtained by recognizing the word segmentation text based on a preset general knowledge base;

the first matching module is used for matching the word segmentation text with a preset local knowledge base to obtain a second question-answer pair set, merging the first question-answer pair set and the second question-answer pair set, and calculating the similarity between the word segmentation text and the questions in the merged question-answer pair set;

the first returning module is used for comparing the calculated similarity with a preset similarity threshold, determining a question-answer pair matched with the comparison result, sending the answer in the determined question-answer pair to the application server side, and sending the answer to the corresponding client side by the application server side for displaying;

the cloud comprises:

the second transmission module is used for receiving the question posed by the user and sent by the user server;

the processing module is used for preprocessing the question posed by the user to obtain the word segmentation text;

the second matching module is used for recognizing the word segmentation text based on a preset general knowledge base to obtain the first question-answer pair set;

the second transmission module is further used for sending, to the user server, the word segmentation text obtained by preprocessing the question posed by the user and the first question-answer pair set obtained by recognizing the word segmentation text based on the preset general knowledge base;

preferably, the second matching module is specifically configured to:

judging the text length of the word segmentation text;

when the text length of the word segmentation text is smaller than a first preset value, converting the word segmentation text into a word vector, and inputting the converted word vector into a pre-trained first recognition model to obtain a classification result for the question posed by the user;

when the classification result is the same as the preset category, inputting the converted word vector into a preset second recognition model to obtain a first question-answer pair set matched with the question posed by the user under the classification result; the first recognition model and the second recognition model are obtained by training according to a preset corpus and a general knowledge base;

when the text length of the word segmentation text is greater than or equal to the first preset value, searching the entire general knowledge base for question-answer pairs matched with the word segmentation text, and determining the matched question-answer pairs as the first question-answer pair set.

Preferably, the cloud further includes a modeling module, configured to construct the first recognition model and the second recognition model, and specifically includes:

classifying all question-answer pairs in the general knowledge base to obtain question-answer pairs corresponding to each class;

establishing a sample library based on a preset corpus; the sample library is a set of daily customer service question and answer pairs;

training a first basic model on all the questions in the labeled sample library to obtain the first recognition model;

and training a second basic model according to all the questions in the sample library and the classified question-answer pairs matched with each question to obtain a second recognition model.

Preferably, the processing module is specifically configured to:

performing word segmentation processing on the question posed by the user based on a preset dictionary tree to obtain a word segmentation result;

and judging the word segmentation result, and when the word segmentation result meets a preset judgment condition, correcting the word segmentation result and extracting keywords to obtain the word segmentation text.

Preferably, the processing module is further configured to:

before performing word segmentation processing on the question posed by the user based on the preset dictionary tree, cleaning the question posed by the user based on a preset noise reduction model.

Preferably, the processing module is further configured to:

and updating the dictionary tree based on the target new words.
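Dictionary-tree segmentation and the new-word update can be illustrated with a small trie. The sketch assumes forward maximum matching as the segmentation strategy, which the text does not specify, and uses a hypothetical three-word dictionary; characters not covered by the dictionary are emitted one at a time.

```python
_END = ""  # sentinel key marking the end of a dictionary word


class DictionaryTree:
    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word; also used to add newly discovered words."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word beginning at start."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if _END in node:
                best = i + 1 - start
        return best


def segment(text, tree):
    """Forward maximum matching over the dictionary tree."""
    tokens, i = [], 0
    while i < len(text):
        length = tree.longest_match(text, i) or 1
        tokens.append(text[i:i + length])
        i += length
    return tokens
```

For example, with the dictionary `["基金", "股票", "股票型基金"]`, `segment("股票型基金是什么", tree)` yields `["股票型基金", "是", "什", "么"]`; after `tree.add("是什么")` the same call yields `["股票型基金", "是什么"]`, which shows the effect of updating the tree with a new word.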

Preferably, the cloud further comprises a second return module, configured to:

before the first question-answer pair set is sent to the user server, calculating the similarity between the word segmentation text and the questions in the first question-answer pair set;

comparing the similarity between the word segmentation text and the questions in the first question-answer pair set with a similarity threshold, and determining the question-answer pairs matched with the comparison result;

the second transmission module is further configured to: sending the question-answer pair matched with the comparison result to the user server;

the first matching module is further configured to: and merging the determined question-answer pairs matched with the comparison result and the second question-answer pair set, and calculating the similarity of the word segmentation text and the questions in the merged question-answer pair set.

Preferably, the first returning module is specifically configured to:

if the similarity higher than the similarity credibility threshold exists, acquiring the corresponding question-answer pair with the highest similarity in the combined question-answer pair set, sending the answer in the corresponding question-answer pair with the highest similarity to the application server side, and sending the answer to the corresponding client side by the application server side;

if all the similarities are lower than the similarity credibility threshold and some are higher than the similarity availability threshold, sorting in descending order of similarity the question-answer pairs in the merged question-answer pair set whose similarities are higher than the similarity availability threshold, selecting a preset number of question-answer pairs from the sorted list according to a preset screening rule, sending the answers in the selected question-answer pairs to the application server, and sending the answers to the corresponding client by the application server;

and if all the similarity degrees are lower than the available similarity degree threshold value, matching question-answer pairs corresponding to the word segmentation texts in a preset target rule base, sending answers in the matched question-answer pairs to an application server side, and sending the answers to the corresponding client side by the application server side.

Example 3

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing all the methods described in embodiment 1 when executing the computer program.

Fig. 5 is an internal structural diagram of a computer device according to an embodiment of the present invention. The computer device may be a server, and its internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data processing method.

Those skilled in the art will appreciate that the structure shown in Fig. 5 is a block diagram of only the part of the structure related to the present solution and does not limit the computer devices to which the solution may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.

The above examples show only some embodiments of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of data processing, the method comprising:

the method comprises the steps that a user server sends a received question proposed by a user to a cloud end, receives a word segmentation text which is sent by the cloud end and obtained after the question proposed by the user is preprocessed, and a first question and answer pair set which is obtained after the word segmentation text is recognized on the basis of a preset general knowledge base;

2. The method of claim 1, wherein the cloud recognizing the segmented text based on a preset general knowledge base to obtain a first question-answer pair set specifically comprises:

the cloud end judges the text length of the word segmentation text;

3. The method according to claim 2, wherein the method for obtaining the first recognition model and the second recognition model comprises:

4. The method according to claim 1, wherein the preprocessing of the question posed by the user by the cloud end to obtain a segmented text specifically comprises:

5. The method of claim 4, wherein before performing the word segmentation processing on the question posed by the user based on a preset dictionary tree, the cloud further comprises:

and the cloud end cleans the question posed by the user based on a preset noise reduction model.

6. The method of claim 4, further comprising:

and updating the dictionary tree based on the target new words.

7. The method according to any one of claims 1 to 6, wherein before the cloud sends the first question-answer pair set to the user server, the method further comprises:

8. The method according to any one of claims 1 to 6, wherein the comparing the calculated similarity with a preset similarity threshold, determining a question-answer pair matching with the comparison result, and sending an answer in the determined question-answer pair to an application server and sending the answer to a corresponding client by the application server specifically includes:

if the similarity higher than the similarity credibility threshold exists, acquiring a question-answer pair corresponding to the highest similarity in the combined question-answer pair set, sending an answer in the question-answer pair corresponding to the highest similarity to an application server side, and sending the answer to a corresponding client side by the application server side;

and if all the similarities are lower than the similarity availability threshold, matching question-answer pairs corresponding to the word segmentation text in a preset target rule base, sending answers in the matched question-answer pairs to an application server side, and sending the answers to the corresponding client side by the application server side.

9. A data processing apparatus, characterized in that the apparatus comprises: a user server and a cloud end;

the user server comprises:

the first returning module is used for comparing the calculated similarity with a preset similarity threshold, determining question-answer pairs matched with the comparison result, sending answers in the determined question-answer pairs to an application server side, and sending the answers to the corresponding client side by the application server side for displaying;

the cloud comprises:

the processing module is used for preprocessing the question posed by the user to obtain a word segmentation text;

the second matching module is used for identifying the word segmentation text based on a preset general knowledge base to obtain a first question-answer pair set;

and the second transmission module is used for sending the word segmentation text obtained by preprocessing the problem proposed by the user and the first question-answer pair set obtained by identifying the word segmentation text based on the preset general knowledge base to the user server.

10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that:

the processor, when executing the computer program, implements the data processing method of any of claims 1 to 8.