CN108595695A

CN108595695A - Data processing method, device, computer equipment and storage medium

Info

Publication number: CN108595695A
Application number: CN201810434066.XA
Authority: CN
Inventors: 万周斌; 王艳飞; 於跃
Original assignee: United States (shenzhen) Information Technology Ltd By Share Ltd
Current assignee: United States (shenzhen) Information Technology Ltd By Share Ltd
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2018-09-28
Anticipated expiration: 2038-05-08
Also published as: CN108595695B

Abstract

This application relates to a data processing method, device, computer equipment and storage medium. The method includes: obtaining question information; when the word vector similarity score between the question information and a candidate question to be recommended in a knowledge base exceeds a first preset threshold, obtaining the target question to be recommended that corresponds to the question information, together with the answer corresponding to that target question; obtaining the set of same-type questions adjacent to the target question to be recommended; calculating the total similarity between each same-type question and the target question to be recommended, where the total similarity is determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity between the same-type question and the target question, the second word vector similarity being a word vector similarity computed in a different language; and adding the same-type questions whose total similarity meets a preset similarity threshold to the knowledge base, the answer corresponding to each such same-type question being the answer corresponding to the target question to be recommended. The knowledge base is thereby expanded more accurately.

Description

Data processing method, device, computer equipment and storage medium

Technical field

This application relates to the field of computer technology, and in particular to a data processing method, device, computer equipment and storage medium.

Background

With the development of computer technology, data processing technology has developed as well. An intelligent question answering system works in a question-and-answer manner: the user interacts with the system in natural language to obtain the knowledge the user is asking about. The better a machine understands people's colloquial questions, the smarter it is, and the knowledge base is the key to whether an intelligent question answering system can actually answer intelligently; without a large knowledge base, effective answers cannot be provided. At present, machine performance is generally improved by expanding the knowledge base and improving natural language processing technology, so as to improve data processing capability and obtain more accurate answers to questions.

When expanding the knowledge base, the traditional method adds the question data and the answers obtained for it directly to the knowledge base, which makes the content of the knowledge base rather complicated.

Summary of the invention

In view of this, it is necessary to address the above technical problem by providing a data processing method, device, computer equipment and storage medium that can expand the knowledge base in a targeted manner according to the question information and the target question to be recommended, thereby improving the accuracy of the knowledge base.

A data processing method, including:

obtaining question information;

when the word vector similarity score between the question information and a candidate question to be recommended in a knowledge base exceeds a first preset threshold, obtaining the target question to be recommended corresponding to the question information and the answer corresponding to the target question to be recommended;

obtaining a set of same-type questions adjacent to the target question to be recommended, and calculating the total similarity between each same-type question in the set and the target question to be recommended, the total similarity being determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity between the same-type question and the target question to be recommended, the second word vector similarity being a word vector similarity computed in a different language;

adding the same-type questions whose total similarity meets a preset similarity threshold to the knowledge base, the answer corresponding to each such same-type question being the answer corresponding to the target question to be recommended.

A data processing device, including:

a question information acquisition module, for obtaining question information;

a recommended question acquisition module, for obtaining, when the word vector similarity score between the question information and a candidate question to be recommended in a knowledge base exceeds a first preset threshold, the target question to be recommended corresponding to the question information and the answer corresponding to the target question to be recommended;

a similarity calculation module, for obtaining a set of same-type questions adjacent to the target question to be recommended and calculating the total similarity between each same-type question in the set and the target question to be recommended, the total similarity being determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity between the same-type question and the target question to be recommended, the second word vector similarity being a word vector similarity computed in a different language;

a knowledge base update module, for adding the same-type questions whose total similarity meets a preset similarity threshold to the knowledge base, the answer corresponding to each such same-type question being the answer corresponding to the target question to be recommended.

Computer equipment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

obtaining question information;

A computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor:

obtaining question information;

With the above data processing method, device, computer equipment and storage medium, question information is obtained; when the word vector similarity score between the question information and a candidate question to be recommended in the knowledge base exceeds the first preset threshold, the target question to be recommended corresponding to the question information and its corresponding answer are obtained; the set of same-type questions adjacent to the target question to be recommended is obtained, and the total similarity between each same-type question in the set and the target question to be recommended is calculated, the total similarity being determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity, where the second word vector similarity is a word vector similarity computed in a different language; and the same-type questions whose total similarity meets the preset similarity threshold are added to the knowledge base, the answer corresponding to each such same-type question being the answer corresponding to the target question to be recommended.

Brief description of the drawings

Fig. 1 is a diagram of the application environment of the data processing method in one embodiment;

Fig. 2 is a flow diagram of the data processing method in one embodiment;

Fig. 3 is a flow diagram of the total similarity calculation step in one embodiment;

Fig. 4 is a flow diagram of the data processing method in another embodiment;

Fig. 5 is a flow diagram of the step of obtaining feedback information in one embodiment;

Fig. 6 is a flow diagram of the step of processing question information according to the feedback message in one embodiment;

Fig. 7 is a flow diagram of the step of obtaining valid question information in one embodiment;

Fig. 8 is a structural block diagram of the data processing device in one embodiment;

Fig. 9 is a structural block diagram of the similarity calculation module in one embodiment;

Fig. 10 is a structural block diagram of the data processing device in another embodiment;

Fig. 11 is a structural block diagram of the data processing device in a further embodiment;

Fig. 12 is a structural block diagram of the data processing device in another embodiment;

Fig. 13 is a structural block diagram of the data processing device in another embodiment;

Fig. 14 is an internal structure diagram of the computer equipment in one embodiment.

Detailed description

In order to make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the application and are not intended to limit it.

The data processing method provided by this application can be applied in the application environment shown in Fig. 1, in which the terminal 102 communicates with the server 104 over a network. The terminal 102 obtains question information. When the word vector similarity score between the question information and a candidate question to be recommended in the knowledge base exceeds the first preset threshold, the terminal obtains the target question to be recommended corresponding to the question information and the answer corresponding to that target question. It then obtains the set of same-type questions adjacent to the target question to be recommended and calculates the total similarity between each same-type question in the set and the target question to be recommended. The total similarity is determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity between the same-type question and the target question, where the second word vector similarity is a word vector similarity computed in a different language. The same-type questions whose total similarity meets the preset similarity threshold are added to the knowledge base, with the answer of the target question to be recommended as their answer. The content of the knowledge base is sent to the server 104 over the network.

The above steps of obtaining the question information, calculating the similarity between the target question to be recommended and the same-type questions, and updating the knowledge base can equally be performed on the server 104.

The terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet computer or portable wearable device, and the server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in Fig. 2, a data processing method is provided. The method is described as applied to the terminal (or server) in Fig. 1, and includes the following steps:

Step S202: obtain question information, and when the word vector similarity score between the question information and a candidate question to be recommended in the knowledge base exceeds the first preset threshold, obtain the target question to be recommended corresponding to the question information and the answer corresponding to the target question to be recommended.

Here, question information refers to question data, which includes voice data and text data. If the question data is voice data, it is first converted into text data; that is, speech recognition is performed on the voice data to obtain the corresponding text data. The knowledge base is a database that stores question data and the corresponding answers, and any question data in that database can serve as a candidate question to be recommended. The word vector similarity score is a score that evaluates different pieces of question data according to the similarity of their word vectors, where a word vector is a dense vector that represents the words of natural language in a form a computer can process. The target question to be recommended is a question extracted from the candidate questions to be recommended whose word vector similarity with the question information exceeds the first preset threshold. The first preset threshold is a preset critical value.

Specifically, when the terminal or server receives question information, it calculates the word vector similarity score between the question data and each candidate question to be recommended in the knowledge base, takes the candidate questions whose score exceeds the first preset threshold as target questions to be recommended, and obtains the corresponding answers. For example, when the question information is "I want to apply for a credit card", the target question to be recommended is "How to apply for a credit card", and the corresponding answer is information such as the credit card application procedure.
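The candidate selection in step S202 can be sketched as follows. This is a minimal illustration, not the implementation described in the application: the function name `recommend_candidates`, the dictionary layout of the knowledge base and the injected `score_fn` scorer are all assumptions made for the example, and sorting the candidates by score is likewise an assumption.

```python
from typing import Callable, Dict, List, Tuple


def recommend_candidates(
    question: str,
    knowledge_base: Dict[str, str],         # stored question -> answer (assumed layout)
    score_fn: Callable[[str, str], float],  # hypothetical word vector similarity scorer
    first_threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (candidate question, answer, score) for every stored question
    whose similarity score with the user's question exceeds the threshold."""
    scored = []
    for candidate, answer in knowledge_base.items():
        score = score_fn(question, candidate)
        if score > first_threshold:  # "more than the first preset threshold"
            scored.append((candidate, answer, score))
    # Assumption: present the best-scoring candidates first.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored
```

Any scorer that maps two questions to a number can be plugged in as `score_fn`; an empty result corresponds to the case where no candidate exceeds the first preset threshold.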

Step S204: obtain the set of same-type questions adjacent to the target question to be recommended, and calculate the total similarity between each same-type question in the set and the target question to be recommended. The total similarity is determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity between the same-type question and the target question to be recommended, where the second word vector similarity is a word vector similarity computed in a different language.

Specifically, same-type questions are questions that belong to the same type as the target question to be recommended. Questions are classified differently in different scenarios, and the same question may belong to different types in different scenarios, so questions can be classified according to specific scenario information. The scenario information can be user-defined. When the scenario information is credit card information, the types of question data include, but are not limited to, the credit card application procedure, credit limit inquiry, points inquiry and repayment date. The total similarity is the similarity between each same-type question and the target question to be recommended, and is determined from at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity. The question similarity is determined from the similarity of the keywords extracted from the two questions. The first word vector similarity is obtained by converting both the target question to be recommended and the same-type question into dense vectors that a computer can process and calculating the similarity of those vectors. The second word vector similarity is obtained by first translating both the target question to be recommended and the same-type question into another language, then converting both into dense vectors and calculating the similarity of those vectors. When the current language is Chinese, the questions can be translated into another language such as English, Japanese or Korean. The sentence similarity is the similarity obtained by matching the target question to be recommended against the same-type question according to user-defined rules. The total similarity is obtained by weighting at least two of the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity.

In one embodiment, all four of the above similarities can be weighted to obtain the total similarity, and the weight of each similarity can be user-defined.

Step S206: add the same-type questions whose total similarity meets the preset similarity threshold to the knowledge base, the answer corresponding to each such same-type question being the answer corresponding to the target question to be recommended.

Specifically, the same-type questions in the set whose total similarity with the target question to be recommended meets the preset similarity threshold are added to the knowledge base together with the answer corresponding to the target question to be recommended.

In a specific embodiment, suppose the knowledge base contains the question "How do I apply for a credit card" with the answer "Hello, to get a credit card you need to register on the official website and then click Apply". When the user enters "I want to get a credit card", the two sentences express the same meaning, so the system calculates the similarity between "How do I apply for a credit card" and "I want to get a credit card"; if the calculated similarity meets the first similarity threshold, the question is recommended. The recommendation can take a form such as: "Would you like to ask one of the following questions: Q1. How do I apply for a credit card; Q2. Credit card limit". When the user selects Q1, the question the user wants to ask can be understood to be the same as Q1, and the total similarity of the two questions is calculated. When the total similarity meets the preset similarity threshold, "I want to get a credit card" is added, and the updated knowledge base contains: Q11: How do I apply for a credit card; Q12: I want to get a credit card; Answer: Hello, to get a credit card you need to register on the official website and then click Apply. When a user later enters "I want to apply for a credit card", the system returns the answer directly.
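The update in step S206 can be sketched as below. This is an illustrative sketch only: the function name `expand_knowledge_base` and the dictionary layout are hypothetical, and "meets the preset similarity threshold" is read here as "greater than or equal to", which the text does not specify.

```python
from typing import Dict


def expand_knowledge_base(
    knowledge_base: Dict[str, str],  # stored question -> answer (assumed layout)
    target_question: str,
    user_question: str,
    total_similarity: float,
    similarity_threshold: float,
) -> bool:
    """Add user_question under the target question's answer when the total
    similarity meets the threshold. Returns True if the entry was added."""
    if target_question not in knowledge_base:
        return False
    # Assumption: "meets the threshold" is interpreted as >=.
    if total_similarity < similarity_threshold:
        return False
    # The new question shares the answer of the target question.
    knowledge_base.setdefault(user_question, knowledge_base[target_question])
    return True
```

After a successful call, the user's phrasing and the original question both map to the same answer, which mirrors the Q11/Q12 entries in the example above.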

In the above data processing method, the word vector similarity score between the question entered by the user and the questions stored in the knowledge base is calculated. When the score exceeds the first preset threshold, the target question to be recommended corresponding to the question information and its answer are obtained from the knowledge base, and the total similarity between the user's question and the target question to be recommended, which are same-type questions, is calculated. When the total similarity meets the preset similarity threshold, the user's question is added to the knowledge base, with the answer of the target question to be recommended as its answer. Because the total similarity between the target question and the user's question is determined from similarities in multiple dimensions, the knowledge base expanded according to it is more accurate, and expanding the knowledge base automatically by total similarity saves a great deal of manual effort.

In one embodiment, as shown in Fig. 3, step S204 includes:

Step S302: calculate the question similarity between each same-type question and the target question to be recommended. The question similarity is determined by extracting the keywords of each same-type question and of the target question to be recommended, and comparing the keywords of each same-type question with the keywords of the target question to be recommended.

Specifically, a same-type question is a user input question that belongs to the same type as the target question to be recommended, and keywords are the important words in a sentence. For example, in "I want to apply for a credit card" the keywords include the verb "apply" and the noun "credit card". A user-defined algorithm calculates the similarity between the keywords of the same-type question and the keywords of the target question to be recommended; from the similarity between the individual keywords of the sentences, the similarity between the two questions is determined, giving the question similarity.

In one embodiment, keywords can be extracted by performing word segmentation and dependency parsing on the target question to be recommended and on each same-type question.
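As a rough sketch of the keyword-based question similarity in step S302, the example below filters out a small stop-word list and compares the remaining keyword sets with the Jaccard coefficient. Both choices are stand-ins: the text describes word segmentation with dependency parsing and an unspecified user-defined algorithm, so the stop-word list, the function names and the Jaccard measure are assumptions made only for illustration.

```python
import re
from typing import Set

# Assumed stop-word list; a real system would use segmentation and
# dependency parsing to pick keywords instead.
STOPWORDS = {"i", "a", "an", "the", "to", "for", "how", "do", "want", "what", "is", "my"}


def extract_keywords(question: str) -> Set[str]:
    """Lower-case the question, split it into words and drop stop words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {word for word in words if word not in STOPWORDS}


def question_similarity(question_a: str, question_b: str) -> float:
    """Jaccard overlap of the two keyword sets (0.0 if either set is empty)."""
    keywords_a = extract_keywords(question_a)
    keywords_b = extract_keywords(question_b)
    if not keywords_a or not keywords_b:
        return 0.0
    return len(keywords_a & keywords_b) / len(keywords_a | keywords_b)
```

With this sketch, two questions that share all of their keywords score 1.0, and questions that share only some keywords score proportionally lower.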

Step S304: extract the word vectors of each same-type question and of the target question to be recommended, and calculate the similarity between the word vectors of each same-type question and the word vectors of the target question to be recommended to obtain the first word vector similarity.

Specifically, the word vectors of each same-type question and of the target question to be recommended are extracted with the same word vector extraction algorithm, and a user-defined word vector matching algorithm calculates the similarity between the word vectors of the target question to be recommended and those of each same-type question, giving the first word vector similarity. The word vector similarity can be calculated with a user-defined algorithm, for example cosine similarity.
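The cosine calculation mentioned for step S304 can be sketched as follows. Averaging the word vectors of a question into one sentence vector is an assumption made for this example (the text does not say how per-word vectors are combined), and the `embeddings` lookup table stands in for whatever word vector model is actually used.

```python
import math
from typing import Dict, List


def sentence_vector(tokens: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the tokens found in the (assumed) embedding table."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vector[i] for vector in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The same two helpers could be reused for the second word vector similarity by applying them to the translated questions with embeddings for the second language.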

Step S306: convert each same-type question into a second-language same-type question and convert the target question to be recommended into the corresponding second-language target question to be recommended, extract the word vectors of each second-language same-type question and of the second-language target question to be recommended, and determine the second word vector similarity according to the similarity between the keywords of each second-language same-type question and the keywords of the second-language target question to be recommended.

Specifically, the second language is a language different from the one currently used by the question data. Each same-type question and the target question to be recommended are converted into the corresponding second language; when the user's language is Chinese, they can be converted into languages including, but not limited to, English, Japanese, German, Korean or French. The word vectors of each same-type question and of the target question to be recommended in the second language are then extracted, and the word vector similarity between each same-type question and the target question to be recommended in the second language is calculated to obtain the second word vector similarity. Calculating the similarity in a different language helps determine the degree of association between questions, which improves the accuracy of knowledge base expansion.

Step S308: obtain a custom rule template, and calculate the sentence similarity between each same-type question and the target question to be recommended according to the custom rule template.

Specifically, the custom rule template is a user-defined template for matching same-type questions against questions to be recommended, and the sentence similarity between each same-type question and the target question to be recommended is calculated with this template.
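The text does not describe what a custom rule template looks like. One possible reading, sketched below purely as an assumption, treats each template as a regular expression and scores two questions as similar when they match the same template. The template list, the function names and the binary 1.0/0.0 score are all hypothetical.

```python
import re
from typing import List, Optional, Pattern

# Hypothetical rule templates; each one describes a family of phrasings.
TEMPLATES: List[Pattern[str]] = [
    re.compile(r"\b(apply|register|get)\b.*\bcredit card\b"),
    re.compile(r"\bcredit card\b.*\blimit\b"),
]


def matched_template(question: str, templates: List[Pattern[str]]) -> Optional[int]:
    """Return the index of the first template the question matches, if any."""
    text = question.lower()
    for index, pattern in enumerate(templates):
        if pattern.search(text):
            return index
    return None


def sentence_similarity(question_a: str, question_b: str,
                        templates: List[Pattern[str]] = TEMPLATES) -> float:
    """1.0 when both questions match the same template, otherwise 0.0."""
    index_a = matched_template(question_a, templates)
    index_b = matched_template(question_b, templates)
    return 1.0 if index_a is not None and index_a == index_b else 0.0
```

A graded score (for example, based on how many template slots agree) would fit the description equally well; the binary score is chosen here only to keep the sketch short.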

Step S310: weight the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity to obtain the total similarity.

Specifically, the question similarity, the first word vector similarity, the second word vector similarity and the sentence similarity are weighted to obtain the total similarity. Because the total similarity combines several similarities, it judges the degree of association between each same-type question and the target question to be recommended more accurately: a high degree of association indicates that the two can be treated as the same question, and a low degree of association indicates that they cannot. Determining the similarity of questions from several similarities, each judging from a different angle, improves the comprehension ability of the question answering system.
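The weighting in step S310 can be illustrated with a weighted average, total = (w1*s1 + ... + wn*sn) / (w1 + ... + wn), over whichever similarities are available. Normalizing by the sum of the weights, the dictionary keys and the function name are assumptions made for this sketch; the text only states that the weights are user-defined.

```python
from typing import Dict


def total_similarity(similarities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of at least two component similarities.

    similarities: component name -> score, e.g. "question", "word_vector_1",
                  "word_vector_2", "sentence" (names are illustrative).
    weights:      component name -> user-defined weight.
    """
    if len(similarities) < 2:
        raise ValueError("at least two similarities are required")
    weight_sum = sum(weights[name] for name in similarities)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * score for name, score in similarities.items())
    # Assumption: divide by the weight sum so the result stays in [0, 1].
    return weighted / weight_sum
```

Because only the weights of the supplied components are summed, the same function covers both the four-similarity embodiment and embodiments that use any two or three of them.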

In one embodiment, as shown in Fig. 4, before obtaining the set of same-type questions adjacent to the target question to be recommended, the method further includes:

Step S402: obtain the current target question to be recommended corresponding to the current session, together with all questions entered before the current target question to be recommended and after the current session last returned an answer, to form a question set.

Specifically, how sessions are divided can be user-defined; for example, closing the session window can mark the end of a session, so that all dialogue in the window before it is closed belongs to the same session. The current session is the session being processed. The target question to be recommended in the current session window is obtained, together with all questions between the question for which the current session last returned an answer and the current target question to be recommended, and these questions are combined into the question set. For example, if the current target question to be recommended is "Q1. How do I apply for a credit card", the last question for which an answer was returned is "Credit card limit", and the questions entered by the user in between include "How to register for a credit card", "I want to apply for a credit card" and "Credit card application procedure", then "How to register for a credit card", "I want to apply for a credit card", "Credit card application procedure" and "Q1. How do I apply for a credit card" form the question set.

Step S404: when the number of questions in the question set meets the first preset number, directly return the answer corresponding to the target question to be recommended.

Specifically, the first preset number is a critical value. For example, when the question set contains only one question, the user obtained the corresponding answer directly after entering the question; that is, the number of questions in the question set meets the first preset number, and the answer corresponding to that question in the knowledge base is returned directly.

Step S406: when the number of questions in the question set meets the second preset number, proceed to the step of obtaining the set of same-type questions adjacent to the target question to be recommended.

Specifically, when the number of questions in the question set meets the second preset number, the method proceeds to the step of obtaining the set of same-type questions adjacent to the target question to be recommended. For example, when the question set contains two questions, one is the question entered by the user and the other is the target question to be recommended by the system; when the question set contains five questions, four are questions entered by the user and the other is the target question to be recommended by the system.
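The branching of steps S404 and S406 can be sketched as below. The concrete values are assumptions drawn from the examples in the text: a question set of exactly one question is taken as meeting the first preset number, and a set of two or more as meeting the second preset number. The function name and the returned labels are hypothetical.

```python
from typing import List


def next_step(question_set: List[str],
              first_preset_number: int = 1,
              second_preset_number: int = 2) -> str:
    """Decide what to do with the question set formed in step S402."""
    count = len(question_set)
    if count == first_preset_number:
        # The user's question was answered directly: return the stored answer.
        return "return_answer"
    if count >= second_preset_number:
        # User questions plus the recommended target question:
        # go on to collect same-type questions and compute total similarity.
        return "compute_total_similarity"
    return "no_action"
```

In the five-question example above, the call would return "compute_total_similarity", with four user questions as the same-type questions and the last entry as the target question to be recommended.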

In one embodiment, as shown in Fig. 5, after calculating the total similarity between each same-type question in the set of same-type questions and the target question to be recommended, the method further includes:

Step S502: obtain the current target question to be recommended corresponding to the current session, together with all questions entered before the current target question to be recommended and after the current session last returned an answer, to form a question set.

Specifically, the question set in this step is the same question set as in step S402 and is obtained in the same way, so the details are not repeated here.

Step S504: when the total similarity does not meet the preset similarity threshold, obtain valid question information from the question set, cluster the valid question information to obtain the corresponding cluster question sets, and deduplicate the cluster question sets to obtain the valid cluster question sets and the corresponding numbers of valid questions.

Specifically, when the total similarity does not meet the preset similarity threshold, valid question information is extracted from the question set, where valid question information refers to the user input questions that remain after filtering with the custom rule template. Clustering the valid questions means grouping them with a clustering algorithm so that questions of the same type fall into the same category. The clustering algorithm can be user-defined; for example, the valid questions can be clustered with a short text clustering algorithm, such as the K-means clustering algorithm or the DBSCAN algorithm. After clustering, the corresponding cluster question sets are obtained, and the cluster question sets are deduplicated to obtain the valid cluster question sets and the corresponding numbers of valid questions.
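The clustering and deduplication in step S504 can be illustrated with the simplified sketch below. It deliberately swaps in a greedy single-pass grouping on word overlap in place of the K-means or DBSCAN algorithms named in the text, so that the example stays self-contained; the function names, the normalization rule and the 0.5 grouping threshold are all assumptions.

```python
import re
from typing import List, Set


def normalize(question: str) -> str:
    """Lower-case and strip punctuation so trivially different copies compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def word_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two word sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_and_dedupe(questions: List[str], min_similarity: float = 0.5) -> List[List[str]]:
    """Greedy stand-in for K-means/DBSCAN: put each question into the first
    cluster whose first member is similar enough, skipping duplicates."""
    clusters: List[List[str]] = []
    for question in questions:
        words = set(normalize(question).split())
        for cluster in clusters:
            representative = set(normalize(cluster[0]).split())
            if word_overlap(words, representative) >= min_similarity:
                # Deduplicate: keep only one copy of each normalized question.
                if all(normalize(question) != normalize(member) for member in cluster):
                    cluster.append(question)
                break
        else:
            clusters.append([question])
    return clusters
```

The length of each returned cluster plays the role of the number of valid questions, which can then be compared with the preset cluster number threshold used in the following step.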

In one embodiment, the user input questions obtained after filtering with the custom rule template can be further filtered according to the questions recorded in the invalid question recording module of the knowledge base, to obtain the valid question information.

Step S506: for each cluster question in the valid cluster question sets whose number of valid questions exceeds the preset cluster number threshold, crawl the network to obtain the corresponding network question and network answer.

Specifically, the preset cluster number threshold is a preset critical value. A number of valid questions after deduplication that exceeds the preset cluster number threshold indicates that users ask this kind of question often, that is, the clustered question type is one that users care about. Each cluster question in the valid cluster question sets whose number of valid questions exceeds the preset cluster number threshold can therefore be crawled on the network. Crawling yields a set of answers for each question, and one answer is chosen from the set according to a user-defined screening rule as the target answer, which serves as the network answer for that question. For example, for each cluster question in the valid cluster question sets, the corresponding questions and answers on Baidu Zhidao can be crawled, and the answer to a resolved question with the most upvotes can be chosen as the network answer.
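The screening rule given as an example, choosing the most upvoted answer among resolved questions, can be sketched as follows. The `CrawledAnswer` record and its fields are hypothetical; how the crawler actually represents its results is not described in the text, and no crawling is performed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawledAnswer:
    """Hypothetical record for one answer found on the network."""
    text: str
    upvotes: int
    resolved: bool  # whether the crawled question was marked as resolved


def pick_network_answer(answers: List[CrawledAnswer]) -> Optional[CrawledAnswer]:
    """Return the resolved answer with the most upvotes, or None if none is resolved."""
    resolved = [answer for answer in answers if answer.resolved]
    if not resolved:
        return None
    return max(resolved, key=lambda answer: answer.upvotes)
```

Returning None when nothing qualifies leaves it to the caller to decide whether the cluster question is skipped or still sent for review without an answer.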

Step S508: calculate the third word vector similarity between each cluster question and the corresponding network question.

Specifically, the word vector similarity between each cluster question and the corresponding network question is calculated; the calculation method can be the same as that used for the word vector similarity score in step S202.

Step S510: send the cluster question, the corresponding network question and network answer, and the corresponding third word vector similarity to a first terminal.

Specifically, the cluster question, the corresponding network question and network answer, and the corresponding third word vector similarity are sent to the first terminal, which is the terminal used by a reviewer to review the crawled questions and answers. After receiving the cluster question, the corresponding network question and network answer, and the corresponding third word vector similarity, the reviewer reviews them.

Step S512: receive the feedback information sent by the first terminal, determine the processing state of the cluster question according to the feedback information, and process the cluster question according to the processing state.

Specifically, the reviewer operates the first terminal, and the feedback information that the first terminal returns according to the reviewer's operation is received. The processing state of the cluster question is determined from the feedback information; the processing state is either pass or abandon, and the cluster question is handled according to the state. Because network content is crawled automatically, only manual review or simple editing is needed to expand the knowledge base, which saves a great deal of human resources.

In one embodiment, as shown in Fig. 6, after the feedback information sent by the first terminal is received, the method includes:

Step S602: when the feedback information is pass, add the clustering problem and the corresponding network answer to the knowledge base.

Specifically, when the auditor considers the answer crawled from the network to be correct, the auditor performs the pass operation. When the pass feedback information is received, the clustering problem and the corresponding network answer are added to the knowledge base.

Step S604: when the feedback information is discard, enter the clustering problem into a dictionary, where the dictionary is used to record invalid problems.

Specifically, when the auditor considers the answer crawled from the network to be wrong, the clustering problem is entered into the dictionary used to record invalid problems. When a user later inputs an invalid problem of the same type, the recorded invalid problems allow it to be removed directly, which reduces subsequent data processing and thereby improves data-processing efficiency.
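The two processing states of steps S602 and S604 can be sketched as below. This is a minimal illustration, not the implementation: it assumes the knowledge base is a plain mapping from problem to answer and the invalid-problem dictionary is a set, and the state names and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical labels for the two processing states described in the text.
PASS = "pass"
DISCARD = "discard"


def handle_feedback(
    feedback: str,
    clustering_problem: str,
    network_answer: str,
    knowledge_base: Dict[str, str],
    invalid_dictionary: Set[str],
) -> None:
    """Apply the auditor's feedback to one clustering problem."""
    if feedback == PASS:
        # S602: the crawled answer was judged correct, so extend the knowledge base.
        knowledge_base[clustering_problem] = network_answer
    elif feedback == DISCARD:
        # S604: the crawled answer was judged wrong, so record the problem as invalid.
        invalid_dictionary.add(clustering_problem)
    else:
        raise ValueError(f"unknown feedback: {feedback!r}")


def is_known_invalid(problem: str, invalid_dictionary: Set[str]) -> bool:
    """Return True if the problem was recorded as invalid and can be skipped early."""
    return problem in invalid_dictionary
```

The exact-match lookup in `is_known_invalid` is a simplification; the text speaks of invalid problems "of the same type", which would need a similarity-based match.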

In one embodiment, as shown in Fig. 7, the above data processing method further includes:

Step S208: when the term vector similarity score does not meet the first preset threshold, enter the question information into the journal module.

Specifically, the journal module is used to record and analyze problems with low term vector similarity scores. When the term vector similarity score does not meet the first preset threshold, that is, when the score is below the first preset threshold, the question information is entered into the journal module.

In one embodiment, prompt information can be sent to the user terminal, informing the user, for example, that the current question information falls below the threshold and cannot be processed.

Step S210: when the number of question information entries in the journal module exceeds the first preset number, take all question information in the journal module as the problem set and enter the step of obtaining effective question information from the problem set.

Specifically, a quantitative analysis is performed on the journal module. When the number of question information entries recorded in the journal module exceeds the preconfigured first preset number, the question information recorded in the journal module is formed into a problem set. The steps of obtaining effective question information from the problem set, clustering and deduplicating the effective question information, and crawling the corresponding answers over the network are then executed, until the question information is added to the knowledge base or entered into the dictionary.

Step S212: obtain the preconfigured problem analysis time; when the current time meets the preconfigured problem analysis time, take all question information in the journal module as the problem set and enter the step of obtaining effective question information from the problem set.

Specifically, the preconfigured problem analysis time is obtained. The problem analysis time can be user-defined, so that the question information in the journal module is analyzed on a daily, weekly, or monthly schedule. For a weekly schedule, for example one set to analyze the data at 10 p.m. every Saturday, the question information recorded in the journal module is formed into a problem set at 10 p.m. each Saturday. The steps of obtaining effective question information from the problem set, clustering and deduplicating the effective question information, and crawling the corresponding answers over the network are then executed, until the question information is added to the knowledge base or entered into the dictionary.
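The two triggers in steps S210 and S212 can be sketched as a single check. This is an illustrative sketch under stated assumptions: the journal is modeled as a plain list, the schedule is weekly and matched on weekday and hour only, and all names and parameters are hypothetical.

```python
from datetime import datetime
from typing import List


def should_analyze_journal(
    journal: List[str],
    now: datetime,
    first_preset_number: int,
    analysis_weekday: int,  # Monday is 0, so Saturday is 5
    analysis_hour: int,     # 24-hour clock, so 10 p.m. is 22
) -> bool:
    """Return True when either the count trigger or the time trigger fires."""
    # S210: more question information has accumulated than the first preset number.
    count_trigger = len(journal) > first_preset_number
    # S212: the current time matches the preconfigured problem analysis time.
    time_trigger = now.weekday() == analysis_weekday and now.hour == analysis_hour
    return count_trigger or time_trigger
```

When the check returns True, all entries in the journal would be taken as the problem set and passed to the effective-question-information step.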

In one embodiment, after the step performed when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first preset threshold, the method further includes:

Step S214: when the answer corresponding to the target problem to be recommended is not obtained, repeat step S202.

Specifically, when the user does not select the answer corresponding to the target problem to be recommended for the question information, question information input by the user is received again and matched against the candidate problems to be recommended in the knowledge base.

It should be understood that although the steps in the flowcharts of Figs. 2-7 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-7 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed in sequence but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in Fig. 8, a data processing device is provided, including a question information acquisition module 202, a problem recommendation module 204, a similarity calculation module 206, and a knowledge base update module 208, wherein:

The question information acquisition module 202 is used to obtain question information.

The problem recommendation module 204 is used to obtain, when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first preset threshold, the target problem to be recommended corresponding to the question information and the answer corresponding to the target problem to be recommended.

The similarity calculation module 206 is used to obtain the similar problem set adjacent to the target problem to be recommended and to calculate the total similarity between each similar problem in the similar problem set and the target problem to be recommended. The total similarity is determined according to at least two of the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity between each similar problem and the target problem to be recommended, where the second term vector similarity is a term vector similarity in a different type of language.

The knowledge base update module 208 is used to add the similar problems whose total similarity meets the preset similarity threshold to the knowledge base, where the answer corresponding to each such similar problem is the answer corresponding to the target problem to be recommended.

In one embodiment, as shown in Fig. 9, the similarity calculation module 206 includes:

A problem similarity calculation unit 2062, used to calculate the problem similarity between each similar problem and the target problem to be recommended. The problem similarity is determined by extracting the keywords of each similar problem and of the target problem to be recommended, and then comparing the keywords of each similar problem with the keywords of the target problem to be recommended.

A first calculation unit 2064, used to extract the term vectors of each similar problem and of the target problem to be recommended, and to calculate the similarity between the term vector of each similar problem and the term vector of the target problem to be recommended, obtaining the first term vector similarity.

A second calculation unit 2066, used to convert each similar problem into a second-language similar problem, convert the target problem to be recommended into the corresponding second-language target problem to be recommended, extract the term vectors of each second-language similar problem and of the second-language target problem to be recommended, and determine the second term vector similarity according to the similarity between the keywords of each second-language similar problem and the keywords of the second-language target problem to be recommended.

A statement similarity calculation unit 2068, used to obtain a custom rule template and to calculate, according to the custom rule template, the statement similarity between each similar problem and the target problem to be recommended.

A total similarity calculation unit 2070, used to weight the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity to obtain the total similarity.
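The weighting performed by the total similarity calculation unit can be sketched as a weighted average. This is a hedged sketch, not the patented formula: the text only says the component similarities are weighted, so the component names, the weight values, and the normalization by the sum of the weights are all assumptions made for illustration.

```python
from typing import Dict


def total_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine at least two component similarities into one weighted score.

    `scores` maps a hypothetical component name (for example "problem",
    "first_term_vector", "second_term_vector", "statement") to its similarity;
    `weights` maps the same names to non-negative weights.
    """
    if len(scores) < 2:
        raise ValueError("at least two component similarities are required")
    weight_sum = sum(weights[name] for name in scores)
    if weight_sum <= 0:
        raise ValueError("the weights of the supplied components must sum to a positive value")
    # Normalizing by the weight sum (an assumption) keeps the result on the same
    # scale as the inputs even when only some of the four components are supplied.
    return sum(scores[name] * weights[name] for name in scores) / weight_sum
```

A similar problem would then be added to the knowledge base when this total meets the preset similarity threshold.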

In one embodiment, as shown in Fig. 10, the above data processing device 200 further includes:

A problem set acquisition module 210, used to obtain the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, forming a problem set.

A problem number determination module 212, used to directly return the answer corresponding to the target problem to be recommended when the number of problems in the problem set meets the first preset number, and to enter the similarity calculation module 206 when the number of problems in the problem set meets the second preset number.

In one embodiment, as shown in Fig. 11, the above data processing device 200 further includes:

A problem set acquisition module 210, used to obtain the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, forming a problem set.

A problem deduplication module 214, used to obtain effective question information from the problem set when the total similarity does not meet the preset similarity threshold, cluster the effective question information to obtain the corresponding clustering problem set, and deduplicate the clustering problem set to obtain the effective clustering problem set and the corresponding number of effective problems.

An answer crawling module 216, used to crawl over the network each clustering problem in an effective clustering problem set whose number of effective problems exceeds the preset cluster number threshold, obtaining the corresponding network problem and network answer.

A third similarity calculation module 218, used to calculate the third term vector similarity between each clustering problem and its corresponding network problem.

A data sending module 220, used to send the clustering problem, the corresponding network problem and network answer, and the corresponding third term vector similarity to the first terminal.

A state determination module 222, used to receive the feedback information sent by the first terminal and determine the processing state of the clustering problem according to the feedback information.

In one embodiment, as shown in Fig. 12, the above data processing device 200 further includes:

A data addition module 224, used to add the clustering problem and the corresponding network answer to the knowledge base when the feedback information is pass.

An invalid problem recording module 226, used to enter the clustering problem into the dictionary when the feedback information is discard, where the dictionary is used to record invalid problems.

In one embodiment, as shown in Fig. 13, the above data processing device 200 further includes:

A journal recording module 228, used to enter the question information into the journal module when the term vector similarity score does not meet the first preset threshold.

A quantitative analysis module 230, used to take all question information in the journal module as the problem set and enter the problem deduplication module 214 when the number of question information entries in the journal module exceeds the first preset number.

A timing analysis module 232, used to obtain the preconfigured problem analysis time and, when the current time meets the preconfigured problem analysis time, take all question information in the journal module as the problem set and enter the problem deduplication module 214.

In one embodiment, in the above data processing device 200:

The question information acquisition module 202 is further used to obtain question information again when the answer corresponding to the target problem to be recommended is not obtained.

For specific limitations on the data processing device, refer to the limitations on the data processing method above, which are not repeated here. Each module in the above data processing device may be implemented wholly or partly in software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer equipment in hardware form, or stored in a memory of the computer equipment in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer equipment is provided. The computer equipment may be a server, and its internal structure may be as shown in Fig. 14. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capabilities. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer equipment is used to store the data involved in data processing. The network interface of the computer equipment is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements a data processing method.

Those skilled in the art will understand that the structure shown in Fig. 14 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer equipment to which the solution is applied. A specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer equipment is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. The processor implements the following steps when executing the computer program: obtaining question information; when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first preset threshold, obtaining the target problem to be recommended corresponding to the question information and the answer corresponding to the target problem to be recommended; obtaining the similar problem set adjacent to the target problem to be recommended, and calculating the total similarity between each similar problem in the similar problem set and the target problem to be recommended, where the total similarity is determined according to at least two of the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity between each similar problem and the target problem to be recommended, and the second term vector similarity is a term vector similarity in a different type of language; and adding the similar problems whose total similarity meets the preset similarity threshold to the knowledge base, where the answer corresponding to each such similar problem is the answer corresponding to the target problem to be recommended.

In one embodiment, calculating the total similarity between each similar problem in the similar problem set and the target problem to be recommended includes: calculating the problem similarity between each similar problem and the target problem to be recommended, where the problem similarity is determined by extracting the keywords of each similar problem and of the target problem to be recommended and comparing the keywords of each similar problem with the keywords of the target problem to be recommended; extracting the term vectors of each similar problem and of the target problem to be recommended, and calculating the similarity between the term vector of each similar problem and the term vector of the target problem to be recommended to obtain the first term vector similarity; converting each similar problem into a second-language similar problem, converting the target problem to be recommended into the corresponding second-language target problem to be recommended, extracting the term vectors of each second-language similar problem and of the second-language target problem to be recommended, and determining the second term vector similarity according to the similarity between the keywords of each second-language similar problem and the keywords of the second-language target problem to be recommended; obtaining a custom rule template, and calculating the statement similarity between each similar problem and the target problem to be recommended according to the custom rule template; and weighting the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity to obtain the total similarity.

In one embodiment, before the similar problem set adjacent to the target problem to be recommended is obtained, the processor further implements the following steps when executing the computer program: obtaining the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, to form a problem set; when the number of problems in the problem set meets the first preset number, directly returning the answer corresponding to the target problem to be recommended; and when the number of problems in the problem set meets the second preset number, entering the step of obtaining the similar problem set adjacent to the target problem to be recommended.

In one embodiment, after the total similarity between each similar problem in the similar problem set and the question information is calculated, the processor further implements the following steps when executing the computer program: obtaining the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, to form a problem set; when the total similarity does not meet the preset similarity threshold, obtaining effective question information from the problem set and clustering the effective question information to obtain the corresponding clustering problem set; deduplicating the clustering problem set to obtain the effective clustering problem set and the corresponding number of effective problems; crawling over the network each clustering problem in an effective clustering problem set whose number of effective problems exceeds the preset cluster number threshold, to obtain the corresponding network problem and network answer; calculating the third term vector similarity between each clustering problem and its corresponding network problem; sending the clustering problem, the corresponding network problem and network answer, and the corresponding third term vector similarity to the first terminal; and receiving the feedback information sent by the first terminal and determining the processing state of the clustering problem according to the feedback information.

In one embodiment, after the feedback information sent by the first terminal is received, the processor further implements the following steps when executing the computer program: when the feedback information is pass, adding the clustering problem and the corresponding network answer to the knowledge base; and when the feedback information is discard, entering the clustering problem into the dictionary, where the dictionary is used to record invalid problems.

In one embodiment, the processor further implements the following steps when executing the computer program: when the term vector similarity score does not meet the first preset threshold, entering the question information into the journal module; when the number of question information entries in the journal module exceeds the first preset number, taking all question information in the journal module as the problem set and entering the step of obtaining effective question information from the problem set; and obtaining the preconfigured problem analysis time and, when the current time meets the preconfigured problem analysis time, taking all question information in the journal module as the problem set and entering the step of obtaining effective question information from the problem set.

In one embodiment, after the step performed when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first preset threshold, the processor further implements the following step when executing the computer program: when the answer corresponding to the target problem to be recommended is not obtained, repeating the step of obtaining question information.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor: obtaining question information; when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first preset threshold, obtaining the target problem to be recommended corresponding to the question information and the answer corresponding to the target problem to be recommended; obtaining the similar problem set adjacent to the target problem to be recommended, and calculating the total similarity between each similar problem in the similar problem set and the target problem to be recommended, where the total similarity is determined according to at least two of the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity between each similar problem and the target problem to be recommended, and the second term vector similarity is a term vector similarity in a different type of language; and adding the similar problems whose total similarity meets the preset similarity threshold to the knowledge base, where the answer corresponding to each such similar problem is the answer corresponding to the target problem to be recommended.

In one embodiment, after the total similarity between each similar problem in the similar problem set and the target problem to be recommended is calculated, the processor further implements the following steps when executing the computer program: obtaining the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, to form a problem set; when the total similarity does not meet the preset similarity threshold, obtaining effective question information from the problem set and clustering the effective question information to obtain the corresponding clustering problem set; deduplicating the clustering problem set to obtain the effective clustering problem set and the corresponding number of effective problems; crawling over the network each clustering problem in an effective clustering problem set whose number of effective problems exceeds the preset cluster number threshold, to obtain the corresponding network problem and network answer; calculating the third term vector similarity between each clustering problem and its corresponding network problem; sending the clustering problem, the corresponding network problem and network answer, and the corresponding third term vector similarity to the first terminal; and receiving the feedback information sent by the first terminal and determining the processing state of the clustering problem according to the feedback information.

Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not every possible combination of the technical features in the above embodiments is described. However, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of this application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims

1. A data processing method, the method comprising:

obtaining question information;

when the term vector similarity score between the question information and a candidate problem to be recommended in a knowledge base exceeds a first preset threshold, obtaining the target problem to be recommended corresponding to the question information and the answer corresponding to the target problem to be recommended;

obtaining a similar problem set adjacent to the target problem to be recommended, and calculating the total similarity between each similar problem in the similar problem set and the target problem to be recommended, wherein the total similarity is determined according to at least two of a problem similarity, a first term vector similarity, a second term vector similarity, and a statement similarity between each similar problem and the target problem to be recommended, and the second term vector similarity is a term vector similarity in a different type of language;

adding the similar problems whose total similarity meets a preset similarity threshold to the knowledge base, wherein the answer corresponding to each such similar problem is the answer corresponding to the target problem to be recommended.

2. The method according to claim 1, characterized in that calculating the total similarity between each similar problem in the similar problem set and the target problem to be recommended comprises:

calculating the problem similarity between each similar problem and the target problem to be recommended, wherein the problem similarity is determined by extracting the keywords of each similar problem and of the target problem to be recommended and comparing the keywords of each similar problem with the keywords of the target problem to be recommended;

extracting the term vectors of each similar problem and of the target problem to be recommended, and calculating the similarity between the term vector of each similar problem and the term vector of the target problem to be recommended, to obtain the first term vector similarity;

converting each similar problem into a second-language similar problem, converting the target problem to be recommended into the corresponding second-language target problem to be recommended, extracting the term vectors of each second-language similar problem and of the second-language target problem to be recommended, and determining the second term vector similarity according to the similarity between the keywords of each second-language similar problem and the keywords of the second-language target problem to be recommended;

obtaining a custom rule template, and calculating the statement similarity between each similar problem and the target problem to be recommended according to the custom rule template;

weighting the problem similarity, the first term vector similarity, the second term vector similarity, and the statement similarity to obtain the total similarity.

3. The method according to claim 1, characterized in that, before obtaining the similar problem set adjacent to the target problem to be recommended, the method further comprises:

obtaining the current target problem to be recommended corresponding to the current session, together with all problems that precede the current question information in the current session after the last answer was returned, to form a problem set;

when the number of problems in the problem set meets a first preset number, directly returning the answer corresponding to the target problem to be recommended;

when the number of problems in the problem set meets a second preset number, entering the step of obtaining the similar problem set adjacent to the target problem to be recommended.

4. The method according to claim 1, characterized in that, after calculating the similarity between each similar problem in the similar problem set and the target problem to be recommended, the method further comprises:

obtaining the current target problem to be recommended corresponding to the current session, together with all problems that precede it in the current session after the last answer was returned, to form a problem set;

when the total similarity does not meet the preset similarity threshold, obtaining effective question information from the problem set;

clustering the effective question information to obtain a corresponding clustering problem set;

deduplicating the clustering problem set to obtain an effective clustering problem set and a corresponding number of effective problems;

crawling over the network each clustering problem in an effective clustering problem set whose number of effective problems exceeds a preset cluster number threshold, to obtain a corresponding network problem and network answer;

calculating a third term vector similarity between each clustering problem and its corresponding network problem;

sending the clustering problem, the corresponding network problem and network answer, and the corresponding third term vector similarity to a first terminal;

receiving the feedback information sent by the first terminal, and determining the processing state of the clustering problem according to the feedback information.

5. The method according to claim 1, characterized in that, after receiving the feedback information sent by the first terminal, the method includes:

When the feedback information is a pass, adding the clustering problem and the corresponding network answer to the knowledge base;

When the feedback information is a discard, entering the clustering problem into a dictionary, the dictionary being used to record invalid problems.
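
The two feedback branches of claim 5 can be sketched as follows. The string values `"pass"` and `"discard"`, the function name `apply_feedback`, the use of a `dict` for the knowledge base and a `set` for the dictionary of invalid problems, and the `"pending"` state for any other feedback are assumptions, not details given in the claim.

```python
def apply_feedback(feedback, clustering_problem, network_answer,
                   knowledge_base, invalid_dictionary):
    """Update the knowledge base or the invalid-problem dictionary
    according to the reviewer's feedback, and return the processing state."""
    if feedback == "pass":
        # Accepted: the crawled answer becomes the answer for this problem.
        knowledge_base[clustering_problem] = network_answer
        return "added"
    if feedback == "discard":
        # Rejected: remember the problem so it is not analyzed again.
        invalid_dictionary.add(clustering_problem)
        return "discarded"
    # Any other feedback leaves both stores untouched (assumed behavior).
    return "pending"
```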

6. The method according to claim 4, characterized in that the method further includes:

When the term vector similarity score does not meet the first predetermined threshold, entering the question information into a log module;

When the number of question information items contained in the log module exceeds a first preset number, taking all question information in the log module as the problem set and proceeding to the step of obtaining effective question information from the problem set; or

Obtaining a preconfigured problem analysis time and, when the current time meets the preconfigured problem analysis time, taking all question information in the log module as the problem set and proceeding to the step of obtaining effective question information from the problem set.
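
The two alternative triggers of claim 6 can be sketched as a single predicate. The claim does not say what it means for the current time to "meet" the preconfigured problem analysis time; the sketch assumes a daily schedule matched to the minute. The name `should_analyze` and its parameters are hypothetical.

```python
from datetime import datetime, time


def should_analyze(log_entries, first_preset_number, analysis_time, now):
    """Return True when the logged question information should be analyzed.

    Trigger 1: the log holds more entries than the first preset number.
    Trigger 2: the current time matches the preconfigured analysis time
               (assumed here to mean same hour and minute of the day).
    """
    too_many = len(log_entries) > first_preset_number
    on_schedule = (now.hour, now.minute) == (analysis_time.hour, analysis_time.minute)
    return too_many or on_schedule
```

For example, with `first_preset_number=2` and `analysis_time=time(2, 0)`, a log of three entries triggers analysis at any time of day, while a log of one entry triggers it only at 02:00.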

7. The method according to claim 1, characterized in that, after the step performed when the term vector similarity score between the question information and a candidate problem to be recommended in the knowledge base exceeds the first predetermined threshold, the method further includes:

When the answer corresponding to the target problem to be recommended is not obtained, repeating the step of obtaining question information.

8. A data processing device, characterized in that the device includes:

A question information acquisition module, configured to obtain question information;

A problem recommendation module, configured to obtain, when the term vector similarity score between the question information and a candidate problem to be recommended in a knowledge base exceeds a first predetermined threshold, the target problem to be recommended corresponding to the question information and the answer corresponding to the target problem to be recommended;

A similarity calculation module, configured to obtain the similar problem set adjacent to the target problem to be recommended and to calculate the total similarity between each similar problem in the similar problem set and the target problem to be recommended, the total similarity being determined according to at least two of the following similarities between each similar problem and the target problem to be recommended: a problem similarity, a first term vector similarity, a second term vector similarity and a sentence similarity, the second term vector similarity being a term vector similarity for a different type of language;

A knowledge base update module, configured to add the similar problems whose total similarity meets a preset similarity threshold, together with the answer corresponding to the target problem to be recommended, to the knowledge base.
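
The cooperation of the similarity calculation module and the knowledge base update module can be sketched as below. The claim requires only that the total similarity be determined from at least two component similarities; it does not state how they are combined. The weighted average, the reading of "meets the threshold" as greater than or equal, and the names `total_similarity` and `expand_knowledge_base` are assumptions for illustration.

```python
def total_similarity(scores, weights=None):
    """Combine component similarities (e.g. problem, term vector, sentence)
    into one score using a weighted average (assumed combination rule)."""
    if len(scores) < 2:
        raise ValueError("at least two component similarities are required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    weight_sum = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / weight_sum


def expand_knowledge_base(knowledge_base, target_answer, candidates, threshold):
    """Add every similar problem whose total similarity meets the threshold,
    reusing the answer of the target problem to be recommended."""
    added = []
    for problem, scores in candidates.items():
        if total_similarity(scores) >= threshold:
            knowledge_base[problem] = target_answer
            added.append(problem)
    return added
```

Because each accepted similar problem is stored with the target problem's answer, the knowledge base gains new phrasings of an existing problem without any new answer having to be written.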

9. A computer equipment, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.