CN107784105A - Knowledge base construction method, electronic device and storage medium based on massive questions - Google Patents

Knowledge base construction method, electronic device and storage medium based on massive questions

Info

Publication number
CN107784105A
Authority
CN
China
Prior art keywords
text
cluster
knowledge base
construction
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711032456.6A
Other languages
Chinese (zh)
Inventor
高祎璠
卢川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201711032456.6A (CN107784105A)
Priority to PCT/CN2018/076461 (WO2019080417A1)
Publication of CN107784105A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention discloses a knowledge base construction method based on massive questions, belonging to the field of knowledge base construction. The method comprises the following steps: S1, question preprocessing; S2, text dimensionality reduction; S3, clustering; S4, knowledge base construction. The invention clusters massive questions automatically and thereby avoids the high cost of sorting them manually. For each group of similar questions formed by automatic clustering, a human needs to answer only once to form a standard question-answer pair; the similar questions are then associated with that standard question-answer pair and saved to the knowledge base. When similar questions are encountered again, the intelligent customer service can answer them in place of human agents, which greatly reduces the workload of human agents.

Description

Knowledge base construction method, electronic device and storage medium based on massive questions
Technical field
The present invention relates to the field of knowledge base construction, and in particular to a knowledge base construction method based on massive questions, an electronic device and a storage medium.
Background
With the rapid development of the Internet and rising expectations of service, online customer service has spread to all industries and reached every stage of everyday commercial service.
At present, common forms of online customer service include intelligent customer service robots, message platforms and the like. Compared with traditional customer service, these can provide service 24 hours a day and take load off human agents, thereby effectively reducing the operating cost of enterprise customer service.
However, this has also produced many new questions that cannot be answered. These questions are disorganized and number in the thousands or tens of thousands. To answer them one by one, human agents have to read and answer each in turn, which is very time-consuming and inefficient. Moreover, many of these questions have similar meanings and identical answers, but because they have not been properly categorized they cannot be answered together.
Therefore, there is an urgent need to classify these disorganized questions quickly so as to reduce the workload of answering them manually.
Summary of the invention
The technical problem to be solved by the present invention is that the prior art cannot handle disorganized massive questions together effectively. To overcome this, the invention proposes a knowledge base construction method, an electronic device and a storage medium based on massive questions. By categorizing the disorganized questions, the sorted questions can be conveniently associated with their answers and maintained in a knowledge base, so that an intelligent customer service robot can answer these questions in place of human agents.
The present invention solves the above technical problem through the following technical solutions:
A knowledge base construction method based on massive questions, comprising the following steps:
S1, question preprocessing: split each original question into a word sequence composed of several keywords;
S2, text dimensionality reduction: treat each preprocessed question as a text, reduce the dimensionality of the texts using a topic model algorithm, and represent each text by a distribution over multiple topics;
S3, clustering: classify the dimensionality-reduced texts using the K-means clustering algorithm, and save the texts of different classes into different clusters and output them;
S4, knowledge base construction: restore all texts to their original questions, associate the original questions belonging to the same cluster with the same standard question-answer pair, and save them to the knowledge base.
Preferably, before step S1 the method further comprises: S0, question collection: save the unanswered questions in the historical records of various platforms into one file in a unified format.
Preferably, step S1 specifically comprises the following sub-steps:
S11, split the question into a word sequence composed of several words;
S12, remove the stop words from the word sequence;
S13, save the keyword sequence containing only keywords.
Preferably, step S3 specifically comprises the following sub-steps:
S30, preset an upper limit for the cycle count and several judgment thresholds that correspond to the cycle count in a progressive relationship; the initial value of the cycle count is 1;
S31, set all texts as one current cluster;
S32, randomly select one of the texts in the current cluster as the centroid text, and delete the centroid text from the current cluster;
S33, calculate the matching value between each remaining text in the current cluster and the centroid text, take out all texts whose matching value is greater than the judgment threshold corresponding to the current cycle count, and save them together with the centroid text into a new cluster;
S34, determine whether the current cluster still has remaining texts that have not been saved into a new cluster; if so, perform step S32; if not, perform step S35;
S35, determine whether any other cluster belonging to the same cycle as the current cluster still has texts; if so, perform step S36; if not, perform step S37;
S36, reset the current cluster to the cluster that still has texts, then perform step S32;
S37, determine whether the cycle count has reached the upper limit; if so, perform step S39; if not, perform step S38;
S38, increase the cycle count by 1, reset the current cluster to one of the new clusters, and perform step S32;
S39, output all new clusters.
Preferably, sub-step S33 specifically comprises the following sub-steps:
S331, take the first of the remaining texts in the current cluster as the current comparison text;
S332, match the keywords in the current comparison text against the keywords in the centroid text, and calculate the matching value according to the matching rule;
S333, determine whether the matching value is greater than the judgment threshold corresponding to the current cycle count; if so, perform step S334; if not, perform step S331;
S334, take out the current comparison text and hold it temporarily, and delete the current comparison text from the current cluster;
S335, determine whether the current comparison text is the last of the remaining texts in the current cluster; if so, perform step S337; if not, perform step S336;
S336, set the text following the current comparison text as the new current comparison text, and perform step S332;
S337, save the temporarily held texts and the centroid text into a new cluster.
Preferably, before step S4 the method further comprises: S4', parameter adjustment, which specifically comprises the following sub-steps:
S41', check whether the output new clusters meet the classification criteria; if not, perform step S42'; if so, end.
S42', adjust the upper limit of the cycle count and the judgment thresholds that correspond to the cycle count in a progressive relationship, and perform step S31 again.
Preferably, the matching rule is: all keywords are divided into three classes of words, namely professional terms, common nouns and verbs, and matches on the three classes are given different weights; in descending order of weight, the classes are professional terms, common nouns and verbs.
Preferably, step S4 specifically comprises the following sub-steps:
S41, take the first new cluster as the current cluster;
S42, restore the texts in the current cluster to their original questions;
S43, associate each original question with the same standard question-answer pair and save it to the knowledge base;
S44, determine whether the current cluster is the last new cluster; if so, end; if not, perform step S45;
S45, reset the current cluster to the new cluster following the current cluster, and perform step S42.
An electronic device, comprising a memory and a processor, the memory storing a knowledge base construction system based on massive questions that is executable by the processor, the knowledge base construction system based on massive questions comprising:
a preprocessing module, configured to split each original question into a word sequence composed of several keywords;
a dimensionality reduction module, configured to reduce the dimensionality of the texts using a topic model algorithm;
a clustering module, configured to classify the dimensionality-reduced texts using the K-means clustering algorithm, and to save the texts of different classes into different clusters and output them;
a knowledge base construction module, configured to restore all texts to their original questions, associate the original questions belonging to the same cluster with the same standard question-answer pair, and save them to the knowledge base.
A computer-readable storage medium storing a knowledge base construction system based on massive questions, the system being executable by at least one processor so that the at least one processor performs the steps of the knowledge base construction method based on massive questions according to any one of the foregoing.
The positive effect of the present invention is as follows: the invention clusters massive questions automatically and thereby avoids the high cost of sorting them manually. For each group of similar questions formed by automatic clustering, a human needs to answer only once to form a standard question-answer pair; the similar questions are then associated with that standard question-answer pair and saved to the knowledge base. When similar questions are encountered again, the intelligent customer service can answer them in place of human agents, which greatly reduces the workload of human agents.
Brief description of the drawings
Fig. 1 shows a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention;
Fig. 2 shows a schematic diagram of the program modules of an embodiment of the knowledge base construction system based on massive questions in the electronic device of the present invention;
Fig. 3 shows a flow chart of embodiment one of the knowledge base construction method based on massive questions of the present invention;
Fig. 4 shows a flow chart of question preprocessing in embodiment two of the knowledge base construction method based on massive questions of the present invention;
Fig. 5 shows a flow chart of clustering in embodiment three of the knowledge base construction method based on massive questions of the present invention;
Fig. 6 shows a flow chart of clustering in embodiment four of the knowledge base construction method based on massive questions of the present invention;
Fig. 7 shows a flow chart of text matching in embodiment five of the knowledge base construction method based on massive questions of the present invention;
Fig. 8 shows a flow chart of knowledge base construction in embodiment six of the knowledge base construction method based on massive questions of the present invention.
Detailed description
The present invention is further illustrated below by way of embodiments, which do not limit the invention to the scope of the described embodiments.
First, the present invention proposes an electronic device.
Referring to Fig. 1, a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention is shown. In this embodiment, the electronic device 2 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including a standalone server or a server cluster composed of multiple servers). As shown in the figure, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23 and a knowledge base construction system 20 based on massive questions, which can be communicatively connected to one another through a system bus. Wherein:
The memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc and the like. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the program code of the knowledge base construction system 20 based on massive questions. In addition, the memory 21 may also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example to perform control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the knowledge base construction system 20 based on massive questions.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the electronic device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth or Wi-Fi.
It should be noted that Fig. 1 shows only the electronic device 2 with components 21-23; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In this embodiment, the knowledge base construction system 20 based on massive questions stored in the memory 21 may be divided into one or more program modules, which are stored in the memory 21 and can be executed by one or more processors (the processor 22 in this embodiment) to carry out the present invention.
For example, Fig. 2 shows a schematic diagram of the program modules of the first embodiment of the knowledge base construction system 20 based on massive questions. In this embodiment, the system 20 may be divided into a preprocessing module 201, a dimensionality reduction module 202, a clustering module 203 and a knowledge base construction module 204. A program module in the sense of the present invention is a series of computer program instruction segments that can perform a specific function, and is better suited than a program to describing the execution of the system 20 in the electronic device 2. The specific functions of the program modules 201-204 are described below.
The preprocessing module 201 is configured to split each original question into a word sequence composed of several keywords;
The dimensionality reduction module 202 is configured to reduce the dimensionality of the texts using a topic model algorithm;
The clustering module 203 is configured to classify the dimensionality-reduced texts using the K-means clustering algorithm, and to save the texts of different classes into different clusters and output them;
The knowledge base construction module 204 is configured to restore all texts to their original questions, associate the original questions belonging to the same cluster with the same standard question-answer pair, and save them to the knowledge base.
The system 20 of this embodiment can automatically categorize thousands of records or more. Suppose that, after classification, the texts in one cluster are restored to the following questions:
1. What should I do if the balance is insufficient for automatic credit card repayment
2. The balance in the debit card is insufficient, so how can the payment be deducted
3. Can the automatic credit card repayment still be deducted
4. Deduction for automatic credit card repayment
5. How is the money deducted if there is no money for automatic credit card repayment
6. Deducting payment when the debit card amount is insufficient
7. How is the payment deducted if the debit card has no money
8. How is the money deducted if the debit card has no money
Finally, the above 8 similar questions can be associated with one standard question-answer pair and saved to the knowledge base. Later, when a customer asks any of the above 8 questions or the standard question again, the intelligent customer service can look up the same answer in the knowledge base to reply to the customer.
Building the knowledge base with the system 20 greatly reduces the time spent collecting questions manually, and the questions are closer to the way customers actually ask, which helps the intelligent customer service understand customer intent more accurately.
Secondly, the present invention proposes a knowledge base construction method based on massive questions.
In embodiment one, as shown in Fig. 3, the knowledge base construction method based on massive questions comprises the following steps:
S0, question collection: save the unanswered questions in the historical records of various platforms into one file in a unified format.
The platforms here are mainly message platforms, online customer service systems, customer service mailboxes and the like. Various customer questions are received through these platforms; these questions usually follow no pattern and have to be read and understood one by one by humans, which is very time-consuming.
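Step S0 can be pictured with a minimal sketch. It assumes each platform exports its history as a list of records; the platform names, the field names `question` and `answered`, and the choice of CSV as the unified format are all hypothetical, since the text does not specify them.

```python
import csv
import io

def collect_questions(records_by_platform, out_file):
    """Write the unanswered questions of all platforms to one CSV file.

    records_by_platform: dict mapping a platform name to a list of dicts
    with the (hypothetical) keys "question" and "answered".
    Returns the number of question rows written.
    """
    writer = csv.writer(out_file)
    writer.writerow(["platform", "question"])  # unified header row
    count = 0
    for platform, records in records_by_platform.items():
        for record in records:
            if record.get("answered"):
                continue  # S0 keeps only unanswered questions
            question = record["question"].strip()
            if question:
                writer.writerow([platform, question])
                count += 1
    return count

history = {
    "message_platform": [{"question": "How is the payment deducted?", "answered": False}],
    "customer_mailbox": [{"question": "How do I reset my password?", "answered": True}],
}
buffer = io.StringIO()  # stands in for the single output file
written = collect_questions(history, buffer)
```

Here an in-memory buffer replaces the file so the sketch has no side effects; an open file handle could be passed in the same way.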
S1, question preprocessing: split each original question into a word sequence composed of several keywords.
To facilitate subsequent processing, the questions obtained directly from the various platforms need to be preprocessed, mainly so that the customer's intent can be obtained more clearly from each question.
S2, text dimensionality reduction: treat each preprocessed question as a text, reduce the dimensionality of the texts using a topic model algorithm, and represent each text by a distribution over multiple topics.
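As an illustration of what representing a text by topic distributions can mean, assume an LDA-style topic model with K topics and a vocabulary of V keywords. The text names neither the model nor these symbols, so both are assumptions. Each text d is then described by a vector of topic proportions instead of a V-dimensional keyword vector:

```latex
\theta_d = (\theta_{d,1}, \ldots, \theta_{d,K}), \qquad
\theta_{d,k} \ge 0, \qquad
\sum_{k=1}^{K} \theta_{d,k} = 1, \qquad
K \ll V
```

Under this assumption, two texts that phrase the same intent with different words can still receive similar vectors, which is what makes the reduced representation useful for the clustering in step S3.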
S3, clustering: classify the dimensionality-reduced texts using the K-means clustering algorithm, and save the texts of different classes into different clusters and output them.
S4, knowledge base construction: restore all texts to their original questions, associate the original questions belonging to the same cluster with the same standard question-answer pair, and save them to the knowledge base.
To expand the knowledge held by the intelligent customer service, all classified questions are finally saved to the knowledge base, so that the next time the same question is encountered the intelligent customer service can answer it directly, relieving the pressure on human agents.
In embodiment two, on the basis of embodiment one and as shown in Fig. 4, question preprocessing specifically comprises the following sub-steps:
S11, split the question into a word sequence composed of several words.
S12, remove the stop words from the word sequence.
Stop words here are words that carry no substantive meaning.
S13, save the keyword sequence containing only keywords.
The preprocessing of the question "The balance in the debit card is insufficient, so how can the payment be deducted" is taken as an example below:
1. Split the question "The balance in the debit card is insufficient, so how can the payment be deducted" into the word sequence "debit card / in / balance insufficient / , / then / need to / how / can / deduct payment".
2. Remove the stop words "in", ",", "then", "need to", "how" and "can" from the word sequence.
3. Save the keyword sequence "debit card / balance insufficient / deduct payment", which contains only keywords.
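The stop-word removal in this worked example can be sketched as follows. The tokens are English renderings of the segmented question, the stop-word list is an illustrative assumption covering only this example, and word segmentation itself (S11) is assumed to have been done by a separate segmenter that the text does not name.

```python
# Hypothetical stop-word list for the worked example only; a real system
# would rely on a full stop-word dictionary.
STOP_WORDS = {"in", ",", "then", "need to", "how", "can"}

def preprocess(words, stop_words=STOP_WORDS):
    """S12-S13: drop stop words and keep the remaining keyword sequence."""
    return [word for word in words if word not in stop_words]

# Output of S11 (word segmentation), assumed to be given.
word_sequence = ["debit card", "in", "balance insufficient", ",",
                 "then", "need to", "how", "can", "deduct payment"]
keywords = preprocess(word_sequence)
```

Keeping the keywords in their original order preserves the word sequence that later steps treat as one text.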
In embodiment three, on the basis of embodiment two and as shown in Fig. 5, the clustering process is as follows:
S30, preset an upper limit for the cycle count and several judgment thresholds that correspond to the cycle count in a progressive relationship; the initial value of the cycle count is 1;
S31, set all texts as one current cluster;
S32, randomly select one of the texts in the current cluster as the centroid text, and delete the centroid text from the current cluster;
S33, calculate the matching value between each remaining text in the current cluster and the centroid text, take out all texts whose matching value is greater than the judgment threshold corresponding to the current cycle count, and save them together with the centroid text into a new cluster;
S34, determine whether the current cluster still has remaining texts that have not been saved into a new cluster; if so, perform step S32; if not, perform step S35;
S35, determine whether any other cluster belonging to the same cycle as the current cluster still has texts; if so, perform step S36; if not, perform step S37;
S36, reset the current cluster to the cluster that still has texts, then perform step S32;
S37, determine whether the cycle count has reached the upper limit; if so, perform step S39; if not, perform step S38;
S38, increase the cycle count by 1, reset the current cluster to one of the new clusters, and perform step S32;
S39, output all new clusters.
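One way to read steps S30 to S39 is as a loop with one round per judgment threshold, in which every cluster produced by the previous round is split again. The sketch below follows that reading. It is not the patented implementation: the Jaccard overlap used as the matching value, the function names and the sample texts are all assumptions made for illustration.

```python
import random

def jaccard(a, b):
    """Placeholder matching value: shared keywords / all keywords."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def split_cluster(texts, threshold, match, rng):
    """S32-S34: repeatedly pick a random centroid and peel off its matches."""
    remaining = list(texts)
    new_clusters = []
    while remaining:  # S34: continue while texts are left
        centroid = remaining.pop(rng.randrange(len(remaining)))  # S32
        matched, rest = [], []
        for text in remaining:  # S33: compare every remaining text
            if match(text, centroid) > threshold:
                matched.append(text)
            else:
                rest.append(text)
        new_clusters.append([centroid] + matched)
        remaining = rest
    return new_clusters

def cluster(texts, thresholds, match=jaccard, seed=0):
    """S30-S39: one round per threshold; each round re-splits every cluster."""
    rng = random.Random(seed)
    clusters = [list(texts)]  # S31: everything starts in one cluster
    for threshold in thresholds:  # S37/S38: next round, next threshold
        next_round = []
        for current in clusters:  # S35/S36: every cluster of this round
            next_round.extend(split_cluster(current, threshold, match, rng))
        clusters = next_round
    return clusters  # S39

texts = [
    ("credit card", "automatic repayment", "balance insufficient"),
    ("credit card", "automatic repayment", "balance insufficient", "deduct payment"),
    ("credit card", "card activation", "self-service"),
]
result = cluster(texts, thresholds=[0.5])
```

In this reading the length of `thresholds` plays the role of the upper limit of the cycle count, and passing increasing values gives the progressively stricter rounds the steps describe.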
In this embodiment, a cluster corresponds to a class, that is, one cluster is one class. Outputting all new clusters at the end means that thousands or more questions have been divided down to the smallest granularity: the questions in one cluster all point to the same answer, and their variety arises only from the variety of linguistic expression or from individual habits of phrasing. The following 8 seemingly different questions serve as an example:
1. What should I do if the balance is insufficient for automatic credit card repayment
2. The balance in the debit card is insufficient, so how can the payment be deducted
3. Can the automatic credit card repayment still be deducted
4. Deduction for automatic credit card repayment
5. How is the money deducted if there is no money for automatic credit card repayment
6. Deducting payment when the debit card amount is insufficient
7. How is the payment deducted if the debit card has no money
8. How is the money deducted if the debit card has no money
The above 8 questions look different, but the customer's intent is the same: they all want to know "what to do when the balance of the debit card used for automatic credit card repayment is insufficient". After clustering with the above steps, these 8 questions will be grouped into one new cluster, that is, into one class.
In embodiment four, in order to obtain a finer classification, on the basis of embodiment three and as shown in Fig. 6, the method further comprises, before step S4, step S4', parameter adjustment, which specifically comprises the following sub-steps:
S41', check whether the output new clusters meet the classification criteria; if not, perform step S42'; if so, end.
S42', adjust the upper limit of the cycle count and the judgment thresholds that correspond to the cycle count in a progressive relationship, and perform step S31 again.
The parameter adjustment here is carried out mainly through manual observation; clustering usually needs several rounds of parameter adjustment before the massive questions are classified accurately.
In embodiment five, on the basis of embodiment four and as shown in Fig. 7, the specific sub-steps of sub-step S33 are disclosed as follows:
S331, take the first of the remaining texts in the current cluster as the current comparison text;
S332, match the keywords in the current comparison text against the keywords in the centroid text, and calculate the matching value according to the matching rule;
S333, determine whether the matching value is greater than the judgment threshold corresponding to the current cycle count; if so, perform step S334; if not, perform step S331;
S334, take out the current comparison text and hold it temporarily, and delete the current comparison text from the current cluster;
S335, determine whether the current comparison text is the last of the remaining texts in the current cluster; if so, perform step S337; if not, perform step S336;
S336, set the text following the current comparison text as the new current comparison text, and perform step S332;
S337, save the temporarily held texts and the centroid text into a new cluster.
In this embodiment, the matching rule is: all keywords are divided into three classes of words, namely professional terms, common nouns and verbs, and matches on the three classes are given different weights; in descending order of weight, the classes are professional terms, common nouns and verbs.
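The matching rule fixes only the order of the weights, not their values or the formula that combines them. The following sketch therefore makes two assumptions that are not in the source: the weights 3, 2 and 1, and a matching value defined as the weight of the shared keywords divided by the weight of all keywords in either text. The class labels `term`, `noun` and `verb` are likewise illustrative.

```python
# Hypothetical weights; the text only fixes their order:
# professional term > common noun > verb.
WEIGHTS = {"term": 3.0, "noun": 2.0, "verb": 1.0}

def matching_value(text, centroid, weights=WEIGHTS):
    """Weighted keyword overlap of two texts, between 0.0 and 1.0.

    Each text is a dict mapping a keyword to its class
    ("term", "noun" or "verb").
    """
    shared = set(text) & set(centroid)
    union = set(text) | set(centroid)
    if not union:
        return 0.0

    def total(words):
        # A keyword's class is looked up in whichever text contains it.
        return sum(weights[text.get(word) or centroid[word]] for word in words)

    return total(shared) / total(union)

centroid = {"debit card": "term", "balance": "noun", "deduct": "verb"}
same_term = {"debit card": "term", "money": "noun", "pay": "verb"}
same_verb = {"credit card": "term", "money": "noun", "deduct": "verb"}
```

With these weights, a text that shares only the professional term with the centroid scores higher than one that shares only the verb, which is the ordering the rule asks for.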
The following 3 texts belonging to one current cluster serve as an example:
Text 1: credit card / automatic repayment / balance insufficient
Text 2: credit card / card activation / self-service
Text 3: debit card / no money / deduct payment
Centroid text: debit card / balance insufficient / deduct payment
Assume the judgment threshold is 0.75.
1. Take text 1 as the current comparison text.
2. Match text 1 against the centroid text; the matching value calculated according to the matching rule is 0.8.
3. The matching value 0.8 is greater than the judgment threshold 0.75.
4. Take text 1 out of the current cluster and hold it temporarily.
5. Take text 2 as the current comparison text.
6. Match text 2 against the centroid text; the matching value calculated according to the matching rule is 0.5.
7. The matching value 0.5 is less than the judgment threshold 0.75.
8. Take text 3 as the current comparison text.
9. Match text 3 against the centroid text; the matching value calculated according to the matching rule is 0.9.
10. The matching value 0.9 is greater than the judgment threshold 0.75.
11. Take text 3 out of the current cluster and hold it temporarily.
12. Save the temporarily held text 1 and text 3 together with the centroid text into a new cluster.
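The walk-through above can be replayed with a short sketch of sub-steps S331 to S337. The matching values 0.8, 0.5 and 0.9 are taken directly from the example instead of being computed; the function name and the lookup table are illustrative. On a non-match the sketch simply moves on to the next text, as the worked example does between steps 7 and 8.

```python
def scan_cluster(centroid, remaining, threshold, match):
    """S331-S337: visit each remaining text once, in order."""
    staged, left = [], []
    for text in remaining:  # S331/S336: first text, then the next one
        if match(text, centroid) > threshold:  # S332/S333
            staged.append(text)  # S334: take out and hold temporarily
        else:
            left.append(text)  # stays in the current cluster
    return [centroid] + staged, left  # S337: new cluster, leftover texts

# Matching values from the worked example, used in place of a real rule.
example_values = {"text 1": 0.8, "text 2": 0.5, "text 3": 0.9}
new_cluster, current_cluster = scan_cluster(
    "centroid text",
    ["text 1", "text 2", "text 3"],
    0.75,
    match=lambda text, centroid: example_values[text],
)
```

Text 2 stays in the current cluster, so step S34 would send the procedure back to S32 to pick a new centroid from what is left.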
In embodiment six, on the basis of embodiment five and as shown in Fig. 8, the knowledge base construction process is as follows:
S41, take the first new cluster as the current cluster;
S42, restore the texts in the current cluster to their original questions;
S43, associate each original question with the same standard question-answer pair and save it to the knowledge base;
S44, determine whether the current cluster is the last new cluster; if so, end; if not, perform step S45;
S45, reset the current cluster to the new cluster following the current cluster, and perform step S42.
In this embodiment, because the texts in each cluster are keyword sequences that contain only keywords, they must first be restored to the original questions asked by customers; each original question is then associated with a standard question-answer pair and saved to the knowledge base.
Continuing the example above, the knowledge base is built as follows:
1. Take the new cluster saved in the example above as the current cluster.
2. Restore text 1 "credit card automatic repayment insufficient balance" to the original question "What should I do if the balance is insufficient for automatic credit card repayment?", text 3 "debit card no amount deduction" to the original question "How is a deduction made when the debit card has no amount?", and the centroid text "debit card no money deduction" to the original question "There is no money in the debit card, so how can the deduction be made?".
3. Associate the 3 restored original questions with the same standard question-answer pair and save them to the knowledge base.
If there are other new clusters, process each of them in turn with the 3 steps above.
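The construction loop S41 to S45 can be sketched as follows. The sketch assumes that a lookup from each keyword-sequence text back to its original question was kept during preprocessing, and that one standard question-answer pair per cluster is supplied from outside (for example by an operator); the source specifies neither, and the names build_knowledge_base, originals and standard_pairs are hypothetical.

```python
def build_knowledge_base(new_clusters, originals, standard_pairs):
    """Map every original question to its cluster's standard question-answer pair.

    new_clusters: list of clusters, each a list of keyword-sequence texts.
    originals: dict from keyword-sequence text to original question (assumed lookup).
    standard_pairs: one (standard question, answer) tuple per cluster (assumed input).
    """
    knowledge_base = {}
    for cluster, pair in zip(new_clusters, standard_pairs):  # S41, S44, S45
        for text in cluster:
            question = originals[text]       # S42: restore the original question
            knowledge_base[question] = pair  # S43: associate and save
    return knowledge_base


clusters = [["debit card no amount deduction", "debit card no money deduction"]]
originals = {
    "debit card no amount deduction":
        "How is a deduction made when the debit card has no amount?",
    "debit card no money deduction":
        "There is no money in the debit card, so how can the deduction be made?",
}
# Placeholder pair; real standard questions and answers are not given in the source.
pairs = [("<standard question>", "<standard answer>")]
kb = build_knowledge_base(clusters, originals, pairs)
```

Every original question from one cluster ends up pointing at the same pair, so differently worded customer questions resolve to one answer.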
In addition, the present invention provides a computer-readable storage medium that stores the knowledge base construction system 20 based on a massive number of questions; when executed by one or more processors, the system 20 carries out the operations of the above knowledge base construction method based on a massive number of questions or of the above electronic device.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that they are given by way of example only, and that the scope of protection of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and essence of the present invention, and all such changes and modifications fall within the scope of protection of the present invention.

Claims (10)

  1. A knowledge base construction method based on a massive number of questions, characterized by comprising the following steps:
    S1, question preprocessing: split each original question into a word sequence composed of several keywords;
    S2, text dimensionality reduction: treat each preprocessed question as a text and reduce its dimensionality with a topic model algorithm, so that each text is represented by multiple topic distributions;
    S3, clustering: classify the dimension-reduced texts with the K-means clustering algorithm, save the texts of different classes into different clusters, and output the clusters;
    S4, knowledge base construction: restore all texts to original questions, associate the original questions belonging to the same cluster with the same standard question-answer pair, and save them to the knowledge base.
  2. The knowledge base construction method based on a massive number of questions according to claim 1, characterized in that, before step S1, the method further comprises: S0, question collection: save the unanswered questions in the historical records of various platforms into one file in a unified format.
  3. The knowledge base construction method based on a massive number of questions according to claim 1, characterized in that step S1 specifically comprises the following sub-steps:
    S11, split the question into a word sequence composed of several words;
    S12, remove the stop words from the word sequence;
    S13, save the keyword sequence that contains only keywords.
  4. The knowledge base construction method based on a massive number of questions according to any one of claims 1-3, characterized in that step S3 specifically comprises the following sub-steps:
    S30, preset an upper limit on the loop count and several judgment thresholds that correspond to the loop count and stand in a progressive relationship, the loop count having an initial value of 1;
    S31, preset all texts as one current cluster;
    S32, randomly take one of the texts in the current cluster as the centroid text, and delete the centroid text from the current cluster;
    S33, calculate the matching value between each remaining text in the current cluster and the centroid text, take out all texts whose matching value is greater than the judgment threshold corresponding to the current loop count, and save them together with the centroid text into a new cluster;
    S34, determine whether any remaining text in the current cluster has not been saved into a new cluster; if so, perform step S32, otherwise perform step S35;
    S35, determine whether any other cluster belonging to the same loop as the current cluster still contains texts; if so, perform step S36, otherwise perform step S37;
    S36, set the current cluster to the cluster that still contains texts, then perform step S32;
    S37, determine whether the loop count has reached the upper limit; if so, perform step S39, otherwise perform step S38;
    S38, increase the loop count by 1, set the current cluster to one of the new clusters, and perform step S32;
    S39, output all new clusters.
  5. The knowledge base construction method based on a massive number of questions according to claim 4, characterized in that sub-step S33 specifically comprises the following sub-steps:
    S331, take the first of the remaining texts in the current cluster as the current comparison text;
    S332, match the keywords in the current comparison text against the keywords in the centroid text, and calculate a matching value according to the matching rule;
    S333, determine whether the matching value is greater than the judgment threshold corresponding to the current loop count; if so, perform step S334, otherwise perform step S331;
    S334, take out the current comparison text and hold it temporarily, and delete the current comparison text from the current cluster;
    S335, determine whether the current comparison text is the last of the remaining texts in the current cluster; if so, perform step S337, otherwise perform step S336;
    S336, set the text following the current comparison text as the new current comparison text, and perform step S332;
    S337, save the held texts and the centroid text into a new cluster.
  6. The knowledge base construction method based on a massive number of questions according to claim 4, characterized in that, before step S4, the method further comprises: S4', parameter adjustment, which specifically comprises the following sub-steps:
    S41', check whether the output new clusters meet the classification standard; if not, perform step S42', and if so, end;
    S42', adjust the upper limit of the loop count and the several judgment thresholds that correspond to the loop count and stand in a progressive relationship, and perform step S31 again.
  7. The knowledge base construction method based on a massive number of questions according to claim 4, characterized in that the matching rule is: all keywords are divided into three classes of words, namely professional terms, common nouns and verbs, and matches in each of the three classes are given a different weight, the weights ranking, from largest to smallest, professional terms, common nouns and verbs.
  8. The knowledge base construction method based on a massive number of questions according to claim 4, characterized in that step S4 specifically comprises the following sub-steps:
    S41, take the first new cluster as the current cluster;
    S42, restore the texts in the current cluster to original questions;
    S43, associate each original question with the same standard question-answer pair and save them to the knowledge base;
    S44, determine whether the current cluster is the last new cluster; if so, end, otherwise perform step S45;
    S45, set the new cluster following the current cluster as the current cluster, and perform step S42.
  9. An electronic device comprising a memory and a processor, characterized in that the memory stores a knowledge base construction system based on a massive number of questions that is executable by the processor, the knowledge base construction system based on a massive number of questions comprising:
    a preprocessing module, for splitting each original question into a word sequence composed of several keywords;
    a dimensionality reduction module, for performing text dimensionality reduction with a topic model algorithm;
    a clustering module, for classifying the dimension-reduced texts with the K-means clustering algorithm, saving the texts of different classes into different clusters, and outputting the clusters;
    a knowledge base construction module, for restoring all texts to original questions, associating the original questions belonging to the same cluster with the same standard question-answer pair, and saving them to the knowledge base.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a knowledge base construction system based on a massive number of questions, the system being executable by at least one processor to cause the at least one processor to perform the steps of the knowledge base construction method based on a massive number of questions according to any one of claims 1-8.
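The iterative clustering recited in claim 4 (steps S30 to S39) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the function name iterative_cluster is hypothetical, the judgment thresholds are passed as a list whose length plays the role of the loop-count upper limit, and the scoring function match is a parameter because the claims do not give its formula.

```python
import random


def iterative_cluster(texts, thresholds, match, rng=random):
    """Split the texts once per threshold; each round re-splits the previous clusters.

    thresholds: one judgment threshold per round (assumed to tighten progressively).
    match: hypothetical scoring function, match(text, centroid) -> float.
    """
    clusters = [list(texts)]                  # S31: all texts form one current cluster
    for threshold in thresholds:              # S30, S37, S38: one round per threshold
        new_clusters = []
        for cluster in clusters:              # S35, S36: every cluster of this round
            remaining = list(cluster)
            while remaining:                  # S34: until no text is left over
                # S32: draw a random centroid and remove it from the cluster
                centroid = remaining.pop(rng.randrange(len(remaining)))
                members, rest = [], []
                for text in remaining:        # S33: compare against the threshold
                    target = members if match(text, centroid) > threshold else rest
                    target.append(text)
                new_clusters.append(members + [centroid])
                remaining = rest
        clusters = new_clusters
    return clusters                           # S39: output all new clusters


# Toy scoring: texts match only when they share a first letter.
def same_letter(a, b):
    return 1.0 if a[0] == b[0] else 0.0


result = iterative_cluster(["a1", "a2", "b1", "b2"], [0.5, 0.75],
                           same_letter, rng=random.Random(0))
```

Because the centroid is drawn at random, the order of clusters and of texts inside a cluster can vary between runs; with the toy scoring above, the grouping itself does not.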

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711032456.6A CN107784105A (en) 2017-10-26 2017-10-26 Construction of knowledge base method, electronic installation and storage medium based on magnanimity problem
PCT/CN2018/076461 WO2019080417A1 (en) 2017-10-26 2018-02-12 Knowledge base construction method based on huge number of questions, electronic apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711032456.6A CN107784105A (en) 2017-10-26 2017-10-26 Construction of knowledge base method, electronic installation and storage medium based on magnanimity problem

Publications (1)

Publication Number Publication Date
CN107784105A true CN107784105A (en) 2018-03-09

Family

ID=61432159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711032456.6A Pending CN107784105A (en) 2017-10-26 2017-10-26 Construction of knowledge base method, electronic installation and storage medium based on magnanimity problem

Country Status (2)

Country Link
CN (1) CN107784105A (en)
WO (1) WO2019080417A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804567A (en) * 2018-05-22 2018-11-13 平安科技(深圳)有限公司 Improve method, equipment, storage medium and the device of intelligent customer service response rate
CN109145084A (en) * 2018-07-10 2019-01-04 阿里巴巴集团控股有限公司 Data processing method, data processing equipment and server
CN109213867A (en) * 2018-10-26 2019-01-15 湖北大学 A kind of mass knowledge base construction method precisely predicted towards big data
CN109299241A (en) * 2018-09-30 2019-02-01 北京小谛机器人科技有限公司 The knowledge library generating method and device of chat robots
CN110941696A (en) * 2019-11-12 2020-03-31 北京华宇信息技术有限公司 Query method and device and electronic equipment
CN111143565A (en) * 2019-12-29 2020-05-12 杭州睿沃科技有限公司 K-means self-learning system
CN111667029A (en) * 2020-07-09 2020-09-15 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015878B (en) * 2020-09-02 2023-07-18 中国平安财产保险股份有限公司 Method and device for processing unanswered questions of intelligent customer service and computer equipment
CN113407718A (en) * 2021-06-08 2021-09-17 北京捷通华声科技股份有限公司 Method and device for generating question bank, computer readable storage medium and processor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6804665B2 (en) * 2001-04-18 2004-10-12 International Business Machines Corporation Method and apparatus for discovering knowledge gaps between problems and solutions in text databases
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN105975460A (en) * 2016-05-30 2016-09-28 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN106777232A (en) * 2016-12-26 2017-05-31 上海智臻智能网络科技股份有限公司 Question and answer abstracting method, device and terminal
CN106951498A (en) * 2017-03-15 2017-07-14 国信优易数据有限公司 Text clustering method
CN107180075A (en) * 2017-04-17 2017-09-19 浙江工商大学 The label automatic generation method of text classification integrated level clustering

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823844B (en) * 2014-01-26 2017-02-15 北京邮电大学 Question forwarding system and question forwarding method on the basis of subjective and objective context and in community question-and-answer service
CN105678324B (en) * 2015-12-31 2019-03-26 上海智臻智能网络科技股份有限公司 Method for building up, the apparatus and system of question and answer knowledge base based on similarity calculation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6804665B2 (en) * 2001-04-18 2004-10-12 International Business Machines Corporation Method and apparatus for discovering knowledge gaps between problems and solutions in text databases
CN105975460A (en) * 2016-05-30 2016-09-28 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN106777232A (en) * 2016-12-26 2017-05-31 上海智臻智能网络科技股份有限公司 Question and answer abstracting method, device and terminal
CN106951498A (en) * 2017-03-15 2017-07-14 国信优易数据有限公司 Text clustering method
CN107180075A (en) * 2017-04-17 2017-09-19 浙江工商大学 The label automatic generation method of text classification integrated level clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
洪韵佳: "专题知识库中多层次文本聚类及其可视化研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
王振佶: "面向销售服务的自动问答系统的设计与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804567A (en) * 2018-05-22 2018-11-13 平安科技(深圳)有限公司 Improve method, equipment, storage medium and the device of intelligent customer service response rate
CN109145084A (en) * 2018-07-10 2019-01-04 阿里巴巴集团控股有限公司 Data processing method, data processing equipment and server
CN109299241A (en) * 2018-09-30 2019-02-01 北京小谛机器人科技有限公司 The knowledge library generating method and device of chat robots
CN109213867A (en) * 2018-10-26 2019-01-15 湖北大学 A kind of mass knowledge base construction method precisely predicted towards big data
CN110941696A (en) * 2019-11-12 2020-03-31 北京华宇信息技术有限公司 Query method and device and electronic equipment
CN111143565A (en) * 2019-12-29 2020-05-12 杭州睿沃科技有限公司 K-means self-learning system
CN111667029A (en) * 2020-07-09 2020-09-15 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN111667029B (en) * 2020-07-09 2023-11-10 腾讯科技(深圳)有限公司 Clustering method, device, equipment and storage medium
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2019080417A1 (en) 2019-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180309
