CN109460502A - Answer clustering method and device, electronic device, and computer-readable medium - Google Patents

Info

Publication number
CN109460502A
Authority
CN
China
Prior art keywords
answer
clustering
similarity
similitude
answers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811071710.8A
Other languages
Chinese (zh)
Inventor
高雪
陈喆
焦碧碧
李秋豪
莫智慧
毛书宇
王亚军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Guangzhou Shenma Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shenma Mobile Information Technology Co Ltd
Priority to CN201811071710.8A
Publication of CN109460502A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses an answer clustering method and device, an electronic device, and a computer-readable medium. The answer clustering method includes: obtaining multiple answers to the same question in an intelligent question-and-answer community; performing similarity cluster analysis on the multiple answers to the same question according to a set clustering rule; and dividing the multiple answers into levels according to the result of the similarity cluster analysis. The embodiments cluster the answers to identical or similar questions, avoiding repeated and redundant answers in the intelligent question-and-answer community.

Description

Answer clustering method and device, electronic device, and computer-readable medium
Technical field
This application relates to the Internet field, and in particular to an answer clustering method and device, an electronic device, and a computer-readable medium.
Background
Asking a question and receiving answers to it is an effective way for people to obtain real-world information. Meanwhile, with the rapid development of Internet technology and Internet applications, people increasingly rely on the Internet to obtain information. At present, people obtain information in question-and-answer form by searching for relevant information on search platforms, in particular by searching for questions, asking questions, answering questions, browsing questions, or asking follow-up questions in question-and-answer communities. This has become an important way for users to interact and exchange information. Common intelligent question-and-answer communities include Baidu Zhidao (Baidu Knows), Search Ask, and Sina iAsk.
In the prior art, however, identical or similar questions attract a large number of similar answers, so the answers provided in an intelligent question-and-answer community contain a great deal of repetition and even redundancy.
Summary of the invention
The purpose of this application is to propose an answer clustering method and device, an electronic device, and a computer-readable medium that solve the above technical problem in the prior art.
In a first aspect, an embodiment of this application provides an answer clustering method, comprising:
Obtaining multiple answers to the same question in an intelligent question-and-answer community;
Performing similarity cluster analysis on the multiple answers to the same question according to a set clustering rule;
Dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
Optionally, in any embodiment of this application, the answer clustering method further includes: performing semantic analysis on each answer to extract the entity keywords in it. Accordingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule includes: computing the entity keyword similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set entity keyword similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering method further includes: assigning category attributes to the entity keywords. Accordingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule includes: computing the category attribute similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set category attribute similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering method further includes: obtaining multiple questions associated with the multiple answers. Accordingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule includes: computing the similarity of the multiple questions associated with the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set question similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering method further includes: parsing each of the multiple answers to generate a corresponding feature vector. Accordingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule includes: computing the similarity of the feature vectors of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set feature vector similarity clustering rule.
Optionally, in any embodiment of this application, dividing the multiple answers into levels according to the result of the similarity cluster analysis includes: according to that result, placing each of the multiple answers in either an answer outer layer or an answer collapsed layer according to its degree of similarity.
Optionally, in any embodiment of this application, the answer clustering method further includes: configuring different display priorities for the answers in the answer outer layer and the answers in the answer collapsed layer.
Optionally, in any embodiment of this application, the display priority of the answers in the answer outer layer is higher than the display priority of the answers in the answer collapsed layer.
In a second aspect, an embodiment of this application further provides an answer clustering device, comprising:
An acquiring unit, configured to obtain multiple answers to the same question in an intelligent question-and-answer community;
A clustering unit, configured to perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule;
A hierarchy division unit, configured to divide the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
Optionally, in any embodiment of this application, the answer clustering device further includes: an extraction unit, configured to perform semantic analysis on each answer to extract the entity keywords in it. Accordingly, the clustering unit is further configured to compute the entity keyword similarity of the multiple answers and to perform similarity cluster analysis on the multiple answers to the same question according to a set entity keyword similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering device further includes: a division unit, configured to assign category attributes to the entity keywords. Accordingly, the clustering unit is further configured to compute the category attribute similarity of the multiple answers and to perform similarity cluster analysis on the multiple answers to the same question according to a set category attribute similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering device further includes: an association unit, configured to obtain multiple questions associated with the multiple answers. Accordingly, the clustering unit is further configured to compute the similarity of the multiple questions associated with the multiple answers and to perform similarity cluster analysis on the multiple answers to the same question according to a set question similarity clustering rule.
Optionally, in any embodiment of this application, the answer clustering device further includes: a parsing unit, configured to parse each of the multiple answers to generate a corresponding feature vector. Accordingly, the clustering unit is further configured to compute the similarity of the feature vectors of the multiple answers and to perform similarity cluster analysis on the multiple answers to the same question according to a set feature vector similarity clustering rule.
In a third aspect, an embodiment of this application provides an electronic device, comprising:
One or more processors;
A computer-readable medium configured to store one or more programs,
When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any embodiment.
In a fourth aspect, an embodiment of this application provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the method described in any embodiment is implemented.
In the technical solution provided by this application, multiple answers to the same question in an intelligent question-and-answer community are obtained; similarity cluster analysis is performed on the multiple answers to the same question according to a set clustering rule; and the multiple answers are divided into levels according to the result of that analysis. The embodiments thereby cluster the answers to identical or similar questions and avoid repeated and redundant answers in the intelligent question-and-answer community.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flow diagram of the answer clustering method of embodiment one of this application;
Fig. 2 is a flow diagram of the answer clustering method of embodiment two of this application;
Fig. 3 is a flow diagram of the answer clustering method of embodiment three of this application;
Fig. 4 is a flow diagram of the answer clustering method of embodiment four of this application;
Fig. 5 is a structural diagram of the answer clustering device of embodiment five of this application;
Fig. 6 is a structural diagram of the answer clustering device of embodiment six of this application;
Fig. 7 is a structural diagram of the answer clustering device of embodiment seven of this application;
Fig. 8 is a structural diagram of the answer clustering device of embodiment eight of this application;
Fig. 9 is a structural diagram of the answer clustering device of embodiment nine of this application;
Fig. 10 is a structural diagram of the electronic device of embodiment ten of this application;
Fig. 11 shows the hardware configuration of the electronic device of embodiment eleven of this application.
Detailed description
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
The main idea of the technical solutions provided by the following embodiments is: obtain multiple answers to the same question in an intelligent question-and-answer community; perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule; and divide the multiple answers into levels according to the result of that analysis.
Fig. 1 is a flow diagram of the answer clustering method of embodiment one of this application. As shown in Fig. 1, it may include the following steps:
S101, obtaining multiple answers to the same question in an intelligent question-and-answer community;
In this embodiment, an answer collection and monitoring component may be configured in the background service of the intelligent question-and-answer community to monitor and collect question-answer data pairs in the community in real time and store them on a background server. Here, a question-answer data pair may consist of one question and one corresponding answer, or of one question and multiple corresponding answers. Each question and each answer is assigned a unique identifier.
In a specific implementation, because the data volume is large, dedicated distributed background data servers may be configured to store the question-answer data pairs. The distributed background data servers may be deployed by geographic region, so that when multiple answers are obtained in step S101, they are fetched from the nearest distributed background server.
S102, performing semantic analysis on each answer to extract the entity keywords in it;
In this embodiment, the semantic analysis includes word segmentation, which may use a segmentation method based on string matching. In a specific implementation, a dictionary containing a large number of word samples is built from big-data analysis and collection. All candidate words that match the dictionary are cut out, and a statistical language model then determines the optimal segmentation. The statistical language model captures the semantic logic of the language by assuming that each word in a sentence depends only on the n-1 words before it. For example, for "Lanzhou Huanghe River bridge", an entry lookup is performed first (entries are typically stored in a trie) to find all matching entries (Lanzhou, city, the Yellow River, bridge, Lanzhou, Yellow River Bridge, the mayor, Jiang great bridge, Jiang great, bridge). These are represented as a word lattice, and a path search based on the statistical language model (such as an n-gram model) then finds the shortest path, finally yielding the entity keyword "Lanzhou Huanghe River bridge".
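The dictionary lookup followed by a path search can be sketched as below. This is a minimal illustration under stated assumptions, not the method claimed in the application: it uses a unigram model (the simplest case of an n-gram model), dynamic programming stands in for the lattice search, and the `UNIGRAM` table, its probabilities, and the English sample string are all hypothetical.

```python
import math

# Hypothetical unigram probabilities; a real system would estimate
# these from a large corpus and store the entries in a trie.
UNIGRAM = {
    "the": 0.05, "table": 0.01, "tab": 0.002, "led": 0.002,
    "own": 0.004, "down": 0.008, "there": 0.01, "he": 0.02,
    "re": 0.001, "t": 0.0001,
}


def segment(text, unigram=UNIGRAM, max_len=8):
    """Return the most probable split of `text` into dictionary words.

    best[i] holds (log-probability, start index of the last word) for
    the best segmentation of text[:i]. Returns None if no segmentation
    made only of dictionary words exists.
    """
    n = len(text)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in unigram and best[start][0] > -math.inf:
                score = best[start][0] + math.log(unigram[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the back-pointers from the end of the string to recover words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

Maximising the summed log-probability is equivalent to finding the shortest path in a lattice whose edge costs are negative log-probabilities, which is why the two descriptions are interchangeable here.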
Alternatively, in other embodiments, a character-based segmentation method may be used, which treats segmentation as a character classification problem, that is, a sequence labeling problem in natural language processing. The usual approach uses models such as HMM, MAXENT, MEMM, or CRF to predict a tag for each character of the text string. For example, the four tags B, I, E, and S stand for beginning, inside, ending, and single: the beginning of a word, the middle of a word, the end of a word, and a single-character word. The annotation result for "Lanzhou Huanghe River bridge" (兰州市黄河大桥) might be: "兰 (B) 州 (I) 市 (E) 黄 (B) 河 (E) 大 (B) 桥 (E)".
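Once a tagger has produced one B/I/E/S tag per character, the words are recovered by grouping characters between a B and the following E. The sketch below shows only this decoding step; the tagger itself (HMM, CRF, and so on) is not shown, and the function name `decode_bies` is a hypothetical helper introduced for illustration.

```python
def decode_bies(chars, tags):
    """Group characters into words from per-character B/I/E/S tags."""
    words, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "S":                 # single-character word
            if current:                # flush an unterminated word
                words.append("".join(current))
                current = []
            words.append(ch)
        elif tag == "B":               # beginning of a new word
            if current:
                words.append("".join(current))
            current = [ch]
        elif tag == "I":               # inside the current word
            current.append(ch)
        elif tag == "E":               # end of the current word
            current.append(ch)
            words.append("".join(current))
            current = []
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:                        # tolerate a missing final E
        words.append("".join(current))
    return words
```

For instance, `decode_bies("兰州市黄河大桥", "BIEBEBE")` groups the seven characters into three words.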
S103, computing the entity keyword similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set entity keyword similarity clustering rule;
In this embodiment, the similarity may be obtained by computing the distance between any two strings among the entity keywords of the multiple answers, that is, the minimum number of edit operations needed to turn one string into the other. The edit operations are: replacing one character with another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity of the two strings.
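The edit distance described above is the standard Levenshtein distance, which can be computed with a short dynamic program. The sketch below is illustrative only; the normalisation in `edit_similarity` (dividing by the longer string's length) is an assumption, since the text says only that a smaller distance means a greater similarity.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))     # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]                     # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,           # delete ca
                curr[j - 1] + 1,       # insert cb
                prev[j - 1] + cost,    # substitute (or keep) ca
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalisation: map the distance into [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```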
Alternatively, in other embodiments, the entity keywords of each answer may be used as labels to build a label vector. For example, answer 1: China, Hangzhou, man, work; answer 2: city, Hangzhou, work. If the label vector has 10 dimensions and each position with no corresponding entity keyword is set to 0, the label vectors are built as follows:
The label vector V1 of answer 1:
(0, 0, 684373, 0, 605594, 0, 0, 0, 42062, 28717)
The label vector V2 of answer 2:
(0, 0, 0, 0, 605594, 0, 487695, 0, 420062, 0)
The cosine of the angle between the two label vectors (that is, their similarity) is calculated as $\cos\theta = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}$, which gives a similarity of 0.47524222827391666 for these two answers.
When there are multiple answers, this similarity is computed for each pair in turn, yielding multiple similarity values. The larger the value, the more similar the two answers.
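The cosine similarity of the two label vectors can be checked with a few lines of code. The sketch below uses the vectors V1 and V2 exactly as given in the text; the function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0                     # assumed convention for empty vectors
    return dot / (norm_u * norm_v)


# Label vectors copied from the example in the text.
V1 = (0, 0, 684373, 0, 605594, 0, 0, 0, 42062, 28717)
V2 = (0, 0, 0, 0, 605594, 0, 487695, 0, 420062, 0)

similarity = cosine_similarity(V1, V2)
```

Only the fifth and ninth positions are non-zero in both vectors, so only those two positions contribute to the dot product.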
Optionally, in another embodiment, the question is split into a short-text set made up of multiple short texts. The latent topic knowledge in the text set is analyzed to extract the themes in the text and the word distribution under each theme, yielding a text-theme matrix and a theme-word matrix. Words under the same theme are taken to have the same or similar semantics, and the similarity between questions is determined from this semantic similarity.
S104, dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
In this embodiment, in step S104, each of the multiple answers may be placed in either an answer outer layer or an answer collapsed layer according to its degree of similarity, based on the result of the similarity cluster analysis. Further, different display priorities are configured for the answers in the answer outer layer and the answers in the answer collapsed layer. Still further, the display priority of the answers in the answer outer layer is higher than that of the answers in the answer collapsed layer.
For the same question, the answer outer layer contains the answers with the higher display priority and is placed in the first layer of the display, while the answer collapsed layer contains the answers with the lower display priority.
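One possible way to turn pairwise similarities into the two layers is sketched below. The text does not specify the assignment rule, so everything here is an assumption: a greedy pass keeps the first answer of each group of near-duplicates in the outer layer and sends later near-duplicates to the collapsed layer, the `threshold` value is hypothetical, and `jaccard` over keyword sets is only a stand-in for whichever similarity measure an embodiment uses.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def split_layers(answers, similarity=jaccard, threshold=0.8):
    """Return (outer_layer, collapsed_layer).

    An answer is collapsed when it is at least `threshold`-similar to
    an answer already kept in the outer layer; otherwise it is kept.
    """
    outer, collapsed = [], []
    for answer in answers:
        if any(similarity(answer, kept) >= threshold for kept in outer):
            collapsed.append(answer)
        else:
            outer.append(answer)
    return outer, collapsed
```

With this policy the input order decides which duplicate is shown first, so a caller would typically sort the answers by quality or recency before splitting.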
Fig. 2 is a flow diagram of the answer clustering method of embodiment two of this application. As shown in Fig. 2, it may include the following steps:
S201, obtaining multiple answers to the same question in an intelligent question-and-answer community;
In this embodiment, step S201 is similar to step S101 above and is not described again here.
S202, assigning category attributes to the entity keywords;
In this embodiment, a category attribute library is set up in advance. The library contains several preset category attribute samples, and each category attribute sample corresponds to one or more keyword samples.
The category attribute of an entity keyword is therefore determined by comparing the entity keyword with the keyword samples. Category attributes may be divided into several classes by how they match the question, for example navigational, transactional, and informational attributes. Navigational attributes mainly help the user find a page that contains the keyword; transactional attributes cover keywords that help the user accomplish an actual purpose; informational attributes reflect the keywords a user uses when looking for certain information. Category attributes may also be distinguished by the length of the entity keyword, for example by dividing all keywords into short-tail and long-tail keywords. Short-tail keywords have few characters, such as machinery, beauty, or Beijing Hospital, and competition for them is generally intense. Long-tail keywords have more characters, are more specific, and have relatively low search volume. They are usually combinations of several words, for example: where the Palace Museum in Beijing is, the talent market in Langfang, Hebei Province, or where the Beijing Zoo is.
S203, computing the category attribute similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set category attribute similarity clustering rule;
Specifically, the category attributes above can form a category attribute vector. For each answer, when a given category attribute is present, the corresponding position in the category attribute vector is set to 1. The cosine similarity calculation described above (that is, the similarity clustering rule) is then used to compute the similarity between two answers to the same question.
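A binary category attribute vector and its cosine similarity can be sketched as follows. The `CATEGORIES` tuple is a hypothetical ordering of the attributes named in this embodiment, and the helper names are introduced only for illustration.

```python
import math

# Hypothetical fixed ordering of the category attributes.
CATEGORIES = ("navigational", "transactional", "informational",
              "short_tail", "long_tail")


def category_vector(answer_categories, categories=CATEGORIES):
    """1 where the answer has the category attribute, 0 elsewhere."""
    present = set(answer_categories)
    return [1 if name in present else 0 for name in categories]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0
```

Because the vectors are binary, the cosine reduces to the number of shared attributes divided by the geometric mean of the two attribute counts.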
S204, dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
In this embodiment, step S204 is similar to step S104 above and is not described again in detail.
Fig. 3 is a flow diagram of the answer clustering method of embodiment three of this application. As shown in Fig. 3, it may include the following steps:
S301, obtaining multiple answers to the same question in an intelligent question-and-answer community;
In this embodiment, step S301 is similar to step S101 above and is not described again here.
S302, obtaining multiple questions associated with the multiple answers;
In this embodiment, because the questions and answers in the knowledge base are organized as question-answer pairs, at least one corresponding question can be found for any answer. The similarity of the answers is then determined indirectly by judging the similarity of the questions in the following steps.
S303, computing the similarity of the multiple questions associated with the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set question similarity clustering rule;
In this embodiment, the similarity of questions may be obtained by extracting the keywords of each question, building a keyword vector, and computing the cosine similarity of the keyword vectors of two questions, from which the similarity of the two answers is derived: if the similarity of two questions is high, the similarity of the two corresponding answers is also high. Alternatively, the judgment may be based on the category attribute similarity of the questions.
To determine the keywords of a question, a sentence-meaning structure analysis may be performed on each question to extract its topic, comment, basic items, and general items. The semantics of the whole question can be represented as a structure tree with four levels: the sentence-type layer, the description layer, the object layer, and the detail layer. The sentence-type layer indicates the sentence-meaning type of the question, of which there are four: simple, complex, compound, and multiple sentence meaning. The description layer contains the topic and the comment, which are a preliminary division of the sentence meaning and are essential components of the sentence-meaning structure. The topic is defined as the object being described in the sentence meaning, and the comment is defined as the content that describes the topic. The object layer contains the predicate, basic items, general items, and semantic cases. A semantic case is a semantic annotation of a word; there are 7 basic cases and 12 general cases. A basic item is a component directly related to the predicate in the sentence meaning; basic items form the backbone of the question's semantics, and their semantic cases are basic cases. A general item is a modifying component in the sentence meaning, and its semantic case is a general case. The detail layer contains the extended meaning of the sentence.
Feature expansion is performed on the question according to the topic to obtain a topic-based question vector. If two identical words serve as part of the topic and part of the comment respectively, they are considered to have different semantics and are defined as different words. Under this definition, feature expansion of a question should be performed separately for the topic part and the comment part. The feature expansion of the topic part proceeds as follows: first extract the words corresponding to the basic items and general items under the topic; then compare the probability of each word under different themes and choose the theme with the highest probability; add the other words under that theme to the question as part of the question; finally, use all the words of the question as features to build a feature vector that represents the sentence. In this vector, the value in the dimension of a word originally in the sentence is the number of times the word occurs in the sentence, and the value in the dimension of an expansion word is calculated by formula (1):
V = n * w (1)
where V is the value in the dimension of the expansion word, n is the number of times the expansion word occurs in the question, and w is the probability of the expansion word under the corresponding theme.
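The expansion step and formula (1) can be sketched as below. This is a rough illustration under several assumptions: `themes` is a hypothetical theme-to-word-probability table of the kind a topic model would produce, `seed_words` stands for the basic-item and general-item words already extracted from the topic part, and n is interpreted as the number of times an expansion word is added to the question.

```python
from collections import Counter


def expand_features(question_words, seed_words, themes):
    """Build a {word: value} feature map for one question.

    Original words get their occurrence count. Each seed word selects
    the theme under which it is most probable, and every other word of
    that theme is added with value V = n * w.
    """
    original = set(question_words)
    features = {w: float(c) for w, c in Counter(question_words).items()}
    for seed in seed_words:
        candidates = [t for t in themes if seed in themes[t]]
        if not candidates:
            continue                   # seed word unknown to every theme
        best = max(candidates, key=lambda t: themes[t][seed])
        for word, prob in themes[best].items():
            if word not in original:
                # Each addition contributes w, so a word added n times
                # under the same theme ends up with V = n * w.
                features[word] = features.get(word, 0.0) + prob
    return features
```

The same routine would be run separately on the topic part and the comment part, since identical words in the two parts are treated as different words.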
The feature vector of each question is obtained in this way, and the similarity between two questions is then computed with the cosine similarity calculation described above.
S304, dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
In this embodiment, step S304 is similar to step S204 above and is not described again in detail.
Fig. 4 is a flow diagram of the answer clustering method of embodiment four of this application. As shown in Fig. 4, it may include the following steps:
S401, obtaining multiple answers to the same question in an intelligent question-and-answer community;
S402, parsing each of the multiple answers to generate a corresponding feature vector;
In this embodiment, the correlation between two answers is measured with several feature vectors at different levels: the word feature vector, the phrase feature vector, the sentence semantic feature vector, and the sentence structure feature vector.
S403, computing the similarity of the feature vectors of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set feature vector similarity clustering rule;
1. Word feature vector
The word feature vector measures the similarity of two answers at the word level, for example with a common-word count feature: the number of times each word co-occurs.
2. Phrase feature vector
Put simply, when a phrase in the answer sentence appears directly in the question sentence, the score of the phrase is 1. If the phrase and some phrase in the question sentence appear together in the phrase table, which means the two phrases are synonymous or related, the score of the phrase is the product of the mutual translation probabilities of the two phrases in the phrase table, a value between 0 and 1. If the phrase satisfies neither condition, its score is 0. The relevance scores between the question sentence and all phrases covered by the 1-grams to N-grams of the answer sentence are computed and finally averaged over N to obtain the phrase feature vector.
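The three scoring cases for a phrase can be sketched as follows. This is one possible reading, not a definitive implementation: `phrase_table` is a hypothetical mapping from an (answer phrase, question phrase) pair to its two translation probabilities, and the averaging (mean score per n-gram order, then mean over the N orders) is an assumption about what "averaged over N" means.

```python
def ngrams(tokens, n):
    """All contiguous n-token phrases, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_score(phrase, question_phrases, phrase_table):
    """1 for an exact match, translation-probability product for a
    phrase-table match, 0 otherwise."""
    if phrase in question_phrases:
        return 1.0
    best = 0.0
    for q in question_phrases:
        probs = phrase_table.get((phrase, q))
        if probs is not None:
            forward, backward = probs
            best = max(best, forward * backward)
    return best


def phrase_feature(answer_tokens, question_tokens, phrase_table, max_n=3):
    """Average phrase relevance over n-gram orders 1..max_n."""
    question_phrases = set()
    for n in range(1, max_n + 1):
        question_phrases.update(ngrams(question_tokens, n))
    total = 0.0
    for n in range(1, max_n + 1):
        answer_phrases = ngrams(answer_tokens, n)
        if answer_phrases:
            scores = [phrase_score(p, question_phrases, phrase_table)
                      for p in answer_phrases]
            total += sum(scores) / len(scores)
    return total / max_n
```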
3. Sentence semantic feature vector
This feature is a semantic similarity score obtained from a recent deep-learning model for computing the similarity of two sentences. The model first uses a Bi-LSTM (bidirectional long short-term memory) network to compute a vector representation for each position of the question sentence and of the answer sentence. The different positions of the two sentences then interact to form new matrices and tensors, followed by a k-max pooling layer and a multi-layer perceptron for dimensionality reduction. Finally, the model outputs the similarity of the two sentences as the sentence semantic feature vector.
4. Sentence structure feature vector
First, the words common to the two answers are found; each such pair is called a pair of anchor points, and two sentences may contain several pairs of anchor points. The dependency relations of the two sentences are then computed separately. Counting the identical dependency relations on the paths from the root to the anchor points in the two dependency trees gives the sentence structure feature vector.
The similarity scores of all features at the four levels above are combined in a weighted sum to obtain the overall similarity score, which is the similarity between the two answers.
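The final weighted sum can be sketched as below. The feature names and weight values used in the usage note are hypothetical; the text does not state how the weights are chosen.

```python
def overall_similarity(scores, weights):
    """Weighted sum of per-feature similarity scores.

    `scores` and `weights` are dicts keyed by feature name; every
    weighted feature must have a score.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)
```

For example, with scores `{"word": 0.5, "phrase": 0.2, "semantic": 0.8, "structure": 0.4}` and weights `{"word": 0.1, "phrase": 0.2, "semantic": 0.5, "structure": 0.2}`, the four terms are 0.05, 0.04, 0.40, and 0.08. Choosing weights that sum to 1 keeps the result in [0, 1] when every score is in [0, 1].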
S404: dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
In this embodiment, step S404 is similar to step S104 above and is not described again in detail.
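One possible way to turn a clustering result into levels is sketched below. The rule used here, keeping the first answer of each cluster visible and collapsing the remaining near-duplicates, is an assumption made for illustration, as are the names `shown` and `collapsed`; the text itself only states that the levels are derived from the clustering result.

```python
from typing import Dict, List


def divide_levels(clusters: List[List[str]]) -> Dict[str, List[str]]:
    """Split clustered answer ids into a shown level and a collapsed level."""
    shown: List[str] = []
    collapsed: List[str] = []
    for cluster in clusters:
        if not cluster:
            continue
        shown.append(cluster[0])        # assumption: first answer represents the cluster
        collapsed.extend(cluster[1:])   # highly similar answers are folded away
    return {"shown": shown, "collapsed": collapsed}


if __name__ == "__main__":
    print(divide_levels([["a1", "a4"], ["a2"], ["a3", "a5", "a6"]]))
```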
Fig. 5 is a structural schematic diagram of the answer clustering device of Embodiment 5 of the present application. As shown in Fig. 5, the device may include:
An acquiring unit 501, configured to obtain multiple answers to the same question in an intelligent question-answering community;
A clustering unit 502, configured to perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule;
A level division unit 503, configured to divide the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
Fig. 6 is a structural schematic diagram of the answer clustering device of Embodiment 6 of the present application. As shown in Fig. 6, in addition to the acquiring unit 501, clustering unit 502 and level division unit 503 of Fig. 5, the device may further include an extraction unit 504, configured to perform semantic analysis on each answer and extract the entity keywords therein. Correspondingly, the clustering unit 502 is further configured to compute the entity keyword similarity of the multiple answers and, according to a set entity keyword similarity clustering rule, perform similarity cluster analysis on the multiple answers to the same question.
Fig. 7 is a structural schematic diagram of the answer clustering device of Embodiment 7 of the present application. As shown in Fig. 7, in addition to the acquiring unit 501, clustering unit 502 and level division unit 503 of Fig. 5, the device may further include a category division unit 505, configured to divide the entity keywords by category attribute. Correspondingly, the clustering unit 502 is further configured to compute the category attribute similarity of the multiple answers and, according to a set category attribute similarity clustering rule, perform similarity cluster analysis on the multiple answers to the same question.
Fig. 8 is a structural schematic diagram of the answer clustering device of Embodiment 8 of the present application. As shown in Fig. 8, in addition to the acquiring unit 501, clustering unit 502 and level division unit 503 of Fig. 5, the device may further include an association unit 506, configured to obtain multiple questions associated with the multiple answers. Correspondingly, the clustering unit 502 is further configured to compute the similarity of the multiple questions associated with the multiple answers and, according to a set question similarity clustering rule, perform similarity cluster analysis on the multiple answers to the same question.
Fig. 9 is a structural schematic diagram of the answer clustering device of Embodiment 9 of the present application. As shown in Fig. 9, in addition to the acquiring unit 501, clustering unit 502 and level division unit 503 of Fig. 5, the device may further include a parsing unit 507, configured to parse each of the multiple answers to generate a corresponding feature vector. Correspondingly, the clustering unit 502 is further configured to compute the similarity of the feature vectors of the multiple answers and, according to a set feature vector similarity clustering rule, perform similarity cluster analysis on the multiple answers to the same question.
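As an illustration of how a clustering unit might group answers by feature-vector similarity, the sketch below uses cosine similarity with a greedy single-pass rule. The rule itself, the comparison against the first member of each cluster, and the threshold value 0.8 are assumptions; the text only refers to "a set clustering rule" without fixing one.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_answers(vectors: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Greedy single-pass clustering; returns clusters as lists of answer indices."""
    clusters: List[List[int]] = []
    for i, vector in enumerate(vectors):
        for cluster in clusters:
            # Compare against the first member, which acts as the cluster's reference.
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])        # no sufficiently similar cluster: start a new one
    return clusters


if __name__ == "__main__":
    print(cluster_answers([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
```

The same loop could work with any of the other similarity measures mentioned above (entity keyword, category attribute, or question similarity) by swapping out `cosine`.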
Fig. 10 is a structural schematic diagram of the electronic device of Embodiment 10 of the present application. The electronic device may include:
One or more processors 1001;
A computer-readable medium 1002, which may be configured to store one or more programs,
When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any of the above embodiments.
Fig. 11 shows the hardware structure of the electronic device of Embodiment 11 of the present application. As shown in Fig. 11, the hardware structure of the electronic device may include: a processor 1101, a communication interface 1102, a computer-readable medium 1103 and a communication bus 1104;
The processor 1101, the communication interface 1102 and the computer-readable medium 1103 communicate with one another through the communication bus 1104;
Optionally, the communication interface 1102 may be the interface of a communication module, such as the interface of a GSM module;
The processor 1101 may specifically be configured to: obtain multiple answers to the same question in an intelligent question-answering community; perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule; and divide the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
The processor 1101 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code configured to execute the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through a communication part, and/or installed from a removable medium. When the computer program is executed by a central processing unit (CPU), the functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program configured for use by or in connection with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, or any suitable combination of the above.
Computer program code configured to perform the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and that module, program segment, or part of code contains one or more executable instructions configured to implement the specified logic function. The specific embodiments above follow a particular order, but that order is only exemplary; in a specific implementation there may be fewer or more steps, or the execution order may be adjusted. That is, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and any combination of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, where the names of these units do not, in some cases, limit the units themselves. For example, the acquiring unit may also be described as "a unit for obtaining multiple answers to the same question in an intelligent question-answering community".
In another aspect, the present application also provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the above embodiments.
In another aspect, the present application also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain multiple answers to the same question in an intelligent question-answering community; perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule; and divide the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
In addition, in the above embodiments, the acquiring unit, the clustering unit and the level division unit may also be referred to as the first program unit, the second program unit and the third program unit, respectively.
The expressions "first", "second", "the first" or "the second" used in the various embodiments of the present application may modify various components regardless of order and/or importance, but these expressions do not limit the corresponding components. The above expressions are used only to distinguish one element from another. For example, a first user equipment and a second user equipment denote different user equipment, although both are user equipment. For example, without departing from the scope of the present application, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element.
When an element (for example, a first element) is referred to as being "(operationally or communicatively) coupled with" or "(operationally or communicatively) coupled to" another element (for example, a second element), or as being "connected to" another element (for example, a second element), it should be understood that the element is either directly connected to the other element or indirectly connected to it via a further element (for example, a third element). Conversely, when an element (for example, a first element) is referred to as being "directly connected" or "directly coupled" to another element (for example, a second element), no element (for example, a third element) is interposed between the two.
The above description is only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (15)

1. An answer clustering method, characterized by comprising:
Obtaining multiple answers to the same question in an intelligent question-answering community;
Performing similarity cluster analysis on the multiple answers to the same question according to a set clustering rule;
Dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
2. The method according to claim 1, characterized by further comprising: performing semantic analysis on each answer to extract the entity keywords therein; correspondingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule comprises: computing the entity keyword similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set entity keyword similarity clustering rule.
3. The method according to claim 1, characterized by further comprising: dividing the entity keywords by category attribute; correspondingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule comprises: computing the category attribute similarity of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set category attribute similarity clustering rule.
4. The method according to claim 1, characterized by further comprising: obtaining multiple questions associated with the multiple answers; correspondingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule comprises: computing the similarity of the multiple questions associated with the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set question similarity clustering rule.
5. The method according to claim 1, characterized by further comprising: parsing each of the multiple answers to generate a corresponding feature vector; correspondingly, performing similarity cluster analysis on the multiple answers to the same question according to the set clustering rule comprises: computing the similarity of the feature vectors of the multiple answers, and performing similarity cluster analysis on the multiple answers to the same question according to a set feature vector similarity clustering rule.
6. The method according to any one of claims 1-5, characterized in that dividing the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers comprises: according to the result of the similarity cluster analysis performed on the multiple answers, placing each of the multiple answers in an answer exposed layer or an answer collapsed layer according to its degree of similarity.
7. The method according to claim 6, characterized by further comprising: configuring different display priorities for the answers in the answer exposed layer and the answers in the answer collapsed layer.
8. The method according to claim 7, characterized by further comprising: the display priority of the answers in the answer exposed layer is higher than the display priority of the answers in the answer collapsed layer.
9. An answer clustering device, characterized by comprising:
An acquiring unit, configured to obtain multiple answers to the same question in an intelligent question-answering community;
A clustering unit, configured to perform similarity cluster analysis on the multiple answers to the same question according to a set clustering rule;
A level division unit, configured to divide the multiple answers into levels according to the result of the similarity cluster analysis performed on the multiple answers.
10. The device according to claim 9, characterized by further comprising: an extraction unit, configured to perform semantic analysis on each answer to extract the entity keywords therein; correspondingly, the clustering unit is further configured to compute the entity keyword similarity of the multiple answers, and to perform similarity cluster analysis on the multiple answers to the same question according to a set entity keyword similarity clustering rule.
11. The device according to claim 9, characterized by further comprising: a division unit, configured to divide the entity keywords by category attribute; correspondingly, the clustering unit is further configured to compute the category attribute similarity of the multiple answers, and to perform similarity cluster analysis on the multiple answers to the same question according to a set category attribute similarity clustering rule.
12. The device according to claim 9, characterized by further comprising: an association unit, configured to obtain multiple questions associated with the multiple answers; correspondingly, the clustering unit is further configured to compute the similarity of the multiple questions associated with the multiple answers, and to perform similarity cluster analysis on the multiple answers to the same question according to a set question similarity clustering rule.
13. The device according to claim 9, characterized by further comprising: a parsing unit, configured to parse each of the multiple answers to generate a corresponding feature vector; correspondingly, the clustering unit is further configured to compute the similarity of the feature vectors of the multiple answers, and to perform similarity cluster analysis on the multiple answers to the same question according to a set feature vector similarity clustering rule.
14. An electronic device, comprising:
One or more processors;
A computer-readable medium, configured to store one or more programs,
When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-8.
15. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201811071710.8A 2018-09-14 2018-09-14 Answer clustering method and its device, electronic equipment, computer-readable medium Pending CN109460502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811071710.8A CN109460502A (en) 2018-09-14 2018-09-14 Answer clustering method and its device, electronic equipment, computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811071710.8A CN109460502A (en) 2018-09-14 2018-09-14 Answer clustering method and its device, electronic equipment, computer-readable medium

Publications (1)

Publication Number Publication Date
CN109460502A true CN109460502A (en) 2019-03-12

Family

ID=65606670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811071710.8A Pending CN109460502A (en) 2018-09-14 2018-09-14 Answer clustering method and its device, electronic equipment, computer-readable medium

Country Status (1)

Country Link
CN (1) CN109460502A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
    Effective date of registration: 20200604
    Address after: 310051 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province
    Applicant after: Alibaba (China) Co.,Ltd.
    Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 12 layer self unit 01
    Applicant before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.
RJ01 Rejection of invention patent application after publication
    Application publication date: 20190312