CN112463943A - Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium

Info

Publication number: CN112463943A
Application number: CN202011446866.7A
Authority: CN
Inventors: 周柳阳; 侯克鑫; 蒋林林
Original assignee: Shenzhen Yihao Hulian Technology Co ltd
Current assignee: Shenzhen Yihao Hulian Technology Co ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-03-09

Abstract

The invention relates to the technical field of data processing, and in particular to a processing method and device for discovering new knowledge points based on a clustering algorithm, an electronic device, and a storage medium. The method comprises the following steps: step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system; step S2: converting each text representation into a vector; step S3: placing the vectors in one-to-one correspondence with the first knowledge points; step S4: calculating the vector-space cosine similarity of all vectors; step S5: classifying the vectors into categories; step S6: converting the vectors of the different categories into corresponding second knowledge points; step S7: performing topic analysis and topic word display on the second knowledge points.

Description

Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium

[ technical field ]

The invention relates to the technical field of data processing, and in particular to a processing method for discovering new knowledge points based on a clustering algorithm, a device implementing the method, an electronic device, and a storage medium.

[ background of the invention ]

Current question-answering systems use retrieval to find the knowledge point most similar to the question posed by the user. The effectiveness of this approach depends on how well the knowledge base has been constructed, which places high cognitive demands on the maintainers of the knowledge base: they need a thorough understanding of the relevant domain knowledge.

Therefore, the prior art is deficient and needs to be improved.

[ summary of the invention ]

In order to overcome the technical problems, the invention provides a processing method and a device for discovering new knowledge points based on a clustering algorithm, electronic equipment and a storage medium.

The invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which comprises the following steps:

step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;

step S2: converting each text representation into a vector;

step S3: placing the vectors in one-to-one correspondence with the first knowledge points;

step S4: calculating the vector-space cosine similarity of all vectors;

step S5: classifying the vectors into categories;

step S6: converting the vectors of the different categories into corresponding second knowledge points;

step S7: performing topic analysis and topic word display on the second knowledge points.

Preferably, the method further comprises step S8: optimizing the knowledge base.

Preferably, in step S2, the text representation is converted by a pre-trained deep neural network model trained on text semantic similarity data.

Preferably, in step S3, the vectors are associated with the knowledge numbers and an index is built.

Preferably, according to a similarity threshold, vectors within the similarity threshold are grouped into the same category, and vectors outside the similarity threshold are placed in different categories.

Preferably, the vectors of the different categories are converted into second knowledge points according to the established index.

The present invention also provides a processing apparatus comprising:

an acquisition unit, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;

a processing unit, configured to process the first knowledge points, convert them into second knowledge points, and analyze the second knowledge points;

a display unit, configured to display the topic words of the second knowledge points.

The invention also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to execute the above processing method for new knowledge point discovery based on a clustering algorithm by means of the computer program.

The invention also provides a storage medium having stored therein a computer program arranged to perform the above-described processing method for new knowledge point discovery based on a clustering algorithm when running.

Compared with the prior art, the invention automatically clusters the knowledge points in the question-answering system into categories and thereby assists in constructing the knowledge base, which helps improve the question-answering effect. New knowledge points can also be discovered automatically, which helps reduce both the cognitive demands on the knowledge base maintainers of the question-answering system and their workload.

[ description of the drawings ]

Fig. 1 is a specific flow diagram of the processing method for discovering new knowledge points based on a clustering algorithm according to the present invention.

Fig. 2 is a block diagram of a processing apparatus according to a second embodiment of the present invention.

Fig. 3 is a block diagram of an electronic device according to a third embodiment of the present invention.

Description of reference numerals:

10. a processing device; 11. an acquisition unit; 12. a processing unit; 13. a display unit; 20. an electronic device; 21. a memory; 22. a processor.

[ detailed description ]

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, the present invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which includes the following steps:

Step S1: the text representation and knowledge number of each first knowledge point queried by users in a question-answering system are obtained.

Specifically, a first knowledge point in the question-answering system is used to answer a common question, and consists of a specific text representation and a corresponding knowledge number.

Step S2: the text representation is converted into a vector.

Specifically, in the invention, the text representation of each first knowledge point is converted into a corresponding vector by a pre-trained deep neural network model trained on text semantic similarity data. Converting the text representations into vectors makes it convenient to cluster the first knowledge points by their vectors in the subsequent steps.
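The conversion in step S2 can be illustrated with a minimal sketch. The patent does not name the pre-trained model, so the hypothetical `embed` function below is only a stand-in that hashes character bigrams into a fixed-size, length-normalized vector; it shows the interface (text in, vector out), not the semantic quality a trained model would provide. The function name and the `dim` parameter are assumptions made for illustration.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy stand-in for the pre-trained model (an assumption, not the patent's model).

    Hashes character bigrams into `dim` buckets and L2-normalizes the result.
    """
    vec = [0.0] * dim
    padded = " " + text.strip().lower() + " "
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        # hashlib gives a hash that is stable across runs, unlike built-in hash()
        digest = hashlib.md5(bigram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


vector = embed("How do I reset my password?")
print(len(vector))
```

In practice the stand-in would be replaced by whatever sentence-embedding model has been trained on text semantic similarity data; the later steps only need one fixed-length vector per text representation.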

Step S3: the vectors are placed in one-to-one correspondence with the first knowledge points.

Specifically, each vector is associated with the knowledge number of its first knowledge point, which achieves the one-to-one correspondence with the first knowledge points, and an index between the vectors and the knowledge numbers is established.
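One way to picture the index of step S3 is as a pair of lookups between a vector's row position and its knowledge number. The sketch below is an assumption about the data layout, which the patent does not specify; `build_index`, `row_to_number`, `number_to_row`, the `KB-…` numbers, and the toy embedding function are all hypothetical names used for illustration.

```python
def build_index(knowledge_points, embed_fn):
    """Embed each (knowledge_number, text) pair and record which row it occupies.

    Returns the list of vectors plus lookups in both directions.
    """
    vectors = []
    row_to_number = []   # row position -> knowledge number
    number_to_row = {}   # knowledge number -> row position
    for number, text in knowledge_points:
        if number in number_to_row:
            # one-to-one correspondence requires unique knowledge numbers
            raise ValueError("duplicate knowledge number: %s" % number)
        number_to_row[number] = len(vectors)
        vectors.append(embed_fn(text))
        row_to_number.append(number)
    return vectors, row_to_number, number_to_row


# Toy embedding used only to keep the example self-contained.
def toy_embed(text):
    return [float(len(text)), float(text.count(" "))]


points = [("KB-001", "reset password"), ("KB-002", "change email")]
vectors, row_to_number, number_to_row = build_index(points, toy_embed)
print(row_to_number, number_to_row)
```

Keeping the lookup in both directions means a cluster of row positions can later be turned back into knowledge numbers without re-embedding anything.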

Step S4: the vector-space cosine similarity of all vectors is calculated.

Specifically, the cosine of the angle between two vectors in the vector space is used as a measure of how much the two vectors differ. The closer the cosine is to 1, the closer the angle between the two vectors is to 0 degrees, that is, the smaller the difference between the two vectors; conversely, the further the cosine is from 1, the greater the difference between the two vectors.
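The cosine measure described above is the dot product of two vectors divided by the product of their lengths. The following is a minimal pure-Python sketch; the function names are hypothetical, and returning 0.0 for a zero-length vector is a convention chosen here, since the cosine is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch; cosine is undefined for zero vectors
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity of all vectors, as a list of rows."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


sim = similarity_matrix([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
print(sim[0][1], sim[0][2])
```

Vectors pointing in the same direction score 1 regardless of their lengths, and perpendicular vectors score 0, which matches the angle-based reading given in the text.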

Step S5: the vectors are classified into categories.

Specifically, a similarity threshold is set. Vectors whose mutual similarity is within the similarity threshold are grouped into the same category, and vectors whose similarity is outside the threshold are placed in different categories. The similarity threshold may be determined from the recall rate of the model on a similarity test set.
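The patent does not say which grouping procedure applies the threshold. One plausible reading, sketched below as an assumption, treats every pair whose similarity is at or above the threshold as linked and takes the connected groups as categories (single-link grouping via union-find). The function name and the example threshold of 0.8 are hypothetical.

```python
def cluster_by_threshold(sim, threshold):
    """Group row positions whose pairwise similarity reaches the threshold.

    sim: square similarity matrix (list of rows).
    Returns a list of categories, each a list of row positions.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    categories = {}
    for i in range(n):
        categories.setdefault(find(i), []).append(i)
    return list(categories.values())


sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(cluster_by_threshold(sim, 0.8))  # → [[0, 1], [2]]
```

Single-link grouping can chain loosely related items together through intermediate neighbors; a stricter variant would require every pair inside a category to meet the threshold.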

Step S6: the vectors of the different categories are converted into corresponding second knowledge points.

Specifically, the classified vectors are converted into second knowledge points according to the established index.
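Converting categories of vectors back through the index can be sketched as a simple lookup. The representation of a second knowledge point as a dictionary holding a category id and its member knowledge numbers is an assumption made for illustration, as are the function name and the `KB-…` numbers.

```python
def clusters_to_knowledge_points(clusters, row_to_number):
    """Map categories of vector row positions back to knowledge numbers.

    clusters: list of lists of row positions, one list per category.
    row_to_number: hypothetical index, row position -> knowledge number.
    """
    second_points = []
    for category_id, rows in enumerate(clusters):
        second_points.append({
            "category": category_id,
            "knowledge_numbers": [row_to_number[r] for r in rows],
        })
    return second_points


index = ["KB-001", "KB-002", "KB-003"]
print(clusters_to_knowledge_points([[0, 1], [2]], index))
```

Each resulting entry lists the first knowledge points that were judged similar enough to belong together, which is the unit that the later topic analysis works on.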

Step S7: topic analysis and topic word display are performed on the second knowledge points.
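The patent does not describe how the topic analysis of step S7 is carried out. As one simple possibility, sketched here purely as an assumption, the topic words of a category can be taken as the most frequent non-stopword terms across the texts in that category. The stopword list, the function name, and the whitespace-style tokenization are hypothetical; Chinese text would need a word segmenter in place of the regular expression.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "to", "my", "i", "how", "do", "is", "of", "for", "by"}


def topic_words(texts, top_k=3):
    """Return the top_k most frequent non-stopword tokens across texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


category_texts = [
    "How do I reset my password",
    "Reset password by email",
    "Forgot password",
]
print(topic_words(category_texts, top_k=2))  # → ['password', 'reset']
```

Displaying a few such words per category gives a maintainer a quick label for what each group of knowledge points is about.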

Step S8: the knowledge base is optimized.

Specifically, the knowledge base may be optimized by a maintainer of the knowledge base according to the second knowledge points.

Referring to fig. 2, a second embodiment of the present invention provides a processing device 10 that employs the above method. The processing device 10 includes:

an acquisition unit 11, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;

a processing unit 12, configured to process, convert, and analyze the first knowledge points, specifically by performing steps S2-S7 of the method, including the topic analysis of the second knowledge points;

a display unit 13, configured to display the topic words of the second knowledge points.

Referring to fig. 3, a third embodiment of the invention provides an electronic device 20, and the electronic device 20 is configured to implement the processing method for discovering a new knowledge point based on a clustering algorithm. The electronic device 20 comprises a memory 21 and a processor 22.

In particular, the memory 21 has stored therein a computer program, and the processor 22 is arranged to execute the processing method of new knowledge point discovery based on a clustering algorithm as described above by means of the computer program.

The memory 21 may be used for storing software programs and modules, such as program instructions or modules corresponding to the processing method and apparatus for new knowledge point discovery based on clustering algorithm described above. The processor 22 executes various functional applications and data processing by running software programs and modules stored in the memory 21, namely, the processing method for finding new knowledge points based on the clustering algorithm is realized.

The fourth embodiment of the present invention also provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the method embodiments described above when executed.

It is understood that, in this embodiment, all or part of the steps of the method of the above embodiments may be implemented by a program that instructs the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include, for example, a floppy disk, an optical disk, a DVD, a hard disk, a flash memory, a USB flash drive, a CF card, an SD card, an MMC card, an SM card, a Memory Stick, an xD card, and the like.

In this embodiment, a computer software product is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computer devices, servers, or other network devices) to perform all or part of the steps of the methods described in the embodiments of the present invention.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit of the present invention should be included in the scope of the present invention.

Claims

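1. A processing method for discovering new knowledge points based on a clustering algorithm, characterized in that the method comprises the following steps:

step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;

step S2: converting each text representation into a vector;

step S3: placing the vectors in one-to-one correspondence with the first knowledge points;

step S4: calculating the vector-space cosine similarity of all vectors;

step S5: classifying the vectors into categories;

step S6: converting the vectors of the different categories into corresponding second knowledge points;

step S7: performing topic analysis and topic word display on the second knowledge points.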

2. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that the method further comprises the following step:

step S8: optimizing the knowledge base.

3. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that: in step S2, the text representation is converted by a pre-trained deep neural network model trained on text semantic similarity data.

4. The processing method of new knowledge point discovery based on clustering algorithm as claimed in claim 1, characterized in that: in step S3, the vector is associated with a knowledge number and an index is built.

5. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that: according to a similarity threshold, vectors within the similarity threshold are grouped into the same category, and vectors outside the similarity threshold are placed in different categories.

6. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 4, characterized in that: the vectors of the different categories are converted into second knowledge points according to the established index.

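7. A processing apparatus, characterized by comprising:

an acquisition unit, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;

a processing unit, configured to process the first knowledge points, convert them into second knowledge points, and analyze the second knowledge points;

a display unit, configured to display the topic words of the second knowledge points.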

8. An electronic device comprising a memory and a processor, characterized in that: the memory has stored therein a computer program, and the processor is arranged to execute the processing method of new knowledge point discovery based on clustering algorithm according to any one of claims 1 to 6 by the computer program.

9. A storage medium, characterized by: the storage medium has stored therein a computer program arranged to execute, when running, the processing method of new knowledge point discovery based on clustering algorithm according to any one of claims 1 to 6.