CN112463943A - Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium - Google Patents
- Publication number
- CN112463943A (application number CN202011446866.7A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- knowledge points
- vectors
- clustering algorithm
- processing method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of data processing, and in particular to a processing method and device for discovering new knowledge points based on a clustering algorithm, an electronic device, and a storage medium. The method comprises the following steps: step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system; step S2: converting each text representation into a vector; step S3: placing the vectors in one-to-one correspondence with the first knowledge points; step S4: calculating the vector-space cosine similarity of all vectors; step S5: grouping the vectors into categories; step S6: converting the vectors of the different categories into corresponding second knowledge points; step S7: performing topic analysis on the second knowledge points and displaying their topic words.
Description
[ technical field ]
The invention relates to the technical field of data processing, and in particular to a processing method for discovering new knowledge points based on a clustering algorithm, a device therefor, an electronic device, and a storage medium.
[ background of the invention ]
Current question-answering systems use retrieval to find the knowledge point most similar to the question posed by the user. How well this works depends on the construction quality of the knowledge base, which places high cognitive demands on its maintainers: they need a thorough understanding of the relevant domain knowledge.
The prior art is therefore inadequate and needs to be improved.
[ summary of the invention ]
To overcome the above technical problems, the invention provides a processing method and device for discovering new knowledge points based on a clustering algorithm, an electronic device, and a storage medium.
The invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which comprises the following steps:
step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;
step S2: converting each text representation into a vector;
step S3: placing the vectors in one-to-one correspondence with the first knowledge points;
step S4: calculating the vector-space cosine similarity of all vectors;
step S5: grouping the vectors into categories;
step S6: converting the vectors of the different categories into corresponding second knowledge points;
step S7: performing topic analysis on the second knowledge points and displaying their topic words.
Preferably, the method further comprises step S8: optimizing the knowledge base.
Preferably, in step S2, the text representation is converted by a deep neural network pre-trained model trained on text semantic similarity data.
Preferably, in step S3, each vector is associated with its knowledge number and an index is built.
Preferably, according to a similarity threshold, vectors within the similarity threshold are grouped into the same category, and vectors outside the similarity threshold are placed in different categories.
Preferably, the vectors of the different categories are converted into second knowledge points according to the established index.
The present invention also provides a processing apparatus comprising:
an acquisition unit, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;
a processing unit, configured to process the first knowledge points, convert them into second knowledge points, and analyze the second knowledge points;
a display unit, configured to display the topic words of the second knowledge points.
The invention also provides an electronic device comprising a memory in which a computer program is stored and a processor arranged to execute the above processing method for new knowledge point discovery based on a clustering algorithm by means of the computer program.
The invention also provides a storage medium having stored therein a computer program arranged to perform the above-described processing method for new knowledge point discovery based on a clustering algorithm when running.
Compared with the prior art, the invention automatically clusters the knowledge points in the question-answering system into categories and thereby assists in constructing the knowledge base. This helps improve the question-answering effect, allows new knowledge points to be discovered automatically, and helps reduce both the cognitive demands on the knowledge base maintainers of the question-answering system and their workload.
[ description of the drawings ]
Fig. 1 is a specific flow diagram of the processing method for discovering new knowledge points based on a clustering algorithm according to the present invention.
Fig. 2 is a block diagram of a processing apparatus according to a second embodiment of the present invention.
Fig. 3 is a block diagram of an electronic device according to a third embodiment of the present invention.
Description of reference numerals:
10. a processing device; 11. an acquisition unit; 12. a processing unit; 13. a display unit; 20. an electronic device; 21. a memory; 22. a processor.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which includes the following steps:
Step S1: acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system.
Specifically, the first knowledge points of the question-answering system are used to answer common questions; each consists of a specific text representation and a corresponding knowledge number.
Step S2: convert the text representation into a vector.
Specifically, in the invention the text representation of each first knowledge point is converted into a vector by a deep neural network pre-trained model trained on text semantic similarity data. Converting the text representations into vectors makes it convenient to cluster the first knowledge points by their vectors in the subsequent steps.
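The patent does not name a specific model for step S2. As a rough illustration of the interface only, the sketch below uses a toy stand-in that hashes character bigrams into a fixed-size, L2-normalised vector. This is not a semantic embedding; the function name `embed` and the dimension of 64 are assumptions made for the example, and in practice the body would be replaced by a call to the pre-trained model.

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for the pre-trained model (hypothetical helper):
    hash character bigrams into `dim` buckets, then L2-normalise."""
    vec = [0.0] * dim
    padded = f" {text.lower().strip()} "
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        # hashlib is stable across runs, unlike the built-in hash() for strings.
        bucket = int(hashlib.md5(bigram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# Made-up example texts; each text representation yields one vector.
texts = ["How do I reset my password?", "How can I reset the password?"]
vectors = [embed(t) for t in texts]
```

Whatever model is used, the property that matters for the later steps is that each text maps to exactly one fixed-length vector.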
Step S3: place the vectors in one-to-one correspondence with the first knowledge points.
Specifically, each vector is associated with the knowledge number of its first knowledge point, which puts the vectors in one-to-one correspondence with the first knowledge points, and an index between vectors and knowledge numbers is built.
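The index of step S3 can be pictured as a two-way mapping between vector positions and knowledge numbers. The following is a minimal sketch under the assumption that vectors are kept in a list and knowledge numbers are unique strings; `KnowledgePoint` and `build_index` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgePoint:
    number: str  # knowledge number from the question-answering system
    text: str    # text representation


def build_index(points, vectors):
    """Return (row_to_number, number_to_row) so that row i of `vectors`
    corresponds to exactly one knowledge number, and vice versa."""
    if len(points) != len(vectors):
        raise ValueError("each knowledge point needs exactly one vector")
    row_to_number = [p.number for p in points]
    number_to_row = {number: row for row, number in enumerate(row_to_number)}
    if len(number_to_row) != len(row_to_number):
        raise ValueError("knowledge numbers must be unique")
    return row_to_number, number_to_row


# Made-up example data; the vectors are placeholders for the output of step S2.
points = [KnowledgePoint("K001", "reset password"), KnowledgePoint("K002", "change email")]
vectors = [[0.1, 0.9], [0.8, 0.2]]
row_to_number, number_to_row = build_index(points, vectors)
```

Keeping both directions makes it cheap to go from a clustered vector back to its knowledge number later on.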
Step S4: calculate the vector-space cosine similarity of all vectors.
Specifically, the cosine of the angle between two vectors in the vector space is used as a measure of how much the two vectors differ: the closer the cosine is to 1, the closer the angle between the two vectors is to 0 degrees and the smaller the difference between them; conversely, the further the cosine is from 1, the greater the difference between the two vectors.
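For two vectors a and b, the cosine of the included angle is the dot product divided by the product of the vector lengths. A minimal pure-Python sketch of the pairwise computation is shown below; the function names are illustrative, and returning 0.0 for an all-zero vector is an assumption made here, since the patent does not address that case.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity of all vectors (symmetric n x n matrix)."""
    n = len(vectors)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sims[i][j] = sims[j][i] = cosine_similarity(vectors[i], vectors[j])
    return sims


# Made-up vectors: the first two point in the same direction, the third is orthogonal.
vectors = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sims = similarity_matrix(vectors)
```

In this example the first two vectors have similarity 1 even though their lengths differ, which reflects that cosine similarity depends only on direction.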
Step S5: group the vectors into categories.
Further, a similarity threshold is set: vectors within the similarity threshold are grouped into the same category, and vectors outside the similarity threshold fall into different categories. The similarity threshold may be determined from the recall rate of the model on a similarity test set.
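The patent does not state which clustering algorithm applies the threshold. One simple reading, sketched below as an assumption, links every pair of vectors whose cosine similarity is at or above the threshold and takes the connected components as categories (single-linkage clustering cut at a fixed level). The function name, the example matrix, and the threshold of 0.85 are all made up for illustration.

```python
def cluster_by_threshold(sims, threshold):
    """Group row indices so that any pair with similarity >= threshold ends up
    in the same cluster (connected components via union-find)."""
    n = len(sims)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sims[i][j] >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Hypothetical similarity matrix for four vectors (values are invented).
sims = [
    [1.00, 0.92, 0.10, 0.05],
    [0.92, 1.00, 0.15, 0.08],
    [0.10, 0.15, 1.00, 0.88],
    [0.05, 0.08, 0.88, 1.00],
]
clusters = cluster_by_threshold(sims, threshold=0.85)
```

Connected components can chain loosely related vectors together through intermediate neighbours, so a stricter variant (for example, requiring every pair inside a category to pass the threshold) may be preferable depending on the data.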
Step S6: convert the vectors of the different categories into corresponding second knowledge points.
The classified vectors are converted into second knowledge points according to the established index.
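The patent does not spell out what a second knowledge point looks like as data. The sketch below assumes that each category becomes one candidate second knowledge point, represented by the knowledge numbers and texts of its members, which are looked up through the index from step S3. All names and sample values are hypothetical.

```python
def clusters_to_knowledge_points(clusters, row_to_number, texts_by_number):
    """Translate clusters of vector row indices back into knowledge numbers and texts."""
    result = []
    for rows in clusters:
        members = [row_to_number[r] for r in rows]
        result.append({
            "knowledge_numbers": members,
            "texts": [texts_by_number[m] for m in members],
        })
    return result


# Made-up inputs standing in for the outputs of steps S3 and S5.
row_to_number = ["K001", "K002", "K003"]
texts_by_number = {
    "K001": "reset password",
    "K002": "forgot password",
    "K003": "change email",
}
clusters = [[0, 1], [2]]
second_points = clusters_to_knowledge_points(clusters, row_to_number, texts_by_number)
```

A category with several members suggests knowledge points that users treat as one topic, which is the kind of grouping a maintainer could review.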
Step S7: perform topic analysis on the second knowledge points and display their topic words.
Step S8: optimize the knowledge base.
Specifically, the knowledge base may be optimized by its maintainers according to the second knowledge points.
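The patent does not specify how the topic analysis of step S7 is performed. As a deliberately simple stand-in (plain term frequency, not a topic model), the sketch below picks the most frequent non-stopword tokens of a category as its topic words. The stopword list, the function name, and the English sample texts are assumptions; Chinese text would additionally need a word segmenter.

```python
import re
from collections import Counter

# Illustrative stopword list only.
STOPWORDS = {"a", "an", "the", "i", "my", "to", "do", "how", "can", "is", "of"}


def topic_words(texts, top_k=3):
    """Return the top_k most frequent non-stopword tokens across a category's texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    # Sort by count (descending), then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


# Made-up texts belonging to one second knowledge point.
cluster_texts = [
    "How do I reset my password",
    "I forgot my password",
    "Password reset link expired",
]
words = topic_words(cluster_texts)
```

The resulting word list is what a display step could show to the maintainer next to each second knowledge point.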
Referring to fig. 2, a second embodiment of the present invention provides a processing device 10 that employs the above method. The processing device 10 includes an acquisition unit 11, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;
a processing unit 12, configured to process, convert, and analyze the first knowledge points, specifically by carrying out the processing of steps S2-S7 of the method, up to the topic analysis of the second knowledge points;
and a display unit 13, configured to display the topic words of the second knowledge points.
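The division into three units can be sketched as three pluggable callables wired together in sequence. This is only an illustration of the structure described above; the class name, the type signatures, and the stub implementations are assumptions and not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Points = Sequence[tuple[str, str]]   # (knowledge number, text representation)
TopicWords = list[list[str]]         # topic words per second knowledge point


@dataclass
class ProcessingDevice:
    acquire: Callable[[], Points]             # acquisition unit 11
    process: Callable[[Points], TopicWords]   # processing unit 12 (steps S2-S7)
    display: Callable[[TopicWords], None]     # display unit 13

    def run(self) -> None:
        self.display(self.process(self.acquire()))


# Stub units with made-up behaviour, used only to show the data flow.
shown = []
device = ProcessingDevice(
    acquire=lambda: [("K001", "reset password")],
    process=lambda points: [[text.split()[0] for _, text in points]],
    display=shown.append,
)
device.run()
```

Keeping the units as separate callables means each one can be swapped or tested on its own.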
Referring to fig. 3, a third embodiment of the invention provides an electronic device 20, and the electronic device 20 is configured to implement the processing method for discovering a new knowledge point based on a clustering algorithm. The electronic device 20 comprises a memory 21 and a processor 22.
In particular, the memory 21 has stored therein a computer program, and the processor 22 is arranged to execute the processing method of new knowledge point discovery based on a clustering algorithm as described above by means of the computer program.
The memory 21 may be used for storing software programs and modules, such as program instructions or modules corresponding to the processing method and apparatus for new knowledge point discovery based on clustering algorithm described above. The processor 22 executes various functional applications and data processing by running software programs and modules stored in the memory 21, namely, the processing method for finding new knowledge points based on the clustering algorithm is realized.
A fourth embodiment of the present invention also provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the method embodiments described above when executed.
It is understood that, in this embodiment, all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware of a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include, for example, a floppy disk, an optical disk, a DVD, a hard disk, a flash memory, a USB flash drive, a CF card, an SD card, an MMC card, an SM card, a Memory Stick, an xD card, and the like.
In this embodiment, a computer software product is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computer devices, servers, or other network devices) to perform all or part of the steps of the methods described in the embodiments of the present invention.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Compared with the prior art, the invention automatically clusters the knowledge points in the question-answering system into categories and thereby assists in constructing the knowledge base. This helps improve the question-answering effect, allows new knowledge points to be discovered automatically, and helps reduce both the cognitive demands on the knowledge base maintainers of the question-answering system and their workload.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; any modifications, equivalents, and improvements made within the spirit of the present invention shall be included in the scope of the present invention.
Claims (9)
1. A processing method for discovering new knowledge points based on a clustering algorithm, characterized in that the method comprises the following steps:
step S1: acquiring the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;
step S2: converting each text representation into a vector;
step S3: placing the vectors in one-to-one correspondence with the first knowledge points;
step S4: calculating the vector-space cosine similarity of all vectors;
step S5: grouping the vectors into categories;
step S6: converting the vectors of the different categories into corresponding second knowledge points;
step S7: performing topic analysis on the second knowledge points and displaying their topic words.
2. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that the method further comprises the following step:
step S8: optimizing the knowledge base.
3. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that: in step S2, the text representation is converted by a deep neural network pre-trained model trained on text semantic similarity data.
4. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that: in step S3, each vector is associated with its knowledge number and an index is built.
5. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 1, characterized in that: according to a similarity threshold, vectors within the similarity threshold are grouped into the same category, and vectors outside the similarity threshold are placed in different categories.
6. The processing method for discovering new knowledge points based on a clustering algorithm as claimed in claim 4, characterized in that: the vectors of the different categories are converted into second knowledge points according to the established index.
7. A processing apparatus, characterized by comprising:
an acquisition unit, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in a question-answering system;
a processing unit, configured to process the first knowledge points, convert them into second knowledge points, and analyze the second knowledge points;
a display unit, configured to display the topic words of the second knowledge points.
8. An electronic device comprising a memory and a processor, characterized in that: the memory has stored therein a computer program, and the processor is arranged to execute the processing method of new knowledge point discovery based on clustering algorithm according to any one of claims 1 to 6 by the computer program.
9. A storage medium, characterized by: the storage medium has stored therein a computer program arranged to execute, when running, the processing method of new knowledge point discovery based on clustering algorithm according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011446866.7A CN112463943A (en) | 2020-12-11 | 2020-12-11 | Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011446866.7A CN112463943A (en) | 2020-12-11 | 2020-12-11 | Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112463943A true CN112463943A (en) | 2021-03-09 |
Family
ID=74801449
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011446866.7A Pending CN112463943A (en) | 2020-12-11 | 2020-12-11 | Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112463943A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170330082A1 (en) * | 2016-05-13 | 2017-11-16 | Cognitive Scale, Inc. | Lossless Parsing When Storing Knowledge Elements Within a Universal Cognitive Graph |
CN105955965A (en) * | 2016-06-21 | 2016-09-21 | 上海智臻智能网络科技股份有限公司 | Question information processing method and device |
CN108256056A (en) * | 2018-01-12 | 2018-07-06 | 广州杰赛科技股份有限公司 | Intelligent answer method and system |
CN110309377A (en) * | 2018-03-22 | 2019-10-08 | 阿里巴巴集团控股有限公司 | Semanteme normalization puts question to generation, the response of mode to determine method and device |
CN110598002A (en) * | 2019-08-14 | 2019-12-20 | 广州视源电子科技股份有限公司 | Knowledge graph library construction method and device, computer storage medium and electronic equipment |
CN111858891A (en) * | 2020-07-23 | 2020-10-30 | 平安科技(深圳)有限公司 | Question-answer library construction method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
赵国庆: "《计算机应用技术实训教程》", 31 August 2016, 北京理工大学出版社 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11062090B2 (en) | Method and apparatus for mining general text content, server, and storage medium | |
US20180075368A1 (en) | System and Method of Advising Human Verification of Often-Confused Class Predictions | |
CN109189767B (en) | Data processing method and device, electronic equipment and storage medium | |
CN111460250B (en) | Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus | |
US20160092448A1 (en) | Method For Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon | |
CN110059172B (en) | Method and device for recommending answers based on natural language understanding | |
CN110245232B (en) | Text classification method, device, medium and computing equipment | |
CN113051911B (en) | Method, apparatus, device, medium and program product for extracting sensitive words | |
CN116186594B (en) | Method for realizing intelligent detection of environment change trend based on decision network combined with big data | |
CN113704389A (en) | Data evaluation method and device, computer equipment and storage medium | |
CN113486664A (en) | Text data visualization analysis method, device, equipment and storage medium | |
CN112800919A (en) | Method, device and equipment for detecting target type video and storage medium | |
CN116089873A (en) | Model training method, data classification and classification method, device, equipment and medium | |
CN111063446B (en) | Method, apparatus, device and storage medium for standardizing medical text data | |
CN109408175B (en) | Real-time interaction method and system in general high-performance deep learning calculation engine | |
CN113746780A (en) | Abnormal host detection method, device, medium and equipment based on host image | |
US11625630B2 (en) | Identifying intent in dialog data through variant assessment | |
CN113392920B (en) | Method, apparatus, device, medium, and program product for generating cheating prediction model | |
CN113033707B (en) | Video classification method and device, readable medium and electronic equipment | |
CN114169398A (en) | Photovoltaic direct-current arc fault identification method and device based on random forest algorithm | |
CN114841471B (en) | Knowledge point prediction method and device, electronic equipment and storage medium | |
CN116127066A (en) | Text clustering method, text clustering device, electronic equipment and storage medium | |
CN114201607B (en) | Information processing method and device | |
CN112463943A (en) | Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium | |
CN114020882A (en) | Method and device for determining engineering machinery fault solution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210309 |