CN112463943A - Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112463943A
Authority
CN
China
Prior art keywords
knowledge
knowledge points
vectors
clustering algorithm
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011446866.7A
Other languages
Chinese (zh)
Inventor
周柳阳
侯克鑫
蒋林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yihao Hulian Technology Co ltd
Original Assignee
Shenzhen Yihao Hulian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-11
Filing date
2020-12-11
Publication date
2021-03-09
Application filed by Shenzhen Yihao Hulian Technology Co ltd
Priority to CN202011446866.7A
Publication of CN112463943A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/338 Presentation of query results
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a processing method and device for discovering new knowledge points based on a clustering algorithm, an electronic device and a storage medium. The method comprises the following steps: step S1: acquiring the text representations and knowledge numbers of first knowledge points queried by users in a question-answering system; step S2: converting each text representation into a vector; step S3: associating the vectors one-to-one with the first knowledge points; step S4: calculating the vector-space cosine similarity between all vectors; step S5: grouping the vectors into classes; step S6: converting the vectors of each class into second knowledge points; step S7: performing topic analysis on the second knowledge points and displaying their topic words.

Description

Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium
[Technical field]
The invention relates to the technical field of data processing, and in particular to a processing method for discovering new knowledge points based on a clustering algorithm, a device thereof, an electronic device and a storage medium.
[Background of the invention]
Current question-answering systems retrieve the knowledge point most similar to the question asked by the user. How well this works depends on the quality of the knowledge base, which places high cognitive demands on its maintainers: they need a thorough understanding of the relevant domain knowledge.
The prior art is therefore deficient and needs improvement.
[Summary of the invention]
To overcome these technical problems, the invention provides a processing method and device for discovering new knowledge points based on a clustering algorithm, an electronic device and a storage medium.
The invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which comprises the following steps:
step S1: acquiring the text representations and knowledge numbers of first knowledge points queried by users in a question-answering system;
step S2: converting each text representation into a vector;
step S3: associating the vectors one-to-one with the first knowledge points;
step S4: calculating the vector-space cosine similarity between all vectors;
step S5: grouping the vectors into classes;
step S6: converting the vectors of each class into second knowledge points;
step S7: performing topic analysis on the second knowledge points and displaying their topic words.
Preferably, the method further comprises step S8: optimizing the knowledge base.
Preferably, in step S2, the text representation is converted by a pre-trained deep neural network model trained on text semantic-similarity data.
Preferably, in step S3, the vectors are associated with the knowledge numbers and an index is built.
Preferably, according to a similarity threshold, vectors within the threshold are grouped into the same class and vectors outside it are placed in different classes.
Preferably, the vectors of the different classes are converted into second knowledge points according to the established index.
The present invention also provides a processing apparatus, comprising:
an acquisition unit, configured to acquire the text representations and knowledge numbers of first knowledge points queried by users in a question-answering system;
a processing unit, configured to process the first knowledge points, convert them into second knowledge points and analyze the second knowledge points; and
a display unit, configured to display the topic words of the second knowledge points.
The invention also provides an electronic device comprising a memory storing a computer program and a processor configured to execute, by means of the computer program, the above processing method for discovering new knowledge points based on a clustering algorithm.
The invention also provides a storage medium storing a computer program configured to perform, when run, the above processing method for discovering new knowledge points based on a clustering algorithm.
Compared with the prior art, the invention automatically clusters the knowledge points of a question-answering system into classes, which assists in constructing the knowledge base and helps improve question-answering quality. New knowledge points can be discovered automatically, which lowers both the cognitive demands on the knowledge base maintainers and their workload.
[Description of the drawings]
Fig. 1 is a flowchart of the processing method for discovering new knowledge points based on a clustering algorithm according to the present invention.
Fig. 2 is a block diagram of a processing apparatus according to a second embodiment of the present invention.
Fig. 3 is a block diagram of an electronic device according to a third embodiment of the present invention.
Description of reference numerals:
10. a processing device; 11. an acquisition unit; 12. a processing unit; 13. a display unit; 20. an electronic device; 21. a memory; 22. a processor.
[Detailed description of embodiments]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a processing method for discovering new knowledge points based on a clustering algorithm, which includes the following steps:
Step S1: acquire the text representation and knowledge number of each first knowledge point queried by users in the question-answering system.
Specifically, the first knowledge points of the question-answering system are used to answer frequently asked questions; each consists of a specific text representation and a corresponding knowledge number.
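As an illustration only, a first knowledge point as described in step S1 can be modelled as a small immutable record. In this Python sketch the class name `KnowledgePoint`, its field names, the loader `load_knowledge_points` and the `KP-0001` numbering format are all hypothetical; the patent only states that a knowledge point has a text representation and a knowledge number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgePoint:
    """A first knowledge point: a knowledge number plus its text representation."""
    number: str  # hypothetical format, e.g. "KP-0001"
    text: str    # the question text users query against


def load_knowledge_points(rows):
    """Build KnowledgePoint records from (number, text) pairs, skipping blank texts."""
    return [KnowledgePoint(num, text.strip()) for num, text in rows if text.strip()]


points = load_knowledge_points([
    ("KP-0001", "How do I reset my password?"),
    ("KP-0002", "I forgot my password, what should I do?"),
    ("KP-0003", "   "),  # blank text is dropped
])
```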
Step S2: convert the text representation into a vector.
Specifically, the text representation of each first knowledge point is converted into a vector by a pre-trained deep neural network model trained on text semantic-similarity data. Representing the text as a vector allows the first knowledge points to be clustered on those vectors in the subsequent steps.
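The patent does not name the pre-trained encoder used in step S2. So that the sketch below runs with the Python standard library alone, it substitutes a toy encoder (hashed character bigrams, L2-normalised). This is an assumption for illustration: it captures surface overlap, not semantic similarity. In a real system a sentence-embedding model trained on semantic-similarity data would take its place, for example through the `encode` method of the sentence-transformers library's `SentenceTransformer` class. The names `encode` and `DIM` are hypothetical.

```python
import math
import zlib

DIM = 256  # hypothetical vector dimension for the toy encoder


def encode(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for the pre-trained encoder: hashed character bigrams, L2-normalised.

    Character bigrams need no word segmentation, so this also runs on Chinese text.
    zlib.crc32 is used instead of hash() because str hashing is randomised per process.
    """
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 1):
        bucket = zlib.crc32(s[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


v = encode("How do I reset my password?")
```

Normalising to unit length means the cosine similarity of step S4 reduces to a plain dot product for these vectors.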
Step S3: associate the vectors one-to-one with the first knowledge points.
Specifically, each vector is associated with the knowledge number of its first knowledge point, giving a one-to-one correspondence, and an index between the vectors and the knowledge numbers is built.
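One simple way to realise the index of step S3, sketched here as an assumption, is to keep the vectors as rows of a matrix and map each row position to its knowledge number. The function `build_index` and its return layout are hypothetical; the patent does not specify the index structure.

```python
def build_index(numbers, vectors):
    """Pair each vector row with its knowledge number (step S3).

    Returns (matrix, row_to_number): row i of `matrix` is the vector of the
    knowledge point whose number is row_to_number[i].
    """
    if len(numbers) != len(vectors):
        raise ValueError("each knowledge number needs exactly one vector")
    if len(set(numbers)) != len(numbers):
        raise ValueError("knowledge numbers must be unique")
    matrix = [list(v) for v in vectors]
    row_to_number = dict(enumerate(numbers))
    return matrix, row_to_number


matrix, row_to_number = build_index(["KP-0001", "KP-0002"], [[1.0, 0.0], [0.0, 1.0]])
```

Keeping the mapping separate from the vectors lets the clustering steps work on plain row indices, with knowledge numbers recovered only at step S6.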
Step S4: calculate the vector-space cosine similarity between all vectors.
Specifically, the cosine of the angle between two vectors in the vector space is used to measure how different they are: the closer the cosine is to 1, the closer the angle is to 0 degrees and the smaller the difference between the two vectors; the smaller the cosine, the greater the difference.
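The measure in step S4 is the standard cosine similarity, cos θ = (a · b) / (‖a‖ ‖b‖). The Python sketch below computes it for one pair and then for all pairs, which takes O(n²) comparisons for n knowledge points. The function names are illustrative, and treating an all-zero vector as having similarity 0 is an assumption the patent does not address.

```python
import math


def cosine(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarity of all vectors (step S4); the matrix is symmetric."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


print(cosine([1.0, 0.0], [2.0, 0.0]))  # 1.0: same direction, angle 0 degrees
print(cosine([1.0, 0.0], [0.0, 3.0]))  # 0.0: orthogonal vectors
```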
Step S5: group the vectors into classes.
Further, a similarity threshold is set: vectors whose similarity is within the threshold are gathered into the same class, and vectors outside the threshold fall into different classes. The similarity threshold may be determined from the recall rate of the model on a similarity test set.
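The patent does not specify how vectors "within the similarity threshold" are grouped. One plausible reading, sketched below in Python as an assumption, is single-linkage grouping: link every pair whose similarity is at or above the threshold and take the connected components (found here with union-find). The hypothetical helper `pick_threshold` shows one way to derive the threshold from recall on a labelled similarity test set: it returns the highest threshold that still recalls at least the target fraction of truly similar pairs.

```python
import math


def threshold_clusters(sim, threshold):
    """Group row indices whose pairwise similarity is >= threshold (step S5).

    Assumed reading: link every pair at or above the threshold and return the
    connected components (single-linkage). Isolated rows become singleton classes.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Members are appended in index order; sort classes by their first member.
    return sorted(groups.values(), key=lambda g: g[0])


def pick_threshold(scored_pairs, target_recall):
    """Hypothetical helper: highest threshold whose recall on labelled pairs >= target.

    scored_pairs: (similarity, is_similar) tuples from a labelled test set.
    Recall at threshold t = share of truly similar pairs scoring >= t.
    """
    positives = sorted((s for s, same in scored_pairs if same), reverse=True)
    if not positives:
        raise ValueError("the test set contains no similar pairs")
    needed = min(len(positives), max(1, math.ceil(target_recall * len(positives))))
    return positives[needed - 1]


sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
print(threshold_clusters(sim, 0.8))  # [[0, 1], [2]]
```

Single-linkage can chain: if item 0 is similar to item 1 and item 1 to item 2, all three land in one class even when 0 and 2 are dissimilar. Where that is undesirable, complete-linkage or average-linkage clustering (for instance scikit-learn's `AgglomerativeClustering` with a `distance_threshold`) would be a stricter alternative.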
Step S6: convert the vectors of each class into second knowledge points.
The clustered vectors are converted into second knowledge points according to the established index.
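Continuing the assumptions of the earlier sketches (classes as lists of row indices, an index mapping rows to knowledge numbers), step S6 amounts to a lookup. The function name and the dictionary field names below are hypothetical; the patent does not define the format of a second knowledge point.

```python
def clusters_to_knowledge_points(clusters, row_to_number, texts_by_number):
    """Map each class of row indices back to knowledge numbers and texts (step S6).

    Each returned dict is one candidate second knowledge point; the field
    names are illustrative, not taken from the patent.
    """
    result = []
    for cid, rows in enumerate(clusters):
        numbers = [row_to_number[r] for r in rows]
        result.append({
            "cluster_id": cid,
            "knowledge_numbers": numbers,
            "texts": [texts_by_number[n] for n in numbers],
        })
    return result


second_points = clusters_to_knowledge_points(
    [[0, 1], [2]],
    {0: "KP-1", 1: "KP-2", 2: "KP-3"},
    {"KP-1": "reset password", "KP-2": "forgot password", "KP-3": "track order"},
)
```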
Step S7: perform topic analysis on the second knowledge points and display their topic words.
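The patent does not say which topic-analysis technique step S7 uses. As one assumed option, the Python sketch below scores each word by cluster-level TF-IDF (term frequency within a class, weighted by a smoothed inverse class frequency) and keeps the top k as topic words, so words common to every class are down-weighted. The regex tokenizer and the `STOPWORDS` set are toy, English-only stand-ins; Chinese text would first need word segmentation (for example with the jieba library).

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "i", "my", "do", "how", "what", "to", "is", "should"}  # toy list


def tokenize(text):
    """English-only toy tokenizer: lowercase alphanumeric runs minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def topic_words(clusters_texts, k=3):
    """Top-k words per class by cluster-level TF-IDF (one assumed way to do step S7)."""
    counts = [Counter(t for text in texts for t in tokenize(text)) for texts in clusters_texts]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # number of classes containing each term
    tops = []
    for c in counts:
        scored = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        tops.append([t for t, _ in ranked[:k]])
    return tops


print(topic_words([
    ["How do I reset my password?", "I forgot my password, what should I do?"],
    ["Where is my order?", "Track my order status"],
]))  # [['password', 'forgot', 'reset'], ['order', 'status', 'track']]
```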
Step S8: optimize the knowledge base.
Specifically, the knowledge base may be optimized by its maintainers on the basis of the second knowledge points.
Referring to Fig. 2, a second embodiment of the present invention provides a processing device 10 that uses the above method. The processing device 10 includes an acquisition unit 11, configured to acquire the text representation and knowledge number of each first knowledge point queried by users in the question-answering system;
a processing unit 12, configured to process, convert and analyze the first knowledge points, specifically by performing steps S2 to S7 of the method, up to and including the topic analysis of the second knowledge points; and
a display unit 13, configured to display the topic words of the second knowledge points.
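The division of the processing device 10 into units can be sketched as a small Python class that receives each stage as a callable, so the step S2 to S7 implementations can be swapped without changing the device. The class name, the field names `fetch`, `group` and `label`, and the output format of `display` are all hypothetical; the toy callables in the usage example stand in for real encoding, clustering and topic analysis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProcessingDevice:
    """Hypothetical sketch of processing device 10; names are not from the patent."""
    fetch: Callable[[], list]        # acquisition unit 11: returns (number, text) pairs
    group: Callable[[list], list]    # processing unit 12, steps S2-S6: pairs -> classes of pairs
    label: Callable[[list], list]    # processing unit 12, step S7: class -> topic words

    def run(self):
        points = self.fetch()
        clusters = self.group(points)
        return [(cluster, self.label(cluster)) for cluster in clusters]

    def display(self, results):
        """Display unit 13: one line per second knowledge point."""
        lines = []
        for i, (cluster, words) in enumerate(results):
            numbers = ", ".join(num for num, _ in cluster)
            lines.append(f"cluster {i}: [{numbers}] topic words: {', '.join(words)}")
        return "\n".join(lines)


device = ProcessingDevice(
    fetch=lambda: [("KP-1", "reset password"), ("KP-2", "forgot password"), ("KP-3", "track order")],
    # Toy grouping and labelling, standing in for steps S2-S7.
    group=lambda pts: [[p for p in pts if "password" in p[1]], [p for p in pts if "password" not in p[1]]],
    label=lambda cluster: sorted({w for _, t in cluster for w in t.split()})[:2],
)
print(device.display(device.run()))
# cluster 0: [KP-1, KP-2] topic words: forgot, password
# cluster 1: [KP-3] topic words: order, track
```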
Referring to fig. 3, a third embodiment of the invention provides an electronic device 20, and the electronic device 20 is configured to implement the processing method for discovering a new knowledge point based on a clustering algorithm. The electronic device 20 comprises a memory 21 and a processor 22.
In particular, the memory 21 has stored therein a computer program, and the processor 22 is arranged to execute the processing method of new knowledge point discovery based on a clustering algorithm as described above by means of the computer program.
The memory 21 may be used for storing software programs and modules, such as program instructions or modules corresponding to the processing method and apparatus for new knowledge point discovery based on clustering algorithm described above. The processor 22 executes various functional applications and data processing by running software programs and modules stored in the memory 21, namely, the processing method for finding new knowledge points based on the clustering algorithm is realized.
The fourth embodiment of the present invention also provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the method embodiments described above when executed.
It is understood that, in this embodiment, all or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware of the terminal device. The program may be stored in a computer-readable storage medium, which may include, for example, a floppy disk, an optical disc, a DVD, a hard disk, flash memory, a USB flash drive, a CF card, an SD card, an MMC card, an SM card, a Memory Stick, an xD card, and the like.
In this embodiment, a computer software product is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computer devices, servers, or other network devices) to perform all or part of the steps of the methods described in the embodiments of the present invention.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit of the present invention should be included in the scope of the present invention.

Claims (9)

1. A processing method for discovering new knowledge points based on a clustering algorithm, characterized in that the method comprises the following steps:
step S1: acquiring the text representations and knowledge numbers of first knowledge points queried by users in a question-answering system;
step S2: converting each text representation into a vector;
step S3: associating the vectors one-to-one with the first knowledge points;
step S4: calculating the vector-space cosine similarity between all vectors;
step S5: grouping the vectors into classes;
step S6: converting the vectors of each class into second knowledge points;
step S7: performing topic analysis on the second knowledge points and displaying their topic words.
2. The processing method for discovering new knowledge points based on a clustering algorithm according to claim 1, characterized by further comprising the following step:
step S8: optimizing the knowledge base.
3. The processing method for discovering new knowledge points based on a clustering algorithm according to claim 1, characterized in that, in step S2, the text representation is converted by a pre-trained deep neural network model trained on text semantic-similarity data.
4. The processing method for discovering new knowledge points based on a clustering algorithm according to claim 1, characterized in that, in step S3, the vectors are associated with the knowledge numbers and an index is built.
5. The processing method for discovering new knowledge points based on a clustering algorithm according to claim 1, characterized in that, according to a similarity threshold, vectors within the threshold are aggregated into the same class and vectors outside the threshold are placed in different classes.
6. The processing method for discovering new knowledge points based on a clustering algorithm according to claim 4, characterized in that the vectors of the different classes are converted into second knowledge points according to the established index.
7. A processing apparatus, characterized by comprising:
an acquisition unit, configured to acquire the text representations and knowledge numbers of first knowledge points queried by users in a question-answering system;
a processing unit, configured to process the first knowledge points, convert them into second knowledge points and analyze the second knowledge points; and
a display unit, configured to display the topic words of the second knowledge points.
8. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to execute, by means of the computer program, the processing method for discovering new knowledge points based on a clustering algorithm according to any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores a computer program configured to execute, when run, the processing method for discovering new knowledge points based on a clustering algorithm according to any one of claims 1 to 6.
CN202011446866.7A 2020-12-11 2020-12-11 Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium Pending CN112463943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011446866.7A CN112463943A (en) 2020-12-11 2020-12-11 Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112463943A (en) 2021-03-09

Family

ID=74801449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011446866.7A Pending CN112463943A (en) 2020-12-11 2020-12-11 Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112463943A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
US20170330082A1 (en) * 2016-05-13 2017-11-16 Cognitive Scale, Inc. Lossless Parsing When Storing Knowledge Elements Within a Universal Cognitive Graph
CN108256056A (en) * 2018-01-12 2018-07-06 广州杰赛科技股份有限公司 Intelligent answer method and system
CN110309377A (en) * 2018-03-22 2019-10-08 阿里巴巴集团控股有限公司 Semanteme normalization puts question to generation, the response of mode to determine method and device
CN110598002A (en) * 2019-08-14 2019-12-20 广州视源电子科技股份有限公司 Knowledge graph library construction method and device, computer storage medium and electronic equipment
CN111858891A (en) * 2020-07-23 2020-10-30 平安科技(深圳)有限公司 Question-answer library construction method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
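赵国庆 (Zhao Guoqing): "《计算机应用技术实训教程》" (Practical Training Course in Computer Application Technology), 31 August 2016, 北京理工大学出版社 (Beijing Institute of Technology Press) *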

Similar Documents

Publication Publication Date Title
US11062090B2 (en) Method and apparatus for mining general text content, server, and storage medium
US20180075368A1 (en) System and Method of Advising Human Verification of Often-Confused Class Predictions
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CN111460250B (en) Image data cleaning method, image data cleaning device, image data cleaning medium, and electronic apparatus
US20160092448A1 (en) Method For Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN110245232B (en) Text classification method, device, medium and computing equipment
CN113051911B (en) Method, apparatus, device, medium and program product for extracting sensitive words
CN116186594B (en) Method for realizing intelligent detection of environment change trend based on decision network combined with big data
CN113704389A (en) Data evaluation method and device, computer equipment and storage medium
CN113486664A (en) Text data visualization analysis method, device, equipment and storage medium
CN112800919A (en) Method, device and equipment for detecting target type video and storage medium
CN116089873A (en) Model training method, data classification and classification method, device, equipment and medium
CN111063446B (en) Method, apparatus, device and storage medium for standardizing medical text data
CN109408175B (en) Real-time interaction method and system in general high-performance deep learning calculation engine
CN113746780A (en) Abnormal host detection method, device, medium and equipment based on host image
US11625630B2 (en) Identifying intent in dialog data through variant assessment
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN114169398A (en) Photovoltaic direct-current arc fault identification method and device based on random forest algorithm
CN114841471B (en) Knowledge point prediction method and device, electronic equipment and storage medium
CN116127066A (en) Text clustering method, text clustering device, electronic equipment and storage medium
CN114201607B (en) Information processing method and device
CN112463943A (en) Processing method and device for discovering new knowledge points based on clustering algorithm, electronic equipment and storage medium
CN114020882A (en) Method and device for determining engineering machinery fault solution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210309
