CN112199938A - Scientific and technological project similarity analysis method, computer equipment and storage medium - Google Patents
- Publication number
- CN112199938A (application CN202011258083.6A)
- Authority
- CN
- China
- Legal status (as listed; not a legal conclusion)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Abstract
The invention relates to a scientific and technological project similarity analysis method, computer equipment and a storage medium. The method comprises the following steps: acquiring the electronic document of the declaration materials of a project to be reviewed and extracting text from it to obtain the title information of the project to be reviewed; acquiring the electronic document of the declaration materials of a historical review project and extracting text from it to obtain the historical title information of that project; performing short text similarity analysis on the title information to be reviewed and the historical title information, and preliminarily judging from the analysis result whether the two are similar; if so, extracting text from the electronic documents of the project to be reviewed and of the historical project to obtain the long text information to be reviewed and the historical long text information, and performing long text similarity analysis and the final similarity judgment; if not, moving on to the next historical project or ending. The method is suitable for text similarity analysis of science and technology project declaration materials in the professional fields of electric power, helps realize intelligent, computer-assisted project-approval review, and avoids duplicate project approval.
Description
Technical Field
The invention relates to the technical field of software information, in particular to a scientific and technological project similarity analysis method, computer equipment and a storage medium.
Background
With the continuing deepening of electric power reform and the continuous development of science and technology, an increasing number of scientific and technical research projects in the various professional fields of power grid companies go through project-approval review. To prevent similar projects from being declared repeatedly, the declaration materials of these research projects must be reviewed for similarity. Science and technology project declaration materials are generally large texts, and current similarity judgment relies on specialists who read and compare them manually: each set of declaration materials must be compared by hand against a large number of earlier declaration materials in a database. This consumes a great deal of labor and time, and it is difficult to guarantee that the similarity judgment is efficient and accurate. With growing environmental awareness, power grid companies now work paperlessly, and project declaration materials are submitted and reviewed as electronic documents. These electronic documents provide a basis for digitizing the review work: whether a project has been declared repeatedly can be judged by analyzing the text similarity between the project under review and the historical review projects. Current text similarity analysis mainly consists of word segmentation followed by distance calculation between the segmented words, from which a combined similarity result is finally obtained.
However, current text similarity analysis methods are not suitable for the project-approval review of scientific and technical research projects in the professional fields of power grid companies, mainly for the following reasons:
(1) Titles contain many professional terms, and these usually appear as long compound words that cannot simply be segmented. For example, in "research and application of a device visualization monitoring model based on big data accelerated analysis and three-dimensional digitization", simple segmentation splits "big data accelerated analysis" and "device visualization monitoring model" into "big data", "accelerated", "analysis", "device", "visualization", "monitoring" and "model", which changes the meaning;
(2) Semantic understanding works poorly for professional names. For example, "research on key technologies and development mode of the source-end base comprehensive energy system" and "research on multi-energy conversion simulation and comprehensive energy efficiency evaluation technology of the comprehensive energy system" may score relatively high on semantic similarity, yet the two science and technology projects are in fact very different;
(3) Science and technology project titles are relatively short: the longest are about 30 characters and the shortest only about 10. Because the titles contain many professional names, which are often combined into longer, meaning-bearing words, two projects whose titles share many such terms are very likely to be similar. If the edit distance is computed directly on the titles, however, the similarity may come out very low.
(4) The project title is a short text, whereas the project abstract, main research contents, technical route, expected objectives and other parts of the declaration materials are long texts made up of many sentences, and adjacent sentences are mostly related to one another. The declaration materials of a science and technology project therefore cannot be compared simply with a single text comparison method, and existing text processing does not take this into account.
Disclosure of Invention
The invention aims to provide a scientific and technological project similarity analysis method, computer equipment and a computer-readable storage medium that are suitable for text similarity analysis of science and technology project declaration materials in the various professional fields of electric power, help realize intelligent, computer-assisted project-approval review, avoid duplicate project approval, and improve the quality and efficiency of project-approval management.
To achieve the above objective, according to a first aspect, an embodiment of the present invention provides a scientific and technological project similarity analysis method, including:
Step S1, obtaining the electronic document of the declaration materials of the project to be reviewed, and extracting text from it to obtain the title information of the project to be reviewed;
Step S2, obtaining the electronic document of the declaration materials of the i-th historical review project, and extracting text from it to obtain the historical title information of the i-th historical review project;
Step S3, performing short text similarity analysis on the title information to be reviewed and the historical title information of the i-th historical review project, and preliminarily judging from the analysis result whether they are similar; if so, executing steps S4 and S5 in sequence, otherwise executing step S6; the initial value of i is 1;
Step S4, extracting text from the electronic document of the declaration materials of the project to be reviewed to obtain its long text information to be reviewed, and extracting text from the electronic document of the declaration materials of the i-th historical review project to obtain its historical long text information;
Step S5, performing long text similarity analysis on the long text information to be reviewed and the historical long text information of the i-th historical review project, and making the final judgment from the analysis result on whether the two are similar;
Step S6, judging whether i is less than N; if so, setting i = i + 1 and returning to step S2; if not, outputting the similarity judgment results between the project to be reviewed and all historical review projects to a display unit for display, and ending the analysis; here N is the total number of historical review projects, and M (used when selecting results for display) is a preset number.
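The control flow of steps S1 to S6 can be sketched as follows. This is a minimal Python sketch, not the patented implementation: the function name `analyze_project`, the pair representation of historical projects, and the two predicate parameters are hypothetical stand-ins for the extraction and analysis steps. It shows the point of the design, namely that the costly long-text extraction of step S4 runs only for historical projects that pass the title screen of step S3.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: each historical review project is a pair
# (title, load_long_text), where load_long_text() stands in for the
# comparatively expensive text extraction of step S4.
HistoricalProject = Tuple[str, Callable[[], str]]


def analyze_project(
    title: str,
    long_text: str,
    history: Sequence[HistoricalProject],
    titles_similar: Callable[[str, str], bool],
    texts_similar: Callable[[str, str], bool],
) -> List[bool]:
    """Return one final similarity verdict per historical project (i = 1..N)."""
    verdicts: List[bool] = []
    for hist_title, load_long_text in history:        # steps S2 and S6
        if titles_similar(title, hist_title):         # step S3: cheap title screen
            hist_text = load_long_text()              # step S4: extract only when needed
            verdicts.append(texts_similar(long_text, hist_text))  # step S5
        else:
            verdicts.append(False)                    # screened out at step S3
    return verdicts
```

Passing the two predicates in as parameters keeps the loop independent of how the short-text and long-text analyses are actually implemented.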
Optionally, step S3 includes:
Step S31, obtaining the longest continuous common substring between the title information to be reviewed and the historical title information of the i-th historical review project, and removing it from each of the two to obtain a first character string and a second character string;
Step S32, calculating the edit distance between the first character string and the second character string;
Step S33, calculating the similarity between the title information to be reviewed and the historical title information of the i-th historical review project from the edit distance;
Step S34, judging whether the title information to be reviewed and the historical title information of the i-th historical review project are similar by comparing their similarity with a first similarity threshold.
Optionally, the step S31 includes:
Step S311, denoting the title information to be reviewed as the character string $s_1$ and the historical title information of the i-th historical review project as the character string $s_i$;
Step S312, finding the longest continuous common substring $s_z$ of $s_1$ and $s_i$;
Step S313, if the length of $s_z$ is greater than 2, removing $s_z$ from $s_1$ and from $s_i$ to obtain two new character strings $s_{10}$ and $s_{i0}$, setting $s_1 = s_{10}$ and $s_i = s_{i0}$, and returning to step S312; if the length of $s_z$ is less than or equal to 2, outputting $s_{10}$ as the first character string and $s_{i0}$ as the second character string (if nothing has been removed, these are the original $s_1$ and $s_i$).
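Steps S311 to S313 amount to a loop around a longest-common-substring search. The sketch below is one way to write it in Python, assuming that only the first occurrence of $s_z$ is removed from each string (the text does not say what happens when $s_z$ occurs more than once); the function names are hypothetical. Python strings index Unicode code points, so Chinese titles are compared character by character, as the method intends.

```python
from typing import Tuple


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (first found on ties)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def strip_common_substrings(s1: str, si: str, min_len: int = 3) -> Tuple[str, str]:
    """Steps S311-S313: remove the longest common substring while it is longer
    than 2 characters; return (first string, second string)."""
    while True:
        sz = longest_common_substring(s1, si)       # step S312
        if len(sz) < min_len:                       # length <= 2: stop
            return s1, si
        # Assumption: remove only the first occurrence in each string.
        s1 = s1.replace(sz, "", 1)
        si = si.replace(sz, "", 1)
```

Because each pass removes at least three characters from both strings, the loop always terminates.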
Optionally, calculating the similarity between the title information to be reviewed and the historical title information of the i-th historical review project from the edit distance includes:
where $s_{10}$ denotes the first character string, $s_{i0}$ the second character string, and $sim(s_{10}, s_{i0})$ the similarity between the title information to be reviewed and the historical title information of the i-th historical review project calculated from the edit distance; $ED$ denotes the edit distance between the first and second character strings, $len(s_{10})$ the length of the first character string, and $len(s_{i0})$ the length of the second character string.
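As an illustration only, and not necessarily the formula of this embodiment, a common way to turn an edit distance into a similarity in [0, 1] is $1 - ED / \max(len(s_{10}), len(s_{i0}))$. The Python sketch below uses that normalization as an assumption, together with the threshold test of step S34; the threshold value 0.5 and the function names are hypothetical.

```python
def title_similarity(ed: int, len_first: int, len_second: int) -> float:
    """Assumed normalization: 1 - ED / max(len(s10), len(si0)), in [0, 1]."""
    longest = max(len_first, len_second)
    if longest == 0:
        # Both strings are empty after stripping: treat the titles as identical.
        return 1.0
    return 1.0 - ed / longest


def titles_similar(ed: int, len_first: int, len_second: int,
                   first_threshold: float = 0.5) -> bool:
    """Step S34: similar when the similarity exceeds the first threshold."""
    return title_similarity(ed, len_first, len_second) > first_threshold
```

Since a Levenshtein distance never exceeds the length of the longer string, this normalization stays within [0, 1].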
Optionally, the title information to be reviewed includes the main project title of the project to be reviewed and the subtitles in its research contents; the historical title information of the i-th historical review project includes the main project title of the i-th historical review project and the subtitles in its research contents;
step S31 specifically includes: obtaining the longest continuous common substring between each title in the title information to be reviewed and each title in the historical title information of the i-th historical review project, and removing it to obtain a first character string $s_{jk1}$ and a second character string $s_{jk2}$, where $s_{jk1}$ is the first character string obtained from the j-th title in the title information to be reviewed after removing its longest continuous common substring with the k-th title in the historical title information, and $s_{jk2}$ is the second character string obtained from the k-th title in the historical title information after removing its longest continuous common substring with the j-th title in the title information to be reviewed;
step S32 specifically includes: calculating the edit distance between every first character string $s_{jk1}$ and its corresponding second character string $s_{jk2}$ to obtain an edit distance set, so that each title in the title information to be reviewed has one edit distance for each of the k historical titles;
step S33 specifically includes: calculating, from the edit distance set, the similarity of every first character string $s_{jk1}$ and its corresponding second character string $s_{jk2}$, so that each title in the title information to be reviewed has k corresponding similarity results, and then calculating the similarity between the title information to be reviewed and the historical title information of the i-th historical review project from all of these results.
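With several titles per project, steps S31 to S33 produce a J × K grid of pairwise similarities (J titles under review, K historical titles). The text does not specify how these are combined into a single project-level score; the Python sketch below assumes, purely for illustration, that each title under review takes its best match among the K historical titles and that these best matches are averaged. The function name and the `pair_sim` parameter (which would wrap steps S31 to S33 for one pair of titles) are hypothetical.

```python
from typing import Callable, List


def multi_title_similarity(
    titles_a: List[str],
    titles_b: List[str],
    pair_sim: Callable[[str, str], float],
) -> float:
    """Combine pairwise title similarities into one project-level score.
    Aggregation (best match per title j, then the mean) is an assumption."""
    if not titles_a or not titles_b:
        return 0.0
    # J x K grid: row j holds the k similarity results for title j.
    grid = [[pair_sim(a, b) for b in titles_b] for a in titles_a]
    best_per_title = [max(row) for row in grid]
    return sum(best_per_title) / len(best_per_title)
```

Taking the maximum per row rewards a subtitle that closely matches any one historical subtitle; a mean over the whole grid would instead dilute such matches.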
Optionally, outputting the similarity judgment results between the project to be reviewed and all historical review projects to the display unit for display includes:
if at least one historical review project is similar to the project to be reviewed, outputting the declaration material electronic documents of the similar historical review projects to the display unit; or
if at least one historical review project is similar to the project to be reviewed, sorting all historical review projects by their similarity to the project to be reviewed and outputting the declaration material electronic documents of the M most similar historical review projects to the display unit for display, where M is a preset number.
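The second display option is a plain top-M selection. A Python sketch, assuming each historical review project already has a numeric similarity score and a boolean verdict keyed by a hypothetical project identifier; which score is used for ranking is not fixed by the text.

```python
from typing import Dict, List


def projects_to_display(scores: Dict[str, float],
                        verdicts: Dict[str, bool],
                        m: int) -> List[str]:
    """Return the IDs of the M most similar historical projects, or an empty
    list when no historical project was judged similar."""
    if not any(verdicts.values()):
        return []
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    return ranked[:m]
```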
Optionally, step S5 includes:
Step S51, inputting the long text information to be reviewed and the historical long text information of the i-th historical review project separately into a pre-trained Doc2vec model, which outputs the corresponding paragraph vector to be reviewed and the historical paragraph vector of the i-th historical review project;
Step S52, calculating the similarity between the paragraph vector to be reviewed and the historical paragraph vector of the i-th historical review project;
Step S53, judging whether the paragraph vector to be reviewed and the historical paragraph vector of the i-th historical review project are similar by comparing their similarity with a second similarity threshold.
Optionally, step S1 further includes:
extracting text from the electronic document of the declaration materials of the project to be reviewed to obtain the project technical field information of the project to be reviewed;
obtaining the electronic document of the declaration materials of the i-th historical review project in step S2 then specifically includes:
acquiring the electronic document of the declaration materials of the i-th historical review project from the database of the corresponding project technical field, according to the project technical field information of the project to be reviewed;
and all historical review projects in step S6 are all the historical review projects in the database of that project technical field.
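Step S52 does not name the vector similarity measure; cosine similarity is the usual choice for Doc2vec paragraph vectors and is assumed in the Python sketch below, together with the threshold test of step S53. The vectors would come from the pre-trained model (for example, gensim's `Doc2Vec.infer_vector` returns a vector for a tokenized document); here they are plain lists, and the threshold 0.8 is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two paragraph vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("paragraph vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def long_texts_similar(u: Sequence[float], v: Sequence[float],
                       second_threshold: float = 0.8) -> bool:
    """Step S53: similar when the similarity exceeds the second threshold."""
    return cosine_similarity(u, v) > second_threshold
```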
According to a third aspect, an embodiment of the present invention further provides computer equipment, including either the scientific and technological project similarity analysis system, or a memory and a processor, where the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the scientific and technological project similarity analysis method.
According to a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the scientific and technical project similarity analysis method.
Embodiments of the invention provide a scientific and technological project similarity analysis method and system, computer equipment and a computer-readable storage medium. The title information is extracted from the declaration material electronic documents of the project to be reviewed and of the historical review projects, and the similarity of the extracted titles is judged; depending on this preliminary result, the long text information of the project to be reviewed and of the historical review project is extracted, similarity analysis is performed on it, and the analysis result finally determines whether the projects are similar. Based on the text characteristics of science and technology project declaration materials, the method combines short text similarity analysis with long text similarity analysis to judge whether two projects are similar. It can help reviewers quickly judge whether a project has been declared repeatedly, supports efficient and accurate similarity judgment, enables intelligent computer-assisted project approval, avoids duplicate project approval, and improves the efficiency of project management.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a scientific and technological project similarity analysis method according to an embodiment of the present invention.
FIG. 2 is a block diagram of the Doc2vec PV-DM model according to an embodiment of the invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, numerous specific details are set forth in the following detailed description to provide a better understanding of the present invention. Those skilled in the art will understand that the present invention may be practiced without some of these specific details. In some instances, well-known methods and means are not described in detail so as not to obscure the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for analyzing similarity of scientific and technological projects, including:
Step S1, obtaining the electronic document of the declaration materials of the project to be reviewed, and extracting text from it to obtain the title information of the project to be reviewed;
for example, "research on key technologies and development mode of the source-end base comprehensive energy system".
Step S2, obtaining the electronic document of the declaration materials of the i-th historical review project, and extracting text from it to obtain the historical title information of the i-th historical review project;
for example, "research on multi-energy conversion simulation and comprehensive energy efficiency evaluation technology of the comprehensive energy system".
Step S3, performing short text similarity analysis on the title information to be reviewed and the historical title information of the i-th historical review project, and preliminarily judging from the analysis result whether they are similar; if so, executing steps S4 and S5 in sequence, otherwise executing step S6; the initial value of i is 1;
Step S4, extracting text from the electronic document of the declaration materials of the project to be reviewed to obtain its long text information to be reviewed, and extracting text from the electronic document of the declaration materials of the i-th historical review project to obtain its historical long text information;
Step S5, performing long text similarity analysis on the long text information to be reviewed and the historical long text information of the i-th historical review project, and making the final judgment from the analysis result on whether the two are similar;
Step S6, judging whether i is less than N; if so, setting i = i + 1 and returning to step S2; if not, outputting the similarity judgment results between the project to be reviewed and all historical review projects to a display unit for display, and ending the analysis; here N is the total number of historical review projects, M (used when selecting results for display) is a preset number, and M and N are integers.
The method extracts the title information from the declaration material electronic documents of the project to be reviewed and the current historical review project and judges the similarity of the extracted titles. Because titles are short texts, this requires little computation, few computing resources and very little time, so the method can traverse all historical review projects, quickly make a preliminary similarity judgment between the project to be reviewed and each of them, and thereby pre-screen for similar projects. Depending on the preliminary result, the long text information of the project to be reviewed and of the historical review project is then extracted, similarity analysis is performed on it, and the analysis result finally determines whether the project to be reviewed and the current historical review project are similar. In this embodiment, based on the text characteristics of science and technology project declaration materials, short text similarity analysis and long text similarity analysis are combined to determine whether two projects are similar.
Optionally, step S3 includes:
Step S31, obtaining the longest continuous common substring between the title information to be reviewed and the historical title information of the i-th historical review project, and removing it from each of the two to obtain a first character string and a second character string;
illustratively, the longest continuous common substring of "research on key technologies and development mode of the source-end base comprehensive energy system" and "research on multi-energy conversion simulation and comprehensive energy efficiency evaluation technology of the comprehensive energy system" is "comprehensive energy system".
Specifically, this embodiment uses the longest continuous common substring rather than the longest common subsequence (LCS) because the longest common subsequence may split a noun that originally carries meaning into single characters, whereas a continuous substring that occurs in both character strings is likely to be a complete noun. The longest continuous common substring problem is to find the longest string that is a substring of two or more given strings; it differs from the longest common subsequence problem in that a subsequence need not occupy consecutive positions in the original strings, whereas a substring must.
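The difference is easy to see on a toy pair of strings (hypothetical, not taken from the patent). The Python sketch below computes a longest common subsequence with the standard dynamic program. For "AXBYC" and "ABC" it returns "ABC", which is stitched together from scattered characters of "AXBYC", while the longest common substring of the same pair is a single character. In a Chinese title, that scattering is exactly what breaks a compound technical term into fragments.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Classic LCS dynamic program; matched characters need not be contiguous."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j]: LCS length of a[:i], b[:j]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one longest subsequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```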
Step S32, calculating the edit distance between the first character string and the second character string;
specifically, the edit distance between two strings is the minimum number of edit operations required to transform one into the other, where the edit operations include deletion, insertion and substitution of a single character.
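The edit distance may be expressed recursively as:
$$
D(str1,str2,i,j)=\begin{cases}
\max(i,j), & \min(i,j)=0\\
\min\left\{\begin{array}{l}
D(str1,str2,i-1,j)+1\\
D(str1,str2,i,j-1)+1\\
D(str1,str2,i-1,j-1)+\mathbf{1}(str1_i\neq str2_j)
\end{array}\right\}, & \text{otherwise}
\end{cases}
$$
where $D(str1,str2,i,j)$ denotes the edit distance between the first i characters of the string str1 and the first j characters of the string str2, $str1_i$ denotes the i-th character of str1, $str2_j$ the j-th character of str2, and $\mathbf{1}(\cdot)$ equals 1 if its condition holds and 0 otherwise. The initial value is $D(str1,str2,0,0)=0$.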
The above equation is a recursive definition. For strings str1 and str2 of lengths m and n, the edit distance is typically computed with an (m+1) × (n+1) matching-relationship matrix whose elements are:
$$
d_{i,j}=\begin{cases}
i, & j=0\\
j, & i=0\\
\min\left(d_{i-1,j}+1,\ d_{i,j-1}+1,\ d_{i-1,j-1}+\mathbf{1}(str1_i\neq str2_j)\right), & i,j>0
\end{cases}
$$
where $d_{i,j}$ is the value in row i, column j of the matrix, and the edit distance is $d_{m,n}$. As an example, the matrix below compares the strings 相似度计算 ("similarity calculation") and 计算相似度 ("calculation similarity"); their edit distance is 4, as shown in Table 1:
Table 1 Edit distance computation matrix (rows: str1 = 计算相似度; columns: str2 = 相似度计算; ∅ is the empty prefix)
str1 \ str2 | ∅ | 相 | 似 | 度 | 计 | 算 |
∅ | 0 | 1 | 2 | 3 | 4 | 5 |
计 | 1 | 1 | 2 | 3 | 3 | 4 |
算 | 2 | 2 | 2 | 3 | 4 | 3 |
相 | 3 | 2 | 3 | 3 | 4 | 4 |
似 | 4 | 3 | 2 | 3 | 4 | 5 |
度 | 5 | 4 | 3 | 2 | 3 | 4 |
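The matrix recurrence can be checked against Table 1 with a short Python sketch of the standard Levenshtein dynamic program (function names are illustrative). Rows follow the first argument and columns the second, so calling it with 计算相似度 and 相似度计算 reproduces the rows of Table 1, and the bottom-right cell is the edit distance 4.

```python
from typing import List


def edit_distance_matrix(s1: str, s2: str) -> List[List[int]]:
    """Build the (m+1) x (n+1) matrix d, where d[i][j] is the edit distance
    between the first i characters of s1 and the first j characters of s2."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all i characters
    for j in range(n + 1):
        d[0][j] = j                      # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d


def edit_distance(s1: str, s2: str) -> int:
    """Edit distance is the bottom-right cell d[m][n]."""
    return edit_distance_matrix(s1, s2)[-1][-1]
```

Only the previous row is needed at each step, so memory can be reduced to O(n) when the full matrix is not required for display.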
Step S33, calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project according to the editing distance;
Specifically, in this embodiment, several sets of scientific and technological projects were selected at random, and their project title similarities were calculated both with an existing method and with the method of this embodiment; the comparison results are shown in Table 2 below. The edit distances calculated by the method of this embodiment are comparatively small, and the resulting similarity values are closer to the actual similarity. In addition, when the two titles share no common substring, the existing method and the method of this embodiment give the same result.
TABLE 2 Name similarity comparison results under different algorithms
It should be noted that applying the method of this embodiment to project titles achieves good results. For example, if project A and project B have similar main titles or similar research-content subtitles, the two projects are likely to be related to some degree, and this similarity serves as a preliminary basis for judging whether a project has been declared repeatedly. Moreover, this comparison requires little computation. The electronic documents of scientific and technological project declaration materials are usually long texts, so comparing the full text against every historical project in the conventional way would inevitably consume a great deal of time and computing resources. Because the second, long-text similarity judgment is performed only when the preliminary judgment finds similarity, the method effectively solves this technical problem.
Step S34, judging whether the title information to be reviewed and the historical title information of the ith historical review project are similar by comparing their similarity with a first similarity threshold;
Specifically, when the similarity is greater than the first similarity threshold, the title information to be reviewed is judged to be similar to that of the ith historical review project, and steps S4 to S5 are then performed.
Optionally, step S31 includes:
Step S311, letting the title information to be reviewed be the character string s_1 and the historical title information of the ith historical review project be the character string s_i;
Step S312, finding the longest continuous common substring s_z of the character strings s_1 and s_i;
Step S313, if the length of the longest continuous common substring s_z is greater than 2, removing s_z from s_1 and from s_i respectively to obtain two new character strings s_10 and s_i0, setting s_1 = s_10 and s_i = s_i0, and returning to step S312; if the length of s_z is less than or equal to 2, outputting s_10 as the first character string and s_i0 as the second character string.
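Steps S311 to S313 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names are hypothetical, only the first occurrence of the common substring is removed from each string, ties between equally long common substrings go to the first one found, and if the very first common substring is already 2 characters or shorter the unchanged strings are returned (the text does not say what the output strings are in that case).

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b ("" if none)."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def strip_common_substrings(s1: str, si: str, threshold: int = 2) -> tuple[str, str]:
    """Repeatedly remove the longest common substring while it is longer
    than `threshold` (2 in the text); return (first string, second string)."""
    while True:
        sz = longest_common_substring(s1, si)      # step S312
        if len(sz) <= threshold:
            return s1, si                          # step S313, stop branch
        s1 = s1.replace(sz, "", 1)                 # step S313, remove s_z
        si = si.replace(sz, "", 1)


print(strip_common_substrings("相似度计算", "计算相似度"))  # ('计算', '计算')
```

Each pass removes more than two characters from both strings, so the loop always terminates.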
Optionally, calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project according to the edit distance includes:
where s_10 denotes the first character string, s_i0 denotes the second character string, sim(s_10, s_i0) denotes the similarity between the title information to be reviewed and the historical title information of the ith historical review project calculated from the edit distance, ED denotes the edit distance between the first character string and the second character string, and len(s_10) and len(s_i0) denote the lengths of the first and second character strings, respectively.
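The similarity formula itself is not reproduced in this text, so the sketch below assumes one common normalization, sim = 1 - ED / max(len(first string), len(second string)); the patent's actual formula may differ. It also shows the step S34 comparison, treating a similarity strictly greater than the first similarity threshold as "similar". All function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only two rows of the DP matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def title_similarity(first: str, second: str) -> float:
    """ASSUMED normalization: 1 - ED / max(len(first), len(second))."""
    longest = max(len(first), len(second))
    if longest == 0:
        return 1.0  # both strings were consumed entirely by common substrings
    return 1.0 - levenshtein(first, second) / longest


def titles_similar(first: str, second: str, first_threshold: float) -> bool:
    """Step S34: similar when the similarity exceeds the first threshold."""
    return title_similarity(first, second) > first_threshold
```

For example, title_similarity("abcd", "abcf") gives 0.75 under this assumption.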
Optionally, the title information to be reviewed includes the project main title of the project to be reviewed and the subtitles of its research content; the historical title information of the ith historical review project includes the project main title of the ith historical review project and the subtitles of its research content;
Specifically, the declaration materials (project declaration form) of a scientific and technological project generally require a project main title, that is, the project name, and a description of the main research content, which is usually divided into several aspects, each with its own subtitle.
Step S31 specifically includes: for each piece of title information to be reviewed and each piece of historical title information of the ith historical review project, obtaining the longest continuous common substring of the pair and removing it from both, to obtain a first character string s_jk1 and a second character string s_jk2; wherein s_jk1 denotes the first character string obtained from the jth piece of title information to be reviewed after removing the longest continuous common substring it shares with the kth piece of historical title information, and s_jk2 denotes the second character string obtained from the kth piece of historical title information after removing the longest continuous common substring it shares with the jth piece of title information to be reviewed;
Note that the project main title and each subtitle of the research content are each regarded as one piece of title information.
Step S32 specifically includes: calculating the edit distance between every first character string s_jk1 and its corresponding second character string s_jk2 to obtain an edit distance set; each piece of title information to be reviewed has k corresponding edit distances;
Specifically, if the title information to be reviewed contains j pieces of title information, it corresponds to j × k edit distances in total.
Step S33 specifically includes: calculating, from the edit distance set, the similarity between every first character string s_jk1 and its corresponding second character string s_jk2, and then calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project from all these similarity results; each piece of title information to be reviewed has k corresponding similarity results.
Specifically, the title information to be reviewed correspondingly has j × k similarity values; the average of these j × k values is output as the similarity between the title information to be reviewed and the historical title information of the ith historical review project.
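The j × k averaging can be sketched as follows, assuming the caller supplies a pairwise title similarity function (for example, common-substring removal followed by an edit-distance similarity). The function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def multi_title_similarity(
    titles_to_review: Sequence[str],
    historical_titles: Sequence[str],
    pair_similarity: Callable[[str, str], float],
) -> float:
    """Average of the j x k pairwise similarities between two title sets."""
    scores = [
        pair_similarity(t, h)
        for t in titles_to_review    # j pieces of title information
        for h in historical_titles   # k pieces of historical title information
    ]
    return mean(scores)


# Toy pairwise measure: 1.0 for identical titles, else 0.0.
exact = lambda a, b: 1.0 if a == b else 0.0
print(multi_title_similarity(["A", "B"], ["A", "C"], exact))  # 0.25
```

Plugging in a real per-pair measure only changes `pair_similarity`; the averaging stays the same.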
Optionally, outputting the similarity judgment results between the project to be reviewed and all the historical review projects to a display unit for display includes:
if at least one historical review project is similar to the project to be reviewed, outputting the declaration material electronic document of the at least one historical review project to the display unit;
if at least one historical review project is similar to the project to be reviewed, sorting all the historical review projects by their similarity to the project to be reviewed, and then selecting the declaration material electronic documents of the M historical review projects with the highest similarity and outputting them to the display unit for display, where M is a preset number.
Specifically, after the similarity judgment of the method of this embodiment, the M most similar historical review projects are output so that reviewers can confirm them further.
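The top-M output rule can be sketched as follows. `ReviewResult` and `projects_to_display` are hypothetical names; the sketch ranks all historical review projects by similarity, as described above, and returns nothing to display when no project was judged similar.

```python
from typing import NamedTuple


class ReviewResult(NamedTuple):
    project_id: str    # hypothetical identifier of a historical review project
    similarity: float  # final similarity to the project to be reviewed
    is_similar: bool   # outcome of the threshold judgment


def projects_to_display(results: list[ReviewResult], m: int) -> list[str]:
    """Return the ids of the M most similar projects, or [] if none is similar."""
    if not any(r.is_similar for r in results):
        return []
    ranked = sorted(results, key=lambda r: r.similarity, reverse=True)
    return [r.project_id for r in ranked[:m]]


results = [
    ReviewResult("P1", 0.2, False),
    ReviewResult("P2", 0.9, True),
    ReviewResult("P3", 0.6, False),
    ReviewResult("P4", 0.7, True),
]
print(projects_to_display(results, m=2))  # ['P2', 'P4']
```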
Optionally, step S5 includes:
step S51, inputting the long text information to be reviewed and the historical long text information of the ith historical review project separately into a pre-trained Doc2vec model, which outputs the corresponding paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;
step S52, calculating the similarity between the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;
For example, the similarity between two paragraph vectors may be determined from the distance between them: the smaller the distance, the greater the similarity.
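The text says only that a smaller distance means a greater similarity; it does not name the distance. The sketch below shows two common choices, both assumptions: mapping the Euclidean distance d to 1 / (1 + d), and cosine similarity, which compares vector directions rather than distances.

```python
import math
from typing import Sequence


def euclidean_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Map Euclidean distance into (0, 1]: identical vectors give 1.0."""
    return 1.0 / (1.0 + math.dist(u, v))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0
```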
It is understood that, in this embodiment, the long text information may cover multiple aspects, such as the project summary and the main research content, each consisting of multiple paragraphs. The aspects may be separated and their similarities calculated individually, and the results then combined into a comprehensive long-text similarity result, for example by taking the average of the aspect similarities, or by multiplying each aspect similarity by a corresponding preset weight and summing the products. For the similarity of a single aspect E, suppose the project to be reviewed has n paragraphs in aspect E and the current historical review project has m paragraphs in aspect E. After the similarity between each paragraph of the project to be reviewed and each paragraph of the current historical review project in aspect E is calculated, each of the n paragraphs has m similarity values, giving n × m values in total; their average is taken as the similarity between the project to be reviewed and the current historical review project in aspect E.
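The aspect-level aggregation can be sketched as follows. Each project's long text is modelled as a hypothetical mapping from aspect name to a list of paragraphs (paragraph vectors, or anything the supplied similarity function accepts); aspects missing from either project are skipped, which is an assumption the text does not address.

```python
from statistics import mean
from typing import Callable, Mapping, Optional, Sequence


def aspect_similarity(paras_a: Sequence, paras_b: Sequence,
                      sim: Callable[[object, object], float]) -> float:
    """Mean of the n x m paragraph-pair similarities within one aspect."""
    return mean(sim(p, q) for p in paras_a for q in paras_b)


def long_text_similarity(aspects_a: Mapping[str, Sequence],
                         aspects_b: Mapping[str, Sequence],
                         sim: Callable[[object, object], float],
                         weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine per-aspect similarities: plain mean, or weighted sum if weights are given."""
    scores = {name: aspect_similarity(aspects_a[name], aspects_b[name], sim)
              for name in aspects_a if name in aspects_b}
    if weights is None:
        return mean(scores.values())
    return sum(weights[name] * s for name, s in scores.items())
```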
Step S53, judging whether the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project are similar by comparing their similarity with a second similarity threshold.
Specifically, in this embodiment the Doc2vec model is trained with the PV-DM (Distributed Memory Model of Paragraph Vectors) method. Fig. 2 shows the framework of Doc2vec PV-DM in this embodiment; as can be seen from Fig. 2, a vector representation of each paragraph/sentence is added in addition to the word-level vectors. For example, to predict the word "on" in the sentence "the cat sat on", the model can use not only the features generated from the other words but also the features generated from the sentence itself. Each paragraph/sentence is mapped into a vector space and represented by a column of a matrix, and each word is likewise mapped into a vector space and represented by a column of a matrix. The paragraph vector and the word vectors are then concatenated or averaged to obtain features, which are used to predict the next word in the sentence. The paragraph/sentence vector can be regarded as another word that acts as a memory of the context, or as the topic of the paragraph. During training, the context length is fixed, the training samples are generated with a sliding window, and the paragraph/sentence vector is shared across all contexts generated from the same paragraph. The training process of the Doc2vec model in this embodiment mainly comprises the following two stages ((i) and (ii)):
(i) Training stage: training the model on the known training data to obtain the word vectors, the softmax parameters and the paragraph/sentence vectors.
(ii) Inference stage: obtaining the vector representation of a new paragraph. Specifically, more columns are added to the paragraph vector matrix D, and, with the other model parameters held fixed, these columns are trained by the method described above using gradient descent to obtain the new D, which yields the vector representation of the new paragraph.
Optionally, the step S1 further includes:
performing text extraction on the declaration material electronic document of the project to be reviewed to obtain the project technical field information of the project to be reviewed;
obtaining the ith historical review project declaration material electronic document in step S2 specifically includes:
acquiring the ith historical review project declaration material electronic document from the database corresponding to the project's technical field, according to the project technical field information of the project to be reviewed;
wherein all the historical review projects in step S6 are all the historical review projects in the database of the corresponding technical field.
Specifically, because a large number of historical scientific and technological projects have already been reviewed, this embodiment further introduces a preliminary classification: the declaration material electronic documents of historical projects of different types are stored in different databases, and during similarity analysis the project to be reviewed is compared only with the historical projects in the database of its own technical field, which effectively reduces the computational workload.
To sum up, to address the large data volume of scientific and technological projects, this embodiment proposes three targeted measures: first, screening by database classification; second, a preliminary similarity screening on short texts; and third, a secondary similarity screening on long texts. With this layer-by-layer screening, the whole process performs accurate similarity analysis with a small workload and a high processing speed.
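The three-layer screening can be sketched as a single loop. All names are hypothetical: the databases are modelled as an in-memory mapping from technical field to historical projects, the title and long-text similarity functions are supplied by the caller, and "similar" is taken to mean strictly greater than each threshold (the text states this only for the first threshold).

```python
from typing import Any, Callable, Mapping, Sequence


def screen_project(
    candidate: Any,                               # project to be reviewed
    field: str,                                   # its technical field (step S1)
    databases: Mapping[str, Sequence[Any]],       # field -> historical projects
    title_sim: Callable[[Any, Any], float],       # short-text similarity (step S3)
    long_sim: Callable[[Any, Any], float],        # long-text similarity (step S5)
    first_threshold: float,
    second_threshold: float,
) -> list[tuple[Any, float]]:
    """Return (historical project, long-text similarity) for every match."""
    matches = []
    for historical in databases.get(field, ()):   # layer 1: same-field database only
        if title_sim(candidate, historical) <= first_threshold:
            continue                              # layer 2 rejects: skip long text
        score = long_sim(candidate, historical)   # layer 3: costly long-text check
        if score > second_threshold:
            matches.append((historical, score))
    return matches
```

The expensive long-text comparison runs only for projects that pass the cheap title check, which is where the workload saving comes from.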
Another embodiment of the present invention further provides a computer device, including: a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the scientific and technological project similarity analysis method according to the above-mentioned embodiment.
Of course, the computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the computer device may also include other components for implementing the functions of the device, which are not described herein again.
Illustratively, the computer program may be divided into one or more units, which are stored in the memory and executed by the processor to accomplish the present invention. The one or more units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the computer device.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the various parts of the whole computer device through various interfaces and lines.
The memory may be used to store the computer program and/or units, and the processor implements the various functions of the computer device by running or executing the computer program and/or units stored in the memory and by calling data stored in the memory. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
Another embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the scientific and technical project similarity analysis method according to the above-mentioned embodiment.
Specifically, the computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like.
To sum up, the embodiments of the invention provide a scientific and technological project similarity analysis method and system, a computer device and a computer-readable storage medium, in which title information is extracted from the declaration material electronic documents of the project to be reviewed and of the historical review projects, and the similarity of the extracted title information is judged.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A scientific and technological project similarity analysis method is characterized by comprising the following steps:
step S1, obtaining the declaration material electronic document of the project to be reviewed, and performing text extraction on it to obtain the title information to be reviewed of the project to be reviewed;
step S2, obtaining the declaration material electronic document of the ith historical review project, and performing text extraction on it to obtain the historical title information of the ith historical review project;
step S3, performing short-text similarity analysis on the title information to be reviewed and the historical title information of the ith historical review project, and preliminarily judging from the analysis result whether the two are similar; if so, executing steps S4 and S5 in sequence, otherwise executing step S6; wherein the initial value of i is 1;
step S4, performing text extraction on the declaration material electronic document of the project to be reviewed to obtain the long text information to be reviewed, and performing text extraction on the declaration material electronic document of the ith historical review project to obtain its historical long text information;
step S5, performing long-text similarity analysis on the long text information to be reviewed and the historical long text information of the ith historical review project, and finally judging from the analysis result whether the two are similar;
step S6, judging whether i is less than N; if so, setting i = i + 1 and returning to step S2; if not, outputting the similarity judgment results between the project to be reviewed and all the historical review projects to a display unit for display, and ending the analysis process; wherein M is a preset number; and N is the total number of historical review projects.
2. The scientific and technological project similarity analysis method according to claim 1, wherein the step S3 includes:
step S31, obtaining the longest continuous common substring between the title information to be reviewed and the historical title information of the ith historical review project, and removing it from both to obtain a first character string and a second character string;
step S32, calculating the edit distance between the first character string and the second character string;
step S33, calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project according to the edit distance;
and step S34, judging whether the title information to be reviewed and the historical title information of the ith historical review project are similar by comparing their similarity with a first similarity threshold.
3. The scientific and technological project similarity analysis method according to claim 2, wherein the step S31 includes:
step S311, letting the title information to be reviewed be the character string s_1 and the historical title information of the ith historical review project be the character string s_i;
step S312, finding the longest continuous common substring s_z of the character strings s_1 and s_i;
step S313, if the length of the longest continuous common substring s_z is greater than 2, removing s_z from s_1 and from s_i respectively to obtain two new character strings s_10 and s_i0, setting s_1 = s_10 and s_i = s_i0, and returning to step S312; if the length of s_z is less than or equal to 2, outputting s_10 as the first character string and s_i0 as the second character string.
4. The method for analyzing similarity of technical projects according to claim 2, wherein the calculating the similarity between the information about the title to be reviewed and the information about the title of the ith historical review project according to the edit distance comprises:
wherein s_10 denotes the first character string, s_i0 denotes the second character string, sim(s_10, s_i0) denotes the similarity between the title information to be reviewed and the historical title information of the ith historical review project calculated from the edit distance, ED denotes the edit distance between the first character string and the second character string, and len(s_10) and len(s_i0) denote the lengths of the first and second character strings, respectively.
5. A scientific and technological project similarity analysis method according to claim 2, wherein the information of the titles to be evaluated comprises project main titles and sub-titles in research contents of the projects to be evaluated; the historical title information of the ith historical review project comprises a project main title of the ith historical review project and a subtitle in research content;
the step S31 specifically includes: for each piece of title information to be reviewed and each piece of historical title information of the ith historical review project, obtaining the longest continuous common substring of the pair and removing it from both, to obtain a first character string s_jk1 and a second character string s_jk2; wherein s_jk1 denotes the first character string obtained from the jth piece of title information to be reviewed after removing the longest continuous common substring it shares with the kth piece of historical title information, and s_jk2 denotes the second character string obtained from the kth piece of historical title information after removing the longest continuous common substring it shares with the jth piece of title information to be reviewed;
the step S32 specifically includes: calculating the edit distance between every first character string s_jk1 and its corresponding second character string s_jk2 to obtain an edit distance set; each piece of title information to be reviewed has k corresponding edit distances;
the step S33 specifically includes: calculating, from the edit distance set, the similarity between every first character string s_jk1 and its corresponding second character string s_jk2, and calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project from all these similarity results; each piece of title information to be reviewed has k corresponding similarity results.
6. The scientific and technological project similarity analysis method according to claim 1, wherein the outputting of the similarity determination results between the project to be reviewed and all the historical review projects to a display unit for display comprises:
if at least one historical review project is similar to the project to be reviewed, outputting the declaration material electronic document of the at least one historical review project to a display unit;
if at least one historical evaluation project is similar to the to-be-evaluated project, sorting the similarity of the to-be-evaluated project and all the historical evaluation projects, and then selecting the declaration material electronic documents of the M historical evaluation projects with the highest similarity to output to a display unit for displaying; m is a preset number.
7. The scientific and technological project similarity analysis method according to claim 1, wherein the step S5 includes:
step S51, inputting the long text information to be reviewed and the historical long text information of the ith historical review project separately into a pre-trained Doc2vec model, and outputting the corresponding paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;
step S52, calculating the similarity between the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;
and step S53, judging whether the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project are similar by comparing their similarity with a second similarity threshold.
8. A scientific and technological project similarity analysis method according to claim 1,
the step S1 further includes:
text extraction is carried out on the electronic document of the declaration material of the project to be evaluated to obtain project technical field information of the project to be evaluated;
the obtaining of the ith electronic document of history review project declaration material in step S2 specifically includes:
and acquiring an ith historical review project declaration material electronic document in a database corresponding to the project technical field according to the project technical field information of the project to be reviewed.
9. A computer device, comprising: a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the scientific and technological project similarity analysis method according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the scientific project similarity analysis method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011258083.6A CN112199938B (en) | 2020-11-12 | 2020-11-12 | Science and technology project similarity analysis method, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011258083.6A CN112199938B (en) | 2020-11-12 | 2020-11-12 | Science and technology project similarity analysis method, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112199938A true CN112199938A (en) | 2021-01-08 |
CN112199938B CN112199938B (en) | 2023-11-14 |
Family
ID=74033475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011258083.6A Active CN112199938B (en) | 2020-11-12 | 2020-11-12 | Science and technology project similarity analysis method, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199938B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105446954A (en) * | 2015-11-18 | 2016-03-30 | 广东省科技基础条件平台中心 | Project duplicate checking method for science and technology big data |
CN106095865A (en) * | 2016-06-03 | 2016-11-09 | 中细软移动互联科技有限公司 | A kind of trade mark text similarity reviewing method |
CN107122340A (en) * | 2017-03-30 | 2017-09-01 | 浙江省科技信息研究院 | A kind of similarity detection method for the science and technology item return analyzed based on synonym |
CN110163476A (en) * | 2019-04-15 | 2019-08-23 | 重庆金融资产交易所有限责任公司 | Project intelligent recommendation method, electronic device and storage medium |
CN111782797A (en) * | 2020-07-13 | 2020-10-16 | 贵州省科技信息中心 | Automatic matching method for scientific and technological project review experts and storage medium |
Events: 2020-11-12, application CN202011258083.6A (CN), patent CN112199938B, status Active
Non-Patent Citations (1)
Title |
---|
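张自锋; 周育忠; 陶秀杰: "文本相似度指标分析及文本相似性分析方法研究" [Research on text similarity indicator analysis and text similarity analysis methods], 信息系统工程 (Information Systems Engineering), no. 04 * |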
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112784569A (en) * | 2021-02-04 | 2021-05-11 | 北京秒针人工智能科技有限公司 | Method, system, equipment and storage medium for realizing similar text aggregation |
CN112784569B (en) * | 2021-02-04 | 2024-04-19 | 北京秒针人工智能科技有限公司 | Method, system, equipment and storage medium for realizing similar text aggregation |
CN113064979A (en) * | 2021-03-10 | 2021-07-02 | 国网河北省电力有限公司 | Keyword retrieval-based method for judging construction period and price reasonability |
CN112926299A (en) * | 2021-03-29 | 2021-06-08 | 杭州天谷信息科技有限公司 | Text comparison method, contract review method and audit system |
CN112926299B (en) * | 2021-03-29 | 2024-04-09 | 杭州天谷信息科技有限公司 | Text comparison method, contract review method and auditing system |
CN113139374A (en) * | 2021-04-12 | 2021-07-20 | 北京明略昭辉科技有限公司 | Method, system, equipment and storage medium for querying marks of document similar paragraphs |
CN113762719A (en) * | 2021-08-03 | 2021-12-07 | 远光软件股份有限公司 | Text similarity calculation method, computer equipment and storage device |
CN113761869A (en) * | 2021-08-17 | 2021-12-07 | 中移(杭州)信息技术有限公司 | Method and device for detecting resource coverage rate and computer readable storage medium |
CN113704427A (en) * | 2021-08-30 | 2021-11-26 | 平安科技(深圳)有限公司 | Text provenance determination method, device, equipment and storage medium |
CN115801483A (en) * | 2023-02-10 | 2023-03-14 | 北京京能高安屯燃气热电有限责任公司 | Information sharing processing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112199938B (en) | 2023-11-14 |
Similar Documents
Publication | Title |
---|---|
CN112199938B (en) | Science and technology project similarity analysis method, computer equipment and storage medium |
CN112199937B (en) | Short text similarity analysis method and system, computer equipment and medium thereof |
US10831993B2 (en) | Method and apparatus for constructing binary feature dictionary |
CN112199940B (en) | Project review method and storage medium |
KR102104316B1 (en) | Apparatus for predicting stock price of company by analyzing news and operating method thereof |
CN112199939B (en) | Intelligent recommendation method and storage medium for review experts |
CN110888983B (en) | Positive and negative emotion analysis method, terminal equipment and storage medium |
CN114780746A (en) | Knowledge graph-based document retrieval method and related equipment thereof |
CN111680506A (en) | External key mapping method and device of database table, electronic equipment and storage medium |
CN111429184A (en) | User portrait extraction method based on text information |
CN110827131A (en) | Tax payer credit evaluation method based on distributed automatic feature combination |
CN113703773A (en) | NLP-based binary code similarity comparison method |
CN112329425B (en) | Scientific research project intelligent review method and storage medium |
CN112381381B (en) | Intelligent expert recommendation device |
CN112417840B (en) | Scientific research project intelligent review system and computer equipment |
CN112199941A (en) | Scientific research project evaluation platform |
CN117592470A (en) | Low-cost gazette data extraction method driven by large language model |
CN114842982B (en) | Knowledge expression method, device and system for medical information system |
CN114462383B (en) | Method, system, storage medium and equipment for obtaining design specification of building drawing |
CN116578696A (en) | Text abstract generation method, device, equipment and storage medium |
CN114580398A (en) | Text information extraction model generation method, text information extraction method and device |
CN112632951A (en) | Method, computer equipment and storage medium for intelligently recommending experts |
CN112837148B (en) | Risk logic relationship quantitative analysis method integrating domain knowledge |
CN117494806B (en) | Relation extraction method, system and medium based on knowledge graph and large language model |
CN116992869B (en) | Remote supervision relation extraction method and device based on search engine and classifier |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |