CN112199938A

CN112199938A - Scientific and technological project similarity analysis method, computer equipment and storage medium

Info

Publication number: CN112199938A
Application number: CN202011258083.6A
Authority: CN
Inventors: 汪桢子; 章彬; 何维; 汪伟
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2021-01-08
Anticipated expiration: 2040-11-12
Also published as: CN112199938B

Abstract

The invention relates to a similarity analysis method, computer equipment and a storage medium for scientific and technological projects. The method includes: obtaining an electronic document of the application materials of a project to be reviewed, and performing text extraction on it to obtain the title information to be reviewed; obtaining an electronic document of the application materials of a historical review project, and performing text extraction on it to obtain the historical title information of the historical review project; performing short text similarity analysis on the title information to be reviewed and the historical title information, and preliminarily determining from the analysis result whether the two are similar; if so, performing text extraction on the electronic documents of the project to be reviewed and the historical project to obtain the long text information to be reviewed and the historical long text information, and then performing long text similarity analysis and the final similarity judgment. The invention is suitable for text similarity analysis of scientific and technological project application materials in the various professional fields of electric power, helps to realize intelligent assisted project approval review, and avoids duplicate project approval.

Description

Scientific and technological project similarity analysis method, computer equipment and storage medium

Technical Field

The invention relates to the technical field of software information, in particular to a scientific and technological project similarity analysis method, computer equipment and a storage medium.

Background

With the deepening of electric power reform and the continuous development of science and technology, more and more scientific and technological research projects in the various professional fields of power grid companies are submitted for project approval review. To avoid repeated declaration of similar projects, the declaration materials of these projects must undergo similarity review. Science and technology project declaration materials are generally large texts. At present, judging the similarity of science and technology projects depends on manual reading and comparison by professionals: each declaration must be compared manually with a large number of prior declarations in a database, which consumes a great deal of labor and time and makes it difficult to guarantee the efficiency and accuracy of the similarity judgment. With growing environmental awareness, power grid companies have moved to paperless office work, and declaration materials are now submitted and reviewed as electronic documents. Electronic documents provide a basis for informatizing the review work: whether a project has been declared repeatedly can be judged by analyzing the text similarity between the project to be reviewed and the historical review projects. Current text similarity analysis mainly consists of word segmentation followed by distance calculation between the segmented words, from which an overall similarity result is obtained.

However, the current text similarity analysis method is not suitable for scientific and technical research project establishment review in each professional field of the power grid company, and the main reasons are as follows:

(1) titles contain many technical terms, which appear as long compound words and cannot simply be segmented. For example, in 'research and application of a device visualization monitoring model based on big data accelerated analysis and three-dimensional digitization', naive segmentation splits 'big data accelerated analysis' and 'device visualization monitoring model' into 'big data', 'accelerated', 'analysis', 'device', 'visualization', 'monitoring', 'model', and the meaning changes;

(2) semantic understanding works poorly on professional terms. For example, semantic understanding may assign a relatively high similarity to 'research on the key technology and development mode of the source-end base comprehensive energy system' and 'research on comprehensive energy system multi-energy conversion simulation and comprehensive energy efficiency evaluation technology', but the two scientific and technological projects are in fact very different;

(3) science and technology project titles are relatively short: about 30 characters at the long end and only about 10 at the short end. Because the titles contain many professional terms, which are often combined into longer words that carry meaning, two project titles that share many such terms are very likely to describe similar projects. But if the edit distance is applied directly, the calculated similarity may be very low.

(4) The title of a scientific and technological project is a short text, whereas the project abstract, main research content, technical route, expected targets and other parts of the declaration material are long texts composed of many sentences, and adjacent sentences are mostly related to one another. The declaration material of a scientific and technological project therefore cannot be handled with a single simple text comparison method, and existing text processing does not take this into account.

Disclosure of Invention

The invention aims to provide a scientific and technological project similarity analysis method, computer equipment and a computer readable storage medium, which are suitable for text similarity analysis of scientific and technological project declaration materials in the various professional fields of electric power, are beneficial to realizing intelligent auxiliary establishment review, avoid repeated establishment and guarantee the quality improvement and efficiency improvement of establishment management work.

To achieve the above objective, according to a first aspect, an embodiment of the present invention provides a method for analyzing similarity of scientific and technological projects, including:

step S1, obtaining the declaration material electronic document of the project to be reviewed, and performing text extraction on it to obtain the title information to be reviewed of the project to be reviewed;

step S2, obtaining an ith historical review project declaration material electronic document, and performing text extraction on the ith historical review project declaration material electronic document to obtain historical title information of the ith historical review project;

step S3, performing short text similarity analysis on the title information to be reviewed and the historical title information of the ith historical review project, and preliminarily judging from the analysis result whether the two are similar; if so, executing steps S4-S5 in sequence; otherwise, executing step S6; wherein the initial value of i is 1;

step S4, performing text extraction on the electronic document of the declaration material of the project to be evaluated to obtain long text information to be evaluated of the project to be evaluated, and performing text extraction on the electronic document of the declaration material of the ith historical project to obtain the long text information of the historical project;

step S5, according to the long text information to be evaluated and the historical long text information of the ith historical evaluation project, carrying out long text similarity analysis, and finally judging whether the two are similar according to the analysis result;

step S6, judging whether i is less than N; if so, setting i = i + 1 and returning to step S2; if not, outputting the similarity judgment results between the project to be reviewed and all the historical review projects to a display unit for display, and ending the analysis process; where N is the total number of historical review projects and M is a preset number.
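The control flow of steps S1 to S6 can be sketched as a two-stage filter. The sketch below is illustrative only: the function name `screen_project`, the dictionary keys `title` and `body`, the threshold values, and the toy `word_overlap` measure are all assumptions standing in for the text extraction and the short and long text similarity analyses described in the steps.

```python
from typing import Callable, List, Sequence, Tuple


def screen_project(
    candidate: dict,
    history: Sequence[dict],
    short_sim: Callable[[str, str], float],
    long_sim: Callable[[str, str], float],
    short_threshold: float,
    long_threshold: float,
) -> List[Tuple[int, float]]:
    """Return (index, long text similarity) for historical projects judged similar."""
    similar = []
    for i, past in enumerate(history):  # steps S2 and S6: visit every historical project
        # Step S3: the cheap title comparison acts as a pre-filter.
        if short_sim(candidate["title"], past["title"]) <= short_threshold:
            continue
        # Steps S4-S5: the long text is compared only for projects that pass the pre-filter.
        score = long_sim(candidate["body"], past["body"])
        if score > long_threshold:
            similar.append((i, score))
    return similar


def word_overlap(a: str, b: str) -> float:
    """Toy stand-in similarity (word-set overlap); not the measure used in the method."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


candidate = {"title": "energy system simulation",
             "body": "multi energy conversion simulation model"}
history = [
    {"title": "energy system evaluation", "body": "multi energy conversion simulation study"},
    {"title": "cable fault detection", "body": "multi energy conversion simulation model"},
]
matches = screen_project(candidate, history, word_overlap, word_overlap, 0.3, 0.5)
```

In this toy run the second historical project is never compared on long text, because its title fails the preliminary check; that is the saving the two-stage design aims for.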

Optionally, the step S3 includes:

step S31, obtaining the longest continuous common substring between the title information to be reviewed and the historical title information of the ith historical review project, and removing the longest continuous common substring from each of them to obtain a first character string and a second character string;

step S32, calculating the edit distance between the first character string and the second character string;

step S33, calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project according to the edit distance;

step S34, judging whether the title information to be reviewed and the historical title information of the ith historical review project are similar by comparing their similarity with a first similarity threshold.

Optionally, the step S31 includes:

step S311, letting the title information to be reviewed be a character string $s_1$ and the historical title information of the ith historical review project be a character string $s_i$;

step S312, finding the longest continuous common substring $s_z$ of the character strings $s_1$ and $s_i$;

step S313, if the length of the longest continuous common substring $s_z$ is greater than 2, removing $s_z$ from the character strings $s_1$ and $s_i$ respectively to obtain two new character strings $s_{10}$ and $s_{i0}$, setting $s_1 = s_{10}$ and $s_i = s_{i0}$, and then returning to step S312; if the length of the longest continuous common substring $s_z$ is less than or equal to 2, outputting $s_{10}$ as the first character string and $s_{i0}$ as the second character string.
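The loop in steps S311 to S313 can be sketched as follows. This is a minimal illustration, assuming that only the first occurrence of the common substring is removed from each string (the text does not say which occurrence) and using two hypothetical titles; the function names are not from the source.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b (row-by-row dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def strip_common_substrings(s1: str, si: str, min_len: int = 2):
    """Repeat steps S312-S313 until the longest common substring has length <= min_len."""
    while True:
        sz = longest_common_substring(s1, si)
        if len(sz) <= min_len:
            return s1, si  # first and second character strings
        # Assumption: remove only the first occurrence from each string.
        s1 = s1.replace(sz, "", 1)
        si = si.replace(sz, "", 1)


# Hypothetical titles, chosen only to exercise the loop.
first, second = strip_common_substrings("综合能源系统关键技术研究", "综合能源系统能效评估技术研究")
```

Here the loop first removes the shared term 综合能源系统, then 技术研究, and stops when the remaining strings share nothing longer than two characters.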

Optionally, the calculating the similarity between the to-be-reviewed title information and the historical title information of the ith historical review project according to the edit distance includes:

wherein $s_{10}$ represents the first character string, $s_{i0}$ represents the second character string, $\mathrm{sim}(s_{10}, s_{i0})$ represents the similarity between the title information to be reviewed and the historical title information of the ith historical review project calculated from the edit distance, $ED$ represents the edit distance between the first character string and the second character string, $\mathrm{len}(s_{10})$ represents the length of the first character string, and $\mathrm{len}(s_{i0})$ represents the length of the second character string.

Optionally, the information of the to-be-evaluated subject includes a project main title of the to-be-evaluated item and a subtitle in research content; the historical title information of the ith historical review project comprises a project main title of the ith historical review project and a subtitle in research content;

the step S31 specifically includes: obtaining the longest continuous common substring between each piece of title information in the title information to be reviewed and each piece of title information in the historical title information of the ith historical review project, and removing the longest continuous common substrings respectively to obtain first character strings $s_{jk1}$ and second character strings $s_{jk2}$; wherein $s_{jk1}$ denotes the first character string obtained by removing, from the jth piece of title information in the title information to be reviewed, its longest continuous common substring with the kth piece of title information in the historical title information, and $s_{jk2}$ denotes the second character string obtained by removing, from the kth piece of title information in the historical title information, its longest continuous common substring with the jth piece of title information in the title information to be reviewed;

the step S32 specifically includes: calculating the edit distance between every first character string $s_{jk1}$ and its corresponding second character string $s_{jk2}$ to obtain an edit distance set; each piece of title information in the title information to be reviewed has k corresponding edit distances;

the step S33 specifically includes: calculating, from the edit distance set, the similarity between every first character string $s_{jk1}$ and its corresponding second character string $s_{jk2}$, and calculating the similarity between the title information to be reviewed and the historical title information of the ith historical review project from all the similarity calculation results; each piece of title information in the title information to be reviewed has k corresponding similarity calculation results.

Optionally, the outputting the similar judgment results between the to-be-evaluated item and all the historical evaluation items to a display unit for displaying includes:

if at least one historical review project is similar to the project to be reviewed, outputting the declaration material electronic document of the at least one historical review project to a display unit;

if at least one historical review project is similar to the project to be reviewed, sorting the historical review projects by their similarity to the project to be reviewed, and then selecting the declaration material electronic documents of the M historical review projects with the highest similarity for output to a display unit for display; M is a preset number.
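Selecting the M most similar historical projects is a plain top-M query. A minimal sketch, assuming the similarities are already available as a mapping from a hypothetical project id to a score:

```python
import heapq


def top_m_similar(scores: dict, m: int) -> list:
    """Return the ids of the m historical projects with the highest similarity, best first."""
    best = heapq.nlargest(m, scores.items(), key=lambda item: item[1])
    return [project_id for project_id, _ in best]


# Hypothetical ids and scores, for illustration only.
scores = {"P-001": 0.42, "P-002": 0.91, "P-003": 0.77, "P-004": 0.15}
top = top_m_similar(scores, m=2)
```

`heapq.nlargest` avoids sorting the whole collection when M is much smaller than the number of historical projects; a full sort would give the same result.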

Optionally, the step S5 includes:

step S51, inputting the long text information to be reviewed and the historical long text information of the ith historical review project respectively into a pre-trained Doc2vec model, and outputting the corresponding paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;

step S52, calculating the similarity between the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project;

step S53, judging whether the paragraph vector to be reviewed and the historical paragraph vector of the ith historical review project are similar by comparing their similarity with a second similarity threshold.

Optionally, the step S1 further includes:

text extraction is carried out on the electronic document of the declaration material of the project to be evaluated to obtain project technical field information of the project to be evaluated;

the obtaining of the ith electronic document of history review project declaration material in step S2 specifically includes:

acquiring an ith historical review project declaration material electronic document in a database corresponding to the project technical field according to the project technical field information of the project to be reviewed;

wherein all the historical review items in the step S6 are all the historical review items in the database of the corresponding project technology field.

According to a third aspect, an embodiment of the present invention further provides a computer device, including: the scientific and technological project similarity analysis system; or a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the scientific and technological project similarity analysis method.

According to a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the scientific and technical project similarity analysis method.

The embodiments of the invention provide a scientific and technological project similarity analysis method and system, computer equipment and a computer readable storage medium. The title information of the declaration material electronic documents of the project to be reviewed and of a historical review project is extracted, and the similarity of the extracted title information is judged; according to the preliminary similarity judgment result, the long text information of the project to be reviewed and of the historical review project is further extracted, similarity analysis is performed on the long text information, and whether the projects are similar is finally determined from the analysis result. Based on the text characteristics of science and technology project declaration materials, the method combines short text similarity analysis and long text similarity analysis to judge whether two projects are similar. It can therefore help a reviewer quickly judge whether a project has been declared repeatedly, supports the efficiency and accuracy of the similarity judgment, enables intelligent assisted project approval review, avoids duplicate project approval, and improves the efficiency of project approval management.

Additional features and advantages of the invention will be set forth in the detailed description which follows.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a scientific and technological project similarity analysis method according to an embodiment of the present invention.

FIG. 2 is a block diagram of a Doc2vec PV-DM according to an embodiment of the invention.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.

Referring to fig. 1, an embodiment of the present invention provides a method for analyzing similarity of scientific and technological projects, including:

for example, "research on the key technology and development mode of the source-end base comprehensive energy system".

for example, "research on comprehensive energy system multi-energy conversion simulation and comprehensive energy efficiency evaluation technology".

step S6, judging whether i is less than N; if so, setting i = i + 1 and returning to step S2; if not, outputting the similarity judgment results between the project to be reviewed and all the historical review projects to a display unit for display, and ending the analysis process; where N is the total number of historical review projects and M is a preset number. M and N are integers.

In this method, the title information of the declaration material electronic documents of the project to be reviewed and of the current historical review project is extracted, and the similarity of the extracted title information is judged. Because title information is short text, the calculation is small, requires few computing resources and takes very little time, which makes it practical to traverse all the historical review projects, quickly make a preliminary similarity judgment between the project to be reviewed and each of them, and thereby perform a preliminary screening of similar projects. Then, according to the preliminary similarity judgment result, the long text information of the project to be reviewed and of the historical review project is extracted, similarity analysis is performed on the long text information, and whether the project to be reviewed and the current historical review project are similar is finally determined from the analysis result. In this embodiment, a method combining short text similarity analysis and long text similarity analysis is thus provided, based on the text characteristics of science and technology project declaration materials, to determine whether two projects are similar.

Optionally, the step S3 includes:

illustratively, the longest continuous common substring of "research on the key technology and development mode of the source-end base comprehensive energy system" and "research on comprehensive energy system multi-energy conversion simulation and comprehensive energy efficiency evaluation technology" is "comprehensive energy system".

Specifically, this embodiment selects the continuous common substring rather than the longest common subsequence (LCS) because the longest common subsequence may split a noun that carries meaning into single characters, whereas a continuous substring that occurs in both character strings is likely to be a complete noun. The longest continuous common substring problem is to find the longest string that is a substring of two or more given strings; it differs from the longest common subsequence problem in that a subsequence need not be continuous, whereas a substring must be.
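The difference between the two notions can be seen on a small example. The sketch below uses two hypothetical titles; `longest_common_subsequence` is a textbook dynamic programme, and the contiguous match uses Python's `difflib.SequenceMatcher.find_longest_match`. Neither is claimed to be the implementation used in the embodiment.

```python
from difflib import SequenceMatcher


def longest_common_subsequence(a: str, b: str) -> str:
    """Longest common subsequence; the matched characters may be separated by gaps."""
    dp = [[""] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + ca
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return dp[-1][-1]


def longest_contiguous_match(a: str, b: str) -> str:
    """Longest block of characters that appears unbroken in both strings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


# Hypothetical titles, for illustration only.
title_a = "配电网故障定位"
title_b = "配电设备故障诊断"
subsequence = longest_common_subsequence(title_a, title_b)
substring = longest_contiguous_match(title_a, title_b)
```

The subsequence stitches characters from two separate terms into one string that appears in neither title as written, whereas the contiguous match returns a run of characters that both titles actually contain.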

specifically, the edit distance between two character strings is the minimum number of editing operations required to convert one string into the other; the editing operations include deletion, insertion and replacement.

The edit distance may be expressed as:

where $D(str1, str2, i, j)$ represents the edit distance between the first i characters of the string str1 and the first j characters of the string str2, and $str1_i$ represents the ith character of the string str1. The initial value is $D(str1, str2, 0, 0) = 0$.
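A recursion consistent with these definitions is the standard Levenshtein recurrence. It is written out here as a sketch, assuming unit cost for each deletion, insertion and replacement; $str2_j$ denotes the jth character of str2 and $[\cdot]$ is 1 when the condition holds and 0 otherwise.

```latex
D(str1, str2, i, j) =
\begin{cases}
\max(i, j), & \text{if } \min(i, j) = 0, \\[4pt]
\min\left\{
\begin{array}{l}
D(str1, str2, i-1, j) + 1, \\
D(str1, str2, i, j-1) + 1, \\
D(str1, str2, i-1, j-1) + [\, str1_i \neq str2_j \,]
\end{array}
\right\}, & \text{otherwise.}
\end{cases}
```

The first case covers the initial value $D(str1, str2, 0, 0) = 0$ and the boundary where one prefix is empty, in which case the distance is simply the length of the other prefix.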

The above equation is a recursive definition. For character strings s1 and s2 of lengths m and n respectively, the edit distance is typically calculated with a matching relationship matrix of order (m+1) × (n+1). The values of the elements in the matrix are:

where $d_{i,j}$ is the value in row i, column j of the matrix. An example matching relationship matrix is given below for the character strings "similarity calculation" (相似度计算) and "calculation similarity" (计算相似度); the resulting edit distance is 4, as shown in Table 1:

TABLE 1 edit distance computation matrix

0	相	似	度	计	算
计	1	2	3	3	4
算	2	2	3	4	3
相	2	3	3	4	4
似	3	2	3	4	5
度	4	3	2	3	4
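The matrix in Table 1 can be reproduced with the usual row-by-row fill. The sketch below assumes unit cost for each operation and assumes that the two example strings are 计算相似度 (rows) and 相似度计算 (columns); the function name is illustrative.

```python
def edit_distance_matrix(s1: str, s2: str) -> list:
    """Build the (m+1) x (n+1) matrix d, where d[i][j] is the edit distance
    between the first i characters of s1 and the first j characters of s2."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement, or match at no cost
            )
    return d


matrix = edit_distance_matrix("计算相似度", "相似度计算")
distance = matrix[-1][-1]  # bottom-right element is the edit distance
```

Rows 1 to 5 and columns 1 to 5 of `matrix` correspond to the body of Table 1; row 0 and column 0 hold the boundary values and are not shown in the table.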

specifically, in this embodiment a set of scientific and technological projects was randomly selected, and project title similarity was calculated both with the existing method and with the method of this embodiment; the comparison results are shown in Table 2 below. With the method of this embodiment, the calculated edit distance is relatively small and the similarity result is closer to the actual similarity. In addition, when no common substring exists, the existing method and the method of this embodiment give the same result.

TABLE 2 name similarity comparison results under different algorithms

It should be noted that applying the method of this embodiment to project titles gives good results. For example, if a subtitle in the main research content of project A is similar to the project title of project B, projects A and B are likely to be related to some degree, and this relation serves as a preliminary basis for judging repeated declaration. Moreover, the comparison requires little calculation. The declaration material electronic documents of science and technology projects are usually large texts, and comparing the full text against every historical project in the conventional manner would inevitably consume a great deal of time and computing resources; in this method the second similarity judgment on the long text is performed only when the preliminary judgment finds similarity, which effectively addresses this problem.

Step S34, judging whether the information of the subject to be evaluated and the historical title information of the ith historical evaluation project are similar or not according to the comparison result of the similarity of the information of the subject to be evaluated and the historical title information of the ith historical evaluation project and a first similarity threshold;

specifically, when the similarity is greater than the first similarity threshold, it is determined that the subject information to be reviewed is similar to the ith historical review item, and at this time, the steps S4 to S5 are continuously performed.

Optionally, the step S31 includes:

specifically, the declaration material (project declaration form) of a scientific and technological project generally requires a project main title, that is, the project name, to be filled in, together with a description of the main research content; the main research content is generally described in several aspects, each of which has a subtitle.

The step S31 specifically includes: obtaining the longest continuous common substring between each piece of title information in the title information to be reviewed and each piece of title information in the historical title information of the ith historical review project, and removing the longest continuous common substrings respectively to obtain first character strings $s_{jk1}$ and second character strings $s_{jk2}$; wherein $s_{jk1}$ denotes the first character string obtained by removing, from the jth piece of title information in the title information to be reviewed, its longest continuous common substring with the kth piece of title information in the historical title information, and $s_{jk2}$ denotes the second character string obtained by removing, from the kth piece of title information in the historical title information, its longest continuous common substring with the jth piece of title information in the title information to be reviewed;

note that, both the main title of the project and the subtitle in the content under study are regarded as one piece of title information.

specifically, if the title information to be reviewed contains j pieces of title information and the historical title information contains k pieces, there are j × k edit distances in total.

Specifically, there are correspondingly j × k similarity values; the average of these j × k similarity values is output as the similarity between the title information to be reviewed and the historical title information of the ith historical review project.
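The j × k averaging can be sketched independently of the pairwise measure. In the sketch below the pairwise similarity is passed in as a function; `stand_in_similarity` uses `difflib.SequenceMatcher.ratio` purely as a placeholder and is not the edit-distance-based similarity of step S33. All names are illustrative.

```python
from difflib import SequenceMatcher
from itertools import product
from statistics import mean
from typing import Callable, Sequence


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder pairwise similarity in [0, 1]; an assumption for this sketch."""
    return SequenceMatcher(None, a, b).ratio()


def title_set_similarity(
    candidate_titles: Sequence[str],
    historical_titles: Sequence[str],
    pair_sim: Callable[[str, str], float] = stand_in_similarity,
) -> float:
    """Average the j x k pairwise similarities between two sets of titles."""
    if not candidate_titles or not historical_titles:
        raise ValueError("both title sets must be non-empty")
    scores = [pair_sim(a, b) for a, b in product(candidate_titles, historical_titles)]
    return mean(scores)


# Two candidate titles (j = 2) against three historical titles (k = 3): six pairs.
score = title_set_similarity(["ab", "cd"], ["ab", "cd", "ef"])
```

With j = 2 and k = 3, two of the six pairs match exactly and the other four share nothing, so the average is one third.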

Specifically, after the similarity determination of the method of the present embodiment, the M most similar historical review items are output for the reviewers to further confirm.

Optionally, the step S5 includes:

illustratively, the similarity between two paragraph vectors may be determined according to the distance between them, wherein the closer the distance the greater the similarity.

It is understood that, in this embodiment, the long text information may cover multiple aspects, such as the project summary and the main research content, and each aspect contains multiple paragraphs. The aspects may be separated and their similarities calculated individually, and a comprehensive result is then calculated from the per-aspect similarities: for example, the average of the per-aspect similarities is taken as the long text similarity analysis result; or, for example, each per-aspect similarity is multiplied by its corresponding preset weight and the products are summed to give the long text similarity analysis result. For the similarity calculation of a single aspect, suppose the project to be reviewed has n paragraphs on aspect E and the current historical review project has m paragraphs on aspect E. After the similarity between every paragraph of the project to be reviewed and every paragraph of the current historical review project on that aspect is calculated, each paragraph of the project to be reviewed has m similarity values, so the n paragraphs have n × m similarity values in total, and the average of these n × m values is taken as the similarity between the project to be reviewed and the current historical review project on aspect E.
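The per-aspect averaging and the two ways of combining aspects can be sketched as follows. The sketch assumes that paragraph vectors are already available as lists of floats and uses cosine similarity as the pairwise measure, which is an assumption: the text only says that closer vectors are more similar. The aspect names, weights and vectors are made up for illustration.

```python
import math
from statistics import mean


def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aspect_similarity(cand_vecs, hist_vecs) -> float:
    """Mean of the n x m pairwise similarities for one aspect."""
    return mean([cosine(u, v) for u in cand_vecs for v in hist_vecs])


def long_text_similarity(cand: dict, hist: dict, weights: dict = None) -> float:
    """Combine per-aspect similarities by plain average, or by a weighted sum if weights are given."""
    aspects = [name for name in cand if name in hist]
    per_aspect = {name: aspect_similarity(cand[name], hist[name]) for name in aspects}
    if weights is None:
        return mean(list(per_aspect.values()))
    return sum(weights[name] * per_aspect[name] for name in aspects)


# Toy 2-dimensional paragraph vectors, for illustration only.
cand = {"summary": [[1.0, 0.0], [0.0, 1.0]], "content": [[1.0, 1.0]]}
hist = {"summary": [[1.0, 0.0]], "content": [[1.0, 1.0], [1.0, 0.0]]}
unweighted = long_text_similarity(cand, hist)
weighted = long_text_similarity(cand, hist, weights={"summary": 0.4, "content": 0.6})
```

Averaging treats every aspect equally, while the weighted sum lets an aspect such as the main research content count for more; which weights are appropriate is left open by the text.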

Specifically, in the embodiment, the Doc2vec Model is trained by specifically using a PV-DM (distribution Memory Model of Paragraph vectors) training method, as shown in fig. 2, a frame diagram of the Doc2vec PV-DM in the embodiment is shown, and it can be seen from fig. 2 that a vector representation of each Paragraph/sentence is added in addition to a vector at a word level. For example, for a sentence 'the cat sat on', if the word on in the sentence is to be predicted, the prediction can be performed not only according to the corresponding features generated by other words, but also according to the generated features of other words and sentences. Each paragraph/sentence is mapped into a vector space, which may be represented by a column of a matrix. Each word is also mapped to vector space, which can be represented by a column of the matrix. And then, cascading or averaging the paragraph vector and the word vector to obtain features, and predicting a next word in the sentence. A paragraph vector/sentence vector can also be considered as a word, which acts as a memory unit for the context or as a subject for the paragraph. Wherein, during training, the context length is fixed, and the training set is generated by using a sliding window method. And paragraph/sentence vectors are shared in that context. The training process of the Doc2vec model in this embodiment is specifically as follows, and mainly includes the following ((i) and (ii)):

(i) Training stage: train the model to obtain the word vectors, the softmax parameters, and the paragraph/sentence vectors for the known training data.

(ii) Inference stage: obtain the vector representation of a new paragraph. Specifically, additional columns are added to the matrix D (the paragraph vector matrix) and, with the context length still fixed, training is performed in the manner described above using gradient descent to obtain the new D, which yields the vector representation of the new paragraph.
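For reference, the two stages can be written compactly as below. The notation is an assumption that follows the usual presentation of PV-DM, since the embodiment names only D explicitly: W is the word vector matrix, D the paragraph vector matrix, U and b the softmax parameters, k the context length, T the number of words in the paragraph, d the column of D for that paragraph, and h the concatenation or average of the paragraph vector and the context word vectors.

```latex
% (i) Training: maximize the average log-probability of each word
%     given the k preceding words and the paragraph vector d.
\max_{W,\,D,\,U,\,b} \;\; \frac{1}{T-k} \sum_{t=k+1}^{T}
    \log p\left(w_t \mid w_{t-k}, \dots, w_{t-1}, d\right)

% Softmax over the vocabulary, with scores computed from the combined feature h.
p\left(w_t \mid w_{t-k}, \dots, w_{t-1}, d\right)
    = \frac{\exp\left(y_{w_t}\right)}{\sum_{j} \exp\left(y_j\right)},
\qquad
y = b + U\, h\left(w_{t-k}, \dots, w_{t-1}, d\right)

% (ii) Inference: for a new paragraph, only its new column of D is learned
%      by gradient descent on the same objective.
d_{\mathrm{new}} = \arg\max_{d} \; \sum_{t=k+1}^{T}
    \log p\left(w_t \mid w_{t-k}, \dots, w_{t-1}, d\right)
```

In the commonly described form of the method, W, U and b are held fixed during the inference stage, so that only the new paragraph vector is updated.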

Optionally, the step S1 further includes:

Specifically, because the number of previously reviewed scientific and technological projects is large, this embodiment further proposes a preliminary classification scheme: the declaration material electronic documents of different types of historical scientific and technological projects are stored in separate databases. When similarity analysis is performed, the project to be reviewed is compared only with the historical scientific and technological projects in the database corresponding to its technical field, which effectively reduces the computational workload.

To sum up, this embodiment addresses the large data volume of scientific and technological projects with three targeted measures: first, screening by database classification; second, preliminary similarity screening on short text; and third, secondary similarity screening on long text. By screening layer by layer, the whole process performs the similarity analysis accurately while keeping the workload small and the processing fast.
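The layer-by-layer screening summarized above can be sketched as follows. This is an illustrative outline under stated assumptions, not the embodiment's implementation. Claim 4 refers to an edit-distance-based title similarity, but its formula is not reproduced in this text, so the normalization `1 - ED / max(len)` used here is an assumption. The thresholds, the record fields (`field`, `title`, `body`, `id`), and the function names are likewise hypothetical, and the long text comparison is passed in as a callable.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def title_similarity(a, b):
    """Assumed normalization: 1 - ED / max(len). Not taken from the source."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def screen(project, databases, long_text_sim,
           title_threshold=0.5, final_threshold=0.8):
    """Return (history_id, long_text_score) for history projects judged similar."""
    hits = []
    # Layer 1: consult only the database for the project's technical field.
    for hist in databases.get(project["field"], []):
        # Layer 2: cheap short-text screen on the titles.
        if title_similarity(project["title"], hist["title"]) < title_threshold:
            continue
        # Layer 3: costly long-text comparison, only for the survivors.
        score = long_text_sim(project["body"], hist["body"])
        if score >= final_threshold:
            hits.append((hist["id"], score))
    return hits


# Example with a stub long-text comparison that records how often it is called.
calls = []

def stub_long_text_sim(x, y):
    calls.append((x, y))
    return 1.0 if x == y else 0.0

databases = {
    "grid": [{"id": 1, "title": "cable fault detection", "body": "text"},
             {"id": 2, "title": "qqq", "body": "text"}],
    "other": [{"id": 3, "title": "cable fault detection", "body": "text"}],
}
project = {"field": "grid", "title": "cable fault detection", "body": "text"}
hits = screen(project, databases, stub_long_text_sim)
```

In this example only the first historical project reaches the long text comparison: the third is never loaded because it belongs to a different field, and the second is dropped by the title screen.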

Another embodiment of the present invention further provides a computer device, including: a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the scientific and technological project similarity analysis method according to the above-mentioned embodiment.

Of course, the computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the computer device may also include other components for implementing the functions of the device, which are not described herein again.

Illustratively, the computer program may be divided into one or more units, which are stored in the memory and executed by the processor to accomplish the present invention. The one or more units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the computer device.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.

The memory may be used to store the computer program and/or units, and the processor implements the various functions of the computer device by running or executing the computer program and/or units stored in the memory and calling data stored in the memory. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

Another embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the scientific and technical project similarity analysis method according to the above-mentioned embodiment.

Specifically, the computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.

To sum up, the embodiments of the invention provide a scientific and technological project similarity analysis method and system, computer equipment, and a computer-readable storage medium, in which the title information of the declaration material electronic documents of the project to be evaluated and of the historical evaluation projects is extracted, and the similarity of the extracted title information is judged.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A scientific and technological project similarity analysis method is characterized by comprising the following steps:

2. The scientific and technological project similarity analysis method according to claim 1, wherein the step S31 includes:

3. The scientific and technological project similarity analysis method according to claim 2, wherein the step S31 includes:

4. The method for analyzing similarity of technical projects according to claim 2, wherein the calculating the similarity between the information about the title to be reviewed and the information about the title of the ith historical review project according to the edit distance comprises:

wherein $s_{10}$ represents the first string, $s_{i0}$ represents the second string, $\mathrm{sim}(s_{10}, s_{i0})$ represents the similarity, calculated according to the edit distance, between the title information to be reviewed and the historical title information of the ith historical review project, $ED$ represents the edit distance between the first string and the second string, $\mathrm{len}(s_{10})$ represents the length of the first string, and $\mathrm{len}(s_{i0})$ represents the length of the second string.

5. A scientific and technological project similarity analysis method according to claim 2, wherein the information of the titles to be evaluated comprises project main titles and sub-titles in research contents of the projects to be evaluated; the historical title information of the ith historical review project comprises a project main title of the ith historical review project and a subtitle in research content;

6. The scientific and technological project similarity analysis method according to claim 1, wherein the outputting of the similarity determination results between the project to be reviewed and all the historical review projects to a display unit for display comprises:

7. The scientific and technological project similarity analysis method according to claim 1, wherein the step S5 includes:

8. A scientific and technological project similarity analysis method according to claim 1,

the step S1 further includes:

and acquiring an ith historical review project declaration material electronic document in a database corresponding to the project technical field according to the project technical field information of the project to be reviewed.

9. A computer device, comprising: a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the scientific and technological project similarity analysis method according to any one of claims 1 to 8.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the scientific project similarity analysis method according to any one of claims 1 to 8.