CN118229465B - Pre-application patent quality assessment method and system based on cluster center representation - Google Patents

Pre-application patent quality assessment method and system based on cluster center representation

Info

Publication number
CN118229465B
Authority
CN
China
Prior art keywords
representation
predicted
information
similarity
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410610670.9A
Other languages
Chinese (zh)
Other versions
CN118229465A (en
Inventor
赖培源
李岱素
江昊钒
廖晓东
蔡焕涛
刘士雨
李奎
梁育玮
孙晓麒
黄俊铮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong South China Technology Transfer Center Co ltd
Original Assignee
Guangdong South China Technology Transfer Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong South China Technology Transfer Center Co ltd
Priority to CN202410610670.9A
Publication of CN118229465A
Application granted
Publication of CN118229465B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Animal Behavior & Ethology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention discloses a pre-application patent quality assessment method and system based on cluster center representation. The method comprises: extracting keywords from the patent text input by a user and using them for retrieval, so as to generate a sub-dataset with similar features from patent big data, and generating center representations of the sub-dataset with a clustering model; intercepting the patent information to be predicted from the patent text input by the user and generating its text representation; calculating the similarity between the text representation of the patent information to be predicted and the center representations, and generating constraint information from the similarity combined with patent quality indicators; and training a patent quality assessment model with the constraint information to obtain a multi-dimensional quality evaluation result for the patent input by the user. While addressing the problem of comparison against massive data, the invention quickly performs a multi-dimensional quality analysis of the patent a user plans to file, which helps improve the success rate of the user's applications, cultivate high-value patents, and reduce the cost of patent applications for enterprises.

Description

Pre-application patent quality assessment method and system based on clustering center characterization
Technical Field
The invention relates to the technical field of patent quality evaluation, in particular to a method and a system for evaluating the quality of a pre-application patent based on clustering center characterization.
Background
Patents are an important component of intellectual property and a principal output of technological innovation: the number of patents reflects the overall scale of patenting, while the quality of patents reflects its merit. At present, the patent level of a region is usually measured by counting patents, while analysis of patent quality is neglected, so the resulting picture of the real patent situation is one-sided. In recent years the number of patents has grown explosively, which poses many challenges for patent examination and for patent commercialization and operation work. Patent quality has therefore drawn close attention, and the choice of a scientific and reasonable patent quality evaluation method has become a hot topic in academic research. In mass data analysis in particular, assisting quality evaluation by constructing subdivided small data sets is an important direction for applying evaluation models at scale.
At present, the number of patent application documents is growing rapidly, but patent practitioners are too few and their expertise is uneven, which increases their workload and indirectly lowers the quality of patent application documents. The quality of a patent is affected by the quality of its application document. Improving the quality of application documents, on the one hand, fully presents the scope of protection of an enterprise's current research and development solution and supports the enterprise's intellectual property services, and, on the other hand, improves patent quality. Multi-dimensional quality assessment of patent application text is therefore one of the problems to be solved.
Disclosure of Invention
In order to solve the technical problems, the invention provides a pre-application patent quality assessment method and system based on clustering center characterization.
The first aspect of the invention provides a pre-application patent quality assessment method based on clustering center characterization, which comprises the following steps:
Extracting keywords based on patent text input by a user for searching, generating a sub-data set with feature similarity meeting a preset standard in the patent big data, and generating a central representation of the sub-data set through a clustering model;
intercepting patent information to be predicted from a patent text input by a user, and generating text representation of the patent information to be predicted;
Calculating the similarity between the patent information to be predicted and the central representation, and generating constraint information based on the similarity and the patent quality index;
and training a patent quality evaluation model by using constraint information, and obtaining a multidimensional quality evaluation result for the patent input by the user through the patent quality evaluation model.
In this scheme, extracting keywords from the patent text input by the user for retrieval, and generating in the patent big data a sub-data set whose feature similarity meets a preset standard, specifically comprises:
acquiring the patent text input by the user, performing word segmentation preprocessing, generating a serialized representation of the patent text, determining the part-of-speech tags of the word vectors in the serialized representation, and performing sequence labeling with the part-of-speech tags;
splitting the serialized representation of the patent text into blocks and embedding it with RoBERTa to obtain the embedded vectors of the patent text, screening preset phrases by the part-of-speech tags, selecting the corresponding embedded vectors by matching the position features of the preset phrases, and splicing the selected embedded vectors to obtain spliced embedded vectors;
applying a self-attention mechanism to the embedded vectors of the patent text and strengthening their features by weighting with the self-attention weights, and applying cross attention between the spliced embedded vectors and the embedded vectors to obtain neighborhood embedded vectors of the spliced embedded vectors and strengthen the context semantics;
acquiring the attention-encoded embedded vector sequence and the neighborhood embedded vector sequence, calculating the similarity between the embedded vectors and the neighborhood embedded vectors in the sequences, and decoding the spliced embedded vectors whose similarity meets a preset similarity threshold as the keyword extraction result;
and establishing a search index from the keywords, using the search index to calculate the feature similarity to the keywords in the massive patent big data, and taking the patent data that meet the preset similarity standard to construct the sub-data set containing the keywords.
In this scheme, the central representation of the sub-dataset is generated by a clustering model, specifically:
Optimizing an initial clustering center of the sub-data set by utilizing a sparrow search algorithm, initializing parameters of the sparrow search algorithm, calculating fitness values in the sparrow population, and obtaining an optimal fitness value, a worst fitness value and corresponding positions;
Selecting discoverers, joiners and scouters and updating their positions, introducing adaptive t-distribution mutation while the sparrow positions are updated, iteratively calculating fitness and updating the sparrow positions, and, once the maximum number of iterations is reached, outputting the optimal sparrow to obtain a cluster center matrix;
acquiring the initial cluster centers from the cluster center matrix, using Euclidean distance as the measurement function, assigning each item of patent data in the sub-data set to its nearest initial cluster center, and updating the cluster center of each cluster after all the patent data have been assigned;
And obtaining a final clustering result of the sub-data set through iterative clustering, and generating a central representation of the sub-data set according to the partitioned different clusters.
In the scheme, patent information to be predicted is intercepted in a patent text input by a user, and text representation of the patent information to be predicted is generated, specifically:
intercepting the patent text input by the user according to preset paragraph positions and indication keywords to generate the patent information to be predicted, extracting the embedded vectors of the patent text corresponding to the patent information to be predicted, and dividing them into word embedded vectors, sentence embedded vectors and paragraph embedded vectors;
feeding the embedded vectors of the patent information to be predicted into a bidirectional long short-term memory (BiLSTM) network, introducing an attention mechanism so that the forward LSTM and the reverse LSTM process the embedded vectors of the different levels, combining the forward and reverse results through the hidden layer, and outputting the semantic features corresponding to the embedded vectors of the patent information to be predicted;
and matching the semantic features to the embedded vectors of the different levels corresponding to the patent information to be predicted, and generating the text representation of the patent information to be predicted.
In this scheme, constraint information is generated based on the similarity and the patent quality index, specifically:
calculating the similarity between the text representation of the patent information to be predicted and every center representation of the sub-data set as the cosine similarity between the embedded vectors after dimension alignment, and, when the cosine similarity is larger than a preset threshold, extracting the semantic features of the patent information to be predicted at the corresponding positions and using them to correct the similarity;
traversing the patent information to be predicted to obtain all the similarities, calculating the mean of their absolute values as the average similarity, and taking the reciprocal of the average similarity as one item of the constraint information;
acquiring patent quality evaluation examples with a big data engine, extracting patent quality evaluation indexes from the examples, and performing principal component analysis on the indexes to identify the key influencing factors;
obtaining, from the patent quality evaluation examples, the interaction relations between the key influencing factors and the patent text and between different key influencing factors, constructing triples from the different interaction relations and the attributes corresponding to the key influencing factors, and constructing a knowledge graph by learning the graph structure with a knowledge graph convolutional neural network;
counting the relation edges directly connected to each key influencing factor in the knowledge graph to compute node centrality, using the centrality to represent the importance of the key influencing factors, selecting a preset number of key influencing factors by importance, and obtaining the constraint information composed of the corresponding index variables.
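As a rough illustration of the two constraint items described above, the following Python sketch computes the reciprocal of the mean absolute similarity and ranks key influencing factors by degree centrality (the count of directly connected relation edges). The function names, the factor names and the relation labels are hypothetical and chosen only for this example; the text does not specify data structures.

```python
def reciprocal_mean_abs_similarity(similarities):
    """Reciprocal of the mean of absolute similarities (one constraint item)."""
    if not similarities:
        raise ValueError("at least one similarity is required")
    mean_abs = sum(abs(s) for s in similarities) / len(similarities)
    if mean_abs == 0.0:
        raise ValueError("average similarity is zero; reciprocal is undefined")
    return 1.0 / mean_abs


def top_factors_by_degree(triples, factors, k):
    """Rank factors by the number of relation edges directly connected to them.

    triples: iterable of (head, relation, tail) tuples (hypothetical layout).
    Returns the k highest-degree factors and the full degree table.
    """
    degree = {f: 0 for f in factors}
    for head, _relation, tail in triples:
        if head in degree:
            degree[head] += 1
        if tail in degree:
            degree[tail] += 1
    ranked = sorted(factors, key=lambda f: degree[f], reverse=True)
    return ranked[:k], degree


# Illustrative values only; none of these numbers come from the patent.
constraint = reciprocal_mean_abs_similarity([0.5, -0.25, 0.75])

triples = [
    ("claim_count", "affects", "patent_text"),
    ("claim_count", "correlates_with", "family_size"),
    ("citation_count", "affects", "patent_text"),
    ("claim_count", "correlates_with", "citation_count"),
]
factors = ["claim_count", "family_size", "citation_count"]
selected, degree = top_factors_by_degree(triples, factors, k=2)
```

A lower average similarity to the cluster centers gives a larger reciprocal, so the constraint grows as the text to be predicted moves away from existing patents in the sub-data set.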
In the scheme, the patent quality evaluation model is trained by using constraint information, and specifically comprises the following steps:
Constructing a patent quality evaluation model, training a corresponding encoder on the training data of each category of patent quality index in the constraint information, and extracting index features from the text representation of the patent information to be predicted with the encoders of the different patent quality indexes;
inputting the index features of the patent information to be predicted, combined with the reciprocal of the average similarity between the text representation and the center representations, into different multi-layer perceptrons to obtain a feature importance matrix, obtaining the attention distribution over the feature importance with co-attention, and obtaining the representations of the patent information to be predicted under the different constraints by weighted calculation;
and fully connecting the index features with the weighted representations, outputting vectors through the interaction of the multi-layer perceptrons, converting the output vectors into a probability distribution to obtain the predictive evaluation, scoring with the MSE evaluation metric, and obtaining the quality evaluation result of the patent information to be predicted.
The invention also provides a pre-application patent quality assessment system based on the clustering center characterization, which comprises a memory, a processor, a user interaction module, an assessment data set generation module, a quality assessment module and a data storage management module, wherein the memory and the processor store and execute a pre-application patent quality assessment method program based on the clustering center characterization;
The user interaction module is used by the user to input a keyword group that determines the patent data subset for evaluation, serves as the evaluation input window for entering the patent information to be predicted, and, after the system evaluation, returns the result and displays the evaluation result to the user;
the evaluation data set generation module is used for generating a sub-data set based on the patent big data set according to the keyword group provided by the user;
the quality evaluation module is responsible for carrying out quality evaluation based on the patent information to be evaluated and the sub-data set;
and the data storage management module is responsible for storing the patent big data set and the patent subsets generated from the user's keyword groups, which facilitates running non-real-time assessment tasks.
The invention discloses a pre-application patent quality assessment method and system based on clustering center characterization. The method comprises: extracting keywords from the patent text input by a user for searching, generating a sub-data set with similar features in the patent big data, and generating the center characterization of the sub-data set through a clustering model; intercepting the patent information to be predicted from the patent text input by the user and generating its text characterization; calculating the similarity between the text characterization of the patent information to be predicted and the center characterization, and generating constraint information from the similarity combined with patent quality indexes; and training the patent quality assessment model with the constraint information to obtain a multi-dimensional quality assessment result for the patent input by the user. While solving the problem of comparison against massive data, the invention rapidly performs a multi-dimensional quality analysis of the patent the user plans to file, which helps improve the success rate of the user's applications, cultivate high-value patents, and reduce the cost of patent applications for enterprises.
Drawings
FIG. 1 shows a flow chart of a pre-application patent quality assessment method based on cluster center characterization of the present invention;
FIG. 2 illustrates a flow chart of the present invention for generating a central representation of a sub-dataset;
FIG. 3 shows a flow chart of the present invention for constructing a patent quality assessment model;
FIG. 4 shows a block diagram of a pre-application patent quality assessment system of the present invention based on cluster center characterization.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
FIG. 1 shows a flow chart of a pre-application patent quality assessment method based on cluster center characterization of the present invention.
As shown in fig. 1, a first aspect of the present invention provides a method for evaluating the quality of a pre-application patent based on cluster center characterization, including:
S102, extracting keywords based on patent text input by a user for searching, generating a sub-data set with feature similarity meeting a preset standard in patent big data, and generating a central representation of the sub-data set through a clustering model;
S104, intercepting patent information to be predicted from a patent text input by a user, and generating a text representation of the patent information to be predicted;
S106, calculating the similarity between the to-be-predicted patent information and the central representation, and generating constraint information based on the similarity and the patent quality index;
S108, training a patent quality evaluation model by using constraint information, and obtaining a multidimensional quality evaluation result for the patent input by the user through the patent quality evaluation model.
It should be noted that the patent text input by the user is acquired and preprocessed by word segmentation, normalization, stop-word filtering and the like to generate a serialized representation of the patent text. The part-of-speech tags of the word vectors in the serialized representation, including prepositions, adjectives, nouns, proper nouns and the like, are determined and used for sequence labeling. The serialized representation is split into blocks and embedded with RoBERTa; after RoBERTa encoding the embedded vectors are associated with one another, which strengthens semantic learning and captures the semantic features expressed in different contexts. Once the embedded vectors of the patent text are obtained, preset phrases are screened by the part-of-speech tags, the corresponding embedded vectors are selected by matching the position features of the preset phrases, and the selected embedded vectors are spliced, unified in length and transformed in dimension through a linear layer to obtain spliced embedded vectors. A self-attention mechanism is applied to the embedded vectors of the patent text, whose features are strengthened by weighting with the self-attention weights, and cross attention is applied between the spliced embedded vectors and the embedded vectors to obtain neighborhood embedded vectors of the spliced embedded vectors and strengthen the context semantics. This yields an attention-encoded embedded vector sequence, which contains global semantics, and a neighborhood embedded vector sequence, which is rich in local context semantics. After dimension reduction, the similarity between the embedded vectors and the neighborhood embedded vectors in the sequences is calculated and used as the basis for judging importance, and the spliced embedded vectors whose similarity meets a preset similarity threshold are decoded as the keyword extraction result. A search index is established from the keywords, the feature similarity to the keywords is calculated in the massive patent big data, and the patent data that meet the preset similarity standard form the sub-data set.
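The final retrieval step, keeping only the patents whose feature similarity to the keywords meets a preset standard, can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the feature similarity and a linear scan in place of a real search index; the names `build_sub_dataset` and `corpus` and the threshold value 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_sub_dataset(keyword_vec, corpus, threshold=0.8):
    """Return ids of patents whose feature vector is similar enough to the keywords.

    corpus: dict mapping patent id -> feature vector (hypothetical layout).
    threshold: stands in for the "preset similarity standard"; 0.8 is arbitrary.
    """
    return [pid for pid, vec in corpus.items()
            if cosine_similarity(keyword_vec, vec) >= threshold]


# Toy three-dimensional feature vectors, for illustration only.
corpus = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.9, 0.1, 0.0],
    "P3": [0.0, 1.0, 0.0],
}
sub_dataset = build_sub_dataset([1.0, 0.0, 0.0], corpus)
```

In a deployment over massive patent data, the linear scan would presumably be replaced by the search index the text mentions; the filtering criterion stays the same.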
FIG. 2 illustrates a flow chart of the present invention for generating a central representation of a sub-dataset.
According to the embodiment of the invention, the central representation of the sub-data set is generated through a clustering model, specifically:
S202, optimizing an initial clustering center of a sub-data set by utilizing a sparrow search algorithm, initializing parameters of the sparrow search algorithm, calculating fitness values in a sparrow population, and obtaining an optimal fitness value, a worst fitness value and corresponding positions;
S204, selecting discoverers, joiners and scouters and updating their positions, introducing adaptive t-distribution mutation while the sparrow positions are updated, iteratively calculating fitness and updating the sparrow positions, and, once the maximum number of iterations is reached, outputting the optimal sparrow to obtain a cluster center matrix;
S206, acquiring the initial cluster centers from the cluster center matrix, using Euclidean distance as the measurement function, assigning each item of patent data in the sub-data set to its nearest initial cluster center, and updating the cluster center of each cluster after all the patent data have been assigned;
S208, obtaining a final clustering result of the sub-data set through iterative clustering, and generating a center representation of the sub-data set according to the partitioned different clusters.
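Steps S206 and S208 correspond to the usual K-means assignment and update iterations started from given initial centers. The sketch below assumes the initial centers have already been produced by the sparrow search stage and are simply passed in; the function names and the toy data are illustrative, not taken from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, initial_centers, max_iter=100):
    """Iterate assignment and center update until the centers stop changing."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(m[d] for m in members) / len(members)
                                    for d in range(len(old))])
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy data; the initial centers stand in for the sparrow search output.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers, labels = kmeans(points, initial_centers=[[1.0, 1.0], [9.0, 9.0]])
```

The returned `centers` play the role of the center representations of the sub-data set, one per cluster.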
It should be noted that the K-means clustering algorithm runs quickly, but when the initial cluster centers are selected at random, center points that are too concentrated or too dispersed lead to a poor clustering result and reduce the accuracy of the center representations obtained after clustering. The sparrow search algorithm converges quickly, among other advantages, and is used to reduce the influence of the initial cluster centers on the clustering result. The parameters of the sparrow search algorithm are initialized by setting the maximum number of iterations, the population size, the number of discoverers, the number of scouters and the alarm value. The discoverers are the sparrows with the best positions, the remaining sparrows are joiners, and the scouters are sparrows selected at random; the higher a sparrow's fitness, the higher its priority in obtaining food. Given the input number of clusters, the sparrow search algorithm finds the cluster center matrix with the best fitness. To keep the algorithm from falling into a local optimum, adaptive t-distribution mutation is applied to the sparrow positions; the t distribution combines the characteristics of the Cauchy distribution and the Gaussian distribution and so balances global exploration against local exploitation. The current latest position is obtained, and the position matrix is updated whenever that position is better than the previous optimal position, until the optimal position matrix is output as the cluster center matrix. Clustering usually yields more than one center representation: for example, if cluster analysis produces N categories, there are N corresponding center representations.
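The adaptive t-distribution mutation with greedy acceptance can be sketched as below. Several choices here are assumptions rather than details stated in the text: the iteration number is used as the degrees of freedom (so early iterations behave like a Cauchy distribution and later ones approach a Gaussian), the mutation is the multiplicative form x + x * t, and lower fitness is treated as better. The sphere function is only a stand-in for a clustering fitness.

```python
import math
import random


def student_t(df, rng):
    """Draw one Student-t sample as Z / sqrt(V / df), Z ~ N(0, 1), V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return z / math.sqrt(v / df)


def t_mutate(position, iteration, fitness, rng):
    """Mutate a position and keep the mutant only if its fitness is lower.

    iteration (>= 1) is used as the degrees of freedom: an assumption of this sketch.
    """
    candidate = [x + x * student_t(iteration, rng) for x in position]
    return candidate if fitness(candidate) < fitness(position) else position


def sphere(pos):
    """Toy fitness (lower is better); a stand-in for a clustering objective."""
    return sum(x * x for x in pos)


rng = random.Random(0)
best = [3.0, -2.0]
for it in range(1, 51):
    best = t_mutate(best, it, sphere, rng)
```

Because a mutant is accepted only when it improves the fitness, the best position never gets worse from one iteration to the next.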
It should be noted that the patent text input by the user is intercepted according to preset paragraph positions, such as the claims or the abstract of the specification, and indication keywords, to generate the patent information to be predicted. The embedded vectors of the patent text corresponding to the patent information to be predicted are extracted and divided into word embedded vectors, sentence embedded vectors and paragraph embedded vectors; embedded vectors at different positions and levels improve the efficiency of text semantic recognition. The embedded vectors of the patent information to be predicted are fed into a bidirectional long short-term memory (BiLSTM) network, and an attention mechanism is introduced so that the forward LSTM and the reverse LSTM process the embedded vectors of the different levels. The forward LSTM combines the embedded vector input at time t with the output at time t-1 to obtain the forward output at time t, and the reverse LSTM combines the embedded vector input at time t with the output at time t+1 to obtain the reverse output at time t. The forward and reverse results are combined through the hidden layer to output the semantic features corresponding to the embedded vectors of the patent information to be predicted. The semantic features are then matched, according to the position codes, to the embedded vectors of the different levels corresponding to the patent information to be predicted, to generate the text representation of the patent information to be predicted.
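The forward and reverse passes described above can be illustrated with a deliberately simplified recurrence. In this sketch a plain element-wise tanh cell stands in for the LSTM cell (no gates, no cell state), and the attention mechanism is omitted; the weights `w_x` and `w_h` are arbitrary scalars chosen for the example. It shows only the data flow: the forward output at time t depends on step t-1, the reverse output at time t depends on step t+1, and the two are concatenated.

```python
import math


def run_direction(inputs, w_x, w_h, reverse=False):
    """Run an element-wise tanh recurrence over the sequence in one direction."""
    dim = len(inputs[0])
    order = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    hidden = [0.0] * dim
    outputs = [None] * len(inputs)
    for t in order:
        # The new state depends on the input at t and the previous state in this direction.
        hidden = [math.tanh(w_x * x + w_h * h) for x, h in zip(inputs[t], hidden)]
        outputs[t] = hidden
    return outputs


def bidirectional(inputs, w_x=0.5, w_h=0.5):
    """Concatenate forward and reverse outputs at every time step."""
    fwd = run_direction(inputs, w_x, w_h)
    bwd = run_direction(inputs, w_x, w_h, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]  # list concatenation per step


# Toy sequence of three 2-dimensional embedded vectors.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = bidirectional(sequence)
```

Each element of `features` has twice the input dimension: the first half carries left-to-right context and the second half carries right-to-left context.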
FIG. 3 shows a flow chart of the present invention for constructing a patent quality assessment model.
According to the embodiment of the invention, the patent quality assessment model is trained by using constraint information, and specifically comprises the following steps:
S302, constructing a patent quality evaluation model, training a corresponding encoder for each category of patent quality index using the training data in the constraint information, and extracting index features from the text characterization of the patent information to be predicted with the encoders of the different patent quality indexes;
S304, inputting the index features of the patent information to be predicted, combined with the reciprocal of the average similarity between the text characterization and the center characterizations, into different multi-layer perceptrons to obtain a feature importance matrix, obtaining the attention distribution over the feature importance distribution by cooperative attention, and obtaining the characterizations of the patent information to be predicted under different constraints by weighted calculation;
S306, fully connecting the index features with the weighted characterizations, outputting a vector through multi-layer perceptron interaction, converting the output vector into a probability distribution to obtain a prediction evaluation, and scoring with the MSE evaluation index to obtain the quality evaluation result of the patent information to be predicted.
It should be noted that the similarity between the text characterization of the patent information to be predicted and every center characterization of the sub-data set is calculated: after the dimensions are aligned, the cosine similarity between the embedded vectors is computed, and when the cosine similarity is larger than a preset threshold, the semantic features of the patent information to be predicted at the corresponding position are extracted and used to correct the similarity. All similarities of the patent information to be predicted are traversed, the mean of their absolute values is taken as the average similarity, and the reciprocal of the average similarity, which reflects the probability of grant to a certain extent, forms one of the quality evaluation indexes in the constraint information. Patent quality evaluation examples are obtained with a big data engine, and patent quality evaluation indexes are extracted from them, such as the number and length of claims, the number of patent citations, the number of non-patent literature citations, the technology life cycle, the patent category, the number of patent family members, and the numbers of inventors and applicants. Principal component analysis is performed on the patent quality evaluation indexes to identify the key influence factors. According to the patent quality evaluation examples, the interaction relations between the key influence factors and the patent text, and among different key influence factors, are obtained; triples are built from these interaction relations and the attributes of the key influence factors; and a knowledge graph is constructed by learning the graph structure with a knowledge graph convolutional neural network. In the knowledge graph, the number of relation edges directly connected to a key influence factor is used to calculate the centrality of its node, and the centrality characterizes the importance of the key influence factor: a more central node is more important than other nodes. A preset number of key influence factors are selected according to importance, and the constraint information is composed of the corresponding index variables.
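The average-similarity constraint and the centrality ranking described in this passage can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example factor names are hypothetical, the threshold-triggered similarity correction is omitted because its form is not specified, and centrality is taken as the plain count of directly connected relation edges.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length (dimension-aligned) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def inverse_mean_similarity(text_vec, center_vecs):
    """Reciprocal of the mean absolute cosine similarity to all centers."""
    sims = [abs(cosine(text_vec, c)) for c in center_vecs]
    mean = sum(sims) / len(sims)
    return math.inf if mean == 0 else 1.0 / mean


def degree_centrality(triples):
    """Count the relation edges touching each node of (head, relation, tail) triples."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def top_factors(triples, count):
    """Select the `count` most central nodes as the most important factors."""
    return [node for node, _ in degree_centrality(triples).most_common(count)]
```

A text that resembles the center characterizations closely gets a small reciprocal, while a text unlike all of them gets a large one, which is how this value can serve as one index in the constraint information.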
A patent quality evaluation model is constructed. A multi-scale encoder module extracts index features from the text characterization of the patent information to be predicted, giving the characterization of the text under each patent evaluation index variable in the constraint information. A cooperative attention mechanism then estimates the relative importance of these characterizations. In the attention calculation, the result is the attention distribution, the inputs are the characterizations of the text characterization corresponding to the j-th and n-th patent evaluation index variables, and m is the total number of characterizations. The index features before the attention mechanism and the index characterizations after the attention mechanism are connected correspondingly, the result is output through a multi-layer perceptron interaction network, and scoring is performed with the MSE evaluation index to obtain the quality evaluation result of the patent information to be predicted. The smaller the MSE evaluation index, the closer the predicted value output by the quality evaluation model is to the true value, and the better the quality of the patent information to be predicted is taken to be.
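The weighting and scoring steps can be illustrated with a short Python sketch. The attention formula is not recoverable from this text, so the sketch assumes a softmax normalization over one importance score per characterization; that choice, like the function names, is an assumption and not the patent's stated formula. The MSE function is the standard mean squared error.

```python
import math


def softmax(scores):
    """Normalize importance scores into an attention distribution (assumed form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_representation(representations, scores):
    """Attention-weighted sum of the m characterizations, one per index variable."""
    weights = softmax(scores)
    dim = len(representations[0])
    return [sum(w * rep[i] for w, rep in zip(weights, representations))
            for i in range(dim)]


def mse(predicted, target):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

With equal scores every characterization receives weight 1/m, and larger scores shift the weighted characterization toward the corresponding index variable.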
It should be noted that the historical quality evaluation results of an enterprise's patent texts are obtained, and a writing profile of the enterprise's patent texts is constructed from them. A personalized database is built from the writing profiles of different time periods. In the personalized database, quality evaluation indexes that differ markedly from the other quality evaluation indexes in the writing profile are marked, and an improvement direction for patent text writing is generated in the current writing workflow according to the marked indexes. Tracing is then carried out from the improvement direction based on an ant colony algorithm, and the influence factors of the abnormal quality evaluation indexes are obtained from the tracing paths. Optimization measures are retrieved according to these influence factors to improve the writing process for patent texts and technical disclosure documents, and the corresponding writing workflow is updated.
FIG. 4 shows a block diagram of a pre-application patent quality assessment system of the present invention based on cluster center characterization.
The invention also provides a pre-application patent quality assessment system 4 based on cluster center characterization, which comprises a memory 41, a processor 42, a user interaction module 43, an evaluation data set generation module 44, a quality evaluation module 45 and a data storage management module 46, wherein the memory 41 stores, and the processor 42 executes, a pre-application patent quality assessment method program based on cluster center characterization;
The user interaction module 43 allows the user to input key word groups that determine the patent data subset used for evaluation, serves as the evaluation input window through which the patent information to be predicted is entered, and returns and displays the evaluation result to the user after system evaluation;
The evaluation data set generation module 44 generates sub-data sets from the patent big data set according to the key word groups supplied by the user;
The quality evaluation module 45 is responsible for performing quality evaluation based on the patent information to be evaluated and the sub-data set;
The data storage management module 46 is responsible for storing the patent big data set and the patent subsets generated from user key word groups, which facilitates running non-real-time evaluation tasks.
The third aspect of the present invention also provides a computer readable storage medium, where the computer readable storage medium includes a pre-application patent quality assessment method program based on cluster center characterization, where the pre-application patent quality assessment method program based on cluster center characterization is executed by a processor, to implement the steps of the pre-application patent quality assessment method based on cluster center characterization as described in any one of the above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, may be located in one place or distributed on a plurality of network units, and may select some or all of the units according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as a unit, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of hardware plus a form of software functional unit.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
Alternatively, the above-described integrated units of the invention may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk or an optical disk.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A pre-application patent quality assessment method based on cluster center representation, characterized in that:
extracting keywords from the patent text input by the user for retrieval, generating in the patent big data a sub-dataset whose feature similarity meets a preset standard, and generating central representations of the sub-dataset through a clustering model;
extracting the patent information to be predicted from the patent text input by the user, and generating a text representation of the patent information to be predicted;
calculating the similarity between the patent information to be predicted and the central representations, and generating constraint information based on the similarity combined with patent quality indicators;
training a patent quality assessment model using the constraint information, and obtaining a multi-dimensional quality evaluation result for the patent input by the user through the patent quality assessment model;
wherein generating the constraint information based on the similarity combined with patent quality indicators specifically comprises:
calculating the similarity between the text representation of the patent information to be predicted and all central representations of the sub-dataset, calculating the cosine similarity between the embedding vectors after dimension alignment, and, when the cosine similarity is greater than a preset threshold, extracting the semantic features of the patent information to be predicted at the corresponding position and using the semantic features to correct the similarity;
traversing the patent information to be predicted to obtain all similarities, calculating the mean of their absolute values to generate an average similarity, and taking the reciprocal of the average similarity to generate one item of the constraint information;
using a big data engine to obtain patent quality evaluation examples, extracting patent quality evaluation indicators from the patent quality evaluation examples, and performing principal component analysis on the patent quality evaluation indicators to identify key influencing factors;
obtaining, according to the patent quality evaluation examples, the interaction relationships between the key influencing factors and the patent text and among different key influencing factors, forming triples based on the different interaction relationships and the attributes corresponding to the key influencing factors, and using a knowledge graph convolutional neural network to learn the graph structure and construct a knowledge graph;
obtaining, in the knowledge graph, the number of relationship edges directly connected to each key influencing factor to calculate the centrality of its node, using the centrality to characterize the importance of the key influencing factor, selecting a preset number of key influencing factors according to importance, and obtaining the corresponding indicator variables to compose the constraint information;
wherein training the patent quality assessment model using the constraint information specifically comprises:
constructing the patent quality assessment model, training corresponding encoders with the training data of the different categories of patent quality indicators in the constraint information, and using the encoders of the different patent quality indicators to extract indicator features from the text representation of the patent information to be predicted;
inputting the indicator features of the patent information to be predicted, combined with the reciprocal of the average similarity between the text representation and the central representations, into different multi-layer perceptrons to obtain a feature importance matrix, using collaborative attention to obtain the attention distribution over the feature importance distribution, and obtaining the representations of the patent information to be predicted under different constraints by weighted calculation;
fully connecting the indicator features with the weighted representations, outputting a vector through multi-layer perceptron interaction, converting the output vector into a probability distribution to obtain a prediction evaluation, and scoring with the MSE evaluation indicator to obtain the quality assessment result of the patent information to be predicted.
2. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that extracting keywords from the patent text input by the user for retrieval and generating in the patent big data a sub-dataset whose feature similarity meets a preset standard specifically comprises:
obtaining the patent text input by the user and performing word segmentation preprocessing to generate a serialized representation of the patent text, determining the part-of-speech tags of the word vectors in the serialized representation, and using the part-of-speech tags for sequence labeling;
using RoBERTa to trim, chunk and embed the serialized representation of the patent text to obtain the embedding vectors of the patent text, filtering preset phrases by the part-of-speech tags, matching and filtering the corresponding embedding vectors based on the position features of the preset phrases, and splicing the matched embedding vectors to obtain spliced embedding vectors;
introducing a self-attention mechanism into the embedding vectors of the patent text, strengthening the features of the embedding vectors by weighting with the self-attention weights, and introducing cross-attention between the spliced embedding vectors and the embedding vectors to obtain the neighborhood embedding vectors of the spliced embedding vectors and strengthen the contextual semantics;
obtaining the attention-encoded embedding vector sequence and neighborhood embedding vector sequence, calculating the similarity between the embedding vectors and the neighborhood embedding vectors in the sequences, and decoding the spliced embedding vectors whose similarity meets a preset similarity threshold as the keyword extraction result;
establishing a retrieval index based on the keywords, using the retrieval index to calculate keyword feature similarity in the massive patent big data, and obtaining the patent data that meets the preset similarity standard to construct a sub-dataset containing the keywords.
3. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that generating the central representations of the sub-dataset through a clustering model specifically comprises:
using the sparrow search algorithm to optimize the initial cluster centers of the sub-dataset, initializing the parameters of the sparrow search algorithm, calculating the fitness values in the sparrow population, and obtaining the best and worst fitness values and the corresponding positions;
selecting discoverers, joiners and scouts and updating their positions, introducing an adaptive t-distribution mutation into the sparrow position update, iteratively calculating fitness and updating the sparrow positions, and, after the maximum number of iterations is reached, outputting the best sparrow position to obtain the cluster center matrix;
obtaining the initial cluster centers from the cluster center matrix, using the Euclidean distance as the metric function, assigning the patent data in the sub-dataset to the nearest initial cluster center, and updating the cluster centers in the different clusters after all patent data have been assigned;
obtaining the final clustering result of the sub-dataset through iterative clustering, and generating the central representations of the sub-dataset from the resulting clusters.
4. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that extracting the patent information to be predicted from the patent text input by the user and generating a text representation of the patent information to be predicted specifically comprises:
intercepting the patent text input by the user according to preset paragraph positions and indicator keywords to generate the patent information to be predicted, extracting the embedding vectors of the patent text corresponding to the patent information to be predicted, and dividing them into word embedding vectors, sentence embedding vectors and paragraph embedding vectors;
importing the embedding vectors of the patent information to be predicted into a bidirectional long short-term memory network, introducing an attention mechanism to process the embedding vectors of different levels with a forward LSTM and a reverse LSTM, combining the forward and reverse results through the hidden layer, and outputting the semantic features corresponding to the embedding vectors of the patent information to be predicted;
matching the semantic features, according to the position encoding, with the embedding vectors of different levels corresponding to the patent information to be predicted to generate the text representation of the patent information to be predicted.
5. A pre-application patent quality assessment system based on cluster center representation, characterized in that it implements the pre-application patent quality assessment method based on cluster center representation according to any one of claims 1 to 4, the system comprising a memory, a processor, a user interaction module, an evaluation data set generation module, a quality evaluation module and a data storage management module, wherein the program of the pre-application patent quality assessment method based on cluster center representation is stored in the memory and executed by the processor;
the user interaction module is used for the user to input a keyword group that determines the patent data subset to be evaluated, to input the patent information to be predicted as the evaluation input window, and to return and display the evaluation result to the user after system evaluation;
the evaluation data set generation module generates sub-datasets from the patent big data set according to the keyword groups provided by the user;
the quality evaluation module is responsible for quality assessment based on the patent information to be assessed and the sub-datasets;
the data storage management module is responsible for storing the patent big data set and the patent subsets generated from user keyword groups, which facilitates running non-real-time evaluation tasks.
6. The pre-application patent quality assessment system based on cluster center representation according to claim 5, characterized in that the central representations of the sub-dataset are generated in the evaluation data set generation module, specifically:
using the sparrow search algorithm to optimize the initial cluster centers of the sub-dataset, initializing the parameters of the sparrow search algorithm, calculating the fitness values in the sparrow population, and obtaining the best and worst fitness values and the corresponding positions;
selecting discoverers, joiners and scouts and updating their positions, introducing an adaptive t-distribution mutation into the sparrow position update, iteratively calculating fitness and updating the sparrow positions, and, after the maximum number of iterations is reached, outputting the best sparrow position to obtain the cluster center matrix;
obtaining the initial cluster centers from the cluster center matrix, using the Euclidean distance as the metric function, assigning the patent data in the sub-dataset to the nearest initial cluster center, and updating the cluster centers in the different clusters after all patent data have been assigned;
obtaining the final clustering result of the sub-dataset through iterative clustering, and generating the central representations of the sub-dataset from the resulting clusters.
7. The pre-application patent quality assessment system based on cluster center representation according to claim 5, characterized in that the constraint information of the patent quality assessment model is obtained in the quality evaluation module, specifically:
calculating the text representation of the patent information to be predicted, all central representations of the sub-dataset, and their similarities, calculating the cosine similarity between the embedding vectors after dimension alignment, and, when the cosine similarity is greater than a preset threshold, extracting the semantic features of the patent information to be predicted at the corresponding position and using the semantic features to correct the similarity;
traversing the patent information to be predicted to obtain all similarities, calculating the mean of their absolute values to generate an average similarity, and taking the reciprocal of the average similarity to generate one item of the constraint information;
using a big data engine to obtain patent quality evaluation examples, extracting patent quality evaluation indicators from the patent quality evaluation examples, and performing principal component analysis on the patent quality evaluation indicators to identify key influencing factors;
obtaining, according to the patent quality evaluation examples, the interaction relationships between the key influencing factors and the patent text and among different key influencing factors, forming triples based on the different interaction relationships and the attributes corresponding to the key influencing factors, and using a knowledge graph convolutional neural network to learn the graph structure and construct a knowledge graph;
obtaining, in the knowledge graph, the number of relationship edges directly connected to each key influencing factor to calculate the centrality of its node, using the centrality to characterize the importance of the key influencing factor, selecting a preset number of key influencing factors according to importance, and obtaining the corresponding indicator variables to compose the constraint information.
8. The pre-application patent quality assessment system based on cluster center representation according to claim 5, characterized in that the patent quality assessment model in the quality evaluation module is specifically:
constructing the patent quality assessment model, training corresponding encoders with the training data of the different categories of patent quality indicators in the constraint information, and using the encoders of the different patent quality indicators to extract indicator features from the text representation of the patent information to be predicted;
inputting the indicator features of the patent information to be predicted, combined with the reciprocal of the average similarity between the text representation and the central representations, into different multi-layer perceptrons to obtain a feature importance matrix, using collaborative attention to obtain the attention distribution over the feature importance distribution, and obtaining the representations of the patent information to be predicted under different constraints by weighted calculation;
fully connecting the indicator features with the weighted representations, outputting a vector through multi-layer perceptron interaction, converting the output vector into a probability distribution to obtain a prediction evaluation, and scoring with the MSE evaluation indicator to obtain the quality assessment result of the patent information to be predicted.
CN202410610670.9A 2024-05-16 2024-05-16 Pre-application patent quality assessment method and system based on cluster center representation Active CN118229465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410610670.9A CN118229465B (en) 2024-05-16 2024-05-16 Pre-application patent quality assessment method and system based on cluster center representation

Publications (2)

Publication Number Publication Date
CN118229465A CN118229465A (en) 2024-06-21
CN118229465B true CN118229465B (en) 2025-02-11

Family

ID=91512045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410610670.9A Active CN118229465B (en) 2024-05-16 2024-05-16 Pre-application patent quality assessment method and system based on cluster center representation

Country Status (1)

Country Link
CN (1) CN118229465B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119397028A (en) * 2024-12-31 2025-02-07 山东华智人才科技有限公司 Patent multi-dimensional evaluation method and system based on large language model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196896A (en) * 2023-08-15 2023-12-08 北京理工大学 Patent text similarity prediction method based on structural representation decoupling

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102085217B1 (en) * 2019-10-14 2020-03-04 (주)디앤아이파비스 Method, apparatus and system for determining similarity of patent documents
CN112434151A (en) * 2020-11-26 2021-03-02 重庆知识产权大数据研究院有限公司 Patent recommendation method and device, computer equipment and storage medium
CN112632228A (en) * 2020-12-30 2021-04-09 深圳供电局有限公司 Text mining-based auxiliary bid evaluation method and system
CN117648444B (en) * 2024-01-30 2024-04-30 广东省华南技术转移中心有限公司 Patent clustering method and system based on graph convolution attribute aggregation

Also Published As

Publication number Publication date
CN118229465A (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN109376222A (en) Question and answer matching degree calculation method, question and answer automatic matching method and device
CN116629258B (en) Structured analysis method and system for judicial document based on complex information item data
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN118735217B (en) Bidirectional matching method and system for scientific research talents and R&D tasks based on graph representation
CN116402630B (en) Financial risk prediction method and system based on characterization learning
CN119049682B (en) Medical asset management big data analysis method and system
CN118229465B (en) Pre-application patent quality assessment method and system based on cluster center representation
CN118468061A (en) Automatic algorithm matching and parameter optimizing method and system
CN117891939A (en) Text classification method combining particle swarm algorithm with CNN convolutional neural network
CN119202300A (en) Remote sensing image cross-modal retrieval method, device and electronic equipment
CN114328894A (en) Document processing method, document processing device, electronic equipment and medium
CN117217277A (en) Pre-training method, device, equipment, storage medium and product of language model
CN118964641B (en) Method and system for building AI knowledge base model for enterprises
CN119166639A (en) Fast data visualization generation method and device based on large language model
CN116932487B (en) Quantized data analysis method and system based on data paragraph division
CN117851600A (en) Text data classification method, apparatus, computer device, storage medium and product
CN117971992A (en) Classification method and device for structured data
CN117390156A (en) Cross-modal-based question-answer dialogue method, system, equipment and storage medium
CN116720517B (en) Search word component recognition model construction method and search word component recognition method
CN116431788A (en) A Semantic Retrieval Method for Cross-modal Data
CN115269998A (en) Information recommendation method, device, electronic device and storage medium
CN117077680A (en) Question and answer intention recognition method and device
CN114580955A (en) A policy recommendation method, system, device and storage medium
CN113157892A (en) User intention processing method and device, computer equipment and storage medium
CN118886427B (en) A prompt word optimization method combining expert evaluation rules and large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant