CN116975068A - Metadata-based patent document data storage method, device and storage medium - Google Patents
- Publication number
- CN116975068A (application CN202311234829.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- patent document
- feature vector
- metadata
- piece
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Tourism & Hospitality (AREA)
- Evolutionary Computation (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Technology Law (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Resources & Organizations (AREA)
- Bioinformatics & Computational Biology (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a metadata-based patent document data storage method, device and storage medium. It relates to the technical field of metadata and addresses the problem that conventional methods cannot manage patent document data in a standardized way. The method comprises the following steps: extracting a plurality of pieces of data from a target patent document according to a patent document metadata template; determining the category of each piece of extracted data based on the document structure; traversing the extracted data, performing deep-learning-based semantic similarity calculation on data of the same category to determine the relationship between them, and merging data that are identical or similar; and importing the merged data into a storage table generated according to the patent document metadata template. By normalizing patent document data through metadata, the method facilitates unified management and application, lets users make full use of all data in the table, and thereby provides strong support for data analysis.
Description
Technical Field
The application relates to the technical field of metadata, and in particular to a metadata-based patent document data storage technology.
Background
Competition among modern enterprises is intensifying and takes many forms, and competition over intellectual property is one of its most important aspects. At present, many enterprises manage patent information in related or similar technical fields by recording it manually in software such as spreadsheets. Because the volume of patent information an enterprise must manage is large, and because manual recording practices and retrieval websites differ, the data can become inconsistent and are prone to tampering, loss and recording errors; this management mode depends heavily on manual work and involves many uncertain factors. Some management software products exist on the market, but their functions are complex and poorly suited to enterprise needs. A simple, easy-to-use, intelligent patent information management scheme is therefore needed to overcome these defects, reduce the cost of enterprise intellectual property management and improve its efficiency.
Disclosure of Invention
To solve the above technical defects, embodiments of the application provide a metadata-based patent document data storage method, a patent document data storage device, an electronic device and a storage medium.
An embodiment of a first aspect of the application provides a metadata-based patent document data storage method, comprising the following steps: extracting a plurality of pieces of data from a target patent document according to a patent document metadata template; determining the category of each piece of extracted data according to the document structure of the target patent document in which it is located; traversing the extracted data and performing deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding to first data and second data, respectively; comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and merging the first data and the second data when the relationship is identical or similar, where the first data and the second data belong to the same category and their relationship is one of identical, similar or irrelevant; and importing the merged data into a storage table generated according to the patent document metadata template.
In one possible implementation, the category includes one or more of the following: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
In one possible implementation, determining the category of each piece of extracted data further includes: performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of that data according to the semantic analysis result.
In one possible implementation, performing deep-learning-based cosine similarity calculation on the first feature vector and the second feature vector corresponding to the first data and the second data includes computing the cosine similarity of the first feature vector V_t1 and the second feature vector V_t2 with the following expression:
$$\mathrm{Sim}(T_1, T_2) = \cos\theta = \frac{V_{t1} \cdot V_{t2}}{\lVert V_{t1} \rVert \, \lVert V_{t2} \rVert} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where θ is the angle between the first feature vector V_t1 and the second feature vector V_t2; V_t1i and V_t2i are the i-th elements of V_t1 and V_t2, respectively; T_1 and T_2 are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and ‖V_t1‖ and ‖V_t2‖ are the norms (magnitudes) of V_t1 and V_t2, respectively.
In one possible implementation, the first feature vector V_t1 and the second feature vector V_t2 are obtained as follows: text is extracted from the first data and the second data to obtain a first text T1 and a second text T2; T1 and T2 are each preprocessed; and the preprocessed data are each pooled to obtain the first feature vector V_t1 and the second feature vector V_t2, each comprising n elements.
In one possible implementation, the method further includes: for the data in the storage table, completing missing data according to preset rules in combination with local data in the storage table, or receiving a user's supplement to or modification of the data in the storage table.
In one possible implementation, the method further includes: marking the supplemented or modified data.
An embodiment of the second aspect of the application further provides a metadata-based patent document data storage device, comprising: a data extraction module for extracting a plurality of pieces of data from the target patent document according to the patent document metadata template; a data classification module for determining the category of each piece of extracted data according to the document structure of the target patent document in which it is located; a data fusion module for traversing the extracted data, performing deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding to first data and second data, comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and merging the first data and the second data when the relationship is identical or similar, where the first data and the second data belong to the same category and their relationship is one of identical, similar or irrelevant; and a data storage module for importing the merged data into a storage table generated according to the patent document metadata template.
An embodiment of the third aspect of the present application further provides an electronic device, including: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method described above.
An embodiment of the fourth aspect of the application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
With the metadata-based patent document data storage method and device provided by the embodiments of the application, patent document data are standardized through metadata, which facilitates unified management and application, enables users to make full use of all data in the table, and provides strong support for data analysis.
Drawings
FIG. 1 is a schematic diagram of an electronic device 100 according to one embodiment of the application;
FIG. 2 is a flow chart of a metadata-based patent document data storage method 200 according to one embodiment of the present application;
FIG. 3 is a schematic structural view of a metadata-based patent document data storage device 300 according to one embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
Fig. 1 shows a schematic diagram of an electronic device 100 according to an embodiment of the application. Note that, the electronic device 100 shown in fig. 1 is only an example, and in practice, the electronic device used to implement the metadata-based patent document data storage method of the present application may be any type of device, and the hardware configuration may be the same as the electronic device 100 shown in fig. 1 or may be different from the electronic device 100 shown in fig. 1. In practice, the electronic device used to implement the metadata-based patent document data storage method of the present application may add or delete hardware components of the electronic device 100 shown in fig. 1, and the present application is not limited to the specific hardware configuration of the electronic device.
As shown in fig. 1, electronic device 100 typically includes one or more processors 110, and memory 120. Memory bus 130 may be used for communication between processor 110 and memory 120.
The memory 120 has stored therein operating system program instructions 121 and application program instructions 122, the application running on top of the operating system. When the electronic device 100 starts up, the processor 110 reads the operating system program instructions 121 from the memory 120 and executes them. When a user launches an application, the processor 110 reads and executes the application instructions 122 from the memory 120. The memory 120 also stores application data 123, where the application data 123 is data that may be used during the running process of the application, such as a table.
In the electronic device 100 according to the application, the application program instructions 122 include computer program instructions for performing the metadata-based patent document data storage method 200 of the application, which instruct the processor 110 to perform the method 200.
Fig. 2 shows a flowchart of a metadata-based patent document data storage method 200 according to one embodiment of the present application, the method 200 being performed in an electronic device (e.g., the aforementioned electronic device 100). As shown in fig. 2, the method 200 begins at step S210.
S210, extracting a plurality of pieces of data in the target patent document according to the patent document metadata template.
S220, determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located.
S230, traversing each piece of extracted data, performing deep-learning-based semantic similarity calculation on data of the same category to determine the relationship between them, and merging data that are identical or similar. Specifically, deep-learning-based cosine similarity calculation is performed on a first feature vector and a second feature vector corresponding to first data and second data; the result is compared with a preset threshold to determine the relationship between the first data and the second data, and the two are merged when the relationship is identical or similar. The first data and the second data belong to the same category, and their relationship is one of identical, similar or irrelevant.
S240, importing the combined data into a storage table generated according to the patent document metadata template.
The metadata template in the embodiments of the application is a table-form template that defines data of multiple categories found in patent documents. The categories include, but are not limited to: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information. In practice, the metadata template may further include information such as the number of drawings and the abstract drawing. After a user requests storage of patent document data, the metadata-based patent document data storage method 200 generates a storage table according to the patent document metadata template.
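As an illustration, the storage table could be generated from the template by creating one column per category. The sketch below assumes an SQLite database; the column identifiers, the table name and the `create_storage_table` helper are hypothetical, since the method does not specify a storage engine or schema.

```python
import sqlite3

# Hypothetical template: (column identifier, category label). The labels follow
# the categories listed above; the identifiers are illustrative assumptions.
PATENT_METADATA_TEMPLATE = [
    ("name", "Name"),
    ("designer", "Designer"),
    ("applicant", "Applicant"),
    ("application_number", "Application number"),
    ("application_date", "Application date"),
    ("class_number", "Class number"),
    ("technical_problem", "Technical problem information"),
    ("design_intent", "Design intent information"),
    ("design_demonstration", "Design demonstration information"),
    ("design_scheme", "Design scheme information"),
    ("pros_and_cons", "Advantage and disadvantage information"),
    ("abstract_drawing", "Abstract drawing"),
]


def create_storage_table(conn: sqlite3.Connection,
                         table: str = "patent_documents") -> list[str]:
    """Create one TEXT column per template category and return the column names."""
    columns = [col for col, _label in PATENT_METADATA_TEMPLATE]
    # Identifiers come from the fixed template above, not from user input,
    # so assembling the DDL with an f-string is acceptable here.
    ddl = ", ".join(f"{col} TEXT" for col in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, {ddl})")
    return columns


conn = sqlite3.connect(":memory:")
cols = create_storage_table(conn)
print(len(cols))  # 12
```

Keeping the template as data means a new category (for example, the number of drawings) only requires one more entry rather than a schema rewrite.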
A plurality of related pieces of data are extracted from the patent document according to the patent document metadata template. Each structural section of a patent document usually describes a single core topic; for example, the background section describes only the problems in the prior art and no other topic. The background section may contain both positive and negative statements, and both can be treated as detailed descriptions of prior-art problems. Similarly, the final part of the summary usually demonstrates the technical effects of the patent document, covering design demonstration information and advantage and disadvantage information. In step S220, the category of the extracted data is therefore determined from these structural characteristics of the patent document.
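A minimal sketch of structure-based categorization, assuming the document has already been split into sections keyed by heading. The heading strings and the `SECTION_TO_CATEGORY` mapping are hypothetical; sections that mix topics, such as the detailed description, are intentionally left unmapped so that they return no category.

```python
from typing import Optional

# Hypothetical heading-to-category rules; a real template would define its own.
SECTION_TO_CATEGORY = {
    "background": "technical_problem",
    "disclosure of invention": "design_scheme",
    "beneficial effects": "pros_and_cons",
}


def category_from_structure(section_heading: str) -> Optional[str]:
    """Return the category implied by the section heading, or None if the
    structure alone cannot decide (e.g. the multi-topic detailed description)."""
    return SECTION_TO_CATEGORY.get(section_heading.strip().lower())


print(category_from_structure("Background"))            # technical_problem
print(category_from_structure("Detailed Description"))  # None
```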
In the embodiments of the application, after the category of the extracted data is determined according to the document structure of the target patent document in which it is located, step S220 further includes: performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of that data according to the semantic analysis result.
For patent documents, in the detailed description section, a plurality of subjects or categories may be described, and when the category of the corresponding data cannot be determined according to the document structure, the category of the corresponding data may be determined by using a semantic analysis method.
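When the structure gives no answer, a semantic classifier takes over. The description does not specify the semantic analysis method, so the sketch below substitutes a deliberately simple cue-word overlap score as a hypothetical stand-in; a production system would more plausibly reuse the deep-learning model described later. `CATEGORY_CUES` and `category_from_semantics` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical cue words per category; a stand-in for real semantic analysis.
CATEGORY_CUES = {
    "technical_problem": {"problem", "defect", "drawback", "insufficient"},
    "pros_and_cons": {"advantage", "disadvantage", "benefit", "improve"},
    "design_scheme": {"step", "module", "comprising", "configured"},
}


def category_from_semantics(text: str) -> Optional[str]:
    """Pick the category whose cue words overlap the text most; None if none match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & cues) for cat, cues in CATEGORY_CUES.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first category listed
    return best if scores[best] > 0 else None


print(category_from_semantics("The main advantage is to improve efficiency"))  # pros_and_cons
```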
Entities and relationships are often expressed repeatedly within the same patent document, using different sentence patterns, grammar or synonyms to convey the same meaning. Such varied natural-language expressions do not create semantic contradictions or confuse the designer's concepts. In the embodiments of the application, however, data of the same category are extracted from different structural sections and may repeat the same meaning in different wording, so the extracted data need to be fused. Accordingly, the deep-learning-based semantic similarity calculation on data of the same category in step S230 includes: calculating the cosine similarity of the first feature vector V_t1 and the second feature vector V_t2 corresponding to first data and second data of the same category; and comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data.
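The relationship test and the merging step can be sketched as follows. The two thresholds are hypothetical placeholders (the description says they are determined during model training), and the greedy grouping, which compares each piece only with the first member of each existing group, is one simple way to merge rather than necessarily the author's procedure. A toy word-overlap score stands in for the deep-learning similarity.

```python
from typing import Callable

# Hypothetical thresholds; per the description they come from model training.
IDENTICAL_THRESHOLD = 0.95
SIMILAR_THRESHOLD = 0.80


def relation(sim: float) -> str:
    """Map a cosine similarity to one of the three relationships."""
    if sim >= IDENTICAL_THRESHOLD:
        return "identical"
    if sim >= SIMILAR_THRESHOLD:
        return "similar"
    return "irrelevant"


def merge_same_category(items: list[str],
                        similarity: Callable[[str, str], float]) -> list[list[str]]:
    """Greedily group same-category pieces: a piece joins the first group whose
    first member is identical or similar to it, otherwise it starts a new group."""
    groups: list[list[str]] = []
    for item in items:
        for group in groups:
            if relation(similarity(group[0], item)) != "irrelevant":
                group.append(item)
                break
        else:  # no matching group was found
            groups.append([item])
    return groups


def toy_similarity(a: str, b: str) -> float:
    """Jaccard word overlap, standing in for the deep-learning model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


pieces = ["low retrieval efficiency", "Low retrieval efficiency",
          "manual records are error prone"]
print(merge_same_category(pieces, toy_similarity))
# [['low retrieval efficiency', 'Low retrieval efficiency'], ['manual records are error prone']]
```

Each resulting group would then be written to a single table cell; which member represents the group is a further design choice the description leaves open.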
In the embodiments of the application, for first data A1 and second data A2 of the same category, text is extracted from A1 and A2 to obtain a first text T1 and a second text T2; T1 and T2 are each processed by a deep-learning-based model, and the processed outputs are each pooled to obtain a first feature vector V_t1 and a second feature vector V_t2 of a preset length. The cosine similarity of V_t1 and V_t2 is then calculated and compared with a preset threshold to determine the relationship between the data of the same category.
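The description says only that the model output is pooled into a vector of preset length. Mean pooling over token embeddings, shown below, is a common choice for sentence-embedding models and is assumed here; `mean_pool` and its input format (one vector per token plus a 0/1 attention mask marking real tokens versus padding) are illustrative assumptions.

```python
def mean_pool(token_embeddings: list[list[float]], mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1, giving one fixed-length vector."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, mask) if m]
    if not kept:  # all tokens are padding
        return [0.0] * dim
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```

Whatever the text length, the result has the model's embedding dimension, which is what makes the element-wise cosine formula applicable.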
In the embodiments of the application, the relationship between data is one of identical, similar and irrelevant, and the deep-learning-based model is trained as follows: a preset number of patent documents are acquired; data of the same category within each patent document are compared pairwise; a sentence-transformer neural network based on a deep learning model converts each same-category data pair into a feature vector pair; and cosine similarity of the text semantics is used to determine the similarity coefficient of the deep-learning-based model and the corresponding thresholds.
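One plausible way to derive a threshold from labelled training pairs is to choose the cut-off that maximizes classification accuracy. This is an assumed procedure, not one stated in the description; with three relationships (identical, similar, irrelevant) it would be run once per boundary. `calibrate_threshold` is a hypothetical name.

```python
def calibrate_threshold(scored_pairs: list[tuple[float, bool]]) -> float:
    """Return the cut-off t maximizing accuracy of the rule 'related iff sim >= t'.

    scored_pairs holds (cosine similarity, human label) for each training pair.
    """
    if not scored_pairs:
        raise ValueError("need at least one labelled pair")
    best_t, best_acc = 1.0, -1.0
    for t in sorted({s for s, _ in scored_pairs}):  # candidate cut-offs
        correct = sum((s >= t) == label for s, label in scored_pairs)
        acc = correct / len(scored_pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


pairs = [(0.95, True), (0.88, True), (0.60, False), (0.40, False)]
print(calibrate_threshold(pairs))  # 0.88
```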
In the embodiments of the application, calculating the cosine similarity of the first feature vector V_t1 and the second feature vector V_t2 corresponding to first data and second data of the same category uses the following expression:
$$\mathrm{Sim}(T_1, T_2) = \cos\theta = \frac{V_{t1} \cdot V_{t2}}{\lVert V_{t1} \rVert \, \lVert V_{t2} \rVert} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where θ is the angle between the first feature vector V_t1 and the second feature vector V_t2; V_t1i and V_t2i are the i-th elements of V_t1 and V_t2, respectively; T_1 and T_2 are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and ‖V_t1‖ and ‖V_t2‖ are the norms (magnitudes) of V_t1 and V_t2, respectively.
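The cosine formula translates directly into code. This plain-Python sketch assumes the feature vectors are given as equal-length lists of floats; `cosine_similarity` is an illustrative name.

```python
import math


def cosine_similarity(v1: list[float], v2: list[float]) -> float:
    """Sim(T1, T2) = sum_i v1[i] * v2[i] / (||v1|| * ||v2||)."""
    if len(v1) != len(v2):
        raise ValueError("feature vectors must have the same length n")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # approximately 0.7071
```

Because the result depends only on the angle θ, two texts whose vectors point the same way score 1 regardless of vector length.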
The metadata-based patent document data storage method 200 provided by the embodiments of the application may further include: for the data in the storage table, completing missing data according to preset rules in combination with local data in the storage table, or receiving a user's supplement to or modification of the data in the storage table.
In the embodiments of the application, the metadata-based patent document data storage method 200 further includes: marking the supplemented or modified data.
For various reasons, data of some categories may not be extracted; for example, the design demonstration information may be missing, or the abstract drawing cell in the table may be empty. To facilitate subsequent data analysis, the embodiments of the application can complete these empty cells using rules such as the following.
1. Design intent information, design demonstration information, advantage and disadvantage information and similar data are completed using other patent documents of the same designer and/or applicant. For example, if the first step of the scheme in the current patent document is to acquire information A and later steps process or analyze information A, and a scheme for acquiring information A is set out in an earlier patent document filed by the same designer and applicant, that earlier document is used to complete the missing data.
2. Technical problem information is completed using the citation relationships in the patent document. For example, if the background section cites another prior-art patent document and gives its application number, title and similar details, the technical problem information is completed according to the cited content in the current patent document.
3. Information such as the abstract drawing is completed with default values. For example, if a patent document has no drawings, or has drawings but no abstract drawing, the empty cell is filled with a default value.
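Rule 2 first needs the documents cited in the background section. The sketch below only locates cited Chinese patent numbers with regular expressions; the two patterns are assumptions based on number formats like CN202311234829.3 (application) and CN116975068A (publication), and retrieving the cited documents and summarizing their technical problems is left out. `cited_documents` is a hypothetical helper.

```python
import re

# Assumed patterns: application numbers such as CN202311234829.3 and
# publication numbers such as CN116975068A.
APPLICATION_NO = re.compile(r"\bCN\d{12}\.[0-9X]\b")
PUBLICATION_NO = re.compile(r"\bCN\d{9}[A-Z]\d?\b")


def cited_documents(background: str) -> list[str]:
    """Return cited application/publication numbers in order of appearance."""
    found = [(m.start(), m.group())
             for pattern in (APPLICATION_NO, PUBLICATION_NO)
             for m in pattern.finditer(background)]
    return [number for _pos, number in sorted(found)]


text = "As disclosed in CN202311234829.3 and CN116975068A, records are kept manually."
print(cited_documents(text))  # ['CN202311234829.3', 'CN116975068A']
```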
Supplemented data are less reliable than data actually extracted from the document, so such predicted data in the table may be marked to distinguish them from other information.
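Rule 3 together with the marking step might look like the following sketch, in which every cell records where its value came from so that supplemented and user-modified cells can be told apart from extracted ones. The `Cell` class, the `source` labels and the default value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default values for categories completed by rule 3.
DEFAULT_VALUES = {"abstract_drawing": "none"}


@dataclass
class Cell:
    value: Optional[str]
    source: str = "extracted"  # "extracted", "default" or "user"

    @property
    def marked(self) -> bool:
        """Supplemented or user-modified cells are marked as less reliable."""
        return self.source != "extracted"


def fill_defaults(row: dict) -> dict:
    """Fill empty template cells with default values and mark them."""
    for column, default in DEFAULT_VALUES.items():
        cell = row.get(column)
        if cell is None or not cell.value:
            row[column] = Cell(default, source="default")
    return row


def apply_user_edit(row: dict, column: str, value: str) -> None:
    """Record a user's supplement or modification; the cell stays marked."""
    row[column] = Cell(value, source="user")


row = {"name": Cell("Patent data storage method"), "abstract_drawing": Cell(None)}
fill_defaults(row)
apply_user_edit(row, "designer", "Designer A")
print(sorted(col for col, cell in row.items() if cell.marked))
# ['abstract_drawing', 'designer']
```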
Embodiments of the present application also provide a metadata-based patent document data storage apparatus 300 capable of performing the respective step processes of the metadata-based patent document data storage method 200 as described above. The above-described apparatus 300 is described below in connection with fig. 3.
As shown in fig. 3, the apparatus 300 includes a data extraction module 310, a data classification module 320, a data fusion module 330, and a data storage module 340.
The data extraction module 310 is configured to extract a plurality of pieces of data from the target patent document according to the patent document metadata template; the data classification module 320 is configured to determine the category of each piece of extracted data according to the document structure of the target patent document in which it is located; the data fusion module 330 is configured to traverse the extracted data, perform deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding to first data and second data, compare the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and merge the first data and the second data when the relationship is identical or similar, where the first data and the second data belong to the same category and their relationship is one of identical, similar or irrelevant; and the data storage module 340 is configured to import the merged data into a storage table generated according to the patent document metadata template.
As a preferred embodiment of the present application, the categories in the data classification module 320 include one or more of the following: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
As a preferred embodiment of the application, the data classification module 320 is further configured to perform semantic analysis on data whose category cannot be determined from the document structure, and to determine the category of that data according to the semantic analysis result.
As a preferred embodiment of the application, the deep-learning-based semantic similarity calculation performed by the data fusion module 330 on data of the same category includes: calculating the cosine similarity of the first feature vector V_t1 and the second feature vector V_t2 corresponding to first data and second data of the same category; and comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data.
As a preferred embodiment of the present application, the data fusion module 330 calculates the cosine similarity between the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$, corresponding respectively to the first data and the second data of the same category, using the following expression:
$$\mathrm{Sim}(T_1, T_2) = \cos\theta = \frac{V_{t1} \cdot V_{t2}}{\lVert V_{t1} \rVert \, \lVert V_{t2} \rVert} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where $\theta$ is the angle between the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$; $V_{t1i}$ and $V_{t2i}$ are the $i$-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; $n$ is the number of elements in each feature vector; and $\lVert V_{t1} \rVert$ and $\lVert V_{t2} \rVert$ are the moduli (Euclidean norms) of $V_{t1}$ and $V_{t2}$, respectively.
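The expression above translates directly into code. The source mentions only "a preset threshold value" while distinguishing three relationships, so the sketch assumes two hypothetical thresholds (0.95 for "same", 0.80 for "similar"); returning 0.0 for a zero vector is also an assumption, since the formula is undefined there.

```python
import math
from typing import Sequence

# Hypothetical thresholds; the source only says "a preset threshold value".
SAME_THRESHOLD = 0.95
SIMILAR_THRESHOLD = 0.80


def cosine_similarity(v_t1: Sequence[float], v_t2: Sequence[float]) -> float:
    """Sim(T1, T2) = sum(V_t1i * V_t2i) / (||V_t1|| * ||V_t2||)."""
    if len(v_t1) != len(v_t2):
        raise ValueError("feature vectors must have the same number of elements n")
    dot = sum(a * b for a, b in zip(v_t1, v_t2))
    norm1 = math.sqrt(sum(a * a for a in v_t1))
    norm2 = math.sqrt(sum(b * b for b in v_t2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention: the formula is undefined for a zero vector
    return dot / (norm1 * norm2)


def relation(v_t1: Sequence[float], v_t2: Sequence[float],
             same: float = SAME_THRESHOLD, similar: float = SIMILAR_THRESHOLD) -> str:
    """Map the cosine similarity to 'same', 'similar', or 'irrelevant'."""
    sim = cosine_similarity(v_t1, v_t2)
    if sim >= same:
        return "same"
    if sim >= similar:
        return "similar"
    return "irrelevant"
```

As a worked example, for $V_{t1} = (1, 2, 2)$ and $V_{t2} = (2, 1, 2)$ the dot product is $2 + 2 + 4 = 8$ and both moduli equal $3$, so $\cos\theta = 8/9 \approx 0.889$, which falls in the "similar" band under these assumed thresholds.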
As a preferred embodiment of the present application, the metadata-based patent document data storage apparatus 300 further includes a data editing module, configured to complete the data in the storage table according to a preset rule in combination with the existing partial data in the storage table, or to receive a supplement to or modification of the data in the storage table from a user.
As a preferred embodiment of the present application, the metadata-based patent document data storage apparatus 300 further includes a data marking module, configured to mark supplemented or modified data.
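The data editing and data marking modules could cooperate as in the sketch below. The preset rule shown (fill an empty field from another row of the storage table that shares the same application number) is a hypothetical example of "combining the existing partial data in the storage table"; the row-as-dict representation and the `marks` field are likewise assumptions.

```python
from __future__ import annotations


def supplement(rows: list[dict], key: str = "application_number") -> None:
    """Hypothetical preset rule: fill empty fields of a row from other rows in the
    storage table that share the same key, marking each filled field."""
    groups: dict[object, list[dict]] = {}
    for row in rows:
        if row.get(key) is not None:
            groups.setdefault(row[key], []).append(row)
    for group in groups.values():
        for row in group:
            for other in group:
                if other is row:
                    continue
                for col, val in other.items():
                    if col == "marks" or val in (None, ""):
                        continue  # skip bookkeeping and values that cannot help
                    if row.get(col) in (None, ""):
                        row[col] = val
                        row.setdefault("marks", {})[col] = "supplemented"


def edit(row: dict, col: str, value: str) -> None:
    """Apply a user's edit: filling an empty field is a supplement, else a modification."""
    previous = row.get(col)
    row[col] = value
    row.setdefault("marks", {})[col] = "supplemented" if previous in (None, "") else "modified"
```

Keeping the marks alongside the data lets a reviewer distinguish extracted values from completed or user-edited ones.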
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods, apparatus and devices of the present application.
While the application has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised within the scope of the application as described herein. In addition, various modifications and alterations of this application may be made by those skilled in the art without departing from the spirit and scope of this application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or their equivalents.
Claims (10)
1. A patent document data storage method based on metadata, characterized by comprising the steps of:
extracting a plurality of pieces of data in a target patent document according to a patent document metadata template;
determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located;
traversing each piece of extracted data, and performing a deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding respectively to first data and second data; comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and merging the first data and the second data when their relationship is "same" or "similar"; wherein the first data and the second data belong to the same category, and the relationship between them is one of same, similar, or irrelevant;
and importing the combined data into a storage table generated according to the patent document metadata template.
2. The method of claim 1, wherein the categories include one or more of: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
3. The method of claim 2, wherein determining the category of each piece of extracted data based on the document structure of the target patent document in which each piece of extracted data is located further comprises:
performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of such data according to the semantic analysis result.
4. The method of claim 1, wherein performing deep learning based cosine similarity calculation on the first feature vector and the second feature vector corresponding to the first data and the second data comprises:
performing the cosine similarity calculation on the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ using the following expression:
$$\mathrm{Sim}(T_1, T_2) = \cos\theta = \frac{V_{t1} \cdot V_{t2}}{\lVert V_{t1} \rVert \, \lVert V_{t2} \rVert} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}};$$
wherein $\theta$ is the angle between the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$; $V_{t1i}$ and $V_{t2i}$ are the $i$-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; $n$ is the number of elements in each feature vector; and $\lVert V_{t1} \rVert$ and $\lVert V_{t2} \rVert$ are the moduli (Euclidean norms) of $V_{t1}$ and $V_{t2}$, respectively.
5. The method of claim 4, wherein the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ are obtained by the following steps:
extracting text from the first data and the second data to obtain a first text $T_1$ and a second text $T_2$, respectively; preprocessing the first text $T_1$ and the second text $T_2$, respectively; and pooling the respective preprocessed data to obtain the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$, each comprising $n$ elements.
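The text-to-vector steps of claim 5 (extract text, preprocess, pool into $n$ elements) can be sketched as follows. The claim does not name the encoder or the pooling type, so this sketch makes several labeled assumptions: preprocessing is lowercasing plus `\w+` tokenization (Chinese text would need a word segmenter instead), each token's embedding is a deterministic pseudo-embedding derived from SHA-256 as a stand-in for a deep-learning model, pooling is mean pooling, and $n = 8$ is an arbitrary illustrative size.

```python
import hashlib
import re

N = 8  # hypothetical number of feature-vector elements n


def preprocess(text: str) -> list[str]:
    """Assumed preprocessing: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def token_embedding(token: str, n: int = N) -> list[float]:
    """Stand-in for a deep-learning encoder: values in [-1, 1] from a SHA-256 digest."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(digest[i % len(digest)] / 255.0) * 2 - 1 for i in range(n)]


def feature_vector(text: str, n: int = N) -> list[float]:
    """Mean pooling over token embeddings yields a vector with n elements."""
    tokens = preprocess(text)
    if not tokens:
        return [0.0] * n
    embeddings = [token_embedding(t, n) for t in tokens]
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(n)]
```

With a real encoder, only `token_embedding` would change; the pooling step still reduces a variable number of token vectors to a fixed-length $V_{t1}$ or $V_{t2}$ that the cosine formula of claim 4 can compare.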
6. The method of any one of claims 1 to 5, wherein the method further comprises:
completing the data in the storage table according to a preset rule, in combination with the existing partial data in the storage table; or
receiving a supplement to or modification of the data in the storage table from a user.
7. The method of claim 6, wherein the method further comprises:
the supplemental or modified data is marked.
8. A metadata-based patent document data storage device, comprising:
a data extraction module for extracting a plurality of pieces of data in the target patent document according to the patent document metadata template;
the data classification module is used for determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located;
the data fusion module is used for traversing each piece of extracted data, and performing a deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding respectively to first data and second data; comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and merging the first data and the second data when their relationship is "same" or "similar"; wherein the first data and the second data belong to the same category, and the relationship between them is one of same, similar, or irrelevant;
and the data storage module is used for importing the combined data into a storage table generated according to the patent document metadata template.
9. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, the computer program being executed by a processor to implement the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311234829.3A CN116975068A (en) | 2023-09-25 | 2023-09-25 | Metadata-based patent document data storage method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311234829.3A CN116975068A (en) | 2023-09-25 | 2023-09-25 | Metadata-based patent document data storage method, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116975068A (en) | 2023-10-31 |
Family ID: 88473478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311234829.3A Pending CN116975068A (en) | 2023-09-25 | 2023-09-25 | Metadata-based patent document data storage method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116975068A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808524A (en) * | 2016-03-11 | 2016-07-27 | 江苏畅远信息科技有限公司 | Patent document abstract-based automatic patent classification method |
CN107122382A (en) * | 2017-02-16 | 2017-09-01 | 江苏大学 | A kind of patent classification method based on specification |
CN112257419A (en) * | 2020-11-06 | 2021-01-22 | 开普云信息科技股份有限公司 | Intelligent retrieval method and device for calculating patent document similarity based on word frequency and semantics, electronic equipment and storage medium thereof |
CN115481636A (en) * | 2022-09-14 | 2022-12-16 | 电子科技大学 | Technical efficacy matrix construction method for technical literature |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117236435A (en) * | 2023-11-08 | 2023-12-15 | 中国标准化研究院 | Knowledge fusion method, device and storage medium of design rationality knowledge network |
CN117236435B (en) * | 2023-11-08 | 2024-01-30 | 中国标准化研究院 | Knowledge fusion method, device and storage medium of design rationality knowledge network |
Similar Documents
Publication | Title |
---|---|
US20190228320A1 (en) | Method, system and terminal for normalizing entities in a knowledge base, and computer readable storage medium |
US11651014B2 (en) | Source code retrieval |
US9189504B2 (en) | Application source code scanning for database migration |
CN112131881B (en) | Information extraction method and device, electronic equipment and storage medium |
CN111144370B (en) | Document element extraction method, device, equipment and storage medium |
US11556812B2 (en) | Method and device for acquiring data model in knowledge graph, and medium |
WO2019028990A1 (en) | Code element naming method, device, electronic equipment and medium |
CN111858913A (en) | Method and system for automatically generating text abstract |
CN110750977B (en) | Text similarity calculation method and system |
WO2024067276A1 (en) | Video tag determination method and apparatus, device and medium |
CN111177328B (en) | Question-answer matching system and method, question-answer processing device and medium |
WO2022042297A1 (en) | Text clustering method, apparatus, electronic device, and storage medium |
CN112613315B (en) | Text knowledge automatic extraction method, device, equipment and storage medium |
CN116975068A (en) | Metadata-based patent document data storage method, device and storage medium |
CN107958068B (en) | Language model smoothing method based on entity knowledge base |
CN114676705B (en) | Dialogue relation processing method, computer and readable storage medium |
CN113741864A (en) | Automatic design method and system of semantic service interface based on natural language processing |
CN116029280A (en) | Method, device, computing equipment and storage medium for extracting key information of document |
CN113705207A (en) | Grammar error recognition method and device |
CN113722431B (en) | Named entity relationship identification method and device, electronic equipment and storage medium |
CN114996360B (en) | Data analysis method, system, readable storage medium and computer equipment |
CN115964474A (en) | Policy keyword extraction method and device, storage medium and electronic equipment |
CN116757205A (en) | Entity relation extraction method and device based on ontology knowledge enhancement |
US11423228B2 (en) | Weakly supervised semantic entity recognition using general and target domain knowledge |
CN115759085A (en) | Information prediction method and device based on prompt model, electronic equipment and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20231031 |