CN116975068A - Metadata-based patent document data storage method, device and storage medium - Google Patents


Info

Publication number
CN116975068A
Authority
CN
China
Legal status
Pending
Application number
CN202311234829.3A
Other languages
Chinese (zh)
Inventor
孙广芝
王淑敏
隋媛
李岭岭
Current Assignee
China National Institute of Standardization
Original Assignee
China National Institute of Standardization
Application filed by China National Institute of Standardization
Priority to CN202311234829.3A
Publication of CN116975068A
Current legal status: Pending


Classifications

    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/35 Clustering; Classification
    • G06F16/383 Retrieval characterised by using metadata automatically derived from the content
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F40/30 Semantic analysis
    • G06N3/08 Learning methods
    • G06Q50/184 Intellectual property management


Abstract

The application provides a metadata-based patent document data storage method, device and storage medium. It relates to the technical field of metadata and addresses the problem that conventional methods cannot manage patent document data in a standardized way. The method comprises the following steps: extracting a plurality of pieces of data from a target patent document according to a patent document metadata template; determining the category of each piece of extracted data based on the document structure; traversing the extracted data, performing deep-learning-based semantic similarity calculation on data of the same category to determine the relationship between them, and merging identical or similar data; and importing the merged data into a storage table generated according to the patent document metadata template. By normalizing patent document data through metadata, the method facilitates unified management and lets users make full use of all data in the table, providing strong support for data analysis.

Description

Metadata-based patent document data storage method, device and storage medium
Technical Field
The application relates to the technical field of metadata, and in particular to a metadata-based patent document data storage technology.
Background
Competition among modern enterprises is intensifying and takes many forms, with competition over intellectual property being one of the most important. Most enterprises currently manage patent information in related or similar technical fields by recording it manually in software such as spreadsheets. Because the volume of patent information an enterprise must manage is large, and because manual recording practices and retrieval websites differ, the recorded data tend to be inconsistent and are prone to tampering, loss and entry errors; this management approach relies too heavily on manual work and carries many uncertainties. Some management software products are available on the market, but their functions are complex and ill-suited to enterprise needs. A simple, easy-to-use, intelligent patent information management scheme is therefore needed to overcome these shortcomings, reduce the cost of enterprise intellectual property management and improve its efficiency.
Disclosure of Invention
To address these technical shortcomings, embodiments of the application provide a metadata-based patent document data storage method, a metadata-based patent document data storage device, an electronic device and a storage medium.
An embodiment of a first aspect of the present application provides a metadata-based patent document data storage method, including the following steps: extracting a plurality of pieces of data from a target patent document according to a patent document metadata template; determining the category of each piece of extracted data according to the document structure of the target patent document in which it is located; traversing the extracted data and, for first data and second data of the same category, performing a deep-learning-based cosine similarity calculation on the corresponding first feature vector and second feature vector; comparing the cosine similarity result with a preset threshold to determine whether the relationship between the first data and the second data is same, similar or unrelated, and merging the first data and the second data when they are the same or similar; and importing the merged data into a storage table generated according to the patent document metadata template.
In one possible implementation, the category includes one or more of the following: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
In one possible implementation, determining the category of each piece of extracted data further includes: performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of that data according to the result of the semantic analysis.
In one possible implementation, performing the deep-learning-based cosine similarity calculation on the first feature vector and the second feature vector corresponding to the first data and the second data includes computing the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ by the following expression:
$$Sim(T_1, T_2) = \cos(\theta) = \frac{V_{t1} \cdot V_{t2}}{\|V_{t1}\| \, \|V_{t2}\|} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where θ is the angle between $V_{t1}$ and $V_{t2}$; $V_{t1i}$ and $V_{t2i}$ are the i-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and $\|V_{t1}\|$ and $\|V_{t2}\|$ are the norms (magnitudes) of $V_{t1}$ and $V_{t2}$, respectively.
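The expression above can be computed directly from its element-wise form. The following is a minimal Python sketch using only the standard library; the function name `cosine_similarity` is illustrative and not taken from the application, and rejecting zero vectors is an added assumption because the formula is undefined when either norm is zero.

```python
import math
from typing import Sequence


def cosine_similarity(v_t1: Sequence[float], v_t2: Sequence[float]) -> float:
    """Sim(T1, T2) = sum(V_t1i * V_t2i) / (||V_t1|| * ||V_t2||)."""
    if len(v_t1) != len(v_t2):
        raise ValueError("feature vectors must have the same number of elements n")
    dot = sum(a * b for a, b in zip(v_t1, v_t2))   # numerator: inner product
    norm1 = math.sqrt(sum(a * a for a in v_t1))    # ||V_t1||
    norm2 = math.sqrt(sum(b * b for b in v_t2))    # ||V_t2||
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


print(round(cosine_similarity([1.0, 2.0, 2.0], [2.0, 1.0, 2.0]), 4))  # 0.8889
```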
In one possible implementation, the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ are obtained as follows: text is extracted from the first data and the second data to obtain a first text T1 and a second text T2; T1 and T2 are each preprocessed; and the preprocessed data are each pooled to obtain the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$, each containing n elements.
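The application states only that the texts are preprocessed and then pooled into n-element vectors; it does not name the encoder or the pooling operator. The sketch below assumes lower-casing with word tokenization as preprocessing and element-wise mean pooling, and replaces the deep-learning encoder with a hypothetical hash-based `toy_token_embedding` so the example runs without a model. The vectors it produces illustrate the data flow only and carry no semantic meaning.

```python
import hashlib
import re
from typing import List

N = 8  # hypothetical preset feature-vector length n (at most 32 in this sketch)


def preprocess(text: str) -> List[str]:
    """Assumed preprocessing: lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def toy_token_embedding(token: str, n: int = N) -> List[float]:
    """Stand-in for a deep-learning encoder: a deterministic pseudo-embedding
    built from a SHA-256 hash. It fixes the shapes but carries no semantics."""
    if not 0 < n <= 32:
        raise ValueError("this sketch supports 1 <= n <= 32")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(digest[i] - 127.5) / 127.5 for i in range(n)]


def mean_pool(token_vectors: List[List[float]], n: int = N) -> List[float]:
    """Average token vectors element-wise into one n-element feature vector."""
    if not token_vectors:
        return [0.0] * n
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(n)]


def feature_vector(text: str, n: int = N) -> List[float]:
    """T -> V_t: preprocess, encode each token, then pool."""
    return mean_pool([toy_token_embedding(tok, n) for tok in preprocess(text)], n)


v_t1 = feature_vector("Metadata-based storage of patent data")
print(len(v_t1))  # 8
```

In a real deployment the toy encoder would be swapped for a trained sentence-embedding model, such as one loaded through the sentence-transformers library, many of whose published models also use mean pooling over token embeddings. For Chinese text a proper word segmenter would be needed, because `\w+` treats an unbroken run of Chinese characters as a single token.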
In one possible implementation, the method further includes: for the data in the storage table, supplementing missing data according to preset rules in combination with local data in the storage table, or receiving supplements to or modifications of the data in the storage table from a user.
In one possible implementation, the method further includes: marking the supplemented or modified data.
An embodiment of a second aspect of the present application further provides a metadata-based patent document data storage device, including: a data extraction module for extracting a plurality of pieces of data from a target patent document according to a patent document metadata template; a data classification module for determining the category of each piece of extracted data according to the document structure of the target patent document in which it is located; a data fusion module for traversing the extracted data, performing, for first data and second data of the same category, a deep-learning-based cosine similarity calculation on the corresponding first feature vector and second feature vector, comparing the result with a preset threshold to determine whether the relationship between the first data and the second data is same, similar or unrelated, and merging the first data and the second data when they are the same or similar; and a data storage module for importing the merged data into a storage table generated according to the patent document metadata template.
An embodiment of the third aspect of the present application further provides an electronic device, including: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method described above.
An embodiment of a fourth aspect of the present application further provides a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the method described above.
With the metadata-based patent document data storage method and device provided by the embodiments of the application, patent document data are standardized through metadata, which facilitates unified management and application, allows users to make full use of all the data in the table, and provides strong support for data analysis.
Drawings
FIG. 1 is a schematic diagram of an electronic device 100 according to one embodiment of the application;
FIG. 2 is a flow chart of a metadata-based patent document data storage method 200 according to one embodiment of the present application;
FIG. 3 is a schematic structural view of a metadata-based patent document data storage device 300 according to one embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
Fig. 1 shows a schematic diagram of an electronic device 100 according to an embodiment of the application. Note that, the electronic device 100 shown in fig. 1 is only an example, and in practice, the electronic device used to implement the metadata-based patent document data storage method of the present application may be any type of device, and the hardware configuration may be the same as the electronic device 100 shown in fig. 1 or may be different from the electronic device 100 shown in fig. 1. In practice, the electronic device used to implement the metadata-based patent document data storage method of the present application may add or delete hardware components of the electronic device 100 shown in fig. 1, and the present application is not limited to the specific hardware configuration of the electronic device.
As shown in fig. 1, electronic device 100 typically includes one or more processors 110, and memory 120. Memory bus 130 may be used for communication between processor 110 and memory 120.
The memory 120 has stored therein operating system program instructions 121 and application program instructions 122, the application running on top of the operating system. When the electronic device 100 starts up, the processor 110 reads the operating system program instructions 121 from the memory 120 and executes them. When a user launches an application, the processor 110 reads and executes the application instructions 122 from the memory 120. The memory 120 also stores application data 123, where the application data 123 is data that may be used during the running process of the application, such as a table.
In the electronic device 100 according to the present application, the application program instructions 122 include computer program instructions for performing the metadata-based patent document data storage method 200 of the present application; these instructions may instruct the processor 110 to perform the method 200.
Fig. 2 shows a flowchart of a metadata-based patent document data storage method 200 according to one embodiment of the present application, the method 200 being performed in an electronic device (e.g., the aforementioned electronic device 100). As shown in fig. 2, the method 200 begins at step S210.
S210, extracting a plurality of pieces of data in the target patent document according to the patent document metadata template.
S220, determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located.
S230, traversing each piece of extracted data, performing deep-learning-based semantic similarity calculation on data of the same category to determine the relationship between them, and merging identical or similar data. Specifically, a deep-learning-based cosine similarity is calculated between a first feature vector and a second feature vector corresponding to first data and second data; the result is compared with a preset threshold to determine the relationship between the first data and the second data, and the two are merged if they are the same or similar. The first data and the second data belong to the same category, and their relationship is one of same, similar or unrelated.
S240, importing the combined data into a storage table generated according to the patent document metadata template.
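Steps S210 to S240 can be read as a small orchestration loop. The sketch below is an outline under stated assumptions rather than the application's implementation: it takes pieces already extracted in S210 as hypothetical `(section heading, text)` tuples, and the category assignment (S220) and fusion (S230) steps are passed in as functions so that any concrete technique can be plugged in. The usage example plugs in trivial stand-ins, a heading lookup and exact-duplicate removal.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical representation of one piece extracted in S210: (section heading, text).
Piece = Tuple[str, str]


def store_document(
    pieces: List[Piece],
    template_categories: List[str],
    classify: Callable[[Piece], str],
    fuse: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Run S220-S240 over pieces already extracted in S210 and return one table row."""
    by_category: Dict[str, List[str]] = defaultdict(list)
    for piece in pieces:
        category = classify(piece)               # S220: assign a category
        if category in template_categories:      # keep only template fields
            by_category[category].append(piece[1])
    row: Dict[str, str] = {}
    for category in template_categories:
        merged = fuse(by_category.get(category, []))  # S230: merge same/similar data
        row[category] = "; ".join(merged)             # S240: one cell per template column
    return row


pieces = [
    ("Title", "Patent data storage method"),
    ("Background", "Manual records are error-prone."),
    ("Background", "Manual records are error-prone."),
]
headings = {"Title": "name", "Background": "technical problem information"}
row = store_document(
    pieces,
    ["name", "technical problem information"],
    classify=lambda p: headings.get(p[0], "unknown"),
    fuse=lambda texts: list(dict.fromkeys(texts)),  # exact-duplicate removal as a stand-in
)
print(row)
```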
The metadata template in the embodiment of the application takes the form of a table and contains the categories of data found in patent documents. The categories include, but are not limited to: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information. In practice, the metadata template may also include information such as the number of drawings and the abstract drawing. When a user requests that patent document data be stored, the metadata-based patent document data storage method 200 in the embodiment of the present application generates a storage table according to the patent document metadata template.
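Because the template is a table whose columns are the data categories, generating the storage table amounts to turning the category list into a table schema. The following sketch uses Python's built-in `sqlite3` module; the snake_case column names and the table name `patent_documents` are hypothetical, and all columns are typed as TEXT for simplicity. Identifiers are interpolated into the SQL, so this is only safe for a trusted, fixed template.

```python
import sqlite3
from typing import Dict, List, Optional

# Hypothetical column names for the template categories listed above.
TEMPLATE: List[str] = [
    "name", "designer", "applicant", "application_number", "application_date",
    "class_number", "technical_problem", "design_intent", "design_demonstration",
    "design_scheme", "advantages_disadvantages",
]


def create_storage_table(conn: sqlite3.Connection, table: str, template: List[str]) -> None:
    """Generate a storage table whose columns mirror the metadata template."""
    columns = ", ".join(f'"{col}" TEXT' for col in template)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')


def insert_row(conn: sqlite3.Connection, table: str, template: List[str],
               row: Dict[str, Optional[str]]) -> None:
    """Import one merged row (S240); categories missing from `row` are stored as NULL."""
    cols = ", ".join(f'"{col}"' for col in template)
    marks = ", ".join("?" for _ in template)
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})',
                 [row.get(col) for col in template])


conn = sqlite3.connect(":memory:")
create_storage_table(conn, "patent_documents", TEMPLATE)
insert_row(conn, "patent_documents", TEMPLATE, {"name": "Patent data storage method"})
print([info[1] for info in conn.execute("PRAGMA table_info(patent_documents)")])
```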
A plurality of related pieces of data are extracted from the patent document according to the patent document metadata template. Each structural section of a patent document usually describes only one core theme; for example, the background section describes only the problems in the prior art and does not address other themes. The background section may contain both positive and negative statements, which can be regarded as a detailed description of the problems in the prior art. Similarly, the final part of the summary generally sets out the technical effects of the patent document, including design demonstration information and advantage and disadvantage information. In step S220 of the embodiment of the present application, the category of the extracted data is therefore determined based on these structural characteristics of the patent document.
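The structure-based rule of S220 can be approximated by a lookup from section headings to categories. In this sketch the mapping `SECTION_CATEGORY` is hypothetical and deliberately covers only single-theme sections (title and background); any heading it does not recognize, such as the multi-topic detailed description, returns `None` and is left for semantic analysis.

```python
from typing import Dict, Optional

# Hypothetical heading-to-category map. Only sections that describe a single
# core theme are mapped; multi-topic sections fall through to None.
SECTION_CATEGORY: Dict[str, str] = {
    "title": "name",
    "background": "technical problem information",
}


def category_from_structure(section_heading: str) -> Optional[str]:
    """S220: return the category implied by the section a piece came from, or None."""
    heading = section_heading.strip().lower()
    for prefix, category in SECTION_CATEGORY.items():
        if heading.startswith(prefix):  # e.g. "Background Art" matches "background"
            return category
    return None


print(category_from_structure("Background Art"))
```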
In the embodiment of the present application, after the category of the extracted data is determined according to the document structure of the target patent document in which it is located, step S220 further includes: performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of that data according to the result of the semantic analysis.
In a patent document, the detailed description section may describe several themes or categories; when the category of the corresponding data cannot be determined from the document structure, it can be determined by semantic analysis.
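The application does not specify the semantic analysis method. As a clearly swapped-in stand-in, the sketch below assigns an unclassified piece to the category whose hypothetical keyword prototype it most resembles under bag-of-words cosine similarity, and returns `None` when no prototype scores at least `min_score` (also an assumed value). A production system would more plausibly reuse the deep-learning embeddings described for S230.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical keyword prototypes per category (a stand-in for a trained model).
PROTOTYPES: Dict[str, str] = {
    "technical problem information": "problem defect drawback existing prior art difficult",
    "advantage and disadvantage information": "advantage benefit improve effect efficiency reduce cost",
    "design scheme information": "step module comprises configured method device",
}


def bag_of_words(text: str) -> Counter:
    """Count lower-cased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def bow_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def category_from_semantics(text: str, min_score: float = 0.1) -> Optional[str]:
    """Return the category whose prototype is most similar to `text`, or None."""
    words = bag_of_words(text)
    best, best_score = None, 0.0
    for category, prototype in PROTOTYPES.items():
        score = bow_cosine(words, bag_of_words(prototype))
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= min_score else None


print(category_from_semantics("The prior art has a defect: the problem is data loss."))
```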
Entities and relationships are often expressed repeatedly within the same patent document, using different wording, sentence patterns, grammar or synonyms to convey the same meaning. Such multiple natural-language expressions do not contradict one another semantically and do not confuse the designer's concepts. In the embodiment of the application, data of the same category are extracted from different structural sections, and the same meaning may be expressed repeatedly with different sentence patterns, grammar or synonyms; the extracted data therefore need to be fused. Accordingly, in step S230 the deep-learning-based semantic similarity calculation on data of the same category includes: calculating the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ corresponding to first data and second data of the same category; and comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data.
In the embodiment of the application, for first data A1 and second data A2 of the same category, text is extracted from A1 and A2 to obtain a first text T1 and a second text T2. T1 and T2 are each processed by a deep-learning-based model, and the processed outputs are each pooled to obtain a first feature vector $V_{t1}$ and a second feature vector $V_{t2}$ of a preset length. The cosine similarity of $V_{t1}$ and $V_{t2}$ is then calculated and compared with a preset threshold to determine the relationship between the data of the same category.
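Once the feature vectors exist, S230 reduces to mapping a similarity score onto the three relations and merging. The sketch below assumes two hypothetical thresholds, `T_SAME` and `T_SIMILAR` (the application says only that the thresholds are preset), and a greedy merge in which an item is dropped if it is same or similar to any text already kept. Keeping the first text as the representative is also an assumption, since the application does not say how merged data are combined; the example vectors are made up.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical thresholds; the application only says they are preset.
T_SAME = 0.95
T_SIMILAR = 0.80


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def relation(sim: float, t_same: float = T_SAME, t_similar: float = T_SIMILAR) -> str:
    """Map a similarity score onto the three relations used in S230."""
    if sim >= t_same:
        return "same"
    if sim >= t_similar:
        return "similar"
    return "unrelated"


def fuse_category(items: List[Tuple[str, Sequence[float]]]) -> List[str]:
    """Greedy merge of same-category data: an item is merged away (not kept) if it is
    'same' or 'similar' to any representative already kept; otherwise it is kept."""
    kept: List[Tuple[str, Sequence[float]]] = []
    for text, vec in items:
        if all(relation(cosine(vec, rep_vec)) == "unrelated" for _, rep_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]


items = [
    ("Manual records are error-prone.", [1.0, 0.0]),
    ("Manual recording is error-prone.", [0.99, 0.1]),
    ("The table is stored in a database.", [0.0, 1.0]),
]
print(fuse_category(items))
```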
In the embodiment of the application, the relationship between data is one of same, similar or unrelated. The deep-learning-based model is trained as follows: a preset number of patent documents are acquired; data of the same category within each patent document are compared and analyzed pairwise; a sentence-transformer neural network based on a deep learning model converts each same-category data pair into a pair of feature vectors; and the model's similarity coefficient and the corresponding thresholds are determined by cosine similarity calculation of the text semantics.
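The application says the thresholds are determined from pairwise comparisons over a preset number of patent documents but does not give the procedure. One simple possibility, shown here as an assumption, is to score labeled pairs with cosine similarity and choose the cut-off that maximizes accuracy: run it once with related-versus-unrelated labels for the similar threshold and once with same-versus-rest labels for the same threshold. The scores in the example are hypothetical.

```python
from typing import List, Tuple


def best_threshold(scored_pairs: List[Tuple[float, bool]]) -> float:
    """Choose the cut-off that best separates positive (True) from negative (False)
    pairs, predicting positive when score >= threshold. Candidate cut-offs are the
    observed scores; on ties the lowest such score is kept."""
    if not scored_pairs:
        raise ValueError("need at least one labeled pair")
    best_t, best_acc = 0.0, -1.0
    for t, _ in sorted(scored_pairs):
        correct = sum((score >= t) == label for score, label in scored_pairs)
        accuracy = correct / len(scored_pairs)
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t


# Hypothetical similarity scores for labeled pairs (True = same or similar).
pairs = [(0.95, True), (0.88, True), (0.83, True), (0.60, False), (0.40, False)]
print(best_threshold(pairs))  # 0.83
```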
In the embodiment of the application, calculating the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ corresponding to first data and second data of the same category includes computing it by the following expression:
$$Sim(T_1, T_2) = \cos(\theta) = \frac{V_{t1} \cdot V_{t2}}{\|V_{t1}\| \, \|V_{t2}\|} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where θ is the angle between $V_{t1}$ and $V_{t2}$; $V_{t1i}$ and $V_{t2i}$ are the i-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and $\|V_{t1}\|$ and $\|V_{t2}\|$ are the norms (magnitudes) of $V_{t1}$ and $V_{t2}$, respectively.
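As a worked illustration with made-up three-element vectors (real feature vectors would be much longer), the expression evaluates as follows:

$$V_{t1} = (3, 4, 0), \qquad V_{t2} = (4, 3, 0)$$
$$Sim(T_1, T_2) = \frac{3 \cdot 4 + 4 \cdot 3 + 0 \cdot 0}{\sqrt{3^2 + 4^2 + 0^2} \, \sqrt{4^2 + 3^2 + 0^2}} = \frac{24}{5 \cdot 5} = 0.96$$

Whether 0.96 counts as same or similar depends on the preset thresholds, which the application does not disclose.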
The metadata-based patent document data storage method 200 provided by the embodiment of the application may further include: for the data in the storage table, supplementing missing data according to preset rules in combination with local data in the storage table, or receiving supplements to or modifications of the data in the storage table from a user.
In an embodiment of the present application, the metadata-based patent document data storage method 200 further includes: marking the supplemented or modified data.
For various reasons, data for some categories may not be extracted; for example, the design demonstration information or the abstract drawing may be empty in the table. To facilitate subsequent data analysis, the embodiments of the present application may supplement these empty fields. Specifically, the following rules may be used.
1. Design intent information, design demonstration information, advantage and disadvantage information and other data are supplemented by drawing on other patent documents of the same designer and/or applicant. For example, if the first step of the scheme in the current patent document is to acquire information A and the subsequent steps process or analyze information A, and the scheme for acquiring information A is set out in detail in an earlier patent document filed by the same designer and applicant, the empty data can be supplemented from that earlier document.
2. Technical problem information is supplemented using the citation relationships in the patent document. For example, if the background section cites another prior-art patent document and gives its application number, title and similar information, the technical problem information can be supplemented according to the cited content in the current patent document.
3. Information such as the abstract drawing is filled with default values. For example, if a patent document has no drawings, or has drawings but no abstract drawing, the empty field is filled with a default value.
Supplemented data are less reliable than data actually extracted from the document; therefore, supplemented data in the table may be marked to distinguish them from other information.
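The supplementing rules and the marking step can be combined in one pass over a table row. In this sketch, `related` is a hypothetical dictionary standing in for values recovered under rules 1 and 2 (from the same designer's or applicant's earlier documents, or from cited documents), and `DEFAULTS` holds an assumed default for rule 3; the actual default values are not specified. The function returns the filled row together with the set of column names it supplemented so that they can be marked.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed default values for rule 3; the application does not specify them.
DEFAULTS: Dict[str, str] = {"abstract drawing": "none"}


def supplement_row(
    row: Dict[str, Optional[str]],
    related: Dict[str, str],
) -> Tuple[Dict[str, Optional[str]], Set[str]]:
    """Fill empty cells of one table row and report which cells were filled.

    `related` is a hypothetical stand-in for values recovered under rules 1 and 2."""
    filled = dict(row)
    marked: Set[str] = set()
    for column, value in row.items():
        if value:                    # extracted data is kept as-is
            continue
        if column in related:        # rules 1 and 2
            filled[column] = related[column]
        elif column in DEFAULTS:     # rule 3
            filled[column] = DEFAULTS[column]
        else:
            continue                 # nothing to supplement; leave the gap
        marked.add(column)           # supplemented data is less reliable, so mark it
    return filled, marked


row = {"name": "X", "design demonstration information": "",
       "abstract drawing": None, "design intent information": None}
filled, marked = supplement_row(row, {"design demonstration information": "from an earlier filing"})
print(sorted(marked))
```

The returned set could be stored, for example, as a separate flag column so that later analysis can weight or filter supplemented values.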
Embodiments of the present application also provide a metadata-based patent document data storage apparatus 300 capable of performing the respective step processes of the metadata-based patent document data storage method 200 as described above. The above-described apparatus 300 is described below in connection with fig. 3.
As shown in fig. 3, the apparatus 300 includes a data extraction module 310, a data classification module 320, a data fusion module 330, and a data storage module 340.
The data extraction module 310 is configured to extract a plurality of pieces of data from the target patent document according to the patent document metadata template. The data classification module 320 is configured to determine the category of each piece of extracted data according to the document structure of the target patent document in which it is located. The data fusion module 330 is configured to traverse the extracted data and, for first data and second data of the same category, perform a deep-learning-based cosine similarity calculation on the corresponding first feature vector and second feature vector; compare the result with a preset threshold to determine whether the relationship between the first data and the second data is same, similar or unrelated; and merge the first data and the second data when they are the same or similar. The data storage module 340 is configured to import the merged data into a storage table generated according to the patent document metadata template.
As a preferred embodiment of the present application, the categories in the data classification module 320 include one or more of the following: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
As a preferred embodiment of the present application, the data classification module 320 is further configured to perform semantic analysis on data whose category cannot be determined from the document structure, and to determine the category of that data according to the result of the semantic analysis.
As a preferred embodiment of the present application, the deep-learning-based semantic similarity calculation performed by the data fusion module 330 on data of the same category includes: calculating the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ corresponding to first data and second data of the same category; and comparing the result with a preset threshold to determine the relationship between the first data and the second data.
As a preferred embodiment of the present application, the data fusion module 330 calculates the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ corresponding to first data and second data of the same category by the following expression:
$$Sim(T_1, T_2) = \cos(\theta) = \frac{V_{t1} \cdot V_{t2}}{\|V_{t1}\| \, \|V_{t2}\|} = \frac{\sum_{i=1}^{n} V_{t1i} V_{t2i}}{\sqrt{\sum_{i=1}^{n} V_{t1i}^{2}} \, \sqrt{\sum_{i=1}^{n} V_{t2i}^{2}}}$$
where θ is the angle between $V_{t1}$ and $V_{t2}$; $V_{t1i}$ and $V_{t2i}$ are the i-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and $\|V_{t1}\|$ and $\|V_{t2}\|$ are the norms (magnitudes) of $V_{t1}$ and $V_{t2}$, respectively.
As a preferred embodiment of the present application, the metadata-based patent document data storage device 300 further includes a data editing module configured to supplement missing data in the storage table according to preset rules in combination with local data in the storage table, or to receive supplements to or modifications of the data in the storage table from a user.
As a preferred embodiment of the present application, the metadata-based patent document data storage device 300 further includes a data marking module for marking supplementary or modified data.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods, apparatus and devices of the present application.
While the application has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments are contemplated within the scope of the application as described herein. In addition, various modifications and alterations of this application may be made by those skilled in the art without departing from the spirit and scope of this application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A patent document data storage method based on metadata, characterized by comprising the steps of:
extracting a plurality of pieces of data in a target patent document according to a patent document metadata template;
determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located;
traversing each piece of extracted data, and performing a deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding respectively to first data and second data; comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and combining the first data and the second data when their relationship is identical or similar; wherein the first data and the second data are data of the same category, and the relationship between them is one of identical, similar, or unrelated;
and importing the combined data into a storage table generated according to the patent document metadata template.
2. The method of claim 1, wherein the categories include one or more of: name, designer, applicant, application number, application date, class number, technical problem information, design intent information, design demonstration information, design scheme information, and advantage and disadvantage information.
3. The method of claim 2, wherein determining the category of each piece of extracted data based on the document structure of the target patent document in which each piece of extracted data is located further comprises:
performing semantic analysis on data whose category cannot be determined from the document structure, and determining the category of such data according to the result of the semantic analysis.
4. The method of claim 1, wherein performing deep learning based cosine similarity calculation on the first feature vector and the second feature vector corresponding to the first data and the second data comprises:
calculating the cosine similarity of the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ by the following expression:
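$$\mathrm{Sim}(T_1, T_2) = \cos\theta = \frac{V_{t1} \cdot V_{t2}}{\|V_{t1}\|\,\|V_{t2}\|} = \frac{\sum_{i=1}^{n} V_{t1,i}\, V_{t2,i}}{\sqrt{\sum_{i=1}^{n} V_{t1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} V_{t2,i}^{2}}}$$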
where θ is the angle between the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$; $V_{t1,i}$ and $V_{t2,i}$ are the i-th elements of $V_{t1}$ and $V_{t2}$, respectively; $T_1$ and $T_2$ are the texts corresponding to the first data and the second data, respectively; n is the number of elements in each feature vector; and $\|V_{t1}\|$ and $\|V_{t2}\|$ are the norms (moduli) of $V_{t1}$ and $V_{t2}$, respectively.
5. The method of claim 4, wherein the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$ are obtained by the following steps:
extracting text from the first data and the second data to obtain a first text $T_1$ and a second text $T_2$; preprocessing $T_1$ and $T_2$ respectively; and pooling the respective preprocessed data to obtain the first feature vector $V_{t1}$ and the second feature vector $V_{t2}$, each comprising n elements.
6. The method of any one of claims 1 to 5, wherein the method further comprises:
completing the data in the storage table according to a preset rule, in combination with the partial data already in the storage table; or,
receiving a user's supplements to or modifications of the data in the storage table.
7. The method of claim 6, wherein the method further comprises:
marking the supplemented or modified data.
8. A metadata-based patent document data storage device, comprising:
a data extraction module for extracting a plurality of pieces of data in the target patent document according to the patent document metadata template;
the data classification module is used for determining the category of each piece of extracted data according to the document structure of the target patent document where each piece of extracted data is located;
the data fusion module is used for traversing each piece of extracted data and performing a deep-learning-based cosine similarity calculation on a first feature vector and a second feature vector corresponding respectively to first data and second data; comparing the cosine similarity result with a preset threshold to determine the relationship between the first data and the second data, and combining the first data and the second data when their relationship is identical or similar; wherein the first data and the second data are data of the same category, and the relationship between them is one of identical, similar, or unrelated;
and the data storage module is used for importing the combined data into a storage table generated according to the patent document metadata template.
9. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, the computer program being executed by a processor to implement the method of any one of claims 1 to 7.
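The template-driven steps of claims 1 to 3 and 8 can be sketched in Python with the standard `re` and `sqlite3` modules. These steps are extraction, category determination, and import into a storage table generated from the template. Everything concrete here is an assumption:

- the field patterns and column names;
- the heading-to-category map;
- the keyword scorer that stands in for the unspecified semantic analysis of claim 3.

The fusion step of claim 1 is left out to keep the sketch short.

```python
import re
import sqlite3

# Hypothetical metadata template: storage column -> pattern locating the value in a
# plain-text record. The columns echo claim 2; the patterns are illustrative only.
METADATA_TEMPLATE = {
    "name": r"Title:\s*(.+)",
    "applicant": r"Applicant:\s*(.+)",
    "application_number": r"Application number:\s*(\S+)",
    "application_date": r"Application date:\s*(\S+)",
}

# Hypothetical document-structure cues (section headings) -> claim 2 category.
STRUCTURE_MAP = {
    "Background": "technical problem information",
    "Detailed Description": "design scheme information",
}

# Crude keyword scorer standing in for the semantic analysis of claim 3 (assumption).
SEMANTIC_KEYWORDS = {
    "design intent information": ("aim", "object", "purpose"),
    "advantage and disadvantage information": ("advantage", "disadvantage", "drawback"),
}

def extract_fields(document):
    """Data extraction module: one value (or None) per template field."""
    record = {}
    for column, pattern in METADATA_TEMPLATE.items():
        match = re.search(pattern, document)
        record[column] = match.group(1).strip() if match else None
    return record

def classify_passage(heading, text):
    """Data classification module: document structure first, keyword fallback second."""
    if heading in STRUCTURE_MAP:
        return STRUCTURE_MAP[heading]
    words = set(re.findall(r"\w+", text.lower()))
    scores = {cat: sum(w in words for w in kws) for cat, kws in SEMANTIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: category still undetermined

def create_storage_table(conn):
    """Data storage module, part 1: the table schema is generated from the template."""
    columns = ", ".join(f'"{name}" TEXT' for name in METADATA_TEMPLATE)
    conn.execute(f"CREATE TABLE IF NOT EXISTS patent_metadata ({columns})")

def import_records(conn, records):
    """Data storage module, part 2: insert one row per record in template column order."""
    names = list(METADATA_TEMPLATE)
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f"INSERT INTO patent_metadata ({quoted}) VALUES ({placeholders})",
        [tuple(r.get(n) for n in names) for r in records],
    )
    conn.commit()

doc = "\n".join([
    "Title: Metadata-based patent document data storage method",
    "Applicant: China National Institute of Standardization",
    "Application number: CN202311234829.3",
    "Application date: 2023-09-25",
])
conn = sqlite3.connect(":memory:")
create_storage_table(conn)
import_records(conn, [extract_fields(doc)])
print(conn.execute("SELECT applicant, application_date FROM patent_metadata").fetchall())
# [('China National Institute of Standardization', '2023-09-25')]
print(classify_passage("Summary", "The purpose is to shorten retrieval."))
# design intent information
```

Generating the schema from the same template that drives extraction keeps extraction and storage consistent. Adding a template field adds both a pattern and a column.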
CN202311234829.3A 2023-09-25 2023-09-25 Metadata-based patent document data storage method, device and storage medium Pending CN116975068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311234829.3A CN116975068A (en) 2023-09-25 2023-09-25 Metadata-based patent document data storage method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311234829.3A CN116975068A (en) 2023-09-25 2023-09-25 Metadata-based patent document data storage method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116975068A 2023-10-31

Family

ID=88473478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311234829.3A Pending CN116975068A (en) 2023-09-25 2023-09-25 Metadata-based patent document data storage method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116975068A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236435A (en) * 2023-11-08 2023-12-15 中国标准化研究院 Knowledge fusion method, device and storage medium of design rationality knowledge network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808524A (en) * 2016-03-11 2016-07-27 江苏畅远信息科技有限公司 Patent document abstract-based automatic patent classification method
CN107122382A (en) * 2017-02-16 2017-09-01 江苏大学 A kind of patent classification method based on specification
CN112257419A (en) * 2020-11-06 2021-01-22 开普云信息科技股份有限公司 Intelligent retrieval method and device for calculating patent document similarity based on word frequency and semantics, electronic equipment and storage medium thereof
CN115481636A (en) * 2022-09-14 2022-12-16 电子科技大学 Technical efficacy matrix construction method for technical literature


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236435A (en) * 2023-11-08 2023-12-15 中国标准化研究院 Knowledge fusion method, device and storage medium of design rationality knowledge network
CN117236435B (en) * 2023-11-08 2024-01-30 中国标准化研究院 Knowledge fusion method, device and storage medium of design rationality knowledge network

Similar Documents

Publication Publication Date Title
US20190228320A1 (en) Method, system and terminal for normalizing entities in a knowledge base, and computer readable storage medium
US11651014B2 (en) Source code retrieval
US9189504B2 (en) Application source code scanning for database migration
CN112131881B (en) Information extraction method and device, electronic equipment and storage medium
CN111144370B (en) Document element extraction method, device, equipment and storage medium
US11556812B2 (en) Method and device for acquiring data model in knowledge graph, and medium
WO2019028990A1 (en) Code element naming method, device, electronic equipment and medium
CN111858913A (en) Method and system for automatically generating text abstract
CN110750977B (en) Text similarity calculation method and system
WO2024067276A1 (en) Video tag determination method and apparatus, device and medium
CN111177328B (en) Question-answer matching system and method, question-answer processing device and medium
WO2022042297A1 (en) Text clustering method, apparatus, electronic device, and storage medium
CN112613315B (en) Text knowledge automatic extraction method, device, equipment and storage medium
CN116975068A (en) Metadata-based patent document data storage method, device and storage medium
CN107958068B (en) Language model smoothing method based on entity knowledge base
CN114676705B (en) Dialogue relation processing method, computer and readable storage medium
CN113741864A (en) Automatic design method and system of semantic service interface based on natural language processing
CN116029280A (en) Method, device, computing equipment and storage medium for extracting key information of document
CN113705207A (en) Grammar error recognition method and device
CN113722431B (en) Named entity relationship identification method and device, electronic equipment and storage medium
CN114996360B (en) Data analysis method, system, readable storage medium and computer equipment
CN115964474A (en) Policy keyword extraction method and device, storage medium and electronic equipment
CN116757205A (en) Entity relation extraction method and device based on ontology knowledge enhancement
US11423228B2 (en) Weakly supervised semantic entity recognition using general and target domain knowledge
CN115759085A (en) Information prediction method and device based on prompt model, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication; Application publication date: 20231031