CN116226371A - Digital economic patent classification method - Google Patents


Info

Publication number
CN116226371A
Authority
CN
China
Legal status
Pending
Application number
CN202211584594.6A
Other languages
Chinese (zh)
Inventor
薛小龙
高鸿铭
谭宪宇
朱慧
黄琼宇
陈建硕
魏焕哲
王玉娜
薛维锐
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Application filed by Guangzhou University
Priority to CN202211584594.6A
Publication of CN116226371A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of text mining and data processing and discloses a digital economy patent classification method that links the digital economy with innovation through a patent identification and classification method based on natural-language semantic processing. The method associates and matches the nationally issued "Statistical Classification of the Digital Economy and Its Core Industries (2021)", which divides the digital economy industry into 5 major categories, 32 middle categories, and 156 subcategories, with actual patents (covering all patent types that reflect innovation activity, such as invention patents, utility models, designs, invention applications, PCT international patents, and short-term patents). This provides a statistical basis for measuring the development of the digital economy at the micro level, especially the digital economy activities of enterprises.

Description

Digital economic patent classification method
Technical Field
The invention relates to the technical field of text mining and data processing, and in particular to a digital economy patent classification method.
Background
In recent years, the digital economy, which takes digital resources as its core production factor, modern information networks as its carrier, and information and communication technology as its means, has been reshaping the global economic landscape, improving production efficiency, optimizing economic structures, and supporting the sustained, healthy, high-quality growth of China's economy. Digital economy innovation brings new challenges and opportunities to traditional innovation management: digital development affects innovation outcomes, and the application of digital technology blurs the boundaries between innovation actors, making knowledge sharing and iterative collaboration among them more efficient.
At present, the data used by digital economy measurement methods are obtained through questionnaire surveys and the collection of public data; these methods rely on many indicators, mostly at the industry level. Existing measurement methods assess digital economy development at the macro level but neglect the micro level: they cannot measure the digital economy of local cities, counties, and other micro units, and in particular cannot analyze the digital economy activities of enterprises, the main drivers of innovation. Moreover, the reliability and accuracy of data collected at the industry and national levels are difficult to control. In addition, research on digital economy innovation still treats the digital economy and innovation as two separate lines of thought, and the mapping between the digital economy and innovation behavior has not been quantified.
In May 2021, the state issued the "Statistical Classification of the Digital Economy and Its Core Industries (2021)", which divides the digital economy industry into 5 major categories, 32 middle categories, and 156 subcategories. It scientifically defines the statistical scope of the digital economy and its core industries and guides the statistical needs of Party committees, governments, and society at all levels regarding the scale, speed, and structure of digital economy development. At present, no method exists for classifying patents against the digital economy industry categories of this classification. In this application, a patent matched against the "Statistical Classification of the Digital Economy and Its Core Industries (2021)" is called a "digital economy patent", and the classification method is called the "digital economy patent classification method".
The patent titled "A comparison method of patent and national economy industry classification" (publication number CN113590723) provides a method for mapping patents to the national economic industry classification. However, it does not identify digital economy industry categories, does not use the knowledge contained in patent text, and classifies solely by the structural rules of national economic industry classification codes, which leads to large classification errors.
Conventional patent classification techniques fall into two groups. The first performs association-rule matching on keywords in the patent text: for example, if the keyword "graph neural network" appears in a patent title, an association rule assigns the patent to the artificial intelligence industry field. The second trains a machine learning model on a patent data set with explicit labels, estimating and optimizing the model parameters, and then automatically classifies other patents whose categories are unknown. However, association rules are complex to manage and incur high manual processing costs for tens or even hundreds of millions of patents, while model-based classification demands substantial computing power and runs slowly, making it difficult to meet large-scale computing requirements in practical applications.
In summary, to classify patents according to the nationally issued "Statistical Classification of the Digital Economy and Its Core Industries (2021)" and to fill the gap in classification methods and matching rules for patents in the digital economy industry field, we propose a digital economy patent classification method.
Disclosure of Invention
(I) Technical problem to be solved
To address the shortcomings of the prior art, the invention provides a digital economy patent classification method that corresponds to the digital economy industry classification table of the nationally issued "Statistical Classification of the Digital Economy and Its Core Industries (2021)", thereby solving the problems described above.
(II) Technical solution
To achieve the above purpose, the invention provides the following technical solution: a digital economy patent classification method comprising the following steps:
First step: collect patent text data;
Second step: convert the unstructured patent text into a structured feature vector representation, in which each patent is represented by a patent feature vector of dimension Ni+Mi+Ki, where i is the unique label of a single patent and denotes the i-th patent;
Third step: draw up the digital economy industry code and name table, i.e., the codes and names of the 5 major categories, 32 middle categories, and 156 subcategories of the digital economy industry;
Fourth step: construct semantic rules from the explanatory text of the digital economy subcategory industries;
Fifth step: establish a complete "patent-digital economy subcategory mapping relation model" with 7 sets of rules in total: for each of the 156 digital economy subcategories in turn, build the mapping model that identifies patents by multiple rules combining IPC codes, NIC codes, and the semantic rules obtained in the third and fourth steps, yielding 7 sets of patent-digital economy subcategory comparison rules covering single-condition, double-condition, triple-condition, and compound multi-condition matching;
Sixth step: according to the mapping model, apply the first set of comparison rules R1: single-condition IPC matching between patents and digital economy subcategories. The IPC numbers used to distinguish the subcategories in the "Statistical Classification of the Digital Economy and Its Core Industries (2021)" are divided into two types, giving exact IPC number condition matching and fuzzy IPC number condition matching;
Seventh step: according to the mapping model, apply the second set of comparison rules R2: a double-condition combination of an IPC number and any keyword in the semantic rules;
Eighth step: according to the mapping model, apply the third set of comparison rules R3: a triple-condition combination of an IPC number and a fixed pair of keywords in the semantic rules;
Ninth step: according to the mapping model, apply the fourth set of comparison rules R4: a double-condition combination of a NIC number and any keyword in the semantic rules;
Tenth step: according to the mapping model, apply the fifth set of comparison rules R5: a triple-condition combination of a NIC number and a fixed pair of keywords in the semantic rules;
Eleventh step: merge the patent sets identified for the digital economy subcategories, i.e., combine the output patent sets of the sixth to tenth steps into the identified digital economy patent set patent_DE;
Twelfth step: according to the mapping model, apply the sixth set of comparison rules R6: a multi-condition combination of a NIC number and all fixed mutually exclusive keyword pairs in the semantic rules;
Thirteenth step: according to the mapping model, apply the seventh set of comparison rules R7: a multi-condition combination of a NIC number, a keyword in the semantic rules, and the removal of patents already assigned to mutually exclusive subcategories;
Fourteenth step: store digital economy patents and non-digital economy patents, using patent_DE to write the patent-digital economy subcategory classification results back to the corresponding original patents.
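The eleventh and fourteenth steps amount to a set union followed by a lookup. The Python sketch below illustrates them under assumed data structures: each rule's output is a hypothetical dict from a 6-digit subcategory code to the set of patent ids that rule matched, and the function names `merge_rule_outputs` and `write_back` are illustrative, not taken from the source.

```python
from collections import defaultdict

# Hypothetical representation: each rule's output maps a 6-digit subcategory
# code to the set of patent ids that the rule assigned to it.
RuleOutput = dict[str, set[str]]


def merge_rule_outputs(rule_outputs: list[RuleOutput]) -> dict[str, set[str]]:
    """Eleventh step (sketch): union the outputs of R1-R5 into patent_DE.

    patent_DE maps each patent id to every subcategory code it was assigned;
    using sets means a patent found by several rules is stored only once.
    """
    patent_de = defaultdict(set)
    for output in rule_outputs:
        for subclass, patent_ids in output.items():
            for pid in patent_ids:
                patent_de[pid].add(subclass)
    return dict(patent_de)


def write_back(all_patent_ids: list[str], patent_de: dict[str, set[str]]) -> dict[str, list[str]]:
    """Fourteenth step (sketch): attach subcategory labels to every original patent.

    An empty list marks a non-digital-economy patent.
    """
    return {pid: sorted(patent_de.get(pid, set())) for pid in all_patent_ids}
```

Keeping patent_DE as a mapping from patent id to a set of codes lets a patent carry more than one subcategory, which the source does not rule out.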
Preferably, the specific contents in the second step are as follows:
S1: extract the Ni International Patent Classification (IPC) codes assigned to patent i and construct an Ni-dimensional IPC attribute feature vector;
S2: extract the Mi national economic industry classification (NIC) codes assigned to patent i and construct an Mi-dimensional NIC attribute feature vector;
S3: for the abstract text of patent i, use R to obtain an array of words representing the technical features of the patent and construct a Ki-dimensional technical phrase feature vector;
S4: complete the definition of the patent feature vector, and loop through S1 to S4 to convert all patents into their corresponding patent feature vectors.
Preferably, the fourth step specifically comprises: taking the 156 digital economy subcategory industries defined by the state as the most basic digital economy category units. Some subcategories cannot be matched by single-condition or multi-condition combinations of IPC and NIC codes alone, so semantic rules based on the explanatory text of each subcategory in the "Statistical Classification of the Digital Economy and Its Core Industries (2021)" are added for matching. For each of the 156 subcategories in turn, the explanatory text is segmented with the jiebaR Chinese word segmentation package, and particles, prepositions, and conjunctions are removed to obtain the core matching keyword set DEj of the subcategory, where j denotes the j-th digital economy subcategory industry and ranges over [1,156]. To ensure the completeness of classification, the keywords in DEj are expanded with synonyms.
Preferably, the specific content of the sixth step is as follows:
S1: exact IPC number condition matching: judge whether any element of the exact IPC number set IPCj of digital economy subcategory j is contained in the patent feature vector, i.e., whether the intersection of the two sets is non-empty:
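$\mathrm{IPC}_j \cap \mathrm{IPC}_i \neq \varnothing$, where $\mathrm{IPC}_i$ is the $N_i$-dimensional IPC attribute vector of patent i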
if so, label patent i as digital economy subcategory j;
S2: if not, set i = i + 1 and repeat S1 for patent i + 1 until all patents have been judged;
S3: fuzzy IPC number condition matching: judge whether any element of the fuzzy IPC number set IPCj_fuzzy fuzzily matches an IPC code in the i-th patent feature vector; if so, label patent i as digital economy subcategory j;
S4: if not, set i = i + 1 and repeat S3 for patent i + 1 until all patents have been judged;
S5: for the patent set assigned to subcategory j, delete the duplicate patents produced by the exact and fuzzy IPC matching passes and keep a set of unique patents.
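A minimal Python sketch of rule R1 for one subcategory j follows. It assumes each patent is a dict of sets keyed by "ipc", "nic", and "words", and it assumes that "fuzzy" matching means prefix matching of IPC codes, which fits the detailed description's example of G06F15/16 matching G06F15/163. The function names are hypothetical.

```python
# Hypothetical patent layout: {"ipc": {...}, "nic": {...}, "words": {...}}.
Patents = dict[str, dict[str, set[str]]]


def ipc_exact(patent_ipc: set[str], ipc_j: set[str]) -> bool:
    """Exact condition: the two IPC sets share at least one code."""
    return bool(patent_ipc & ipc_j)


def ipc_fuzzy(patent_ipc: set[str], ipc_j_fuzzy: set[str]) -> bool:
    """Fuzzy condition, assumed here to be prefix matching of IPC codes."""
    return any(code.startswith(prefix) for code in patent_ipc for prefix in ipc_j_fuzzy)


def rule_r1(patents: Patents, ipc_j: set[str], ipc_j_fuzzy: set[str]) -> set[str]:
    """Rule R1 for one subcategory j: exact pass (S1-S2), fuzzy pass (S3-S4), dedup (S5)."""
    exact_hits = {pid for pid, f in patents.items() if ipc_exact(f["ipc"], ipc_j)}
    fuzzy_hits = {pid for pid, f in patents.items() if ipc_fuzzy(f["ipc"], ipc_j_fuzzy)}
    # The set union keeps each patent once even if both passes found it.
    return exact_hits | fuzzy_hits
```

Plain string-prefix matching is a simplification: a short prefix such as "G06F15/1" would also match G06F15/17, so a production rule table would need carefully chosen prefixes.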
Preferably, the seventh step includes the following:
S1: exact IPC number condition matching: for the i-th patent feature vector, judge whether any element of the exact IPC number set IPCj of digital economy subcategory j is contained in it, i.e., whether the intersection of the two sets is non-empty:
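$\mathrm{IPC}_j \cap \mathrm{IPC}_i \neq \varnothing$, where $\mathrm{IPC}_i$ is the $N_i$-dimensional IPC attribute vector of patent i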
if so, go to S2; if not, go to S3;
S2: using the keywords in the semantic rule set DEj of subcategory j, judge whether any keyword appears in the patent feature vector of patent i, i.e., whether the intersection of the two sets is non-empty:
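$\mathrm{DE}_j \cap T_i \neq \varnothing$, where $T_i$ is the $K_i$-dimensional technical phrase vector of patent i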
if so, label patent i as digital economy subcategory j; if not, go to S3;
S3: set i = i + 1 and return to S1 for patent i + 1 until all patents have been judged;
S4: fuzzy IPC number condition matching: judge whether any element of the fuzzy IPC number set IPCj_fuzzy fuzzily matches an IPC code in the patent feature vector; if so, go to S5; if not, go to S6;
S5: using the keywords in the semantic rule set DEj of subcategory j, judge whether any keyword appears in the patent feature vector of patent i, i.e., whether the intersection of the two sets is non-empty:
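$\mathrm{DE}_j \cap T_i \neq \varnothing$, where $T_i$ is the $K_i$-dimensional technical phrase vector of patent i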
if so, label patent i as digital economy subcategory j; if not, go to S6;
S6: set i = i + 1 and return to S4 for patent i + 1 until all patents have been judged;
S7: for the patent set assigned to subcategory j, delete the duplicate patents produced by the exact and fuzzy IPC matching passes and keep a set of unique patents.
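Because the exact pass (S1 to S3) and the fuzzy pass (S4 to S6) impose the same keyword condition, their deduplicated union (S7) equals a single pass that accepts either IPC condition. The sketch below uses that equivalence. It assumes each patent is a dict of sets keyed by "ipc" and "words" and that fuzzy IPC matching is prefix matching; `ipc_condition` and `rule_r2` are hypothetical names.

```python
def ipc_condition(patent_ipc: set[str], ipc_j: set[str], ipc_j_fuzzy: set[str]) -> bool:
    """IPC part of R2: exact match (S1-S3) or assumed prefix match (S4-S6)."""
    exact = bool(patent_ipc & ipc_j)
    fuzzy = any(code.startswith(p) for code in patent_ipc for p in ipc_j_fuzzy)
    return exact or fuzzy


def rule_r2(patents: dict[str, dict[str, set[str]]], ipc_j: set[str],
            ipc_j_fuzzy: set[str], de_j: set[str]) -> set[str]:
    """Rule R2 for subcategory j: IPC condition AND at least one keyword of DE_j."""
    hits = set()
    for pid, f in patents.items():
        if ipc_condition(f["ipc"], ipc_j, ipc_j_fuzzy) and f["words"] & de_j:
            hits.add(pid)
    return hits  # returning a set makes the S7 deduplication implicit
```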
Preferably, the eighth step includes the following:
S1: exact IPC number condition matching: for the i-th patent feature vector, judge whether any element of the exact IPC number set IPCj of digital economy subcategory j is contained in it, i.e., whether the intersection of the two sets is non-empty:
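$\mathrm{IPC}_j \cap \mathrm{IPC}_i \neq \varnothing$, where $\mathrm{IPC}_i$ is the $N_i$-dimensional IPC attribute vector of patent i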
if so, go to S2; if not, go to S3;
S2: using the fixed pair of keywords in the semantic rule set DEj of subcategory j, judge whether both keywords appear in the patent feature vector of patent i; if so, label patent i as digital economy subcategory j; if not, go to S3;
S3: set i = i + 1 and return to S1 for patent i + 1 until all patents have been judged;
S4: fuzzy IPC number condition matching: judge whether any element of the fuzzy IPC number set IPCj_fuzzy fuzzily matches an IPC code in the patent feature vector; if so, go to S5; if not, go to S6;
S5: as in S2, judge whether both keywords appear in the patent feature vector of patent i; if so, label patent i as digital economy subcategory j; if not, go to S6;
S6: set i = i + 1 and return to S4 for patent i + 1 until all patents have been judged;
S7: for the patent set assigned to subcategory j, delete the duplicate patents produced by the exact and fuzzy IPC matching passes and keep a set of unique patents.
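Rule R3 differs from R2 only in its keyword condition: both words of a fixed pair must appear, not just any keyword. A sketch under the same assumptions (patents as dicts of sets, prefix-based fuzzy IPC matching, hypothetical function name, illustrative codes and words) might look like this:

```python
def rule_r3(patents: dict[str, dict[str, set[str]]], ipc_j: set[str],
            ipc_j_fuzzy: set[str], keyword_pair: tuple[str, str]) -> set[str]:
    """Rule R3 for subcategory j: IPC condition AND both words of a fixed pair."""
    first, second = keyword_pair
    hits = set()
    for pid, f in patents.items():
        # Exact pass (S1-S3) or assumed prefix-based fuzzy pass (S4-S6).
        ipc_ok = bool(f["ipc"] & ipc_j) or any(
            code.startswith(p) for code in f["ipc"] for p in ipc_j_fuzzy)
        # S2/S5: the two keywords must occur together in the same patent.
        pair_ok = first in f["words"] and second in f["words"]
        if ipc_ok and pair_ok:
            hits.add(pid)
    return hits  # a set, so the S7 duplicate removal is implicit
```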
Preferably, the ninth step includes the following:
S1: for the i-th patent feature vector, judge whether any element of the NIC number set NICj of digital economy subcategory j fuzzily matches a code in the Mi-dimensional NIC attribute vector of the patent feature vector; if so, go to S2; if not, go to S3;
S2: using the keywords in the semantic rule set DEj of subcategory j, judge whether any keyword appears in the patent feature vector of patent i; if so, label patent i as digital economy subcategory j; if not, go to S3;
S3: set i = i + 1 and return to S1 for patent i + 1 until all patents have been judged;
S4: for the patent set assigned to subcategory j, delete duplicate patents and keep a set of unique patents.
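Rule R4 replaces the IPC condition with a NIC condition. The source does not define fuzzy NIC matching precisely, so the sketch below assumes it is prefix matching (a rule code "391" matches a patent code "3911"). The data layout, function names, and codes are illustrative assumptions.

```python
def nic_fuzzy(patent_nic: set[str], nic_j: set[str]) -> bool:
    """Assumed fuzzy NIC condition: some patent NIC code starts with a code in NIC_j."""
    return any(code.startswith(r) for code in patent_nic for r in nic_j)


def rule_r4(patents: dict[str, dict[str, set[str]]], nic_j: set[str],
            de_j: set[str]) -> set[str]:
    """Rule R4 for subcategory j: NIC condition (S1) AND any keyword of DE_j (S2)."""
    return {pid for pid, f in patents.items()
            if nic_fuzzy(f["nic"], nic_j) and f["words"] & de_j}
```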
Preferably, the tenth step includes the following:
S1: for the i-th patent feature vector, judge whether any element of the NIC number set NICj of digital economy subcategory j fuzzily matches a code in the Mi-dimensional NIC attribute vector of the patent feature vector; if so, go to S2; if not, go to S3;
S2: using the fixed pair of keywords in the semantic rule set DEj of subcategory j, judge whether both keywords appear in the patent feature vector of patent i; if so, label patent i as digital economy subcategory j; if not, go to S3;
S3: set i = i + 1 and return to S1 for patent i + 1 until all patents have been judged;
S4: for the patent set assigned to subcategory j, delete duplicate patents and keep a set of unique patents.
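Rule R5 pairs the NIC condition with a fixed keyword pair. The following is a compact sketch, again assuming prefix-based fuzzy NIC matching, patents stored as dicts of sets, and a hypothetical function name; the NIC codes in the usage check are illustrative only.

```python
def rule_r5(patents: dict[str, dict[str, set[str]]], nic_j: set[str],
            keyword_pair: tuple[str, str]) -> set[str]:
    """Rule R5 for subcategory j: NIC condition AND both words of a fixed pair."""
    first, second = keyword_pair
    return {
        pid for pid, f in patents.items()
        # S1: assumed prefix match between patent NIC codes and NIC_j
        if any(code.startswith(r) for code in f["nic"] for r in nic_j)
        # S2: both keywords must appear in the patent's technical phrases
        and first in f["words"] and second in f["words"]
    }
```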
Preferably, the twelfth step includes the following:
S1: for the i-th patent feature vector, judge whether any element of the NIC number set NICj of digital economy subcategory j fuzzily matches a code in the Mi-dimensional NIC attribute vector of the patent feature vector; if so, go to S2; if not, go to S3;
S2: using all fixed mutually exclusive keyword pairs in the semantic rule set DEj of subcategory j, judge whether any pair appears together in the patent feature vector of patent i; if none does, label patent i as digital economy subcategory j; if one does, go to S3;
S3: set i = i + 1 and return to S1 for patent i + 1 until all patents have been judged;
S4: for the patent set assigned to subcategory j, delete duplicate patents, keep a set of unique patents, and add it to (update) the identified digital economy patent set patent_DE.
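Rule R6 is a negative filter: a patent passing the NIC condition is kept unless both words of some mutually exclusive pair co-occur in it, and survivors are added to patent_DE. The sketch assumes prefix-based fuzzy NIC matching and a patent_DE dict from patent id to subcategory codes, updated in place; the function name, codes, and words are hypothetical.

```python
def rule_r6(patents: dict[str, dict[str, set[str]]], nic_j: set[str],
            exclusive_pairs: list[tuple[str, str]], subclass_j: str,
            patent_de: dict[str, set[str]]) -> set[str]:
    """Rule R6 for subcategory j: NIC condition AND no exclusive pair co-occurs.

    Matching patents are also added to patent_de in place (S4).
    """
    hits = set()
    for pid, f in patents.items():
        # S1: assumed prefix (fuzzy) match of NIC codes
        if not any(code.startswith(r) for code in f["nic"] for r in nic_j):
            continue
        # S2: reject the patent if both words of any exclusive pair appear
        if any(a in f["words"] and b in f["words"] for a, b in exclusive_pairs):
            continue
        hits.add(pid)
    for pid in hits:
        patent_de.setdefault(pid, set()).add(subclass_j)
    return hits
```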
Preferably, the thirteenth step includes the following:
S1: for the i-th patent feature vector, judge whether any element of the NIC number set NICj of digital economy subcategory j fuzzily matches a code in the Mi-dimensional NIC attribute vector of the patent feature vector; if so, go to S2; if not, go to S4;
S2: using the keywords in the semantic rule set DEj of subcategory j, judge whether any keyword appears in the patent feature vector of patent i; if so, go to S3; if not, go to S4;
S3: check the digital economy patent set patent_DE against the mutual exclusion rules between subcategory j and certain other subcategories: if patent i does not appear in any subcategory that is mutually exclusive with j, label patent i as digital economy subcategory j; if it does, go to S4;
S4: set i = i + 1 and return to S1 for patent i + 1 until all patents have been identified.
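Rule R7 consults patent_DE built in earlier steps. The sketch below assumes prefix-based fuzzy NIC matching, patents as dicts of sets, and a patent_DE dict from patent id to the set of subcategory codes already assigned; `exclusive_subclasses` (the subcategories declared mutually exclusive with j) and the codes in the check are hypothetical. The source does not say whether R7 results are written back to patent_DE, so this sketch only returns them.

```python
def rule_r7(patents: dict[str, dict[str, set[str]]], nic_j: set[str],
            de_j: set[str], exclusive_subclasses: set[str],
            patent_de: dict[str, set[str]]) -> set[str]:
    """Rule R7 for subcategory j: NIC AND keyword AND no mutually exclusive label."""
    hits = set()
    for pid, f in patents.items():
        if not any(code.startswith(r) for code in f["nic"] for r in nic_j):
            continue  # S1 fails: move on to the next patent
        if not f["words"] & de_j:
            continue  # S2 fails: move on to the next patent
        if patent_de.get(pid, set()) & exclusive_subclasses:
            continue  # S3: already labeled with a mutually exclusive subcategory
        hits.add(pid)
    return hits
```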
(III) Beneficial effects
Compared with the prior art, the invention provides a digital economy patent classification method with the following beneficial effects:
1. The method links the digital economy with innovation through a patent identification and classification method based on natural-language semantic processing. It associates and matches the nationally issued "Statistical Classification of the Digital Economy and Its Core Industries (2021)", with its 5 major categories, 32 middle categories, and 156 subcategories, with actual patents (covering all patent types that reflect innovation activity, such as invention patents, utility models, designs, invention applications, PCT international patents, and short-term patents), thereby providing a statistical basis for measuring the development of the digital economy at the micro level, especially the digital economy activities of enterprises.
2. The method combines the structured code rules of the International Patent Classification (IPC) and the national economic industry classification (NIC) with natural-language processing of unstructured patent text. It not only matches the technology and industry categories indicated by IPC and NIC codes but also expands the technical phrases in the patent text with synonyms, reducing divergence in the patent-digital economy subcategory mapping and thus keeping classification accuracy under control.
3. The method can be implemented as an automated computer program, which makes the classification system highly operable. It reduces the classification workload and the large manual matching costs incurred when classifying patents at big-data scale, and it greatly increases processing speed while maintaining classification accuracy.
Drawings
FIG. 1 is a schematic diagram of the overall process of the present invention;
FIG. 2 is a detailed flow chart of digital economy patent classification according to an embodiment of the present invention;
FIG. 3 is an example of the structure and code linking relationships among the categories of the digital economy industry according to the present invention;
FIG. 4 is a diagram of the detailed digital economy industry category output labels for patent matching and classification according to the present invention;
FIG. 5 is a diagram of the detailed digital economy industry category output labels for patent matching and classification according to the present invention;
FIG. 6 is a diagram of the detailed digital economy industry category output labels for patent matching and classification according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Referring to FIGS. 1-6, a digital economy patent classification method includes the following steps:
Step S1: patent text data acquisition. Collect a large volume of published patent texts to be classified, covering all types of innovation activity, including invention patents, utility models, designs, invention applications, PCT international patents, short-term patents, and so on. Acquisition channels include free database websites, such as the official patent search and analysis system of the China National Intellectual Property Administration and the Patent Star retrieval system of the China Patent Information Center, as well as commercial (fee-based) databases.
Step S2: convert the unstructured patent text into a structured feature vector representation. Each patent is mapped into a vector space and represented by a patent feature vector of dimension Ni+Mi+Ki, where i is the unique label of a single patent and denotes the i-th patent (likewise below).
Step S2.1: extract the Ni International Patent Classification (IPC) codes assigned to patent i and construct an Ni-dimensional IPC attribute feature vector.
Step S2.2: extract the Mi national economic industry classification (NIC) codes assigned to patent i and construct an Mi-dimensional NIC attribute feature vector.
Step S2.3: segment the abstract text of patent i with the jiebaR Chinese word segmentation package in R, and remove particles, prepositions, and conjunctions to obtain a word array representing the technical features of the patent; from it, construct a Ki-dimensional technical phrase feature vector.
Step S2.4: complete the definition of the (Ni+Mi+Ki)-dimensional feature vector patent, which represents the technical knowledge contained in patent i; loop through steps S2.1 to S2.4 to convert all patents into their corresponding patent feature vectors.
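Step S2 can be pictured as building three sets per patent. The Python sketch below is illustrative only: it assumes the abstract has already been segmented (the source uses jiebaR in R for this), uses English placeholder tokens and a tiny hypothetical stop-word list in place of the Chinese particles, prepositions, and conjunctions, and treats the three parts as sets so that Ni, Mi, and Ki count distinct codes and phrases.

```python
from dataclasses import dataclass

# Hypothetical stop-word list; English placeholders stand in for the Chinese
# particles, prepositions and conjunctions removed in step S2.3.
STOP_WORDS = frozenset({"of", "and", "the", "for"})


@dataclass(frozen=True)
class PatentFeatures:
    ipc: frozenset[str]    # N_i IPC codes (step S2.1)
    nic: frozenset[str]    # M_i NIC codes (step S2.2)
    words: frozenset[str]  # K_i technical phrases (step S2.3)

    @property
    def dimension(self) -> int:
        """N_i + M_i + K_i, the length of the combined feature vector."""
        return len(self.ipc) + len(self.nic) + len(self.words)


def build_features(ipc_codes: list[str], nic_codes: list[str],
                   abstract_tokens: list[str]) -> PatentFeatures:
    """Step S2.4 (sketch): combine the three parts; tokens are pre-segmented."""
    words = frozenset(t for t in abstract_tokens if t.strip() and t not in STOP_WORDS)
    return PatentFeatures(frozenset(ipc_codes), frozenset(nic_codes), words)
```

For example, two IPC codes, one NIC code, and the tokens "computer", "of", "data", "processing", "and", "storage" give a 2 + 1 + 4 = 7-dimensional representation once the two stop words are dropped.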
Step S3: draw up the digital economy industry code and name table. Collect the relevant official documents: the "Statistical Classification of the Digital Economy and Its Core Industries (2021)"; the "Notes on the Industrial Classification for National Economic Activities (2017)"; the national economic industry classification reference table (2018) issued by the China National Intellectual Property Administration; and the "Industrial Classification for National Economic Activities" (GB/T 4754-2017) jointly issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China and the Standardization Administration of China.
Based on these official documents, simplified code and name tables are drawn up for the 5 major categories, 32 middle categories, and 156 subcategories of the digital economy industry. The structure and code linking relationships among the categories are shown in FIG. 3. For example, the subcategory "030101 Basic software development" belongs to the middle category "0301 Software development", which in turn belongs to the major category "03 Digital technology application industry". The complete information, shown in FIGS. 4-6, serves as the output labels for patent matching and classification.
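The 030101 example suggests a nested code layout: 2 digits for the major category, 4 for the middle category, and 6 for the subcategory. The sketch below assumes that layout holds for all 156 subcategories and shows how an output label could be derived from a subcategory code. Only the three names quoted in the text are loaded; `NAMES`, `parse_de_code`, and `output_label` are hypothetical.

```python
# Only the three names quoted in the text; the full table has 5 + 32 + 156 rows.
NAMES = {
    "03": "Digital technology application industry",
    "0301": "Software development",
    "030101": "Basic software development",
}


def parse_de_code(code: str) -> dict[str, str]:
    """Split a 6-digit subcategory code into major (2), middle (4) and subcategory (6) codes."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit subcategory code, got {code!r}")
    return {"major": code[:2], "middle": code[:4], "subclass": code}


def output_label(code: str) -> str:
    """Render the major > middle > subcategory path used as an output label."""
    levels = parse_de_code(code)
    return " > ".join(
        f"{levels[k]} {NAMES.get(levels[k], '(name not loaded)')}"
        for k in ("major", "middle", "subclass")
    )
```

Here `output_label("030101")` yields "03 Digital technology application industry > 0301 Software development > 030101 Basic software development".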
Step S4: construct semantic rules from the explanatory text of the digital economy subcategory industries, taking the 156 subcategories defined by the state as the most basic digital economy category units. Some subcategories cannot be matched by single-condition or multi-condition combinations of IPC and NIC codes, so semantic rules based on the explanatory text of the subcategories in the "Statistical Classification of the Digital Economy and Its Core Industries (2021)" are added for matching. For each of the 156 subcategories in turn, the jiebaR Chinese word segmentation package is used to segment the subcategory's explanatory text, and particles, prepositions, and conjunctions are removed to obtain the core matching keyword set DEj, where j denotes the j-th digital economy subcategory industry and ranges over [1,156]. To ensure the completeness of classification, every word in DEj is expanded with synonyms. For example, the keyword "electronic recreation" in DEj of subcategory 010602 (electronic game and recreation equipment manufacturing) is expanded to synonyms such as "electronic games" and "computer recreation". The semantic keyword set of each subcategory is thus determined and written into the corresponding semantic rule.
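Building DEj is a filter followed by a dictionary expansion. The sketch below assumes the explanatory text is already segmented and uses English placeholder tokens; the stop-word list and synonym dictionary are hypothetical, and only the "electronic recreation" expansion is taken from the example above.

```python
# Hypothetical stop words and synonym dictionary; English placeholders stand in
# for segmented Chinese words. Only the "electronic recreation" entry mirrors
# the example in the text.
STOP_WORDS = frozenset({"and", "of", "the", "other"})
SYNONYMS = {
    "electronic recreation": {"electronic games", "computer recreation"},
}


def build_keyword_set(segmented_text: list[str]) -> set[str]:
    """Build DE_j: drop stop words, then add the synonyms of every remaining word."""
    core = {w for w in segmented_text if w not in STOP_WORDS}
    expanded = set(core)
    for word in core:
        expanded |= SYNONYMS.get(word, set())
    return expanded
```

Expansion runs over the core words only, so synonyms are not themselves expanded again. This keeps DEj from drifting through chains of loosely related terms.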
Step S5: establish the complete patent-digital economy subcategory mapping relation model, comprising 7 sets of rules. Based on steps S3 and S4, the model, which identifies patents by multiple rules combining IPC codes, NIC codes, and semantic rules, is built for each of the 156 subcategories in turn, yielding 7 sets of patent-digital economy subcategory comparison rules covering single-condition, double-condition, triple-condition, and compound multi-condition matching.
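One way to hold the mapping model is a record per subcategory that stores the conditions the rule sets R1 to R7 draw on. The container below is a hypothetical sketch; the source does not say how the rules are stored or which subcategories use which rule sets. Its `applicable_rules` method simply lists the rule sets whose conditions are populated, and the example codes are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubclassRules:
    """Hypothetical container for the conditions attached to one subcategory j."""
    code: str                                                   # e.g. "030101"
    ipc_exact: set[str] = field(default_factory=set)            # exact IPC_j
    ipc_fuzzy: set[str] = field(default_factory=set)            # fuzzy IPC_j prefixes
    nic: set[str] = field(default_factory=set)                  # NIC_j
    keywords: set[str] = field(default_factory=set)             # DE_j after expansion
    keyword_pair: Optional[tuple[str, str]] = None              # fixed pair (R3, R5)
    exclusive_pairs: list[tuple[str, str]] = field(default_factory=list)  # R6
    exclusive_subclasses: set[str] = field(default_factory=set)           # R7

    def applicable_rules(self) -> list[str]:
        """List the rule sets whose conditions are all populated for this subcategory."""
        has_ipc = bool(self.ipc_exact or self.ipc_fuzzy)
        has_nic = bool(self.nic)
        checks = {
            "R1": has_ipc,
            "R2": has_ipc and bool(self.keywords),
            "R3": has_ipc and self.keyword_pair is not None,
            "R4": has_nic and bool(self.keywords),
            "R5": has_nic and self.keyword_pair is not None,
            "R6": has_nic and bool(self.exclusive_pairs),
            "R7": has_nic and bool(self.keywords) and bool(self.exclusive_subclasses),
        }
        return [name for name, ok in checks.items() if ok]
```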
Step S6: apply the first set of matching rules R1 of the mapping relation model, which classifies patents into digital-economy subclasses by a single IPC-code condition. The IPC codes that the Statistical Classification of the Digital Economy and Its Core Industries (2021) uses to distinguish subclasses are of two types, exact and fuzzy. These give exact IPC condition matching and fuzzy IPC condition matching. Digital-economy subclass j is taken as the example below.
Step S6.1: exact IPC condition matching. For the i-th patent feature vector Patent_i, judge whether any element of the exact IPC code set IPC_j^exact of subclass j is contained in the patent feature vector. An example element is the exact IPC code G06F15/04 for 010101 computer complete-machine manufacturing. The test is whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
If so, mark patent i as digital-economy subclass j.
Step S6.2: if not, set i = i+1 and apply step S6.1 to the (i+1)-th patent, repeating until all patents have been judged.
Step S6.3: fuzzy IPC condition matching. For the i-th patent feature vector, judge whether any element of the fuzzy IPC code set IPC_j^fuzzy of subclass j is fuzzily contained in the patent feature vector. An example element is the fuzzy IPC code G06F15/16 for 010101 computer complete-machine manufacturing, which is matched when the patent carries G06F15/163. If so, mark patent i as digital-economy subclass j.
Step S6.4: if not, set i = i+1 and apply step S6.3 to the (i+1)-th patent, repeating until all patents have been judged.
Step S6.5: in the set of patents assigned to subclass j, delete the duplicates produced by running both exact and fuzzy IPC matching, keeping a set of unique patents.
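Rule R1 can be sketched in Python as follows. The patent representation (a dict from patent id to lists under `ipc`, `nic` and `keywords`) is an assumption. Fuzzy matching is modeled as string-prefix matching, which is inferred from the G06F15/16 and G06F15/163 example rather than stated by the source. The final set union plays the role of the de-duplication in step S6.5.

```python
def exact_ipc_match(patent_ipcs, ipc_exact):
    """Exact rule: the patent's IPC codes and IPC_j^exact share at least one code."""
    return not set(patent_ipcs).isdisjoint(ipc_exact)


def fuzzy_ipc_match(patent_ipcs, ipc_fuzzy):
    """Fuzzy rule, assumed here to be prefix matching (G06F15/16 matches G06F15/163)."""
    return any(code.startswith(prefix) for code in patent_ipcs for prefix in ipc_fuzzy)


def apply_r1(patents, ipc_exact, ipc_fuzzy):
    """Return the ids of patents assigned to subclass j by rule R1 (steps S6.1-S6.5)."""
    exact_hits = {pid for pid, p in patents.items() if exact_ipc_match(p["ipc"], ipc_exact)}
    fuzzy_hits = {pid for pid, p in patents.items() if fuzzy_ipc_match(p["ipc"], ipc_fuzzy)}
    # Step S6.5: the set union keeps each patent once even if both passes found it.
    return exact_hits | fuzzy_hits
```

With an exact set {"G06F15/04"} and a fuzzy set {"G06F15/16"}, a patent carrying G06F15/163 is caught only by the fuzzy pass. IPC strings are often written with a space (e.g. "G06F 15/16"), so codes would need consistent normalization before prefix matching.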
Step S7: apply the second set of matching rules R2 of the mapping relation model, which classifies patents into digital-economy subclasses by a double-condition combination of an IPC code and any keyword in the semantic rule.
Step S7.1: exact IPC condition matching. For the i-th patent feature vector Patent_i, judge whether any element of the exact IPC code set IPC_j^exact is contained in the patent feature vector, i.e., whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
If so, go to step S7.2; if not, go to step S7.3.
Step S7.2: judge whether any keyword in the semantic rule set DE_j of subclass j appears in the patent feature vector of patent i. An example keyword is "industrial control" for 010104 industrial control computer and system manufacturing. The test is whether the intersection of the two sets is non-empty:
$\mathrm{DE}_j \cap \mathrm{Patent}_i \neq \varnothing$
If so, mark patent i as digital-economy subclass j; if not, go to step S7.3.
Step S7.3: set i = i+1 and return to step S7.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S7.4: fuzzy IPC condition matching. Judge whether any element of the fuzzy IPC code set IPC_j^fuzzy is fuzzily contained in the patent feature vector; if so, go to step S7.5, and if not, go to step S7.6.
Step S7.5: judge whether any keyword in the semantic rule set DE_j of subclass j appears in the patent feature vector of patent i. An example keyword is "industrial control" for 010104 industrial control computer and system manufacturing. The test is whether the intersection of the two sets is non-empty:
$\mathrm{DE}_j \cap \mathrm{Patent}_i \neq \varnothing$
If so, mark patent i as digital-economy subclass j; if not, go to step S7.6.
Step S7.6: set i = i+1 and return to step S7.4 for the (i+1)-th patent, repeating until all patents have been judged.
Step S7.7: in the set of patents assigned to subclass j, delete the duplicates produced by running both exact and fuzzy IPC matching, keeping a set of unique patents.
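Steps S7.1 to S7.7 run an exact pass and a fuzzy pass with the same keyword test and then de-duplicate. The result therefore equals "(exact or fuzzy IPC match) and at least one DE_j keyword". The hypothetical sketch below collapses the two passes into one loop. It uses the same assumed dict-based patent records and prefix-based fuzzy matching as a sketch of R1. The IPC codes in the test are illustrative, not taken from the classification.

```python
def ipc_match(codes, ipc_exact, ipc_fuzzy):
    """True if the patent passes exact matching or prefix-based (assumed) fuzzy matching."""
    exact = not set(codes).isdisjoint(ipc_exact)
    fuzzy = any(code.startswith(prefix) for code in codes for prefix in ipc_fuzzy)
    return exact or fuzzy


def apply_r2(patents, ipc_exact, ipc_fuzzy, keywords):
    """Rule R2: an IPC condition plus at least one keyword of DE_j (steps S7.1-S7.7)."""
    return {
        pid
        for pid, p in patents.items()
        if ipc_match(p["ipc"], ipc_exact, ipc_fuzzy)
        and not set(p["keywords"]).isdisjoint(keywords)
    }
```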
Step S8: apply the third set of matching rules R3 of the mapping relation model, which classifies patents into digital-economy subclasses by a triple-condition combination of an IPC code and a fixed keyword pair in the semantic rule.
Step S8.1: exact IPC condition matching. For the i-th patent feature vector Patent_i, judge whether any element of the exact IPC code set IPC_j^exact of subclass j is contained in the patent feature vector, i.e., whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
If so, go to step S8.2; if not, go to step S8.3.
Step S8.2: take a fixed keyword pair from the semantic rule set DE_j of subclass j, for example "information" + "security" for 010105 information security equipment manufacturing. Judge whether both keywords appear in the patent feature vector of patent i at the same time. If so, mark patent i as digital-economy subclass j; if not, go to step S8.3.
Step S8.3: set i = i+1 and return to step S8.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S8.4: fuzzy IPC condition matching. Judge whether any element of the fuzzy IPC code set IPC_j^fuzzy is fuzzily contained in the patent feature vector; if so, go to step S8.5, and if not, go to step S8.6.
Step S8.5: take the same fixed keyword pair from DE_j (for example "information" + "security" for 010105 information security equipment manufacturing). Judge whether both keywords appear in the patent feature vector of patent i at the same time. If so, mark patent i as digital-economy subclass j; if not, go to step S8.6.
Step S8.6: set i = i+1 and return to step S8.4 for the (i+1)-th patent, repeating until all patents have been judged.
Step S8.7: in the set of patents assigned to subclass j, delete the duplicates produced by running both exact and fuzzy IPC matching, keeping a set of unique patents.
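Rule R3 differs from R2 only in the keyword test: both words of a fixed pair must occur in the same patent. Here is a sketch under the same assumptions, namely hypothetical dict-based patent records and prefix-based fuzzy IPC matching. The codes in the test are illustrative only.

```python
def ipc_match(codes, ipc_exact, ipc_fuzzy):
    """True if the patent passes exact matching or prefix-based (assumed) fuzzy matching."""
    exact = not set(codes).isdisjoint(ipc_exact)
    fuzzy = any(code.startswith(prefix) for code in codes for prefix in ipc_fuzzy)
    return exact or fuzzy


def apply_r3(patents, ipc_exact, ipc_fuzzy, keyword_pair):
    """Rule R3: an IPC condition plus both words of a fixed pair (steps S8.1-S8.7)."""
    first, second = keyword_pair
    hits = set()
    for pid, p in patents.items():
        words = set(p["keywords"])
        # Both keywords of the pair must occur in the same patent.
        if ipc_match(p["ipc"], ipc_exact, ipc_fuzzy) and first in words and second in words:
            hits.add(pid)
    return hits
```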
Step S9: apply the fourth set of matching rules R4 of the mapping relation model, which classifies patents into digital-economy subclasses by a double-condition combination of an NIC code and any keyword in the semantic rule.
Step S9.1: for the i-th patent feature vector, judge whether any element of the NIC code set NIC_j of subclass j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector. For example, A01 in the NIC set of 050101 digital facility planting matches the legume (pulse) planting code A0121 of the national economic industry classification. If so, go to step S9.2; if not, go to step S9.3.
Step S9.2: judge whether any keyword in the semantic rule set DE_j of subclass j appears in the patent feature vector of patent i. If so, mark patent i as digital-economy subclass j; if not, go to step S9.3.
Step S9.3: set i = i+1 and return to step S9.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S9.4: in the set of patents assigned to subclass j, delete duplicate patents, keeping a set of unique patents.
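Rule R4 swaps the IPC condition for a fuzzy NIC condition. The A01 and A0121 example suggests that fuzzy NIC matching means a patent's NIC code starts with a code from NIC_j, and this sketch assumes so. The record layout remains a hypothetical one, and the keywords in the test are illustrative.

```python
def nic_match(codes, nic_set):
    """Fuzzy NIC rule, assumed to be prefix matching (A01 matches A0121)."""
    return any(code.startswith(prefix) for code in codes for prefix in nic_set)


def apply_r4(patents, nic_set, keywords):
    """Rule R4: a fuzzy NIC match plus at least one keyword of DE_j (steps S9.1-S9.4)."""
    return {
        pid
        for pid, p in patents.items()
        if nic_match(p["nic"], nic_set) and not set(p["keywords"]).isdisjoint(keywords)
    }
```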
Step S10: apply the fifth set of matching rules R5 of the mapping relation model, which classifies patents into digital-economy subclasses by a triple-condition combination of an NIC code and a fixed keyword pair in the semantic rule.
Step S10.1: for the i-th patent feature vector, judge whether any element of the NIC code set NIC_j of subclass j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector. If so, go to step S10.2; if not, go to step S10.3.
Step S10.2: take a fixed keyword pair from the semantic rule set DE_j of subclass j, for example "movie" + "equipment" for 020103 broadcast, film and television equipment wholesale. Judge whether both keywords appear in the patent feature vector of patent i at the same time. If so, mark patent i as digital-economy subclass j; if not, go to step S10.3.
Step S10.3: set i = i+1 and return to step S10.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S10.4: in the set of patents assigned to subclass j, delete duplicate patents, keeping a set of unique patents.
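Rule R5 combines the fuzzy NIC condition with a fixed keyword pair. This sketch makes the same assumptions as the R4 sketch (hypothetical record layout, prefix-based NIC matching). The NIC prefix F51 used in the test is a hypothetical value, not one taken from the source.

```python
def nic_match(codes, nic_set):
    """Fuzzy NIC rule, assumed to be prefix matching."""
    return any(code.startswith(prefix) for code in codes for prefix in nic_set)


def apply_r5(patents, nic_set, keyword_pair):
    """Rule R5: a fuzzy NIC match plus both words of a fixed pair (steps S10.1-S10.4)."""
    first, second = keyword_pair
    return {
        pid
        for pid, p in patents.items()
        # The subset test requires both words of the pair in the same patent.
        if nic_match(p["nic"], nic_set) and {first, second} <= set(p["keywords"])
    }
```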
Step S11: merge the patent sets of the identified digital-economy subclasses. The output patent sets of steps S6 to S10 are combined to obtain the identified digital-economy patent set Patent_DE.
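Step S11 is a union of the per-subclass outputs. The sketch below keeps Patent_DE as a mapping from patent id to the set of subclass codes it was assigned. The source only calls Patent_DE a patent set, so this layout is an assumption. It is chosen so that later steps (the mutual-exclusion check of R7 and the storage in step S14) can look up a patent's subclasses.

```python
from collections import defaultdict


def merge_rule_outputs(outputs):
    """Union the (subclass code, patent ids) outputs of R1-R5 into Patent_DE.

    Patent_DE is kept as a mapping from patent id to its set of subclass codes
    (an assumed layout), so a patent matched several times is stored once.
    """
    patent_de = defaultdict(set)
    for subclass, patent_ids in outputs:
        for pid in patent_ids:
            patent_de[pid].add(subclass)
    return dict(patent_de)
```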
Step S12: apply the sixth set of matching rules R6 of the mapping relation model, which classifies patents into digital-economy subclasses by a multi-condition combination of an NIC code and all fixed mutually exclusive keyword pairs in the semantic rule.
Step S12.1: NIC condition matching. For the i-th patent feature vector, judge whether any element of the NIC code set NIC_j of subclass j fuzzily matches the NIC attribute vector of the patent. An example element is J66 in the NIC set of 050501 bank financial services. If so, go to step S12.2; if not, go to step S12.3.
Step S12.2: take all fixed mutually exclusive keyword pairs in the semantic rule set DE_j of subclass j. For example, 050501 bank financial services has seven such pairs, including "Internet" + "financing platform", "Internet" + "lending", "Internet" + "loan", "Internet" + "borrowing" and "Internet" + "finance". Judge whether any pair appears in the patent feature vector of patent i at the same time. If none of the pairs appears, mark patent i as digital-economy subclass j; otherwise, go to step S12.3.
Step S12.3: set i = i+1 and return to step S12.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S12.4: in the set of patents assigned to subclass j, delete duplicate patents, keeping a set of unique patents.
Step S12.5: add the result of step S12.4 to the identified digital-economy patent set Patent_DE, updating it.
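Rule R6 inverts the keyword test. A patent that passes the NIC condition is assigned to subclass j only if none of the mutually exclusive keyword pairs co-occurs in it. The sketch assumes the same hypothetical record layout and prefix-based NIC matching as the earlier sketches, plus the mapping form of Patent_DE. The J6621 code and the two keyword pairs in the test are illustrative.

```python
def nic_match(codes, nic_set):
    """Fuzzy NIC rule, assumed to be prefix matching."""
    return any(code.startswith(prefix) for code in codes for prefix in nic_set)


def apply_r6(patents, nic_set, exclusive_pairs):
    """Rule R6 (steps S12.1-S12.4): a fuzzy NIC match and no co-occurring exclusive pair."""
    hits = set()
    for pid, p in patents.items():
        words = set(p["keywords"])
        clash = any(a in words and b in words for a, b in exclusive_pairs)
        if nic_match(p["nic"], nic_set) and not clash:
            hits.add(pid)
    return hits


def add_to_patent_de(patent_de, subclass, hits):
    """Step S12.5: record subclass j for every hit in Patent_DE (mutates in place)."""
    for pid in hits:
        patent_de.setdefault(pid, set()).add(subclass)
    return patent_de
```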
Step S13: apply the seventh set of matching rules R7 of the mapping relation model. R7 classifies patents into digital-economy subclasses by a multi-condition combination of an NIC code, a keyword in the semantic rule, and the removal of mutually exclusive patents.
Step S13.1: NIC condition matching. For the i-th patent feature vector, judge whether any element of the NIC code set NIC_j of subclass j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector. An example element is C34 in the NIC set of 050201 digital general-purpose and special-purpose equipment manufacturing. If so, go to step S13.2; if not, go to step S13.4.
Step S13.2: judge whether any keyword in the semantic rule set DE_j of subclass j appears in the patent feature vector of patent i. An example keyword is "digital twin" for 050201 digital general-purpose and special-purpose equipment manufacturing. If so, go to step S13.3; otherwise, go to step S13.4.
Step S13.3: check patent i against the identified set Patent_DE using the mutual-exclusion rules between subclass j and certain other subclasses. If patent i has not been assigned to any subclass that is mutually exclusive with subclass j, mark it as digital-economy subclass j; if it has, go to step S13.4. For example, the subclasses mutually exclusive with 050201 digital general-purpose and special-purpose equipment manufacturing are:
- 010604 calculator and currency-specific equipment manufacturing;
- 010401 industrial robot manufacturing;
- 010501 special equipment manufacturing for semiconductor devices;
- 010502 electronic component and electromechanical assembly equipment manufacturing.
Step S13.4: set i = i+1 and return to step S13.1 for the (i+1)-th patent, repeating until all patents have been judged.
Step S13.5: in the set of patents assigned to subclass j, delete duplicate patents, keeping a set of unique patents.
Step S13.6: add the result of step S13.5 to the identified digital-economy patent set Patent_DE, updating it.
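Rule R7 adds a look-up against the patents already identified. A candidate is rejected if Patent_DE already assigns it to a subclass that is mutually exclusive with j. In the sketch below, the exclusion list for 050201 comes from the source. The record layout, prefix-based NIC matching and the mapping form of Patent_DE are assumptions.

```python
# Subclasses the source lists as mutually exclusive with 050201.
EXCLUSIVE_WITH_050201 = {"010604", "010401", "010501", "010502"}


def nic_match(codes, nic_set):
    """Fuzzy NIC rule, assumed to be prefix matching."""
    return any(code.startswith(prefix) for code in codes for prefix in nic_set)


def apply_r7(patents, subclass, nic_set, keywords, exclusive, patent_de):
    """Rule R7 (steps S13.1-S13.6): NIC match, a DE_j keyword, and no exclusive subclass.

    Hits are also recorded in patent_de, which maps patent id -> subclass codes.
    """
    hits = set()
    for pid, p in patents.items():
        if not nic_match(p["nic"], nic_set):
            continue  # S13.1 fails
        if set(p["keywords"]).isdisjoint(keywords):
            continue  # S13.2 fails
        if patent_de.get(pid, set()) & exclusive:
            continue  # S13.3: already assigned to a mutually exclusive subclass
        hits.add(pid)
    for pid in hits:
        patent_de.setdefault(pid, set()).add(subclass)
    return hits
```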
Step S14: store the digital-economy patents and the non-digital-economy patents.
Step S14.1: use Patent_DE to match the patent-digital economy subclass classification results back to the corresponding original patents. Among all patents to be identified, those with a classification result are stored as digital-economy patents. Those without one are stored as the non-digital-economy patent set.
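Step S14.1 splits the input into the two stored collections. Here is a minimal sketch, assuming (as in the earlier hypothetical sketches) that Patent_DE maps patent ids to their subclass codes.

```python
def split_patents(all_ids, patent_de):
    """Step S14.1: separate digital-economy patents from the rest.

    Returns (digital, non_digital): digital maps each classified patent id to its
    sorted subclass codes; non_digital lists ids with no classification result.
    """
    digital = {pid: sorted(patent_de[pid]) for pid in all_ids if pid in patent_de}
    non_digital = [pid for pid in all_ids if pid not in patent_de]
    return digital, non_digital
```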
Step S14.2: identify in detail which digital-economy industries each digital-economy patent belongs to, as shown in FIG. 3 and FIG. 4. The digital-economy subclass code and name (e.g., "010101 computer complete-machine manufacturing") are used as keys. They link the corresponding middle-class code and name ("0101 computer manufacturing") and major-class code and name ("01 digital product manufacturing"). Each digital-economy patent is thus supplemented with its digital-economy industry class at every level.
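The examples in step S14.2 (010101, 0101, 01) suggest that a subclass code carries its middle-class and major-class codes as its first four and first two digits. The sketch below relies on that assumption. It uses tiny hypothetical look-up tables in place of the full code table of FIG. 3 and FIG. 4.

```python
# Hypothetical excerpts of the code table; the full table is in FIG. 3 and FIG. 4.
MAJOR_CLASSES = {"01": "digital product manufacturing"}
MIDDLE_CLASSES = {"0101": "computer manufacturing"}


def expand_levels(subclass_code, subclass_name):
    """Attach major- and middle-class codes and names to a six-digit subclass code.

    Assumes the middle class is the first four digits and the major class the
    first two, as in the 010101 -> 0101 -> 01 example; unknown codes map to None.
    """
    major, middle = subclass_code[:2], subclass_code[:4]
    return {
        "major": (major, MAJOR_CLASSES.get(major)),
        "middle": (middle, MIDDLE_CLASSES.get(middle)),
        "subclass": (subclass_code, subclass_name),
    }
```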
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A digital economy patent classification method, comprising the following steps:
the first step: collecting patent text data;
the second step: converting the unstructured patent text into a structured feature-vector representation, in which each patent is represented by a patent feature vector of dimension Ni+Mi+Ki, where i is the label of a single unique patent and denotes the i-th patent;
the third step: compiling the digital-economy industry code and name table, namely the codes and names of the 156 subclass industries under the 5 major classes and 32 middle classes of digital-economy industries;
the fourth step: constructing semantic rules from the explanatory text of the digital-economy subclass industries;
the fifth step: establishing a complete patent-digital economy subclass mapping relation model comprising 7 sets of rules in total; for each of the 156 digital-economy subclasses in turn, a "patent-digital economy subclass mapping relation model" that identifies patents by multiple IPC, NIC and semantic rules is built from the third and fourth steps, yielding 7 sets of patent-subclass matching rules covering single-condition, double-condition, triple-condition and compound multi-condition cases;
the sixth step: according to the patent-digital economy subclass mapping relation model, applying the first set of matching rules R1, which classifies patents into digital-economy subclasses by a single IPC-code condition; the IPC codes used in the Statistical Classification of the Digital Economy and Its Core Industries (2021) to distinguish the subclasses are divided into exact and fuzzy types, for exact and fuzzy IPC condition matching respectively;
the seventh step: according to the patent-digital economy subclass mapping relation model, applying the second set of matching rules R2, which classifies patents into digital-economy subclasses by a double-condition combination of an IPC code and any keyword in the semantic rule;
the eighth step: according to the patent-digital economy subclass mapping relation model, applying the third set of matching rules R3, which classifies patents into digital-economy subclasses by a triple-condition combination of an IPC code and a fixed keyword pair in the semantic rule;
the ninth step: according to the patent-digital economy subclass mapping relation model, applying the fourth set of matching rules R4, which classifies patents into digital-economy subclasses by a double-condition combination of an NIC code and any keyword in the semantic rule;
the tenth step: according to the patent-digital economy subclass mapping relation model, applying the fifth set of matching rules R5, which classifies patents into digital-economy subclasses by a triple-condition combination of an NIC code and a fixed keyword pair in the semantic rule;
the eleventh step: merging the patent sets of the identified digital-economy subclasses, namely combining the output patent sets of the sixth to tenth steps, to obtain the identified digital-economy patent set Patent_DE;
the twelfth step: according to the patent-digital economy subclass mapping relation model, applying the sixth set of matching rules R6, which classifies patents into digital-economy subclasses by a multi-condition combination of an NIC code and all fixed mutually exclusive keyword pairs in the semantic rule;
the thirteenth step: according to the patent-digital economy subclass mapping relation model, applying the seventh set of matching rules R7, which classifies patents into digital-economy subclasses by a multi-condition combination of an NIC code, a keyword in the semantic rule, and the removal of mutually exclusive patents;
the fourteenth step: storing the digital-economy patents and the non-digital-economy patents, using Patent_DE to match the patent-digital economy subclass classification results back to the corresponding original patents.
2. The digital economy patent classification method according to claim 1, wherein the second step specifically comprises:
S1: extracting the Ni labeled International Patent Classification (IPC) codes of the single patent i and constructing an Ni-dimensional IPC attribute feature vector;
S2: extracting the Mi labeled national economic industry classification (NIC) codes of the single patent i and constructing an Mi-dimensional NIC attribute feature vector;
S3: for the abstract text of the single patent i, obtaining, in the R language, a word array that represents the technical features of the patent and constructing a Ki-dimensional technical phrase feature vector;
S4: completing the definition and looping steps S1 to S3 to convert all patents into their corresponding patent feature vectors.
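The Ni+Mi+Ki patent feature vector of claim 2 can be pictured as three lists held per patent. The sketch below is minimal, and the `PatentVector` class and the phrase-extraction callback are hypothetical names. In the test, whitespace splitting merely stands in for the R-based phrase extraction described in S3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PatentVector:
    """Hypothetical container for the Ni + Mi + Ki patent feature vector."""
    patent_id: str
    ipc: List[str]       # Ni IPC codes (S1)
    nic: List[str]       # Mi NIC codes (S2)
    keywords: List[str]  # Ki technical phrases from the abstract (S3)

    @property
    def dimension(self) -> int:
        return len(self.ipc) + len(self.nic) + len(self.keywords)


def build_vectors(records: List[Dict], extract_phrases: Callable[[str], List[str]]) -> List[PatentVector]:
    """S4: loop over all patent records and convert each into a PatentVector."""
    return [
        PatentVector(r["id"], list(r["ipc"]), list(r["nic"]), extract_phrases(r["abstract"]))
        for r in records
    ]
```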
3. The digital economy patent classification method according to claim 1, wherein the fourth step specifically comprises:
taking the 156 digital-economy subclass industries defined by the state as the most basic units of digital-economy classification; because some subclass industries cannot be matched by IPC and NIC codes used as single conditions or multi-condition combinations, semantic rules derived from the explanatory text of the subclass industries in the Statistical Classification of the Digital Economy and Its Core Industries (2021) are added for matching;
for each of the 156 subclasses in turn, segmenting the explanatory text of the subclass in that classification with the jiebaR Chinese word-segmentation package, and removing particles, prepositions and conjunctions to obtain the core matching keyword set DE_j of the subclass, where j denotes the j-th digital-economy subclass industry and ranges over [1,156];
and, to ensure the completeness of classification, expanding each keyword in the keyword set DE_j of subclass industry j with its synonyms.
4. The digital economy patent classification method according to claim 1, wherein the sixth step specifically comprises:
S1: exact IPC condition matching: for the i-th patent feature vector Patent_i, judging whether any element of the exact IPC code set IPC_j^exact of the digital-economy subclass industry j is contained in the patent feature vector, namely whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
if so, marking patent i as digital-economy subclass industry class j;
S2: if not, letting i = i+1 and repeating S1 for the (i+1)-th patent until all patents have been judged;
S3: for the i-th patent feature vector, judging whether any element of the fuzzy IPC code set IPC_j^fuzzy is fuzzily contained in the patent feature vector, and if so, marking patent i as digital-economy subclass industry class j;
S4: if not, letting i = i+1 and repeating S3 for the (i+1)-th patent until all patents have been judged;
S5: in the set of patents assigned to subclass industry class j, deleting the duplicates produced by the exact and fuzzy IPC matching passes and keeping a set of unique patents.
5. The digital economy patent classification method according to claim 1, wherein the seventh step comprises:
S1: exact IPC condition matching: for the i-th patent feature vector Patent_i, judging whether any element of the exact IPC code set IPC_j^exact of the digital-economy subclass industry j is contained in the patent feature vector, namely whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
if so, jumping to S2, and if not, jumping to S3;
S2: judging whether a keyword in the semantic rule set DE_j of the digital-economy subclass industry j appears in the patent feature vector of patent i, namely whether the intersection of the two sets is non-empty:
$\mathrm{DE}_j \cap \mathrm{Patent}_i \neq \varnothing$
if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S3;
S3: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been judged;
S4: judging whether any element of the fuzzy IPC code set IPC_j^fuzzy is fuzzily contained in the patent feature vector; if so, jumping to S5, otherwise jumping to S6;
S5: judging whether a keyword in the semantic rule set DE_j of the digital-economy subclass industry j appears in the patent feature vector of patent i, namely whether the intersection of the two sets is non-empty:
$\mathrm{DE}_j \cap \mathrm{Patent}_i \neq \varnothing$
if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S6;
S6: letting i = i+1 and jumping to S4 for the (i+1)-th patent until all patents have been judged;
S7: in the set of patents assigned to subclass industry class j, deleting the duplicates produced by the exact and fuzzy IPC matching passes and keeping a set of unique patents.
6. The digital economy patent classification method according to claim 1, wherein the eighth step comprises:
S1: exact IPC condition matching: for the i-th patent feature vector Patent_i, judging whether any element of the exact IPC code set IPC_j^exact of the digital-economy subclass industry j is contained in the patent feature vector, namely whether the intersection of the two sets is non-empty:
$\mathrm{IPC}_j^{\mathrm{exact}} \cap \mathrm{Patent}_i \neq \varnothing$
if so, jumping to S2, and if not, jumping to S3;
S2: using a fixed keyword pair in the semantic rule set DE_j of the digital-economy subclass industry j, judging whether both keywords of the pair appear in the patent feature vector of patent i at the same time; if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S3;
S3: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been judged;
S4: judging whether any element of the fuzzy IPC code set IPC_j^fuzzy is fuzzily contained in the patent feature vector; if so, jumping to S5, otherwise jumping to S6;
S5: repeating the test of S2, judging whether both keywords of the pair appear in the patent feature vector of patent i at the same time; if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S6;
S6: letting i = i+1 and jumping to S4 for the (i+1)-th patent until all patents have been judged;
S7: in the set of patents assigned to subclass industry class j, deleting the duplicates produced by the exact and fuzzy IPC matching passes and keeping a set of unique patents.
7. The digital economy patent classification method according to claim 1, wherein the ninth step comprises:
S1: for the i-th patent feature vector, judging whether any element of the NIC code set NIC_j of the digital-economy subclass industry j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector; if so, jumping to S2, otherwise jumping to S3;
S2: judging whether any keyword in the semantic rule set DE_j of the digital-economy subclass industry j appears in the patent feature vector of patent i; if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S3;
S3: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been judged;
S4: in the set of patents assigned to subclass industry class j, deleting duplicate patents and keeping a set of unique patents.
8. The digital economy patent classification method according to claim 1, wherein the tenth step comprises:
S1: for the i-th patent feature vector, judging whether any element of the NIC code set NIC_j of the digital-economy subclass industry j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector; if so, jumping to S2, and if not, jumping to S3;
S2: using a fixed keyword pair in the semantic rule set DE_j of the digital-economy subclass industry j, judging whether both keywords of the pair appear in the patent feature vector of patent i at the same time; if so, marking patent i as digital-economy subclass industry class j, and if not, jumping to S3;
S3: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been judged;
S4: in the set of patents assigned to subclass industry class j, deleting duplicate patents and keeping a set of unique patents.
9. The digital economy patent classification method according to claim 1, wherein the twelfth step comprises:
S1: for the i-th patent feature vector, judging whether any element of the NIC code set NIC_j of the digital-economy subclass industry j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector; if so, jumping to S2, and if not, jumping to S3;
S2: using all fixed mutually exclusive keyword pairs in the semantic rule set DE_j of the digital-economy subclass industry j, judging whether any pair appears in the patent feature vector of patent i at the same time; if no pair appears, marking patent i as digital-economy subclass industry class j, and otherwise jumping to S3;
S3: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been judged;
S4: in the set of patents assigned to subclass industry class j, deleting duplicate patents, keeping a set of unique patents, and adding that set to the identified digital-economy patent set Patent_DE to update it.
10. The digital economy patent classification method according to claim 1, wherein the thirteenth step comprises:
S1: for the i-th patent feature vector, judging whether any element of the NIC code set NIC_j of the digital-economy subclass industry j fuzzily matches the Mi-dimensional NIC attribute vector contained in the patent feature vector; if so, jumping to S2, and if not, jumping to S4;
S2: judging whether any keyword in the semantic rule set DE_j of the digital-economy subclass industry j appears in the patent feature vector of patent i; if so, jumping to S3, and if not, jumping to S4;
S3: checking patent i against the identified digital-economy patent set Patent_DE using the mutual-exclusion rules between the digital-economy subclass industry j and certain other subclass industries; if patent i does not appear in any subclass that is mutually exclusive with industry j, marking patent i as digital-economy subclass industry class j, and if it does appear, jumping to S4;
S4: letting i = i+1 and jumping to S1 for the (i+1)-th patent until all patents have been identified.
CN202211584594.6A 2022-12-09 2022-12-09 Digital economic patent classification method Pending CN116226371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211584594.6A CN116226371A (en) 2022-12-09 2022-12-09 Digital economic patent classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211584594.6A CN116226371A (en) 2022-12-09 2022-12-09 Digital economic patent classification method

Publications (1)

Publication Number Publication Date
CN116226371A true CN116226371A (en) 2023-06-06

Family

ID=86583123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211584594.6A Pending CN116226371A (en) 2022-12-09 2022-12-09 Digital economic patent classification method

Country Status (1)

Country Link
CN (1) CN116226371A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118093883A (en) * 2024-04-26 2024-05-28 营动智能技术(山东)有限公司 Mapping method and system based on product classification and patent classification


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination