CN112395854A - Standard element consistency inspection method - Google Patents


Info

Publication number
CN112395854A
Authority
CN
China
Prior art keywords
standard
class
similarity
terms
classes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011386161.0A
Other languages
Chinese (zh)
Other versions
CN112395854B (en)
Inventor
王双
高昂
程越
朱虹
万利
李柏晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Institute of Standardization
Original Assignee
China National Institute of Standardization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Institute of Standardization
Priority to CN202011386161.0A
Publication of CN112395854A
Application granted
Publication of CN112395854B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/027 Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a consistency checking method for standard elements. First, standard knowledge elements are extracted using rules: unstructured standard files are converted into a regular knowledge element storage model, the terms, normative references, and coding tables they contain are extracted, and the extracted elements are vectorized. The consistency of terms and normative references is then checked, so that terms and normative references stay normalized and coordinated across standards. In particular, for important basic general-purpose standards such as information classification and coding standards, code consistency checking reduces the manual intervention needed in database migration when a new version of such a standard is implemented, which improves work efficiency. Furthermore, the knowledge element model can serve as a neuron of a neural network input layer; the network can be trained on a data set of standard files so that large volumes of file data are processed more efficiently.

Description

Standard element consistency inspection method
Technical Field
The application relates to a method for consistency checking of elements in a standard.
Background
A standard is "a normative document, established by consensus and approved by a recognized body, for common and repeated use, aimed at achieving the optimum order within a given scope". A standard has a strict structure and consists of normative elements and informative elements. The normative elements include the title, scope, normative references, terms and definitions, codes and abbreviations, normative annexes, and the like; the informative elements include the cover, table of contents, introduction, informative annexes, references, index, and the like. The normative elements are the core of a standard. Although a single standard need not contain every normative element, those it does contain must be written correctly and consistently with other standards. The consistency of terms and normative references, for example, determines the consistency between different standards and the coordination between different versions of a standard. In particular, for important basic general-purpose standards such as information classification and coding standards, consistency checking between codes is significant for the migration of information system databases and for information exchange between databases.
At present, the management of standard files remains mainly at the data management level. Existing standard file management platforms mainly provide retrieval and distribution of standard data. These include the international standard retrieval platforms of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and domestic platforms such as the national standard full-text open system, the national standard information public service platform, the national standard file sharing service platform, and standard search services. They lack fine-grained data analysis of standard files, so there is no effective means of detecting overlap, repetition, or inconsistency between standards, which adversely affects the drafting and implementation of standards.
To solve these technical problems, the application provides a method for checking the consistency of standard elements. First, standard knowledge elements are extracted using rules: unstructured standard files are converted into a regular knowledge element storage model, the terms, normative references, and coding tables they contain are extracted, and the extracted elements are vectorized. The consistency of terms and normative references is then checked, so that terms and normative references stay normalized and coordinated across standards. In particular, for important basic general-purpose standards such as information classification and coding standards, code consistency checking reduces the manual intervention needed in database migration when a new version of such a standard is implemented, which improves work efficiency. Furthermore, the knowledge element model can serve as a neuron of a neural network input layer; the network can be trained on a data set of standard files so that large volumes of file data are processed more efficiently.
Disclosure of Invention
The invention aims to ensure that terms and normative references are normalized and coordinated across standards by checking the consistency of the terms, normative references, and classification codes in the standards. It also aims to map correspondences between codes automatically, including one-to-one, one-to-many, many-to-one, and fuzzy correspondences, so as to reduce the manual intervention needed in database migration when a new version of an information classification and coding standard is implemented, and to improve work efficiency.
To achieve the above object, the present application provides a method for checking the consistency of standard elements, comprising: (1) extracting standard knowledge elements using rules and converting standard files into a knowledge element storage model; (2) establishing a vector storage model for terms, normative references, and coding tables, and storing them in normalized form; (3) checking the consistency of terms: the terms are first searched to obtain a set of identical or similar terms, then, based on the term vector model, same-name terms are checked for consistency and the similarity of similar terms is calculated; (4) checking the consistency of normative references: a normative reference is first retrieved to obtain the set of all standard clauses that cite it, and then, depending on whether the reference is dated or undated and whether a specific clause is cited, the citing standard's content is compared for consistency with the corresponding clause or with the full text of the cited document.
The knowledge elements are mutually independent units capable of characterizing knowledge.
In particular, the application also aims at carrying out coding comparison analysis between new and old versions of the information classification coding standard.
The code comparison analysis comprises the following steps: (1) extracting the coding tables of the new-version and old-version standards to be compared; (2) constructing a knowledge element storage model of each coding table; (3) determining mapping relations starting from the same-name classes; (4) further determining mapping relations between the classes that do not share a name.
Drawings
FIG. 1 is a flowchart of the consistency check of the present application;
FIG. 2 is the system search portal of the present application;
FIG. 3 is a (partial) example of a coding table as used herein;
FIG. 4 shows an example from the national standard GB/T 14885-2010, Fixed Asset Classification and Code.
Detailed Description
The application describes a method for checking the consistency of standard elements. It comprises a consistency check of the terms in a standard, a consistency check of the normative references, and a code consistency check for information classification and coding standards. The method helps to resolve problems such as overlap and repetition of normative elements between standards. In addition, by constructing a semantic similarity model and a structural similarity model, it automatically identifies one-to-one, one-to-many, many-to-one, and fuzzy-match mapping relations between coding tables, which reduces manual intervention and makes code comparison more efficient.
Figure 1 shows the flow of the consistency check.
First, the rule-based extraction and normalized storage of standard knowledge elements proceeds as follows:
1. Convert the unstructured standard file into a regular knowledge element storage model, with the definitions:
$P = \{\, p \mid p \text{ is a standard file} \,\}$
The knowledge element storage model of a standard file $p$ is $T = \{\, t \mid t \in p,\ t \text{ is a node of } p \text{, i.e., a knowledge element} \,\}$
2. Extract the term, normative reference, and coding table knowledge elements, and establish a vector storage model for each, as follows:
If t is a term, the term knowledge element is extracted and stored as the following five-tuple, where CName is the Chinese name of the term, EName is the English name of the term, Des is the definition of the term, Note is the footnote of the term, and Qut is the reference file information of the term.
T=<CName,EName,Des,Note,Qut>
If t is a normative reference, it is extracted and stored as the following four-tuple, where SName is the Chinese name of the referenced standard, SNum is the serial number of the referenced standard, SYear is the year code (date) of the referenced standard, and Clause is a specific clause of the referenced standard.
T=<SName,SNum,SYear,Clause>
If t is a coding table, the coding table knowledge element is extracted and stored as the following five-tuple, where ParentItem is the parent class of a class, ChildItem is a subclass of the class, ItemCode is the code of the class, ItemName is the name of the class, and Description is its descriptive information.
T=<ParentItem,ChildItem,ItemCode,ItemName,Description>
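The three tuples above can be sketched as plain record types. This is a minimal illustration, not the storage model of the application itself: the class names, the lower-case field names, the choice of `None` for an empty SYear or Clause, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TermElement:
    """Term knowledge element: <CName, EName, Des, Note, Qut>."""
    cname: str
    ename: str
    des: str
    note: str = ""
    qut: str = ""


@dataclass(frozen=True)
class ReferenceElement:
    """Normative reference element: <SName, SNum, SYear, Clause>."""
    sname: str
    snum: str
    syear: Optional[str] = None   # assumption: None models an undated reference
    clause: Optional[str] = None  # assumption: None means no specific clause is cited


@dataclass(frozen=True)
class CodeElement:
    """Coding table element: <ParentItem, ChildItem, ItemCode, ItemName, Description>."""
    parent_item: Optional[str]      # code of the parent class, None for a root class
    child_items: Tuple[str, ...]    # codes of the direct subclasses
    item_code: str
    item_name: str
    description: str = ""


# Hypothetical sample values, not taken from any real standard.
term = TermElement(cname="知识元", ename="knowledge element",
                   des="an independent unit capable of characterizing knowledge")
ref = ReferenceElement(sname="Example standard", snum="GB/T 0000")
code = CodeElement(parent_item=None, child_items=("11", "12"),
                   item_code="1", item_name="Equipment")
```

Frozen dataclasses are used so that two elements can be compared field by field with `==`, which is the operation the same-name term check relies on.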
Second, the consistency check based on the normalized storage model proceeds as follows:
1. The term consistency check comprises the following steps:
1.1 Perform a fuzzy search for a term in the standard database to obtain a term set;
1.2 Based on the term five-tuple vector storage model, determine from CName whether the terms are same-name terms or similar terms;
1.3 If the terms are same-name terms, the remaining four fields of the term vector (EName, Des, Note, and Qut) must be completely consistent;
1.4 If the terms are not same-name terms, calculate their similarity. Let the two terms be A and B. First vectorize the definitions of the two terms to obtain two text vectors $Des_A$ and $Des_B$, and let $\theta$ be the angle between them. The similarity between the two definitions is then the cosine of that angle:
$$\cos\theta = \frac{Des_A \cdot Des_B}{\lVert Des_A \rVert \, \lVert Des_B \rVert}$$
The closer the resulting similarity is to 1, the more similar the two terms are.
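Step 1.4 can be sketched as follows. The source does not say how a definition is vectorized, so this example assumes a simple bag-of-words count over whitespace-separated tokens; the function names `vectorize` and `cosine_similarity` are hypothetical. For Chinese text a word segmenter would be needed in place of `split()`.

```python
import math
from collections import Counter


def vectorize(definition: str) -> Counter:
    """Assumed vectorization: term-frequency counts of lower-cased tokens."""
    return Counter(definition.lower().split())


def cosine_similarity(des_a: str, des_b: str) -> float:
    """Cosine of the angle between the two definition vectors."""
    va, vb = vectorize(des_a), vectorize(des_b)
    # Only tokens present in both definitions contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty definition is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Two identical definitions give a similarity of 1, and definitions with no token in common give 0.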
2. The normative reference consistency check comprises the following steps:
2.1 Search the standard database for a given standard to obtain the set of all standards that cite it;
2.2 Traverse the normative reference four-tuple vector storage model in the set, and determine whether each reference is dated according to whether SYear is empty;
2.3 If the reference is dated, determine whether a specific clause is cited in the text;
2.4 For the set of standards that cite the same specific clause, compare their content for consistency with that clause in the cited document;
2.5 For the set of standards that cite no specific clause, or whose reference is undated (and therefore cites no specific clause), compare their content for consistency with the full text of the cited document.
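The branching in steps 2.2 to 2.5 can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the `Reference` record, the use of `None` for an empty SYear or Clause, and the scope labels returned are hypothetical, and the actual text comparison is left out.

```python
from typing import Dict, List, NamedTuple, Optional


class Reference(NamedTuple):
    sname: str
    snum: str
    syear: Optional[str]   # None or "" models an undated reference (assumption)
    clause: Optional[str]  # None or "" means no specific clause is cited


def comparison_scope(ref: Reference) -> str:
    """Decide what the citing text is compared against (steps 2.2 to 2.5)."""
    dated = bool(ref.syear)
    if dated and ref.clause:
        return f"clause {ref.clause}"  # step 2.4: compare with the cited clause
    return "full text"                 # step 2.5: compare with the whole document


def group_by_scope(refs: List[Reference]) -> Dict[str, List[Reference]]:
    """Group citing references so each group can be compared in one pass."""
    groups: Dict[str, List[Reference]] = {}
    for ref in refs:
        groups.setdefault(comparison_scope(ref), []).append(ref)
    return groups
```

Grouping by scope mirrors step 2.4, where all standards citing the same clause are compared against that clause together.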
3. The code consistency check for information classification and coding standards comprises the following steps:
3.1 Construct a coding table structure tree from the parent-child relations in the coding table knowledge element storage model, and thereby determine whether each class is a non-leaf class (can be subdivided) or a leaf class (cannot be subdivided);
3.2 Determine the mapping relations starting from the same-name classes:
3.2.1 If a class name occurs exactly once in the new standard and exactly once in the old standard, establish a one-to-one mapping directly;
3.2.2 If a class name occurs more than once in the new standard, the old standard, or both, resolve the ambiguity using the structural similarity between class pairs to determine the mapping between the same-name classes;
3.2.3 Based on the results of 3.2.1 and 3.2.2, if one class of a mapped same-name pair is a non-leaf class and the other is a leaf class, map all descendant classes of the non-leaf class to the leaf class, giving a one-to-many or many-to-one mapping.
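Steps 3.1, 3.2.1, and 3.2.3 can be sketched as below. The row layout `(item_code, item_name, parent_code)`, the function names, and the sample codes are assumptions for illustration. The disambiguation of step 3.2.2 is not implemented; names that occur more than once are simply skipped.

```python
from collections import defaultdict


def build_children(rows):
    """Step 3.1: map each class code to the codes of its direct subclasses."""
    children = defaultdict(list)
    for code, _name, parent in rows:
        if parent is not None:
            children[parent].append(code)
    return children


def descendants(code, children):
    """All classes below `code` in the structure tree."""
    found, stack = [], list(children.get(code, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def map_same_name(old_rows, new_rows):
    """Return (old_code, new_code) pairs for uniquely named classes."""
    old_children, new_children = build_children(old_rows), build_children(new_rows)
    old_by_name, new_by_name = defaultdict(list), defaultdict(list)
    for code, name, _parent in old_rows:
        old_by_name[name].append(code)
    for code, name, _parent in new_rows:
        new_by_name[name].append(code)

    mapping = []
    for name, old_codes in old_by_name.items():
        new_codes = new_by_name.get(name, [])
        if len(old_codes) != 1 or len(new_codes) != 1:
            continue  # step 3.2.2 (structural disambiguation) is out of scope here
        o, n = old_codes[0], new_codes[0]
        mapping.append((o, n))  # step 3.2.1: one-to-one mapping
        o_leaf, n_leaf = not old_children.get(o), not new_children.get(n)
        if o_leaf and not n_leaf:    # step 3.2.3: one old class, many new classes
            mapping.extend((o, d) for d in descendants(n, new_children))
        elif n_leaf and not o_leaf:  # step 3.2.3: many old classes, one new class
            mapping.extend((d, n) for d in descendants(o, old_children))
    return mapping
```

For example, if an old leaf class "Pumps" becomes a non-leaf class in the new table, the old code is mapped to the new "Pumps" class and to each of its new descendants.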
3.3 Determine the mapping relations between the classes in the new and old standards that do not share a name:
3.3.1 Calculate the structural and semantic similarity between the non-same-name classes in the new and old standards;
3.3.2 Let N and O be the sets of classes in the new standard and the old standard, respectively, that have not yet been mapped;
3.3.3 Repeat the following until N or O is empty or a set maximum number of iterations is reached:
Take the class pair $C_n \in N$ and $C_o \in O$ with the greatest structural and semantic similarity; if the similarity is greater than the threshold $\alpha$, establish a mapping between $C_n$ and $C_o$;
If one of $C_n$ and $C_o$ is a leaf class and the other is a non-leaf class, establish a one-to-many or many-to-one mapping by the method of step 3.2.3;
Update the sets N and O by removing the classes for which a mapping has been established.
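The loop of step 3.3.3 can be sketched as a greedy matcher. The similarity function is passed in as a parameter because its exact form is defined elsewhere in the text; `greedy_match`, `max_rounds`, and the early exit when the best remaining pair does not exceed the threshold are assumptions of this sketch, and the leaf versus non-leaf expansion is omitted.

```python
def greedy_match(new_classes, old_classes, similarity, alpha, max_rounds=1000):
    """Repeatedly map the most similar unmapped (new, old) pair above alpha."""
    unmapped_new, unmapped_old = list(new_classes), list(old_classes)
    mapping = []
    for _ in range(max_rounds):
        if not unmapped_new or not unmapped_old:
            break  # set N or set O is empty
        # Score every remaining pair and pick the best one.
        score, cn, co = max(
            ((similarity(n, o), n, o) for n in unmapped_new for o in unmapped_old),
            key=lambda item: item[0],
        )
        if score <= alpha:
            break  # assumption: stop once no remaining pair exceeds the threshold
        mapping.append((cn, co, score))
        unmapped_new.remove(cn)  # update N
        unmapped_old.remove(co)  # update O
    return mapping
```

Each round rescans all remaining pairs, so the cost grows with the product of the two set sizes; caching the pairwise scores would avoid recomputation for large coding tables.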
The similarity between classes is the sum of the semantic similarity and the structural similarity: $Sim = S_{semantic} + T_{structure}$
The semantic similarity between a new class and an old class is calculated mainly from the class name and the name of the parent class. Let the new and old class names be $C_n$ and $C_o$; after word segmentation and stop-word removal, the word sets $W_n$ and $W_o$ are obtained, and the semantic similarity is then given by:
[Formula shown as image BSA0000226499850000031 in the source; not available as text]
The structural similarity between a new class and an old class consists of the class hierarchy similarity, the ancestor class set similarity, the sibling class set similarity, and the child class set similarity: $T_{structure} = T_{hierarchy} + T_{ancestors} + T_{siblings} + T_{children}$
The class hierarchy similarity is calculated from the hierarchy levels of the new and old classes (denoted $L_n$ and $L_o$): $T_{hierarchy} = 1.0 / (|L_o - L_n| + 1.0)$. $T_{ancestors}$, $T_{siblings}$, and $T_{children}$ are obtained by applying the semantic similarity formula to the name strings of all classes in the respective sets.
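The way the components add up can be sketched as follows. The class hierarchy term follows the formula stated above. The semantic formula is not available in the text, so this sketch plugs in a Jaccard overlap of word sets purely as a stand-in; it is not the formula of the application, and its scale differs from the thresholds reported later. The dictionary keys and function names are hypothetical.

```python
def hierarchy_similarity(level_old: int, level_new: int) -> float:
    """T_hierarchy = 1.0 / (|L_o - L_n| + 1.0), as stated in the text."""
    return 1.0 / (abs(level_old - level_new) + 1.0)


def jaccard(words_a, words_b) -> float:
    """Stand-in semantic similarity (assumption): overlap of two word sets."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def total_similarity(new_cls: dict, old_cls: dict, semantic=jaccard) -> float:
    """Sim = S_semantic + T_structure, with T_structure as a four-term sum."""
    s_semantic = semantic(new_cls["words"], old_cls["words"])
    t_structure = (
        hierarchy_similarity(old_cls["level"], new_cls["level"])
        + semantic(new_cls["ancestors"], old_cls["ancestors"])
        + semantic(new_cls["siblings"], old_cls["siblings"])
        + semantic(new_cls["children"], old_cls["children"])
    )
    return s_semantic + t_structure
```

Passing `semantic` as a parameter keeps the additive structure separate from the choice of word-set measure, so the real formula can be substituted without touching the rest.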
FIG. 4 shows an example from the national standard GB/T 14885-2010, Fixed Asset Classification and Code.
The code comparison is illustrated with GB/T 14885-1994, Fixed Asset Classification and Code (old version), and GB/T 14885-2010, Fixed Asset Classification and Code (new version). By analyzing the extracted coding table, 8147 classes are shared by GB/T14885-. The analysis is shown in Table 1:
TABLE 1 Standard New and old edition analysis table
[Table 1 appears only as images in the source (BSA0000226499850000032, BSA0000226499850000041); its contents are not available as text]
Compared against the results of expert comparison, the method provided by the invention automatically completes more than 70% of the work, which greatly improves the efficiency of code comparison analysis. The accuracy and recall obtained with different threshold settings in the similarity model are shown in Table 2.
TABLE 2 Accuracy and recall at different thresholds
Threshold    Accuracy rate    Recall rate
2            78.98%           71.27%
2.5          81.73%           71.29%
3            86.14%           71.12%
3.5          88.95%           70.09%
4            92.04%           68.92%
4.5          93.21%           66.90%
5            94.48%           65.97%
5.5          95.87%           64.64%
6            96.38%           63.89%
6.5          97.62%           62.83%
7            98.05%           62.14%
7.5          98.08%           61.82%
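The trade-off in Table 2 can be explored with a short calculation. The source does not report a combined score; the F1 measure used here is an added illustration, and it treats the "accuracy rate" column as precision, which is an assumption. The figures are copied from Table 2, and the names `TABLE_2`, `f1`, and `best_threshold` are hypothetical.

```python
# (threshold, accuracy rate in %, recall rate in %) copied from Table 2.
TABLE_2 = [
    (2.0, 78.98, 71.27), (2.5, 81.73, 71.29), (3.0, 86.14, 71.12),
    (3.5, 88.95, 70.09), (4.0, 92.04, 68.92), (4.5, 93.21, 66.90),
    (5.0, 94.48, 65.97), (5.5, 95.87, 64.64), (6.0, 96.38, 63.89),
    (6.5, 97.62, 62.83), (7.0, 98.05, 62.14), (7.5, 98.08, 61.82),
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed combined score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(rows):
    """Threshold whose row has the highest F1."""
    return max(rows, key=lambda row: f1(row[1], row[2]))[0]


for threshold, precision, recall in TABLE_2:
    print(f"threshold={threshold:<4} F1={f1(precision, recall):.2f}")
```

Raising the threshold trades recall for accuracy, so the right setting depends on how much manual review of unmapped classes is acceptable; F1 is only one way to balance the two.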
Partial results of the code comparison analysis are shown in Table 3 (Threshold = 7.5):
TABLE 3 Partial results of the code comparison analysis at threshold 7.5
[Table 3 appears only as images in the source (BSA0000226499850000042, BSA0000226499850000051, BSA0000226499850000061); its contents are not available as text]
Many embodiments have been described above. Nevertheless, various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the flows shown above may be used with steps reordered, added, or removed.

Claims (10)

1. A method for checking the consistency of standard elements comprises the following steps:
converting the standard file into a knowledge element storage model;
acquiring a knowledge element from a knowledge element storage model, and establishing a vector storage model;
extracting and storing the knowledge elements by using a vector storage model;
and realizing consistency check based on the vector storage model.
2. The method of claim 1, wherein the knowledge element is a term, a normative reference, and/or a coding table.
3. The method of claim 2, wherein when the knowledge element is a term, the vector storage model is the following five-tuple: T = <CName, EName, Des, Note, Qut>, wherein CName is the Chinese name of the term, EName is the English name of the term, Des is the definition of the term, Note is the footnote of the term, and Qut is the reference file information of the term.
4. The method according to claim 3, characterized by the following steps:
performing a fuzzy search for a term in a standard database to obtain a term set;
determining from CName, based on the term five-tuple vector storage model, whether the terms are same-name terms or similar terms;
if the terms are same-name terms, the remaining four fields of the term vector (EName, Des, Note, and Qut) must be completely consistent;
if the terms are not same-name terms, calculating the similarity of the terms: letting the two terms be A and B, first vectorizing the definitions of the two terms to obtain two text vectors $Des_A$ and $Des_B$, and letting $\theta$ be the angle between the two text vectors, the similarity between the definitions of the two terms is the cosine of that angle, calculated as:
$$\cos\theta = \frac{Des_A \cdot Des_B}{\lVert Des_A \rVert \, \lVert Des_B \rVert}$$
the closer the resulting similarity is to 1, the more similar the two terms are.
5. The method of claim 1, wherein when the knowledge element is a normative reference, the vector storage model is the following four-tuple: T = <SName, SNum, SYear, Clause>,
wherein SName is the Chinese name of the referenced standard, SNum is the serial number of the referenced standard, SYear is the year code (date) of the referenced standard, and Clause is a specific clause of the referenced standard.
6. The method of claim 5, comprising the steps of:
searching a standard database for a given standard to obtain the set of all standards that cite it;
traversing the normative reference four-tuple vector storage model in the set, and determining whether each reference is dated according to whether SYear is empty;
if the reference is dated, determining whether a specific clause is cited in the text;
for the set of standards that cite the same specific clause, comparing their content for consistency with that clause in the cited document;
and for the set of standards that cite no specific clause or whose reference is undated, comparing their content for consistency with the full text of the cited document.
7. The method of claim 1, wherein when the knowledge element is a coding table, the vector storage model is the following five-tuple: T = <ParentItem, ChildItem, ItemCode, ItemName, Description>, wherein ParentItem is the parent class of a class, ChildItem is a subclass of the class, ItemCode is the code of the class, ItemName is the name of the class, and Description is its descriptive information.
8. The method of claim 7, wherein a coding table structure tree is constructed from the parent-child relations between coding classes to determine whether a class is a non-leaf class or a leaf class.
9. The method of claim 8, wherein the mapping relations are determined starting from the same-name classes by the following steps: (1) if a class name occurs exactly once in the new standard and exactly once in the old standard, a one-to-one mapping is established directly;
(2) if a class name occurs more than once in the new standard, the old standard, or both, the ambiguity is resolved using the structural similarity between the paired same-name classes to determine the mapping between them;
(3) based on the results of (1) and (2), if one class of a mapped same-name pair is a non-leaf class and the other is a leaf class, a one-to-many or many-to-one mapping is established between the descendant classes of the non-leaf class and the leaf class of the pair.
10. The method of claim 8, wherein determining the mapping relations between the non-same-name classes in the new and old standards comprises:
(1) calculating the structural and semantic similarity between the non-same-name classes in the new standard and the old standard;
(2) letting N and O be the sets of classes in the new standard and the old standard, respectively, that have not yet been mapped;
(3) repeating the following until N or O is empty or a set maximum number of iterations is reached:
taking the class pair $C_n$ and $C_o$ with the greatest structural and semantic similarity in N and O, and if the similarity is greater than a threshold $\alpha$, establishing a mapping between $C_n$ and $C_o$;
if one of $C_n$ and $C_o$ is a leaf class and the other is a non-leaf class, establishing a one-to-many or many-to-one mapping according to the method in step 3;
and updating the sets N and O by removing the classes for which a mapping has been established.
The similarity between classes is the sum of the semantic similarity and the structural similarity: $Sim = S_{semantic} + T_{structure}$
The semantic similarity between a new class and an old class is calculated mainly from the class name and the name of the parent class. Let the new and old class names be $C_n$ and $C_o$; after word segmentation and stop-word removal, the word sets $W_n$ and $W_o$ are obtained, and then
[Formula shown as image FSA0000226499840000021 in the source; not available as text]
The structural similarity between a new class and an old class consists of the class hierarchy similarity, the ancestor class set similarity, the sibling class set similarity, and the child class set similarity: $T_{structure} = T_{hierarchy} + T_{ancestors} + T_{siblings} + T_{children}$
The class hierarchy similarity is calculated from the hierarchy levels of the new and old classes (denoted $L_n$ and $L_o$): $T_{hierarchy} = 1.0 / (|L_o - L_n| + 1.0)$. $T_{ancestors}$, $T_{siblings}$, and $T_{children}$ are obtained by applying the semantic similarity formula to the name strings of all classes in the respective sets.
CN202011386161.0A 2020-12-02 2020-12-02 Standard element consistency inspection method Active CN112395854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011386161.0A CN112395854B (en) 2020-12-02 2020-12-02 Standard element consistency inspection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011386161.0A CN112395854B (en) 2020-12-02 2020-12-02 Standard element consistency inspection method

Publications (2)

Publication Number Publication Date
CN112395854A (en) 2021-02-23
CN112395854B (en) 2022-11-22

Family

ID=74604074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011386161.0A Active CN112395854B (en) 2020-12-02 2020-12-02 Standard element consistency inspection method

Country Status (1)

Country Link
CN (1) CN112395854B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794798A (en) * 2022-12-12 2023-03-14 江苏省工商行政管理局信息中心 Market supervision informationized standard management and dynamic maintenance system and method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases
US20020016787A1 (en) * 2000-06-28 2002-02-07 Matsushita Electric Industrial Co., Ltd. Apparatus for retrieving similar documents and apparatus for extracting relevant keywords
US20020078431A1 (en) * 2000-02-03 2002-06-20 Reps Thomas W. Method for representing information in a highly compressed fashion
US20020129015A1 (en) * 2001-01-18 2002-09-12 Maureen Caudill Method and system of ranking and clustering for document indexing and retrieval
US20030217052A1 (en) * 2000-08-24 2003-11-20 Celebros Ltd. Search engine method and apparatus
US20040103091A1 (en) * 2002-06-13 2004-05-27 Cerisent Corporation XML database mixed structural-textual classification system
US20080256067A1 (en) * 2007-04-10 2008-10-16 Nelson Cliff File Search Engine and Computerized Method of Tagging Files with Vectors
US8090724B1 (en) * 2007-11-28 2012-01-03 Adobe Systems Incorporated Document analysis and multi-word term detector
US20160085742A1 (en) * 2014-09-23 2016-03-24 Kaybus, Inc. Automated collective term and phrase index
CN106777001A (en) * 2016-12-06 2017-05-31 中国人民公安大学 The construction method of public security remote sensing monitoring application standards system database
CN110569976A (en) * 2019-08-27 2019-12-13 上海交通大学 Brain-like artificial intelligence decision system and decision method
WO2020233256A1 (en) * 2019-07-12 2020-11-26 之江实验室 General medical termbase-based multi-center medical terminology standardization system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
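Zeng Zhigang et al.: "Common problems in drafting standards and their solutions", China Standardization *
Wen Youkui: "Knowledge organization and retrieval based on 'knowledge elements'", Computer Engineering and Applications *
Su Honggang: "Implementation of an SVM-based Chinese text classification system", China Master's Theses Full-text Database (Information Science and Technology) *
Yuan Jinwei: "Research on Chinese entity linking based on online encyclopedias", China Master's Theses Full-text Database (Information Science and Technology) *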

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794798A (en) * 2022-12-12 2023-03-14 Information Center of Jiangsu Administration for Industry and Commerce Market supervision informationized standard management and dynamic maintenance system and method
CN115794798B (en) * 2022-12-12 2023-09-15 Information Center of Jiangsu Administration for Industry and Commerce Market supervision informatization standard management and dynamic maintenance system and method

Also Published As

Publication number Publication date
CN112395854B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN111291161A (en) Legal case knowledge graph query method, device, equipment and storage medium
CN114003791B (en) Depth map matching-based automatic classification method and system for medical data elements
CN107145516B (en) Text clustering method and system
Sarkhel et al. Visual segmentation for information extraction from heterogeneous visually rich documents
CN105630751A (en) Method and system for rapidly comparing text content
CN115017299A (en) Unsupervised social media summarization method based on de-noised image self-encoder
CN111899090A (en) Enterprise associated risk early warning method and system
CN111143394B (en) Knowledge data processing method, device, medium and electronic equipment
CN112395854B (en) Standard element consistency inspection method
CN116362243A (en) Text key phrase extraction method, storage medium and device integrating incidence relation among sentences
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN114491081A (en) Electric power data tracing method and system based on data blood relationship graph
CN111737694B (en) Malicious software homology analysis method based on behavior tree
CN113761192A (en) Text processing method, text processing device and text processing equipment
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN116628584A (en) Power sensitive data processing method and device, electronic equipment and storage medium
WO2023093116A1 (en) Method and apparatus for determining industrial chain node of enterprise, and terminal and storage medium
CN116578700A (en) Log classification method, log classification device, equipment and medium
CN110377690A (en) A kind of information acquisition method and system based on long-range Relation extraction
CN115953041A (en) Construction scheme and system of operator policy system
CN115794798A (en) Market supervision informationized standard management and dynamic maintenance system and method
Zhai et al. TRIZ technical contradiction extraction method based on patent semantic space mapping
CN112307235B (en) Naming method and device of front-end page element and electronic equipment
Shao et al. An improved approach to the recovery of traceability links between requirement documents and source codes based on latent semantic indexing
Boyagane et al. vue4logs--Automatic Structuring of Heterogeneous Computer System Logs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant