CN110928978A - Standard literature classification retrieval method - Google Patents

Publication number
CN110928978A
Authority
CN
China
Prior art keywords
standard
retrieval
search
information
classification table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911000480.0A
Other languages
Chinese (zh)
Inventor
肖蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong Institute Of Quality And Standardization
Original Assignee
Nantong Institute Of Quality And Standardization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Institute Of Quality And Standardization
Priority to CN201911000480.0A
Publication of CN110928978A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a standard document classification retrieval method. A classification table and a retrieval thesaurus are established according to the commonalities of standard documents: multiple types of key terms are identified and extracted from each standard document, multiple standard classification tables are generated from the key terms, and multiple retrieval models are built for users from the standard classification tables and the retrieval thesaurus. Technical and research personnel therefore no longer need to search through whole standard documents or several documents at a time; according to the actual retrieval purpose, they obtain efficient, accurate, systematic, complete and comprehensive retrieval oriented to the content of standards, and automatic clustering retrieval can be realized.

Description

Standard literature classification retrieval method
Technical Field
The invention relates to the field of document retrieval, in particular to a standard document classification retrieval method.
Background
Currently, users typically rely on web search engines to find information about standards on the internet. The user enters a search term; the search engine matches it against web pages or network services, returns ranked results together with recommended guide content, and the user browses the results or follows the recommendations to reach the desired item. The classified standard retrieval mode of the present system handles the task of obtaining national, industry and foreign standards well: the user only needs to supply known keywords such as a standard number, a standard name, standard information or the industry to which the standard belongs, and the search engine locates the corresponding resources, queries the matching data from a massive data set and presents better recommendations to the user.
Traditional web search retrieves only by keyword matching. Its retrieval function is incomplete, it returns a large volume of information, queries are imprecise, the classification of standards is neither detailed nor deep enough, and results are cluttered with advertising, so the user must spend considerable time and effort picking out the required information from a mass of results.
The system provides a full-text retrieval function: the PDF of a standard text is converted to a TXT document, and when the user enters keywords such as a technical index or a data parameter, the matching standard information is retrieved from the massive standard library and the search results are arranged according to a hit-rate weighting algorithm. In addition, building on a traditional standards search engine, the system groups the standards relating to the industries given priority support by the state, province and city into six categories (high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles), subdivides each category into multiple levels, and establishes a two-dimensional model of the standard library, so that each industry and its subclasses lead to the corresponding sub-library of standards. The traditional standard library is thus modelled systematically.
Disclosure of Invention
The invention provides a classified standard information query method and search engine, aiming to solve the prior-art problems of an incomplete standard query function, an excessive volume of returned information, imprecise queries, and standard classification that is incomplete and lacks depth.
The technical solution adopted by the invention is a standard document classification retrieval method comprising the following steps:
S1: acquire various standard document data and establish a standard document database; standard document data that are complete in variety and updated in a timely manner are obtained from the quality and standardization research institutes of Jiangsu Province and Shanghai;
S2: establish a classification table and a retrieval thesaurus according to the commonalities among standard documents and the Chinese standard classification method: identify and extract multiple types of key terms from the standard documents, generate multiple standard classification tables from the key terms, and generate the retrieval thesaurus from the key terms;
S3: construct retrieval models: build a plurality of retrieval models from the standard classification tables and the retrieval thesaurus of step S2;
S4: issue a retrieval request: enter the search conditions, using any of the retrieval models of step S3 as the entry point;
S5: return the retrieval information: display the data that satisfy the retrieval request of step S4.
In some embodiments, the types of key terms in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information; a first index is set for each type of key term.
In some embodiments, the plurality of standard classification tables in step S2 include an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table, an other-standards classification table and a standard announcement classification table; a second index is set for each standard classification table.
In some embodiments, a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
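The three indexes can be pictured as three small lookup tables. The following Python sketch is only an illustration under assumed names: `assign_ids`, `KEYWORD_LINKS`, `documents_for` and the `DOC-…` identifiers are hypothetical and do not come from the patent, which does not say how the indexes are stored.

```python
# Hypothetical subsets of the key-term types and classification tables.
KEY_TERM_TYPES = ["industry", "standard number", "standard name", "issuing department"]
CLASSIFICATION_TABLES = ["national", "industry", "local", "foreign"]

# Hypothetical thesaurus: keyword -> identifiers of the linked standard documents.
KEYWORD_LINKS = {"cotton": ["DOC-1", "DOC-2"], "battery": ["DOC-3"]}


def assign_ids(items):
    """Give each item a small integer identifier, starting at 1."""
    return {item: i for i, item in enumerate(items, start=1)}


first_index = assign_ids(KEY_TERM_TYPES)          # one id per key-term type
second_index = assign_ids(CLASSIFICATION_TABLES)  # one id per classification table
third_index = assign_ids(sorted(KEYWORD_LINKS))   # one id per thesaurus keyword


def documents_for(keyword_id):
    """Follow a third-index identifier back to its linked documents."""
    keyword = next(k for k, i in third_index.items() if i == keyword_id)
    return KEYWORD_LINKS[keyword]
```

With this layout, a retrieval model can store compact identifiers and resolve them to documents only when results are displayed.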
In some embodiments, each piece of standard document data includes standard content, cited standard information, adopted standard information, substituted standard information, cited standard information, standard source information.
In some embodiments, the plurality of retrieval models in step S3 include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, and each retrieval model can be used to search independently.
In some embodiments, the retrieval models cooperate with one another to form a multi-level conditional search, in which the search conditions are combined in an "and" relationship.
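A minimal sketch of the "and" combination, assuming each condition is a field/value pair and that a condition is met when the value occurs in the field (substring matching is an assumption; the patent does not define the matching rule). The records and the function name `search_and` are hypothetical.

```python
# Hypothetical records; the standard numbers are made up for illustration.
RECORDS = [
    {"number": "GB/T 0001-2019", "name": "Cotton yarn test method",
     "level": "national", "industry": "high-end textiles"},
    {"number": "FZ/T 0002-2015", "name": "Cotton fabric grading",
     "level": "industry", "industry": "high-end textiles"},
    {"number": "GB/T 0003-2018", "name": "Lithium battery safety",
     "level": "national", "industry": "new energy"},
]


def search_and(records, conditions):
    """Return the records that satisfy every condition (logical AND)."""
    return [
        rec for rec in records
        if all(str(value).lower() in str(rec.get(field, "")).lower()
               for field, value in conditions.items())
    ]


# A keyword condition combined with a classification condition.
hits = search_and(RECORDS, {"name": "cotton", "level": "national"})
```

Each added condition can only narrow the result set, which is what makes the multi-level search progressively more specific.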
In some embodiments, each retrieval model in step S3 includes an information entry unit, in which the user enters search conditions, and a data return unit, which returns the search results.
In some embodiments, the data return unit includes a first return unit, which returns the search results, and an auxiliary return unit, which intelligently sorts and displays the data in the first return unit.
In some embodiments, the information entry unit of the custom-condition retrieval model includes a search condition entry subunit and a custom condition selection subunit. The custom condition selection subunit defines the retrieval range and offers the options standard number, standard name, standard full text and standard announcement; after the user selects a range in it, retrieval is performed through the first index corresponding to the selected range.
In some embodiments, the classification retrieval model includes a category selection subunit whose categories are drawn from the types of key terms and the standard classification tables of step S2 and are associated with the corresponding first and second indexes.
In some embodiments, the key-industry retrieval model includes an industry selection subunit, which classifies the industries given priority support by the state, province and city and associates them with the corresponding first index.
In some embodiments, the first return unit further includes a user-controlled sorting subunit and a range limiting subunit. The sorting subunit offers comprehensive sorting and sorting by release date, standard number, implementation date and click rate; the range limiting subunit lists every standard classification table, and the user narrows the range further by selecting one table or several tables at once.
In some embodiments, the retrieval method further includes a user login unit: after a user logs in, the retrieval process of each user is recorded, the user's retrieval habits are analysed, and the auxiliary return unit sorts the search results according to those habits.
In some embodiments, the retrieval method further provides, for each standard document, reading of the first three pages, full-text reading, text download and text hits.
Compared with the prior art, the invention has the following beneficial effects. Users no longer need to search through whole standard documents or several documents at a time; according to their actual retrieval purpose they obtain efficient, accurate, systematic, complete and comprehensive full-text retrieval oriented to the content of standards. The PDF of each standard text is converted to a TXT document, retrieval for a specific keyword can recombine content across standard documents, and the search results are arranged according to a hit-rate weighting algorithm. In addition, building on a traditional standards search engine, the system groups the standards relating to the industries given priority support by the state, province and city into six categories (high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles), subdivides each category into multiple levels, and establishes a two-dimensional model of the standard library, so that each industry and its subclasses lead to the corresponding sub-library of standards. The traditional standard library is thus modelled systematically.
This overcomes the low efficiency, inaccuracy and incomplete results of traditional retrieval, improves the precision and completeness of retrieval, and helps the user achieve automatic clustering retrieval.
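The patent does not define its hit-rate weighting algorithm. As a rough illustration only, the sketch below assumes the simplest possible weighting: each standard's plain text (after PDF-to-TXT conversion) is scored by how many times the keyword occurs in it. The function name `rank_by_hits` and the sample texts are hypothetical.

```python
# Hypothetical plain-text bodies keyed by a made-up document identifier.
TEXTS = {
    "A": "Tensile strength is measured. The tensile strength shall exceed the limit.",
    "B": "The tensile strength of the fabric is recorded.",
    "C": "Colour fastness is assessed after washing.",
}


def rank_by_hits(texts, keyword):
    """Rank documents by how often `keyword` occurs in their plain text."""
    needle = keyword.lower()
    scored = []
    for doc_id, text in texts.items():
        hits = text.lower().count(needle)
        if hits:  # documents without any hit are not returned
            scored.append((doc_id, hits))
    # Most hits first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_by_hits(TEXTS, "tensile strength")
```

A production system would more likely normalise by document length or weight hits by field, but the source gives no detail on this point.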
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The invention discloses a standard document classification retrieval method which, as shown in FIG. 1, comprises the following steps:
S1: acquire various standard document data and establish a standard document database; standard document data that are complete in variety and updated in a timely manner are obtained from the superior unit, the Jiangsu Province Quality and Standardization Research Institute;
S2: according to the commonalities among standard documents and the Chinese standard classification method, identify and extract multiple types of key terms from the standard documents, generate multiple standard classification tables from the key terms, and generate a retrieval thesaurus from the keywords;
S3: construct retrieval models: build a plurality of retrieval models from the standard classification tables and the retrieval thesaurus of step S2;
S4: issue a retrieval request: enter the search conditions, using any of the retrieval models of step S3 as the entry point;
S5: return the retrieval information: display the data that satisfy the retrieval request of step S4.
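Steps S1 to S5 can be sketched end to end as follows. This is a simplified illustration, not the patented implementation: the record fields, the function names and the choice to treat each classification table as a list of standard numbers are all assumptions, and the standard numbers are made up.

```python
from collections import defaultdict

# Hypothetical raw records standing in for the acquired standard document data.
RAW = [
    {"number": "GB/T 0001-2019", "name": "Cotton yarn test method",
     "level": "national", "status": "current"},
    {"number": "FZ/T 0002-2015", "name": "Cotton fabric grading",
     "level": "industry", "status": "abolished"},
]


def build_database(raw_records):
    """S1: store every record under its standard number."""
    return {rec["number"]: rec for rec in raw_records}


def build_tables_and_thesaurus(database):
    """S2: derive classification tables and a keyword thesaurus."""
    tables = defaultdict(list)    # table name -> standard numbers
    thesaurus = defaultdict(set)  # keyword -> standard numbers
    for number, rec in database.items():
        tables[rec["level"]].append(number)
        tables[rec["status"]].append(number)
        for word in rec["name"].lower().split():
            thesaurus[word].add(number)
    return tables, thesaurus


def retrieve(database, tables, thesaurus, keyword, table=None):
    """S3 to S5: look up a keyword, optionally limit to one table, return records."""
    hits = set(thesaurus.get(keyword.lower(), ()))
    if table is not None:
        hits &= set(tables.get(table, ()))
    return [database[number] for number in sorted(hits)]


db = build_database(RAW)
tables, thesaurus = build_tables_and_thesaurus(db)
current_cotton = retrieve(db, tables, thesaurus, "cotton", table="current")
```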
In some embodiments, the types of key terms in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information; a first index is set for each type of key term.
In some embodiments, each piece of standard document data includes standard content, cited standard information, adopted standard information, substituted standard information, cited standard information, standard source information.
In some embodiments, the plurality of standard classification tables in step S2 include key-industry standard classification tables (the six industries being high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles), an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table and an other-standards classification table; a second index is set for each standard classification table.
In some embodiments, a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
In some embodiments, the plurality of retrieval models in step S3 include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, and each retrieval model can be used to search independently.
In some embodiments, the retrieval models cooperate with one another to form a multi-level conditional search, in which the search conditions are combined in an "and" relationship.
In some embodiments, each retrieval model in step S3 includes an information entry unit, in which the user enters search conditions, and a data return unit, which returns the search results.
In some embodiments, the data return unit includes a first return unit, which returns the search results, and an auxiliary return unit, which intelligently sorts and displays the data in the first return unit according to the following rules.
For a non-exact search, matching is performed on the character string entered by the user:
(1) For input that mixes English letters and digits, the search engine matches on the input as entered; the closer a result is to the input, the higher its priority.
(2) For digits-only or letters-only input, the input is compared against the content of the search results: letters are matched against the prefix of the standard number and digits against the numeric part of the standard number; the higher the degree of match, the higher the ranking.
(3) For other input containing punctuation marks or special characters, the whole string is matched against the standard numbers held by the search engine; the higher the degree of match, the higher the ranking.
(4) The results above are additionally ordered as follows: national standard > industry standard > local standard > foreign or international standard; current > abolished; and by implementation date in descending order, so that the most recent dates are displayed first.
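One way to read these rules is as a composite sort key. The sketch below is an interpretation, not the patented algorithm: it assumes that the degree of match is compared first and that rule (4) acts as a sequence of tie-breakers, and it uses a deliberately crude `match_degree` (the share of the standard number covered by the query). All names and records are hypothetical.

```python
from datetime import date

LEVEL_RANK = {"national": 0, "industry": 1, "local": 2, "foreign": 3}
STATUS_RANK = {"current": 0, "abolished": 1}

# Hypothetical records with made-up standard numbers.
RECORDS = [
    {"number": "GB 0001", "level": "national", "status": "abolished",
     "implemented": date(2010, 1, 1)},
    {"number": "GB 0002", "level": "national", "status": "current",
     "implemented": date(2015, 1, 1)},
    {"number": "GB 0003", "level": "national", "status": "current",
     "implemented": date(2020, 1, 1)},
    {"number": "DB 0001", "level": "local", "status": "current",
     "implemented": date(2021, 1, 1)},
]


def match_degree(query, number):
    """Crude closeness score: share of the standard number covered by the query."""
    q, n = query.lower(), number.lower()
    return len(q) / len(n) if q in n else 0.0


def sort_key(query, rec):
    """Better match first, then level, then status, then newest implementation date."""
    return (
        -match_degree(query, rec["number"]),
        LEVEL_RANK[rec["level"]],
        STATUS_RANK[rec["status"]],
        -rec["implemented"].toordinal(),
    )


def rank(query, records):
    return sorted(records, key=lambda rec: sort_key(query, rec))


ordered = [rec["number"] for rec in rank("GB", RECORDS)]
```

For the letters-only query "GB", the three matching national standards come first, with current standards ahead of the abolished one and the most recently implemented standard at the top; the non-matching local standard is last.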
In some embodiments, the information entry unit of the custom-condition retrieval model includes a search condition entry subunit and a custom condition selection subunit. The custom condition selection subunit defines the retrieval range and offers the options standard number, standard name, standard full text and standard announcement; after the user selects a range in it, retrieval is performed through the first index corresponding to the selected range.
In some embodiments, the classification retrieval model includes a category selection subunit whose categories are drawn from the types of key terms and the standard classification tables of step S2 and are associated with the corresponding first and second indexes.
In some embodiments, the key-industry retrieval model includes an industry selection subunit, which classifies the industries given priority support by the state, province and city and associates them with the corresponding first index.
In some embodiments, the first return unit further includes a user-controlled sorting subunit and a range limiting subunit. The sorting subunit offers comprehensive sorting and sorting by release date, standard number, implementation date and click rate; the range limiting subunit lists every standard classification table, and the user narrows the range further by selecting one table or several tables at once.
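The range limiting and user-selected sorting can be sketched as a filter followed by a sort. Two points are assumptions rather than statements from the patent: selecting several tables is treated as requiring membership in every selected table (a union would be the alternative reading), and comprehensive sorting is left out because the source does not define it. The names `SORT_KEYS` and `refine` and the sample data are hypothetical.

```python
# Hypothetical results; dates are ISO strings so they sort chronologically.
RESULTS = [
    {"number": "GB 0001", "tables": ["national", "current"],
     "released": "2018-03-01", "implemented": "2018-09-01", "clicks": 40},
    {"number": "GB 0002", "tables": ["national", "abolished"],
     "released": "2012-06-01", "implemented": "2013-01-01", "clicks": 95},
    {"number": "DB 0003", "tables": ["local", "current"],
     "released": "2020-02-01", "implemented": "2020-08-01", "clicks": 10},
]

SORT_KEYS = {
    "release date": lambda rec: rec["released"],
    "standard number": lambda rec: rec["number"],
    "implementation date": lambda rec: rec["implemented"],
    "click rate": lambda rec: rec["clicks"],
}


def refine(results, tables=None, sort_by="standard number", descending=False):
    """Keep results found in every selected table, then sort as requested."""
    if tables:
        wanted = set(tables)
        results = [rec for rec in results if wanted <= set(rec["tables"])]
    return sorted(results, key=SORT_KEYS[sort_by], reverse=descending)


popular_national = refine(RESULTS, tables=["national"],
                          sort_by="click rate", descending=True)
```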
In some embodiments, the retrieval method further includes a user login unit: after a user logs in, the retrieval process of each user is recorded, the user's retrieval habits are analysed, and the auxiliary return unit sorts the search results according to those habits.
In some embodiments, the retrieval method further provides, for each standard document, reading of the first three pages, full-text reading, text download and text hits.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A standard document classification retrieval method, characterized by comprising the following steps:
S1: acquiring various standard document data and establishing a standard document database;
S2: establishing a classification table and a retrieval thesaurus according to the commonalities among standard documents: identifying and extracting multiple types of key terms from the standard documents, generating a plurality of standard classification tables from the key terms, and generating the retrieval thesaurus from the keywords in the key terms;
S3: constructing retrieval models: building a plurality of retrieval models from the standard classification tables and the retrieval thesaurus of step S2;
S4: issuing a retrieval request: entering the search conditions, using any of the retrieval models of step S3 as the entry point;
S5: returning the retrieval information: displaying the data that satisfy the retrieval request of step S4.
2. The standard document classification retrieval method according to claim 1, wherein: the types of key terms in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information; and a first index is set for each type of key term.
3. The standard document classification retrieval method according to claim 1, wherein: in step S2, the plurality of standard classification tables include an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table, an other-standards classification table and a standard announcement classification table; and a second index is set for each standard classification table.
4. The standard document classification retrieval method according to claim 1, wherein: a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
5. The standard document classification retrieval method according to claim 1, wherein: in step S3, the plurality of retrieval models include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, and each retrieval model can be used to search independently.
6. The standard document classification retrieval method according to claim 5, wherein: the plurality of retrieval models cooperate with one another to form a multi-level conditional search, and the search conditions are combined in an "and" relationship.
7. The standard document classification retrieval method according to claim 5, wherein: each retrieval model in step S3 includes an information entry unit, in which a user enters search conditions, and a data return unit, which returns the search results.
8. The standard document classification retrieval method according to claim 7, wherein: the data return unit comprises a first return unit, which returns the search results, and an auxiliary return unit, which intelligently sorts and displays the data in the first return unit.
9. The standard document classification retrieval method according to claims 1 and 8, wherein: the retrieval method further comprises a user login unit, which monitors the retrieval process of each user and analyses the user's retrieval habits, and the auxiliary return unit intelligently sorts the search results according to those habits.
10. The standard document classification retrieval method according to claim 1, wherein: the retrieval method further provides, for each standard document, reading of the first three pages, full-text reading, text download and text hits.
CN201911000480.0A 2019-10-21 2019-10-21 Standard literature classification retrieval method Pending CN110928978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911000480.0A CN110928978A (en) 2019-10-21 2019-10-21 Standard literature classification retrieval method

Publications (1)

Publication Number Publication Date
CN110928978A (en) 2020-03-27

Family

ID=69849353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911000480.0A Pending CN110928978A (en) 2019-10-21 2019-10-21 Standard literature classification retrieval method

Country Status (1)

Country Link
CN (1) CN110928978A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307171A (en) * 2020-10-30 2021-02-02 中国电力科学研究院有限公司 Institutional standard retrieval method and system based on power knowledge base and readable storage medium
CN112735584A (en) * 2020-12-31 2021-04-30 北京万方数据股份有限公司 Malignant tumor diagnosis and treatment auxiliary decision generation method and device
CN112735584B (en) * 2020-12-31 2023-10-24 北京万方数据股份有限公司 Malignant tumor diagnosis and treatment auxiliary decision generation method and device
CN113033175A (en) * 2021-04-07 2021-06-25 芜湖市标准化研究院 Standard effectiveness evaluation method and system

Legal Events

Date Code Title Description
PB01 Publication