CN110928978A - Standard literature classification retrieval method - Google Patents
Standard literature classification retrieval method
- Publication number
- CN110928978A
- Authority
- CN
- China
- Prior art keywords
- standard
- retrieval
- search
- information
- classification table
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a standard document classification retrieval method. Classification tables and a retrieval thesaurus are established according to the features that standard documents have in common: multiple types of key terms are identified and extracted from each standard document, multiple standard classification tables are generated from these key terms, and multiple retrieval models are built for users from the classification tables and the retrieval thesaurus. Technical and research personnel therefore no longer need to search through entire standard documents, or through several of them, one by one. Instead, according to their actual retrieval purpose, they obtain efficient, accurate, systematic, complete and comprehensive retrieval oriented to standard content, and automatic clustering retrieval becomes possible.
Description
Technical Field
The invention relates to the field of document retrieval, in particular to a standard document classification retrieval method.
Background
At present, users typically rely on web search engines to look up information about standards of various kinds. The user first enters a search term; the search engine matches web pages or network services that contain the term, returns the results in ranked order together with recommended guide content, and the user browses the results or clicks the recommendations to find what they want. This kind of sorted retrieval largely meets the need to obtain national, industry and foreign standards: the user only supplies known keywords such as a standard number, a standard name, standard information or keywords of the industry to which the standard belongs, and the search engine locates the corresponding resources, helps the user query matching data from massive amounts of data, and presents better recommendations.
However, traditional web search retrieves only by keyword matching. Its retrieval functions are incomplete, it returns large volumes of information, queries are imprecise, standard classification is coarse and shallow, and the results are cluttered with advertising, so users must spend considerable time and effort picking the information they need out of a mass of results, which makes the system inconvenient to use.
The system provides a full-text retrieval function: the PDF text of each standard is converted into TXT format, so that when a user enters keywords such as a particular technical indicator or data parameter, matching standard information can be retrieved from a massive standard library, with the results ranked by a hit-rate weighting algorithm. In addition, building on the traditional standard search engine, the system groups the standards involved in six industries that receive priority support from the state, provinces and municipalities (high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles) into multiple levels and establishes a two-dimensional model of the standard library: each industry and its subdivided subclasses lead into a subdivided library of the standards involved, so that the traditional standard library is modeled systematically.
Disclosure of Invention
The invention provides a method for classified standard information query, together with a search engine, to solve the following problems of the prior art: incomplete query functions, excessive volumes of returned information, inaccurate queries, and standard classification that is incomplete and lacks depth.
The technical solution adopted by the invention is as follows. A standard literature classification retrieval method comprises the following steps:
S1: Acquire standard document data of various kinds and establish a standard document database. Standard literature data that are complete in variety and promptly updated are obtained from the quality and standardization research institutes of Jiangsu Province and Shanghai;
S2: According to the features that standard documents have in common and the Chinese standard classification scheme, establish classification tables and a retrieval thesaurus: identify and extract multiple types of key terms from the standard documents, generate multiple standard classification tables from the key terms, and generate the retrieval thesaurus from the key terms;
S3: Construct retrieval models, namely a plurality of retrieval models built from the standard classification tables and the retrieval thesaurus of step S2;
S4: Issue a search request, entering the search conditions through each retrieval model of step S3 as an entry point;
S5: Return the search information and display the data that satisfy the search request of step S4.
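The patent does not say how key terms are extracted in step S2. As an illustration of a single key-term type, the following minimal Python sketch parses a standard number into its parts and infers the standard level from the prefix, assuming the common Chinese numbering conventions (GB for national standards, DB plus a region code for local standards, two-letter sector codes such as JB for industry standards). The regular expression, the `FOREIGN_PREFIXES` set and all function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: prefix letters, optional region code, optional /T or /Z,
# whitespace, sequence number, then "-" or ":" and a four-digit year.
STD_NO = re.compile(r"^([A-Z]+)(\d*)(?:/([TZ]))?\s+([\d.]+)[-:](\d{4})$")

# Illustrative, not exhaustive.
FOREIGN_PREFIXES = {"ISO", "IEC", "EN", "ASTM", "DIN", "JIS", "BS"}


@dataclass
class StandardNumber:
    prefix: str   # issuing-system code, e.g. "GB"
    region: str   # region code of a local standard, e.g. "32"; empty otherwise
    kind: str     # "T" recommended, "Z" guidance document, "" if absent
    number: str   # sequence number, e.g. "7714"
    year: int
    level: str    # "national", "industry", "local", "foreign" or "other"


def classify_level(prefix: str, region: str) -> str:
    if prefix == "GB":
        return "national"
    if prefix == "DB" and region:
        return "local"
    if prefix in FOREIGN_PREFIXES:
        return "foreign"
    if len(prefix) == 2:
        return "industry"  # assumption: other two-letter codes (JB, HG, ...) are sector codes
    return "other"


def parse_standard_number(text: str) -> Optional[StandardNumber]:
    m = STD_NO.match(text.strip())
    if m is None:
        return None  # not in the assumed format
    prefix, region, kind, number, year = m.groups()
    return StandardNumber(prefix, region, kind or "", number, int(year),
                          classify_level(prefix, region))
```

The parsed `level` could then feed the standard-level key term and the level-based classification tables; strings outside the assumed format simply yield `None`.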
In some embodiments, the key-term types in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information, and a first index is set for each key-term type.
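A first index per key-term type can be pictured as one exact-match inverted index per field. The sketch below is an assumption about the data structure rather than the patent's implementation; the field names in `KEY_TERM_TYPES` and the `FirstIndex` class are hypothetical.

```python
from collections import defaultdict

# Hypothetical field names for the nine key-term types listed in step S2.
KEY_TERM_TYPES = (
    "industry", "standard_number", "standard_name", "issuing_department",
    "release_date", "implementation_date", "validity_status",
    "standard_level", "announcement",
)


class FirstIndex:
    """One exact-match inverted index per key-term type: value -> set of document ids."""

    def __init__(self):
        self._index = {t: defaultdict(set) for t in KEY_TERM_TYPES}

    def add(self, doc_id, key_terms):
        for term_type, value in key_terms.items():
            if term_type not in self._index:
                raise KeyError(f"unknown key-term type: {term_type}")
            self._index[term_type][value].add(doc_id)

    def lookup(self, term_type, value):
        # .get() does not insert into the defaultdict; a copy is returned so callers may mutate it
        return set(self._index[term_type].get(value, ()))
```

Keeping one dictionary per type means a lookup such as `lookup("validity_status", "current")` never collides with an identical value stored under another type.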
In some embodiments, the plurality of standard classification tables in step S2 include an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table, an other-standards classification table and a standard announcement classification table, and a second index is set for each standard classification table.
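Likewise, the second index can be sketched as a mapping from classification-table name to the set of documents in that table. The assignment rules in the hypothetical `assign_tables` function (one table by standard level, one by validity status, plus the announcement table, with unrecognized levels going to "other") are assumptions made for illustration; the patent does not spell them out.

```python
from collections import defaultdict

LEVEL_TABLES = {"industry", "national", "local", "foreign"}
STATUS_TABLES = {"current", "abolished", "partially abolished", "not yet implemented"}


def assign_tables(record):
    """Hypothetical rules: one table by level, one by validity status, plus announcements."""
    tables = []
    level = record.get("level")
    tables.append(level if level in LEVEL_TABLES else "other")
    status = record.get("status")
    if status in STATUS_TABLES:
        tables.append(status)
    if record.get("is_announcement"):
        tables.append("announcement")
    return tables


def build_second_index(records):
    """Second index: classification-table name -> set of document ids."""
    index = defaultdict(set)
    for doc_id, record in records.items():
        for table in assign_tables(record):
            index[table].add(doc_id)
    return index
```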
In some embodiments, a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
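The third index links each thesaurus keyword to its documents. A minimal sketch, assuming the thesaurus is a plain list of strings and that a keyword is linked to a document when it occurs verbatim in the document text. A substring test needs no word segmentation, so it also works for Chinese text; a production system would more likely rely on a proper full-text index. `build_third_index` is a hypothetical name.

```python
from collections import defaultdict


def build_third_index(thesaurus, documents):
    """Third index: link each thesaurus keyword to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for keyword in thesaurus:
            if keyword in text:  # verbatim substring test; needs no word segmentation
                index[keyword].add(doc_id)
    return index
```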
In some embodiments, each piece of standard document data includes standard content, cited standard information, adopted standard information, substituted standard information, cited standard information, standard source information.
In some embodiments, the plurality of retrieval models in step S3 include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, each of which has an independent search function.
In some embodiments, the plurality of retrieval models cooperate to form a multi-level conditional search in which the search conditions are combined with a logical AND.
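Combining conditions with a logical AND amounts to intersecting the result sets returned by the individual models. A minimal sketch, assuming each retrieval model returns a set of document ids (`combine_and` is a hypothetical name):

```python
def combine_and(*result_sets):
    """Multi-level conditional search: keep documents returned by every retrieval model."""
    if not result_sets:
        return set()
    return set.intersection(*map(set, result_sets))
```

For example, if the custom-condition model returns `{1, 2, 3}`, the classification model `{2, 3}` and the key-industry model `{3, 4}`, only document `3` satisfies all three conditions.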
In some embodiments, each retrieval model in step S3 includes an information entry unit through which the user enters search conditions, and a data return unit that returns the search results.
In some embodiments, the data return unit includes a first return unit that returns the search results, and an auxiliary return unit that intelligently sorts and displays the data in the first return unit.
In some embodiments, the information entry unit of the custom-condition retrieval model includes a search-condition entry subunit and a custom-condition selection subunit. The custom-condition selection subunit defines the retrieval scope, which can be the standard number, the standard name, the full text of the standard or the standard announcement; once the user selects a scope in this subunit, retrieval is performed against the first indexes that fall within the selected scope.
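A sketch of scope-restricted search in the custom-condition model, assuming each document is a dict with hypothetical `number`, `name`, `text` and `announcement` fields and that matching is a case-insensitive substring test. The mapping `SCOPE_FIELDS` and the function `custom_search` are illustrative names, not taken from the patent.

```python
# Hypothetical mapping from the scopes named in the patent to document fields.
SCOPE_FIELDS = {
    "standard number": "number",
    "standard name": "name",
    "full text": "text",
    "announcement": "announcement",
}


def custom_search(documents, query, scopes):
    """Return ids of documents whose selected fields contain the query (case-insensitive)."""
    fields = [SCOPE_FIELDS[s] for s in scopes]  # raises KeyError for an unknown scope
    q = query.lower()
    return {
        doc_id
        for doc_id, doc in documents.items()
        if any(q in doc.get(f, "").lower() for f in fields)
    }
```

Selecting more scopes widens the search: a term found only in a standard's full text is returned when "full text" is among the selected scopes and missed otherwise.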
In some embodiments, the classification retrieval model includes a category selection subunit whose categories are taken from the key-term types and the standard classification tables of step S2 and are associated with the corresponding first and second indexes.
In some embodiments, the key-industry retrieval model includes an industry selection subunit that classifies the industries supported by the state, provinces and municipalities and associates each of them with the corresponding first index.
In some embodiments, the first return unit further includes a self-service sorting subunit and a range-limiting subunit. The sorting subunit sorts the retrieval results by comprehensive ranking, release date, standard number, implementation date or click rate. The range-limiting subunit lists every standard classification table; by selecting one table or several tables at once, the user further narrows the range of results.
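A sketch of the sorting and range-limiting subunits, assuming each result carries hypothetical `released`, `implemented`, `number`, `clicks` and `tables` fields, with dates stored as ISO 8601 strings. "Comprehensive ranking" is not defined in the patent, so it is left out here.

```python
# Hypothetical field names; ISO 8601 date strings ("2015-06-01") sort chronologically.
SORT_KEYS = {
    "release date": lambda d: d["released"],
    "standard number": lambda d: d["number"],
    "implementation date": lambda d: d["implemented"],
    "click rate": lambda d: d["clicks"],
}


def sort_and_limit(results, sort_by, tables=None, descending=True):
    """Optionally keep only results in any selected table, then sort by one key."""
    if tables:
        wanted = set(tables)
        results = [d for d in results if d["tables"] & wanted]  # range limiting
    return sorted(results, key=SORT_KEYS[sort_by], reverse=descending)
```

Passing several table names keeps a result if it belongs to any of them, which matches selecting multiple classification tables at once.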
In some embodiments, the retrieval method further includes a user login unit: after a user logs in, the retrieval sessions of each user are recorded, the user's retrieval habits are analyzed, and the auxiliary return unit sorts the retrieval results according to those habits.
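The patent does not describe how retrieval habits are analyzed. One simple assumption is to count, per user, how often results from each classification table are opened and to move results from frequently used tables up the list. `UserHabits` and its methods are hypothetical.

```python
from collections import Counter


class UserHabits:
    """Hypothetical habit model: how often each user opens results from each table."""

    def __init__(self):
        self._clicks = {}

    def record_click(self, user, table):
        self._clicks.setdefault(user, Counter())[table] += 1

    def rerank(self, user, results):
        prefs = self._clicks.get(user, Counter())  # unknown users get no preferences
        # sorted() is stable, so results with equal click counts keep their prior order
        return sorted(results, key=lambda d: -prefs[d["table"]])
```

Because the sort is stable, a user without any recorded history sees the results in the order the first return unit produced.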
In some embodiments, the retrieval method further provides, for each standard document, a preview of the first three pages, full-text reading, text download and display of hits within the text.
Compared with the prior art, the invention has the following beneficial effects. Users no longer need to work through entire standard documents, or through several of them; according to their actual retrieval purpose, they obtain efficient, accurate, systematic, complete and comprehensive full-text retrieval oriented to standard content. The PDF text of each standard is converted into TXT format, which allows specific keywords to be searched across standard documents and the results recombined, with search results ranked by a hit-rate weighting algorithm. In addition, building on the traditional standard search engine, the system groups the standards involved in six industries that receive priority support from the state, provinces and municipalities (high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles) into multiple levels and establishes a two-dimensional model of the standard library, in which each industry and its subdivided subclasses lead into a subdivided library of the standards involved, so that the traditional standard library is modeled systematically.
The invention thus addresses the low efficiency, inaccurate retrieval and incomplete results of traditional retrieval, improves retrieval accuracy and completeness, and helps users achieve automatic clustering retrieval.
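The hit-rate weighting algorithm is not specified. A minimal sketch, assuming the PDF-to-TXT conversion has already produced plain text and that the weight is the number of query occurrences per 1,000 characters, so that a short standard with several hits ranks above a long one with a single hit. `hit_rate_rank` is a hypothetical name.

```python
def hit_rate_rank(query, texts):
    """Rank documents by query occurrences per 1,000 characters (an assumed weighting)."""
    if not query:
        return []  # an empty query would "match" everywhere
    scored = []
    for doc_id, text in texts.items():
        hits = text.count(query)  # non-overlapping occurrences in the converted TXT text
        if hits:
            scored.append((hits * 1000 / len(text), doc_id))
    # highest weight first; ties broken by document id for a deterministic order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```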
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The invention discloses a standard document classification retrieval method which, as shown in FIG. 1, comprises the following steps:
S1: Acquire standard document data of various kinds and establish a standard document database. Standard literature data that are complete in variety and promptly updated are obtained from a superior organization, the Jiangsu Province Quality and Standardization Research Institute;
S2: According to the features that standard documents have in common and the Chinese standard classification scheme, identify and extract multiple types of key terms from the standard documents, generate multiple standard classification tables from the key terms, and generate a retrieval thesaurus from the keywords;
S3: Construct retrieval models, namely a plurality of retrieval models built from the standard classification tables and the retrieval thesaurus of step S2;
S4: Issue a search request, entering the search conditions through each retrieval model of step S3 as an entry point;
S5: Return the search information and display the data that satisfy the search request of step S4.
In some embodiments, the key-term types in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information, and a first index is set for each key-term type.
In some embodiments, each piece of standard document data includes standard content, cited standard information, adopted standard information, substituted standard information, cited standard information, standard source information.
In some embodiments, the plurality of standard classification tables in step S2 include key-industry standard classification tables (for six industries: high-end textiles, marine engineering, electronic information, intelligent equipment, new materials, and new energy and new-energy vehicles), an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table and an other-standards classification table, and a second index is set for each standard classification table.
In some embodiments, a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
In some embodiments, the plurality of retrieval models in step S3 include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, each of which has an independent search function.
In some embodiments, the plurality of retrieval models cooperate to form a multi-level conditional search in which the search conditions are combined with a logical AND.
In some embodiments, each retrieval model in step S3 includes an information entry unit through which the user enters search conditions, and a data return unit that returns the search results.
In some embodiments, the data return unit includes a first return unit that returns the search results, and an auxiliary return unit that intelligently sorts and displays the data in the first return unit according to the following rules:
For fuzzy (non-exact) searches, matching is based on the character string entered by the user:
(1) If the input mixes English letters and digits, the search engine matches against the input as a whole; the closer a result is to the input, the higher its priority.
(2) If the input consists only of digits or only of English letters, it is compared with the corresponding part of each result: letters are matched against the prefix of the standard number, and digits against the numeric part of the standard number; the higher the degree of match, the higher the ranking priority.
(3) If the input contains punctuation or other special characters, the whole string is matched against the standard numbers in the search engine; the higher the degree of match, the higher the ranking priority.
(4) The results above additionally follow these orderings: national standards > industry standards > local standards > foreign and international standards; current > abolished; and implementation date in descending order, so that the most recently implemented standards are displayed first.
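The ranking rules above can be sketched as a single sort key. Assumptions: the "degree of match" is measured with `difflib.SequenceMatcher.ratio()`, the numeric part of a standard number is the first run of digits after a space, rule (4) is applied as tie-breakers after the match score, and each result is a dict with hypothetical `number`, `level`, `status` and `implemented` (ISO date string) fields.

```python
import re
from datetime import date
from difflib import SequenceMatcher

LEVEL_RANK = {"national": 0, "industry": 1, "local": 2, "foreign": 3}
STATUS_RANK = {"current": 0, "abolished": 1}


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def match_score(query: str, std_number: str) -> float:
    q = query.upper().replace(" ", "")
    if q.isascii() and q.isalpha():      # rule (2): letters vs. the prefix, e.g. "GB"
        head = re.match(r"[A-Z]+", std_number.upper())
        return similarity(q, head.group()) if head else 0.0
    if q.isdigit():                      # rule (2): digits vs. the sequence number
        seq = re.search(r"\s(\d+)", std_number)
        return similarity(q, seq.group(1)) if seq else 0.0
    # rules (1) and (3): letters plus digits, or special characters, vs. the whole number
    return similarity(q, std_number.upper().replace(" ", ""))


def rank(query, results):
    def key(doc):
        return (
            -match_score(query, doc["number"]),                   # closer match first
            LEVEL_RANK.get(doc["level"], len(LEVEL_RANK)),        # rule (4): level order
            STATUS_RANK.get(doc["status"], len(STATUS_RANK)),     # current before abolished
            -date.fromisoformat(doc["implemented"]).toordinal(),  # most recent first
        )
    return sorted(results, key=key)
```

With a digits-only query such as `"7714"`, every result carrying that sequence number scores equally, so the level, status and date tie-breakers alone decide the order.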
In some embodiments, the information entry unit of the custom-condition retrieval model includes a search-condition entry subunit and a custom-condition selection subunit. The custom-condition selection subunit defines the retrieval scope, which can be the standard number, the standard name, the full text of the standard or the standard announcement; once a scope has been selected in this subunit, the user's search is performed against the first indexes that fall within the selected scope.
In some embodiments, the classification retrieval model includes a category selection subunit whose categories are taken from the key-term types and the standard classification tables of step S2 and are associated with the corresponding first and second indexes.
In some embodiments, the key-industry retrieval model includes an industry selection subunit that classifies the industries supported by the state, provinces and municipalities and associates each of them with the corresponding first index.
In some embodiments, the first return unit further includes a self-service sorting subunit and a range-limiting subunit. The sorting subunit sorts the retrieval results by comprehensive ranking, release date, standard number, implementation date or click rate. The range-limiting subunit lists every standard classification table; by selecting one table or several tables at once, the user further narrows the range of results.
In some embodiments, the retrieval method further includes a user login unit: after a user logs in, the retrieval sessions of each user are recorded, the user's retrieval habits are analyzed, and the auxiliary return unit sorts the retrieval results according to those habits.
In some embodiments, the retrieval method further provides, for each standard document, a preview of the first three pages, full-text reading, text download and display of hits within the text.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A standard literature classification retrieval method, characterized by comprising the following steps:
S1: acquiring standard document data of various kinds, and establishing a standard document database;
S2: establishing classification tables and a retrieval thesaurus according to the features that standard documents have in common: identifying and extracting multiple types of key terms from the standard documents, generating a plurality of standard classification tables from the key terms, and generating the retrieval thesaurus from the keywords among the key terms;
S3: constructing retrieval models, namely a plurality of retrieval models built from the standard classification tables and the retrieval thesaurus of step S2;
S4: issuing a search request, with the search conditions entered through each retrieval model of step S3 as an entry point;
S5: returning the search information and displaying the data that satisfy the search request of step S4.
2. The standard literature classification retrieval method according to claim 1, wherein the key-term types in step S2 include industry information, standard number information, standard name information, issuing department information, release date information, implementation date information, validity status information, standard level information and standard announcement information, and a first index is set for each key-term type.
3. The standard literature classification retrieval method according to claim 1, wherein in step S2 the plurality of standard classification tables include an industry standard classification table, a national standard classification table, a local standard classification table, a foreign standard classification table, a current standard classification table, an abolished standard classification table, a partially abolished standard classification table, a not-yet-implemented standard classification table, an other-standards classification table and a standard announcement classification table, and a second index is set for each standard classification table.
4. The standard literature classification retrieval method according to claim 1, wherein a third index is set for each keyword in the retrieval thesaurus of step S2, and every keyword is linked to its corresponding standard document data.
5. The standard literature classification retrieval method according to claim 1, wherein in step S3 the plurality of retrieval models include a custom-condition retrieval model, a classification retrieval model and a key-industry retrieval model, each of which has an independent search function.
6. The standard literature classification retrieval method according to claim 5, wherein the plurality of retrieval models cooperate to form a multi-level conditional search, and the search conditions are combined with a logical AND.
7. The standard literature classification retrieval method according to claim 5, wherein each retrieval model in step S3 includes an information entry unit through which a user enters search conditions, and a data return unit that returns the search results.
8. The standard literature classification retrieval method according to claim 7, wherein the data return unit includes a first return unit that returns the search results and an auxiliary return unit that intelligently sorts and displays the data in the first return unit.
9. The standard literature classification retrieval method according to claims 1 and 8, wherein the retrieval method further includes a user login unit that monitors the retrieval sessions of different users and analyzes their retrieval habits, and the auxiliary return unit intelligently sorts the retrieval results according to those habits.
10. The standard literature classification retrieval method according to claim 1, wherein the retrieval method further provides, for each standard document, a preview of the first three pages, full-text reading, text download and display of hits within the text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911000480.0A CN110928978A (en) | 2019-10-21 | 2019-10-21 | Standard literature classification retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110928978A | 2020-03-27 |
Family
ID=69849353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911000480.0A (Pending, published as CN110928978A) | Standard literature classification retrieval method | 2019-10-21 | 2019-10-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110928978A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112307171A (en) * | 2020-10-30 | 2021-02-02 | 中国电力科学研究院有限公司 | Institutional standard retrieval method and system based on power knowledge base and readable storage medium |
CN112735584A (en) * | 2020-12-31 | 2021-04-30 | 北京万方数据股份有限公司 | Malignant tumor diagnosis and treatment auxiliary decision generation method and device |
CN112735584B (en) * | 2020-12-31 | 2023-10-24 | 北京万方数据股份有限公司 | Malignant tumor diagnosis and treatment auxiliary decision generation method and device |
CN113033175A (en) * | 2021-04-07 | 2021-06-25 | 芜湖市标准化研究院 | Standard effectiveness evaluation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||