CN115168623A - Full-text retrieval method and system for water conservancy industry standard - Google Patents

Info

Publication number
CN115168623A
Authority
CN
China
Prior art keywords
water conservancy
industry standard
full
conservancy industry
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210746687.8A
Other languages
Chinese (zh)
Inventor
韩永利
朱家兵
蔡军凯
房爱印
牛月华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd
Priority to CN202210746687.8A
Publication of CN115168623A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a full-text retrieval method and system for water conservancy industry standards, belonging to the technical field of retrieval for the water conservancy industry. The method splits and combines water conservancy industry standard files and their contents and automatically extracts characters and words from the text. First, water conservancy industry standard files are collected and updated regularly and continuously; then the collected standards are cleaned, converted, classified, and stored; finally, specific content from the standards is returned to the querying user through a water conservancy standard full-text retrieval system. The method and system make it considerably more convenient for users to consult water conservancy standard files and their contents, reduce the time spent looking up material, and improve the accuracy of the retrieved content.

Description

Full-text retrieval method and system for water conservancy industry standard
Technical Field
The invention relates to the technical field of water conservancy business industry retrieval, in particular to a water conservancy industry standard full-text retrieval method and system.
Background
At present, the water conservancy ministry, provincial water conservancy departments, water conservancy bureaus, and various related associations issue a large number of water conservancy standards, and the versions of these standards are continuously revised. With so many users, queries often fail to find the desired file or return content the user does not want. For example, a search for the hydrological information forecast specification may return an early version, so that an outdated version is used in business process design and project implementation and the latest requirements are not met. In addition, because there are so many standards, accurate information cannot be searched for directly, which wastes a great deal of reference time in water conservancy work.
Disclosure of Invention
The technical task of the invention is to address the above defects by providing a water conservancy industry standard full-text retrieval method and system that make it considerably more convenient for users to consult water conservancy standard files and their contents, reduce the time spent searching for material, and improve the accuracy of the retrieved content.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a full-text retrieval method for water conservancy industry standards is characterized in that a water conservancy industry standard file and contents are split and combined, and characters and words in a text are automatically extracted;
firstly, standard files of the water conservancy industry are collected and are updated regularly and continuously;
then, the collected water conservancy industry standards are cleaned, converted, classified and stored;
and finally, feeding back specific contents in the water conservancy industry standard to a consulted user through a water conservancy standard full-text retrieval system.
The method establishes a mechanism for regularly collecting, cleaning, converting and warehousing the water conservancy industry standard files, and ensures the accuracy and timeliness of the water conservancy standard files; the Chinese word segmentation and storage of the water conservancy industry standard file are completed, the full-text retrieval function is realized, and the convenience for looking up is brought to users in the water conservancy industry. By the method, the water conservancy industry standard rapid full-text retrieval can be realized, and the efficiency and accuracy of user query are improved.
Preferably, the method is implemented based on the Java web architecture.
Preferably, water conservancy industry standard files are collected using web crawler technologies, and the websites searched include national standard websites, industry standard websites, and the websites of provincial water conservancy departments and hydrological bureaus.
Preferably, the collected files have a plurality of formats such as pdf, word, webpage, swf and the like, and the collected files are cleaned, converted, classified and put in storage, and the steps are repeatedly executed regularly to ensure that the standard files are the latest files.
Further, file cleaning includes determining which version each file is, whether the file needs to be modified, and which content needs to be deleted;
after cleaning, formatting conversion of files is carried out, files with different formats are converted into word formats in a unified manner, and word segmentation of Chinese words in the later period is facilitated;
and classifying the converted files according to a water conservancy technical standard system framework, and storing the files in directories of various databases.
Preferably, the water conservancy standard full-text retrieval system firstly establishes a set of water conservancy industry standard database; then, performing Chinese word segmentation on each standard file, and storing the Chinese word segmentation into a water conservancy industry standard database; and finally, through searching a query page, the queried data is extracted from the background and fed back to the front-end user.
Preferably, the information of the water conservancy industry standard database comprises the name, version, release time and standard type of the water conservancy industry standard.
The invention also claims to protect a water conservancy industry standard full-text retrieval system, which comprises a water conservancy industry standard file collection module, a water conservancy industry standard file cleaning and warehousing module and a water conservancy industry standard full-text retrieval module,
the system realizes full-text retrieval of the water conservancy industry standard file through the water conservancy industry standard full-text retrieval method.
The system has a friendly retrieval interface and can provide users with a retrieval interface that is attractive, clean, and easy to use.
The invention also claims to protect a full-text retrieval device of the water conservancy industry standard, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
and the at least one processor is used for calling the machine readable program to execute the water conservancy industry standard full-text retrieval method.
The invention also claims a computer-readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described water conservancy industry standard full-text retrieval method.
Compared with the prior art, the full-text retrieval method and the full-text retrieval system for the water conservancy industry standard have the following beneficial effects:
the method splits and combines water conservancy industry standard files and their contents and automatically extracts characters and words from the text, which improves the efficiency and accuracy of user queries; the retrieval interface is friendly and can provide users with a retrieval interface that is attractive, clean, and easy to use.
The method establishes a mechanism for regularly collecting, cleaning, converting and warehousing the water conservancy industry standard files, and ensures the accuracy and timeliness of the water conservancy standard files; the Chinese word segmentation and storage of the water conservancy industry standard file are completed, the full-text retrieval function is realized, and the convenience for looking up is brought to users in the water conservancy industry.
Drawings
Fig. 1 is a diagram of an implementation manner of a water conservancy industry standard full-text retrieval method provided by an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As society becomes increasingly specialized, standards and specifications are continuously revised. To help users learn, understand, and master water conservancy knowledge, to solve the problem of full-text retrieval of water conservancy standards, to improve the accuracy of retrieved content, to keep production departments technically coordinated and unified, and to let them query the water conservancy industry standard files and data they need quickly and accurately, the embodiment of the invention provides a full-text retrieval method for water conservancy industry standards. The method splits and combines water conservancy industry standard files and their contents and automatically extracts characters and words from the text:
firstly, standard files of the water conservancy industry need to be collected and updated regularly;
then, the collected water conservancy industry standards are cleaned, converted, classified and put in storage;
and finally, feeding back specific contents in the water conservancy industry standard to a consulted user through a water conservancy standard full-text retrieval system.
The method is realized based on a Java web architecture, a mechanism for regularly collecting, cleaning, converting and warehousing the water conservancy industry standard files is established, and the accuracy and timeliness of the water conservancy standard files are ensured; the Chinese word segmentation and storage of the water conservancy industry standard file are completed, the full-text retrieval function is realized, and the convenience for looking up is brought to users in the water conservancy industry. By the method, the water conservancy industry standard rapid full-text retrieval can be realized, and the efficiency and accuracy of user query are improved.
Water conservancy industry standard files are collected from national standard websites, industry standard websites, and the websites of provincial water conservancy departments, hydrological bureaus, and the like, using web crawlers and related technologies. Because the collected files come in various formats such as PDF, Word, web page, and SWF, they must be cleaned, converted, classified, and stored. These steps must also be repeated at regular intervals to ensure that the stored standard files are the latest versions.
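The periodic repetition of the collection and processing steps could be driven by a scheduler. The following Java sketch is illustrative only: the class name `PeriodicCollector`, the step names, and the period are assumptions rather than details from the patent, and each step is a placeholder for real crawling, cleaning, conversion, classification, and storage code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler for the collect/clean/convert/classify/store cycle.
final class PeriodicCollector {

    // Runs the named steps in insertion order and returns the names of the steps that completed.
    static List<String> runOnce(LinkedHashMap<String, Runnable> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            step.getValue().run();
            completed.add(step.getKey());
        }
        return completed;
    }

    // Repeats the whole cycle at a fixed period; the caller is responsible for shutting the scheduler down.
    static ScheduledExecutorService start(LinkedHashMap<String, Runnable> steps, long periodHours) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                runOnce(steps);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report it and continue.
                System.err.println("Collection cycle failed: " + e.getMessage());
            }
        }, 0, periodHours, TimeUnit.HOURS);
        return scheduler;
    }
}
```

A caller would register one `Runnable` per step (for example "collect", "clean", "convert", "classify", "store") in a `LinkedHashMap` and call `PeriodicCollector.start(steps, 24)` for a daily cycle; the 24-hour period is an arbitrary example value.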
Processing water conservancy standard files: because the formats of the collected files are not standardized, cleaning is the most critical step. It determines which version each file is, whether the file needs to be modified, and what needs to be deleted. After cleaning, the file formats are converted: files in different formats are uniformly converted into Word format, which facilitates the later Chinese word segmentation.
And then classifying the converted files according to a water conservancy technical standard system framework, and storing the files into directories of various databases.
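The cleaning and classification steps can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the patent's implementation: the `StandardFile` fields, the rule "keep the most recently released version of each standard", and the use of a category string in place of a directory are all hypothetical choices made for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical metadata for one collected standard file; field names are assumptions.
final class StandardFile {
    final String name;        // title of the standard
    final String version;     // version or edition label
    final LocalDate released; // release date
    final String category;    // category within the technical standard system framework

    StandardFile(String name, String version, LocalDate released, String category) {
        this.name = name;
        this.version = version;
        this.released = released;
        this.category = category;
    }
}

final class StandardCleaner {

    // Cleaning (assumed rule): keep only the most recently released version of each standard.
    static Map<String, StandardFile> keepLatest(List<StandardFile> collected) {
        Map<String, StandardFile> latest = new TreeMap<>();
        for (StandardFile f : collected) {
            StandardFile current = latest.get(f.name);
            if (current == null || f.released.isAfter(current.released)) {
                latest.put(f.name, f);
            }
        }
        return latest;
    }

    // Classification: group the cleaned files by category, standing in for per-category directories.
    static Map<String, List<String>> classify(Map<String, StandardFile> latest) {
        Map<String, List<String>> byCategory = new TreeMap<>();
        for (StandardFile f : latest.values()) {
            byCategory.computeIfAbsent(f.category, k -> new ArrayList<>())
                      .add(f.name + " (" + f.version + ")");
        }
        return byCategory;
    }
}
```

With this rule, two collected copies of the same standard that differ only in release date collapse to the newer one before classification, which corresponds to the outdated-version problem described in the Background.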
The water conservancy standard full-text retrieval system first establishes a water conservancy industry standard database whose information includes the name, version, release time, and standard type of each water conservancy industry standard;
then, Chinese word segmentation is performed on each standard file, and the segmentation results are stored in the water conservancy industry standard database;
and finally, when a user searches through the query page, the matching data is retrieved by the back end and returned to the front-end user.
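The segment, store, and query flow above can be illustrated with a small in-memory inverted index. This Java sketch is an assumption-laden stand-in: the patent does not name a segmentation algorithm or storage engine, so the example substitutes overlapping two-character terms (character bigrams) for real Chinese word segmentation and a `HashMap` for the database. The class name `StandardIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index: term -> ids of the standard files that contain it.
final class StandardIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Stand-in segmenter: strips whitespace and punctuation, then emits overlapping
    // two-character terms. A production system would use a dictionary-based segmenter.
    static List<String> segment(String text) {
        String cleaned = text.replaceAll("[\\s\\p{P}]+", "");
        List<String> terms = new ArrayList<>();
        if (cleaned.length() == 1) {
            terms.add(cleaned);
        }
        for (int i = 0; i + 2 <= cleaned.length(); i++) {
            terms.add(cleaned.substring(i, i + 2));
        }
        return terms;
    }

    // Segments the text of one standard file and records each term against its id.
    void add(String docId, String text) {
        for (String term : segment(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of files that contain every term of the query (AND semantics).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : segment(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }
}
```

For example, after indexing a file whose text is "水文情报预报规范", the query "预报" matches it because that bigram was stored at indexing time. Bigram matching trades precision for simplicity; it also cannot match a single-character query against longer text, which a dictionary-based segmenter would handle differently.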
The embodiment of the invention also provides a water conservancy industry standard full-text retrieval system, which comprises a water conservancy industry standard file collection module, a water conservancy industry standard file cleaning and warehousing module and a water conservancy industry standard full-text retrieval module,
the system realizes full-text retrieval of the water conservancy industry standard file through the water conservancy industry standard full-text retrieval method in the embodiment.
The system has a friendly retrieval interface and can provide users with a retrieval interface that is attractive, clean, and easy to use.
The embodiment of the invention also provides a full-text retrieval device for the water conservancy industry standard, which comprises: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine-readable program to execute the water conservancy industry standard full-text retrieval method in the above embodiment of the present invention.
An embodiment of the present invention further provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the water conservancy industry standard full-text retrieval method of the foregoing embodiment. Specifically, a system or an apparatus may be provided that is equipped with a storage medium storing software program code that realizes the functions of any of the embodiments described above, and a computer (or a CPU or MPU) of the system or apparatus is caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a flexible disk, hard disk, magneto-optical disk, optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), magnetic tape, nonvolatile memory card, and ROM. Alternatively, the program code may be downloaded from a server computer by a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
While the invention has been particularly shown and described with reference to the preferred embodiments and drawings, it is not limited to the specific embodiments disclosed. Those skilled in the art will understand that the means described in the various embodiments above may be combined to obtain further embodiments, and such embodiments also fall within the scope of the present invention.

Claims (10)

1. A full-text retrieval method for water conservancy industry standards is characterized in that a water conservancy industry standard file and content are split and combined, and characters and words in a text are automatically extracted;
firstly, collecting standard files of the water conservancy industry, and regularly and continuously updating;
then, the collected water conservancy industry standards are cleaned, converted, classified and put in storage;
and finally, feeding back specific contents in the water conservancy industry standard to a consulted user through a water conservancy standard full-text retrieval system.
2. The water conservancy industry standard full-text retrieval method according to claim 1, characterized in that the method is implemented based on a Java web architecture.
3. The water conservancy industry standard full-text retrieval method according to claim 1 or 2, wherein the water conservancy industry standard file collection is performed through webpage crawler related technologies, and the search websites comprise a national standard network, an industry standard network, water conservancy halls of various provinces and hydrology bureau websites.
4. The full text retrieval method for the water conservancy industry standard according to claim 3, wherein the collected files are cleaned, converted, classified and stored in a warehouse, and the steps are repeated periodically to ensure that the standard files are the latest files.
5. The water conservancy industry standard full-text retrieval method according to claim 4, wherein the file cleaning comprises judging whether the file version needs to be modified and which needs to be deleted;
after cleaning, formatting conversion of files is carried out, files with different formats are converted into word formats in a unified mode, and word segmentation of later Chinese is facilitated;
and classifying the converted files according to a water conservancy technical standard system framework, and storing the files in directories of various databases.
6. The water conservancy industry standard full-text retrieval method according to claim 1, wherein the water conservancy standard full-text retrieval system firstly establishes a set of water conservancy industry standard database; then, performing Chinese word segmentation on each standard file, and storing the Chinese word segmentation into a water conservancy industry standard database; and finally, by searching the query page, the queried data is extracted from the background and fed back to the front-end user.
7. The full-text search method for water conservancy industry standards according to claim 6, wherein the information of the water conservancy industry standard database comprises names, versions, release times and standard types of the water conservancy industry standards.
8. A full-text retrieval system for water conservancy industry standards is characterized by comprising a water conservancy industry standard file collection module, a water conservancy industry standard file cleaning and warehousing module and a water conservancy industry standard full-text retrieval module,
the system realizes full-text retrieval of the water conservancy industry standard files through the water conservancy industry standard full-text retrieval method of any one of claims 1 to 7.
9. A water conservancy industry standard full-text retrieval device is characterized by comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is used for calling the machine readable program to execute the water conservancy industry standard full-text retrieval method of any one of claims 1 to 7.
10. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the water conservancy industry standard full-text retrieval method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210746687.8A CN115168623A (en) 2022-06-29 2022-06-29 Full-text retrieval method and system for water conservancy industry standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210746687.8A CN115168623A (en) 2022-06-29 2022-06-29 Full-text retrieval method and system for water conservancy industry standard

Publications (1)

Publication Number Publication Date
CN115168623A true CN115168623A (en) 2022-10-11

Family

ID=83488310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210746687.8A Pending CN115168623A (en) 2022-06-29 2022-06-29 Full-text retrieval method and system for water conservancy industry standard

Country Status (1)

Country Link
CN (1) CN115168623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436421A (en) * 2023-12-21 2024-01-23 湖北省标准化与质量研究院(湖北Wto/Tbt通报咨询中心) Standard file editing system, method and equipment

Similar Documents

Publication Publication Date Title
US9619448B2 (en) Automated document revision markup and change control
CN102207948B (en) Method for generating incident statement sentence material base
US9390097B2 (en) Dynamic generation of target files from template files and tracking of the processing of target files
US7765236B2 (en) Extracting data content items using template matching
US7487174B2 (en) Method for storing text annotations with associated type information in a structured data store
CN102073490B (en) Method and device for translating database language
CN110263317B (en) Method and device for generating document template
CN100498782C (en) Method for quick updating data domain in full text retrieval system
CN101539904B (en) Automatic indexing method of quotations
CN107346325A (en) Information query method and device
CN101763255B (en) Format conversion method and device of special interface tool
KR101083563B1 (en) Method and System for Managing Database
US8812441B2 (en) Migration apparatus which convert database of mainframe system into database of open system and method for thereof
JP4247135B2 (en) Structured document storage method, structured document storage device, structured document search method
CN116204660B (en) Multi-source heterogeneous data driven domain knowledge graph construction method
CN115168623A (en) Full-text retrieval method and system for water conservancy industry standard
CN107748748B (en) Full text retrieval system for water conservancy and hydropower technology standard
CN106095933A (en) A kind of patent information inquiry system and querying method
JP2005190163A (en) Method, apparatus and program for retrieving structured data
US20090077031A1 (en) System and method for creating full-text indexes of patent documents
CN112818070A (en) Data query method and device based on global data dictionary and electronic equipment
CN116090416B (en) Standard writing method, system, equipment and medium based on standard knowledge graph
CN111796833A (en) Code language conversion method, system, equipment and storage medium
JP2005242416A (en) Natural language text search method and device
CN114218347A (en) Method for quickly searching index of multiple file contents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination