CN112463791A - Nuclear power station document data acquisition method and device, computer equipment and storage medium - Google Patents
Nuclear power station document data acquisition method and device, computer equipment and storage medium
- Publication number
- CN112463791A (application CN202011308653.8A)
- Authority
- CN
- China
- Prior art keywords
- document
- data
- document data
- page number
- nuclear power
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/177—Editing, e.g. inserting or deleting of tables; using ruled lines
- G06F40/18—Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of nuclear power plant informatization, and in particular to a nuclear power station document data acquisition method and device, computer equipment and a storage medium. The method comprises: receiving an equipment data acquisition instruction containing a file path and a keyword; acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword; acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table; when the key page number contains a preset table, acquiring first document data in the preset table, the first document data being unstructured data; and converting the first document data into structured data and storing the structured data in a nuclear power plant document database. The invention improves the efficiency of document classification and storage and can reduce the omissions or errors that manual extraction and sorting may cause.
Description
Technical Field
The invention relates to the technical field of nuclear power plant informatization construction, in particular to a nuclear power plant document data acquisition method and device, computer equipment and a storage medium.
Background
With the development of nuclear power plant technology, more and more equipment data, such as maintenance data and operation data, needs to be recorded for each piece of nuclear power plant equipment.
At present, the equipment data of nuclear power plant equipment is stored in the form of working documents, in which it is recorded as unstructured text. To use the equipment data, the working document must first be read by a file reading program, and the data that has been read and identified must then be extracted and sorted manually. This process consumes a large amount of labor and time, and makes subsequent use of equipment data recorded in working documents very inconvenient.
Disclosure of Invention
The embodiment of the invention provides a nuclear power station document data acquisition method and device, computer equipment and a storage medium, and aims to solve the problem that equipment data is inconvenient to use subsequently due to manual extraction and arrangement of the equipment data.
A nuclear power plant document data acquisition method includes:
receiving an equipment data acquisition instruction containing a file path and a keyword;
acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table or not;
when the key page number contains a preset table, acquiring first document data in the preset table; the first document data is unstructured data;
and converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
A nuclear power plant document data acquisition apparatus comprising:
the data acquisition instruction receiving module is used for receiving an equipment data acquisition instruction containing a file path and a keyword;
the target document acquisition module is used for acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
a key page number obtaining module, configured to obtain a key page number including the keyword from the target document, and detect whether the key page number includes a preset table;
the first document data acquisition module is used for acquiring first document data in a preset table when the key page number comprises the preset table; the first document data is unstructured data;
and the data storage module is used for converting the first document data into structured data and storing the structured data into a nuclear power plant document database.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the nuclear power plant document data acquisition method described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described nuclear power plant document data acquisition method.
In the above nuclear power station document data acquisition method and device, computer equipment and storage medium, the method comprises: receiving an equipment data acquisition instruction containing a file path and a keyword; acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword; acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table; when the key page number contains a preset table, acquiring first document data in the preset table, the first document data being unstructured data; and converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
According to the invention, after the equipment data acquisition instruction is received, the document data associated with the file path and the keyword is automatically detected and acquired, and the original unstructured document data is converted into structured document data and then stored in the nuclear power station document database. When the data, or the documents associated with the data, later need to be looked up, the original path of the data can be found quickly. This saves labor, improves the efficiency of document classification and storage, and can reduce the omissions or errors that manual extraction and sorting may cause.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a diagram illustrating an application environment of a nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S20 in the nuclear power plant document data acquisition method according to the embodiment of the present invention;
FIG. 4 is a flowchart of step S50 in the nuclear power plant document data acquisition method according to the embodiment of the present invention;
FIG. 5 is a functional block diagram of a nuclear power plant document data acquisition device according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a target document acquiring module in a nuclear power plant document data acquiring apparatus according to an embodiment of the present invention;
FIG. 7 is a functional block diagram of a data storage module in the nuclear power plant document data acquisition device according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for acquiring the nuclear power plant document data provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. Specifically, the method is applied to a nuclear power plant document data acquisition system comprising the client and the server shown in fig. 1. The client and the server communicate through a network, and the system is used for solving the problem that subsequent use of equipment data is inconvenient because the equipment data is extracted and sorted manually. The client, also called the user side, is a program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a nuclear power plant document data obtaining method is provided, which is described by taking the example that the method is applied to the server in fig. 1, and includes the following steps:
S10: Receiving an equipment data acquisition instruction containing a file path and a keyword.
The device data acquisition instruction can be an instruction sent by a relevant worker through a mobile terminal or a cloud server, or an instruction automatically generated when the relevant worker enters a file path and a keyword on an application program applied by the method. The file path refers to an address where a target document to be acquired is stored. The keywords refer to words existing in the target document to be acquired, and the corresponding target document can be acquired through the specified file path and the specified keywords.
Further, when a plurality of device data acquisition instructions including file paths and keywords are received, the instructions received at a later time can be stored in the instruction cache region according to the time sequence of the received instructions, and further the instructions can be executed in a circulating batch manner.
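As an illustration only, the time-ordered caching and batch execution of instructions described above can be sketched with a first-in, first-out queue. The class names `AcquisitionInstruction` and `InstructionCache`, the batch size, and the sample paths are hypothetical and do not come from the source.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionInstruction:
    """Hypothetical container for one equipment data acquisition instruction."""
    file_path: str
    keyword: str


class InstructionCache:
    """FIFO buffer: instructions are handed out in the order they were received."""

    def __init__(self) -> None:
        self._pending = deque()

    def receive(self, instruction: AcquisitionInstruction) -> None:
        # Later instructions are appended behind earlier ones.
        self._pending.append(instruction)

    def next_batch(self, size: int) -> list:
        """Remove and return up to `size` of the oldest pending instructions."""
        batch = []
        while self._pending and len(batch) < size:
            batch.append(self._pending.popleft())
        return batch


cache = InstructionCache()
cache.receive(AcquisitionInstruction("/cache/pumps", "pump"))
cache.receive(AcquisitionInstruction("/cache/valves", "valve"))
cache.receive(AcquisitionInstruction("/cache/pumps", "seal"))

first_batch = cache.next_batch(2)   # the two oldest instructions
second_batch = cache.next_batch(2)  # whatever is left
```

A server loop could call `next_batch` repeatedly until it returns an empty list, which corresponds to executing the cached instructions in circulating batches.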
S20: Acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword.
The target document is a document that is located under the file path in the nuclear power plant equipment data cache region and contains the keyword. Optionally, the target documents in this embodiment are all stored in the nuclear power plant equipment data cache region as Word documents, and there may be one or more target documents containing the keyword under the file path. The nuclear power plant equipment data cache region is only an example; the target document may also be stored elsewhere, for example on another personal computer. The nuclear power plant equipment data cache region is used for temporarily storing unstructured documents whose data has not yet been sorted into structured data.
In one embodiment, as shown in fig. 3, step S20 includes:
S201: Acquiring all documents under the file path from the nuclear power station equipment data cache region.
It can be understood that the file path corresponds to the storage area of the documents. After an equipment data acquisition instruction including the file path and the keyword is received, all documents under the file path are acquired from the nuclear power plant equipment data cache region by tracing the file path.
S202: Detecting whether each of the documents contains the keyword.
S203: Recording the documents containing the keyword as the target documents.
Optionally, after all documents in the file path are acquired from the data cache area of the nuclear power plant equipment, a TextRank, LDA, or TPR algorithm model may be adopted to perform keyword detection on all documents, so as to record a document containing a keyword as a target document.
Further, after detecting whether the documents contain the keyword, if none of the documents contains the keyword, an equipment data acquisition failure instruction is sent to a preset receiver so that the preset receiver can check whether the file path and/or the keyword is wrong. The preset receiver may be relevant personnel of the nuclear power plant or the party that sent the equipment data acquisition instruction.
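Steps S201 to S203 and the failure notice can be sketched as follows. This is a minimal sketch under stated assumptions: the cache region is modeled as a hypothetical in-memory mapping from document path to already-extracted text, and plain substring matching stands in for the TextRank, LDA, or TPR models mentioned above. The function names and sample paths are illustrative.

```python
from pathlib import PurePosixPath


def find_target_documents(cache_region, file_path, keyword):
    """Return the paths of documents under `file_path` whose text contains `keyword`.

    `cache_region` is a hypothetical stand-in for the equipment data cache
    region: a dict mapping a document's full path to its extracted text.
    """
    folder = PurePosixPath(file_path)
    # S201: collect every document stored directly under the file path.
    candidates = {
        path: text
        for path, text in cache_region.items()
        if PurePosixPath(path).parent == folder
    }
    # S202/S203: keep the documents that contain the keyword (substring match).
    return sorted(path for path, text in candidates.items() if keyword in text)


def acquire(cache_region, file_path, keyword):
    targets = find_target_documents(cache_region, file_path, keyword)
    if not targets:
        # No document matched: report failure so the receiver can check the inputs.
        return {"status": "failed", "reason": "check file path and/or keyword"}
    return {"status": "ok", "targets": targets}


cache_region = {
    "/cache/unit1/pump_maintenance.docx": "Maintenance record for pump 1-RCP-001-PO",
    "/cache/unit1/valve_log.docx": "Valve inspection log",
    "/cache/unit2/pump_report.docx": "Quarterly pump report",
}

found = acquire(cache_region, "/cache/unit1", "pump")
missing = acquire(cache_region, "/cache/unit1", "turbine")
```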
S30: Acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table.
It can be understood that, although the target document contains the keyword, not every page of the target document necessarily contains it. Therefore, only the key page numbers containing the keyword need to be checked for data to be acquired, which saves system running time, reduces the load on the computer system, and increases its running speed. The preset table may be a table of any form in which equipment data is stored.
Specific data of the nuclear power plant is generally stored in tables in the target document, but the data in these tables is unclassified, unstructured data. Therefore, after the target document is acquired from the nuclear power plant equipment data cache region according to the file path and the keyword, a key page number containing the keyword is acquired from the target document, and whether the key page number includes a preset table is detected, so that the equipment data in the preset table can be acquired.
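Step S30 can be illustrated with a simplified page model. The `Page` class, the rule that any non-empty table counts as a preset table, and the sample content are assumptions made for this sketch; the source does not specify how pages or preset tables are represented.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page model: plain text plus any tables (each a list of rows)."""
    text: str
    tables: list = field(default_factory=list)


def key_page_numbers(pages, keyword):
    """Return the 1-based numbers of the pages whose text contains the keyword."""
    return [number for number, page in enumerate(pages, start=1) if keyword in page.text]


def has_preset_table(page):
    """Assumption: any table with at least one row counts as a preset table."""
    return any(len(table) > 0 for table in page.tables)


pages = [
    Page("Introduction"),
    Page("pump data", tables=[[["Unit", "Equipment ID"], ["D1", "1-RCP-001-PO"]]]),
    Page("pump notes without a table"),
    Page("Appendix"),
]

key_pages = key_page_numbers(pages, "pump")
table_flags = {number: has_preset_table(pages[number - 1]) for number in key_pages}
```

Only the pages listed in `key_pages` are inspected for tables, which reflects the point above that pages without the keyword need not be checked.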
S40: when the key page number contains a preset table, acquiring first document data in the preset table; the first document data is unstructured data.
It can be understood that the first document data is stored in the preset table of the key page of the target document in an irregular or incomplete data format, which is inconvenient for representing the data by using the two-dimensional logic table of the database, and therefore, the first document data cannot be directly stored into the database, thereby representing that the first document data is unstructured data.
Specifically, the first row of the preset table is generally a header row whose content has no significance for data storage. Therefore, after a key page number containing the keyword is acquired from the target document and is found to contain the preset table, the content of the first row of the preset table is automatically removed. After the first row is removed, all column data from the second row onward is automatically acquired by loop traversal. After the traversal reaches the last row, the process automatically jumps to step S30 to detect whether the next key page number includes a preset table.
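The row traversal of step S40 can be sketched as below, assuming the table has already been read into a list of rows, each row being a list of cell strings. Skipping blank cells is an added assumption, and the sample table is invented for illustration.

```python
def extract_first_document_data(table):
    """Drop the header row and return the cells of all remaining rows, in order."""
    data = []
    for row in table[1:]:            # table[0] is the header row and is skipped
        for cell in row:
            value = cell.strip()
            if value:                # assumption: blank cells carry no data
                data.append(value)
    return data


sample_table = [
    ["Unit", "Equipment ID"],        # header row
    ["D1", "1-RCP-001-PO"],
    ["D2", "  "],
]

first_document_data = extract_first_document_data(sample_table)
```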
S50: Converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
The data stored in the nuclear power plant document database is all structured data. Therefore, the first document data, which is unstructured, needs to be converted into structured data before it is stored in the nuclear power plant document database.
In an embodiment, before step S50, that is, before converting the first document data into the structured data and storing the structured data in the nuclear power plant document database, the method further includes:
(1) Detecting whether the document data contains preset horizontal line characters.
The preset horizontal line characters may be horizontal lines that connect parts of the document data or that appear within it. They do not affect the specific meaning of the document data, so they can be removed.
(2) When the document data contains the preset horizontal line characters, removing the preset horizontal line characters, and detecting the identification bit length of each document data after the removal.
The identification bit length refers to the number of identification bits of each document data after the horizontal line characters are removed.
(3) When the identification bit length of the document data is equal to a first preset identification bit length, recording the document data as power station document data.
Preferably, the first preset identification bit length is two bits. It can be understood that when the identification bit length of the document data is equal to the first preset identification bit length, the document data represents the position information of the nuclear power plant equipment, and the document data is therefore recorded as power station document data.
(4) When the identification bit length of the document data is greater than or equal to a second preset identification bit length, recording the document data as device document data.
Preferably, the second preset identification bit length is nine bits. It can be understood that when the identification bit length of the document data is greater than or equal to the second preset identification bit length, the document data represents specific device information of the nuclear power plant equipment, and the document data is therefore recorded as device document data.
(5) When the identification bit length of the document data is smaller than the first preset identification bit length, or is greater than the first preset identification bit length and smaller than the second preset identification bit length, recording the document data as data to be verified, and sending the data to be verified to a preset receiver.
It can be understood that when the identification bit length of the document data is less than a first preset identification bit length, or is greater than the first preset identification bit length and less than a second preset identification bit length, there is no way to directly determine the document data as the power station document data or the device document data, the document data needs to be recorded as the data to be verified, and the data to be verified needs to be sent to a preset receiver to instruct the preset receiver to manually verify the data to be verified, and after a specific classification (such as the power station document data or the device document data) of the data to be verified is determined, the data to be verified and the classification corresponding to the data to be verified can be fed back to the server, so as to perform classified storage on the data. Illustratively, when the target document records data, the document data which belongs to the power station document data is recorded with one less identification bit due to omission during recording, so that the identification bit length of the document data is smaller than the first preset identification bit length.
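Steps (1) to (5) amount to a length-based classification, sketched below. The thresholds of two and nine are the preferred values given in the text. The set of characters treated as horizontal lines, the choice to count every remaining character as one identification bit, the category labels, and the sample identifiers are assumptions of this sketch.

```python
STATION_ID_LENGTH = 2   # first preset identification bit length (preferred value)
DEVICE_ID_LENGTH = 9    # second preset identification bit length (preferred value)
# Assumption: hyphen, en dash and em dash are the "horizontal line characters".
HORIZONTAL_LINE_CHARS = "-\u2013\u2014"


def classify(document_data):
    """Strip horizontal line characters, then classify by the remaining length."""
    cleaned = "".join(ch for ch in document_data if ch not in HORIZONTAL_LINE_CHARS)
    length = len(cleaned)
    if length == STATION_ID_LENGTH:
        category = "station"      # power station document data
    elif length >= DEVICE_ID_LENGTH:
        category = "device"       # device document data
    else:
        category = "to_verify"    # sent to a preset receiver for manual checking
    return cleaned, category


results = [classify(item) for item in ["D-1", "1-RCP-001-PO", "D", "RCP-001"]]
```

Items in the `to_verify` category cover both gaps in the rule: lengths below the first threshold and lengths strictly between the two thresholds.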
In one embodiment, as shown in fig. 4, in step S50, associating a target document with a first document data, that is, converting the first document data into structured data, and storing the first document data in a nuclear power plant document database includes:
S501: Determining a power station document label corresponding to the power station document data according to the key page number corresponding to the power station document data and the target document corresponding to the key page number.
S502: Storing the power station document data and the power station document label in association in a power station field in the nuclear power station document database, so that the power station document data is converted into structured data.
The power station field refers to a category of data in the nuclear power station document database; only power station document data is stored under the power station field.
It can be understood that, when the identification bit length of the document data is equal to a first preset identification bit length, after the document data is recorded as the power station document data, in order to convert the unstructured data of the power station document data into structured data, that is, data that can be expressed by two-dimensional logic of a database, the power station document tag corresponding to the power station document data is determined according to a key page corresponding to the power station document data and a target document corresponding to the key page; and then, each piece of power station document data has an associated power station document tag, so that after the power station document data is stored in a nuclear power station document database, when the power station document data needs to be inquired, other data information associated with the power station document data can be obtained by analyzing the associated power station document tag, and the power station document data is represented to be converted from unstructured data into structured data.
S503: Determining the device document tag corresponding to the device document data according to the key page number corresponding to the device document data and the target document corresponding to the key page number.
S504: Storing the equipment document data and the equipment document tag in association in an equipment field in the nuclear power plant document database, so that the equipment document data is converted into structured data.
It can be understood that, when the identification bit length of the document data is greater than or equal to the second preset identification bit length, the document data is recorded as device document data. In order to convert the device document data from unstructured data to structured data, that is, to enable the device document data to be represented by the two-dimensional logic of a database, a device document tag corresponding to the device document data is determined according to the key page number corresponding to the device document data and the target document corresponding to the key page number. Each piece of device document data thus has an associated device document tag, so that after the device document data is stored in the nuclear power plant document database, other data information associated with the device document data can be obtained by analyzing the associated device document tag whenever the device document data needs to be queried. This represents that the device document data has been converted from unstructured data into structured data.
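Steps S501 to S504 can be illustrated with an in-memory SQLite database. This sketch assumes that each "field" is modeled as its own table and that a tag is a string combining the target document name and the key page number; the table names, the tag format, and the sample values are hypothetical.

```python
import sqlite3


def make_tag(target_document, key_page):
    """Hypothetical tag format recording the source document and key page."""
    return f"{target_document}#page={key_page}"


def store(conn, category, document_data, target_document, key_page):
    """Store one classified item together with its tag in the matching table."""
    table = {"station": "station_field", "device": "device_field"}[category]
    conn.execute(
        f"INSERT INTO {table} (document_data, document_tag) VALUES (?, ?)",
        (document_data, make_tag(target_document, key_page)),
    )


conn = sqlite3.connect(":memory:")
for name in ("station_field", "device_field"):
    conn.execute(f"CREATE TABLE {name} (document_data TEXT, document_tag TEXT)")

store(conn, "station", "D1", "pump_maintenance.docx", 2)
store(conn, "device", "1RCP001PO", "pump_maintenance.docx", 2)
conn.commit()

station_rows = conn.execute(
    "SELECT document_data, document_tag FROM station_field"
).fetchall()
device_rows = conn.execute(
    "SELECT document_data, document_tag FROM device_field"
).fetchall()
```

Storing the tag next to each value is what lets a later query trace a value back to its document and page.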
In this embodiment, after the equipment data acquisition instruction is received, the document data associated with the file path and the keyword is automatically detected and acquired, and the original unstructured document data is converted into structured document data and then stored in the nuclear power plant document database. When the data, or the documents associated with the data, later need to be looked up, the original path of the data can be found quickly. This saves labor, improves the efficiency of document classification and storage, and can reduce the omissions or errors that manual extraction and sorting may cause.
In an embodiment, after the step S30, that is, after detecting whether the key page number includes the preset table, the method further includes:
S60: When the key page number does not contain a preset table, detecting whether the next page number adjacent to the key page number contains the preset table.
S70: When the next page number adjacent to the key page number contains the preset table, acquiring second document data in the preset table contained in the next page number, and storing the second document data in a nuclear power station document database.
It can be understood that, since the display content of one page number in the document is limited, the keyword may be contained in the current key page number but appears only at the end of the current key page, and the corresponding preset table cannot be displayed in the current key page number, so that, when the preset table is not contained in the key page number, it is detected whether the preset table is contained in the next page number adjacent to the key page number.
Further, when the next page number adjacent to the key page number contains the preset table, second document data in the preset table contained in the next page number is acquired, and the second document data is stored in a nuclear power plant document database.
In an embodiment, after the step S60, that is, after detecting whether the next page number adjacent to the key page number includes the preset table, the method further includes:
S80: When the next page number adjacent to the key page number does not contain the preset table, prompting that the key page number does not contain the preset table, and detecting whether the next key page number contains the preset table.
S90: When the next key page number contains a preset table, acquiring third document data in the preset table, and storing the third document data in a nuclear power station document database.
It can be understood that, when the key page number does not include the preset table, after detecting whether the next page number adjacent to the key page number includes the preset table, if the next page number adjacent to the key page number does not include the preset table, it represents that there is no data to be acquired in the key page number and the adjacent next page number, and then skips over the key page number, detects whether the next key page number includes the preset table, when the next key page number includes the preset table, acquires the third document data in the preset table, stores the third document data in the nuclear power station document database, and continues to detect whether the next key page number includes the preset table, until all the key page numbers are detected, the detection is stopped.
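The fallback order of steps S60 to S90 (key page, then the adjacent next page, then the next key page) can be sketched as a loop. The input format, a dict from page number to the list of tables on that page, and the sample data are assumptions of this sketch.

```python
def collect_tables(pages, key_pages):
    """Look for a table on each key page, falling back to the adjacent next page.

    `pages` is a hypothetical mapping: page number -> list of tables on that page.
    Returns (found, skipped), where `found` holds tuples of
    (key page, page where the table was found, table).
    """
    found = []
    skipped = []   # key pages with no table on the page itself or on the next page
    for key_page in key_pages:
        for candidate in (key_page, key_page + 1):
            tables = pages.get(candidate, [])
            if tables:
                found.extend((key_page, candidate, table) for table in tables)
                break
        else:
            # Neither page holds a table: note it and move on to the next key page.
            skipped.append(key_page)
    return found, skipped


table_a = [["Unit"], ["D1"]]
table_b = [["Equipment ID"], ["1-RCP-001-PO"]]
pages = {1: [], 2: [table_a], 3: [], 4: [], 5: [table_b]}

found, skipped = collect_tables(pages, key_pages=[1, 3, 5])
```

If two key pages are adjacent, this simple loop may read the same table twice; a fuller implementation might track which pages have already been read.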
In an embodiment, after step S50, that is, after converting the first document data into structured data and storing the structured data in the nuclear power plant document database, the method further includes:
(1) After receiving an equipment data acquisition instruction containing a target field, analyzing the target field to obtain a target identification vector corresponding to the target field.
The equipment data acquisition instruction can be an instruction sent by a related worker of the nuclear power station, and can also be an instruction generated by triggering after the related worker enters a target field in a server. The target field is a field for which document data related thereto is to be acquired.
It can be understood that when a person related to the nuclear power plant needs to acquire a document or associated data related to a certain data, the person may send an equipment data acquisition instruction including a target field to the server, and after receiving the equipment data acquisition instruction including the target field, the server analyzes the target field to obtain a target identification vector corresponding to the target field, and then queries the document related to the target field in the nuclear power plant document database according to the target identification vector.
(2) Detecting the identification bit length of the target identification vector.
(3) When the identification bit length is equal to the first preset identification bit length, acquiring, from a power station field of the nuclear power station document database, power station document data matched with the target identification vector and a power station document label associated with the power station document data.
Wherein the identification bit length refers to the number of characters of the target identification vector.
As indicated in the above embodiment, when the identification bit length of document data is equal to the first preset identification bit length, the document data is recorded as power station document data, and the power station document data and its corresponding power station document tag are stored in association in the power station field of the nuclear power station document database. Therefore, when the identification bit length of the target identification vector is equal to the first preset identification bit length, the power station document data matching the target identification vector can be obtained from the power station field of the nuclear power station document database.
Likewise, the above embodiment indicates that when the identification bit length of document data is greater than or equal to a second preset identification bit length, the document data is recorded as device document data, and the device document data and the device document tag are stored in association in the device field of the nuclear power station document database. Therefore, when the identification bit length of the target identification vector is greater than or equal to the second preset identification bit length, the device document data matching the target identification vector is acquired from the device field of the nuclear power station document database.
(4) Sending the power station document data and the power station document label associated with the power station document data to a preset receiving party.
Specifically, after the power station document data matching the target identification vector and the associated power station document tag are acquired from the power station field of the nuclear power station document database, both are sent to a preset receiving party. The preset receiving party can parse the power station document tag to obtain the target document and the key page number associated with the power station document data, and can then look up that key page number of the target document in the nuclear power station equipment data cache region to obtain the data related to the target field (namely, the power station document data). This improves the convenience of data acquisition.
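The query steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the two preset lengths, the `Record` type, the dictionary stand-ins for the power station field and the device field, and the parsing rule in `to_identification_vector` are all assumptions introduced for illustration, since the text does not state them.

```python
from dataclasses import dataclass

# Hypothetical values: the source does not state the preset lengths.
FIRST_PRESET_LEN = 4
SECOND_PRESET_LEN = 9


@dataclass(frozen=True)
class Record:
    data: str  # document data, e.g. an identifier taken from a table
    tag: str   # document tag (target document plus key page number)


def to_identification_vector(target_field: str) -> str:
    """Assumed parsing rule: trim whitespace and drop '-' characters."""
    return target_field.strip().replace("-", "")


def query(target_field, station_field, device_field):
    """Return the records matching the target field.

    station_field and device_field map an identification vector to a list
    of Record objects; they stand in for the two database fields.
    """
    vector = to_identification_vector(target_field)
    length = len(vector)  # identification bit length = character count
    if length == FIRST_PRESET_LEN:
        return station_field.get(vector, [])
    if length >= SECOND_PRESET_LEN:
        return device_field.get(vector, [])
    return []  # the text defines no lookup for other lengths
```

The matching records carry their tags, so a caller acting as the preset receiving party could use the tag to locate the target document and key page.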
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of each process should be determined by its function and inherent logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.
In an embodiment, a nuclear power plant document data acquisition apparatus is provided, which corresponds one-to-one to the nuclear power plant document data acquisition method in the above-described embodiment. As shown in Fig. 5, the nuclear power plant document data acquisition apparatus includes a data acquisition instruction receiving module 10, a target document acquiring module 20, a key page number acquiring module 30, a first document data acquiring module 40, and a data storage module 50. The functional modules are explained in detail as follows:
a data acquisition instruction receiving module 10, configured to receive an equipment data acquisition instruction including a file path and a keyword;
a target document acquiring module 20, configured to acquire a target document from the nuclear power station equipment data cache region according to the file path and the keyword;
a key page number acquiring module 30, configured to acquire a key page number including the keyword from the target document, and detect whether the key page number includes a preset table;
a first document data acquiring module 40, configured to acquire first document data in a preset table when the key page number includes the preset table; the first document data is unstructured data;
and the data storage module 50 is used for converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
Preferably, as shown in Fig. 6, the target document acquiring module 20 includes the following units:
a document obtaining unit 201, configured to obtain all documents under the file path from the nuclear power station equipment data cache region;
a keyword detection unit 202, configured to detect whether each of the documents includes the keyword;
a target document recording unit 203, configured to record each document including the keyword as the target document.
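The three units above can be sketched in Python as a single function. This is only a sketch under stated assumptions: it treats the data cache region as a local directory (`cache_root`) and the documents as plain UTF-8 text files with a `.txt` suffix, neither of which is specified in the text. Real plant documents in other formats would first need a text-extraction step.

```python
from pathlib import Path
from typing import List


def find_target_documents(cache_root: Path, file_path: str, keyword: str) -> List[Path]:
    """Return every document under cache_root/file_path that contains keyword."""
    targets = []
    # Acquire all documents under the file path (assumed: *.txt files).
    for doc in sorted((cache_root / file_path).rglob("*.txt")):
        text = doc.read_text(encoding="utf-8", errors="ignore")
        # Record the document as a target document if it contains the keyword.
        if keyword in text:
            targets.append(doc)
    return targets
```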
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the first preset table detection module is used for detecting whether a next page number adjacent to the key page number contains a preset table or not when the key page number does not contain the preset table;
and the second document data acquisition module is used for acquiring second document data in the preset table contained in the next page number when the preset table is contained in the next page number adjacent to the key page number, and storing the second document data in a nuclear power station document database.
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the second preset table detection module is used for prompting that the key page number does not contain the preset table and detecting whether the next key page number contains the preset table or not when the next page number adjacent to the key page number does not contain the preset table;
and the third document data acquisition module is used for acquiring third document data in a preset table when the next key page number comprises the preset table, and storing the third document data in a nuclear power station document database.
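The fallback order described by these modules (key page, then the adjacent next page, then the next key page) can be sketched as follows. The `Page` type and the choice to stop at the first table found are assumptions made for illustration. The text does not say how pages or preset tables are represented, and the prompt is modelled here as a simple printed message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Page:
    number: int
    text: str
    table: Optional[List[List[str]]] = None  # rows of the preset table, if any


def extract_table(pages: List[Page], keyword: str) -> Optional[Tuple[int, List[List[str]]]]:
    """Return (page number, table rows) for the first preset table found, else None."""
    by_number = {p.number: p for p in pages}
    key_pages = [p for p in pages if keyword in p.text]
    for page in key_pages:
        # Case 1: the key page itself contains the preset table.
        if page.table is not None:
            return page.number, page.table
        # Case 2: the page adjacent to the key page contains the preset table.
        nxt = by_number.get(page.number + 1)
        if nxt is not None and nxt.table is not None:
            return nxt.number, nxt.table
        # Case 3: prompt, then continue with the next key page.
        print(f"page {page.number}: no preset table on this page or the next")
    return None
```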
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the horizontal line character detection module is used for detecting whether the document data contains preset horizontal line characters;
a horizontal line character removing module, configured to remove the preset horizontal line character when the document data includes the preset horizontal line character, and detect an identification bit length of each document data after the preset horizontal line character is removed;
the power station document data recording module is used for recording the document data as power station document data when the identification bit length of the document data is equal to a first preset identification bit length;
the device document data recording module is used for recording the document data as device document data when the identification bit length of the document data is greater than or equal to a second preset identification bit length;
and the data sending module is used for recording the document data as the data to be verified and sending the data to be verified to a preset receiver when the identification bit length of the document data is smaller than the first preset identification bit length or is larger than the first preset identification bit length and smaller than the second preset identification bit length.
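The classification rule implemented by these modules can be written as a short Python sketch. The concrete preset lengths and the set of horizontal line characters are hypothetical placeholders, because the text does not give their values. Only the three-way comparison itself comes from the description.

```python
from typing import Tuple

# Hypothetical placeholders: the source does not state these values.
HORIZONTAL_LINE_CHARS = "-"
FIRST_PRESET_LEN = 4
SECOND_PRESET_LEN = 9


def classify(document_data: str) -> Tuple[str, str]:
    """Strip horizontal line characters, then classify by identification bit length."""
    cleaned = "".join(ch for ch in document_data if ch not in HORIZONTAL_LINE_CHARS)
    length = len(cleaned)  # identification bit length = number of characters
    if length == FIRST_PRESET_LEN:
        return cleaned, "station"
    if length >= SECOND_PRESET_LEN:
        return cleaned, "device"
    # Shorter than the first length, or strictly between the two lengths:
    # data to be verified, to be sent to the preset receiver.
    return cleaned, "to_verify"
```

With these placeholder lengths, `classify("AB-CD")` yields the label `"station"` and `classify("ABCDEF")` yields `"to_verify"`.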
Preferably, as shown in Fig. 7, the data storage module 50 includes:
a power station document tag recording unit 501, configured to determine a power station document tag corresponding to the power station document data according to a key page number corresponding to the power station document data and a target document corresponding to the key page number;
a power station document data storage unit 502, configured to store the power station document data and the power station document tag in a power station field in the nuclear power station document database in an associated manner, so that the power station document data is converted into structured data;
a device document tag recording unit 503 configured to determine a device document tag corresponding to the device document data, based on a key page number corresponding to the device document data and a target document corresponding to the key page number;
a device document data storage unit 504, configured to store the device document data and the device document tag in association with a device field in the nuclear power plant document database, so that the device document data is converted into structured data.
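One way to picture the associated storage is a single relational table in which each row links the document data to its tag. The sketch below uses Python's standard `sqlite3` module. The table name, column names, and the modelling of the document tag as a (target document, key page number) pair are assumptions for illustration, not the schema used in the patent.

```python
import sqlite3


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) table holding station and device document data."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document_data ("
        " field TEXT NOT NULL CHECK (field IN ('station', 'device')),"
        " data TEXT NOT NULL,"
        " target_document TEXT NOT NULL,"
        " key_page INTEGER NOT NULL)"
    )
    return conn


def store(conn, field, data, target_document, key_page):
    """Store document data together with its tag (target document, key page)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO document_data VALUES (?, ?, ?, ?)",
            (field, data, target_document, key_page),
        )


def lookup(conn, field, data):
    """Return the tags stored for the given document data in the given field."""
    cur = conn.execute(
        "SELECT target_document, key_page FROM document_data"
        " WHERE field = ? AND data = ?",
        (field, data),
    )
    return cur.fetchall()
```

Keeping the tag in the same row as the data means a later query can return the target document and key page without rescanning the source documents.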
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the target field analyzing module is used for analyzing the target field after receiving an equipment data acquiring instruction containing the target field to obtain a target identification vector corresponding to the target field;
the identification bit length detection module is used for detecting the identification bit length of the target identification vector;
the power station document data acquisition module is used for acquiring power station document data matched with the target identification vector and a power station document label associated with the power station document data from a power station field of the nuclear power station document database when the identification bit length is equal to the first preset identification bit length;
and the data sending module is used for sending the power station document data and the power station document label related to the power station document data to a preset receiving party.
For specific limitations of the nuclear power plant document data acquisition device, reference may be made to the above limitations of the nuclear power plant document data acquisition method, and details are not repeated here. All or part of each module in the nuclear power plant document data acquisition device can be realized by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data used in the nuclear power plant document data acquisition method in the above embodiments. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a nuclear power plant document data acquisition method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and when the processor executes the computer program, the nuclear power plant document data acquisition method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the nuclear power plant document data acquisition method in the above-described embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Claims (12)
1. A nuclear power plant document data acquisition method is characterized by comprising the following steps:
receiving an equipment data acquisition instruction containing a file path and a keyword;
acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table or not;
when the key page number contains a preset table, acquiring first document data in the preset table; the first document data is unstructured data;
and converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
2. The nuclear power plant document data acquisition method according to claim 1, wherein the acquiring a target document from a nuclear power plant device data cache area according to the file path and the keyword includes:
acquiring all documents under the file path from the nuclear power station equipment data cache region;
detecting whether each of the documents contains the keyword;
recording each document containing the keyword as the target document.
3. The nuclear power plant document data acquisition method according to claim 1, wherein the detecting whether the key page number includes a preset table includes:
when the key page number does not contain a preset table, detecting whether a next page number adjacent to the key page number contains the preset table;
and when the next page number adjacent to the key page number contains the preset table, acquiring second document data in the preset table contained in the next page number, and storing the second document data in a nuclear power station document database.
4. The nuclear power plant document data acquisition method according to claim 3, wherein after detecting whether a preset table is included in a next page number adjacent to the key page number, the method further includes:
when the next page number adjacent to the key page number does not contain a preset table, prompting that the key page number does not contain the preset table, and detecting whether the next key page number contains the preset table or not;
and when the next key page number contains a preset table, acquiring third document data in the preset table, and storing the third document data in a nuclear power station document database.
5. The nuclear power plant document data acquisition method according to claim 1, wherein before converting the first document data into structured data and storing the structured data in a nuclear power plant document database, the method further includes:
detecting whether the document data contains preset horizontal line characters or not;
when the document data contains the preset horizontal line characters, eliminating the preset horizontal line characters, and detecting the length of the identification bit of each document data after the preset horizontal line characters are eliminated;
when the identification bit length of the document data is equal to a first preset identification bit length, recording the document data as power station document data;
when the identification bit length of the document data is larger than or equal to a second preset identification bit length, recording the document data as device document data;
and when the identification bit length of the document data is smaller than the first preset identification bit length or is larger than the first preset identification bit length and smaller than the second preset identification bit length, recording the document data as data to be verified, and sending the data to be verified to a preset receiver.
6. The nuclear power plant document data acquisition method according to claim 5, wherein one of the target documents is associated with one document tag; the converting the first document data into structured data and storing the structured data in a nuclear power plant document database comprises:
determining a power station document label corresponding to the power station document data according to a key page number corresponding to the power station document data and a target document corresponding to the key page number;
storing the power station document data and the corresponding power station document tags into power station fields in the nuclear power station document database in an associated manner so as to convert the power station document data into structured data;
determining a device document tag corresponding to the device document data according to a key page number corresponding to the device document data and a target document corresponding to the key page number;
and storing the device document data and the device document tag into a device field in the nuclear power plant document database in an associated manner so as to convert the device document data into structured data.
7. The nuclear power plant document data acquisition method according to claim 6, wherein after converting the first document data into structured data and storing the structured data in a nuclear power plant document database, the method further comprises:
after receiving an equipment data acquisition instruction containing a target field, analyzing the target field to obtain a target identification vector corresponding to the target field;
detecting the length of the identification bit of the target identification vector;
when the identification bit length is equal to the first preset identification bit length, acquiring power station document data matched with the target identification vector and a power station document label associated with the power station document data from a power station field of the nuclear power station document database;
and sending the power station document data and the power station document label associated with the power station document data to a preset receiving party.
8. A nuclear power plant document data acquisition apparatus, characterized by comprising:
the data acquisition instruction receiving module is used for receiving an equipment data acquisition instruction containing a file path and a keyword;
the target document acquisition module is used for acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
a key page number obtaining module, configured to obtain a key page number including the keyword from the target document, and detect whether the key page number includes a preset table;
the first document data acquisition module is used for acquiring first document data in a preset table when the key page number comprises the preset table; the first document data is unstructured data;
and the data storage module is used for converting the first document data into structured data and storing the structured data into a nuclear power plant document database.
9. The nuclear power plant document data acquisition apparatus according to claim 8, wherein the target document acquisition module includes:
the document acquisition unit is used for acquiring all documents under the file path from the nuclear power station equipment data cache region;
a keyword detection unit, configured to detect whether each of the documents includes the keyword;
and the target document recording unit is used for recording each document containing the keyword as the target document.
10. The nuclear power plant document data acquiring apparatus according to claim 8, further comprising:
the first table detection module is used for detecting whether a next page number adjacent to the key page number contains a preset table or not when the key page number does not contain the preset table;
and the second document data acquisition module is used for acquiring second document data in the preset table contained in the next page number when the preset table is contained in the next page number adjacent to the key page number, and storing the second document data in a nuclear power station document database.
11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the nuclear power plant document data acquisition method according to any one of claims 1 to 7 when executing the computer program.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the nuclear power plant document data acquisition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011308653.8A CN112463791A (en) | 2020-11-20 | 2020-11-20 | Nuclear power station document data acquisition method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011308653.8A CN112463791A (en) | 2020-11-20 | 2020-11-20 | Nuclear power station document data acquisition method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112463791A true CN112463791A (en) | 2021-03-09 |
Family
ID=74837121
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011308653.8A Pending CN112463791A (en) | 2020-11-20 | 2020-11-20 | Nuclear power station document data acquisition method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112463791A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113377950A (en) * | 2021-06-02 | 2021-09-10 | 浪潮软件股份有限公司 | Method for realizing flat storage and real-time preview of unstructured document |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111506498B (en) | Automatic generation method and device of test case, computer equipment and storage medium | |
CN110795919B (en) | Form extraction method, device, equipment and medium in PDF document | |
CN108932294B (en) | Resume data processing method, device, equipment and storage medium based on index | |
CN110209652B (en) | Data table migration method, device, computer equipment and storage medium | |
CN109508352B (en) | Report data output method, device, equipment and storage medium | |
CN110737818B (en) | Network release data processing method, device, computer equipment and storage medium | |
CN111176996A (en) | Test case generation method and device, computer equipment and storage medium | |
CN111191079B (en) | Document content acquisition method, device, equipment and storage medium | |
CN103455475B (en) | Composition method, equipment and system | |
CN110866491A (en) | Target retrieval method, device, computer readable storage medium and computer equipment | |
CN109325118B (en) | Unbalanced sample data preprocessing method and device and computer equipment | |
CN111400361B (en) | Data real-time storage method, device, computer equipment and storage medium | |
CN112286934A (en) | Database table importing method, device, equipment and medium | |
CN110990390A (en) | Data cooperative processing method and device, computer equipment and storage medium | |
US10664340B2 (en) | Failure analysis program, failure analysis device, and failure analysis method | |
CN108763396B (en) | Access request processing method, device, computer equipment and storage medium | |
CN112559526A (en) | Data table export method and device, computer equipment and storage medium | |
CN110362478B (en) | Application upgrade test method and device, computer equipment and storage medium | |
CN109656474B (en) | Data storage method and device, computer equipment and storage medium | |
CN112463791A (en) | Nuclear power station document data acquisition method and device, computer equipment and storage medium | |
CN108460116B (en) | Search method, search device, computer equipment, storage medium and search system | |
CN111125748A (en) | Judgment method and device for unauthorized query, computer equipment and storage medium | |
CN112528832A (en) | Method and system for processing PDF-format relay protection fixed value list | |
CN111460268A (en) | Method and device for determining database query request and computer equipment | |
CN109918114A (en) | Code comment information acquisition method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||