CN112463791A - Nuclear power station document data acquisition method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN112463791A
Authority
CN
China
Prior art keywords
document
data
document data
nuclear power
page number
Legal status
Pending
Application number
CN202011308653.8A
Other languages
Chinese (zh)
Inventor
刘文可
Current Assignee
China General Nuclear Power Corp
CGN Power Co Ltd
Daya Bay Nuclear Power Operations and Management Co Ltd
Lingdong Nuclear Power Co Ltd
Guangdong Nuclear Power Joint Venture Co Ltd
Lingao Nuclear Power Co Ltd
Original Assignee
China General Nuclear Power Corp
CGN Power Co Ltd
Daya Bay Nuclear Power Operations and Management Co Ltd
Lingdong Nuclear Power Co Ltd
Guangdong Nuclear Power Joint Venture Co Ltd
Lingao Nuclear Power Co Ltd
Application filed by China General Nuclear Power Corp, CGN Power Co Ltd, Daya Bay Nuclear Power Operations and Management Co Ltd, Lingdong Nuclear Power Co Ltd, Guangdong Nuclear Power Joint Venture Co Ltd, Lingao Nuclear Power Co Ltd
Priority to CN202011308653.8A
Publication of CN112463791A
Legal status: Pending

Classifications

    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/245 Query processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of nuclear power station informatization, in particular to a method and device for acquiring nuclear power station document data, a computer device and a storage medium. The method comprises: receiving an equipment data acquisition instruction containing a file path and a keyword; acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword; acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table; when it does, acquiring first document data, which is unstructured data, from the preset table; and converting the first document data into structured data and storing it in a nuclear power plant document database. The invention improves the efficiency of classifying and storing documents and reduces the omissions or errors that manual extraction and sorting can introduce.

Description

Nuclear power station document data acquisition method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of nuclear power plant informatization, in particular to a nuclear power plant document data acquisition method and device, a computer device and a storage medium.
Background
With the development of nuclear power plant technology, more and more equipment data, such as maintenance data and operation data, needs to be recorded for each piece of nuclear power plant equipment.
At present, the equipment data of nuclear power plant equipment is stored in working documents as unstructured text. To use this data, the working document must first be read with a file-reading program, and the data that has been read and identified must then be extracted and sorted manually. This process consumes a great deal of labor and time, and makes subsequent use of equipment data recorded in working documents very inconvenient.
Disclosure of Invention
Embodiments of the invention provide a nuclear power station document data acquisition method and device, a computer device and a storage medium, aiming to solve the problem that equipment data is inconvenient to use subsequently because it has to be extracted and sorted manually.
A nuclear power plant document data acquisition method includes:
receiving an equipment data acquisition instruction containing a file path and a keyword;
acquiring a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
acquiring a key page number containing the keyword from the target document, and detecting whether the key page number contains a preset table;
when the key page number contains the preset table, acquiring first document data in the preset table, the first document data being unstructured data; and
converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
A nuclear power plant document data acquisition apparatus comprises:
a data acquisition instruction receiving module, configured to receive an equipment data acquisition instruction containing a file path and a keyword;
a target document acquisition module, configured to acquire a target document from a nuclear power station equipment data cache region according to the file path and the keyword;
a key page number obtaining module, configured to obtain a key page number containing the keyword from the target document, and detect whether the key page number contains a preset table;
a first document data acquisition module, configured to acquire first document data in the preset table when the key page number contains the preset table, the first document data being unstructured data; and
a data storage module, configured to convert the first document data into structured data and store the structured data in a nuclear power plant document database.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the nuclear power plant document data acquisition method described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described nuclear power plant document data acquisition method.
In the nuclear power station document data acquisition method and device, computer device and storage medium, the method receives an equipment data acquisition instruction containing a file path and a keyword; acquires a target document from a nuclear power station equipment data cache region according to the file path and the keyword; acquires a key page number containing the keyword from the target document and detects whether the key page number contains a preset table; when it does, acquires first document data, which is unstructured data, from the preset table; and converts the first document data into structured data and stores it in a nuclear power plant document database.
According to the invention, after the equipment data acquisition instruction is received, the document data associated with the file path and the keyword is detected and acquired automatically, and the original unstructured document data is converted into structured document data before being stored in the nuclear power station document database. When the data, or the documents associated with it, need to be looked up later, the original path of the data can be found quickly. This saves labor, improves the efficiency of classifying and storing documents, and reduces the omissions or errors that manual extraction and sorting can introduce.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a diagram illustrating an application environment of a nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S20 in the nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S50 in the nuclear power plant document data acquisition method according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of a nuclear power plant document data acquisition device according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a target document acquiring module in a nuclear power plant document data acquiring apparatus according to an embodiment of the present invention;
FIG. 7 is a functional block diagram of a data storage module in the nuclear power plant document data acquisition device according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art derives from these embodiments without creative effort fall within the protection scope of the present invention.
The nuclear power plant document data acquisition method provided by the embodiments of the invention can be applied in the application environment shown in fig. 1. Specifically, the method is applied to a nuclear power plant document data acquisition system, which comprises the client and the server shown in fig. 1; the client and the server communicate over a network. The system addresses the problem that equipment data is inconvenient to use subsequently because it is extracted and sorted manually. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, personal computers, laptops, smartphones, tablets and portable wearable devices. The server may be implemented as a stand-alone server or as a cluster of several servers.
In an embodiment, as shown in fig. 2, a nuclear power plant document data acquisition method is provided. Taking its application to the server in fig. 1 as an example, the method includes the following steps:
S10: Receive an equipment data acquisition instruction containing a file path and a keyword.
The equipment data acquisition instruction may be sent by relevant staff through a mobile terminal or a cloud server, or generated automatically when relevant staff enter a file path and a keyword in an application that implements the method. The file path is the address where the target document to be acquired is stored. The keyword is a word that occurs in the target document to be acquired; together, the specified file path and keyword identify the corresponding target document.
Further, when several equipment data acquisition instructions containing file paths and keywords are received, the later instructions can be stored in an instruction cache region in the order in which they were received, and the instructions can then be executed cyclically in batches.
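A minimal Python sketch of such an instruction cache follows, assuming instructions are buffered first-in, first-out and drained in fixed-size batches. The class names, the `batch_size` parameter and the handler callback are hypothetical; the source does not say how batches are sized or executed.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AcquisitionInstruction:
    """Hypothetical equipment data acquisition instruction."""
    file_path: str
    keyword: str


class InstructionCache:
    """FIFO buffer: instructions leave in the order they arrived."""

    def __init__(self) -> None:
        self._queue: deque[AcquisitionInstruction] = deque()

    def put(self, instruction: AcquisitionInstruction) -> None:
        self._queue.append(instruction)

    def drain(self, batch_size: int) -> list[AcquisitionInstruction]:
        """Remove and return up to batch_size instructions, oldest first."""
        batch = []
        while self._queue and len(batch) < batch_size:
            batch.append(self._queue.popleft())
        return batch


def run_in_batches(
    cache: InstructionCache,
    handler: Callable[[AcquisitionInstruction], None],
    batch_size: int = 2,
) -> int:
    """Execute cached instructions batch by batch; return the batch count."""
    batches = 0
    while batch := cache.drain(batch_size):
        for instruction in batch:
            handler(instruction)
        batches += 1
    return batches
```

If instructions can arrive from several threads at once, the standard library's thread-safe `queue.Queue` would be a natural replacement for the plain `deque`.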
S20: Acquire a target document from the nuclear power station equipment data cache region according to the file path and the keyword.
The target document is a document under the file path in the nuclear power plant equipment data cache region that contains the keyword. Optionally, in this embodiment all target documents are stored in the cache region as Word documents, and there may be one or several target documents containing the keyword under the file path. The nuclear power plant equipment data cache region is only an example; the target document may also be stored on another personal computer or similar device. The cache region temporarily stores unstructured documents whose data has not yet been sorted into structured data.
In one embodiment, as shown in fig. 3, step S20 includes:
S201: Acquire all documents under the file path from the nuclear power station equipment data cache region.
A file path corresponds to a storage area for documents. After an equipment data acquisition instruction containing the file path and a keyword is received, all documents under that path are acquired from the nuclear power plant equipment data cache region by following the path.
S202: Detect whether each of the documents contains the keyword.
S203: Record each document that contains the keyword as a target document.
Optionally, after all documents under the file path are acquired from the cache region, a TextRank, LDA or TPR algorithm model may be used to detect keywords in each document, and every document containing the keyword is recorded as a target document.
Further, if none of the documents contains the keyword, an equipment data acquisition failure instruction is sent to a preset receiver so that the receiver can check whether the file path and/or the keyword is wrong. The preset receiver may be relevant staff of the nuclear power plant or the party that sent the equipment data acquisition instruction.
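Steps S201 to S203, including the failure notification, can be sketched as below. This assumes the text of each Word document has already been extracted, and models the cache region as a hypothetical in-memory mapping from file path to {document name: text}. A plain substring test is swapped in for the TextRank, LDA or TPR keyword detection named above, and `notify` is a hypothetical stand-in for sending the failure instruction.

```python
from __future__ import annotations

from typing import Callable


def find_target_documents(
    cache_region: dict[str, dict[str, str]],
    file_path: str,
    keyword: str,
    notify: Callable[[str], None],
) -> list[str]:
    """S201-S203: return names of documents under file_path that contain keyword."""
    # S201: all documents stored under the file path (empty if the path is unknown).
    documents = cache_region.get(file_path, {})
    # S202/S203: a plain substring test stands in for TextRank/LDA/TPR detection.
    targets = [name for name, text in documents.items() if keyword in text]
    if not targets:
        # No match: tell the preset receiver to check the path and/or keyword.
        notify(f"acquisition failed: no document under {file_path!r} contains {keyword!r}")
    return targets
```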
S30: Acquire a key page number containing the keyword from the target document, and detect whether the key page number contains a preset table.
Although the target document contains the keyword, not every page of it does. It is therefore only necessary to check the key pages, i.e. the pages that contain the keyword, for data to be acquired; this saves running time, reduces the load on the computer system and speeds up processing. The preset table may be any form of table in which equipment data is stored.
Specific data of the nuclear power plant is generally stored in tables in the target document, but the data in these tables is unclassified, unstructured data. Therefore, after the target document is acquired from the cache region according to the file path and the keyword, the key page numbers containing the keyword are acquired from the target document and checked for a preset table, so that the equipment data in the preset table can be acquired.
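A minimal sketch of step S30 follows, assuming each page of the target document has already been parsed into its text and its tables. The `Page` model is hypothetical, and treating any table on a page as a "preset table" is an assumption; a real system might instead match expected header columns.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page model: its text plus any tables (rows of cell strings)."""
    text: str
    tables: list[list[list[str]]] = field(default_factory=list)


def key_page_numbers(pages: list[Page], keyword: str) -> list[int]:
    """S30: 1-based numbers of the pages whose text contains the keyword."""
    return [number for number, page in enumerate(pages, start=1) if keyword in page.text]


def has_preset_table(page: Page) -> bool:
    """Assumption: any table on the page counts as a preset table."""
    return bool(page.tables)
```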
S40: When the key page number contains a preset table, acquire first document data in the preset table; the first document data is unstructured data.
The first document data is stored in the preset table of the key page in an irregular or incomplete format that cannot conveniently be represented by the two-dimensional logical tables of a database, so it cannot be stored in the database directly; this is what makes it unstructured data.
Specifically, the first row of a preset table is generally a header row whose content is meaningless for data storage. Therefore, when the key page number contains a preset table, the first row of the table is discarded automatically. After it is removed, all column data from the second row onward is acquired automatically by loop traversal, and once the traversal reaches the last row, the process returns to step S30 to detect whether the next key page number contains a preset table.
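The header-skipping traversal of step S40 can be sketched as follows, assuming a table is a list of rows of cell strings. Trimming whitespace from each cell is an extra normalization not stated in the source.

```python
from __future__ import annotations


def extract_first_document_data(table: list[list[str]]) -> list[list[str]]:
    """S40: drop the header row, then collect every cell of every remaining row."""
    rows: list[list[str]] = []
    for row in table[1:]:  # row 0 is the header and carries no storable data
        rows.append([cell.strip() for cell in row])
    return rows
```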
S50: Convert the first document data into structured data and store the structured data in the nuclear power plant document database.
All data stored in the nuclear power plant document database is structured, so the unstructured first document data must be converted into structured data before it is stored in the database.
In an embodiment, before step S50, that is, before the first document data is converted into structured data and stored in the nuclear power plant document database, the method further includes:
(1) Detect whether the document data contains preset horizontal-line characters.
A preset horizontal-line character may be a horizontal line that joins parts of the document data or one that belongs to the document data itself. Such characters do not affect the specific meaning of the document data, so they can be removed.
(2) When the document data contains preset horizontal-line characters, remove them, and detect the identification bit length of each piece of document data after removal.
The identification bit length is the number of identification digits in each piece of document data after the horizontal-line characters are removed.
(3) When the identification bit length of the document data is equal to a first preset identification bit length, record the document data as power station document data.
Preferably, the first preset identification bit length is two digits. Document data whose identification bit length equals the first preset length represents location information of the nuclear power plant equipment, so it is recorded as power station document data.
(4) When the identification bit length of the document data is greater than or equal to a second preset identification bit length, record the document data as device document data.
Preferably, the second preset identification bit length is nine digits. Document data whose identification bit length is greater than or equal to the second preset length represents specific information about a piece of nuclear power plant equipment, so it is recorded as device document data.
(5) When the identification bit length of the document data is less than the first preset identification bit length, or greater than the first and less than the second preset identification bit length, record the document data as data to be verified and send it to a preset receiver.
In these cases the document data cannot be classified directly as power station document data or device document data, so it is recorded as data to be verified and sent to the preset receiver for manual verification. Once its classification (for example, power station document data or device document data) has been determined, the data and its classification can be fed back to the server for classified storage. For example, if a digit was accidentally omitted when a piece of power station document data was recorded in the target document, its identification bit length is smaller than the first preset identification bit length.
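Steps (1) to (5) amount to a small routing function, sketched below with the preferred lengths of two and nine. Two points are assumptions: the set of "preset horizontal-line characters" (hyphen, Unicode hyphen and en dash here) is not listed in the source, and the identification bit length is counted as characters. The codes used in the test, such as `1RCP-001-PO`, are made up for illustration.

```python
from __future__ import annotations

FIRST_ID_LENGTH = 2    # preferred first preset identification bit length
SECOND_ID_LENGTH = 9   # preferred second preset identification bit length
# Hypothetical "preset horizontal-line characters": hyphen, Unicode hyphen, en dash.
HORIZONTAL_LINES = ("-", "\u2010", "\u2013")


def strip_horizontal_lines(value: str) -> str:
    """Steps (1)-(2): remove preset horizontal-line characters and outer spaces."""
    for char in HORIZONTAL_LINES:
        value = value.replace(char, "")
    return value.strip()


def classify(raw: str) -> tuple[str, str]:
    """Steps (3)-(5): route document data by its identification bit length."""
    cleaned = strip_horizontal_lines(raw)
    length = len(cleaned)
    if length == FIRST_ID_LENGTH:
        return "plant", cleaned        # power station (location) document data
    if length >= SECOND_ID_LENGTH:
        return "device", cleaned       # device document data
    return "to_verify", cleaned        # sent to the preset receiver for checking
```

The final `return` covers both gaps named in step (5), a length below two and a length strictly between two and nine, so every input lands in exactly one category.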
In one embodiment, as shown in fig. 4, converting the first document data into structured data and storing it in the nuclear power plant document database in step S50, which associates the first document data with its target document, includes:
S501: Determine a power station document tag corresponding to the power station document data according to the key page number of the power station document data and the target document containing that key page.
S502: Store the power station document data and the power station document tag in association in the power station field of the nuclear power station document database, so that the power station document data is converted into structured data.
The power station field is a data category in the nuclear power station document database under which only power station document data is stored.
When the identification bit length of the document data equals the first preset identification bit length and the data has been recorded as power station document data, it must be converted from unstructured data into structured data, i.e. data that can be expressed in the two-dimensional logic of a database. To do this, the power station document tag is determined from the key page of the data and the target document containing that page. Every piece of power station document data then has an associated tag, so when the data is queried after being stored in the database, the other information associated with it can be obtained by parsing the tag; this is what makes the power station document data structured.
S503: Determine a device document tag corresponding to the device document data according to the key page number of the device document data and the target document containing that key page.
S504: Store the device document data and the device document tag in association in the device field of the nuclear power plant document database, so that the device document data is converted into structured data.
When the identification bit length of the document data is greater than or equal to the second preset identification bit length and the data has been recorded as device document data, it is converted from unstructured into structured data, i.e. made representable in the two-dimensional logic of a database, in the same way: the device document tag is determined from the key page of the data and the target document containing that page. Every piece of device document data then has an associated tag, so when the data is queried after being stored in the database, the other information associated with it can be obtained by parsing the tag; this is what makes the device document data structured.
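Steps S501 to S504 can be sketched with Python's built-in `sqlite3` module. The schema (one table per field, each row pairing the data with its tag) and the tag format `document#pageN` are hypothetical; the source only says the tag is derived from the key page and its target document.

```python
import sqlite3

# Hypothetical schema: one table per field; each row pairs data with its tag.
FIELD_TABLES = {"plant": "plant_field", "device": "device_field"}


def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE plant_field (data TEXT NOT NULL, tag TEXT NOT NULL);
        CREATE TABLE device_field (data TEXT NOT NULL, tag TEXT NOT NULL);
        """
    )
    return conn


def make_tag(target_document: str, key_page: int) -> str:
    """S501/S503: derive a tag from the target document and its key page."""
    return f"{target_document}#page{key_page}"


def store(conn: sqlite3.Connection, category: str, data: str,
          target_document: str, key_page: int) -> None:
    """S502/S504: store the data together with its tag in the matching field."""
    table = FIELD_TABLES[category]  # KeyError for unverified data, by design
    # The table name comes from a fixed whitelist; values use parameter binding.
    conn.execute(
        f"INSERT INTO {table} (data, tag) VALUES (?, ?)",
        (data, make_tag(target_document, key_page)),
    )
```

Keeping the table name in a fixed dictionary means only the two known fields can ever be written, while the data itself always goes through `?` placeholders.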
In this embodiment, after the equipment data acquisition instruction is received, the document data associated with the file path and the keyword is detected and acquired automatically, converted from unstructured into structured document data, and stored in the nuclear power plant document database. When the data, or documents associated with it, need to be looked up later, the original path of the data can be found quickly. This saves labor, improves the efficiency of document classification and storage, and reduces the omissions or errors that manual extraction and sorting can introduce.
In an embodiment, after step S30, that is, after detecting whether the key page number contains the preset table, the method further includes:
S60: When the key page number does not contain a preset table, detect whether the next page adjacent to the key page number contains the preset table.
S70: When the next page adjacent to the key page number contains the preset table, acquire second document data from the preset table on that page, and store the second document data in the nuclear power station document database.
Because a page can display only limited content, the keyword may appear at the very end of the current key page while the corresponding preset table starts on the following page. Therefore, when the key page number does not contain a preset table, the next adjacent page is checked for one.
Further, when the next page number adjacent to the key page number contains the preset table, second document data in the preset table contained in the next page number is acquired, and the second document data is stored in a nuclear power plant document database.
In an embodiment, after step S60, that is, after detecting whether the next page adjacent to the key page number contains the preset table, the method further includes:
S80: When the next page adjacent to the key page number does not contain the preset table, prompt that the key page number does not contain the preset table, and detect whether the next key page number contains the preset table.
S90: When the next key page number contains a preset table, acquire third document data in the preset table, and store the third document data in the nuclear power station document database.
If neither the key page nor the adjacent next page contains the preset table, there is no data to acquire there, so the key page is skipped and the next key page is checked. When the next key page contains a preset table, the third document data in it is acquired and stored in the nuclear power station document database, and detection continues with the following key page until all key pages have been checked.
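The page walk of steps S30/S40 and S60 to S90 can be sketched as one loop over the key pages. The `Page` model is hypothetical, any table counts as a preset table (an assumption), and the returned `skipped` list stands in for the prompt of step S80.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page model: text plus tables (rows of cell strings)."""
    text: str
    tables: list[list[list[str]]] = field(default_factory=list)


def collect_rows(pages: list[Page], keyword: str) -> tuple[list[tuple[int, list[str]]], list[int]]:
    """Walk every key page and gather data rows, header rows dropped.

    Returns (rows, skipped): rows are (source page number, row) pairs;
    skipped lists key pages with no table on them or on the next page.
    """
    rows: list[tuple[int, list[str]]] = []
    skipped: list[int] = []
    used_pages: set[int] = set()  # avoid reading the same table twice
    key_pages = [n for n, page in enumerate(pages, start=1) if keyword in page.text]
    for number in key_pages:
        if pages[number - 1].tables:                          # S30/S40: table on key page
            source = number
        elif number < len(pages) and pages[number].tables:    # S60/S70: next page
            source = number + 1
        else:                                                 # S80: prompt and move on
            skipped.append(number)
            continue
        if source in used_pages:
            continue
        used_pages.add(source)
        for table in pages[source - 1].tables:
            rows.extend((source, row) for row in table[1:])
    return rows, skipped
```

The `used_pages` set is a design choice not stated in the source: when two consecutive pages are both key pages, the first one's fallback to the next page would otherwise read the same table twice.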
In an embodiment, after step S50, that is, after the first document data is converted into structured data and stored in the nuclear power plant document database, the method further includes:
(1) After receiving an equipment data acquisition instruction containing a target field, parse the target field to obtain a target identification vector corresponding to the target field.
This equipment data acquisition instruction may be sent by relevant staff of the nuclear power station, or triggered after relevant staff enter a target field on the server. The target field is the field whose related document data is to be acquired.
When relevant staff of the nuclear power plant need the documents or associated data related to a certain piece of data, they can send the server an equipment data acquisition instruction containing a target field. After receiving it, the server parses the target field into a target identification vector and uses that vector to query the nuclear power plant document database for documents related to the target field.
(2) And detecting the identification bit length of the target identification vector.
(3) And when the identification bit length is equal to the first preset identification bit length, acquiring power station document data matched with the target identification vector and a power station document label associated with the power station document data from a power station field of the nuclear power station document database.
Wherein the identification bit length refers to the number of characters of the target identification vector.
It is to be understood that, in the above embodiment, it has been indicated that, when the identification bit length of the document data is equal to the first preset identification bit length, the document data is recorded as the plant document data, and the plant document data and the plant document tag corresponding to the plant document data are stored in the plant field in the nuclear power plant document database in an associated manner, and then, when the identification bit length is equal to the first preset identification bit length, the plant document data matching the target identification vector can be obtained from the plant field in the nuclear power plant document database.
Further, in the above embodiment, it is further indicated that, when the identification bit length of the document data is greater than or equal to a second preset identification bit length, the document data is recorded as device document data, and the device document data and the device document tag are stored in a device field in the nuclear power plant document database in an associated manner, and then, when the identification bit length of the target identification vector is greater than or equal to the second preset identification bit length, the device document data matching the target identification vector is acquired from the device field in the nuclear power plant document database.
And sending the power station document data and the power station document label associated with the power station document data to a preset receiving party.
Specifically, after power station document data matched with the target identification vector and a power station document tag associated with the power station document data are acquired from a power station field of the nuclear power station document database, the power station document data and the power station document tag associated with the power station document data are sent to a preset receiving party, so that after the preset receiving party analyzes the power station document tag, a target document and a key page number associated with the power station document data in the power station document tag are acquired, the key page number in the target document can be inquired from a nuclear power station equipment data cache region, data related to the target field (namely, the power station document data) are acquired, and convenience in data acquisition is improved.
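Steps (1) to (4) can be sketched as a query routine plus the recipient's tag parsing. The sketch rests on several assumptions the text does not settle: parsing the target field is taken to mean stripping hyphens and whitespace, a "match" means exact equality with the stored data, tags use a hypothetical `<document>#p<page>` format, and an in-memory SQLite table stands in for the nuclear power station document database. The sending itself is omitted; `parse_tag` shows what the recipient would do with a received tag.

```python
import sqlite3
from typing import List, Tuple

LINE_CHARS = "-"  # assumed preset horizontal-line character


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the document database; field is 'plant' or 'device'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE document_data (field TEXT, data TEXT, tag TEXT)")
    return conn


def parse_target_field(target_field: str) -> str:
    """Step (1), assumed: the identification vector is the field minus hyphens and spaces."""
    return "".join(ch for ch in target_field if ch not in LINE_CHARS and not ch.isspace())


def query_by_target_field(conn: sqlite3.Connection, target_field: str,
                          first_len: int, second_len: int) -> List[Tuple[str, str]]:
    """Steps (2)-(3): measure the identification bit length and query the matching field."""
    vector = parse_target_field(target_field)
    length = len(vector)  # identification bit length = number of characters
    if length == first_len:
        field = "plant"
    elif length >= second_len:
        field = "device"
    else:
        return []  # the text defines no lookup for other lengths
    return conn.execute(
        "SELECT data, tag FROM document_data WHERE field = ? AND data = ? ORDER BY rowid",
        (field, vector),
    ).fetchall()


def parse_tag(tag: str) -> Tuple[str, int]:
    """Step (4), recipient side: split a hypothetical '<document>#p<page>' tag."""
    document, sep, page = tag.rpartition("#p")
    if not sep or not document or not page.isdecimal():
        raise ValueError(f"malformed tag: {tag!r}")
    return document, int(page)
```

With hypothetical preset lengths 4 and 9, a target field `1-KRT` becomes the vector `1KRT` and is looked up in the power station field, while `10KRT-001-PO` becomes `10KRT001PO` and is looked up in the device field; a returned tag such as `manual.pdf#p12` tells the recipient to open page 12 of `manual.pdf` in the equipment data cache.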
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a nuclear power plant document data acquisition apparatus is provided, which corresponds one-to-one to the nuclear power plant document data acquisition method in the above-described embodiment. As shown in fig. 5, the nuclear power plant document data acquiring apparatus includes a data acquisition instruction receiving module 10, a target document acquiring module 20, a key page number acquiring module 30, a first document data acquiring module 40, and a data storage module 50. The functional modules are explained in detail as follows:
a data acquisition instruction receiving module 10, configured to receive an equipment data acquisition instruction containing a file path and a keyword;
a target document acquiring module 20, configured to acquire a target document from the nuclear power plant equipment data cache according to the file path and the keyword;
a key page number acquiring module 30, configured to acquire a key page number containing the keyword from the target document, and detect whether the key page number contains a preset table;
a first document data acquiring module 40, configured to acquire first document data in the preset table when the key page number contains the preset table, the first document data being unstructured data;
and the data storage module 50 is used for converting the first document data into structured data and storing the structured data in a nuclear power plant document database.
Preferably, as shown in fig. 6, the target document acquiring module 20 includes the following units:
a document obtaining unit 201, configured to obtain all documents in the file path from the data cache of the nuclear power plant device;
a keyword detection unit 202, configured to detect whether each of the documents contains the keyword;
a target document recording unit 203, configured to record the document including the keyword as the target document.
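As an illustration of units 201 to 203, the sketch below filters a hypothetical in-memory cache (a dict from document path to extracted text) down to the documents under the file path that contain the keyword. The cache layout, the recursive interpretation of "under the file path", and the function name are assumptions, not details from the text.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def find_target_documents(cache: Dict[str, str], file_path: str, keyword: str) -> List[str]:
    """Units 201-203: documents under file_path (recursively) whose text contains keyword."""
    root = PurePosixPath(file_path)
    # Unit 201: every cached document located under the file path.
    under_path = [p for p in cache if PurePosixPath(p).is_relative_to(root)]
    # Units 202-203: keep only those containing the keyword; they become target documents.
    return sorted(p for p in under_path if keyword in cache[p])
```

Comparing path components with `PurePosixPath.is_relative_to` (Python 3.9+) rather than string prefixes keeps a document in `/cache/unit10` from being treated as lying under `/cache/unit1`.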
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the first preset table detection module is used for detecting whether a next page number adjacent to the key page number contains a preset table or not when the key page number does not contain the preset table;
and the second document data acquisition module is used for acquiring second document data in the preset table contained in the next page number when the preset table is contained in the next page number adjacent to the key page number, and storing the second document data in a nuclear power station document database.
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the second preset table detection module is used for, when the next page number adjacent to the key page number does not contain the preset table, prompting that the key page number does not contain the preset table and detecting whether the next key page number contains the preset table;
and the third document data acquisition module is used for acquiring third document data in a preset table when the next key page number comprises the preset table, and storing the third document data in a nuclear power station document database.
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the horizontal line character detection module is used for detecting whether the document data contains preset horizontal line characters;
a horizontal line character removing module, configured to remove the preset horizontal line character when the document data includes the preset horizontal line character, and detect an identification bit length of each document data after the preset horizontal line character is removed;
the power station document data recording module is used for recording the document data as power station document data when the identification bit length of the document data is equal to a first preset identification bit length;
the equipment document data recording module is used for recording the document data as equipment document data when the identification bit length of the document data is greater than or equal to a second preset identification bit length;
and the data sending module is used for recording the document data as the data to be verified and sending the data to be verified to a preset receiver when the identification bit length of the document data is smaller than the first preset identification bit length or is larger than the first preset identification bit length and smaller than the second preset identification bit length.
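The horizontal-line removal and length-based routing performed by these modules can be sketched as below. The text gives neither the preset horizontal line character nor the two preset lengths, so the hyphen default here is hypothetical; the function assumes the first preset length is smaller than the second, as the stated ranges imply, and it measures the length even when no horizontal line character was present.

```python
from enum import Enum
from typing import Tuple


class Category(Enum):
    PLANT = "power station document data"
    DEVICE = "device document data"
    TO_VERIFY = "data to be verified"


def classify(value: str, first_len: int, second_len: int,
             line_chars: str = "-") -> Tuple[str, Category]:
    """Remove preset horizontal-line characters, then route by identification bit length."""
    if not first_len < second_len:
        raise ValueError("the first preset length must be smaller than the second")
    # Delete every preset horizontal-line character (a no-op if none is present).
    cleaned = value.translate(str.maketrans("", "", line_chars))
    length = len(cleaned)  # identification bit length = number of characters
    if length == first_len:
        return cleaned, Category.PLANT
    if length >= second_len:
        return cleaned, Category.DEVICE
    # Remaining cases: shorter than first_len, or strictly between the two lengths.
    return cleaned, Category.TO_VERIFY
```

For example, with hypothetical lengths 4 and 9, `1-KRT` becomes `1KRT` (power station document data), `10KRT-001-PO` becomes `10KRT001PO` (device document data), and `10KRT01`, at 7 characters, falls between the two lengths and is routed to the preset recipient for verification.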
Preferably, as shown in fig. 7, the data storage module 50 includes:
a power station document tag recording unit 501, configured to determine a power station document tag corresponding to the power station document data according to a key page number corresponding to the power station document data and a target document corresponding to the key page number;
a power station document data storage unit 502, configured to store the power station document data and the power station document tag in a power station field in the nuclear power station document database in an associated manner, so that the power station document data is converted into structured data;
a device document tag recording unit 503 configured to determine a device document tag corresponding to the device document data, based on a key page number corresponding to the device document data and a target document corresponding to the key page number;
a device document data storage unit 504, configured to store the device document data and the device document tag in association with a device field in the nuclear power plant document database, so that the device document data is converted into structured data.
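A minimal sketch of the associated storage performed by units 501 to 504, using an in-memory SQLite table in place of the nuclear power plant document database. The single `document_data` table with a `field` column standing for the power station and device fields, and the `<document>#p<page>` tag format built from the target document and key page number, are hypothetical choices; the text only says that the tag is determined from those two values.

```python
import sqlite3


def make_tag(target_document: str, key_page: int) -> str:
    """Hypothetical tag format combining the target document and its key page number."""
    return f"{target_document}#p{key_page}"


def open_db() -> sqlite3.Connection:
    """Create the stand-in database; the CHECK limits field to the two named fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document_data ("
        " field TEXT NOT NULL CHECK (field IN ('plant', 'device')),"
        " data TEXT NOT NULL,"
        " tag TEXT NOT NULL)"
    )
    return conn


def store_record(conn: sqlite3.Connection, field: str, data: str,
                 target_document: str, key_page: int) -> str:
    """Store one piece of document data together with its tag; returns the tag."""
    tag = make_tag(target_document, key_page)
    conn.execute(
        "INSERT INTO document_data (field, data, tag) VALUES (?, ?, ?)",
        (field, data, tag),
    )
    return tag
```

Keeping the data and its tag in the same row is what turns the extracted table values into queryable, structured records in this sketch; an unknown field name is rejected by the CHECK constraint with `sqlite3.IntegrityError`.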
Preferably, the nuclear power plant document data acquiring apparatus further includes:
the target field analyzing module is used for analyzing the target field after receiving an equipment data acquiring instruction containing the target field to obtain a target identification vector corresponding to the target field;
the identification bit length detection module is used for detecting the identification bit length of the target identification vector;
the power station document data acquisition module is used for acquiring power station document data matched with the target identification vector and a power station document label associated with the power station document data from a power station field of the nuclear power station document database when the identification bit length is equal to the first preset identification bit length;
and the data sending module is used for sending the power station document data and the power station document label related to the power station document data to a preset receiving party.
For specific limitations of the nuclear power plant document data acquisition device, refer to the limitations of the nuclear power plant document data acquisition method above; details are not repeated here. All or some of the modules in the device can be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data used in the nuclear power plant document data acquisition method in the embodiment. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a nuclear power plant document data acquisition method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and when the processor executes the computer program, the nuclear power plant document data acquisition method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the nuclear power plant document data acquisition method in the above-described embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (12)

1. A nuclear power plant document data acquisition method, characterized by comprising:
receiving an equipment data acquisition instruction containing a file path and a keyword;
acquiring a target document from a nuclear power plant equipment data cache according to the file path and the keyword;
acquiring, from the target document, a key page number containing the keyword, and detecting whether the key page number contains a preset table;
when the key page number contains the preset table, acquiring first document data in the preset table, the first document data being unstructured data; and
converting the first document data into structured data and storing it in a nuclear power plant document database.

2. The nuclear power plant document data acquisition method according to claim 1, wherein acquiring the target document from the nuclear power plant equipment data cache according to the file path and the keyword comprises:
acquiring all documents under the file path from the nuclear power plant equipment data cache;
detecting whether each of the documents contains the keyword; and
recording the document containing the keyword as the target document.

3. The nuclear power plant document data acquisition method according to claim 1, comprising, after detecting whether the key page number contains the preset table:
when the key page number does not contain the preset table, detecting whether the next page number adjacent to the key page number contains the preset table; and
when the next page number adjacent to the key page number contains the preset table, acquiring second document data in the preset table contained in the next page number, and storing the second document data in the nuclear power plant document database.

4. The nuclear power plant document data acquisition method according to claim 3, further comprising, after detecting whether the next page number adjacent to the key page number contains the preset table:
when the next page number adjacent to the key page number does not contain the preset table, prompting that the key page number does not contain the preset table, and detecting whether the next key page number contains the preset table; and
when the next key page number contains the preset table, acquiring third document data in the preset table, and storing the third document data in the nuclear power plant document database.

5. The nuclear power plant document data acquisition method according to claim 1, comprising, before converting the first document data into structured data and storing it in the nuclear power plant document database:
detecting whether the document data contains preset horizontal line characters;
when the document data contains the preset horizontal line characters, removing the preset horizontal line characters, and detecting the identification bit length of each piece of document data after the removal;
when the identification bit length of the document data is equal to a first preset identification bit length, recording the document data as power station document data;
when the identification bit length of the document data is greater than or equal to a second preset identification bit length, recording the document data as device document data; and
when the identification bit length of the document data is less than the first preset identification bit length, or greater than the first preset identification bit length and less than the second preset identification bit length, recording the document data as data to be verified, and sending the data to be verified to a preset recipient.

6. The nuclear power plant document data acquisition method according to claim 5, wherein each target document is associated with one document tag, and converting the first document data into structured data and storing it in the nuclear power plant document database comprises:
determining a power station document tag corresponding to the power station document data according to the key page number corresponding to the power station document data and the target document corresponding to the key page number;
storing each piece of power station document data in association with its corresponding power station document tag in a power station field of the nuclear power plant document database, so that the power station document data is converted into structured data;
determining a device document tag corresponding to the device document data according to the key page number corresponding to the device document data and the target document corresponding to the key page number; and
storing the device document data in association with the device document tag in a device field of the nuclear power plant document database, so that the device document data is converted into structured data.

7. The nuclear power plant document data acquisition method according to claim 6, further comprising, after converting the first document data into structured data and storing it in the nuclear power plant document database:
after receiving an equipment data acquisition instruction containing a target field, parsing the target field to obtain a target identification vector corresponding to the target field;
detecting the identification bit length of the target identification vector;
when the identification bit length is equal to the first preset identification bit length, acquiring, from the power station field of the nuclear power plant document database, the power station document data matching the target identification vector and the power station document tag associated with the power station document data; and
sending the power station document data and its associated power station document tag to a preset recipient.

8. A nuclear power plant document data acquisition device, characterized by comprising:
a data acquisition instruction receiving module, configured to receive an equipment data acquisition instruction containing a file path and a keyword;
a target document acquisition module, configured to acquire a target document from a nuclear power plant equipment data cache according to the file path and the keyword;
a key page number acquisition module, configured to acquire a key page number containing the keyword from the target document, and detect whether the key page number contains a preset table;
a first document data acquisition module, configured to acquire first document data in the preset table when the key page number contains the preset table, the first document data being unstructured data; and
a data storage module, configured to convert the first document data into structured data and store it in a nuclear power plant document database.

9. The nuclear power plant document data acquisition device according to claim 8, wherein the target document acquisition module comprises:
a document acquisition unit, configured to acquire all documents under the file path from the nuclear power plant equipment data cache;
a keyword detection unit, configured to detect whether each of the documents contains the keyword; and
a target document recording unit, configured to record the document containing the keyword as the target document.

10. The nuclear power plant document data acquisition device according to claim 8, further comprising:
a first table detection module, configured to detect, when the key page number does not contain the preset table, whether the next page number adjacent to the key page number contains the preset table; and
a second document data acquisition module, configured to acquire, when the next page number adjacent to the key page number contains the preset table, second document data in the preset table contained in the next page number, and store the second document data in the nuclear power plant document database.

11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the nuclear power plant document data acquisition method according to any one of claims 1 to 7.

12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the nuclear power plant document data acquisition method according to any one of claims 1 to 7.
CN202011308653.8A 2020-11-20 2020-11-20 Nuclear power station document data acquisition method and device, computer equipment and storage medium Pending CN112463791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011308653.8A CN112463791A (en) 2020-11-20 2020-11-20 Nuclear power station document data acquisition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011308653.8A CN112463791A (en) 2020-11-20 2020-11-20 Nuclear power station document data acquisition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112463791A true CN112463791A (en) 2021-03-09

Family

ID=74837121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011308653.8A Pending CN112463791A (en) 2020-11-20 2020-11-20 Nuclear power station document data acquisition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112463791A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377950A (en) * 2021-06-02 2021-09-10 浪潮软件股份有限公司 Method for realizing flat storage and real-time preview of unstructured document

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060282760A1 (en) * 2005-06-14 2006-12-14 Canon Kabushiki Kaisha Apparatus, method and system for document conversion, apparatuses for document processing and information processing, and storage media that store programs for realizing the apparatuses
CN106815268A (en) * 2015-12-01 2017-06-09 中广核工程有限公司 The structuring processing method and system of magnanimity destructuring e-file
CN109446345A (en) * 2018-09-26 2019-03-08 深圳中广核工程设计有限公司 Nuclear power file verification processing method and system
CN110377558A (en) * 2019-06-14 2019-10-25 平安科技(深圳)有限公司 Document searching method, device, computer equipment and storage medium
CN110688349A (en) * 2019-08-29 2020-01-14 重庆小雨点小额贷款有限公司 Document sorting method, device, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination