CN113158655A - Document information processing method and device, computer equipment and readable storage medium

Info

Publication number
CN113158655A
Authority
CN
China
Prior art keywords
information
identification
picture
document
keyword
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010076288.6A
Other languages
Chinese (zh)
Inventor
尤勇敏
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiuling Shanghai Intelligent Technology Co ltd
Original Assignee
Jiuling Shanghai Intelligent Technology Co ltd
Application filed by Jiuling Shanghai Intelligent Technology Co ltd
Priority to CN202010076288.6A
Publication of CN113158655A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application provides a document information processing method, a document information processing device, a computer device and a readable storage medium. The method comprises: performing identification processing on an initial building document according to keyword information to obtain identification information, and arranging the identification information according to a preset arrangement mode to obtain a target building document. The method uses intelligent identification to convert non-editable files in the building field directly into editable target building documents, which makes the resulting editable files more accurate, avoids manual error checking and correction, improves file conversion efficiency, and saves human resources.

Description

Document information processing method and device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of building file identification, and in particular, to a method and an apparatus for processing document information, a computer device, and a readable storage medium.
Background
PDF is an electronic file format. PDF files have good format stability and content readability, and are widely used in fields such as file transmission and web publishing. However, the text and pictures in a PDF file cannot be directly copied and pasted.
In the conventional technology, a PDF file is converted into an editable file by third-party software. However, files converted in this way contain many errors, so the accuracy of the converted file is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a document information processing method, an apparatus, a computer device, and a readable storage medium capable of improving the accuracy of a converted file.
The embodiment of the application provides a document information processing method, which comprises the following steps:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
In one embodiment, the performing identification processing on the initial building document according to the keyword information to obtain identification information includes:
according to the first keyword information, carrying out identification processing on the initial building document to obtain first identification information; wherein the initial building document is a non-editable file of building domain content;
according to second keyword information, carrying out identification processing on the initial building document to obtain second identification information; the second keyword information and the first keyword information are different keyword information in the initial building document.
In one embodiment, the identifying, according to the first keyword information, the initial building document to obtain first identification information includes:
according to the first keyword information, carrying out identification processing on the title in the initial building document to obtain initial identification information corresponding to the title containing the first keyword information;
and if the initial identification information is first character information, taking the first character information as the first identification result.
In one embodiment, the method further comprises:
if the initial identification information is a first picture, performing picture identification processing on the first picture to obtain first picture information, and taking the first picture information as the first identification result.
In one embodiment, the performing picture identification processing on the first picture to obtain the first picture information, and using the picture information as the first identification result includes:
performing picture identification processing on the first picture according to the atlas information in the first picture to obtain the first picture information, and taking the first picture information as the first identification result.
In one embodiment, the identifying the initial building document according to the second keyword information to obtain second identification information includes:
according to the second keyword information, performing identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information;
according to third keyword information, carrying out identification processing on the intermediate identification information to obtain target identification information containing the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
and if the target identification information is second character information, taking the second character information as the second identification result.
In one embodiment, the method further comprises:
if the target identification information is a second picture, carrying out picture identification processing on the second picture to obtain second picture information, and taking the second picture information as a second identification result; wherein the second picture is different from the first picture.
In one embodiment, the arranging the first identification information and the second identification information according to a preset arranging manner to obtain the target building document includes:
and classifying, arranging and displaying the information in the first identification information and the second identification information according to the preset arrangement mode to obtain the target building document.
The embodiment of the application provides a document information processing method, which comprises the following steps:
according to the first keyword information, carrying out identification processing on the title in the initial building document to obtain initial identification information corresponding to the title containing the first keyword information; wherein the initial building document is a non-editable file of building domain content;
if the initial identification information is first character information, taking the first character information as the first identification result;
if the initial identification information is a first picture, performing picture identification processing on the first picture according to the atlas information in the first picture to obtain first picture information, and taking the first picture information as the first identification result;
according to the second keyword information, performing identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information; the second keyword information and the first keyword information are different keyword information in the initial building document;
according to third keyword information, carrying out identification processing on the intermediate identification information to obtain target identification information containing the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
if the target identification information is second character information, taking the second character information as a second identification result;
if the target identification information is a second picture, carrying out picture identification processing on the second picture to obtain second picture information, and taking the second picture information as a second identification result; wherein the second picture is different from the first picture;
classifying, arranging and displaying the information in the first identification information and the second identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
An embodiment of the present application provides a document information processing apparatus, including:
the identification module is used for identifying the initial building document according to the keyword information to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
the arrangement module is used for arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein a computer program capable of running on the processor is stored in the memory, and the processor executes the computer program to realize the following steps:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
An embodiment of the application provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
According to the document information processing method, the document information processing device, the computer device and the readable storage medium, the computer device can perform identification processing on the initial building document, which is a non-editable file of building domain content, according to the keyword information to obtain the identification information, and then arrange the identification information according to a preset arrangement mode to obtain a target building document. The method uses intelligent identification to convert non-editable files in the building field directly into editable target building documents, which makes the resulting editable files more accurate, avoids manual error checking and correction, improves file conversion efficiency, and saves human resources.
Drawings
FIG. 1 is a flowchart illustrating a document information processing method according to an embodiment;
FIG. 2 is a flowchart illustrating a document information processing method according to an embodiment;
FIG. 3 is a schematic structural diagram of a document information processing apparatus according to an embodiment;
FIG. 4 is an internal block diagram of a computer device, provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The document information processing method provided by the embodiment can be applied to computer equipment. The computer device may be an electronic device with a drawing application installed, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant, and the specific form of the computer device is not limited in this embodiment.
It should be noted that the document information processing method provided in the embodiments of the present application may be executed by a document information processing apparatus, and the apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of software and hardware. The following method embodiments are described with a computer device as the executing entity, which performs recognition processing on the non-editable file so that the editable file can be exported directly. The computer device may implement the processes of the method embodiments described below by way of a document conversion application.
FIG. 1 is a flowchart illustrating a document information processing method according to an embodiment. The embodiment relates to a process for identifying and processing a non-editable file to directly obtain the editable file. As shown in fig. 1, the method includes:
Step S101, according to the keyword information, carrying out identification processing on the initial building document to obtain identification information.
Specifically, the keyword information may include keyword information under a first-level title in the non-editable initial building document and keyword information under a second-level title in the non-editable initial building document, and the two may be the same or different. The computer device can receive one or more non-editable initial building documents and perform identification processing on the received initial building documents according to the keyword information to obtain the identification information. In this embodiment, the non-editable file may be a PDF file, and the keyword information may be information containing a keyword such as "scope of application" or "scope".
Optionally, the process of performing identification processing on the initial building document according to the keyword information in step S101 to obtain identification information may include the following steps:
Step S1011, according to the first keyword information, carrying out identification processing on the initial building document to obtain first identification information; wherein the initial building document is a non-editable file of the building domain content.
Specifically, the computer device may receive one or more non-editable initial building documents, and perform identification processing on the received initial building documents according to the first keyword information to obtain the first identification information. In this embodiment, the non-editable file may be a PDF file. The first keyword information may be information containing a keyword such as "scope of application" or "scope", or it may be keyword information contained in a first-level title of the non-editable initial building document.
Step S1012, according to the second keyword information, carrying out identification processing on the initial building document to obtain second identification information; the second keyword information and the first keyword information are different keyword information in the initial building document.
Specifically, the computer device may receive one or more non-editable initial building documents, and perform identification processing on the received initial building documents according to the second keyword information to obtain second identification information. In this embodiment, the second keyword information may be information containing the keyword "rule of thumb", or keyword information contained in a second-level title of the non-editable initial building document. As stated in step S1012, it differs from the first keyword information.
In this embodiment, the execution order of step S1011 and step S1012 may be exchanged, and this embodiment is not limited at all.
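The two identification passes of steps S1011 and S1012 can be sketched as follows. This is a minimal illustration rather than the application's implementation: it assumes the non-editable document has already been parsed into titled sections, and the `Section` type, the `identify` helper and the sample contents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One titled section of an already-parsed document (hypothetical model)."""
    title: str
    paragraphs: list[str] = field(default_factory=list)


def identify(sections: list[Section], keywords: list[str]) -> list[Section]:
    """Return the sections whose title contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sections if any(k in s.title.lower() for k in lowered)]


# Hypothetical sample data standing in for a parsed building document.
doc = [
    Section("1 Scope of application", ["Applies to residential floor slabs."]),
    Section("2 Rule of thumb", ["Applicable to spans up to 6 m."]),
    Section("3 Materials", ["Concrete grade C30."]),
]

# The two passes do not depend on each other, so either may run first.
second_info = identify(doc, ["rule of thumb"])
first_info = identify(doc, ["scope of application", "scope"])
```

Because neither pass reads the other's output, swapping the last two lines gives the same results, which matches the statement that the execution order may be exchanged.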
Step S102, arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
Specifically, the preset arrangement mode may be an arrangement of the target building document that is set in advance, according to actual requirements, before this embodiment is executed; the specific arrangement may be set arbitrarily and is not fixed. In this embodiment, the target building document may be an editable Word file produced by extracting the key content required by the user from the initial building document.
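One way to picture a preset arrangement mode is as an ordered list of output headings that is fixed before the run. The sketch below makes that assumption; the `arrange` function, the `PRESET_LAYOUT` value and the category keys are hypothetical. The embodiment mentions a Word file, but writing one would need a third-party library, so this sketch emits plain editable text instead.

```python
def arrange(identified: dict[str, list[str]], layout: list[tuple[str, str]]) -> str:
    """Render identified content as plain editable text.

    `identified` maps a category key to its extracted paragraphs.
    `layout` is the preset arrangement: (category key, heading) pairs in output order.
    """
    lines: list[str] = []
    for key, heading in layout:
        paragraphs = identified.get(key, [])
        if not paragraphs:
            continue  # skip categories for which nothing was identified
        lines.append(heading)
        lines.extend(f"  {p}" for p in paragraphs)
        lines.append("")  # blank line between categories
    return "\n".join(lines).rstrip("\n")


# Hypothetical preset arrangement and identification results.
PRESET_LAYOUT = [("second", "Applicable provisions"), ("first", "Scope of application")]
identified = {
    "first": ["Applies to residential floor slabs."],
    "second": ["Applicable to spans up to 6 m."],
}
target_text = arrange(identified, PRESET_LAYOUT)
```

Changing the order or headings in `PRESET_LAYOUT` changes the output without touching the identification step, which is the sense in which the arrangement is not fixed.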
In the document information processing method provided by this embodiment, the computer device may perform identification processing on the initial building document according to the keyword information to obtain identification information, and perform arrangement processing on the identification information according to a preset arrangement manner to obtain a target building document; the method can directly convert the uneditable files in the building field into the editable target building documents through the intelligent identification technology, so that the accuracy of the obtained editable files is higher, meanwhile, the manual error checking and correction are avoided, the file conversion efficiency is improved, and the human resources are saved.
As an embodiment, the process of performing identification processing on the initial building document according to the first keyword information in step S1011 to obtain the first identification information may be implemented by the following steps:
Step S1111, according to the first keyword information, performing identification processing on the title in the initial building document to obtain initial identification information corresponding to the title containing the first keyword information.
Specifically, the computer device may, based on keyword information such as "scope of application" or "scope", identify the titles containing that keyword information among all titles in all initial building documents, and then extract all content under those titles as the initial identification information. Optionally, the initial identification information may be text information and pictures in the non-editable file. In this embodiment, a table in the non-editable file may first be recognized as a picture.
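The extraction described for step S1111 can be sketched over a flat stream of page elements. The sketch assumes a parser has already labelled each element as a title, text, picture or table; the `Block` type, the `extract_under_titles` helper and the sample file name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One element of a parsed page stream (hypothetical model)."""
    kind: str      # "title", "text", "picture" or "table"
    content: str   # text, or an image reference for pictures and tables


def extract_under_titles(blocks: list[Block], keywords: list[str]) -> list[Block]:
    """Collect every block that follows a title containing a keyword,
    up to (not including) the next title."""
    lowered = [k.lower() for k in keywords]
    collected: list[Block] = []
    capturing = False
    for block in blocks:
        if block.kind == "title":
            capturing = any(k in block.content.lower() for k in lowered)
            continue
        if capturing:
            # Tables are handled as pictures first, as the embodiment describes.
            kind = "picture" if block.kind == "table" else block.kind
            collected.append(Block(kind, block.content))
    return collected


stream = [
    Block("title", "1 Scope of application"),
    Block("text", "Applies to residential floor slabs."),
    Block("table", "page3_table1.png"),
    Block("title", "2 Materials"),
    Block("text", "Concrete grade C30."),
]
initial_info = extract_under_titles(stream, ["scope of application", "scope"])
```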
Step S1211, if the initial identification information is first character information, taking the first character information as the first identification result.
Specifically, if the computer device determines that the initial identification information is the first character information, the first character information may be used directly as the first identification result. Optionally, the first character information may be text characters.
Optionally, after the step S1211, the method may further include the steps of:
Step S1311, if the initial identification information is a first picture, performing picture identification processing on the first picture to obtain first picture information, and taking the first picture information as the first identification result.
It should be noted that, if the initial identification information identified by the computer device is the first picture, picture identification processing may be performed on the first picture to obtain the first picture information. Optionally, the first picture information may be the character information in the picture. In addition, if the initial identification information includes both text information and a picture, the computer device may perform picture identification processing on the picture to obtain picture information, and use the text information together with the picture information as the first identification result.
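The text-or-picture branching of steps S1211 and S1311 can be sketched as a simple dispatch. The sketch assumes each item arrives as a (kind, content) pair and that picture recognition is supplied by the caller; `to_identification_result` and `fake_ocr` are hypothetical names, and `fake_ocr` is only a stand-in for a real OCR engine.

```python
from typing import Callable


def to_identification_result(
    items: list[tuple[str, str]],
    ocr: Callable[[str], str],
) -> list[str]:
    """Turn mixed (kind, content) items into text.

    Text items pass through unchanged; picture items are sent to `ocr`,
    a caller-supplied function that returns the characters found in the picture.
    """
    result: list[str] = []
    for kind, content in items:
        if kind == "text":
            result.append(content)
        elif kind == "picture":
            result.append(ocr(content))
        else:
            raise ValueError(f"unsupported item kind: {kind!r}")
    return result


def fake_ocr(image_ref: str) -> str:
    """Stand-in for a real OCR engine, used only to keep the sketch runnable."""
    return f"<characters recognized in {image_ref}>"


initial_info = [("text", "Applies to residential floor slabs."), ("picture", "fig_2.png")]
first_result = to_identification_result(initial_info, fake_ocr)
```

Passing the OCR step in as a parameter keeps the branching logic independent of whichever recognition algorithm is actually used.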
Optionally, the step of performing picture identification processing on the first picture to obtain the first picture information, and taking the first picture information as the first identification result may specifically include: performing picture identification processing on the first picture according to the atlas information in the first picture to obtain the first picture information, and taking the first picture information as the first identification result.
Specifically, the computer device may use an OCR character recognition algorithm to perform picture identification processing on the first picture according to the atlas number and atlas name located in the lower right corner (or another position) of the first picture, obtaining the character information in the first picture, that is, the first picture information. Optionally, the first picture information may be all character information recognized from the first picture.
According to the document information processing method provided by the embodiment, the computer device can identify the initial building document according to the keyword information to obtain the identification information, and then arrange the identification information according to a preset arrangement mode to obtain the target building document; the method can directly convert the uneditable files in the building field into the editable target building documents through the intelligent identification technology, so that the accuracy of the obtained editable files is higher, meanwhile, the manual error checking and correction are avoided, the file conversion efficiency is improved, and the human resources are saved.
As an embodiment, the process of performing identification processing on the initial building document according to the second keyword information in step S1012 to obtain the second identification information may be implemented by the following steps:
Step S1112, according to the second keyword information, performing identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information.
Specifically, the computer device may, based on the keyword information "rule of thumb", identify the titles containing that keyword information among all titles in all initial building documents, and then extract all paragraph content under those titles as the intermediate identification information.
Step S1122, according to third keyword information, performing identification processing on the intermediate identification information to obtain target identification information including the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different.
Specifically, the computer device may identify all paragraphs with the "applicable" keyword information from the acquired intermediate identification information according to the information with the "applicable" keyword, and extract the contents of the paragraphs as the target identification information. In this embodiment, the first keyword information, the second keyword information, and the third keyword information may be different.
In addition, the first keyword information may not be limited to information with a "scope of application" or a "scope" keyword; the second keyword information may not be limited to information with a "rule of thumb" keyword; the above-mentioned third keyword information may not be limited to information with an "applicable" keyword; the three keyword information can be determined according to the actual requirements of the user.
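The two-stage narrowing of steps S1112 and S1122 can be sketched as a title filter followed by a paragraph filter. The sketch assumes the document is available as a mapping from titles to paragraphs; `filter_paragraphs` and the sample contents are hypothetical, and both keywords are parameters because they are determined by the user's requirements.

```python
def filter_paragraphs(
    sections: dict[str, list[str]],
    title_keyword: str,
    paragraph_keyword: str,
) -> list[str]:
    """Two-stage filter: keep sections whose title contains `title_keyword`
    (the intermediate information), then keep only their paragraphs that
    contain `paragraph_keyword` (the target information)."""
    intermediate = [
        paragraph
        for title, paragraphs in sections.items()
        if title_keyword.lower() in title.lower()
        for paragraph in paragraphs
    ]
    return [p for p in intermediate if paragraph_keyword.lower() in p.lower()]


# Hypothetical parsed sections: title -> paragraphs under that title.
sections = {
    "1 Scope of application": ["Applicable to residential buildings."],
    "2 Rule of thumb": [
        "Applicable to spans up to 6 m.",
        "Slab thickness is at least 100 mm.",
        "Not applicable to cantilevers.",
    ],
}
target_info = filter_paragraphs(sections, "rule of thumb", "applicable")
```

Note that plain substring matching also keeps the "Not applicable" paragraph; whether that is desired depends on the user's requirements and is not settled by this sketch.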
Step S1132, if the target identification information is second character information, taking the second character information as the second identification result.
Specifically, if the computer device determines that the target identification information is the second character information, the second character information may be directly used as the second identification result.
Optionally, after step S1132, the method may further include: if the target identification information is a second picture, carrying out picture identification processing on the second picture to obtain second picture information, and taking the second picture information as a second identification result; wherein the second picture is different from the first picture.
It should be noted that, if the computer device determines that the target identification information is the second picture, the OCR character recognition algorithm may be adopted to perform picture recognition processing on the second picture to obtain the second picture information. Optionally, the second picture information may be all character information in the second picture.
Further, the process of performing layout processing on the identification information according to a preset layout manner in the step S102 to obtain the target building document may specifically be implemented by the following steps: and classifying, arranging and displaying the information in the first identification information and the second identification information according to the preset arrangement mode to obtain the target building document.
In this embodiment, the computer device may classify, arrange and display information in the first identification information and the second identification information according to a preset arrangement manner, so as to obtain an editable target building document.
According to the document information processing method provided by the embodiment, the uneditable document in the building field can be directly converted into the editable target building document through the intelligent identification technology, so that the accuracy of the obtained editable document is higher, meanwhile, the manual error checking and correction are avoided, the file conversion efficiency is improved, and the human resources are saved.
Fig. 2 is a schematic flowchart of a document information processing method according to another embodiment. As shown in fig. 2, the document information processing method may include:
step S201, according to first keyword information, carrying out identification processing on a title in an initial building document to obtain initial identification information corresponding to the title containing the first keyword information; wherein the initial building document is a non-editable file of building domain content;
step S202, if the initial identification information is first character information, taking the first character information as the first identification result;
step S203, if the initial identification information is a first picture, picture identification processing is carried out on the first picture according to the atlas information in the first picture to obtain first picture information, and the first picture information is used as the first identification result;
step S204, according to the second keyword information, carrying out identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information; the second keyword information and the first keyword information are different keyword information in the initial building document;
step S205, according to third keyword information, performing identification processing on the intermediate identification information to obtain target identification information containing the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
step S206, if the target identification information is second character information, taking the second character information as the second identification result;
step S207, if the target identification information is a second picture, picture identification processing is carried out on the second picture to obtain second picture information, and the second picture information is used as a second identification result; wherein the second picture is different from the first picture;
step S208, classifying, arranging and displaying the information in the first identification information and the second identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
According to the document information processing method provided by the embodiment, the uneditable document in the building field can be directly converted into the editable target building document through the intelligent identification technology, so that the accuracy of the obtained editable document is higher, meanwhile, the manual error checking and correction are avoided, the file conversion efficiency is improved, and the human resources are saved.
It should be understood that although the steps in the flowcharts of FIGS. 1-2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIGS. 1-2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
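To illustrate why the order can vary, the sketch below assumes, hypothetically, that the first and second identification steps are pure functions of the initial building document. The `Document` mapping, the function names and the sample keywords are illustrative only and do not come from the original text. Under that assumption the two identifications can run in reverse order or concurrently and still yield the same pair of results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical representation: title -> text content under that title.
Document = Dict[str, str]


def identify_by_title(document: Document, keyword: str) -> List[str]:
    """Collect the content under every title that contains the keyword."""
    return [text for title, text in document.items() if keyword in title]


def run_reversed(
    document: Document, first_kw: str, second_kw: str
) -> Tuple[List[str], List[str]]:
    """Run the second identification before the first one."""
    second = identify_by_title(document, second_kw)
    first = identify_by_title(document, first_kw)
    return first, second


def run_concurrent(
    document: Document, first_kw: str, second_kw: str
) -> Tuple[List[str], List[str]]:
    """Run both identifications in parallel worker threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(identify_by_title, document, first_kw)
        second = pool.submit(identify_by_title, document, second_kw)
        return first.result(), second.result()
```

Neither function mutates the document, which is what makes the reordering safe in this sketch; an implementation with shared mutable state would need its own ordering guarantees.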
For specific limitations of the document information processing apparatus, reference may be made to the limitations of the document information processing method above, which are not repeated here. Each module in the document information processing apparatus of the computer device may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
FIG. 3 is a schematic structural diagram of a document information processing apparatus according to an embodiment. As shown in FIG. 3, the apparatus may include: an identification module 11 and an arranging module 12.
Specifically, the identification module 11 is configured to perform identification processing on an initial building document according to the keyword information to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
the arranging module 12 is configured to arrange the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
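The two-module structure can be sketched as a pair of small classes composed into one apparatus. This is only an illustration under stated assumptions: the class names, the title-to-content dictionary standing in for the parsed non-editable file, and the plain-text output format are hypothetical and are not specified in the original.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical shape of the identification information: keyword -> entries.
IdentificationInfo = Dict[str, List[str]]


@dataclass
class IdentificationModule:
    """Counterpart of identification module 11."""
    keywords: List[str]

    def identify(self, initial_document: Dict[str, str]) -> IdentificationInfo:
        # initial_document is a hypothetical title -> content mapping.
        return {
            kw: [text for title, text in initial_document.items() if kw in title]
            for kw in self.keywords
        }


@dataclass
class ArrangingModule:
    """Counterpart of arranging module 12."""
    preset_order: List[str]

    def arrange(self, info: IdentificationInfo) -> str:
        # Emit entries keyword by keyword, following the preset order.
        lines = []
        for kw in self.preset_order:
            for entry in info.get(kw, []):
                lines.append(f"{kw}: {entry}")
        return "\n".join(lines)


@dataclass
class DocumentInformationProcessingApparatus:
    identification: IdentificationModule
    arranging: ArrangingModule

    def process(self, initial_document: Dict[str, str]) -> str:
        """Identify first, then arrange, yielding editable text."""
        info = self.identification.identify(initial_document)
        return self.arranging.arrange(info)
```

Keeping identification and arrangement in separate objects mirrors the module split in FIG. 3 and lets either side be replaced independently.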
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the identification module 11 includes: a first identification unit and a second identification unit.
The first identification unit is used for identifying the initial building document according to the first keyword information to obtain first identification information; wherein the initial building document is a non-editable file of building domain content;
the second identification unit is used for identifying the initial building document according to second keyword information to obtain second identification information; the second keyword information and the first keyword information are different keyword information in the initial building document.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the first identification unit includes: a first identifying subunit and a first determining subunit.
The first identifying subunit is configured to perform identification processing on the title in the initial building document according to the first keyword information to obtain initial identification information corresponding to the title containing the first keyword information;
and the first determining subunit is configured to, when the initial identification information is first text information, take the first text information as the first identification result.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the first identification unit further includes: a first picture identifying subunit.
Specifically, the first picture identifying subunit is configured to, if the initial identification information is a first picture, perform picture identification processing on the first picture to obtain first picture information, and use the first picture information as the first identification result.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the first picture identifying subunit is specifically configured to perform picture identification processing on the first picture according to the atlas information in the first picture to obtain the first picture information, and use the first picture information as the first identification result.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the second identification unit includes: a second identifying subunit, a third identifying subunit, and a second determining subunit.
The second identifying subunit is configured to identify, according to the second keyword information, the title in the initial building document to obtain intermediate identification information corresponding to the title that includes the second keyword information;
the third identifying subunit is configured to perform identification processing on the intermediate identifying information according to third keyword information to obtain target identifying information including the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
and the second determining subunit is configured to, when the target identification information is second text information, take the second text information as the second identification result.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the second identification unit further includes: a second picture identifying subunit.
The second picture identifying subunit is configured to, when the target identification information is a second picture, perform picture identification processing on the second picture to obtain second picture information, and use the second picture information as the second identification result; wherein the second picture is different from the first picture.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the arranging module 12 is specifically configured to classify, arrange and display the information in the identification information according to the preset arranging manner, so as to obtain the target building document.
The document information processing apparatus provided in this embodiment may execute the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
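The classify, arrange and display behaviour attributed to the arranging module 12 can be sketched as three small functions. The `Record` tuple, the list of preset categories, the alphabetical ordering inside each category, and the bracketed text layout are all assumptions made for this example; the original does not define the preset arrangement mode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical identified item: (category, field, value).
Record = Tuple[str, str, str]
Grouped = Dict[str, List[Tuple[str, str]]]


def classify(records: Iterable[Record]) -> Grouped:
    """Group identified items by category."""
    grouped = defaultdict(list)
    for category, field, value in records:
        grouped[category].append((field, value))
    return dict(grouped)


def arrange(
    grouped: Grouped, preset_categories: List[str]
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Order categories by the preset list; unknown ones follow alphabetically."""
    known = [c for c in preset_categories if c in grouped]
    extra = sorted(c for c in grouped if c not in preset_categories)
    # Assumption: fields inside a category are sorted alphabetically.
    return [(c, sorted(grouped[c])) for c in known + extra]


def display(arranged: List[Tuple[str, List[Tuple[str, str]]]]) -> str:
    """Render the arranged items as editable plain text."""
    lines = []
    for category, fields in arranged:
        lines.append(f"[{category}]")
        lines.extend(f"  {field} = {value}" for field, value in fields)
    return "\n".join(lines)
```

A call such as `display(arrange(classify(records), ["Overview", "Fire protection"]))` would place the "Overview" entries first, then "Fire protection", then any remaining categories.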
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external computer device through a network connection. The computer program, when executed by the processor, implements a document information processing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 4 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
In one embodiment, a readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (12)

1. A document information processing method, characterized by comprising:
according to the keyword information, carrying out identification processing on the initial building document to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
2. The method according to claim 1, wherein the carrying out identification processing on the initial building document according to the keyword information to obtain identification information comprises:
according to the first keyword information, carrying out identification processing on the initial building document to obtain first identification information; wherein the initial building document is a non-editable file of building domain content;
according to second keyword information, carrying out identification processing on the initial building document to obtain second identification information; the second keyword information and the first keyword information are different keyword information in the initial building document.
3. The method according to claim 2, wherein the identifying the initial building document according to the first keyword information to obtain the first identification information comprises:
according to the first keyword information, carrying out identification processing on the title in the initial building document to obtain initial identification information corresponding to the title containing the first keyword information;
and if the initial identification information is first text information, taking the first text information as the first identification result.
4. The method of claim 3, further comprising:
if the initial identification information is a first picture, picture identification processing is carried out on the first picture to obtain first picture information, and the picture information is used as the first identification result.
5. The method according to claim 4, wherein the performing picture identification processing on the first picture to obtain the first picture information, and taking the picture information as the first identification result includes:
and according to the atlas information in the first picture, carrying out picture identification processing on the first picture to obtain the first picture information, and taking the picture information as the first identification result.
6. The method according to claim 2, wherein the identifying the initial building document according to the second keyword information to obtain second identification information comprises:
according to the second keyword information, performing identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information;
according to third keyword information, carrying out identification processing on the intermediate identification information to obtain target identification information containing the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
and if the target identification information is second text information, taking the second text information as the second identification result.
7. The method of claim 6, further comprising:
if the target identification information is a second picture, carrying out picture identification processing on the second picture to obtain second picture information, and taking the second picture information as a second identification result; wherein the second picture is different from the first picture.
8. The method according to claim 1, wherein the arranging the first identification information and the second identification information according to a preset arranging manner to obtain the target building document comprises:
and classifying, arranging and displaying the information in the first identification information and the second identification information according to the preset arrangement mode to obtain the target building document.
9. A document information processing method, characterized by comprising:
according to the first keyword information, carrying out identification processing on the title in the initial building document to obtain initial identification information corresponding to the title containing the first keyword information; wherein the initial building document is a non-editable file of building domain content;
if the initial identification information is first text information, taking the first text information as the first identification result;
if the initial identification information is a first picture, carrying out picture identification processing on the first picture according to the atlas information in the first picture to obtain first picture information, and taking the first picture information as the first identification result;
according to the second keyword information, performing identification processing on the title in the initial building document to obtain intermediate identification information corresponding to the title containing the second keyword information; the second keyword information and the first keyword information are different keyword information in the initial building document;
according to third keyword information, carrying out identification processing on the intermediate identification information to obtain target identification information containing the third keyword information; wherein the third keyword information, the second keyword information and the first keyword information are different;
if the target identification information is second text information, taking the second text information as a second identification result;
if the target identification information is a second picture, carrying out picture identification processing on the second picture to obtain second picture information, and taking the second picture information as the second identification result; wherein the second picture is different from the first picture;
classifying, arranging and displaying the information in the first identification information and the second identification information according to the preset arrangement mode to obtain the target building document; wherein the target building document is an editable file obtained from the initial building document.
10. A document information processing apparatus, characterized in that the apparatus comprises:
the identification module is used for identifying the initial building document according to the keyword information to obtain identification information; wherein the initial building document is a non-editable file of building domain content;
the arranging module is used for arranging the identification information according to a preset arrangement mode to obtain a target building document; wherein the target building document is an editable file obtained from the initial building document.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 9 when executing the computer program.
12. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010076288.6A CN113158655A (en) 2020-01-23 2020-01-23 Document information processing method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010076288.6A CN113158655A (en) 2020-01-23 2020-01-23 Document information processing method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113158655A true CN113158655A (en) 2021-07-23

Family

ID=76881942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010076288.6A Pending CN113158655A (en) 2020-01-23 2020-01-23 Document information processing method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113158655A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853246A (en) * 2010-06-14 2010-10-06 深圳市万兴软件有限公司 Method and device for converting document format
CN109543690A (en) * 2018-11-27 2019-03-29 北京百度网讯科技有限公司 Method and apparatus for extracting information
CN109783787A (en) * 2018-12-29 2019-05-21 远光软件股份有限公司 A kind of generation method of structured document, device and storage medium
CN110321470A (en) * 2019-05-23 2019-10-11 平安科技(深圳)有限公司 Document processing method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109712218B (en) Electronic book note processing method, handwriting reading equipment and storage medium
CN109815333B (en) Information acquisition method and device, computer equipment and storage medium
US8213719B2 (en) Editing 2D structures using natural input
US8838657B1 (en) Document fingerprints using block encoding of text
CN104111922A (en) Processing method and device of streaming document
US20140212040A1 (en) Document Alteration Based on Native Text Analysis and OCR
WO2016018683A1 (en) Image based search to identify objects in documents
CN111460131A (en) Method, device and equipment for extracting official document abstract and computer readable storage medium
CN113515928B (en) Electronic text generation method, device, equipment and medium
US20170139875A1 (en) Converting electronic documents having visible objects
CN113076731A (en) Report file generation method and device, computer equipment and storage medium
CN110705207A (en) Document display method and device, computer equipment and storage medium
CN111552903A (en) Page generation method and device based on HTML (Hypertext markup language) template and computer equipment
US20150139547A1 (en) Feature calculation device and method and computer program product
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN115147013B (en) Insurance product readability calculating method, apparatus, computer device and storage medium
CN113158655A (en) Document information processing method and device, computer equipment and readable storage medium
US20150347376A1 (en) Server-based platform for text proofreading
JP2014203406A (en) Control device, control method, and control program
CN111046241B (en) Graph storage method and device for flow graph processing
JP7430219B2 (en) Document information structuring device, document information structuring method and program
CN112347738B (en) Bidirectional encoder characterization quantity model optimization method and device based on referee document
CN113268968A (en) Report file generation method and device, computer equipment and storage medium
CN115758995A (en) Document data labeling method and device, computer equipment and storage medium
CN112749294A (en) Page hidden file identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723