CN113468861A - Method and device for automatically generating abstract document - Google Patents

Method and device for automatically generating abstract document

Info

Publication number
CN113468861A
Authority
CN
China
Prior art keywords
file
source files
template
template file
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110652875.XA
Other languages
Chinese (zh)
Other versions
CN113468861B (en)
Inventor
陈霖
石杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bayer AG
Bayer Healthcare LLC
Original Assignee
Bayer AG
Bayer Healthcare LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bayer AG, Bayer Healthcare LLC
Publication of CN113468861A
Application granted
Publication of CN113468861B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/16 Automatic learning of transformation rules, e.g. from examples
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/137 Hierarchical processing, e.g. outlines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A method and apparatus for automatically generating a summary document are disclosed. The method comprises the following steps: determining an SD template file and a plurality of separate source files for generating an SD file of a target product, the SD file for providing an overview of detailed information described in the plurality of separate source files; and automatically generating an SD file by using an SD generation engine, wherein the SD generation engine uses a machine learning model, and the machine learning model inputs the SD template file and the plurality of individual source files and outputs the SD file.

Description

Method and device for automatically generating abstract document
Technical Field
The present disclosure relates to a method and apparatus for automatically generating a Summary Document (SD).
Background
During new drug development, companies prepare a set of files called CTDs (Common Technical Documents) for regulatory filing purposes. These documents contain technical information on the drug substance and drug product to be reviewed by authorities and approved for clinical trials. Summary Documents (SDs) are often required in order to provide an overview of the detailed information described in the individual source CTDs (raw material). The current approach is to prepare the SD manually by transferring TLFs (tables, lists and graphs) from the source material, converting the data to different formats, and editing the language accordingly. This method is time consuming and lacks standardization. It also introduces user-to-user variation, which presents challenges for subsequent review and approval activities.
It is therefore an object of the present disclosure to provide a TAG (CTD auto-generator) tool to automate this process by automatically performing the required actions, e.g. by web-based software (machine).
Disclosure of Invention
The present disclosure provides techniques for automatically generating a Summary Document (SD).
In one example aspect, a method of automatically generating a Summary Document (SD) is disclosed. The method includes determining an SD template file and a plurality of separate source files for generating an SD file of a target product, the SD file for providing an overview of detailed information described in the plurality of separate source files; and automatically generating the SD file by using an SD generation engine, wherein the SD generation engine uses a machine learning model, and the machine learning model takes the SD template file and the plurality of separate source files as input and outputs the SD file.
According to an embodiment, in response to a request to generate an SD file of a target product, an SD template file and a plurality of separate source files for generating the SD file of the target product are determined, wherein the plurality of separate source files comprise a collection of Common Technical Documents (CTDs).
According to an embodiment, in response to uploading a set of Common Technical Documents (CTDs) to a predetermined location, an SD template file and a plurality of separate source files for generating an SD file of a target product are determined, wherein the Common Technical Documents (CTDs) serve as the plurality of separate source files.
According to an embodiment, the machine learning model is based on a user requirement table (URS), wherein the URS comprises a plurality of items, each item defining an operation between one of a plurality of separate source files and the SD template file, or defining an operation on the SD template file.
According to an embodiment, each item further includes one or more pieces of information indicating a target chapter in the SD template file, a destination location in the SD template file, an identification of the source file, content to be operated on in the source file, and a variable.
According to an embodiment, based on the information in the items, the SD generation engine automatically performs the operations defined in the plurality of items in item order.
According to an embodiment, the SD generation engine performs a first type of operation defined in a first portion of the plurality of items to automatically transfer content in the source file to a specified location of the SD template file in an original format of the content based on information in the items.
According to an embodiment, the content includes at least one of tables, lists, and graphs, flow charts, text, and headers in the source file.
According to an embodiment, the SD generation engine performs a second type of operation defined in a second portion of the plurality of items to automatically convert the content of the source file into the SD template file by adapting at least one of a destination location, a format, and a style of the SD template file based on information in the items.
According to an embodiment, the content includes at least one of tables, lists, and graphs, flow charts, text, and headers in the source file.
According to an embodiment, the converting includes converting content from at least one of tables, lists and graphs, flow charts, text, and headers in the source file to statements in the SD template file.
According to an embodiment, the SD generation engine performs a third type of operation defined in a third portion of the plurality of items to automatically edit content in the SD template file based on information in the items.
According to an embodiment, editing includes deleting, replacing, and changing the color or format of the content.
According to an embodiment, editing comprises editing the content in the SD template file based on a determination within its logic.
According to an embodiment, the method further comprises outputting the SD template file as an SD file after performing all operations defined in the plurality of items.
According to an embodiment, a machine learning model is generated from training data that includes rules defined by a pharmaceutical professional.
According to an embodiment, the method further comprises providing a user interface to receive the request and output the SD file.
According to an embodiment, an SD template file and a plurality of separate source files are determined by input from a user interface.
According to an embodiment, the operation is performed visually through a user interface.
According to an embodiment, the target product is a drug product, and the CTD comprises technical information of the drug substance and the drug product to be reviewed by an authority and approved for clinical trials.
According to an embodiment, the user interface includes a checkbox for receiving additional information from a user, and the SD generation engine uses the additional information to edit content in the SD template file.
According to an embodiment, the target product comprises two or more components, and the plurality of source files related to the two or more components have the same file name.
According to an embodiment, the SD generation engine automatically identifies to which component each source file belongs by identifying a plurality of source files having the same file name.
According to an embodiment, the plurality of source files having the same file name are uploaded to the same folder without additional information.
According to an embodiment, in response to uploading multiple source files of two or more starting materials of a target product to a target location, the SD generation engine inserts additional statements and titles into the SD template file.
In another example aspect, an apparatus for automatically generating a Summary Document (SD) is disclosed. The apparatus includes one or more processors; and one or more storage devices storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations of the respective methods as described herein.
In another example aspect, a computer storage medium is disclosed that stores instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method as described herein.
With the method and system of the present disclosure, not only is the efficiency of SD writing improved, but the standardization of document quality is also improved, because the personal writing habits typically observed in human authors are eliminated.
The details of one or more implementations are set forth in the accompanying drawings, and the description below. Other features will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 illustrates the concept of CTD auto generator (TAG) of the present disclosure.
FIG. 2 illustrates an example method of automatically generating a Summary Document (SD).
FIG. 3 shows an example of a ready-to-write window of the TAG.
Fig. 4 shows an example of a content transfer operation.
Fig. 5 shows an example of a content conversion operation.
Fig. 6 shows an example of a content editing operation.
Fig. 7 shows an example of a writing window of the TAG.
FIG. 8 shows an example of multi-select check boxes of the TAG.
Fig. 9 shows an example of two-bottle system template editing by the TAG.
FIG. 10 shows a block diagram of an apparatus for implementing the method for automatically generating a Summary Document (SD) described in the present disclosure.
Detailed Description
The present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the disclosure are shown. This disclosure may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein. Accordingly, while the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intention to limit the disclosure to the specific forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the claims. Like numbers refer to like elements throughout the description of the figures.
Fig. 1 illustrates the concept of a CTD auto generator (TAG) in the present disclosure. As shown in the upper part of fig. 1, in the current approach, when a company prepares a collection of CTDs (Common Technical Documents) for regulatory submission purposes, in the first stage an author needs to find multiple source files, extract multiple tables, lists and graphs (TLFs) from some source files (e.g., p.1.02, p.3.1.01, p.5.1.01, etc.), and manually provide the TLFs to the CTD draft. Furthermore, the author also needs to perform content conversion and editing on the TLFs from other source files and manually provide the edited TLFs to the CTD draft. Then, in the second stage, the author needs to provide additional manual input and adjustments (e.g., stability data) in order to generate a Summary Document (SD) for regulatory submission purposes.
The first stage accounts for 70-80% of the workload of the overall process, and the second stage accounts for 20-30%. Obviously, such an approach is time consuming and lacks standardization. Furthermore, different authors have different writing habits, which presents challenges to subsequent review and approval activities.
On the other hand, as shown in the lower part of fig. 1, the TAG in the present application may extract TLFs from a plurality of source files and provide the TLFs to the CTD template according to a predetermined rule in order to automatically generate a CTD draft (e.g., t.50.20 draft). Then, in the second stage, the author provides additional manual input and adjustments (e.g., stability data) in order to generate a Summary Document (SD) for regulatory submission purposes. With the TAG, the manual workload of the first stage can be avoided. Furthermore, with CTD templates and the TAG, the standardization of document quality can be improved and the personal writing habits typically observed in human authors can be eliminated.
In one example, the TAG may be opened on a remote server through a web page link. When the user opens that web page link, the TAG is activated.
In another example, the TAG may be operated by running software at the local client computer.
FIG. 2 illustrates an example method of automatically generating a Summary Document (SD) by the TAG. The method 200 includes, in step 202, determining an SD template file and a plurality of individual source files for generating an SD file of a target product, the SD file for providing an overview of detailed information described in the plurality of individual source files; and in step 204, automatically generating the SD file by using an SD generation engine, wherein the SD generation engine uses a machine learning model, and the machine learning model takes the SD template file and the plurality of individual source files as input and outputs the SD file.
In the present disclosure, the target product is a drug product, and the CTD contains technical information of the drug substance and drug product to be reviewed by authorities and approved for clinical trials.
Specifically, in step 202, first, TAG determines an SD template file (e.g., t.50.20 draft) and a plurality of individual source files (e.g., p.1.02, p.3.1.01, p.5.1.01, etc.) for generating an SD file of a target product.
The TAG may be implemented as web-based software (machine) and run on a server. The TAG has a web-based interface through which the user's administrative rights, the source CTDs and the writing of the SD can be managed.
In one embodiment, in response to a request to generate an SD file for a target product, the TAG may determine an SD template file and a plurality of separate source files for generating the SD file for the target product, wherein the plurality of separate source files comprise a set of Common Technical Documents (CTDs). The SD file is used to provide an overview of the detailed information described in the plurality of separate source files.
For example, as shown in FIG. 3, which shows an example of a ready-to-write window of the TAG, a user may select a template file through the interface. Further, the interface may provide multiple source file folders to select from. The source files for a product are stored in a folder. Each "source file name" folder corresponds to a product and stores, in advance, a set of Common Technical Documents (CTDs) related to that product. The user may select the folder for the target product.
For example, when the user selects one template file and the folder "3527964" through the interface, the TAG determines the selected template file as the SD template file and determines the individual source files in the selected folder as the plurality of individual source files for generating the SD file of the target product. The plurality of individual source files comprises a collection of Common Technical Documents (CTDs).
In another embodiment, the TAG may automatically determine an SD template file and a plurality of separate source files for generating an SD file of the target product in response to uploading a set of Common Technical Documents (CTDs) to a predetermined location, wherein the Common Technical Documents (CTDs) are used as the plurality of separate source files.
For example, when a user uploads a set of Common Technical Documents (CTDs) to a folder (e.g., folder "3527964"), the TAG is triggered to determine the SD template file, and the uploaded set of Common Technical Documents (CTDs) is determined as the plurality of separate source files for generating the SD file of the target product.
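The folder-based determination of source files described above can be pictured with a minimal sketch. It assumes, purely for illustration, that the source CTDs are Word files (.doc or .docx) stored directly in the product folder; the function name collect_source_files is hypothetical and does not come from the disclosure.

```python
from pathlib import Path
from typing import List


def collect_source_files(folder: Path) -> List[Path]:
    """Return the Word files found directly in a product folder, sorted by path.

    Assumption: source CTDs are .doc/.docx files; other files are ignored.
    """
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in {".doc", ".docx"}
    )
```

Calling this function on the folder selected by the user (or on the folder that just received an upload) would yield the list of individual source files passed on to the generation step.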
After the SD template file and the plurality of individual source files are determined, the TAG automatically generates an SD file by using the SD generation engine in step 204. That is, the TAG may be configured to include an SD generation engine, which may use a machine learning model. The machine learning model takes the SD template file and the plurality of separate source files as input, and outputs an SD file of the target product.
Returning to fig. 1, when multiple source CTDs and t.50.20 templates are input into the machine learning model, the machine learning model may output the t.50.20 draft as an SD file.
In one embodiment, the machine learning model is based on a URS (user requirements table). The URS defines the detailed operations to be taught to the TAG. The URS includes a plurality of items, and each item defines an operation between one of the plurality of separate source files and the SD template file, or an operation on the SD template file. In one embodiment, different products may share the same URS. In another embodiment, different products may have different URSs.
In one example, a machine learning model may be generated from training data that includes rules defined by a pharmaceutical professional.
An example of a URS is shown in table 1 below.
[Table 1 appears as four images in the original publication (BDA0003112421990000061, BDA0003112421990000071, BDA0003112421990000081, BDA0003112421990000091); its contents are not reproduced here.]
Table 1: URS of TAG
As shown in table 1, the URS includes 59 items and each item includes one or more parameters selected from a parameter set including a reference template section, a source CTD document, a step, a TLF to be copied, a destination in the template, variables, and actions.
The reference template section indicates the relevant section in the SD template.
The source CTD file indicates a source CTD to be operated.
The steps indicate the order of the separate actions of the SD writing process to be performed by the TAG. For example, the process may be divided into 85 steps. The TAG performs actions in the order indicated by the parameter "step".
The TLF to be copied indicates the location and content of the TLF in the source file to be copied to the SD template.
The destination in the template indicates the destination location in the SD template to which the TLF in the source file is to be copied.
The variables indicate variable parameters such as date, product information, etc.
The action indicates a particular operation to be performed by the TAG.
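As a rough illustration of the parameter set listed above, one URS item could be modeled as a small record. This is only a sketch: the class name UrsItem and its field names are hypothetical, and the example values are taken from the first-item example in this description, with the source file given only by its section name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UrsItem:
    """One URS item; field names are illustrative, mirroring the parameters above."""
    step: int                               # order in which the action is performed
    action: str                             # operation to be performed by the TAG
    template_section: Optional[str] = None  # reference template section
    source_ctd: Optional[str] = None        # source CTD file to operate on
    tlf_to_copy: Optional[str] = None       # location/content of the TLF in the source
    destination: Optional[str] = None       # destination location in the SD template
    variable: Optional[str] = None          # variable parameter (date, product info, ...)


# Hypothetical encoding of the first item discussed in this description.
first_item = UrsItem(
    step=1,
    action="replace",
    template_section="entire file",
    source_ctd="P.1.01",
    tlf_to_copy="left header",
    destination="COM 123456 coated tablet 25 mg",
    variable="COM 123456 coated tablet 25 mg",
)
```

Optional fields reflect the statement that each item includes one or more of the parameters rather than all of them.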
The operations defined in the URS mainly include three types of operations. The first type of operation is content transfer. The TAG automatically copies and transfers content (e.g., tables, lists and graphs, flow charts, text, and headers in the source file) from the source CTD to the specified location in the SD template without editing. During this process, the format of the copied content is kept unchanged.
Fig. 4 shows an example of a content transfer operation. As shown in fig. 4, when content transfer is performed, "COM 654321" in the header of the source CTD p.1.01 is copied to the header of the SD template without editing. The format of the content "COM 654321" is unchanged.
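A content transfer of this kind can be sketched as a verbatim copy from a source structure into a template structure. The sketch below assumes, for illustration only, that documents are represented as dictionaries and that formatting is carried as an attribute ("bold" here); neither the representation nor the function name transfer_content comes from the disclosure.

```python
import copy


def transfer_content(source: dict, template: dict, source_key: str, dest_key: str) -> dict:
    """Copy source[source_key] into a copy of the template at dest_key, unedited."""
    result = copy.deepcopy(template)
    # The content and its formatting attributes are copied as-is.
    result[dest_key] = copy.deepcopy(source[source_key])
    return result


# Hypothetical in-memory stand-ins for source CTD P.1.01 and the SD template.
source_ctd = {"header": {"text": "COM 654321", "bold": True}}
sd_template = {"header": {"text": "", "bold": False}, "body": "..."}

sd_draft = transfer_content(source_ctd, sd_template, "header", "header")
```

Working on a deep copy keeps the original template untouched, so the same template can be reused for other products.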
The second type of operation is content conversion. The TAG may also convert content from the source CTD to the SD template between different formats, e.g., convert content from a table to a statement. The TAG may perform content conversion to automatically convert content in the source file into the SD template file by adapting at least one of a destination location, format, and style of the SD template file based on the parameters in the item.
The content includes at least one of tables, lists, and graphs, flowcharts, text, and headers in the source file. The conversion operation includes converting contents of tables, lists, diagrams, flowcharts, text, headers, and sentences in the source file into contents having different formats.
In data conversion, the transferred data is adapted and changed to a different format, e.g., data to table, table to sentence, sentence to table, etc.
Fig. 5 shows an example of a content conversion operation. As shown in fig. 5, the information in the table of the source CTD is scattered and needs to be converted into statements. When content conversion is performed, the information in the table of the source CTD is converted and populated into the corresponding destination location of the statement in the SD template. Thus, a new statement with the information of the table is generated.
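The table-to-statement conversion can be illustrated with a simple fill-in pattern. This is a sketch under stated assumptions: the table row, the column names and the sentence pattern below are all hypothetical, and the disclosure does not specify how the conversion is implemented.

```python
def table_to_statement(row: dict, pattern: str) -> str:
    """Fill a sentence pattern from the SD template with values from one table row."""
    return pattern.format(**row)


# Hypothetical table row extracted from a source CTD.
table_row = {"name": "COM xxxx", "form": "coated tablet", "strength": "25 mg"}

# Hypothetical sentence pattern held at the destination location in the SD template.
pattern = "{name} is supplied as a {form} containing {strength} of drug substance."

statement = table_to_statement(table_row, pattern)
```

Each placeholder in the pattern plays the role of a destination location in the template statement.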
The third type of operation is editing. The TAG can not only edit the SD template by simple actions such as deleting, replacing, and changing the color or format of the content, but also make a decision within its logic to edit the SD template file for each specific drug product.
Fig. 6 shows an example of a content editing operation. For example, as shown in fig. 6, the TAG will search the source CTD for three specific excipients, namely magnesium stearate, sodium lauryl sulfate and lactose monohydrate, which indicate the pharmaceutical formulation, and edit the SD content accordingly.
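The excipient-driven edit can be sketched as a search followed by a conditional deletion. The disclosure does not state which edit is made once the excipients are found; the rule below (drop template lines that mention an excipient absent from the source) is an assumption chosen only for illustration, as are the function names and sample texts.

```python
from typing import List

EXCIPIENTS = ("magnesium stearate", "sodium lauryl sulfate", "lactose monohydrate")


def find_excipients(source_text: str) -> List[str]:
    """Return the listed excipients mentioned in the source text (case-insensitive)."""
    lowered = source_text.lower()
    return [name for name in EXCIPIENTS if name in lowered]


def edit_template(template_lines: List[str], found: List[str]) -> List[str]:
    """Assumed rule: drop template lines mentioning an excipient absent from the source."""
    absent = [name for name in EXCIPIENTS if name not in found]
    return [
        line for line in template_lines
        if not any(name in line.lower() for name in absent)
    ]


# Hypothetical source text and template lines.
source_text = "The tablet core contains Magnesium Stearate and lactose monohydrate."
template_lines = [
    "The formulation contains magnesium stearate.",
    "The formulation contains sodium lauryl sulfate.",
    "The formulation contains lactose monohydrate.",
]

found = find_excipients(source_text)
edited = edit_template(template_lines, found)
```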
Thus, when the TAG is triggered to generate an SD file, the TAG will load the URS and automatically perform the operations defined in the plurality of items of the URS in the sequence of items based on the information in the items.
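The ordered execution of URS items can be sketched as a loop that sorts the items by step and dispatches each one to a handler for its action. This is a simplified, rule-based illustration and not the engine of the disclosure; the item keys ("step", "action", "old", "new"), the handler table and the sample template text are hypothetical.

```python
from typing import Callable, Dict, List


def run_items(items: List[dict], template: str,
              handlers: Dict[str, Callable[[str, dict], str]]) -> str:
    """Apply each item's action to the template text, in ascending step order."""
    for item in sorted(items, key=lambda it: it["step"]):
        template = handlers[item["action"]](template, item)
    return template


# Two hypothetical actions operating on plain text.
handlers = {
    "replace": lambda text, it: text.replace(it["old"], it["new"]),
    "delete": lambda text, it: text.replace(it["old"], ""),
}

# Items are deliberately listed out of order to show that "step" decides the sequence.
items = [
    {"step": 2, "action": "delete", "old": " [draft]"},
    {"step": 1, "action": "replace", "old": "PRODUCT", "new": "COM 654321"},
]

sd_file = run_items(items, "Summary for PRODUCT [draft]", handlers)
```

Keeping the actions in a handler table means new operation types can be added without changing the loop.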
For example, when the TAG is triggered to generate an SD file, the TAG will load the URS and automatically perform the operations defined by the first item of the URS. The first item of URS is as follows:
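[The first item of the URS appears as an image in the original publication (BDA0003112421990000111); its parameters are described below.]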
The reference template section in the first item indicates that the relevant section in the SD template is the entire file.
The source CTD document in the first item indicates that the source CTD is p.1.01.xxxxxxxxx_0x.docx.
The step in the first item indicates that it is the first step to be performed.
The TLF to be copied indicates that the position of the TLF to be copied in the source file is the left header, and that the content of the TLF to be copied is "COM xxxx coated tablet … mg".
The destination in the template indicates that the destination location in the SD template is "COM 123456 coated tablet 25 mg". That is, "COM xxxx coated tablet … mg" in p.1.01.xxxxxxxxx_0x.docx is copied to the location of "COM 123456 coated tablet 25 mg" in the SD template.
The variable in the first item is "COM 123456 coated tablet 25 mg".
The action in the first item is to replace with the copied content "COM xxxx coated tablet … mg". That is, the TAG will replace "COM 123456 coated tablet 25 mg" in the SD template with "COM xxxx coated tablet … mg" from p.1.01.xxxxxxxxx_0x.docx.
Thus, the TAG opens the source file p.1.01.xxxxxxxxx_0x.docx, copies "COM xxxx coated tablet … mg" from the left header of that source file, and replaces "COM 123456 coated tablet 25 mg" in the SD template with the copied "COM xxxx coated tablet … mg".
After the TAG performs the operations defined in the first entry, the TAG will automatically perform the operations defined in the second entry in the sequence.
Similarly, the TAG will automatically perform the operations defined in the subsequent items in sequence. For example, as defined in step 33, the TAG will delete the text "no animal derived material used in pharmaceutical manufacturing process" in section 4.5. If the check box is yes, then the TAG will retain the statement. Otherwise, the TAG will delete the statement as defined in step 55.
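The check-box-dependent handling of a statement can be sketched as a small conditional edit. The sketch assumes that a section is held as a list of text lines and that the check box value arrives as a boolean; the function name and the neighboring sample line are hypothetical.

```python
from typing import List

STATEMENT = "no animal derived material used in pharmaceutical manufacturing process"


def apply_checkbox_rule(section_lines: List[str], statement: str,
                        checkbox_yes: bool) -> List[str]:
    """Keep the statement when the check box is yes; otherwise remove it."""
    if checkbox_yes:
        return list(section_lines)
    return [line for line in section_lines if line != statement]


# Hypothetical content of section 4.5 in the SD template.
section_4_5 = ["Materials of animal origin are assessed below.", STATEMENT]

kept = apply_checkbox_rule(section_4_5, STATEMENT, checkbox_yes=True)
removed = apply_checkbox_rule(section_4_5, STATEMENT, checkbox_yes=False)
```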
In one example, the operations are performed visually through a user interface.
As illustrated in fig. 7, the user interface may display the source CTD that has been processed, as well as the written information of the chapter in the source CTD and SD templates that are being processed.
After the TAG performs all the operations defined in the URS, the TAG outputs the SD template file as an SD file. The SD file may be output through a user interface, for example.
FIG. 8 shows an example of multi-select check boxes of the TAG. As shown in fig. 8, the user interface may also provide one or more check boxes for receiving additional information from the user, such as whether the target product is micronized, whether the administration is oral, by injection or by inhalation, the maximum daily dose of the drug, etc. In addition, the SD generation engine may use the additional information received through the check boxes to edit the content in the SD template file and/or insert such additional information into the SD template file.
Fig. 9 shows an example of two-bottle system template editing of TAG.
In some cases, the target product may include two or more components. For example, a pharmaceutical product may include two components, one component being a solvent and the other component being a solute. The two components are packaged in separate bottles, one containing solvent and one containing solute.
Because the two or more components are related to the same target product, the multiple source files related to the two or more components have the same file name (e.g., file name P.1.01 related to the first product, file name P.1.02 related to the second product, etc. in FIG. 9). The source files related to different components therefore cannot be distinguished by file name alone.
The SD generation engine can automatically identify to which component each source file belongs by identifying the plurality of source files (e.g., p.1.01) having the same file name, e.g., p.1.01#006264372_02.doc is a source file belonging to the bottle containing the solvent, and p.1.01#017173489_01.docx is a source file belonging to the bottle containing the solute.
Further, the user can upload source files for two bottles (e.g., source files p.1.01#006264372_02.doc and p.1.01#017173489_01.docx with the same file name) to the same folder (e.g., p.1.01) without additional information.
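One way to picture the identification of same-named source files is to group the uploaded files by the part of the name before "#". The disclosure does not say how the engine tells the components apart; the sketch below assumes, for illustration only, that file names follow the pattern seen in the examples (section name, "#", a numeric document identifier, "_", a version, and a .doc or .docx extension) and that the document identifier distinguishes the components.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed naming pattern, inferred from the two example names only.
NAME_RE = re.compile(
    r"^(?P<section>[^#]+)#(?P<doc_id>\d+)_(?P<version>\d+)\.docx?$",
    re.IGNORECASE,
)


def group_by_section(file_names: List[str]) -> Dict[str, List[str]]:
    """Map each shared file name (section) to the document identifiers found for it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in file_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # names that do not follow the assumed pattern are skipped
        groups[match.group("section")].append(match.group("doc_id"))
    return dict(groups)


uploaded = ["p.1.01#006264372_02.doc", "p.1.01#017173489_01.docx"]
components = group_by_section(uploaded)
```

A section that maps to more than one identifier signals a multi-component product, such as the two-bottle system.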
The TAG can also edit templates for products with a variety of starting materials. For example, certain target products may employ two or more starting materials. If a user uploads multiple source files of two or more starting materials of a target product to a target location, the SD generation engine may insert additional statements and titles into the SD template file. For example, when a user uploads multiple source files of two starting materials, the original SD template file may be modified to insert additional statements and titles to describe the associated starting materials.
FIG. 10 shows a block diagram of an apparatus for implementing the method of automatically generating a Summary Document (SD) described in the present disclosure.
The apparatus 800 may be embodied as a smartphone, tablet, computer, server, or the like. The apparatus 800 may include one or more processors 802 and one or more memories 804. The processor(s) 802 may be configured to implement one or more methods described in this document. The memory(s) 804 may be used to store data and instructions for performing the methods and techniques described herein.
During the preliminary verification run, the TAG enabled automatic writing of summary documents with an error rate of only 1%. For the writing of one SD, the time saved is 20 working hours on average.
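The insertion of additional titles and statements for multiple starting materials can be sketched as follows. The wording of the inserted title and the bracketed statement placeholder are hypothetical, as is the choice to append them at the end of the template; the disclosure only states that additional statements and titles are inserted.

```python
from typing import List


def expand_for_starting_materials(template_lines: List[str],
                                  materials: List[str]) -> List[str]:
    """Append one title and one placeholder statement per starting material.

    The template is returned unchanged when fewer than two materials are given.
    """
    if len(materials) < 2:
        return list(template_lines)
    extra: List[str] = []
    for index, name in enumerate(materials, start=1):
        extra.append(f"Starting material {index}: {name}")
        extra.append(f"[statement describing {name}]")
    return list(template_lines) + extra


expanded = expand_for_starting_materials(["Intro"], ["A", "B"])
```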
From the foregoing, it will be appreciated that, although specific embodiments of the technology of the disclosure have been described herein for purposes of illustration, various modifications may be made without deviating from the scope of the disclosure. Accordingly, the techniques of this disclosure are not limited, except as by the appended claims.
Implementations of the subject matter and the functional operations described in this patent document may be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a calculator, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the referenced computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language file), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer does not necessarily have these devices. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices). The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The specification and drawings are to be regarded as exemplary only, where "exemplary" means serving as an example. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, the use of "or" is also intended to include "and/or" unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although certain features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the division of various system components in the embodiments described in this patent document should not be construed as requiring such division in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements, and variations can be made based on what is described and illustrated in this patent document.

Claims (27)

1. A method of automatically generating a summary document (SD), comprising:
determining an SD template file and a plurality of separate source files for generating an SD file of a target product, the SD file for providing an overview of detailed information described in the plurality of separate source files; and
automatically generating the SD file by using an SD generation engine, wherein the SD generation engine uses a machine learning model, and the machine learning model takes the SD template file and the plurality of separate source files as input and outputs the SD file.
2. The method of claim 1, wherein the SD template file and the plurality of separate source files used to generate the SD file of the target product are determined in response to a request to generate the SD file of the target product, wherein the plurality of separate source files comprises a set of Common Technical Documents (CTDs).
3. The method of claim 1, wherein the SD template file and the plurality of separate source files for generating the SD file of the target product are determined in response to a set of Common Technical Documents (CTDs) being uploaded to a predetermined location, wherein the Common Technical Documents (CTDs) are used as the plurality of separate source files.
4. The method of any of claims 1-3, wherein the machine learning model is based on a user requirement specification (URS), wherein the URS comprises a plurality of items, each item defining an operation between one of the plurality of separate source files and the SD template file, or defining an operation on the SD template file.
5. The method of claim 4, wherein each item further comprises one or more of: information indicating a target section in the SD template file, an identification of the source file, a sequence of steps, a destination location in the SD template file, content to be operated on in the source file, variables, and operations.
6. The method of claim 5, wherein the SD generation engine automatically performs the operations defined in the plurality of items in item order based on information in the items.
7. The method of claim 6, wherein the SD generation engine performs a first type of operation defined in a first portion of the plurality of items to automatically transfer content in the source file to a specified location of the SD template file in an original format of the content based on information in the items.
8. The method of claim 7, wherein the content comprises at least one of tables, lists and graphs, flowcharts, text, and headers in the source file.
9. The method of claim 6, wherein the SD generation engine performs a second type of operation defined in a second portion of the plurality of items to automatically convert the content of the source file into the SD template file by adapting at least one of a destination location, format, and style of the SD template file based on information in the items.
10. The method of claim 9, wherein the content comprises at least one of tables, lists and graphs, flowcharts, text, and headers in the source file.
11. The method of claim 10, wherein the converting comprises converting contents of tables, lists, graphs, flowcharts, text, headers, and sentences in the source file to contents having different formats.
12. The method of claim 6, wherein the SD generation engine performs a third type of operation defined in a third portion of the plurality of items to automatically edit content in the SD template file based on information in the items.
13. The method of claim 12, wherein the editing comprises deleting, replacing, and changing a color or format of the content.
14. The method of claim 12, wherein the editing comprises editing content in the SD template file based on a determination within its logic.
15. The method of claim 6, further comprising:
outputting the SD template file as the SD file after performing all operations defined in the plurality of items.
16. The method of any of claims 1-3, wherein the machine learning model is generated from training data, the training data comprising rules defined by a pharmaceutical professional.
17. The method of claim 4, further comprising:
providing a user interface to receive the request and output the SD file.
18. The method of claim 17, wherein the SD template file and the plurality of separate source files are determined by input received from the user interface.
19. The method of claim 17, wherein the operation is performed visually through the user interface.
20. The method of claim 17, wherein the user interface includes a checkbox for receiving additional information from a user, and the SD generation engine uses the additional information to edit content in the SD template file.
21. The method of any of claims 1-3, wherein the target product comprises two or more components, and the plurality of source files associated with the two or more components have the same file name.
22. The method of claim 21, wherein the SD generation engine automatically identifies to which component each source file belongs by identifying multiple source files having the same file name.
23. The method of claim 22, wherein the plurality of source files having the same file name are uploaded to the same folder without additional information.
24. The method of claim 22, wherein the SD generation engine inserts additional statements and titles into the SD template file in response to uploading multiple source files of two or more starting materials of a target product to a target location.
25. The method of claim 2 or 3, wherein the target product is a drug product and the CTD comprises technical information of a drug substance and the drug product to be reviewed by an authority and approved for clinical trials.
26. An apparatus for automatically generating a summary document (SD), comprising:
one or more processors; and
one or more storage devices storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of the respective methods of any of claims 1-25.
27. A computer storage medium storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method of any of claims 1-25.
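
As a non-limiting illustration of claims 4 to 15, the following Python sketch shows one way an engine could perform the operations defined in URS items, in step order, against an SD template. It is not the claimed implementation and it does not model the machine learning model of claim 1. The names Item, transfer, convert, edit, OPERATIONS and generate_sd, the dictionary-based representation of the template and source files, and the <PRODUCT> placeholder are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """One hypothetical URS item; all field names are illustrative."""
    step: int              # position in the sequence of steps
    section: str           # target section in the SD template
    operation: str         # "transfer", "convert", or "edit"
    source_id: str = ""    # identification of the source file (unused by "edit")
    content_key: str = ""  # which content of the source file to operate on
    variable: str = ""     # extra parameter, e.g. replacement text


def transfer(sd: Dict[str, str], sources: Dict[str, dict], item: Item) -> None:
    # First type of operation: copy source content unchanged into the template.
    sd[item.section] = sources[item.source_id][item.content_key]


def convert(sd: Dict[str, str], sources: Dict[str, dict], item: Item) -> None:
    # Second type: change the format; here a list becomes a single sentence.
    entries = sources[item.source_id][item.content_key]
    sd[item.section] = ", ".join(entries) + "."


def edit(sd: Dict[str, str], sources: Dict[str, dict], item: Item) -> None:
    # Third type: edit content already in the template; here a placeholder
    # (assumed to be "<PRODUCT>") is replaced by the item's variable.
    sd[item.section] = sd[item.section].replace("<PRODUCT>", item.variable)


OPERATIONS: Dict[str, Callable[[Dict[str, str], Dict[str, dict], Item], None]] = {
    "transfer": transfer,
    "convert": convert,
    "edit": edit,
}


def generate_sd(template: Dict[str, str],
                sources: Dict[str, dict],
                items: List[Item]) -> Dict[str, str]:
    """Apply every item in step order and return the filled template as the SD."""
    sd = dict(template)  # work on a copy so the template itself is not modified
    for item in sorted(items, key=lambda i: i.step):
        OPERATIONS[item.operation](sd, sources, item)
    return sd


# Usage with made-up data: one table, one list, one placeholder.
template = {"title": "Summary for <PRODUCT>", "spec": "", "materials": ""}
sources = {
    "ctd_spec": {"table_1": "| Test | Limit |"},
    "ctd_mat": {"list_1": ["water", "ethanol"]},
}
items = [
    Item(step=3, section="title", operation="edit", variable="Product X"),
    Item(step=1, section="spec", operation="transfer",
         source_id="ctd_spec", content_key="table_1"),
    Item(step=2, section="materials", operation="convert",
         source_id="ctd_mat", content_key="list_1"),
]
sd = generate_sd(template, sources, items)
```

In this sketch the items are sorted by their step field before execution, so the order in which they are listed does not matter, and the filled copy of the template is returned as the SD, in the manner of claim 15.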
CN202110652875.XA 2020-06-12 2021-06-11 Method and device for automatically generating abstract document Active CN113468861B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2020/095762 2020-06-12
PCT/CN2020/095762 WO2021248435A1 (en) 2020-06-12 2020-06-12 Method and apparatus for automatically generating summary document

Publications (2)

Publication Number Publication Date
CN113468861A true CN113468861A (en) 2021-10-01
CN113468861B CN113468861B (en) 2022-12-20

Family

ID=77869721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110652875.XA Active CN113468861B (en) 2020-06-12 2021-06-11 Method and device for automatically generating abstract document

Country Status (2)

Country Link
CN (1) CN113468861B (en)
WO (1) WO2021248435A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714132A (en) * 2009-11-13 2010-05-26 山东浪潮齐鲁软件产业股份有限公司 Method for automatically generating batch release abstracts in vaccine production
JP2012018674A (en) * 2010-07-06 2012-01-26 Ricoh Co Ltd Method and apparatus for acquiring one or more key elements from document
CN106815184A (en) * 2017-01-18 2017-06-09 上海爱韦讯信息技术有限公司 The system and method for document is automatically generated based on FOG data
US20170228457A1 (en) * 2016-02-09 2017-08-10 Yahoo! Inc. Scalable and effective document summarization framework
CN110222317A (en) * 2019-03-29 2019-09-10 中国地质大学(武汉) A kind of method and system that powerpoint presentation is converted to Word document
CN110334334A (en) * 2019-06-19 2019-10-15 腾讯科技(深圳)有限公司 A kind of abstraction generating method, device and computer equipment
CN110956041A (en) * 2019-11-27 2020-04-03 重庆邮电大学 Depth learning-based co-purchase recombination bulletin summarization method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10691998B2 (en) * 2016-12-20 2020-06-23 Google Llc Generating templated documents using machine learning techniques
US20190244108A1 (en) * 2018-02-08 2019-08-08 Cognizant Technology Solutions U.S. Corporation System and Method For Pseudo-Task Augmentation in Deep Multitask Learning
CN110795923B (en) * 2019-11-01 2024-03-22 达观数据有限公司 Automatic generation system and generation method for technical document based on natural language processing

Also Published As

Publication number Publication date
WO2021248435A1 (en) 2021-12-16
CN113468861B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
AU2019204404B2 (en) Generating digital document content from a digital image
CN110347953B (en) Page generation method, page generation device, computer equipment and storage medium
US12001470B2 (en) Document elimination for compact and secure storage and management thereof
CN111930966A (en) Intelligent policy matching method and system for digital government affairs
US20070239802A1 (en) System and method for maintaining the genealogy of documents
US20150012805A1 (en) Collaborative Matter Management and Analysis
CN102654874A (en) Bill data management method and system
Thakur et al. Automatic generation of sequence diagram from use case specification
US20150149371A1 (en) System And Method For Generating And Formatting Formally Correct Case Documents From Rendered Semantic Content
CN102467496B (en) Method and device for converting stream mode typeset content into block mode typeset document
CN113468861B (en) Method and device for automatically generating abstract document
Seljan Quality Assurance (QA) of Terminology in a Translation Quality Management System (QMS) in the business environment
CN112765948A (en) Document generation editing method
Verbert et al. The alocom framework: Towards scalable content reuse
CN104657340A (en) Expandable script-based Word report generating system and method
Flatt et al. Model-Driven Development of Akoma Ntoso Application Profiles: A Conceptual Framework for Model-Based Generation of XML Subschemas
Pianka et al. Increasing efficiency and cost-effectiveness by automating the authoring of the development safety update report
Pfalzgraf et al. Cross enterprise change and release processes based on 3D PDF
Ramezani et al. Rapid tagging and reporting for functional language extraction in scientific articles
Lenz et al. Standardisation of XML-based DTDs for corporate environmental reporting: Towards an EML
Phillips et al. Implementing a collaborative workflow for metadata analysis, quality improvement, and mapping
CN112258607B (en) Slide rendering method and device and electronic equipment
US20140136181A1 (en) Translation Decomposition and Execution
McKenzie et al. Introductions
WO2024055862A1 (en) Document review method and apparatus for implementing ia by combining rpa and ai, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant