CN111753535A - Method and device for generating patent application text - Google Patents

Publication number
CN111753535A
Authority
CN
China
Prior art keywords
text
title
word
level
technical background
Legal status
Pending
Application number
CN202010420143.3A
Other languages
Chinese (zh)
Inventor
刘恺
张灏
何大望
Current Assignee
Beijing Xinju Intellectual Property Co ltd
Original Assignee
Beijing Xinju Intellectual Property Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for generating a patent application text. The step of generating a claim in the patent application text, comprising: extracting the title and a next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title; determining a reference relation between first claim texts corresponding to titles according to the hierarchical relation of the titles in the technical background texts; the claims are generated from the first claim text and the reference relationships. The method and the device can automatically generate the claims of the patent application text according to the technical background text, save manpower and improve the writing efficiency of the patent application text.

Description

Method and device for generating patent application text
Technical Field
The invention relates to the technical field of information intelligent processing, in particular to a method and a device for generating a patent application text.
Background
With the rapid pace of technical innovation, the number of patent applications keeps growing. At present, however, patent application texts are mainly written by hand, either by the applicant or by a patent attorney. The number of patent attorneys falls well short of market demand, so each attorney carries a heavy workload, and writing a patent application text, especially its claims, is time-consuming and labor-intensive.
Meanwhile, applicants who write a patent application text themselves often do not have a good grasp of how claims should be drafted and cannot complete the claims well. It is therefore desirable to generate the claims of a patent application text intelligently from the technical background content, so as to improve the efficiency and quality of writing patent application texts.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for generating a patent application text that overcomes or at least partially solves the above problems.
In a first aspect, an embodiment of the present invention provides a method for generating a patent application text, where the generating step of a claim in the patent application text includes:
extracting the title and a next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title;
determining a reference relation between first claim texts corresponding to titles according to the hierarchical relation of the titles in the technical background texts;
generating the claim from the first claim text and the reference relationship.
In some optional embodiments, further comprising:
determining, from the technical background text, at least one section of description text of a title that has a preset feature, and inputting the description text into a pre-trained text generation model to obtain a second claim text;
determining that the second claim text refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs, and adding the second claim text to the corresponding position in the claims.
In some optional embodiments, further comprising:
performing word segmentation on the first claim text by using a selected word segmentation model, matching each obtained word against a pre-established knowledge base, and if the matching is successful, replacing the word in the first claim text with the upper-level word matched with the word in the knowledge base, or labeling the word in the first claim text with the upper-level word matched with the word in the knowledge base; and/or
performing word segmentation on the second claim text by using the selected word segmentation model, matching each obtained word against the pre-established knowledge base, and if the matching is successful, replacing the word in the second claim text with the upper-level word matched with the word in the knowledge base, or labeling the word in the second claim text with the upper-level word matched with the word in the knowledge base.
In some optional embodiments, determining at least one section of description text with preset features of the title from the technical background text specifically includes:
retrieving each section of description text of the title in the technical background text in a preset database, and determining the similarity between the description text and the database;
and determining at least one section of description text with preset characteristics according to the similarity.
In some optional embodiments, the combining the title and the next-level title to generate a first claim text corresponding to the title specifically includes:
determining a matching claim generation template according to the type of the technical background text;
and combining the title and the next-level title according to the template to generate a first claim text corresponding to the title.
In some optional embodiments, the text generation model is obtained by training a pointer generation network model and/or a sequence-to-sequence Seq2Seq model using a plurality of acquired data pairs, where the data pairs include a description text and a claim text corresponding to the description text.
In some optional embodiments, the extracting the title and the next-level title of the title from the technical background text specifically includes:
determining a next-level label of the label according to the label of the title, extracting the title from the technical background text, and extracting the title to which the next-level label belongs as the next-level title of the title; or
extracting the title and the next-level title of the title from a tree-structured technical background text.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a patent application document, where the apparatus is used to generate a claim in the patent application document, and includes:
the first generation module is used for extracting the title and a next-level title of the title from the technical background text and combining the title and the next-level title to generate a first claim text corresponding to the title;
the determining module is used for determining the reference relation between the first claim texts generated by the first generating module corresponding to the titles according to the hierarchical relation of the titles in the technical background texts;
and the second generation module is used for generating the claim according to the first claim text generated by the first generation module and the reference relation determined by the determination module.
In a third aspect, an embodiment of the present invention provides a server, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the above method for generating the patent application text when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the method for generating the patent application text is implemented.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
In the method for generating the patent application text provided by the embodiment of the invention, the step of generating the claims in the patent application text includes: when it is determined that a title has a next-level title, extracting the title and the next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title; determining a reference relation between the first claim texts corresponding to the titles according to the hierarchical relation of the titles in the technical background text; and generating the claims from the first claim texts and the reference relations. The method can automatically generate the claims of the patent application text from the acquired technical background text, which saves manpower, improves the efficiency of writing the patent application text, and avoids formal defects in the claims.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for generating a claim in a patent application text according to an embodiment of the present invention;
fig. 2 is a flowchart of a specific implementation of a method for generating a claim in the second embodiment of the present invention;
fig. 3 is a flowchart of a specific implementation of a text generation method according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a device for generating a patent application document according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to solve the problem that manually writing the claims of the patent application text is time-consuming and labor-consuming in the prior art, embodiments of the present invention provide a method and an apparatus for generating the patent application text, which can automatically generate the claims of the patent application text according to the technical background text, thereby saving labor and improving writing efficiency of the patent application text.
Example one
An embodiment of the present invention provides a method for generating a patent application text, as shown in fig. 1, the method for generating a claim in the patent application text includes the following steps:
Step S11: for each title in the technical background text, extracting the title and the next-level title of the title from the technical background text.
Specifically, for each title in the technical background text it is determined whether a next-level title exists; if so, the title and its next-level titles are extracted from the technical background text.
The obtained technical background text comprises a multi-level title, and the title of the highest level is the name of the invention or the name of the utility model; optionally, if the title of the highest level of the obtained technical background text is not the name of the invention or the name of the utility model, the technical background text may be preprocessed: extracting all titles of the highest level, determining names capable of summarizing the extracted titles, and adding the summarized names as the highest level to the technical background text.
Specifically, to determine a name that summarizes the extracted titles, keywords may be extracted from those titles and combined into a name according to a preset template.
The technical background text may be unstructured, with each title carrying a label that indicates the hierarchical relationship among the titles. In that case the next-level label of a title's label can be determined from the label, the title is extracted from the technical background text, and the title to which the next-level label belongs is extracted as the next-level title of the title. Optionally, the technical background text may instead be tree-structured, in which case the title and its next-level titles are extracted from the tree.
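The label-based extraction can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the labels are dotted section numbers such as "1", "1.2", "1.2.1" (the source does not specify a label format), and the function name `next_level_titles` is hypothetical.

```python
def next_level_titles(labelled_titles, label):
    """Return the titles whose label is exactly one level below `label`.

    `labelled_titles` maps a dotted label (assumed format) to its title text.
    """
    prefix = label + "."
    child_depth = label.count(".") + 1
    return [
        title
        for child_label, title in labelled_titles.items()
        if child_label.startswith(prefix) and child_label.count(".") == child_depth
    ]


# Hypothetical technical background text, reduced to its labelled titles.
titles = {
    "1": "data processing method",
    "1.1": "acquiring data",
    "1.2": "cleaning the data",
    "1.2.1": "removing duplicates",
}
```

A title for which the function returns an empty list has no next-level title and therefore yields no first claim text in step S12.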
Step S12: combining the title with the next level title generates an item of first claim text corresponding to the title.
In one embodiment, this step may include determining a matching claim generation template according to the type of the technical background text, and combining the title and the next-level title according to the template to generate a first claim text corresponding to the title.
The type of the technical background text may be an invention or a utility model, and inventions may be further divided into methods and products; different types correspond to different claim generation templates. Taking the method class of inventions as an example, the claim generation template may be "1. A …… method, characterized by comprising the following steps: ……", which is used to generate the first claim text corresponding to the highest-level title from the highest-level title and its next-level titles (this text ultimately becomes the independent claim in the claims). The template may also be "x. The method of claim x, wherein said …… specifically includes: ……", which is used to generate the first claim text corresponding to a title that is not of the highest level (this text ultimately becomes a dependent claim in the claims). In these templates, "……" is content to be filled in when the claim text is generated, and "x" is content to be replaced when the claim text is generated. Taking a non-highest-level title A as an example, the first ellipsis in the template is filled with the text corresponding to title A, and the second ellipsis is filled with the text corresponding to the next-level titles of title A; the second x in the template is replaced with the serial number of the claim text corresponding to the upper-level title of title A; the first x is the serial number of the generated claim text itself and is replaced when the claims are generated in step S14.
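As a rough sketch of the template filling for the method class, the two templates can be written as format strings. The field names `title`, `steps` and `parent`, the function name `first_claim_text`, and the use of "; " to join next-level titles are all assumptions made for illustration; the leading "x" of the dependent template is deliberately left in place, since the text defers its replacement to step S14.

```python
# Hypothetical renderings of the two method-class templates.
INDEPENDENT = "1. A {title}, characterized by comprising the following steps: {steps}."
DEPENDENT = "x. The method of claim {parent}, wherein said {title} specifically includes: {steps}."


def first_claim_text(title, next_level_titles, parent_number=None):
    """Combine a title with its next-level titles into one first claim text."""
    steps = "; ".join(next_level_titles)
    if parent_number is None:
        # Highest-level title: becomes the independent claim.
        return INDEPENDENT.format(title=title, steps=steps)
    # Non-highest-level title: becomes a dependent claim.
    return DEPENDENT.format(parent=parent_number, title=title, steps=steps)
```

For example, `first_claim_text("cleaning the data", ["removing duplicates", "filling missing values"], parent_number=1)` produces a dependent claim whose own serial number is still the placeholder "x".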
Step S13: determining the reference relation between the first claim texts corresponding to the titles according to the hierarchical relation of the titles in the technical background text.
For example, the next-level titles of title A are title B, title C, and title D, and the next-level titles of title B are title E and title F. The first claim text generated by combining title A, title B, title C, and title D is the first claim text A corresponding to title A; the first claim text generated by combining title B, title E, and title F is the first claim text B corresponding to title B. The reference relation between the first claim text A and the first claim text B is determined from the hierarchical relation between title A and title B: because title A is the upper-level title of title B, the first claim text B refers to the first claim text A.
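The title A to title F example can be traced with a small sketch. It assumes the hierarchy is available as a mapping from each title to its upper-level title (a hypothetical representation); only titles that themselves have next-level titles own a first claim text, so only those appear in the result.

```python
def reference_relations(parent_of):
    """Map each title that owns a first claim text to the title whose claim it refers to."""
    # A title owns a first claim text only if it has at least one next-level title.
    has_claim = {parent for parent in parent_of.values() if parent is not None}
    return {
        title: parent
        for title, parent in parent_of.items()
        if title in has_claim and parent is not None
    }


# Hierarchy from the example: B, C, D under A; E, F under B.
hierarchy = {"A": None, "B": "A", "C": "A", "D": "A", "E": "B", "F": "B"}
relations = reference_relations(hierarchy)
```

Here `relations` records that the first claim text B refers to the first claim text A; title A itself has no upper-level title and therefore refers to nothing.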
Step S14: the claims are generated from the first claim text and the reference relationships.
The serial number of each first claim text is determined according to the reference relation, so that a referred first claim text is arranged before any first claim text that refers to it. The first x in each first claim text is replaced with the determined serial number, and the first claim texts are then arranged in order to generate the claims.
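One way to realize this ordering and numbering is sketched below. The placeholders `{n}` and `{ref}` stand in for the first and second "x" of the template and are an assumption of this sketch, as is the function name `number_claims`; a claim is placed only once the claim it refers to has been placed.

```python
def number_claims(texts, refers_to):
    """Order claim texts so that referred claims come first, then fill in serial numbers.

    texts:     title -> claim text containing "{n}" (own number) and optionally "{ref}".
    refers_to: title -> title of the claim it refers to (absent for independent claims).
    """
    order, placed, remaining = [], set(), list(texts)
    while remaining:
        progressed = False
        for title in list(remaining):
            parent = refers_to.get(title)
            if parent is None or parent in placed:
                order.append(title)
                placed.add(title)
                remaining.remove(title)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or dangling reference relation")
    numbers = {title: i for i, title in enumerate(order, start=1)}
    return [
        texts[t].format(n=numbers[t], ref=numbers.get(refers_to.get(t)))
        for t in order
    ]


claims = number_claims(
    {
        "B": "{n}. The method of claim {ref}, wherein said B specifically includes: E; F.",
        "A": "{n}. A method A, characterized by comprising the following steps: B; C; D.",
    },
    {"B": "A"},
)
```

Even though the text for title B is supplied first, it is numbered after the text for title A because it refers to it.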
In the method for generating the patent application text provided by the embodiment of the invention, the step of generating the claims in the patent application text includes: when it is determined that a title has a next-level title, extracting the title and the next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title; determining a reference relation between the first claim texts corresponding to the titles according to the hierarchical relation of the titles in the technical background text; and generating the claims from the first claim texts and the reference relations. The method can automatically generate the claims of the patent application text from the acquired technical background text, which saves manpower, improves the efficiency of writing the patent application text, and avoids formal defects in the claims.
In one embodiment, at least one section of description text of a title that has a preset feature is determined from the technical background text, and the description text is input into a pre-trained text generation model to obtain a second claim text; it is determined that the second claim text refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs, and the second claim text is added to the corresponding position in the claims.
In addition to multiple levels of titles, the technical background text may contain, for at least one title, at least one section of description text that describes the title in detail. Therefore, taking title A as an example, after the first claim text A corresponding to title A is generated, if it is determined that at least one section of description text of title A has the preset feature, then for each such section (taking description text B as an example) the description text is input into the pre-trained text generation model to generate a second claim text B. Alternatively, the second claim texts may be generated after all the first claim texts have been generated. The method for generating the second claim text is described in detail in the second embodiment.
The description text with the preset characteristics determined above may be a description text marked in advance in the technical background text. Or retrieving each section of description text of the title in the technical background text in a preset database, and determining the similarity between the description text and the database; and determining at least one section of description text with preset characteristics according to the similarity, for example, determining the description text with the similarity lower than a set threshold as the description text with the preset characteristics.
Specifically, the preset feature may be inventiveness. When entering the content of the technical background text, the applicant may select the description texts considered to be inventive, and these description texts are labeled according to the applicant's selection while the technical background text is generated from the applicant's input. Whether a description text has the preset feature can then be judged directly from whether it carries the corresponding label.
Optionally, the obtained technical background text may carry no labels. In that case, determining whether a description text has the preset feature may include retrieving the description text in a preset database and determining the similarity between the description text and the database; if the determined similarity is smaller than a preset threshold, the description text is determined to have the preset feature.
Optionally, the two approaches may be combined: whether a description text has the preset feature is first judged from whether it carries a label, and for a description text without a label, it is judged by similarity retrieval.
The method for determining whether the description text has the preset feature is not specifically limited in this embodiment.
In one embodiment, the method may further include performing word segmentation on the first claim text by using the selected word segmentation model, matching each obtained word against a pre-established knowledge base, and if the matching is successful, replacing the word in the first claim text with the upper-level word matched with the word in the knowledge base, or labeling the word in the first claim text with that upper-level word. This step can be performed after each first claim text has been generated, after all first claim texts have been generated, or after all first claim texts and second claim texts have been generated. And/or,
performing word segmentation on the second claim text by using the selected word segmentation model, matching each obtained word against the pre-established knowledge base, and if the matching is successful, replacing the word in the second claim text with the upper-level word matched with the word in the knowledge base, or labeling the word in the second claim text with that upper-level word. This step may be performed after each second claim text is generated, or after all second claim texts are generated.
Replacement enlarges the scope of protection of the generated first or second claim text. If the word is labeled rather than replaced, the upper-level word is only marked at a preset position next to the corresponding word, so that the writer of the patent application text can conveniently judge from the labeled word which wording is more reasonable. This spares the writer from looking the word up, and prevents the writer from forgetting to generalize (that is, to replace the word with a higher-level word) in order to obtain a larger scope of protection.
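The replace-or-label choice can be illustrated with a small sketch. Several simplifications are assumed here: whitespace splitting stands in for the selected word segmentation model, the knowledge base is a plain dictionary from a word to its upper-level word, and the label is written in square brackets after the word. None of these details come from the source.

```python
# Hypothetical knowledge base: word -> upper-level (more general) word.
KNOWLEDGE_BASE = {"screw": "fastener", "fastener": "connecting piece"}


def generalize(claim_text, knowledge_base, mode="replace"):
    """Replace matched words with their upper-level word, or label them with it."""
    result = []
    for word in claim_text.split():  # stand-in for a real word segmentation model
        upper = knowledge_base.get(word)
        if upper is None:
            result.append(word)  # no match in the knowledge base
        elif mode == "replace":
            result.append(upper)
        else:
            result.append(f"{word} [{upper}]")  # label only, keep the original word
    return " ".join(result)
```

In "label" mode the original wording is preserved, which leaves the final choice of term to the writer.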
Example two
The second embodiment of the present invention provides a specific implementation of a method for generating a claim of a patent application, the flow of which is shown in fig. 2, and the method includes the following steps:
Step S201: for each title in the technical background text, extracting the title and the next-level title of the title from the technical background text.
Step S202: combining the title with the next-level title to generate a first claim text corresponding to the title.
Step S203: determining the reference relation between the first claim texts corresponding to the titles according to the hierarchical relation of the titles in the technical background text.
For each section of description text of each title in the technical background text, steps S204-S210 are performed.
Step S204: judging whether the description text is labeled as having the preset feature.
Specifically, the description texts in the technical background text may be labeled in advance according to the user's selection. For example, the user decides which description texts are inventive and selects the inventive option for each of them, so that while the technical background text is generated from the information input by the user, each description text is labeled as inventive or not according to that selection. Whether a description text has the preset feature can then be judged directly from whether it carries the corresponding label.
If the descriptive text is judged to have the preset feature, executing step S205; otherwise, step S206 is executed.
Step S205: retrieving the description text in a preset database to obtain similar documents in the database whose similarity to the description text meets a preset condition.
Step S209 is performed after step S205.
Step S206: retrieving the description text in a preset database, and determining the similarity between the description text and the database.
Specifically, the highest value of the similarity between the description text and the document in the database may be determined as the similarity between the description text and the database.
Step S207: judging whether the determined similarity is smaller than a preset threshold.
If yes, determining that the description text has the preset characteristics, and executing step S208; if not, determining that the description text does not have the preset characteristics.
Step S208: obtaining similar documents in the database whose similarity to the description text meets the preset condition.
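Steps S204 to S208 can be summarized in a short sketch. The similarity measure is an assumption: `difflib.SequenceMatcher` from the Python standard library stands in for whatever retrieval method is actually used, and the "preset condition" for similar documents is taken to be a minimum similarity `similar_min`. The threshold values and the function name `screen_description` are likewise hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; the source does not name a specific measure."""
    return SequenceMatcher(None, a, b).ratio()


def screen_description(description, database, labelled, threshold=0.5, similar_min=0.3):
    """Return (has_preset_feature, similar_documents) for one description text."""
    scores = [(similarity(description, doc), doc) for doc in database]
    # S206: similarity to the database is the highest similarity to any document.
    db_similarity = max((score for score, _ in scores), default=0.0)
    # S204 / S207: a label decides directly; otherwise compare with the threshold.
    has_feature = labelled or db_similarity < threshold
    # S205 / S208: collect similar documents only for texts that have the feature.
    similar = [doc for score, doc in scores if score >= similar_min] if has_feature else []
    return has_feature, similar
```

A description text that is unlabeled and closely matches a document in the database is screened out, while a labeled text always proceeds to step S209 together with its similar documents.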
Step S209: inputting the description text into a pre-trained text generation model to obtain a second claim text.
In an embodiment, the text generation model may be obtained by training a pointer generation network model and/or a sequence-to-sequence Seq2Seq model using a plurality of obtained data pairs, where the data pairs include a description text and a claim text corresponding to the description text.
Step S210: determining that the second claim text refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs, thereby obtaining the reference relation between the second claim text and the first claim text.
Step S211: the claims are generated from the first claim text, the second claim text, similar documents and the reference relationships.
Specifically, the reference relationship includes the reference relationship between the first claim texts determined in step S203, and the reference relationship between the second claim texts and the first claim texts determined in step S210.
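As a non-limiting illustration, assembling the claims from claim texts and their reference relationships might look like the sketch below. The `ClaimText` structure, the `key` identifiers and the wording "The method of claim N, wherein" are assumptions made for the example; the source does not prescribe a data structure or a claim preamble.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimText:
    key: str                  # hypothetical identifier, e.g. a title label
    body: str                 # generated first or second claim text
    refers_to: Optional[str]  # key of the referenced claim text, None if independent


def number_claims(claim_texts: list[ClaimText]) -> list[str]:
    """Assign claim numbers in list order and render each reference relationship."""
    numbers = {c.key: i for i, c in enumerate(claim_texts, start=1)}
    rendered = []
    for c in claim_texts:
        n = numbers[c.key]
        if c.refers_to is None:
            rendered.append(f"{n}. {c.body}")
        else:
            ref = numbers[c.refers_to]
            rendered.append(f"{n}. The method of claim {ref}, wherein {c.body}")
    return rendered
```

Numbers are assigned before rendering, so a claim text may refer to any other claim text in the list regardless of order.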
For convenience of description, the first claim text and the second claim text are collectively referred to as the claim text.
The similar documents corresponding to a claim text may be added to the claims as annotations at the position of that claim text in the claims.
Step S212: performing word segmentation on the claim text by using the selected word segmentation model, matching each obtained word against a pre-established knowledge base, and, if the matching succeeds, replacing the word in the claim text with the previous-level word that matches it in the knowledge base.
Alternatively, instead of replacing the word in the claim text, the word may be annotated with the previous-level word that matches it in the knowledge base. The annotation lets the user decide whether the word needs to be modified. Optionally, the word may be annotated not only with the matching previous-level word but also with the matching words at several higher levels in the knowledge base.
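As a non-limiting illustration, the replacement and annotation variants of step S212 can be sketched as follows. The sketch assumes the knowledge base is a simple mapping from a word to its previous-level word (taken here to be a broader term), and it uses a regular expression as a stand-in for the word segmentation model; `KNOWLEDGE_BASE`, `segment`, `upper_levels` and `generalize` are hypothetical names, and the sample entries are invented for the example.

```python
import re

# Hypothetical knowledge base: each word maps to its previous-level word.
KNOWLEDGE_BASE = {"screw": "fastener", "fastener": "connecting member"}


def segment(text: str) -> list[str]:
    """Stand-in for the word segmentation model: split into words and non-words."""
    return re.findall(r"\w+|\W+", text)


def upper_levels(word: str, levels: int = 1) -> list[str]:
    """Walk up the knowledge base, returning up to `levels` higher-level words."""
    found = []
    while word in KNOWLEDGE_BASE and len(found) < levels:
        word = KNOWLEDGE_BASE[word]
        found.append(word)
    return found


def generalize(text: str, annotate: bool = False, levels: int = 1) -> str:
    """Replace each matched word with its previous-level word, or annotate it."""
    out = []
    for token in segment(text):
        higher = upper_levels(token, levels)
        if not higher:
            out.append(token)          # no match: keep the word unchanged
        elif annotate:
            out.append(f"{token}[{'/'.join(higher)}]")
        else:
            out.append(higher[0])      # replace with the previous-level word
    return "".join(out)
```

With `annotate=True` and `levels=2` the original word is kept and both higher-level words are shown next to it, which corresponds to the multi-level annotation variant.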
The steps in the above flow have no strict order, and the order given is only an illustration. For example, the second claim text corresponding to a title may be generated immediately after the first claim text corresponding to that title is generated; likewise, the words in a claim text may be matched against the knowledge base and replaced each time a first or second claim text is generated.
EXAMPLE III
The third embodiment of the present invention provides a specific implementation of the method for generating the embodiment text of the specification in the patent application text. The flow is shown in fig. 3 and includes the following steps:
Step S31: decomposing the technical data into at least one description unit according to a preset rule.
Specifically, the technical data may be a technical background text generated according to the acquired technical background information, and the generated technical background text includes a multi-level title and at least one section of description text of at least one title. Decomposing the technical background text into at least one description unit according to a preset rule, which may include: determining each title decomposed from the technical background text as a description unit; and decomposing each section of description text of the title in the technical background text into at least one description unit according to a preset separator.
Specifically, the technical background text is generated from the technical background information input by the user. For example, it may be agreed in advance that, when inputting the description text, the user presses the Enter key to start a new segment after each complete expression of meaning. The generated technical background text then contains Enter-key identifiers, which are used as separators to decompose a section of description text into at least one description unit.
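As a non-limiting illustration, the decomposition of step S31 can be sketched as follows. The `Section` structure is an assumption about how the multi-level titles and their description texts might be held in memory, and the newline character stands in for the Enter-key identifier; neither is specified by the source.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    paragraphs: list[str] = field(default_factory=list)  # description text under the title
    children: list["Section"] = field(default_factory=list)


def decompose(section: Section, separator: str = "\n") -> list[str]:
    """Each title is one unit; each description text is split on the separator."""
    units = [section.title]
    for paragraph in section.paragraphs:
        units.extend(p.strip() for p in paragraph.split(separator) if p.strip())
    for child in section.children:
        units.extend(decompose(child, separator))
    return units
```

The units come out in document order, so a later stage can still relate each unit to the title it was written under.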
Alternatively, instead of decomposing the description units directly from the technical background text, the claims may be automatically generated from the technical background text, and the description units may be decomposed from each claim text in the claims. The characterizing part or the additional technical feature part of a claim text may be decomposed into at least one description text by using the semicolon as a separator; alternatively, the characterizing part or the additional technical feature part of each claim text may be taken as one description text in its entirety.
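As a non-limiting illustration, splitting the characterizing part of a claim text on semicolons might look like the sketch below. Locating the characterizing part by the phrase "characterized in that" is an assumption made for the example, as are the names `split_characterizing_part`, `marker` and `whole`.

```python
def split_characterizing_part(claim_text: str,
                              marker: str = "characterized in that",
                              whole: bool = False) -> list[str]:
    """Split the part after the marker into description texts on semicolons.

    With whole=True the part is returned as a single description text.
    If the marker is absent, the entire claim text is treated as the part.
    """
    _, found, tail = claim_text.partition(marker)
    part = (tail if found else claim_text).strip(" ,:;.")
    if whole:
        return [part]
    # Accept both the ASCII and the full-width semicolon as separators.
    return [p.strip() for p in part.replace("；", ";").split(";") if p.strip()]
```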
In one embodiment, automatically generating the claims from the technical background text may include: extracting a title and the next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title; determining the reference relationship between the first claim texts corresponding to the titles according to the hierarchical relationship of the titles in the technical background text; generating the claims according to the first claim texts and the reference relationship; determining, from the technical background text, at least one section of description text of a title that has a preset feature, and generating a second claim text by using the description text and a pre-trained text generation model; and determining that the second claim text refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs, and adding the second claim text to the corresponding position in the claims.
The specific flow for automatically generating the claims from the technical background text is described in detail in the embodiments concerning claim generation.
The description unit may be a description text, at least one picture, or a combination of the description text and the picture.
Step S32: inputting each description unit into the pre-trained text generation model to obtain an embodiment text segment corresponding to the description unit.
In an embodiment, the text generation model may be obtained by training a pointer generation network model and/or a sequence-to-sequence Seq2Seq model using a plurality of acquired data pairs, where a data pair includes a description unit and an embodiment text segment corresponding to the description unit.
In one embodiment, the description unit in the data pair is a description text, at least one description picture, or a combination of the description text and the description picture. The description text may be the technical description text in the technical background text (the description text written by the applicant from a technical perspective); or a claim text, obtained by converting the technical description text, that meets the requirements of the patent examination guidelines; or a claim text obtained by inputting the technical description text into a second text generation model. The second text generation model is obtained by training a pointer generation network model and/or a sequence-to-sequence Seq2Seq model by using a plurality of acquired second data pairs, where each second data pair comprises a description text and the claim text corresponding to that description text.
In use, each description unit is input into the pre-trained text generation model to obtain the embodiment text segment corresponding to that description unit.
Step S33: the embodiment text segments are combined to generate the embodiment text.
In one embodiment, a matching embodiment text generation template is determined according to the type of the technical data; the hierarchical structure of the embodiment text segments corresponding to the description units is determined according to the hierarchical structure of the description units in the technical data; and the embodiment text segments are combined according to the generation template and the hierarchical structure of the embodiment text segments to generate the embodiment text.
Optionally, when the embodiment text segments are combined to generate the embodiment text, the description unit corresponding to each embodiment text segment is added to the embodiment text together with that segment in a preset form, so that a writer of the patent application text who reviews or modifies the automatically generated embodiment text can clearly see which description unit each embodiment text segment was generated from.
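As a non-limiting illustration, steps S32 and S33 can be sketched together as follows. The trained text generation model is represented by an arbitrary callable, since the source does not define its interface; the indentation by level and the bracketed "source unit" note are assumed stand-ins for the hierarchical structure and the preset form, and all names are hypothetical.

```python
from typing import Callable


def generate_embodiment_text(units: list[tuple[int, str]],
                             model: Callable[[str], str],
                             show_source: bool = True) -> str:
    """units: (level, description unit) pairs in document order.

    Each unit is passed to the model (step S32); the resulting segments are
    joined in the same order and indented by level (step S33).
    """
    lines = []
    for level, unit in units:
        segment = model(unit)
        indent = "  " * level
        lines.append(f"{indent}{segment}")
        if show_source:
            # Keep the originating description unit next to its segment.
            lines.append(f"{indent}[source unit: {unit}]")
    return "\n".join(lines)
```

A trivial callable such as `lambda u: f"In this embodiment, {u}"` is enough to exercise the function; a real system would pass the trained model's inference function instead.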
In the method for automatically generating a patent application text provided by the third embodiment of the present invention, the step of automatically generating the embodiment text of the specification in the patent application text includes: decomposing the technical data into at least one description unit according to a preset rule; inputting each description unit into the pre-trained text generation model to obtain an embodiment text segment corresponding to the description unit; and combining the embodiment text segments to generate the embodiment text. The embodiment text of the specification can thus be generated automatically from the acquired technical data, which saves labor, improves the efficiency of writing the patent application text, and avoids formal defects in the embodiment text.
In an embodiment, the method may further include: retrieving the description unit in a preset database to obtain similar documents in the database whose similarity to the description unit meets a preset condition; and adding the similar documents to the embodiment text as annotations at the position of the embodiment text segment corresponding to the description unit.
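As a non-limiting illustration, retrieving similar documents and attaching them as annotations might be sketched as follows. The preset condition is assumed to be a minimum similarity, `difflib.SequenceMatcher` is only a stand-in metric, and the database is modeled as a mapping from a document identifier to its text; all names and the bracketed note format are hypothetical.

```python
from difflib import SequenceMatcher


def similar_documents(unit: str, database: dict[str, str],
                      min_similarity: float) -> list[str]:
    """Return ids of documents whose similarity to the unit meets the condition,
    most similar first."""
    scored = [(SequenceMatcher(None, unit, text).ratio(), doc_id)
              for doc_id, text in database.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)
            if score >= min_similarity]


def annotate_segments(pairs: list[tuple[str, str]], database: dict[str, str],
                      min_similarity: float = 0.6) -> list[str]:
    """pairs: (description unit, embodiment text segment). Each segment keeps its
    position and gets a note listing the similar documents found for its unit."""
    annotated = []
    for unit, segment in pairs:
        hits = similar_documents(unit, database, min_similarity)
        note = f" [similar: {', '.join(hits)}]" if hits else ""
        annotated.append(segment + note)
    return annotated
```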
The method can enable a writer of the patent application text to refer to a similar document corresponding to each embodiment text segment when looking up or modifying the automatically generated embodiment text segment, further complement and perfect the embodiment text segment, or modify the embodiment text segment and the corresponding claim text, so that the claim text applied for protection is different from the prior art and is creative.
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a device for generating a patent application document, where the device is used to generate a claim in the patent application document, and its structure is shown in fig. 4, and includes:
a first generating module 41, configured to extract a title and a next-level title of the title from a technical background text, and combine the title and the next-level title to generate a first claim text corresponding to the title;
a determining module 42, configured to determine, according to a hierarchical relationship of titles in the technical background text, a reference relationship between the first claim texts generated by the first generating module 41 corresponding to the titles;
a second generating module 43, configured to generate the claims according to the first claim text generated by the first generating module 41 and the reference relationship determined by the determining module 42.
In one embodiment, the apparatus further comprises a third generating module 44 for:
determining at least one section of description text with preset characteristics of a title from the technical background text, and inputting the description text into a pre-trained text generation model to obtain a second claim text; correspondingly, the determining module 42 is further configured to:
determining that the second claim text generated by the third generating module 44 refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs; correspondingly, the second generating module 43 is further configured to:
adding the second claim text generated by the third generating module 44 to the corresponding position in the claims.
In one embodiment, the apparatus further comprises a replacement module 45 configured to:
performing word segmentation on the first claim text by using a selected word segmentation model, matching each obtained word with a pre-established knowledge base, and if the matching is successful, replacing the word in the first claim text with a word at the previous level matched with the word in the knowledge base, or labeling the word in the first claim text with a word at the previous level matched with the word in the knowledge base; and/or performing word segmentation on the second claim text by using the selected word segmentation model, matching each obtained word with a pre-established knowledge base, and if the matching is successful, replacing the word in the second claim text with a word at the previous level matched with the word in the knowledge base, or labeling the word in the second claim text with a word at the previous level matched with the word in the knowledge base.
In one embodiment, the third generating module 44 is specifically configured to:
retrieving each section of description text of the title in the technical background text in a preset database, and determining the similarity between the description text and the database; and determining at least one section of description text with preset characteristics according to the similarity.
In one embodiment, the first generating module 41 is specifically configured to:
determining a matched claim generation template according to the type of the technical background text; and combining the title and the next-level title according to the template to generate a first claim text corresponding to the title.
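As a non-limiting illustration, template-based generation of a first claim text might be sketched as follows. The two document types and the template wording are invented for the example; the source states only that a template is matched to the type of the technical background text.

```python
# Hypothetical templates keyed by the type of the technical background text.
CLAIM_TEMPLATES = {
    "method": "A method for {title}, comprising the steps of: {items}.",
    "device": "A {title}, comprising: {items}.",
}


def first_claim_text(doc_type: str, title: str, next_level_titles: list[str]) -> str:
    """Fill the template matched to doc_type with the title and its next-level titles."""
    template = CLAIM_TEMPLATES[doc_type]
    return template.format(title=title, items="; ".join(next_level_titles))
```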
In one embodiment, the first generating module 41 is specifically configured to:
determining a next-level label of the label according to the label of the title, extracting the title from a technical background text, and extracting the title to which the next-level label belongs as the next-level title of the title; or extracting the title and the next-level title of the title from the technical background text of the tree structure.
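As a non-limiting illustration, finding the next-level titles from title labels might be sketched as follows. The dotted numbering scheme ("1", "1.1", "1.1.1") is an assumption; the source does not specify the label format, and `next_level_titles` is a hypothetical name.

```python
def next_level_titles(titles: dict[str, str], label: str) -> list[str]:
    """titles maps dotted labels such as "1.2" to title text (an assumed scheme).

    A next-level label extends `label` by exactly one numeric component.
    """
    prefix = label + "."
    children = [lab for lab in titles
                if lab.startswith(prefix) and lab[len(prefix):].isdigit()]
    # Sort numerically so that "1.10" follows "1.2" rather than "1.1".
    children.sort(key=lambda lab: int(lab[len(prefix):]))
    return [titles[lab] for lab in children]
```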
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the method for generating the patent application text described above is implemented.
Based on the inventive concept of the present invention, an embodiment of the present invention further provides a server, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, which, when executed by the processor, implements the method for generating the patent application text described above.
Unless specifically stated otherwise, terms such as processing, computing, calculating, determining, displaying, or the like, may refer to an action and/or process of one or more processing or computing systems or similar devices that manipulates and transforms data represented as physical (e.g., electronic) quantities within the processing system's registers and memories into other data similarly represented as physical quantities within the processing system's memories, registers or other such information storage, transmission or display devices. Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or". The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Claims (10)

1. A method for generating a patent application text, wherein the step of generating a claim in the patent application text comprises:
extracting the title and a next-level title of the title from the technical background text, and combining the title and the next-level title to generate a first claim text corresponding to the title;
determining a reference relation between first claim texts corresponding to titles according to the hierarchical relation of the titles in the technical background texts;
generating the claim from the first claim text and the reference relationship.
2. The method of claim 1, further comprising:
determining at least one section of description text with preset characteristics of a title from the technical background text, and inputting the description text into a pre-trained text generation model to obtain a second claim text;
determining that the second claim text refers to the first claim text corresponding to the upper-level title of the title to which the description text belongs, and adding the second claim text to the corresponding position in the claims.
3. The method of claim 2, further comprising:
performing word segmentation on the first claim text by using a selected word segmentation model, matching each obtained word with a pre-established knowledge base, and if the matching is successful, replacing the word in the first claim text with a word at the previous level matched with the word in the knowledge base, or labeling the word in the first claim text with a word at the previous level matched with the word in the knowledge base; and/or
performing word segmentation on the second claim text by using the selected word segmentation model, matching each obtained word with a pre-established knowledge base, and if the matching is successful, replacing the word in the second claim text with a word at the previous level matched with the word in the knowledge base, or labeling the word in the second claim text with a word at the previous level matched with the word in the knowledge base.
4. The method of claim 2, wherein determining at least one section of descriptive text of the title having predetermined characteristics from the technical background text comprises:
retrieving each section of description text of the title in the technical background text in a preset database, and determining the similarity between the description text and the database;
and determining at least one section of description text with preset characteristics according to the similarity.
5. The method of claim 1, wherein combining the title and the next-level title to generate a first claim text corresponding to the title comprises:
determining a matched claim generation template according to the type of the technical background text;
and combining the title and the next-level title according to the template to generate a first claim text corresponding to the title.
6. The method according to claim 1, wherein the text generation model is obtained by training a pointer generation network model and/or a sequence-to-sequence Seq2Seq model using a plurality of obtained data pairs, and the data pairs include a description text and a claim text corresponding to the description text.
7. The method of any one of claims 1-6, wherein extracting the title and a next-level title of the title from the technical background text comprises:
determining a next-level label of the label according to the label of the title, extracting the title from the technical background text, and extracting the title to which the next-level label belongs as the next-level title of the title; or
extracting the title and the next-level title of the title from the technical background text of the tree structure.
8. An apparatus for generating patent application text, the apparatus being used for generating claims in patent application text, comprising:
the first generation module is used for extracting the title and a next-level title of the title from the technical background text and combining the title and the next-level title to generate a first claim text corresponding to the title;
the determining module is used for determining the reference relation between the first claim texts generated by the first generating module corresponding to the titles according to the hierarchical relation of the titles in the technical background texts;
and the second generation module is used for generating the claim according to the first claim text generated by the first generation module and the reference relation determined by the determination module.
9. A server, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, which, when executed by the processor, implements the method of generating the patent application text of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions, characterized in that the instructions, when executed by a processor, implement the method of generating a patent application text according to any one of claims 1-7.
CN202010420143.3A 2020-03-19 2020-05-18 Method and device for generating patent application text Pending CN111753535A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010196520X 2020-03-19
CN202010196520 2020-03-19

Publications (1)

Publication Number Publication Date
CN111753535A true CN111753535A (en) 2020-10-09

Family

ID=72673235

Family Applications (6)

Application Number Title Priority Date Filing Date
CN202010420142.9A Pending CN111753066A (en) 2020-03-19 2020-05-18 Method, device and equipment for expanding technical background text
CN202010420151.8A Active CN111756689B (en) 2020-03-19 2020-05-18 System and method for generating patent application file
CN202010421279.6A Pending CN111753067A (en) 2020-03-19 2020-05-18 Innovative assessment method, device and equipment for technical background text
CN202010421278.1A Pending CN111753536A (en) 2020-03-19 2020-05-18 Automatic patent application text writing method and device
CN202010421277.7A Active CN111753514B (en) 2020-03-19 2020-05-18 Automatic generation method and device of patent application text
CN202010420143.3A Pending CN111753535A (en) 2020-03-19 2020-05-18 Method and device for generating patent application text

Country Status (1)

Country Link
CN (6) CN111753066A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686639B (en) * 2021-01-05 2022-11-08 河北冀联人力资源服务集团有限公司 Labor contract determination method and system based on deep learning
CN116010603A (en) * 2023-01-31 2023-04-25 浙江中电远为科技有限公司 Feature clustering dimension reduction method for commercial text classification
CN117763106B (en) * 2023-12-11 2024-06-18 中国科学院文献情报中心 Document duplicate checking method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106155989A (en) * 2015-04-03 2016-11-23 北京中知智慧科技有限公司 Patent document generates method and apparatus
CN108491384A (en) * 2018-03-15 2018-09-04 周慧祥 A kind of auxiliary writing system of patent application document
CN108932220A (en) * 2018-06-29 2018-12-04 北京百度网讯科技有限公司 article generation method and device
CN109101538A (en) * 2018-06-29 2018-12-28 中译语通科技股份有限公司 A kind of entity abstracting method and system towards Chinese patent text
CN109376350A (en) * 2018-12-15 2019-02-22 长沙贤正益祥机械科技有限公司 A kind of semi-automatic methodology of composition of structure class product patent, server and system
CN110427884A (en) * 2019-08-01 2019-11-08 达而观信息科技(上海)有限公司 The recognition methods of the document structure of an article, device, equipment and storage medium
CN111160870A (en) * 2019-12-31 2020-05-15 洪泰智造(青岛)信息技术有限公司 Patent file generation method, device and system and storage medium

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8041739B2 (en) * 2001-08-31 2011-10-18 Jinan Glasgow Automated system and method for patent drafting and technology assessment
US7707039B2 (en) * 2004-02-15 2010-04-27 Exbiblio B.V. Automatic modification of web pages
US20170098290A1 (en) * 2005-12-14 2017-04-06 Harold W. Milton, Jr. System for preparing a patent application
TWI464601B (en) * 2006-12-22 2014-12-11 Hon Hai Prec Ind Co Ltd System and method for creating patent application files
CN101488164A (en) * 2008-10-10 2009-07-22 亿维讯软件(北京)有限公司 Method for generating patent application files related to invention creation
CN104809106A (en) * 2015-05-15 2015-07-29 合肥汇众知识产权管理有限公司 System and method for excavating patent schemes
CN104881401B (en) * 2015-05-27 2017-10-17 大连理工大学 A kind of patent document clustering method
CN106021207A (en) * 2016-05-06 2016-10-12 长沙市麓智信息科技有限公司 A patent writing system and method
CN105956119A (en) * 2016-05-06 2016-09-21 长沙市麓智信息科技有限公司 Patent write auxiliary system and method
CN105956955A (en) * 2016-05-06 2016-09-21 长沙市麓智信息科技有限公司 Case tracking interaction system and method
CN105930316A (en) * 2016-05-06 2016-09-07 长沙市麓智信息科技有限公司 Patent writing assistance system and assistance method therefor
CN106528836A (en) * 2016-11-22 2017-03-22 北京恒冠网络数据处理有限公司 Method and device for compiling patent background technology based on big data
CN106777193B (en) * 2016-12-23 2020-04-10 李鹏 Method for automatically writing specific manuscript
CN106776519A (en) * 2016-12-26 2017-05-31 北京文先科技有限公司 A kind of self-service methodology of composition of patent and system
CN106940726B (en) * 2017-03-22 2020-09-01 山东大学 Creative automatic generation method and terminal based on knowledge network
CN107133210A (en) * 2017-04-20 2017-09-05 中国科学院上海高等研究院 Scheme document creation method and system
CN107220295B (en) * 2017-04-27 2020-02-07 银江股份有限公司 Searching and mediating strategy recommendation method for human-human contradiction mediating case
CN108416008A (en) * 2018-02-28 2018-08-17 华南理工大学 A kind of BIM product database semantic retrieving methods based on natural language processing
CN109062877A (en) * 2018-04-24 2018-12-21 筑权网(武汉)科技有限公司 A kind of self-service methodology of composition of patent and system
CN108763486A (en) * 2018-05-30 2018-11-06 湖南写邦科技有限公司 Paper duplicate checking method, terminal and storage medium based on terminal
CN109062937B (en) * 2018-06-15 2019-11-26 北京百度网讯科技有限公司 The method of training description text generation model, the method and device for generating description text
CN108845991A (en) * 2018-06-28 2018-11-20 河北国瑞企业管理咨询有限公司 A kind of intra-company's patent duplicate checking method
CN109522537A (en) * 2018-11-16 2019-03-26 合肥汇创知识产权代理有限公司 Patent writing and application software for XRF analysis
CN109635284A (en) * 2018-11-26 2019-04-16 北京邮电大学 Text snippet method and system based on deep learning associate cumulation attention mechanism
CN109766537A (en) * 2019-01-16 2019-05-17 北京未名复众科技有限公司 Study abroad document methodology of composition, device and electronic equipment
CN109766429A (en) * 2019-02-19 2019-05-17 北京奇艺世纪科技有限公司 A kind of sentence retrieval method and device
CN110413986B (en) * 2019-04-12 2023-08-29 上海晏鼠计算机技术股份有限公司 Text clustering multi-document automatic summarization method and system for improving word vector model
CN110502632A (en) * 2019-07-19 2019-11-26 平安科技(深圳)有限公司 Contract terms reviewing method, device, computer equipment and storage medium based on clustering algorithm
CN110457690A (en) * 2019-07-26 2019-11-15 南京邮电大学 A kind of judgment method of patent creativeness
CN110532352B (en) * 2019-08-20 2023-10-27 腾讯科技(深圳)有限公司 Text duplication checking method and device, computer readable storage medium and electronic equipment
KR102085217B1 (en) * 2019-10-14 2020-03-04 (주)디앤아이파비스 Method, apparatus and system for determining similarity of patent documents

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106155989A (en) * 2015-04-03 2016-11-23 北京中知智慧科技有限公司 Patent document generates method and apparatus
CN108491384A (en) * 2018-03-15 2018-09-04 周慧祥 A kind of auxiliary writing system of patent application document
CN108932220A (en) * 2018-06-29 2018-12-04 北京百度网讯科技有限公司 article generation method and device
CN109101538A (en) * 2018-06-29 2018-12-28 中译语通科技股份有限公司 A kind of entity abstracting method and system towards Chinese patent text
CN109376350A (en) * 2018-12-15 2019-02-22 长沙贤正益祥机械科技有限公司 A kind of semi-automatic methodology of composition of structure class product patent, server and system
CN110427884A (en) * 2019-08-01 2019-11-08 达而观信息科技(上海)有限公司 The recognition methods of the document structure of an article, device, equipment and storage medium
CN111160870A (en) * 2019-12-31 2020-05-15 洪泰智造(青岛)信息技术有限公司 Patent file generation method, device and system and storage medium

Also Published As

Publication number Publication date
CN111753514A (en) 2020-10-09
CN111756689B (en) 2022-11-22
CN111753514B (en) 2024-07-02
CN111753067A (en) 2020-10-09
CN111753066A (en) 2020-10-09
CN111756689A (en) 2020-10-09
CN111753536A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
CN108460014B (en) Enterprise entity identification method and device, computer equipment and storage medium
US10460162B2 (en) Method, device, and system, for identifying data elements in data structures
CN107392143B (en) Resume accurate analysis method based on SVM text classification
JP3425408B2 (en) Document reading device
US7469251B2 (en) Extraction of information from documents
CN109145260B (en) Automatic text information extraction method
CN111753535A (en) Method and device for generating patent application text
CN110580308B (en) Information auditing method and device, electronic equipment and storage medium
CN110175334B (en) Text knowledge extraction system and method based on custom knowledge slot structure
US10963717B1 (en) Auto-correction of pattern defined strings
CN112163424A (en) Data labeling method, device, equipment and medium
US12008830B2 (en) System for template invariant information extraction
CN109165373B (en) Data processing method and device
CN111553556A (en) Business data analysis method and device, computer equipment and storage medium
CN111191429A (en) System and method for automatic filling of data table
CN107844531B (en) Answer output method and device and computer equipment
CN114970502B (en) Text error correction method applied to digital government
CN115422372A (en) Knowledge graph construction method and system based on software test
CN112699671B (en) Language labeling method, device, computer equipment and storage medium
CN112966501B (en) New word discovery method, system, terminal and medium
CN113254583B (en) Document marking method, device and medium based on semantic vector
CN112989011B (en) Data query method, data query device and electronic equipment
CN112732743B (en) Data analysis method and device based on Chinese natural language
CN114154480A (en) Information extraction method, device, equipment and storage medium
CN113255369A (en) Text similarity analysis method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination