CN112001821A - Patent document auditing method, processing device and storage medium - Google Patents

Patent document auditing method, processing device and storage medium

Info

Publication number
CN112001821A
Authority
CN
China
Prior art keywords
patent document
extracting
preset
labels
arabic numerals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010872321.6A
Other languages
Chinese (zh)
Inventor
谢德意
陶帅军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wade Innvoation Information Co ltd
Original Assignee
Shenzhen Wade Innvoation Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wade Innvoation Information Co ltd
Priority to CN202010872321.6A
Publication of CN112001821A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • G06Q50/184Intellectual property management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a patent document auditing method, a processing device and a storage medium. The patent document auditing method comprises the following steps: extracting element information from the text part of a patent document and extracting element labels from the drawing part of the patent document, wherein the element information includes an element name and an element number; searching the element labels of the drawing part based on the element information so as to perform a first image-text consistency check; and searching the text part based on the element labels of the drawing part so as to perform a second image-text consistency check. In this way, patent documents can be audited automatically and the efficiency of auditing patent documents is improved.

Description

Patent document auditing method, processing device and storage medium
Technical Field
The present application relates to the field of document auditing technologies, and in particular, to an auditing method, a processing apparatus, and a storage medium for patent documents.
Background
A patent is a document issued, upon application, by a government agency or by a regional organization representing several countries. It describes the content of an invention and creates, for a certain period of time, a legal state in which the patented invention can generally be implemented by others only with the permission of the patentee. Patents are generally classified into three types: inventions, utility models, and designs.
For invention and utility model patents, the scope of protection is generally defined by the text and the drawings, which describe the patented technology in detail. As a legal document, a patent also requires a certain degree of accuracy. Patent documents are generally written manually, so various errors are inevitable, and the automatic auditing of patent documents has become a problem to be solved urgently.
Disclosure of Invention
In order to solve the above problems, the present application provides an auditing method, a processing apparatus and a storage medium for patent documents, which can perform automatic auditing on the patent documents and improve auditing efficiency of the patent documents.
The technical solution adopted by the application is a patent document auditing method comprising: extracting element information from the text part of a patent document and extracting element labels from the drawing part of the patent document, wherein the element information includes an element name and an element number; searching the element labels of the drawing part based on the element information so as to perform a first image-text consistency check; and searching the text part based on the element labels of the drawing part so as to perform a second image-text consistency check.
Extracting the element information from the text part of the patent document comprises: extracting the element numbers in the text part; performing word segmentation on the text before each element number to obtain the element name; and combining the element name and the element number to form the element information.
Extracting the element numbers in the patent document comprises: extracting the Arabic numerals in the patent document; judging whether the Arabic numerals meet a first preset requirement; and if so, determining the Arabic numerals as element numbers.
Judging whether the Arabic numerals meet the first preset requirement comprises: judging whether the number of digits of the Arabic numerals is smaller than a preset digit threshold; and if so, determining that the Arabic numerals meet the first preset requirement.
After the Arabic numerals in the patent document are extracted, the method further comprises: extracting the English letters following the Arabic numerals; judging whether the English letters meet a second preset requirement; and when the Arabic numerals meet the first preset requirement and the English letters meet the second preset requirement, combining the Arabic numerals and the English letters as an element number.
Performing word segmentation on the text before the element number to obtain the element name comprises: judging whether a preset character or word exists within a set number of characters before the element number; and if so, taking the text between the last preset character or word and the element number as the element name.
The preset characters or words are those in a preset word cutting library, which is established by the user in a customized manner.
Extracting the element labels of the drawing part of the patent document comprises: performing a first image recognition pass on the drawing part in its current layout to obtain first-type element labels; rotating the drawing part by 90 degrees clockwise and performing a second image recognition pass to obtain second-type element labels; and combining the first-type and second-type element labels to obtain a plurality of element labels.
Another technical scheme adopted by the application is as follows: there is provided a processing apparatus for patent documents, the processing apparatus comprising a processor and a memory, the memory being arranged to store program data, the processor being arranged to execute the program data to implement a method as described above.
Another technical scheme adopted by the application is as follows: there is provided a computer readable storage medium having stored therein program data for implementing the method as described above when executed by a processor.
The application provides a patent document auditing method, which comprises the following steps: extracting element information from the text part of a patent document and extracting element labels from the drawing part of the patent document, wherein the element information includes an element name and an element number; searching the element labels of the drawing part based on the element information so as to perform a first image-text consistency check; and searching the text part based on the element labels of the drawing part so as to perform a second image-text consistency check. In this way, the image-text consistency of patent documents can be checked automatically, which saves labor cost, improves the efficiency of patent auditing, and provides a reference for manual auditing.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings based on them without creative effort. Wherein:
Fig. 1 is a schematic flowchart of an auditing method for patent documents provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of an element information extraction method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image-text consistency check according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a method for building a claim tree of a patent document according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for assisted writing of a patent document according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a patent document processing device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of an auditing method for patent documents according to an embodiment of the present application, where the method includes:
Step 11: displaying an operation interface.
The operation interface is used for displaying the patent documents, displaying the auditing results and receiving the operation instructions input by the user. Optionally, the operation interface may include a document display area, an audit result display area, and a plurality of operation buttons, where the document display area is used to display the imported patent document, the audit result display area is used to display the audit result, and the plurality of operation buttons are used to receive an operation instruction input by the user.
Specifically, the method of this embodiment may be implemented by an Application (APP), and the login interface may be accessed by clicking a shortcut of the APP, and the operation interface may be accessed by inputting user information such as an account number and a password on the login interface.
Step 12: in response to an import instruction input on the operation interface, importing a patent document and displaying the patent document in a document display area of the operation interface; wherein the patent document in the document display area is in an editable state.
Optionally, an import button is arranged on the operation interface. After the user clicks the import button, an address index bar pops up, and the patent document at the address selected by the user is imported.
It is to be understood that the patent document in this embodiment may be a document whose file name suffix is "doc" or "docx", such as a Word document in Office software or a text document in WPS Office software. Of course, in other embodiments, other text editing documents are possible.
Further, the patent document is displayed in the document display area with its original Word interface, which specifically comprises a function bar and a text part. The function bar may include the built-in functions of Word, such as "start", "insert", "page layout", "refer", "review" and "view". For example, the "start" function may be used to set the font, font size, bold, italic, underline, color and the like, and to set the alignment, such as left, center or right alignment.
Further, the patent document can be edited directly. For example, text can be input, modified, deleted, annotated or highlighted in the text part, and drawings can be deleted or pasted in the drawing part. Optionally, drawings drawn with Visio software can be edited directly, for example by modifying the lines, shapes, text and labels in the drawings.
Step 13: auditing the patent document.
The audit of the patent document mainly comprises three parts: a patent writing specification audit, an element information consistency audit, and an image-text consistency audit.
Patent writing specification audit
The patent writing specification auditing is mainly to audit patent documents according to patent writing specifications, wherein the patent writing specifications can include patent law, patent law implementation rules, patent examination guidelines and other self-defined specifications. The following is illustrated by several examples:
1. Problems of form in the claims
For example, the multiple citation problem: a claim that cites multiple claims must not itself be cited by another claim that cites multiple claims. Specifically, patterns such as "A-B" or "A to B" (where A and B represent claim numbers) can be captured by acquiring the Arabic numerals in the claims. For example, if claim 5 cites any one of claims 1 to 4 and claim 6 cites any one of claims 1 to 5, then claim 6, which is itself a multiple-citation claim, cites the multiple-citation claim 5 and therefore does not comply with the patent writing specifications.
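The multiple citation check can be sketched as follows. This is a minimal illustration, not the method of the application: the function names are hypothetical, the claims are assumed to be English text in a dict keyed by claim number, and only the "A-B" and "A to B" range forms mentioned above are recognized.

```python
import re


def cited_claims(claim_text):
    """Return the set of claim numbers cited before the first comma of a claim."""
    head = claim_text.split(",", 1)[0]
    cited = set()
    # Matches "claim 1", "claims 1-4" and "claims 1 to 4".
    for m in re.finditer(r"claims?\s+(\d+)(?:\s*(?:-|to)\s*(\d+))?", head):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        cited.update(range(start, end + 1))
    return cited


def find_multiple_dependency_errors(claims):
    """Return (claim, cited claim) pairs where both claims cite several claims."""
    refs = {number: cited_claims(text) for number, text in claims.items()}
    multiple = {number for number, cited in refs.items() if len(cited) > 1}
    errors = []
    for number in sorted(multiple):
        for cited in sorted(refs[number] & multiple):
            errors.append((number, cited))
    return errors
```

Alternatives such as "claim 1 or 2" are not handled by this sketch and would need an additional pattern.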
For example, the "lack of antecedent basis" problem: a term introduced by "said" in a claim must already appear earlier in the same claim or in a claim it cites. Specifically, each occurrence of "said" in the claims is located, the words following it are captured, and the preceding text and the cited claims are searched for those words. For example, if "said C" appears in claim 5, the text before "said C" in claim 5 and the claims cited by claim 5 are searched for "C"; if "C" appears in neither, it is determined that "said C" lacks an antecedent basis.
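A rough sketch of the "said" lookup for English text is given here. The stop-word list used to decide where the captured term ends is purely an assumption for illustration (the application does not specify how the term boundary is found), and the function names are hypothetical.

```python
import re

# Hypothetical stop words that end the term following "said".
STOP_WORDS = {"said", "is", "are", "and", "or", "to", "of", "in", "on", "with", "for"}


def said_terms(text):
    """Return (position, term) for every term introduced by "said"."""
    terms = []
    for m in re.finditer(r"\bsaid\s+", text):
        run = re.match(r"[A-Za-z ]+", text[m.end():])
        if not run:
            continue
        words = []
        for word in run.group(0).split():
            if word.lower() in STOP_WORDS:
                break
            words.append(word)
        if words:
            terms.append((m.start(), " ".join(words)))
    return terms


def terms_lacking_basis(claim_text, cited_texts):
    """Return the "said" terms found neither earlier in the claim nor in cited claims."""
    missing = []
    for position, term in said_terms(claim_text):
        earlier = claim_text[:position]
        if term not in earlier and not any(term in t for t in cited_texts):
            missing.append(term)
    return missing
```

Plain substring matching is used for the search, so this sketch can miss problems when a term happens to be contained in a longer, unrelated word.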
For example, the punctuation problem: according to the specifications, each claim must end with a period "." and may contain only one period. Specifically, the period in each claim can be located, and it can be determined whether the period is at the end of the claim.
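The period rule lends itself to a very small check. The sketch below assumes each claim is available as a separate string; the function name and the `period` parameter (which could be set to the Chinese full stop for Chinese text) are illustrative choices, not part of the application.

```python
def check_periods(claim_text, period="."):
    """Return a list of punctuation problems found in a single claim."""
    text = claim_text.strip()
    problems = []
    if not text.endswith(period):
        problems.append("claim does not end with a period")
    if text.count(period) > 1:
        problems.append("claim contains more than one period")
    return problems
```

Decimal numbers such as "0.5" would be counted as periods by this sketch, so a practical check would need to exclude them.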
2. Problems of form in the description
For example, the title length problem: the title of the invention in the specification generally must not exceed 25 characters, so the number of characters in the title can be counted for the audit.
For example, the problem of inconsistency between the description of the drawings and the drawings: the description of the drawings generally describes each of the drawings, and each drawing is generally identified in a form such as "drawing one" or "drawing two".
For example, the typesetting problem: the font, line spacing, paragraph spacing and the like are detected to judge whether they meet the preset requirements.
For example, the word and sentence repetition problem: when the same word or sentence occurs twice in succession, it is determined to be a repetition, e.g., "said said".
3. Custom specification
The custom specification can be set according to daily auditing habits or customized by the user. For example, some users prefer to avoid words that unduly limit the scope of patent protection, such as "only". Such words may be added to a blacklist, and if a blacklisted word appears in the document during the audit, a comment can be added as a reminder.
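A blacklist scan of this kind might look like the following sketch. The function name is hypothetical, and whole-word, case-insensitive matching is an assumption that suits English text; Chinese text would need plain substring matching instead.

```python
import re


def find_blacklisted(text, blacklist):
    """Return sorted (position, word) pairs for each blacklisted word in the text."""
    hits = []
    for word in blacklist:
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), word))
    return sorted(hits)
```

Each returned position could then be used to attach a reminder comment at the corresponding place in the document.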
Element information consistency audit
The element information consistency audit mainly audits the character part of a patent document, and mainly comprises two aspects: the same element names correspond to different element numbers, and the same element numbers correspond to different element names. Referring to fig. 2, fig. 2 is a schematic flow chart of a component information extraction method provided in an embodiment of the present application, where the component information extraction method includes:
step 21: the element numbers in the patent document are extracted.
It is to be understood that the component reference numerals in the patent documents are extracted mainly from the specification parts in the patent documents.
Alternatively, in one instance, the elements are numbered as Arabic numerals. Thus, the arabic numbers in the patent documents can be extracted; judging whether the Arabic numerals meet a first preset requirement or not; if yes, the Arabic numerals are determined as element numbers.
Specifically, it can be judged whether the number of digits of the Arabic numeral is smaller than a preset digit threshold; if so, the Arabic numeral is determined to meet the first preset requirement. Element numbers generally have two, three, four or five digits and rarely more, so the preset digit threshold can be set according to actual conditions. Longer strings of Arabic numerals usually represent other data and are thereby excluded; for example, "10111000" may represent a binary number, and "CNXXXXXXXX" (X represents any Arabic numeral) may represent a patent application number.
Alternatively, in another embodiment, the element numbers are a combination of Arabic numbers and English letters. Therefore, english letters following arabic numerals can be further extracted; judging whether the English letters meet a second preset requirement or not; and when the Arabic numerals meet the first preset requirement and the English letters meet the second preset requirement, combining the Arabic numerals and the English letters as element labels.
Specifically, when an element number is formed as "Arabic numerals + English letters", the number of English letters is generally 1. Therefore, the second preset requirement can be that the number of English letters is 1. For example, "101 a" is an element number that meets the requirement, and "101 applet" is one that does not.
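The two preset requirements can be combined in one pass over the text, as in the sketch below. The threshold of 6 digits and the assumption that the letter directly follows the digits without a space are illustrative choices; the application leaves both to actual conditions.

```python
import re


def extract_element_numbers(text, digit_threshold=6, letter_count=1):
    """Return candidate element numbers such as "100" or "101a"."""
    numbers = []
    for m in re.finditer(r"(\d+)([A-Za-z]*)", text):
        digits, letters = m.groups()
        # First preset requirement: fewer digits than the threshold.
        if len(digits) >= digit_threshold:
            continue
        # Second preset requirement: if letters follow, there must be exactly one.
        if letters and len(letters) != letter_count:
            continue
        numbers.append(digits + letters)
    return numbers
```

With these settings, long digit strings such as binary data or application numbers are skipped, as are numerals followed by a whole word.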
Step 22: performing word segmentation on the text before the element number to obtain the element name.
Optionally, semantic recognition may be used to obtain the word before the element number as the element name. Alternatively, a word library may be built from big data, and the sentence before the element number may be compared with the word library to obtain the element name. Since element names are usually short, a limit on the number of characters may also be added. For example, the target text between the element number and the first punctuation mark before it is acquired, and the target text is matched against the element names in a preset name library to obtain the element name.
In a specific embodiment, it can be judged whether a preset character or word exists within a set number of characters before the element number; if so, the text between the last preset character or word and the element number is taken as the element name. Specifically, word cutting is performed using a preset "word cutting library". The words in the library are generally connecting words that indicate orientation, relationship or action, for example "and", "or", "in", "to", "connection", "pair", "about", "match", "include", "set to", "in addition to", "due to" and the like. During word cutting, the text before the element number is searched for a word in the word cutting library; if one is found, the text between that word and the element number is determined to be the element name.
For example, one sentence in the patent document reads: "input display signal to the display screen 100". The Arabic numeral "100" is extracted first, and the text before "100" is then searched for a word in the word cutting library. When the word "to" is found, the text "display screen" between "to" and "100" is taken as the element name.
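The word cutting step can be sketched for English text as follows. The contents of `CUT_WORDS` are a hypothetical user-defined library; the articles "the", "a" and "an" are added here only because English, unlike the original Chinese text, places an article between the cut word and the element name. The 30-character window is likewise an assumed value.

```python
import re

# Hypothetical user-defined word cutting library (articles added for English text).
CUT_WORDS = {"and", "or", "in", "to", "with", "includes", "the", "a", "an"}


def element_name_before(text, label_start, max_chars=30, cut_words=CUT_WORDS):
    """Return the text between the last cut word in the window and the element number."""
    window = text[max(0, label_start - max_chars):label_start]
    words = window.split()
    # Search backwards from the element number for the nearest cut word.
    for i in range(len(words) - 1, -1, -1):
        if words[i].strip(".,;:").lower() in cut_words:
            name = " ".join(words[i + 1:])
            return name or None
    return None


def extract_element_info(text):
    """Return (element name, element number) pairs found in the text."""
    infos = []
    for m in re.finditer(r"\b\d{1,5}[A-Za-z]?\b", text):
        name = element_name_before(text, m.start())
        if name:
            infos.append((name, m.group(0)))
    return infos
```

For the sentence in the example above, `extract_element_info` yields the pair of "display screen" and "100".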
In addition, different rules may be set for specific kinds of element information. For example, for the element information "LED display 100", whose name contains English letters, the word cutting process may output both "LED display 100" and "display 100" as results so that the user can make modifications. For the element information "display screen 100 a", whose number contains a letter, the word cutting process may output both "display screen 100" and "display screen 100 a" as results so that the user can make modifications. Besides the combinations described above, element information may also take forms such as "letter + number" (e.g., LED 200), "Chinese characters + letter + number" (e.g., red LED 300), "letter + Chinese characters + number" (e.g., MOS transistor 400), and "number + Chinese characters + number" (e.g., 1-out-of-4 selector 500).
In other embodiments, it can also be determined whether there is a predetermined word or phrase in the text between the element label and the first punctuation mark before the element label. This is not exemplified.
Step 23: the component names and the component numbers are combined to form component information.
After the word segmentation processing is completed, a plurality of element information of the character part can be obtained, and then consistency check can be performed, wherein the element information consistency check mainly comprises two types:
The same element name corresponds to different element numbers, such as "display 100" and "display 200". Specifically, the extracted element information may be compared item by item: entries with the same element name are compared to determine whether their element numbers are consistent. It will be understood that an inconsistency may also be reported if an element name is not followed by any element number.
The same element number corresponds to different element names, such as "display screen 100" and "camera 100". Specifically, the extracted element information may be compared item by item: entries with the same element number are compared to determine whether their element names are consistent.
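Both kinds of inconsistency can be found by grouping the extracted element information, as in this sketch. It groups entries in dictionaries rather than comparing them one by one, which is a substitution made here for brevity; the function name and the (name, number) tuple layout are assumptions.

```python
from collections import defaultdict


def check_element_consistency(infos):
    """Return (names with several numbers, numbers with several names)."""
    numbers_by_name = defaultdict(set)
    names_by_number = defaultdict(set)
    for name, number in infos:
        numbers_by_name[name].add(number)
        names_by_number[number].add(name)
    same_name = {n: sorted(v) for n, v in numbers_by_name.items() if len(v) > 1}
    same_number = {n: sorted(v) for n, v in names_by_number.items() if len(v) > 1}
    return same_name, same_number
```

Each entry in the two returned dictionaries corresponds to one finding that could be shown in the audit result display area.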
Image-text consistency audit
As shown in fig. 3, fig. 3 is a schematic flowchart of a process of checking consistency of graphics and text provided in an embodiment of the present application, including the following steps:
step 31: extracting element information of character parts in a patent document and extracting element labels of figure parts in the patent document; wherein the component information includes a component name and a component number.
The above embodiments have been described, and details are not repeated herein, because the word segmentation method can be used to extract component information of the Chinese character part of the patent document.
The image recognition processing may be performed on the drawing part to obtain a plurality of element labels.
The element labels are generally arabic numerals, english letters, or a combination of arabic numerals and english letters, and in this embodiment, the image recognition processing is performed on the drawings to extract the arabic numerals and the english letters in the drawings, so as to obtain the element labels in each drawing.
Specifically, several kinds of reference numerals are illustrated: Arabic numerals, such as "100" or "101"; English letters, such as "A" or "b"; and combinations of Arabic numerals and English letters, such as "200a" or "101b".
Alternatively, the above recognition may be performed by means of deep learning, and specifically, may be performed by means of a supervised deep neural network. For example, a large number of figures are used as training data, labels in the figures are obtained in advance to mark each figure, then the labels are input into a neural network for learning, and parameters in the neural network are continuously corrected by calculating a loss value between an output value and a true value to obtain the neural network meeting requirements, so that the labels in the figures are identified.
It can be understood that when the drawings in the patent document are oversized, in order to clearly show the drawings, the directions of the drawings are generally adjusted, and in one embodiment, the first image recognition processing is performed on the parts of the drawings according to the current layout format of the parts of the drawings to obtain the first type element labels; rotating the figure part by 90 degrees clockwise, and carrying out second-time image recognition processing on the figure part to obtain a second-class element label; the first type of element designation and the second type of element designation are combined to yield a plurality of element designations. In this way, the rotated drawing can be recognized by acquiring the element numbers from two directions by image recognition.
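The two-pass recognition can be sketched independently of any particular recognizer. In the sketch below the image is assumed to be a row-major 2D list of pixels, and `recognize` is a hypothetical callable (for example a wrapper around an OCR engine or a trained network) that returns the labels it finds in an image; neither is specified by the application.

```python
def rotate_clockwise(image):
    """Rotate a row-major 2D pixel grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def recognize_both_orientations(image, recognize):
    """Run recognition on the current layout and on the rotated image, then merge."""
    first_type = recognize(image)
    second_type = recognize(rotate_clockwise(image))
    merged = []
    for label in list(first_type) + list(second_type):
        if label not in merged:  # keep each label once, in order of discovery
            merged.append(label)
    return merged
```

Labels found in both passes are kept only once, so a drawing that is already upright does not produce duplicates.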
Step 32: and searching in element numbers of the figure part based on the element information so as to perform first image-text consistency check.
And obtaining the reference numbers in the drawings in the steps, and performing consistency check on the reference numbers in the element information obtained by word segmentation and the reference numbers in the drawings in the step. Specifically, the following cases may be included: the element numbers in the text portions are found to be the same as the element numbers in the figure portions, or the element numbers in the text portions are not found in the figure portions.
Step 33: and searching in element numbers of the figure part based on the element information so as to perform first image-text consistency check.
And in the step, the reference numbers in the drawings are obtained, and whether the matched reference numbers exist in the text part or not is searched for each reference number in the drawings so as to carry out the text-text consistency check. Specifically, the following cases may be included: the same element numbers are found in the text portions for the element numbers in the figure portions, or the element numbers in the figure portions are not found in the text portions.
It can be understood that step 32 searches the drawings using the element labels of the text part, while step 33 searches the text part using the element labels of the figure part. Either step may be performed alone, or both may be performed to achieve a bidirectional search and check.
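The two search directions can be illustrated with a short sketch. The function names and the data layout (element information as name/label pairs, figure labels as strings) are assumptions made for this example only.

```python
from typing import Iterable, List, Tuple

ElementInfo = Tuple[str, str]  # (element name, element label)


def check_text_against_figures(
    element_info: Iterable[ElementInfo], figure_labels: Iterable[str]
) -> List[str]:
    """First check: labels used in the text part but not found in the figure part."""
    figure_set = set(figure_labels)
    return sorted({label for _, label in element_info if label not in figure_set})


def check_figures_against_text(
    element_info: Iterable[ElementInfo], figure_labels: Iterable[str]
) -> List[str]:
    """Second check: labels shown in the figure part but not found in the text part."""
    text_labels = {label for _, label in element_info}
    return sorted(set(figure_labels) - text_labels)


text_elements = [("display", "100"), ("camera", "200"), ("display", "101"), ("speaker", "400")]
drawing_labels = ["100", "200", "300"]

missing_in_figures = check_text_against_figures(text_elements, drawing_labels)
missing_in_text = check_figures_against_text(text_elements, drawing_labels)
```

Running only one of the two functions corresponds to performing a single step; running both gives the bidirectional check.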
Besides the auditing process described above, a claim tree may be built from the claims. As shown in fig. 4, fig. 4 is a flowchart illustrating a method for establishing a claim tree of a patent document according to an embodiment of the present application, where the method includes:
step 41: claim numbers of patent documents are obtained.
Alternatively, multiple claims and the numbering of each claim may be determined from the leading digits of the claims section of the patent document. It is to be understood that the claim numbers are arranged in increments beginning with the arabic numeral "1" and are at the beginning of each claim, i.e., the head of the line, so that the claim numbers can be obtained by determining the number of the head of the line. Further, since there is only one "per claim. When the first claim number "1" is obtained, the next one can be automatically found ". "number" 2 "after, and so on.
Step 42: claim citation relationships of patent documents are obtained.
The citation relationship of each claim can be analyzed from its first sentence. A dependent claim generally begins with "according to claim X, ...", so the citation relationship of each claim can be determined by textually identifying the number that follows "according to claim". Alternatively, the citation relationship can be determined from the first combination of numbers in each claim other than the claim number itself. A combination of numbers is considered mainly to handle multiple citations: for example, "according to claims 1-3" establishes that the claim cites claim 1, claim 2 and claim 3.
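The citation analysis might be sketched as below. The pattern assumes English phrasing such as "claim 1", "claims 1-3" or "claims 1 to 8"; a Chinese-language document would need a different pattern, and `cited_claims` is a hypothetical helper name.

```python
import re
from typing import List

# First "claim N" or "claims N-M" / "claims N to M" expression in the claim text.
_CITATION = re.compile(r"claims?\s+(\d+)(?:\s*(?:-|to)\s*(\d+))?", re.IGNORECASE)


def cited_claims(claim_text: str) -> List[int]:
    """Return the claim numbers cited by a claim, expanding ranges such as 1-3."""
    match = _CITATION.search(claim_text)
    if match is None:
        return []  # no citation found
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    return list(range(first, last + 1))
```

For instance, `cited_claims("The method according to claims 1-3, wherein ...")` yields claims 1, 2 and 3, while a claim that cites nothing yields an empty list.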
Step 43: claim numbers and claim reference relationships are displayed using a mind map.
Mind-mapping software is a general-purpose tool for creating, managing and communicating ideas. As visual drawing software with an intuitive, friendly user interface and rich functions, it helps users organize their thinking, resources and project processes in an orderly way. As a way of organizing resources and managing projects, a mind map derives associated ideas and information from the core branches of the map.
The application (APP) used in this method can embed a mind map plug-in to provide the mind map function, or can interact with other mind-mapping software to generate the mind map.
Optionally, the independent claims and dependent claims are determined according to the citation relationship of each claim; each independent claim is taken as a free topic and each dependent claim as a subtopic, and the mind map is built according to the claim citation relationships.
In particular, a claim can be determined to be an independent claim if it does not cite any claim, or a dependent claim if it cites another claim. Of course, in some cases an independent claim may also cite other claims; in such cases the type can be determined from the opening words of each claim. For example, a claim beginning with "A ..." may be determined to be an independent claim, and a claim beginning with "according to claim ..." may be determined to be a dependent claim.
It is to be understood that, taking an X-mind mind map as an example, X-mind provides free topics and subtopics: a free topic corresponds to an independent claim, and a subtopic, which must be created on the basis of a free topic, corresponds to a dependent claim. A mind-map-based claim tree can thus be built from the claims and their citation relationships.
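As a rough sketch of how such a tree could be assembled before being handed to mind-mapping software, the code below treats claims without citations as root topics and hangs every other claim under the first claim it cites. Attaching to the first cited claim is an assumption made here for simplicity; the text does not specify how multiple citations are placed, and the indented outline is only a stand-in for a real mind map.

```python
from typing import Dict, List, Tuple


def build_claim_tree(citations: Dict[int, List[int]]) -> Tuple[List[int], Dict[int, List[int]]]:
    """Split claims into roots (no citations) and children keyed by parent claim."""
    children: Dict[int, List[int]] = {number: [] for number in citations}
    roots: List[int] = []
    for number in sorted(citations):
        cited = citations[number]
        if not cited:
            roots.append(number)  # independent claim: a free topic
        else:
            # Assumption: place the claim under the first claim it cites.
            children.setdefault(cited[0], []).append(number)
    return roots, children


def render_outline(nodes: List[int], children: Dict[int, List[int]], depth: int = 0) -> List[str]:
    """Render the tree as indented text lines, one line per claim."""
    lines: List[str] = []
    for number in nodes:
        lines.append("  " * depth + f"claim {number}")
        lines.extend(render_outline(children.get(number, []), children, depth + 1))
    return lines


example_citations = {1: [], 2: [1], 3: [2], 4: [3], 5: [3], 8: [1]}
example_roots, example_children = build_claim_tree(example_citations)
outline = render_outline(example_roots, example_children)
```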
Further, the content of each claim can be added to the mind map. Specifically: acquiring the text content of each claim; extracting the core content from the text content of each claim; and displaying the core content of each claim at the corresponding claim in the mind map.
Here, semantic recognition can be applied to the text of each claim so that it is abbreviated into the core content of the claim. Alternatively, keywords can be extracted from each claim and used as its core content.
Step 14: and displaying the auditing result, and annotating the patent document based on the auditing result.
The consistency audit result and the patent writing specification audit result are displayed page by page in the audit result display area; when a switching instruction is received, the page shown in the audit result display area is switched. The consistency audit result comprises an element information consistency audit result and an image-text consistency audit result.
Further, the generated claim tree may also be displayed in the audit result display area.
Optionally, the audit result display area is divided into three pages: consistency audit, patent writing specification audit, and claim tree.
The consistency audit can be presented in the form of the following table:

| Element name | Element label | Location in the drawings |
| --- | --- | --- |
| Display | 100 | FIG. 1 |
| Camera | 200 | FIG. 1 |
| Display | 101 | |
| | 300 | FIG. 1 |
| Display it | 100 | FIG. 1 |
| LED display | 100 | FIG. 1 |
| Speaker | 400 | |

Analysis of the above table shows the following:
1. For the display 100:
The text part contains "display 100" and fig. 1 contains the corresponding label "100". However, the text part also contains the two variants "display it 100" and "display 101"; these are judged to be clerical errors and can be corrected directly in the table or in the patent document.
In addition, the text part contains "LED display 100", a name that may be inaccurate because of word segmentation; the user can decide from the actual situation whether label 100 corresponds to "display" or "LED display".
2. For the camera 200:
The text part contains "camera 200", and the corresponding label "200" is also shown in fig. 1.
3. For reference numeral 300:
Label "300" is shown in fig. 1, but there is no corresponding label in the text part.
4. For the speaker 400:
The text part contains "speaker 400", but label "400" is not found in the drawings.
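The naming problems discussed in item 1 (one label carrying several names, one name carrying several labels) can be detected with a simple grouping step. The sketch below is illustrative only; the function name and the pair-based data layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_naming_conflicts(
    element_info: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Group (name, label) pairs and report labels with several names and names with several labels."""
    names_by_label = defaultdict(set)
    labels_by_name = defaultdict(set)
    for name, label in element_info:
        names_by_label[label].add(name)
        labels_by_name[name].add(label)
    label_conflicts = {
        label: sorted(names) for label, names in names_by_label.items() if len(names) > 1
    }
    name_conflicts = {
        name: sorted(labels) for name, labels in labels_by_name.items() if len(labels) > 1
    }
    return label_conflicts, name_conflicts


table_rows = [
    ("display", "100"),
    ("camera", "200"),
    ("display", "101"),
    ("display it", "100"),
    ("LED display", "100"),
    ("speaker", "400"),
]
label_conflicts, name_conflicts = find_naming_conflicts(table_rows)
```

Whether a reported conflict is a clerical error or a word segmentation artifact is still left to the user, as described above.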
Optionally, for different problem types, the element names and labels may be displayed in different colors, and the problem type represented by each color may be indicated on the display interface for the user's reference.
In addition, when a click instruction on displayed element information, an element name or an element label is received, the current display of the patent document is moved to the corresponding position and the corresponding element information, element name or element label is marked. For example, the corresponding element information may be highlighted.
For the patent writing specification audit result, the corresponding reference specification, namely the relevant requirements of the Patent Law, its Implementing Rules and the Patent Examination Guidelines, can be displayed, and annotations are made at the corresponding places in the document.
For example, if "said A" in a claim lacks an antecedent basis, so that the claim is unclear, an annotation can be made at the position of "said A" in the patent document, such as "Annotation N: this term lacks an antecedent basis, which renders the scope of protection of the claim unclear and does not comply with Article 26, paragraph 4 of the Patent Law."
The claim tree can be displayed by directly invoking X-mind, in which case the X-mind mind map can be edited, for example by modifying its text. Alternatively, the claim tree can be displayed as a picture.
After the auditing process is completed, the modified (annotated) patent document can be saved (or saved as a new file); specifically, the annotated patent document is saved to the selected address in response to a save instruction input on the operation interface.
Referring to fig. 5, fig. 5 is a schematic flow chart of a method for assisting in writing a patent document according to an embodiment of the present application, where the method includes:
step 51: patent writing information is acquired.
Step 52: and acquiring a corresponding writing template from the template database according to the patent writing information.
The template database is a pre-established database containing patent writing templates; specifically, it can be formed from writing templates entered by a user or from imported patent documents.
In one embodiment, the patent writing information may be a patent technology submission document or a partially written patent document.
Specifically, a patent technology submission document is a document provided by the inventor to describe the technology to be patented, and generally includes the background art, technical solution, protection points, technical effects and the like. A partially written patent document is one whose writing has been started but not yet completed. Therefore, keyword extraction can be performed on the patent writing information to obtain a plurality of keywords, and a corresponding writing template can be acquired from the template database according to the keywords. Optionally, keyword extraction may be performed only on the background art part of the patent technology submission document or of the partially written patent document to obtain the keywords.
In another embodiment, the patent writing information includes at least one of technical field information, product information, application country information, application type information, and applicant information.
Specifically, when the template database is established, a piece of information may be associated with each template, for example, when the template is put in storage, at least one of technical field information, product information, application country information, application type information, and applicant information needs to be determined.
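A lookup over templates that carry such associated information could look like the sketch below. The field names (`technical_field`, `country` and so on) and the in-memory list standing in for the template database are hypothetical; the application does not define a schema.

```python
from typing import Dict, List

Template = Dict[str, str]

# Hypothetical in-memory template database; each template stores its associated information.
TEMPLATE_DATABASE: List[Template] = [
    {"title": "Display device template", "technical_field": "display", "country": "CN",
     "application_type": "invention"},
    {"title": "Wearable device template", "technical_field": "wearable", "country": "CN",
     "application_type": "utility model"},
    {"title": "Display method template", "technical_field": "display", "country": "US",
     "application_type": "invention"},
]


def find_templates(database: List[Template], writing_info: Dict[str, str]) -> List[Template]:
    """Return the templates whose associated information matches every provided field."""
    return [
        template
        for template in database
        if all(template.get(field) == value for field, value in writing_info.items())
    ]


matches = find_templates(TEMPLATE_DATABASE, {"technical_field": "display", "country": "CN"})
```

Because only the provided fields are compared, supplying fewer fields widens the result set, which fits the "at least one of" wording above.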
Step 53: a writing template is displayed to provide additional reference to the writing of the patent document.
The obtained writing template can be displayed so that the user can directly refer to the template when writing the patent document.
In particular, the writing template is in an editable state so that the user can copy content from it.
Optionally, if the writing template is a part of a patent document, a provenance of the template may be displayed after the template, for example, a network link, so that a user may find a complete version of the template according to the provenance. In addition, functions such as template export and local download can be supported.
In addition, a word library can be established, and when a user writes a patent document, a target word input in the patent document is obtained; searching at least one associated word associated with the target word from a word library; and displaying the associated words.
In one embodiment, the associated word may be a generic (superordinate) term of the target word. For example, if the user inputs "smart watch", words such as "wearable device" and "smart device" may be retrieved from the word library.
In another embodiment, the associated word may be a subordinate (more specific) term of the target word. For example, if the user inputs "smart watch", words such as "sports watch", "children watch" and "navigation watch" may be retrieved from the word library.
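The word library lookup can be illustrated with a small dictionary-based sketch. The library contents reuse the examples above; the structure, the group names `generic` and `related`, and the function name are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical word library: each target word maps to groups of associated words.
WORD_LIBRARY: Dict[str, Dict[str, List[str]]] = {
    "smart watch": {
        "generic": ["wearable device", "smart device"],
        "related": ["sports watch", "children watch", "navigation watch"],
    },
}


def associated_words(target: str, library: Dict[str, Dict[str, List[str]]] = WORD_LIBRARY) -> List[str]:
    """Look up the target word and return all associated words, generic terms first."""
    entry = library.get(target.strip().lower(), {})
    return entry.get("generic", []) + entry.get("related", [])


suggestions = associated_words("Smart Watch")
```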
In addition, after the patent document is written, the patent document can be checked for duplication, and specifically, the duplication can be checked through an internal patent library or an external patent library. The internal patent library can be a patent library which is built by a user, and the external patent library can be a patent library of a certain country or a world patent library.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a processing device of a patent document according to an embodiment of the present application, where the processing device 60 of the patent document includes a processor 61 and a memory 62, where the memory 62 is used for storing program data, and the processor 61 is used for executing the program data to implement the following method:
displaying an operation interface; in response to an import instruction input on the operation interface, importing a patent document and displaying the patent document in a document display area of the operation interface; wherein, the patent document in the document display area is in an editable state; auditing the patent documents; and displaying the auditing result, and annotating the patent document based on the auditing result.
Optionally, in another embodiment, the processor 61 is used for executing the program data to implement the following method: extracting element labels in the patent documents; performing word segmentation processing on characters before the element labels to obtain element names; the component names and the component numbers are combined to form component information.
Optionally, in another embodiment, the processor 61 is used for executing the program data to implement the following method: extracting element information of character parts in the patent document and extracting element labels of figure parts in the patent document; wherein the component information includes a component name and a component number; searching in element labels of the figure part based on the element information so as to perform first image-text consistency check; and searching the character part based on the element number of the figure part so as to carry out second image-text consistency check.
Optionally, in another embodiment, the processor 61 is used for executing the program data to implement the following method: acquiring a claim number of a patent document; acquiring a claim citation relation of a patent document; claim numbers and claim reference relationships are displayed using a mind map.
Optionally, in another embodiment, the processor 61 is used for executing the program data to implement the following method: acquiring patent writing information; acquiring a corresponding writing template from a template database according to the patent writing information; a writing template is displayed to provide additional reference to the writing of the patent document.
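The element information extraction listed above (extract labels, then segment the preceding characters into a name) might look like the following sketch. It follows the rules stated in the claims of this document: a numeral counts as a label only if its digit count is below a preset threshold, an English letter directly after it may be appended, and the name is the text between the last preset cut word and the label. The cut-word list, the threshold, the window size and the single-lowercase-letter rule are illustrative assumptions, and the sample text is English rather than Chinese.

```python
import re
from typing import List, Tuple

CUT_WORDS = ("the", "a", "an", "to", "and")  # hypothetical preset word-cutting library
MAX_DIGITS = 4   # assumed threshold: the digit count must be smaller than this
WINDOW = 30      # assumed number of characters inspected before each label

_LABEL = re.compile(r"\b(\d+)([a-z]?)\b")
_CUT = re.compile(r"\b(?:%s)\b" % "|".join(CUT_WORDS))


def extract_element_info(text: str) -> List[Tuple[str, str]]:
    """Return (element name, element label) pairs found in the text."""
    results: List[Tuple[str, str]] = []
    for match in _LABEL.finditer(text):
        digits, letter = match.group(1), match.group(2)
        if len(digits) >= MAX_DIGITS:
            continue  # too many digits (a year, for example): not an element label
        window = text[max(0, match.start() - WINDOW):match.start()]
        cut_end = -1
        for cut in _CUT.finditer(window):
            cut_end = cut.end()  # keep the position of the last cut word
        if cut_end < 0:
            continue  # no cut word in the window: no name can be segmented
        name = window[cut_end:].strip()
        if name:
            results.append((name, digits + letter))
    return results


pairs = extract_element_info("In 2020, the display 100 is connected to the camera 200a.")
```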
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer-readable storage medium 70 provided in an embodiment of the present application, where the computer-readable storage medium 70 stores program data 71, and the program data 71, when executed by a processor, is configured to implement the following methods:
displaying an operation interface; in response to an import instruction input on the operation interface, importing a patent document and displaying the patent document in a document display area of the operation interface; wherein, the patent document in the document display area is in an editable state; auditing the patent documents; and displaying the auditing result, and annotating the patent document based on the auditing result.
Optionally, in another embodiment, the program data 71, when executed by a processor, is for implementing a method of: extracting element labels in the patent documents; performing word segmentation processing on characters before the element labels to obtain element names; the component names and the component numbers are combined to form component information.
Optionally, in another embodiment, the program data 71, when executed by a processor, is for implementing a method of: extracting element information of character parts in the patent document and extracting element labels of figure parts in the patent document; wherein the component information includes a component name and a component number; searching in element labels of the figure part based on the element information so as to perform first image-text consistency check; and searching the character part based on the element number of the figure part so as to carry out second image-text consistency check.
Optionally, in another embodiment, the program data 71, when executed by a processor, is for implementing a method of: acquiring a claim number of a patent document; acquiring a claim citation relation of a patent document; claim numbers and claim reference relationships are displayed using a mind map.
Optionally, in another embodiment, the program data 71, when executed by a processor, is for implementing a method of: acquiring patent writing information; acquiring a corresponding writing template from a template database according to the patent writing information; a writing template is displayed to provide additional reference to the writing of the patent document.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made according to the content of the present specification and the accompanying drawings, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. An auditing method for patent documents, the method comprising:
extracting element information of character parts in a patent document and extracting element labels of figure parts in the patent document; wherein the component information includes a component name and a component number;
searching in the element labels of the figure part based on the element information so as to perform first image-text consistency check; and
searching the text part based on the element numbers of the figure part so as to perform second image-text consistency check.
2. The method of claim 1,
the method for extracting element information of the character part in the patent document comprises the following steps:
extracting element labels in the text part;
performing word segmentation processing on characters before the element labels to obtain element names;
and combining the element names and the element labels to form element information.
3. The method of claim 2,
the extracting of the element number in the patent document comprises:
extracting Arabic numerals in the patent documents;
judging whether the Arabic numerals meet a first preset requirement or not;
if yes, determining the Arabic numerals as element labels.
4. The method of claim 3,
the judging whether the Arabic numerals meet the preset requirements includes:
judging whether the digit number of the Arabic numerals is smaller than a preset digit threshold value or not;
if yes, determining that the Arabic numerals meet a first preset requirement.
5. The method of claim 3,
after extracting the arabic numbers in the patent documents, the method further comprises:
extracting English letters after the Arabic numerals;
judging whether the English letters meet a second preset requirement or not;
and when the Arabic numerals meet the first preset requirement and the English letters meet the second preset requirement, combining the Arabic numerals and the English letters as the element labels.
6. The method of claim 2,
the word segmentation processing is performed on the characters before the element labels to obtain the element names, and the word segmentation processing comprises the following steps:
judging whether preset characters/words exist in the characters with the set number before the element labels;
if so, the character between the last preset character/word and the element label is taken as the element name.
7. The method of claim 6,
the preset characters/words are characters/words in a preset word cutting library, and the preset word cutting library is established by a user in a self-defined mode.
8. The method of claim 1,
the extracting of the element numbers of the figure parts in the patent document comprises:
according to the current typesetting format of the figure part, carrying out first image recognition processing on the figure part to obtain a first type element label;
rotating the part of the figure clockwise by 90 degrees, and performing second-time image recognition processing on the part of the figure to obtain a second-class element label;
and combining the first type element number and the second type element number to obtain a plurality of element numbers.
9. An auditing apparatus for a patent document, the auditing apparatus comprising a processor and a memory, the memory being for storing program data, the processor being for executing the program data to implement a method according to any one of claims 1 to 8.
10. A computer-readable storage medium, in which program data are stored which, when being executed by a processor, are adapted to carry out the method according to any one of claims 1-8.
CN202010872321.6A 2020-08-26 2020-08-26 Patent document auditing method, processing device and storage medium Withdrawn CN112001821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010872321.6A CN112001821A (en) 2020-08-26 2020-08-26 Patent document auditing method, processing device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010872321.6A CN112001821A (en) 2020-08-26 2020-08-26 Patent document auditing method, processing device and storage medium

Publications (1)

Publication Number Publication Date
CN112001821A true CN112001821A (en) 2020-11-27

Family

ID=73470961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010872321.6A Withdrawn CN112001821A (en) 2020-08-26 2020-08-26 Patent document auditing method, processing device and storage medium

Country Status (1)

Country Link
CN (1) CN112001821A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949254A (en) * 2021-02-25 2021-06-11 郎丽华 System and method for processing reference numbers of patent application files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201127
