WO2022188821A1 - Processing method, apparatus, server and system for custom field indexing of files - Google Patents

Processing method, apparatus, server and system for custom field indexing of files

Info

Publication number
WO2022188821A1
Authority
WO
WIPO (PCT)
Prior art keywords
indexing
field
file
rule
custom
Prior art date
Application number
PCT/CN2022/080011
Other languages
English (en)
French (fr)
Inventor
杨林林
刘旭阳
张鑫
项晓露
周志翔
Original Assignee
智慧芽信息科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 智慧芽信息科技(苏州)有限公司
Publication of WO2022188821A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes

Definitions

  • the present disclosure relates to the technical field of patent data processing, and in particular, to a processing method, device, server and system for indexing a file with a custom field.
  • Patents are indexed with the "language" tag, and the user must then repeat work similar to the "language" indexing operation for the submenus of the other 8 subjects. Such an indexing process is relatively complex, has low indexing efficiency, consumes considerable human resources, and gives a poor user experience in patent work.
  • the present disclosure provides a processing method, device, server and system for indexing a custom field of a document, so as to at least solve the technical problem of low indexing efficiency of a document's custom field in the related art.
  • the technical solutions of the present disclosure are as follows:
  • a processing method for custom field indexing of files including:
  • the indexing rule is used to set the classification fields of all or part of the sub-items under the custom field, the custom field is used to index and classify the file, and the types of the file scope include: indexing all files in the set of files to be indexed;
  • the indexing process of the custom field is performed on the files within the file scope by using the indexing rule.
  • the indexing rule includes a word frequency filtering rule
  • the field type of the custom field includes a text field
  • the determining the indexing rule for the custom field includes:
  • a word frequency filtering rule is set based on the number of occurrences of the filtering word in the text field.
  • all or part of the indexing rules are associated based on logical operations of AND, OR, and NOT.
  • the word frequency filtering rule is further used for:
  • the document is a patent document
  • the word frequency filtering rule can also be used to perform word frequency filtering on at least one of the following text content: invention title, abstract, claim, description, identification
  • the category of the file scope further includes:
  • the unindexed files in the to-be-indexed file set are indexed, and the newly added files in the to-be-indexed file set are indexed.
  • the method further includes:
  • the field type of the custom field further includes:
  • the modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
  • the parameter values and/or operation logic in the indexing rules are adjusted according to the results of the indexing processing to obtain the updated indexing rules.
  • the present disclosure also provides a processing method for document classification, including:
  • Identify the operating authority of the operating account where the operating authority includes pre-set business contents that can be handled by different account types;
  • when the operation account has the conversion operation authority, obtain a corresponding file data set including the second field according to the business field allocated to the operation account, wherein the second field in the file data set includes data information obtained by processing file data by operating accounts that match different or the same operating authority;
  • the second field in the file data set is converted into a classification field of the target field type
  • the method further includes:
  • the second field in the file data set is modified to obtain modified field data.
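  • The classification steps above (identify the account's authority, fetch the file data set for its business field, convert the second field) can be sketched as follows. This is only an illustrative sketch: the `Account` type, the `"convert"` permission name, the dictionary layout of the data set, and the normalisation used as the "conversion" are all assumptions and are not specified in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    permissions: set[str]   # hypothetical permission names
    business_field: str     # business field allocated to the account


def convert_second_field(account: Account,
                         datasets: dict[str, list[dict[str, str]]],
                         second_field: str) -> list[dict[str, str]]:
    """Turn each record's second-field value into a classification field."""
    # Step 1: identify the operating authority of the account.
    if "convert" not in account.permissions:
        raise PermissionError(f"{account.name} may not convert fields")
    # Step 2: fetch the file data set for the account's business field.
    records = datasets.get(account.business_field, [])
    # Step 3: convert the second field (here: simple normalisation, an assumption).
    return [
        {"file_id": r["file_id"], "classification": r[second_field].strip().lower()}
        for r in records if second_field in r
    ]


datasets = {"batteries": [{"file_id": "P1", "note": " Lithium "}, {"file_id": "P2"}]}
analyst = Account("analyst", {"convert"}, "batteries")
converted = convert_second_field(analyst, datasets, "note")
```

  • Records that lack the second field are skipped rather than given an empty classification; a real system might instead flag them for manual review.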
  • the present disclosure also provides a processing device for custom field indexing of files, including:
  • a rule definition module is used to determine an indexing rule of a custom field and a file scope using the indexing rule, wherein the indexing rule is used to perform classification on all or part of the sub-items of the custom field.
  • the custom field is used to index and classify the file, and the types of the file range include: indexing all the files in the set of files to be indexed;
  • the indexing processing module is configured to, in response to a trigger instruction of automatic indexing, use the indexing rule to perform indexing processing of custom fields on the files within the file scope.
  • the indexing rule includes a word frequency filtering rule
  • the field type of the custom field includes a text field
  • the determining the indexing rule for the custom field includes:
  • a word frequency filtering rule set based on the number of occurrences of the filtering word in the text field.
  • all or part of the indexing rules are associated based on logical operations of AND, OR, and NOT.
  • the word frequency filtering rule is further used for:
  • the document is a patent document
  • the word frequency filtering rule can also be used to perform word frequency filtering on at least one of the following text content: invention title, abstract, claim, description, identification
  • the category of the file scope further includes:
  • the unindexed files in the to-be-indexed file set are indexed, and the newly added files in the to-be-indexed file set are indexed.
  • the field type of the custom field further includes:
  • the apparatus further comprises:
  • the rule modification module is used for modifying the indexing rule of the custom field in response to the rule modification instruction to obtain the updated indexing rule.
  • the apparatus further comprises:
  • the re-indexing module is configured to perform indexing processing of custom fields on the files within the file scope by using the indexing rules updated by the rule-modifying module.
  • the modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
  • the parameter values and/or operation logic in the indexing rules are adjusted according to the results of the indexing processing to obtain the updated indexing rules.
  • the present disclosure also provides a file classification processing device, including:
  • an authority identification module used for identifying the operation authority of the operation account, the operation authority including pre-set business contents that can be handled by different account types;
  • a data acquisition module configured to acquire a corresponding file data set containing the second field according to the business field allocated for the operating account when the operating account has the conversion operation authority, wherein the second field in the file data set includes data information obtained by processing file data by operating accounts that match different or the same operating authority;
  • a first conversion module configured to respond to the first conversion operation of the second field, and convert the second field in the file data set into a classification field of the target field type
  • the display module is configured to display the classification result after the conversion operation of the second field.
  • the apparatus further comprises:
  • the modification module is configured to modify the second field in the file data set in response to the modification operation of the second field to obtain the modified field data.
  • Another aspect of the embodiments of the present disclosure further provides a server, including:
  • a memory for storing the processor-executable instructions
  • the processor is configured to execute the instructions to implement the method described in any embodiment of the present disclosure.
  • Another aspect of the embodiments of the present disclosure further provides a computer-readable storage medium. When the instructions in the computer-readable storage medium are executed by a processor of a server, the server can execute the method described in any embodiment of the present disclosure.
  • Another aspect of the embodiments of the present disclosure further provides a computer program product, including a computer program/instruction, characterized in that, when the computer program is executed by a processor, the method described in any one of the embodiments of the present disclosure is implemented.
  • Another aspect of the embodiments of the present disclosure further provides a patent management system, where the patent management system includes the device described in any embodiment of the present disclosure; or, when the processor of the patent management system executes the executable instructions stored in the memory, the processing method for indexing a file with a custom field described in any embodiment of the present disclosure is implemented; or, the patent management system includes the computer program product described in the present disclosure.
  • the multiple steps of user-defined field indexing of files can be transformed into automatic indexing based on a trigger instruction after rules are set for the custom fields. The user can complete the indexing work by operating on a single page, which effectively simplifies the operation steps.
  • Since the indexing rule setting can be provided for all or some of the sub-items under the custom field, repeated one-by-one indexing operations on different sub-items of the same type of custom field in a single file collection (such as a single favorite) can be avoided. Once the indexing rules are set, they can be used in all file collections (such as under all favorites), so that rules are set once and apply globally. This reduces the complexity of indexing operations for custom fields, reduces human resource consumption, improves the efficiency of indexing processing, and improves the service experience of users' patent operations.
  • FIG. 1 is a schematic diagram of an application environment of a processing method for indexing a file with a custom field according to an exemplary embodiment.
  • Fig. 2 is a flow chart of a processing method for indexing a custom field to a file according to an exemplary embodiment.
  • FIG. 3 is a schematic diagram of an interface for setting automatic indexing rules provided by the present disclosure.
  • FIG. 4 is a schematic diagram of a scenario in which word frequency filtering rules are applied to batch reply information in an embodiment provided by the present disclosure.
  • FIG. 5 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to approval information that itself includes reply information in an embodiment provided by the present disclosure.
  • Fig. 6 is a flow chart of a processing method for indexing a custom field to a file according to an exemplary embodiment.
  • FIG. 7 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • FIG. 8 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • FIG. 9 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • FIG. 10 is a schematic structural diagram of a device S00 for converting patent description information according to an exemplary embodiment.
  • Fig. 11 is a flowchart showing a processing method for file classification according to an exemplary embodiment.
  • Fig. 12 is a flow chart of a processing method for file classification according to another exemplary embodiment.
  • FIG. 1 is a schematic diagram of an application environment of a processing method for indexing a file with a custom field according to an exemplary embodiment.
  • a processing method for custom field indexing of a file provided by the present disclosure can be applied to the application environment as shown in FIG. 1 .
  • The application environment may include a patent management system 110 provided to the user. The patent management system 110 can build a patent database, provide a patent management interface, and realize indexing, storage, novelty search, analysis, and update of patent data.
  • Some current patent management systems still classify patents based on indexing data or descriptive information entered by indexers. Even if the user has pre-set custom fields for classifying patents, such as language, mathematics, sports, etc., the patent documents are still indexed and classified one category at a time on a batch of patents, as described in the background art, which repeats a lot of work. Especially in application scenarios with a large number of patents, the current indexing processing methods and processes are inefficient and cannot meet the patent management needs of some users.
  • the technical solution provided by the present disclosure can be applied to the patent management system 110, can provide classification processing for files including but not limited to patent documents, and optimizes the indexing process and flow for the custom fields of patent documents, greatly improving the efficiency of custom field indexing.
  • the field types of the custom fields involved in the embodiments of the present disclosure may include, but are not limited to, text fields, option fields, and hierarchical fields.
  • the text field may include information content entered by the user to describe the content of the file, key technologies, etc., or a user-defined classification field. Usually text fields can be freely entered by the user.
  • the option field usually includes a user-defined classification field, and the classification field in the option field may be a parallel classification relationship or a hierarchical relationship.
  • the option field is a user-defined category field, which can be displayed in the form of a menu.
  • a file can belong to classification fields in multiple option fields. For example, a patent belongs to A and B in the four option fields of A, B, C, and D.
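  • The field types described above can be modelled with a small data structure. The sketch below is an assumption-laden illustration, not the disclosed implementation: the names `FieldType`, `CustomField` and `FileRecord` are hypothetical. It shows one patent assigned to options A and B out of the four options A, B, C and D.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    TEXT = "text"                  # free text entered by the user
    OPTION = "option"              # user-defined classification options
    HIERARCHICAL = "hierarchical"  # options arranged in levels


@dataclass
class CustomField:
    name: str
    field_type: FieldType
    options: list[str] = field(default_factory=list)


@dataclass
class FileRecord:
    file_id: str
    # custom field name -> set of classification values assigned to the file
    labels: dict[str, set[str]] = field(default_factory=dict)

    def assign(self, custom_field: CustomField, value: str) -> None:
        # Text fields accept any value; other field types only accept known options.
        if custom_field.field_type is not FieldType.TEXT and value not in custom_field.options:
            raise ValueError(f"{value!r} is not an option of {custom_field.name!r}")
        self.labels.setdefault(custom_field.name, set()).add(value)


category = CustomField("category", FieldType.OPTION, ["A", "B", "C", "D"])
patent = FileRecord("P-001")
patent.assign(category, "A")
patent.assign(category, "B")
```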
  • the patent management system 110 described in the embodiments of the present disclosure may include, but is not limited to, local or remote servers, server clusters, distributed subsystems, cloud processing platforms, servers including blockchain nodes, and combinations thereof, and may include various personal computers, laptops, smartphones, tablets, wearable devices, in-vehicle devices, medical devices, etc.
  • the solutions of the embodiments of the present disclosure are not limited to the indexing processing of custom fields of patent documents.
  • the solutions of the embodiments of the present disclosure can also be used for custom field indexing of other file types, such as papers, newspapers, book documents, etc. Therefore, in some embodiments of the present disclosure, the documents may include patent documents.
  • the patent-related terms used in the following embodiments can be adjusted accordingly, for example to a thesis management system.
  • the indexing may include marking the target object with one or more classification marks, so as to guide people or machines to find the required information conveniently and quickly.
  • the index can be used as the basis for retrieving the identification or the classification of the target object, or the index itself can identify the classification to which the object belongs.
  • Fig. 2 is a flowchart of a processing method for indexing a file with a custom field according to an exemplary embodiment. As shown in Fig. 2, the method may be used in the aforementioned patent management system 110, and may include the following steps.
  • S202 Determine an indexing rule of a custom field and a file scope using the indexing rule, wherein the indexing rule is used to set the classification fields of all or part of the sub-items under the custom field, the custom field is used to index and classify the file, and the types of the file scope include: indexing all the files in the set of files to be indexed.
  • One of the innovations of the embodiments of the present disclosure is to change the traditional custom indexing flow by predefining indexing rules: the multiple steps a user takes to index custom fields are converted into setting rules for the custom fields, after which indexing is completed automatically based on a trigger instruction. The user can complete the indexing work by operating on a single page, which effectively simplifies the operation steps.
  • a rule setting or editing interface can be provided, and the user can edit the custom field in the interface to give the specific indexing rules of the custom field.
  • Since the indexing rule setting can be provided for all or part of the sub-items under the custom field, repeated one-by-one indexing of different sub-items of the same type of custom field in a single file set (such as a single favorite) can be avoided. Once indexing rules are set, they can be used in all file collections (such as under all favorites), so that rules are set once and apply globally.
  • the documents may include one or more pre-selected or filtered patents.
  • the enterprise-level search application server can be called to find the patents that need to be indexed this time, or the patents can be displayed in the job list in the space interface in the form of a list of patent publication (announcement) numbers.
  • the custom field may be a text field, an option field and/or a level field.
  • For example, the custom field may be the option field obtained by converting the description information or indexing data of the patent document, as described above.
  • the target field type may include a category field set by a user-defined category.
  • Another innovation of the present disclosure is that the classification fields in some field types of the custom fields can be set by the user, which makes it much easier for the user to customize the type of the classification field to be converted. For example, the names of the root node, middle node, leaf node and other node classification fields in the hierarchical field can be customized by the user according to the classification requirements, and the system's own classification rules can be omitted or only partially used, so that users can define classification types flexibly.
  • indexing rule setting can be provided to some sub-items under the custom field, or can be implemented by specifying different classification levels.
  • For example, indexing rules can be specified to classify files into all sub-levels under the first-level category "subject", such as "language", "mathematics", "sports", etc. In this way, once the indexing rules are set and automatic indexing is triggered, the files can be automatically classified into all the corresponding sub-categories under the first-level category "subject", without the need to perform indexing operations for the sub-categories separately.
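  • The idea of setting rules once for all sub-items of a first-level category can be sketched as follows. This is a minimal illustration under stated assumptions: the rule table `SUBJECT_RULES`, its filter words, and the simple substring match are hypothetical and are not taken from the disclosure.

```python
# Hypothetical rule table: sub-item of the "subject" field -> filter words.
SUBJECT_RULES = {
    "language": ["grammar", "vocabulary"],
    "mathematics": ["equation", "algebra"],
    "sports": ["football", "athlete"],
}


def index_subject(text: str, rules: dict[str, list[str]]) -> list[str]:
    """Return every sub-item whose filter words appear in the text."""
    lowered = text.lower()
    return [sub_item for sub_item, words in rules.items()
            if any(word in lowered for word in words)]


# A single call classifies the file into every matching sub-category at once.
labels = index_subject("A device for teaching algebra to an athlete", SUBJECT_RULES)
```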
  • the types of the document scope may include various manners, which may indicate which patent documents the indexing rules set this time are used in.
  • the type of the document scope includes at least indexing all documents, so that the scope of use of the indexing rule can be controlled globally and efficiently; that is, the indexing rule can be used to index all documents, including previously indexed documents.
  • the types of the file scope may also include:
  • the unindexed files in the to-be-indexed file set are indexed, and the newly added files in the to-be-indexed file set are indexed.
  • files that are already indexed need not be re-indexed using the indexing rules set this time; only the unindexed files are indexed, which avoids overwriting previous indexing results.
  • similarly, only newly added files may be indexed using the indexing rules set this time.
  • If the indexing rules set this time are more reasonable, the previous indexing can be completely overwritten. It can be seen that, in some embodiments of the present disclosure, not only can indexing rules be set, but the scope of files to which they apply can be selected at the same time, so that the usage scenarios of the indexing rules can be chosen more flexibly to meet different user needs and indexing processing requirements and to improve user experience.
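  • The three file scope types (all files, unindexed files, newly added files) can be sketched as a simple selection step that runs before indexing. The names `FileScope`, `FileEntry` and `select_files`, and the boolean flags on each file, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class FileScope(Enum):
    ALL = "all"              # index every file, overwriting earlier results
    UNINDEXED = "unindexed"  # only files that have no indexing yet
    NEW = "new"              # only files newly added to the set


@dataclass
class FileEntry:
    file_id: str
    indexed: bool
    newly_added: bool


def select_files(files: list[FileEntry], scope: FileScope) -> list[FileEntry]:
    """Return the files that the indexing rules should be applied to."""
    if scope is FileScope.ALL:
        return list(files)
    if scope is FileScope.UNINDEXED:
        return [f for f in files if not f.indexed]
    return [f for f in files if f.newly_added]


files = [
    FileEntry("P1", indexed=True, newly_added=False),
    FileEntry("P2", indexed=False, newly_added=False),
    FileEntry("P3", indexed=False, newly_added=True),
]
```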
  • Word frequency settings can be supported: the user can set the number of times a filter word must appear in the patent text, such as 1, 2, 3, 4, or 5 times, as an indexing rule. For example, if the number of occurrences of the filter word "automobile" in a patent document reaches a certain threshold, such as 10 times, the patent can be indexed as "automobile".
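  • A word frequency rule of this kind can be sketched as a count compared against a threshold. The function names below are hypothetical, and whole-word, case-insensitive matching with `\b` is an assumption that suits space-delimited languages; Chinese text would need a different tokenisation step.

```python
import re


def count_occurrences(text: str, filter_word: str) -> int:
    """Count whole-word, case-insensitive occurrences of filter_word."""
    pattern = r"\b" + re.escape(filter_word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def matches_frequency_rule(text: str, filter_word: str, min_count: int) -> bool:
    """True when the filter word occurs at least min_count times."""
    return count_occurrences(text, filter_word) >= min_count


sample = "The automobile has an engine. Each automobile engine drives the Automobile."
is_automobile = matches_frequency_rule(sample, "automobile", 3)
```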
  • the word frequency setting can also be combined with other indexing rules to jointly decide the classification described in the document.
  • the indexing rule includes:
  • the filter rules for word frequency set based on the number of occurrences of the filter words in the text field.
  • An indexing rule can include multiple word frequency filtering rules, which can be associated by AND and OR logical operations. A set of filtering rules associated by a logical operation can itself be regarded as a word frequency filtering rule or as a logical operation rule. For example, if the number of occurrences of the filter word "WORD1" in the patent document reaches 20 and the number of occurrences of the filter word "WORD2" reaches 10, the patent is indexed as "automobile".
  • FIG. 3 is a schematic diagram of an interface for setting automatic indexing rules provided by the present disclosure.
  • the indexing rules may include multiple rules, and as described above, different rules may be associated through logical operations. Therefore, all or part of the indexing rules are associated based on logical operations of AND, OR, NOT.
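  • Associating rules through AND, OR and NOT can be sketched by treating each rule as a predicate on the text and composing predicates. The helper names (`word_freq`, `all_of`, `any_of`, `negate`) and the whitespace tokenisation are assumptions made for this sketch, not part of the disclosure.

```python
from typing import Callable

Rule = Callable[[str], bool]


def word_freq(word: str, min_count: int) -> Rule:
    """Rule that holds when `word` occurs at least `min_count` times."""
    return lambda text: text.lower().split().count(word.lower()) >= min_count


def all_of(*rules: Rule) -> Rule:   # logical AND
    return lambda text: all(rule(text) for rule in rules)


def any_of(*rules: Rule) -> Rule:   # logical OR
    return lambda text: any(rule(text) for rule in rules)


def negate(rule: Rule) -> Rule:     # logical NOT
    return lambda text: not rule(text)


# engine >= 2 AND wheel >= 1 AND NOT aircraft >= 1
automobile_rule = all_of(
    word_freq("engine", 2),
    word_freq("wheel", 1),
    negate(word_freq("aircraft", 1)),
)
```

  • Because a composed rule has the same type as a single rule, a group of associated rules can itself be nested inside another AND, OR or NOT.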
  • the word frequency filtering rule is further used for:
  • FIG. 4 is a schematic diagram of a scenario in which word frequency filtering rules are applied to batch reply information in an embodiment provided by the present disclosure.
  • the word frequency filtering rule can not only perform logical operations on the fields of the patent text content in the patent document, but also supports logical operations on comments, memos and other reply information made in the file.
  • This reply information can be added by one or more different users, as shown in Figure 4.
  • the word frequency filtering rule can also cover the replies made to these comments, memos and other approval information.
  • FIG. 5 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to approval information that itself includes reply information in an embodiment provided by the present disclosure. As shown in Figure 5, User1 replies to User2's comment. In the embodiments of the present disclosure, this approval information is also taken into account in the custom field indexing of the file. If the replies to the approval information are included as well, the accuracy of custom field indexing can be further improved, which makes the classification results of documents more reliable.
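  • Counting a filter word across comments and the replies nested under them can be sketched with a recursive walk. The `Comment` structure and the whitespace tokenisation below are assumptions for illustration; the disclosure does not specify how approval information is stored.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)


def count_in_thread(comment: Comment, filter_word: str) -> int:
    """Count filter_word in a comment and, recursively, in all of its replies."""
    own = comment.text.lower().split().count(filter_word.lower())
    return own + sum(count_in_thread(reply, filter_word) for reply in comment.replies)


# User1 replies to User2's comment, as in the scenario described above.
thread = Comment("User2", "battery design looks novel", [
    Comment("User1", "agreed, the battery housing is the key battery feature"),
])
total = count_in_thread(thread, "battery")
```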
  • the present disclosure also provides a method for processing word frequency filtering rules for files.
  • the document is a patent document
  • the word frequency filtering rule can also be used to perform word frequency filtering on one or more specified content parts of the document, namely on at least one of the following text contents: the name of the invention, the abstract, the claims, the description, and the text information contained in the identified drawings of the description.
  • the text information contained in the drawings of the description can be recognized by OCR or other image recognition algorithms.
  • the solution of the embodiment of the present disclosure performs word frequency filtering on the patent text, and can perform word frequency filtering on the text information contained in the invention name, abstract, claims, description, and the identified drawings of the description, as well as combinations thereof, thereby further improving the classification accuracy of custom field indexing of patent text.
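  • Restricting word frequency filtering to chosen parts of a patent can be sketched as summing counts over a selected list of sections. The section keys and the dictionary representation of a patent are hypothetical; the drawing text is assumed to have been extracted beforehand by an OCR step that is not shown.

```python
PATENT_SECTIONS = ("title", "abstract", "claims", "description", "drawing_text")


def count_in_sections(patent: dict[str, str], filter_word: str,
                      sections: tuple[str, ...] = PATENT_SECTIONS) -> int:
    """Sum occurrences of filter_word over the selected sections only."""
    word = filter_word.lower()
    return sum(patent.get(name, "").lower().split().count(word) for name in sections)


patent = {
    "title": "Vehicle brake",
    "abstract": "A brake for a vehicle",
    "claims": "A brake comprising a disc",
    "description": "The brake disc and the brake pad",
    "drawing_text": "brake pad",  # assumed output of a separate OCR step
}
everywhere = count_in_sections(patent, "brake")
front_matter = count_in_sections(patent, "brake", ("title", "abstract"))
```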
  • Fig. 6 is a flow chart of a processing method for indexing a custom field to a file according to an exemplary embodiment. As shown in FIG. 6 , in other embodiments, the method may further include:
  • S604 Use the updated indexing rule to perform custom field indexing processing on the files within the file scope.
  • the indexing rules can also be automatically updated and optimized to continuously refine them, so that the indexing results are more accurate.
  • the modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
  • the parameter values and/or operation logic in the indexing rules are adjusted according to the results of the indexing processing to obtain the updated indexing rules.
  • the results of the indexing process can be used to automatically adjust the parameter values or logical operations in the indexing rules, and specifically, corresponding algorithms can be set or neural networks, deep learning, and iterative algorithms can be used.
  • the indexing rule is used to index files and obtain indexing results. If some indexing results that do not meet expectations are adjusted, records of those adjustments can be obtained, such as what was adjusted, and the indexing rules can be automatically learned and optimized based on this recorded information. In this way, when subsequent automatic indexing processing is performed on target files of the same type, the processing can follow the previously learned and optimized indexing rules, and a more accurate indexing result can be obtained.
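  • One very simple way to adjust a parameter value from adjustment records is to refit a word-count threshold so that it agrees with as many user corrections as possible. The disclosure leaves the algorithm open (it mentions neural networks, deep learning and iterative algorithms), so the exhaustive search below is only an illustrative stand-in, and the record format is an assumption.

```python
def refit_threshold(records: list[tuple[int, bool]], current: int) -> int:
    """Pick the word-count threshold that agrees with the most corrected records.

    Each record is (occurrence_count, label_kept_by_user). Ties are broken in
    favour of the candidate closest to the current threshold.
    """
    if not records:
        return current
    candidates = sorted({count for count, _ in records} | {current})

    def errors(threshold: int) -> int:
        # A record disagrees when the rule's decision differs from the user's.
        return sum((count >= threshold) != keep for count, keep in records)

    return min(candidates, key=lambda t: (errors(t), abs(t - current)))


# The user removed the label from files with 10 and 12 hits and kept it at 15 and 20.
records = [(10, False), (12, False), (15, True), (20, True)]
updated = refit_threshold(records, current=10)
```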
  • the processing method for custom field indexing of a file provided by the embodiment of the present disclosure converts the multiple steps a user takes to index custom fields on a file into setting rules for the custom fields, after which indexing is completed automatically based on a trigger instruction. The user can complete the indexing work by operating on a single page, which effectively simplifies the operation steps. Moreover, since the indexing rule setting can be provided for all or some of the sub-items under the custom field, repeated similar indexing operations on different sub-items of the same type of custom field in a single file collection (such as a single favorite) can be avoided.
  • Once the indexing rules are set, they can be used in all file collections (such as under all favorites), so that rules are set once and apply universally. This reduces the complexity of custom field indexing operations, reduces human resource consumption, improves the efficiency of indexing processing, and improves the service experience of users' patent operations.
  • Although the steps in the flowcharts involved in the drawings are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence shown by the arrows. Unless explicitly stated herein, there is no strict order in the execution of these steps, and these steps may be performed in other orders. Moreover, at least a part of the steps in FIG. 2 may include multiple steps or multiple stages. These steps or stages are not necessarily executed at the same time, but may be executed at different times. The execution sequence of these steps or stages does not have to be sequential either; they may be performed in turn or alternately with other steps or with at least a portion of the steps or stages of other steps.
  • the present disclosure further provides a processing apparatus for indexing a custom field of a file.
  • the apparatuses may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc., which use the methods described in the embodiments of this specification, in combination with necessary implementation hardware apparatuses.
  • the apparatuses in one or more embodiments provided by the embodiments of the present disclosure are described in the following embodiments. Since the implementation solution of the device to solve the problem is similar to the method, the implementation of the specific device in the embodiment of the present specification can refer to the implementation of the foregoing method, and repeated details will not be repeated.
  • unit or “module” may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • FIG. 7 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • the device may be the aforementioned patent management system 110, or may be an individual server or a server cluster or the like.
  • the apparatus 100 may include:
  • a rule definition module 702, which can be used to determine an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, and the types of file scope include: indexing all files;
  • the indexing processing module 704 may be configured to, in response to a trigger instruction of automatic indexing, use the indexing rule to perform indexing processing of custom fields on the files within the file scope.
  • the indexing rule includes:
  • the filter rules for word frequency set based on the number of occurrences of the filter words in the text field.
  • all or part of the rules in the indexing rules are associated based on logical operations of AND, OR, and NOT.
  • the word frequency filtering rule is further used for:
  • the document is a patent document
  • the word frequency filtering rule can also be used to perform word frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
  • the category of the file scope further includes:
  • the type of the custom field index includes:
  • FIG. 8 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • the apparatus may further include:
  • the rule modification module 802 may be configured to respond to the rule modification instruction, modify the indexing rules of the self-defined field, and obtain the updated indexing rules.
  • the modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
  • the parameter values and/or operation logic in the indexing rules are adjusted according to the results of the indexing processing to obtain the updated indexing rules.
  • FIG. 9 is a schematic structural diagram of a processing apparatus for indexing a file with a custom field according to an exemplary embodiment.
  • the apparatus may further include:
  • the re-indexing module 902 may be configured to perform indexing processing of custom fields on the files within the file scope by using the indexing rules updated by the rule modification module 802 .
  • a computer program product including a computer program, which, when executed by a processor, implements the processing method for custom field indexing of a file described in any one of this specification.
  • FIG. 10 is a schematic structural diagram of a patent description information conversion processing device S00 according to an exemplary embodiment.
  • the device S00 may include the aforementioned server, patent management system, server cluster, distributed processing server, blockchain server, cloud computing platform, etc., and combinations thereof.
  • device S00 may be a combination of one or more servers.
  • apparatus S00 includes a processing component S20, which further includes one or more processors, and a memory resource, represented by memory S22, for storing instructions, such as application programs, executable by the processing component S20.
  • the application program stored in the memory S22 may include one or more modules, each corresponding to a set of instructions.
  • the processing component S20 is configured to execute the instruction to execute the above-mentioned method that can be implemented on the side of the proxy server.
  • Device S00 may also include a power supply assembly S24 configured to perform power management of device S00, a wired or wireless network interface S26 configured to connect device S00 to a network, and an input output (I/O) interface S28.
  • Device S00 can operate based on an operating system stored in memory S22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD or the like.
  • the above-mentioned device S00 may be an exemplary description of a data processing device, such as a patent management platform. In some data processing devices, it may not be necessary to include all the above components or all functional units under a certain component.
  • a computer-readable storage medium including instructions such as a memory S22 including instructions, is also provided, and the instructions are executable by the processing component S20 of the device S00 to complete the above method.
  • the storage medium may be a computer-readable storage medium such as ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk and optical data storage device, graphene, and the like.
  • the present disclosure further provides a patent management system, and the patent management system may include the apparatus described in any one of the embodiments of the present disclosure;
  • the processor of the patent management system executes the executable instructions stored in the memory, the processing method for custom field indexing of a file described in any one of the embodiments of the present disclosure is implemented;
  • the patent management system includes the computer program product described above.
  • a processing method and device for indexing a file with a custom field can be applied to a file classification management system.
  • a patent management system provided to users, the patent management system can build a patent database, provide a patent management interface, and realize the classification, storage, novelty search, analysis and update of patent data.
  • the classification management system of files usually needs to classify the files.
  • Fig. 11 is a flowchart showing a processing method for file classification according to an exemplary embodiment.
  • the present disclosure provides a document classification processing method, which can be used in the aforementioned patent management system, as shown in FIG. 11 , and can include the following steps.
  • different account types may be preset, and different account types have different operation rights in the patent management system or work space.
  • the first type of account can only index files and cannot convert the indexing data generated by the indexing into option fields or hierarchical fields.
  • the second type of account has the conversion operation authority, which can convert the indexing data into the target field type, and can also review the indexing data of the third type of account, such as modifying it.
  • the third type of account can have the highest authority. For example, it can have all permissions in the workspace or the patent management system, it can assign certain operation permissions to different second-type accounts according to certain rules, and it can review or modify the conversion results of second-type accounts.
  • the third type of account can also have the conversion operation authority and the authority to index files.
  • the corresponding account type can be set according to the requirements for file classification management.
  • the present disclosure also provides an implementation method for setting accounts with different permissions.
  • the operation authority of the operation account includes the account and corresponding authority set in the following manner:
  • the first account type which has the permission to index files and has no permission to convert
  • the second account type which modifies the data information generated by the processing of the file by the indexing account in the first account type, and has the authority to convert the field to be converted into a business field of a specified type;
  • the third account type has conversion operation authority, and assigns a conversion account number in the second account type according to a preset matching rule to allow the conversion account to convert the fields to be converted into business fields.
  • the first account type may be the same as the aforementioned first type of account, and may be provided to a specific person who performs initial classification processing of files, such as a person who performs indexing of files.
  • the account used by the indexing personnel may be called an indexing account, which belongs to the first account type.
  • an account of the second account type (which may be referred to as a conversion account), like the aforementioned second type of account, can modify the data information generated when an indexing account of the first account type processes a file, and has permission to convert the fields to be converted into business fields of a specified type.
  • the third account type refer to the aforementioned third type of account, which can be set for different conversion accounts to convert fields to be converted into business fields.
  • a distribution account belonging to the third type of account type can set or assign a conversion account user33 to handle electronic patent document classification, which can indicate that the classification field after user33 converts the index data should be the electronic classification field.
  • the indexing personnel are usually one or more R&D or patent classification personnel, and may also include review and management personnel for patent management. Therefore, it may be data information obtained by processing file data by operating accounts that match different or the same operating authority.
  • the file data set may include data information of one or more files, usually including data information obtained by processing file data, such as index data.
  • the indexing data may include the overall classification description of the patent by the indexer, and may include the text language description of the patent classification input by the R&D or patent administrator. For example, "This patent is a cleaning device applied to a sweeping robot, which can automatically identify obstacles and automatically charge.”
  • indexing data can be used as a pending field for initial classification of patents using the patent management system.
  • the fields to be processed may also be option fields and/or hierarchical fields.
  • a patent publication (announcement) number may be used to uniquely identify a patent, and all or part of the description information, text fields, option fields, and hierarchical fields are represented by letters; for example, the text fields may be A, B, C, etc.
  • the operation account has the conversion operation permission and can perform field classification conversion between different field types.
  • the classification results after the conversion operation can be displayed for users to view.
  • FIG. 12 is a flowchart of a processing method for file classification according to another exemplary embodiment. As shown in Figure 12, the processing method may also include:
  • S206 In response to the modification operation of the second field, modify the second field in the file data set to obtain modified field data.
  • the file data set may include the modified field data of the second field, and during conversion, the modified field data is also converted.
  • Managers can review the indexing data made by each indexer in the patent management system. Generally, administrators need to obtain certain operation authority to facilitate the review and conversion of patent indexing data in a safe and centralized manner.
  • the administrator can modify it in the work space of the patent management system. For example, the text-type index data "cleaning device" is changed to "disinfecting device".
  • the patent management system can modify the index data to be adjusted in response to the modification operation of the index data (a kind of data information in the second field) to obtain the modified index data.
  • the reviewer can uniformly convert it through the patent management system to obtain the classification field of the target field type.
  • An apparatus for classifying and processing a file may include:
  • an authority identification module which can be used to identify the operation authority of the operating account, and the operation authority includes pre-set business contents that can be handled by different account types;
  • the data acquisition module can be used to acquire, when the operating account has the conversion operation authority, the corresponding file data set containing the second field according to the business field allocated to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts with different or the same operation authority process the file data;
  • a first conversion module which can be used to convert the second field in the file data set into a classification field of the target field type in response to the first conversion operation of the second field;
  • the display module can be used to display the classification result after the conversion operation of the second field.
  • the apparatus may further include:
  • the modification module may be configured to modify the second field in the file data set in response to the modification operation of the second field to obtain modified field data.
  • the functions of the modules can be implemented in the same one or more pieces of software and/or hardware, and a module that implements a given function can also be implemented by a combination of multiple sub-modules or sub-units.
  • the apparatus embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, communication connection, etc. between the shown or described devices or units may be realized by direct and/or indirect coupling/connection, and may be achieved through some standard or custom interfaces, protocols, etc., in electrical, mechanical, or other forms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

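The present disclosure relates to a processing method, apparatus, server, system, and computer program product for indexing files with custom fields. In one method embodiment, the multiple steps a user performs to index a file with a custom field are converted into setting rules for the custom field and then completing the indexing automatically in response to a trigger instruction. The user can complete the indexing work on a single page, which effectively simplifies the operation. Moreover, because the indexing rules can be set for all sub-items under a custom field, the user avoids repeatedly indexing each sub-item of the same custom field one by one within a single file collection. A rule is set once and applies globally, which reduces the complexity of custom field indexing, reduces labor consumption, improves indexing efficiency, and improves the user's experience of the patent service.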

Description

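Processing method, apparatus, server, and system for indexing files with custom fields
Technical Field
The present disclosure relates to the technical field of patent data processing, and in particular to a processing method, apparatus, server, and system for indexing files with custom fields.
Background
With continuing technological innovation and progress, the number of patent applications keeps growing. Maintaining patent data provides an important reference for the direction of patent filings, patent development trends, and patent portfolio planning within an industry.
When a patent agency or another patent-related party (a user) engaged in patent filing, management, or operation maintains patent data, it usually needs to index and classify the patents. In some related technologies, a user can define custom patent classification fields (hereinafter, custom fields) and use them to index patents one by one or one category at a time. For example, a user creates a drop-down custom field named "Subject" with nine sub-menu items: Chinese, Mathematics, English, Politics, History, Geography, Physics, Chemistry, and Biology. The user's favorites folder contains 10,000 patents to be tagged. The user needs to find the patents related to "Chinese" among the 10,000, say 900 of them, and tag them with the "Chinese" index. In the related technologies, the user has to open the patent favorites folder, go to the filter page, set a search query on patent attributes to retrieve the 900 patents, select the "Chinese" tag, and click the index button to tag those 900 patents. The user then has to manually repeat the same operation for the sub-menu items of the other eight subjects. Such an indexing process is relatively complicated and inefficient, consumes considerable labor, and gives users a poor experience.
Summary
The present disclosure provides a processing method, apparatus, server, and system for indexing files with custom fields, to solve at least the technical problem in the related technologies that custom field indexing of files is inefficient. The technical solutions of the present disclosure are as follows: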
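A processing method for indexing files with custom fields, comprising:
determining an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, the custom field is used to index and classify the files, and the types of file scope include: indexing all files in the collection of files to be indexed;
in response to a trigger instruction for automatic indexing, performing custom field indexing on the files within the file scope by using the indexing rule.
In another embodiment of the method, the indexing rule includes a word frequency filtering rule, the field types of the custom field include a text field, and determining the indexing rule of the custom field includes:
for a custom field whose field type is a text field, setting the word frequency filtering rule based on the number of occurrences of a filter word in the text field.
In another embodiment of the method, all or some of the rules in the indexing rule are associated through AND, OR, and NOT logical operations.
In another embodiment of the method, the word frequency filtering rule is further used for:
filtering on filter words that appear in the text content of the file,
and/or,
filtering on filter words that appear in the remark information included in the file, wherein the remark information includes at least one of the following:
annotation information on content in the file;
comment information on content in the file;
memo information on content in the file;
and reply information corresponding to each of the annotation information, comment information, and memo information.
In another embodiment of the method, the file is a patent file, and the word frequency filtering rule may further be used to perform word frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
In another embodiment of the method, the types of file scope further include:
indexing the unindexed files in the collection of files to be indexed, and indexing the newly added files in the collection of files to be indexed.
In another embodiment of the method, the method further includes:
in response to a rule modification instruction, modifying the indexing rule of the custom field to obtain an updated indexing rule;
or, in response to a rule modification instruction, modifying the indexing rule of the custom field to obtain an updated indexing rule, and then performing custom field indexing on the files within the file scope by using the updated indexing rule.
In another embodiment of the method, the field types of the custom field further include:
an option field and/or a hierarchical field.
In another embodiment of the method, modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
adjusting parameter values and/or operation logic in the indexing rule according to the results of the indexing processing to obtain the updated indexing rule.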
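The present disclosure further provides a processing method for file classification, comprising:
identifying the operation permissions of an operating account, wherein the operation permissions include the preset business content that each account type is allowed to handle;
if the operating account has the conversion operation permission, acquiring the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts with different or the same operation permissions process the file data;
in response to a first conversion operation on the second field, converting the second field in the file data set into a classification field of a target field type;
displaying the classification result after the conversion operation on the second field.
In another embodiment of the method, the method further includes:
in response to a modification operation on the second field, modifying the second field in the file data set to obtain modified field data.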
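The present disclosure further provides a processing apparatus for indexing files with custom fields, comprising:
a rule definition module, configured to determine an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, the custom field is used to index and classify the files, and the types of file scope include: indexing all files in the collection of files to be indexed;
an indexing processing module, configured to, in response to a trigger instruction for automatic indexing, perform custom field indexing on the files within the file scope by using the indexing rule.
In another embodiment of the apparatus, the indexing rule includes a word frequency filtering rule, the field types of the custom field include a text field, and determining the indexing rule of the custom field includes:
for a custom field whose field type is a text field, setting the word frequency filtering rule based on the number of occurrences of a filter word in the text field.
In another embodiment of the apparatus, all or some of the rules in the indexing rule are associated through AND, OR, and NOT logical operations.
In another embodiment of the apparatus, the word frequency filtering rule is further used for:
filtering on filter words that appear in the text content of the file,
and/or,
filtering on filter words that appear in the remark information included in the file, wherein the remark information includes at least one of the following:
annotation information on content in the file;
comment information on content in the file;
memo information on content in the file;
and reply information corresponding to each of the annotation information, comment information, and memo information.
In another embodiment of the apparatus, the file is a patent file, and the word frequency filtering rule may further be used to perform word frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
In another embodiment of the apparatus, the types of file scope further include:
indexing the unindexed files in the collection of files to be indexed, and indexing the newly added files in the collection of files to be indexed.
In another embodiment of the apparatus, the field types of the custom field further include:
an option field and/or a hierarchical field.
In another embodiment of the apparatus, the apparatus further includes:
a rule modification module, configured to, in response to a rule modification instruction, modify the indexing rule of the custom field to obtain an updated indexing rule.
In another embodiment of the apparatus, the apparatus further includes:
a re-indexing module, configured to perform custom field indexing on the files within the file scope by using the indexing rule updated by the rule modification module.
In another embodiment of the apparatus, modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
adjusting parameter values and/or operation logic in the indexing rule according to the results of the indexing processing to obtain the updated indexing rule.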
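The present disclosure further provides a classification processing apparatus for files, comprising:
a permission identification module, configured to identify the operation permissions of an operating account, wherein the operation permissions include the preset business content that each account type is allowed to handle;
a data acquisition module, configured to acquire, when the operating account has the conversion operation permission, the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts with different or the same operation permissions process the file data;
a first conversion module, configured to, in response to a first conversion operation on the second field, convert the second field in the file data set into a classification field of a target field type;
a display module, configured to display the classification result after the conversion operation on the second field.
In another embodiment of the apparatus, the apparatus further includes:
a modification module, configured to, in response to a modification operation on the second field, modify the second field in the file data set to obtain modified field data.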
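Another aspect of the embodiments of the present disclosure further provides a server, comprising:
at least one processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method described in any embodiment of the present disclosure.
Another aspect of the embodiments of the present disclosure further provides a computer-readable storage medium, wherein when the instructions in the computer-readable storage medium are executed by a processor of a server, the server is enabled to execute the method described in any embodiment of the present disclosure.
Another aspect of the embodiments of the present disclosure further provides a computer program product, including a computer program/instructions, wherein the computer program, when executed by a processor, implements the method described in any embodiment of the present disclosure.
Another aspect of the embodiments of the present disclosure further provides a patent management system, wherein the patent management system includes the apparatus described in any embodiment of the present disclosure; or the processor of the patent management system, when executing the executable instructions stored in the memory, implements the processing method for indexing files with custom fields described in any embodiment of the present disclosure; or the patent management system includes the computer program product described in the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
In the technical solutions provided by the embodiments of the present disclosure, the multiple steps a user performs to index a file with a custom field are converted into setting rules for the custom field and then completing the indexing automatically in response to a trigger instruction. The user can complete the indexing work on a single page, which effectively simplifies the operation. Moreover, because the indexing rules can be set for all or some of the sub-items under a custom field, the user avoids repeatedly indexing each sub-item of the same custom field one by one within a single file collection (such as a single favorites folder). Once set, an indexing rule can be used in all file collections (such as all favorites folders), so a rule is set once and applies globally. This reduces the complexity of custom field indexing, reduces labor consumption, improves indexing efficiency, and improves the user's experience of the patent service.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present disclosure.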
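Brief Description of the Drawings
The accompanying drawings are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
FIG. 1 is a schematic diagram of an application environment of a processing method for indexing files with custom fields according to an exemplary embodiment.
FIG. 2 is a flowchart of a processing method for indexing files with custom fields according to an exemplary embodiment.
FIG. 3 is a schematic diagram of an interface for setting automatic indexing rules provided by the present disclosure.
FIG. 4 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to remark information in an embodiment provided by the present disclosure.
FIG. 5 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to remark information containing reply information in an embodiment provided by the present disclosure.
FIG. 6 is a flowchart of a processing method for indexing files with custom fields according to an exemplary embodiment.
FIG. 7 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment.
FIG. 8 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment.
FIG. 9 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment.
FIG. 10 is a schematic structural diagram of a conversion processing device S00 for patent description information according to an exemplary embodiment.
FIG. 11 is a flowchart of a processing method for file classification according to an exemplary embodiment.
FIG. 12 is a flowchart of a processing method for file classification according to another exemplary embodiment.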
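Modes for Carrying Out the Invention
To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
By way of example, the terms "first", "second", and the like in the description, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims. The terms "include", "comprise", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product, or device that includes the elements is not excluded. Words such as first and second, where used, denote names and do not denote any particular order.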
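FIG. 1 is a schematic diagram of an application environment of a processing method for indexing files with custom fields according to an exemplary embodiment. The processing method provided by the present disclosure can be applied in the application environment shown in FIG. 1, for example in a patent management system 110 provided to users. The patent management system 110 can build a patent database, provide a patent management interface, and index, store, search, analyze, and update patent data. Some current patent management systems still classify patents based on indexing data or description information entered freely by indexing personnel. Even if a user has already set up custom fields for classifying patents, such as the aforementioned Chinese, Mathematics, and Physical Education, patent files are still classified as described in the Background: a batch of patents is indexed for one category at a time, and a large amount of work is repeated. Particularly in scenarios with a large number of patents, the current indexing approach and workflow are inefficient and cannot meet the patent management needs of some users. The technical solutions provided by the present disclosure can be applied in the patent management system 110 and can provide classification processing including but not limited to patent files. They optimize the process and workflow of custom field indexing of patent files and substantially improve the efficiency of custom field indexing.
The field types of the custom fields involved in the embodiments of the present disclosure may include, but are not limited to, text fields, option fields, and hierarchical fields. A text field may contain information entered by the user that describes the content of a file, its key technology, and so on, or a custom classification field. The content of a text field can usually be entered freely by the user. An option field usually contains user-defined classification fields, which may stand in a parallel relationship or in a hierarchical relationship. Option fields are usually classification fields predefined by the user and can be displayed as a menu. A file can belong to several classification fields of an option field. For example, a patent may belong to A and B among the four option-field categories A, B, C, and D, or to A and B1, where A, B, C, and D are all first-level categories and B1 is a sub-category (second-level category) of the first-level category B. A hierarchical field may express the subordination or level relationships between different classification fields. After a patent file is indexed, the patent database of the patent management system can be updated. The patent management system 110 described in the embodiments of the present disclosure may include, but is not limited to, local or remote servers, server clusters, distributed systems, cloud processing platforms, servers containing blockchain nodes, and combinations thereof, and may also include various personal computers, laptops, smartphones, tablets, wearable devices, in-vehicle devices, medical devices, and the like.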
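The relationship between a custom field, its field type, and its first-level and second-level categories can be sketched as a small data model. This is only an illustration under assumptions: the class names `FieldType`, `Category`, and `CustomField` and the field name "technology" are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Tuple


class FieldType(Enum):
    TEXT = "text"            # free text entered by the user
    OPTION = "option"        # predefined categories, shown as a menu
    HIERARCHY = "hierarchy"  # categories with parent/child levels


@dataclass
class Category:
    name: str
    children: List["Category"] = field(default_factory=list)

    def paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield the path of this category and of every descendant, root first."""
        here = prefix + (self.name,)
        yield here
        for child in self.children:
            yield from child.paths(here)


@dataclass
class CustomField:
    name: str
    field_type: FieldType
    categories: List[Category] = field(default_factory=list)


# First-level categories A, B, C, D; B1 is a second-level category under B.
tech = CustomField(
    "technology",
    FieldType.HIERARCHY,
    [Category("A"), Category("B", [Category("B1")]), Category("C"), Category("D")],
)
all_paths = [p for c in tech.categories for p in c.paths()]
```

A file that belongs to "A and B1" would then carry the paths `("A",)` and `("B", "B1")`.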
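The solutions of the embodiments are described below using a scenario in which the files are patent files and the patents in a favorites folder of the patent management system are indexed. By way of example, the solutions are not limited to custom field indexing of patent files. Based on the inventive idea of the present disclosure, they can also be used for custom field indexing of other file types, such as papers, newspapers and periodicals, and books, so in some embodiments of the present disclosure the files may include patent files. Accordingly, the patent-specific terms used in the following embodiments can be adapted as appropriate, for example to a paper management system. Indexing may include attaching one or more classification marks to a target object so that people or machines can find the required information conveniently and quickly. An index can serve as a retrieval identifier or as the basis for classifying the target object, or the index itself can identify the category to which the object belongs. FIG. 2 is a flowchart of a processing method for indexing files with custom fields according to an exemplary embodiment. As shown in FIG. 2, the method can be used in the aforementioned patent management system 110 and can include the following steps.
S202: Determine an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, the custom field is used to index and classify the files, and the types of file scope include: indexing all files in the collection of files to be indexed.
S204: In response to a trigger instruction for automatic indexing, perform custom field indexing on the files within the file scope by using the indexing rule.
One of the innovations of the embodiments of the present disclosure is to change the traditional custom indexing workflow. Indexing rules can be defined in advance, so that the multiple steps a user performs for custom field indexing are converted into setting rules for the custom field and then completing the indexing automatically in response to a trigger instruction. The user can complete the indexing work on a single page, which effectively simplifies the operation. Specifically, a rule setting or editing interface can be provided, in which the user can edit a custom field and give its specific indexing rules. More importantly, because the indexing rules can be set for all or some of the sub-items under a custom field, the user avoids repeatedly indexing each sub-item of the same custom field one by one within a single file collection (such as a single favorites folder). Once set, an indexing rule can be used in all file collections (such as all favorites folders), so a rule is set once and applies globally.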
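Steps S202 and S204 can be sketched as follows. This is a minimal sketch under assumptions, not the disclosed implementation: a rule is modeled as a plain predicate over the file text, and the names `auto_index` and `subject_rules`, the sample keywords, and the sample files are all hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a rule is a predicate that inspects the text of one file.
Rule = Callable[[str], bool]


def auto_index(files: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Step S204: apply the rule of every sub-item to every file in a single pass."""
    return {
        file_id: [label for label, rule in rules.items() if rule(text)]
        for file_id, text in files.items()
    }


# Step S202: one rule set that covers several sub-items of the custom field "Subject".
subject_rules: Dict[str, Rule] = {
    "Chinese": lambda text: "poetry" in text,
    "Mathematics": lambda text: "equation" in text,
}

files = {
    "P1": "a device for solving an equation",
    "P2": "a poetry reader",
    "P3": "a pump",
}

# The trigger instruction corresponds to one call; no per-sub-item repetition is needed.
labels = auto_index(files, subject_rules)
```

Because the rule set is an ordinary value, the same `subject_rules` can be passed to `auto_index` for any other file collection, which mirrors the "set once, use everywhere" idea.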
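Besides determining the file collection by selecting a favorites folder, the files may include one or more patents selected or filtered in advance. For example, the patents to be indexed this time can be found by calling an enterprise search server (Solr) according to query conditions, filter conditions, workspace configuration, sorting rules, and so on, and can also be displayed in the workspace interface as a list of patent publication (announcement) numbers.
The custom field can be a text field, or an option field and/or a hierarchical field, for example an option field obtained by previously converting the description information or indexing data of patent files. The target field type may include classification fields set by user-defined classification. Another innovation of the present disclosure is that the classification fields in some field types of a custom field can be defined by the user, which makes it very convenient for the user to define the types of classification fields to convert to. For example, the names of the classification fields at the root nodes, intermediate nodes, leaf nodes, and other nodes of a hierarchical field can be defined by the user according to classification needs, and the classification rules built into the system can be left unused or used only in part, so that the user can define classification types flexibly.
The indexing rules can be set for some of the sub-items under a custom field, which can also be achieved by specifying different classification levels. For example, an indexing rule can be specified for all sub-level categories under the first-level category "Subject", such as "Chinese", "Mathematics", and "Physical Education". With the indexing rules set once, a single trigger of automatic indexing classifies the files into all corresponding sub-level categories under the first-level category "Subject", without performing a separate indexing operation for each sub-level category.
The file scope can take several forms and indicates which patent files the currently set indexing rule applies to. In one embodiment of the present disclosure, the types of file scope include at least indexing all files, which allows the scope of an indexing rule to be controlled globally in a more comprehensive and efficient way: the indexing rule can be used to index all files, including files that have been indexed before. In other embodiments, the types of file scope may further include:
indexing the unindexed files in the collection of files to be indexed, and indexing the newly added files in the collection of files to be indexed.
In some other embodiments, files that have already been indexed need not be re-indexed with the currently set indexing rule; only unindexed files are indexed, which avoids overwriting targets that were indexed before. Similarly, in some other embodiments the currently set indexing rule can be applied only to newly added files. If the user considers the current indexing rule more reasonable, all files can be re-indexed and the earlier indexes overwritten. Thus, in some embodiments of the present disclosure, the user can not only set indexing rules but also select the file scope to which they apply, which allows the usage scenario of an indexing rule to be chosen more flexibly, meets different user needs and indexing needs, and improves the user experience.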
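The three kinds of file scope described above (all files, unindexed files only, newly added files only) can be sketched as a selection step that runs before the rules are applied. The names `Scope`, `FileRecord`, and `select_files`, and the `is_new` flag used to mark newly added files, are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Scope(Enum):
    ALL = "all"              # every file, including files indexed earlier
    UNINDEXED = "unindexed"  # only files that carry no index yet
    NEW = "new"              # only files added since the last run


@dataclass
class FileRecord:
    file_id: str
    labels: List[str] = field(default_factory=list)
    is_new: bool = False  # hypothetical flag set when a file enters the collection


def select_files(collection: List[FileRecord], scope: Scope) -> List[FileRecord]:
    """Return the files that the current indexing rule should be applied to."""
    if scope is Scope.ALL:
        return list(collection)
    if scope is Scope.UNINDEXED:
        return [f for f in collection if not f.labels]
    return [f for f in collection if f.is_new]


collection = [
    FileRecord("P1", labels=["Chinese"]),
    FileRecord("P2"),
    FileRecord("P3", is_new=True),
]
unindexed_ids = [f.file_id for f in select_files(collection, Scope.UNINDEXED)]
new_ids = [f.file_id for f in select_files(collection, Scope.NEW)]
```

Choosing `Scope.UNINDEXED` leaves the existing "Chinese" index on P1 untouched, while `Scope.ALL` would allow it to be overwritten.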
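In some other embodiments of the present disclosure, word frequency settings can be supported for text-type patent fields. The user can set the number of times a filter word appears in the patent text, such as 1, 2, 3, 4, or 5 times, and the indexing rule is set with reference to this count. For example, if the filter word "automobile" appears in a patent file a number of times that reaches a threshold, such as 10, the patent can be indexed as "automobile". Word frequency settings can also be combined with other indexing rules to jointly decide the category of a file. Specifically, in another embodiment of the method provided by the present disclosure, the indexing rule includes:
for a text-type custom field, a word frequency filtering rule set based on the number of occurrences of a filter word in the text field.
An indexing rule can contain multiple word frequency filtering rules, which can be associated through AND, OR, and NOT logical operations. Filtering rules associated in this way can also be regarded as a kind of word frequency filtering rule or logical operation rule. For example, if the filter word "WORD1" appears in a patent file at least 20 times and the filter word "WORD2" appears at least 10 times, the patent is indexed as "automobile".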
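The combination of word frequency thresholds with AND, OR, and NOT can be sketched with small predicate combinators. The helper names `at_least`, `all_of`, `any_of`, and `negate` are hypothetical, and counting with `str.count` (non-overlapping substring matches) is a simplifying assumption; a real system would more likely tokenize the text first.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def at_least(word: str, n: int) -> Predicate:
    """Word frequency rule: true when `word` occurs at least `n` times."""
    return lambda text: text.count(word) >= n


def all_of(*rules: Predicate) -> Predicate:
    """Logical AND over rules."""
    return lambda text: all(rule(text) for rule in rules)


def any_of(*rules: Predicate) -> Predicate:
    """Logical OR over rules."""
    return lambda text: any(rule(text) for rule in rules)


def negate(rule: Predicate) -> Predicate:
    """Logical NOT of a rule."""
    return lambda text: not rule(text)


# "WORD1" at least 20 times AND "WORD2" at least 10 times -> index as "automobile".
automobile_rule = all_of(at_least("WORD1", 20), at_least("WORD2", 10))

matching = "WORD1 " * 20 + "WORD2 " * 10
too_few = "WORD1 " * 20 + "WORD2 " * 9
```

Because each combinator returns another predicate, rules nest freely, for example `any_of(automobile_rule, negate(at_least("WORD3", 1)))`.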
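FIG. 3 is a schematic diagram of an interface for setting automatic indexing rules provided by the present disclosure. As shown in FIG. 3, an indexing rule can contain multiple rules, and, as described above, different rules can be associated through logical operations. Therefore, all or some of the rules in the indexing rule are associated through AND, OR, and NOT logical operations.
In another embodiment of the method provided by the present disclosure, the word frequency filtering rule is further used for:
filtering on filter words that appear in the text content of the file,
and/or,
filtering on filter words that appear in the remark information included in the file, wherein the remark information includes at least one of the following:
annotation information on content in the file;
comment information on content in the file;
memo information on content in the file;
and reply information corresponding to each of the annotation information, comment information, and memo information.
FIG. 4 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to remark information in an embodiment provided by the present disclosure. In this embodiment, the word frequency filtering rule can perform logical operations not only on the patent text fields of a patent file but also on remark information such as annotations, comments, and memos made in the file. This remark information can be added by one or more different users, as shown in FIG. 4.
In addition, the word frequency filtering rule can also cover the replies made to such annotations, comments, memos, and other remark information. FIG. 5 is a schematic diagram of a scenario in which a word frequency filtering rule is applied to remark information containing reply information in an embodiment provided by the present disclosure. FIG. 5 shows User1's reply to a comment made by User2. When indexing files with custom fields, the embodiments of the present disclosure further take this remark information into account and, where the remark information has corresponding replies, can also include those replies. This can further improve the accuracy of custom field indexing and make the classification results more reliable.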
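Counting a filter word across the file text, the remarks on it, and the replies to those remarks can be sketched as a recursive walk. The `Note` class, the function names, and the sample texts are hypothetical; only the idea that replies (such as User1's reply to User2's comment) are included in the count is taken from the text above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Note:
    """An annotation, comment, or memo, together with any replies to it."""
    author: str
    text: str
    replies: List["Note"] = field(default_factory=list)


def note_texts(note: Note) -> Iterator[str]:
    """Yield the text of a note and, recursively, of all its replies."""
    yield note.text
    for reply in note.replies:
        yield from note_texts(reply)


def count_word(word: str, body: str, notes: List[Note]) -> int:
    """Occurrences of `word` in the file body plus all notes and replies."""
    total = body.count(word)
    for note in notes:
        total += sum(text.count(word) for text in note_texts(note))
    return total


notes = [
    Note("User2", "battery life is the key point",
         replies=[Note("User1", "agreed, battery cooling matters too")]),
]
total = count_word("battery", "a battery pack", notes)
```

The resulting count can be fed into a threshold rule in the same way as a count taken from the patent text alone.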
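The present disclosure further provides a way of applying word frequency filtering rules to a file. Specifically, in another embodiment of the method, the file is a patent file, and the word frequency filtering rule can further be used to perform word frequency filtering on one or more specified parts of the file, namely on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description. The text contained in the drawings can be recognized through OCR (optical character recognition) or other image recognition algorithms. For patent text, the solutions of the embodiments can perform word frequency filtering on the title of the invention, the abstract, the claims, the description, the text recognized in the drawings, and combinations thereof, which further improves the classification accuracy of custom field indexing of patent text.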
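Restricting the word count to selected parts of a patent file can be sketched as follows. The dictionary keys (`title`, `abstract`, `claims`, `drawing_text`) are assumptions; `drawing_text` stands for text that an OCR step has already extracted from the drawings, and the OCR step itself is not shown. Matching is made case-insensitive here as a design choice, not as something the source specifies.

```python
from typing import Dict, Iterable


def count_in_sections(patent: Dict[str, str], word: str, sections: Iterable[str]) -> int:
    """Case-insensitive count of `word` in the selected sections only."""
    needle = word.lower()
    return sum(patent.get(section, "").lower().count(needle) for section in sections)


patent = {
    "title": "Vehicle brake",
    "abstract": "A vehicle brake for a vehicle.",
    "claims": "1. A brake.",
    "drawing_text": "vehicle",  # assumed to come from a prior OCR step
}

in_title_and_abstract = count_in_sections(patent, "vehicle", ["title", "abstract"])
with_drawings = count_in_sections(patent, "vehicle", ["title", "abstract", "drawing_text"])
```

A section that is missing from the record simply contributes zero, so the same rule can run over files with different available parts.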
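FIG. 6 is a flowchart of a processing method for indexing files with custom fields according to an exemplary embodiment. As shown in FIG. 6, in some other embodiments the method may further include:
S602: In response to a rule modification instruction, modify the indexing rule of the custom field to obtain an updated indexing rule;
S604: Perform custom field indexing on the files within the file scope by using the updated indexing rule.
The user can modify and re-edit the indexing rules, which makes rule setting more flexible. The updated indexing rule can be used for custom field indexing of files processed subsequently. Alternatively, the updated indexing rule can be used to re-index the files within the current file scope, overwriting their previous indexes.
In another implementation, the indexing rules can also be updated and optimized automatically and refined continuously, so that the indexing results become more accurate. Specifically, in another embodiment of the method, modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
adjusting parameter values and/or operation logic in the indexing rule according to the results of the indexing processing to obtain the updated indexing rule.
In this embodiment, the results of the indexing processing can be used to automatically adjust the parameter values or logical operations in the indexing rule, for which a suitable algorithm can be set or neural networks, deep learning, iterative algorithms, and the like can be used. In one exemplary implementation, after indexing rules are set for a certain type of file or for content at a specific position of certain files (such as the title of a patent), the files are tagged with the indexing rules to obtain indexing results. If some of the indexing results that do not meet expectations have been adjusted, the records of those adjustments can be obtained. Based on these records, such as which files were adjusted, what was adjusted, and what the adjusted result is, the processor automatically learns and optimizes the indexing rules. When target files of the same type are later indexed automatically, they can be processed with the previously learned and optimized indexing rules to obtain more accurate indexing results.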
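One very simple way to adjust a parameter value from adjustment records is to pick, from a set of candidate thresholds, the one that agrees with the most reviewer-corrected results. This is only a sketch under strong assumptions: the disclosure mentions neural networks, deep learning, and iterative algorithms without fixing a method, so the exhaustive search below, the function name `tune_threshold`, and the sample data are all hypothetical stand-ins.

```python
from typing import Dict, Sequence


def tune_threshold(
    counts: Dict[str, int],
    corrected: Dict[str, bool],
    candidates: Sequence[int],
) -> int:
    """Return the candidate threshold that matches the most corrected labels.

    counts:    occurrences of the filter word per file
    corrected: reviewer's final decision per file (True = label applies)
    On ties, the earliest candidate in `candidates` wins.
    """
    def agreement(threshold: int) -> int:
        return sum(
            (counts[file_id] >= threshold) == wanted
            for file_id, wanted in corrected.items()
        )

    return max(candidates, key=agreement)


counts = {"P1": 12, "P2": 4, "P3": 7}
# Adjustment records: the reviewer kept P1, rejected P2, and added the label to P3.
corrected = {"P1": True, "P2": False, "P3": True}
best = tune_threshold(counts, corrected, candidates=[10, 5, 3])
```

With these records, a threshold of 10 misses P3 and a threshold of 3 wrongly tags P2, so the search settles on 5; the updated rule would then be used for later files of the same type.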
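With the processing method for indexing files with custom fields provided by the embodiments of the present disclosure, the multiple steps a user performs to index a file with a custom field are converted into setting rules for the custom field and then completing the indexing automatically in response to a trigger instruction. The user can complete the indexing work on a single page, which effectively simplifies the operation. Moreover, because the indexing rules can be set for all or some of the sub-items under a custom field, the user avoids repeating similar indexing operations for the different sub-items of the same custom field within a single file collection (such as a single favorites folder). Once set, an indexing rule can be used in all file collections (such as all favorites folders), so a rule is set once and applies globally. This reduces the complexity of custom field indexing, reduces labor consumption, improves indexing efficiency, and improves the user's experience of the patent service.
By way of example, the method embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For related details, refer to the descriptions of the other method embodiments.
By way of example, although the steps in the flowcharts in the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict order for the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time but may be executed at different times, and they need not be performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Based on the above description of the method embodiments, the present disclosure further provides a processing apparatus for indexing files with custom fields. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the methods described in the embodiments of this specification, combined with the necessary hardware. Based on the same inventive concept, the apparatuses in one or more embodiments provided by the present disclosure are described in the following embodiments. Since the apparatus solves the problem in a way similar to the method, the implementation of a specific apparatus in the embodiments of this specification can refer to the implementation of the foregoing method, and overlapping details are not repeated. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.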
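FIG. 7 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment. The apparatus may be the aforementioned patent management system 110, or a separate server, a server cluster, or the like. Referring to FIG. 7, the apparatus 100 may include:
a rule definition module 702, which can be used to determine an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, and the types of file scope include: indexing all files;
an indexing processing module 704, which can be used to, in response to a trigger instruction for automatic indexing, perform custom field indexing on the files within the file scope by using the indexing rule.
In another embodiment of the apparatus provided by the present disclosure, the indexing rule includes:
for a text-type custom field, a word frequency filtering rule set based on the number of occurrences of a filter word in the text field.
In another embodiment of the apparatus provided by the present disclosure, all or some of the rules in the indexing rule are associated through AND, OR, and NOT logical operations.
In another embodiment of the apparatus provided by the present disclosure, the word frequency filtering rule is further used for:
filtering on filter words that appear in the text content of the file,
and/or,
filtering on filter words that appear in the remark information included in the file, wherein the remark information includes at least one of the following:
annotation information on content in the file;
comment information on content in the file;
memo information on content in the file;
and reply information corresponding to each of the annotation information, comment information, and memo information.
In another embodiment of the apparatus provided by the present disclosure, the file is a patent file, and the word frequency filtering rule may further be used to perform word frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
In another embodiment of the apparatus provided by the present disclosure, the types of file scope further include:
indexing unindexed files, and indexing newly added files.
In another embodiment of the apparatus provided by the present disclosure, the types of custom field used for indexing include:
an option field and/or a hierarchical field.
FIG. 8 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment. Referring to FIG. 8, in another embodiment of the apparatus provided by the present disclosure, the apparatus may further include:
a rule modification module 802, which can be used to, in response to a rule modification instruction, modify the indexing rule of the custom field to obtain an updated indexing rule.
In another embodiment of the apparatus, modifying the indexing rule of the custom field to obtain the updated indexing rule includes:
adjusting parameter values and/or operation logic in the indexing rule according to the results of the indexing processing to obtain the updated indexing rule.
FIG. 9 is a schematic structural diagram of a processing apparatus for indexing files with custom fields according to an exemplary embodiment. Referring to FIG. 9, in another embodiment of the apparatus provided by the present disclosure, the apparatus may further include:
a re-indexing module 902, which can be used to perform custom field indexing on the files within the file scope by using the indexing rule updated by the rule modification module 802.
Regarding the apparatus in the above embodiments, the specific way in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.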
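In an exemplary embodiment, a computer program product is further provided, including a computer program that, when executed by a processor, implements the processing method for indexing files with custom fields described in any part of this specification.
FIG. 10 is a schematic structural diagram of a conversion processing device S00 for patent description information according to an exemplary embodiment. The device S00 may include the aforementioned server, patent management system, server cluster, distributed processing server, blockchain server, cloud computing platform, and the like, and combinations thereof. For example, the device S00 may be a combination of one or more servers. Referring to FIG. 10, the device S00 includes a processing component S20, which further includes one or more processors, and memory resources represented by a memory S22 for storing instructions executable by the processing component S20, such as application programs. The application programs stored in the memory S22 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component S20 is configured to execute the instructions to perform the above method, which can be implemented on the proxy server side.
The device S00 may further include a power supply component S24 configured to perform power management of the device S00, a wired or wireless network interface S26 configured to connect the device S00 to a network, and an input/output (I/O) interface S28. The device S00 can operate based on an operating system stored in the memory S22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
By way of example, the above device S00 may be an exemplary description of a data processing device, such as a patent management platform. Some data processing devices need not include all of the above components or all of the functional units under a given component.
In an exemplary embodiment, a computer-readable storage medium including instructions is further provided, such as the memory S22 including instructions, and the instructions can be executed by the processing component S20 of the device S00 to complete the above method. The storage medium may be a computer-readable storage medium, for example ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, graphene, and the like.
Based on the foregoing descriptions of the method, apparatus, and computer program product embodiments, the present disclosure further provides a patent management system, and the patent management system may include the apparatus described in any embodiment of the present disclosure;
or, when the processor of the patent management system executes the executable instructions stored in the memory, the processing method for indexing files with custom fields described in any embodiment of the present disclosure is implemented;
or, the patent management system includes the computer program product described above.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the hardware-plus-program embodiments are described relatively briefly because they are basically similar to the method embodiments; for related details, refer to the corresponding descriptions of the method embodiments.
It should be noted that the above apparatus, device, server, and the like may also include other implementations according to the descriptions of the method embodiments, and the specific implementations can refer to the descriptions of the related method embodiments. New embodiments formed by combining features of the method, apparatus, device, and server embodiments still fall within the scope of implementation covered by the present disclosure and are not enumerated here.
Regarding the apparatus in the above embodiments, the specific way in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.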
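The processing method and apparatus for indexing files with custom fields described above can be applied in a file classification management system, for example a patent management system provided to users. The patent management system can build a patent database, provide a patent management interface, and classify, store, search, analyze, and update patent data. A file classification management system usually needs to classify files. FIG. 11 is a flowchart of a processing method for file classification according to an exemplary embodiment. The present disclosure provides a processing method for file classification, which can be used in the aforementioned patent management system and, as shown in FIG. 11, can include the following steps.
S0: Identify the operation permissions of an operating account, wherein the operation permissions include the preset business content that each account type is allowed to handle.
In the embodiments of the present disclosure, different account types can be preset, and different account types have different operation permissions in the patent management system or workspace. For example, the first type of account can only index files and cannot convert the indexing data generated by indexing into option fields or hierarchical fields. The second type of account has the conversion operation permission; it can convert indexing data into the target field type and can also review, for example modify, the indexing data of the third type of account. The third type of account can have the highest authority. For example, it can have all permissions in the workspace or the patent management system, it can assign certain operation permissions to different second-type accounts according to certain rules, and it can review or modify the conversion results of second-type accounts. The third type of account can also have the conversion operation permission, the permission to index files, and so on.
Specifically, the account types can be set according to the needs of file classification management. The present disclosure further provides an implementation for setting accounts with different permissions. Specifically, in another embodiment of the method, the operation permissions of the operating account include accounts and corresponding permissions set in the following manner:
a first account type, which has permission to index files and no permission to perform conversion operations;
a second account type, which can modify the data information generated when an indexing account of the first account type processes a file and has permission to convert the fields to be converted into business fields of a specified type;
a third account type, which has the conversion operation permission and, according to preset matching rules, assigns to each conversion account of the second account type the business fields into which that conversion account is allowed to convert the fields to be converted.
The first account type can be the same as the aforementioned first type of account and can be provided to the people who perform the initial classification of files, such as the operators who index the files. The account used by indexing personnel can be called an indexing account and belongs to the first account type. An account of the second account type (which can be called a conversion account), like the aforementioned second type of account, can modify the data information generated when an indexing account of the first account type processes a file and has permission to convert the fields to be converted into business fields of a specified type. The third account type, like the aforementioned third type of account, can set, for different conversion accounts, the business fields into which they may convert the fields to be converted. For example, an assigning account of the third account type can set or assign a conversion account user33 to handle the classification of patent files in the electronics category, which means that the classification produced when user33 converts indexing data should be a classification field of the electronics category.
Of course, the present disclosure is not limited to this and may use other implementations for setting different permissions for different accounts.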
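The three account types and the assignment of business fields to conversion accounts can be sketched as a permission table. This is an illustration under assumptions: the `Permission` members, the table `ACCOUNT_PERMISSIONS`, and the functions `can` and `assign_business_field` are hypothetical names, and the third type is given every permission, following the "highest authority" reading in the text above.

```python
from enum import Enum, auto
from typing import Dict, Set


class Permission(Enum):
    INDEX = auto()              # index files
    MODIFY_INDEX_DATA = auto()  # modify data produced by indexing accounts
    CONVERT = auto()            # convert fields into business fields
    ASSIGN = auto()             # assign business fields to conversion accounts


ACCOUNT_PERMISSIONS: Dict[str, Set[Permission]] = {
    "first": {Permission.INDEX},
    "second": {Permission.MODIFY_INDEX_DATA, Permission.CONVERT},
    "third": set(Permission),  # assumed to hold every permission
}

# conversion account name -> business field it may convert into
assignments: Dict[str, str] = {}


def can(account_type: str, permission: Permission) -> bool:
    """True when the account type holds the permission; unknown types hold none."""
    return permission in ACCOUNT_PERMISSIONS.get(account_type, set())


def assign_business_field(assigner_type: str, converter: str, business_field: str) -> None:
    """Let an assigning account give a conversion account its business field."""
    if not can(assigner_type, Permission.ASSIGN):
        raise PermissionError("this account type may not assign business fields")
    assignments[converter] = business_field


# An assigning account of the third type gives user33 the electronics category.
assign_business_field("third", "user33", "electronics")
```

Keeping the table separate from the check makes it straightforward to add or change account types without touching the calling code.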
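S2: If the operating account has the conversion operation permission, acquire the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts with different or the same operation permissions process the file data.
The indexing personnel are usually one or more R&D or patent classification staff and may also include reviewers and administrators of patent management. The data information may therefore be obtained when operating accounts with different or the same operation permissions process the file data.
The file data set may include the data information of one or more files, usually data information obtained by processing the file data, such as indexing data. The indexing data may include the indexing personnel's overall classification description of a patent and may include a textual description of the patent classification entered by R&D staff or patent administrators, for example: "This patent is a cleaning device applied to a sweeping robot, which can automatically identify obstacles and automatically charge." In some application scenarios, indexing data can serve as the field to be processed when a patent is classified for the first time with the patent management system. In other scenarios, the field to be processed can also be an option field and/or a hierarchical field. For ease of description, in some embodiments of the present disclosure a patent publication (announcement) number can be used to uniquely identify a patent, and all or part of the description information, text fields, option fields, and hierarchical fields are represented by letters; for example, the text fields may be A, B, C, and so on.
S4: In response to a first conversion operation on the second field, convert the second field in the file data set into a classification field of the target field type.
An operating account with the conversion operation permission can perform field classification conversion between different field types.
S6: Display the classification result after the conversion operation on the second field.
After fields of different types are converted, the classification result of the conversion operation can be displayed for the user to view.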
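Steps S0 to S6 can be sketched end to end as follows. The source does not specify how free-text indexing data is mapped to a classification field, so the keyword table `KEYWORD_TO_CATEGORY` is purely an assumption, as are the record keys, the function name `convert_second_field`, and the sample data.

```python
from typing import Dict, List

# Assumption: a keyword lookup stands in for the unspecified conversion logic.
KEYWORD_TO_CATEGORY = {"cleaning device": "Cleaning", "charging": "Power"}


def convert_second_field(account: dict, dataset: List[dict]) -> Dict[str, List[str]]:
    # S0: identify the operation permissions of the operating account.
    if "convert" not in account["permissions"]:
        raise PermissionError("account lacks the conversion permission")
    # S2: fetch only the records in the business field assigned to this account.
    scoped = [r for r in dataset if r["business_field"] == account["business_field"]]
    # S4: convert the free-text second field into classification fields.
    result: Dict[str, List[str]] = {}
    for record in scoped:
        text = record["index_data"].lower()
        result[record["publication_number"]] = [
            category for keyword, category in KEYWORD_TO_CATEGORY.items() if keyword in text
        ]
    return result


dataset = [
    {"publication_number": "A", "business_field": "electronics",
     "index_data": "A cleaning device for a sweeping robot with automatic charging"},
    {"publication_number": "B", "business_field": "chemistry",
     "index_data": "A solvent"},
]
converter = {"name": "user33", "permissions": {"convert"}, "business_field": "electronics"}

result = convert_second_field(converter, dataset)
# S6: display the classification result.
for publication_number, categories in result.items():
    print(publication_number, categories)
```

Record B is skipped because it lies outside the business field assigned to user33, and an account without the conversion permission is rejected before any data is read.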
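Further, FIG. 12 is a flowchart of a processing method for file classification according to another exemplary embodiment. As shown in FIG. 12, the processing method may further include:
S206: In response to a modification operation on the second field, modify the second field in the file data set to obtain modified field data. Accordingly, if the second field has been modified, the file data set may include the modified field data of the second field, and the modified field data is what gets converted during conversion.
Administrators can review the indexing data produced by each indexer in the patent management system. Generally, administrators need to obtain certain operation permissions so that patent indexing data can be reviewed and converted in a secure and centralized way. In the application scenario of this embodiment, if the indexing data is found to need modification, the administrator can modify it in the workspace of the patent management system, for example by changing the text-type indexing data "cleaning device" to "disinfecting device". The patent management system can respond to the modification operation on the indexing data (one kind of data information in the second field) and modify the indexing data that needs adjustment to obtain the modified indexing data.
After reviewers have reviewed or calibrated the indexing data, it can be converted uniformly through the patent management system to obtain classification fields of the target field type.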
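A classification processing apparatus for files according to an exemplary embodiment may include:
a permission identification module, which can be used to identify the operation permissions of an operating account, wherein the operation permissions include the preset business content that each account type is allowed to handle;
a data acquisition module, which can be used to acquire, when the operating account has the conversion operation permission, the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts with different or the same operation permissions process the file data;
a first conversion module, which can be used to, in response to a first conversion operation on the second field, convert the second field in the file data set into a classification field of the target field type;
a display module, which can be used to display the classification result after the conversion operation on the second field.
In an exemplary embodiment, the apparatus may further include:
a modification module, which can be used to, in response to a modification operation on the second field, modify the second field in the file data set to obtain modified field data.
For convenience of description, the above apparatus is described with its functions divided into various modules. When one or more embodiments of this specification are implemented, the functions of the modules can be implemented in the same one or more pieces of software and/or hardware, and a module that implements a given function can also be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division of modules or units is only a logical function division, and there may be other division methods in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the coupling, communication connection, and the like between the shown or described apparatuses or units may be realized by direct and/or indirect coupling/connection, through some standard or custom interfaces, protocols, and the like, in electrical, mechanical, or other forms.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope.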

Claims (27)

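  1. A processing method for indexing files with custom fields, comprising:
    determining an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set classification fields for all or some of the sub-items under the custom field, the custom field is used to index and classify the files, and the types of file scope include: indexing all files in the collection of files to be indexed;
    in response to a trigger instruction for automatic indexing, performing custom field indexing on the files within the file scope by using the indexing rule.
  2. The method according to claim 1, wherein the indexing rule includes a word frequency filtering rule, the field types of the custom field include a text field, and determining the indexing rule of the custom field comprises:
    for a custom field whose field type is a text field, setting the word frequency filtering rule based on the number of occurrences of a filter word in the text field.
  3. The method according to claim 1 or 2, wherein all or some of the rules in the indexing rule are associated through AND, OR, and NOT logical operations.
  4. The method according to claim 2 or 3, wherein the word frequency filtering rule is further used for:
    filtering on filter words that appear in the text content of the file,
    and/or,
    filtering on filter words that appear in the remark information included in the file, wherein the remark information includes at least one of the following:
    annotation information on content in the file;
    comment information on content in the file;
    memo information on content in the file;
    and reply information corresponding to each of the annotation information, comment information, and memo information.
  5. The method according to claim 4, wherein the file is a patent file, and the word frequency filtering rule may further be used to perform word frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
  6. The method according to any one of claims 1 to 5, wherein the types of file scope further include:
    indexing the unindexed files in the collection of files to be indexed, and indexing the newly added files in the collection of files to be indexed.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    in response to a rule modification instruction, modifying the indexing rule of the custom field to obtain an updated indexing rule;
    or, in response to a rule modification instruction, modifying the indexing rule of the custom field to obtain an updated indexing rule, and then performing custom field indexing on the files within the file scope by using the updated indexing rule.
  8. The method according to claims 2 to 7, wherein the field types of the custom field further include:
    an option field and/or a hierarchical field.
  9. The method according to claim 7 or 8, wherein modifying the indexing rule of the custom field to obtain the updated indexing rule comprises:
    adjusting parameter values and/or operation logic in the indexing rule according to the results of the indexing processing to obtain the updated indexing rule.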
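  10. A file classification processing method, comprising:
    identifying the operation permission of an operating account, the operation permission including preset business content that different account types are allowed to process;
    if the operating account has the conversion permission, acquiring the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts matching different or identical operation permissions process the file data;
    in response to a first conversion operation on the second field, converting the second field in the file data set into a classification field of the target field type;
    displaying the classification result obtained after the conversion operation on the second field.
  11. The method according to claim 10, wherein the method further comprises:
    in response to a modification operation on the second field, modifying the second field in the file data set to obtain modified field data.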
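  12. A processing apparatus for custom-field indexing of files, comprising:
    a rule definition module, configured to determine an indexing rule of a custom field and a file scope to which the indexing rule applies, wherein the indexing rule is used to set the classification fields of all or some of the sub-items under the custom field, the custom field is used to index and classify the files, and the types of file scope include: indexing all files in a set of files to be indexed;
    an indexing module, configured to, in response to a trigger instruction for automatic indexing, perform custom-field indexing on the files within the file scope using the indexing rule.
  13. The apparatus according to claim 12, wherein the indexing rule includes a word-frequency filter rule, the field types of the custom field include a text field, and determining the indexing rule of the custom field comprises:
    for a custom field whose field type is a text field, setting the word-frequency filter rule based on the number of times a filter word appears in the text field.
  14. The apparatus according to claim 12 or 13, wherein all or some of the rules in the indexing rule are associated with one another by the logical operations AND, OR and NOT.
  15. The apparatus according to claim 13 or 14, wherein the word-frequency filter rule is further used to:
    filter on filter words appearing in the text content of the file,
    and/or,
    filter on filter words appearing in review-comment information included in the file, wherein the review-comment information includes at least one of the following:
    annotation information on content in the file;
    comment information on content in the file;
    memo information on content in the file;
    and reply information corresponding to each of the annotation information, comment information and memo information.
  16. The apparatus according to claim 15, wherein the file is a patent document, and the word-frequency filter rule may further be used to perform word-frequency filtering on at least one of the following text contents: the title of the invention, the abstract, the claims, the description, and text recognized in the drawings of the description.
  17. The apparatus according to any one of claims 13 to 16, wherein the types of file scope further include:
    indexing the files in the set of files to be indexed that have not yet been indexed, and indexing files newly added to the set of files to be indexed.
  18. The apparatus according to any one of claims 13 to 17, wherein the field types of the custom field further include:
    an option field and/or a hierarchical field.
  19. The apparatus according to any one of claims 12 to 18, wherein the apparatus further comprises:
    a rule modification module, configured to, in response to a rule modification instruction, modify the indexing rule of the custom field to obtain an updated indexing rule.
  20. The apparatus according to any one of claims 12 to 19, wherein the apparatus further comprises:
    a re-indexing module, configured to perform custom-field indexing on the files within the file scope using the indexing rule updated by the rule modification module.
  21. The apparatus according to claim 19 or 20, wherein modifying the indexing rule of the custom field to obtain an updated indexing rule comprises:
    adjusting parameter values and/or operation logic in the indexing rule according to the result of the indexing, to obtain the updated indexing rule.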
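  22. A file classification processing apparatus, comprising:
    a permission identification module, configured to identify the operation permission of an operating account, the operation permission including preset business content that different account types are allowed to process;
    a data acquisition module, configured to, when the operating account has the conversion permission, acquire the corresponding file data set containing a second field according to the business field assigned to the operating account, wherein the second field in the file data set includes data information obtained when operating accounts matching different or identical operation permissions process the file data;
    a first conversion module, configured to, in response to a first conversion operation on the second field, convert the second field in the file data set into a classification field of the target field type;
    a display module, configured to display the classification result obtained after the conversion operation on the second field.
  23. The apparatus according to claim 22, wherein the apparatus further comprises:
    a modification module, configured to, in response to a modification operation on the second field, modify the second field in the file data set to obtain modified field data.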
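  24. A server, comprising:
    at least one processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1 to 11.
  25. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the method according to any one of claims 1 to 11.
  26. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
  27. A patent management system, comprising the apparatus according to any one of claims 12 to 23;
    or, a processor of the patent management system, when executing executable instructions stored in a memory, implements the method according to any one of claims 1 to 11;
    or, the patent management system comprises the computer program product according to claim 26.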
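
As an informal illustration of the indexing claims (not part of the claims themselves), the following Python sketch shows one way a word-frequency filter rule could be combined with AND, OR and NOT and applied over a chosen file scope. The document layout, the helper names (`min_count`, `all_of`, `any_of`, `negate`, `auto_index`), the scope names, and the use of simple substring counting are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, str]            # section name -> text (hypothetical layout)
Rule = Callable[[Doc], bool]


def min_count(word: str, threshold: int, sections: List[str]) -> Rule:
    """Rule that holds when `word` occurs at least `threshold` times in `sections`."""
    def rule(doc: Doc) -> bool:
        # Case-insensitive, non-overlapping substring count across the sections.
        total = sum(doc.get(s, "").lower().count(word.lower()) for s in sections)
        return total >= threshold
    return rule


def all_of(*rules: Rule) -> Rule:   # logical AND
    return lambda doc: all(r(doc) for r in rules)


def any_of(*rules: Rule) -> Rule:   # logical OR
    return lambda doc: any(r(doc) for r in rules)


def negate(rule: Rule) -> Rule:     # logical NOT
    return lambda doc: not rule(doc)


def auto_index(docs: Dict[str, Doc],
               rules_by_option: Dict[str, Rule],
               existing: Optional[Dict[str, List[str]]] = None,
               scope: str = "all") -> Dict[str, List[str]]:
    """Label each in-scope document with every option whose rule matches it.

    scope="all" re-indexes every document; scope="unindexed" leaves documents
    that already carry labels untouched.
    """
    labels = dict(existing or {})
    for doc_id, doc in docs.items():
        if scope == "unindexed" and labels.get(doc_id):
            continue
        labels[doc_id] = [opt for opt, rule in rules_by_option.items() if rule(doc)]
    return labels


docs = {
    "P1": {"title": "Robot with disinfection lamp",
           "abstract": "A disinfection robot that uses UV disinfection."},
    "P2": {"title": "Cleaning robot",
           "abstract": "A mop-based cleaning robot."},
}
rules = {
    "disinfection device": all_of(
        min_count("disinfection", 2, ["title", "abstract"]),
        negate(min_count("mop", 1, ["abstract"])),
    ),
}
print(auto_index(docs, rules))
# {'P1': ['disinfection device'], 'P2': []}
```

Adjusting the threshold passed to `min_count`, or swapping `all_of` for `any_of`, corresponds loosely to modifying the parameter values or operation logic of a rule before re-indexing.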
PCT/CN2022/080011 2021-03-09 2022-03-09 Processing method, apparatus, server and system for custom-field indexing of files WO2022188821A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110254317.8 2021-03-09
CN202110254317.8A CN113095039A (zh) 2021-03-09 2021-03-09 Processing method, apparatus and server for custom-field indexing of files

Publications (1)

Publication Number Publication Date
WO2022188821A1 true WO2022188821A1 (zh) 2022-09-15

Family

ID=76666588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/080011 WO2022188821A1 (zh) 2021-03-09 2022-03-09 Processing method, apparatus, server and system for custom-field indexing of files

Country Status (2)

Country Link
CN (1) CN113095039A (zh)
WO (1) WO2022188821A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095039A (zh) * 2021-03-09 2021-07-09 智慧芽信息科技(苏州)有限公司 Processing method, apparatus and server for custom-field indexing of files

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
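CN101655866A (zh) * 2009-08-14 2010-02-24 北京中献电子技术开发中心 Automatic extraction method for scientific and technical terms
CN101692240A (zh) * 2009-08-14 2010-04-07 北京中献电子技术开发中心 Rule-based method for automatic patent abstract extraction and keyword indexing
US20110289091A1 (en) * 2010-05-18 2011-11-24 Salesforce.Com, Inc. Methods and Systems for Providing Multiple Column Custom Indexes In A Multi-Tenant Database Environment
CN106354861A (zh) * 2016-09-06 2017-01-25 中国传媒大学 Automatic movie tag indexing method and automatic indexing system
CN108197118A (zh) * 2018-02-05 2018-06-22 齐鲁工业大学 Method for automatic indexing and retrieval using a computer system
CN108268253A (zh) * 2017-05-05 2018-07-10 平安科技(深圳)有限公司 Interface code generation method and terminal device
CN112307718A (zh) * 2020-11-25 2021-02-02 北京邮电大学 Fully automatic PDF indexing system and method based on text features and grammar rules
CN113095039A (zh) * 2021-03-09 2021-07-09 智慧芽信息科技(苏州)有限公司 Processing method, apparatus and server for custom-field indexing of files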

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
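DE10210553B4 (de) * 2002-03-09 2004-08-26 Xtramind Technologies Gmbh Method for automatically classifying a text by a computer system
JP5527845B2 (ja) * 2010-08-20 2014-06-25 Kddi株式会社 Document classification program, server and method based on textual features and formal features of document information
CN105573968A (zh) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Rule-based text indexing method
CN109241276B (zh) * 2018-07-11 2022-03-08 河海大学 Method for classifying words in text, and method and system for evaluating verbal creativity
CN111782601A (zh) * 2020-06-08 2020-10-16 北京海泰方圆科技股份有限公司 Electronic file processing method, apparatus, electronic device and machine-readable medium
CN111814431A (zh) * 2020-06-15 2020-10-23 开易(北京)科技有限公司 Complex data annotation method and apparatus
CN112380838A (zh) * 2020-10-29 2021-02-19 武汉蝉略科技有限公司 Big-data-based intelligent indexing method and apparatus for patent documents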

Also Published As

Publication number Publication date
CN113095039A (zh) 2021-07-09

Similar Documents

Publication Publication Date Title
US11651032B2 (en) Determining semantic content of textual clusters
US11347783B2 (en) Implementing a software action based on machine interpretation of a language input
US8209407B2 (en) System and method for web service discovery and access
AU2019204976B2 (en) Intelligent data ingestion system and method for governance and security
US11645345B2 (en) Systems and methods for issue tracking systems
DE112018005462T5 (de) Anomaly detection using cognitive computing
US20070244921A1 (en) Method, apparatus and computer-readable medium to provide customized classification of documents in a file management system
US8812544B2 (en) Enterprise content management federation and integration system
US11176184B2 (en) Information retrieval
US20150127688A1 (en) Facilitating discovery and re-use of information constructs
US20150295939A1 (en) System and method for evaluating a reverse query
WO2021119175A1 (en) Determining semantic content of textual clusters
EP3594822A1 (en) Intelligent data ingestion system and method for governance and security
CN106294520A (zh) Identifying relationships using information extracted from documents
US11048767B2 (en) Combination content search
WO2022188821A1 (zh) Processing method, apparatus, server and system for custom-field indexing of files
CN112286916B (zh) Data processing method, apparatus, device and storage medium
US11409520B2 (en) Custom term unification for analytical usage
Möller et al. Towards an architecture to support data access in research data spaces
CN108205564B (zh) Knowledge system construction method and system
US20180336242A1 (en) Apparatus and method for generating a multiple-event pattern query
CN113111179B (zh) File classification processing method, apparatus, server and system
WO2022188820A1 (zh) File classification processing method, apparatus, server, system and computer program product
Alm et al. Towards integration and management of contextualized information in the manufacturing environment by digital annotations
CN118312573B (zh) Data modeling method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22766343

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22766343

Country of ref document: EP

Kind code of ref document: A1
