CN113127603B

CN113127603B - Intellectual property case source identification method, device, equipment and storage medium

Info

Publication number: CN113127603B
Application number: CN202110485741.3A
Authority: CN
Inventors: 林少康
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2023-04-18
Anticipated expiration: 2041-04-30
Also published as: CN113127603A

Abstract

The application relates to the technical field of artificial intelligence and discloses a method, a device, equipment and a storage medium for identifying a case source of intellectual property. The method comprises the following steps: inputting intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed; performing early warning level calculation according to an early warning level calculation rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed; judging whether the early warning level to be analyzed is within an early warning level threshold range; and, when the early warning level to be analyzed is within the early warning level threshold range, taking the intellectual property document case source data to be identified as target intellectual property document case source data. The method thereby automatically identifies whether a case should be filed for the intellectual property document case source data, improving case source mining efficiency, mining accuracy and the case-filing rate.

Description

Intellectual property case source identification method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a case source of intellectual property rights.

Background

Administrative penalty documents in the field of intellectual property protection mostly take text as their carrier. Against the background of "Internet +", the volume of this information and the speed at which it spreads pose great challenges to obtaining and mining intellectual property clues for procuratorial work in this field. At present, such work mostly relies on manually sorting and reviewing administrative penalty documents to obtain case source information on violations in the field of intellectual property protection. This approach consumes a great deal of labor and time, is inefficient and inaccurate, and results in a low case-filing rate.

Disclosure of Invention

The main purpose of the application is to provide a method, a device, equipment and a storage medium for identifying a case source of intellectual property, so as to solve the technical problem that, in the prior art, procuratorial work in the field of intellectual property protection relies on manually sorting and reviewing administrative penalty documents to obtain case source information on violations in that field, so that accuracy is low and the case-filing rate is consequently low.

In order to achieve the above object, the present application provides a method for identifying a case source of intellectual property, the method comprising:

acquiring intellectual property document case source data to be identified;

inputting the intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data;

acquiring an early warning level calculation rule base, and performing early warning level calculation according to the early warning level calculation rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed corresponding to the intellectual property document case source data;

acquiring an early warning level threshold range, and judging whether the early warning level to be analyzed is within the early warning level threshold range;

and when the judgment result is that the early warning level to be analyzed is within the early warning level threshold range, taking the intellectual property document case source data to be identified as target intellectual property document case source data.

Further, before the step of inputting the intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data, the method further comprises:

obtaining a first training sample set, training an initial model with the first training sample set, and taking the trained initial model as a first model, wherein the first model comprises: cause-of-action type classification rules and conviction index extraction rules;

obtaining a verification sample set, each verification sample in the verification sample set comprising: first intellectual property document sample annotation data and a first cause-of-action type calibration value;

inputting the first intellectual property document sample annotation data of each verification sample into the first model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type predicted value and a conviction index set predicted value for each verification sample in the verification sample set;

performing classification correctness judgment and identification validity judgment according to the cause-of-action type predicted values and the conviction index set predicted values, obtaining a document sample data set with failed prediction;

obtaining a second training sample set corresponding to the document sample data set with failed prediction, wherein each second training sample in the second training sample set comprises: second intellectual property document sample annotation data;

updating the feature word library of the first model according to the second training sample set to obtain a second model;

generating feature vectors according to the second model and the second training sample set, obtaining a feature vector set to be processed corresponding to the second training sample set;

and adding the feature vector set to be processed to the feature vector library of the second model to obtain the case source mining model.

Further, the step of performing classification correctness judgment and identification validity judgment according to the cause-of-action type predicted values and the conviction index set predicted values, obtaining a document sample data set with failed prediction, includes:

performing classification judgment for each verification sample according to its first cause-of-action type calibration value and its cause-of-action type predicted value, obtaining a classification judgment result;

when the classification judgment result is wrong, taking the first intellectual property document sample annotation data of all verification samples whose classification judgment result is wrong as a misclassified document sample data set;

when the classification judgment result is correct, acquiring the early warning level calculation rule base, and performing early warning level calculation with the early warning level calculation rule base according to the cause-of-action type predicted value and the conviction index set predicted value of each verification sample whose classification judgment result is correct, obtaining an early warning level calculation result;

when the early warning level calculation result is failure, taking all verification samples whose early warning level calculation result is failure as a document sample data set with failed identification;

and combining the misclassified document sample data set with the document sample data set with failed identification to obtain the document sample data set with failed prediction.

Further, the step of acquiring the early warning level calculation rule base when the classification judgment result is correct, and performing early warning level calculation with the early warning level calculation rule base according to the cause-of-action type predicted value and the conviction index set predicted value of each verification sample whose classification judgment result is correct, obtaining an early warning level calculation result, includes:

when the classification judgment result is correct, taking the first intellectual property document sample annotation data of all verification samples whose classification judgment result is correct as a correctly classified document sample data set;

acquiring one item of first intellectual property document sample annotation data from the correctly classified document sample data set as the intellectual property document sample annotation data to be calculated;

performing early warning level calculation with the early warning level calculation rule base according to the cause-of-action type predicted value and the conviction index set predicted value corresponding to the intellectual property document sample annotation data to be calculated;

when the early warning level calculation succeeds, determining that the early warning level calculation result corresponding to the intellectual property document sample annotation data to be calculated is success; otherwise, determining that the early warning level calculation result corresponding to the intellectual property document sample annotation data to be calculated is failure;

and repeating the step of acquiring one item of first intellectual property document sample annotation data from the correctly classified document sample data set as the intellectual property document sample annotation data to be calculated, until every item of first intellectual property document sample annotation data in the correctly classified document sample data set has been acquired.

Further, the step of updating the feature word library of the first model according to the second training sample set to obtain a second model includes:

acquiring a case source feature word set corresponding to the second training sample set;

and adding the case source feature word set to the feature word library of the first model to obtain the second model.

Further, the step of generating feature vectors according to the second model and the second training sample set, obtaining a feature vector set to be processed corresponding to the second training sample set, includes:

performing word segmentation on the second intellectual property document sample annotation data of each second training sample with the second model, obtaining intellectual property document sample word segmentation data corresponding to each item of second intellectual property document sample annotation data;

and generating feature vectors from all the intellectual property document sample word segmentation data with a vector space model, obtaining the feature vector set to be processed.

Further, the step of generating feature vectors from all the intellectual property document sample word segmentation data with a vector space model, obtaining the feature vector set to be processed, includes:

taking all the intellectual property document sample word segmentation data as one set, obtaining a feature word set to be analyzed;

and converting the feature words in the feature word set to be analyzed from a high-dimensional, highly sparse space to a low-dimensional dense space with a vector space model and a TF-IDF formula, obtaining the feature vector set to be processed, wherein the TF-IDF formula is used to calculate the weights of the feature words in the feature word set to be analyzed.
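The TF-IDF weighting step can be illustrated with a minimal Python sketch. It is not the application's implementation: the function name `tfidf_vectors`, the unsmoothed formula `tf * log(N / df)`, and the sample tokens are assumptions chosen for illustration, and the sketch covers only the weighting, not the conversion to a low-dimensional dense space.

```python
import math
from collections import Counter

def tfidf_vectors(segmented_docs):
    """Return (vocabulary, vectors) for a list of token lists.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    vocabulary = sorted({tok for doc in segmented_docs for tok in doc})
    n_docs = len(segmented_docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in segmented_docs for tok in set(doc))
    vectors = []
    for doc in segmented_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / df[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors

# Hypothetical word segmentation output for two documents.
docs = [
    ["counterfeit", "trademark", "sale"],
    ["copyright", "infringement", "sale"],
]
vocab, vecs = tfidf_vectors(docs)
```

A word that occurs in every document, such as "sale" here, receives weight zero, which is why TF-IDF emphasizes the feature words that distinguish one document from another.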

The application also provides a device for identifying a case source of intellectual property, comprising:

a data acquisition module, configured to acquire intellectual property document case source data to be identified;

a cause-of-action type and conviction index extraction module, configured to input the intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data;

an early warning level determining module, configured to acquire an early warning level calculation rule base, and to perform early warning level calculation according to the early warning level calculation rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed corresponding to the intellectual property document case source data;

a judging module, configured to acquire an early warning level threshold range and to judge whether the early warning level to be analyzed is within the early warning level threshold range;

and a target intellectual property document case source data determining module, configured to take the intellectual property document case source data to be identified as target intellectual property document case source data when the judgment result is that the early warning level to be analyzed is within the early warning level threshold range.

The invention also proposes a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.

The invention also proposes a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of any of the methods described above.

With the method, device, equipment and storage medium for identifying a case source of intellectual property of the application, intellectual property document case source data to be identified is input into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data. An early warning level calculation rule base is acquired, and early warning level calculation is performed according to the rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed. An early warning level threshold range is acquired, and whether the early warning level to be analyzed is within that range is judged. When it is within the range, the intellectual property document case source data to be identified is taken as target intellectual property document case source data. Whether a case should be filed for the intellectual property document case source data is thus identified automatically, which improves case source mining efficiency, mining accuracy and the case-filing rate.

Drawings

FIG. 1 is a schematic flow chart of a method for identifying a case source of intellectual property according to an embodiment of the present application;

FIG. 2 is a block diagram of a device for identifying a case source of intellectual property according to an embodiment of the present application;

FIG. 3 is a block diagram of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the object of the present application will be further explained with reference to the embodiments, and with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the prior art, procuratorial work in the field of intellectual property protection relies on manually sorting and reviewing administrative penalty documents to obtain case source information on violations in that field, so that accuracy is low and the case-filing rate is consequently low. To solve this technical problem, the application provides a method for identifying a case source of intellectual property. The method is applied in the technical field of artificial intelligence and, more specifically, in natural language processing. The method first uses a case source mining model to extract a cause-of-action type and a conviction index set from intellectual property document case source data, then uses an early warning level calculation rule base to calculate an early warning level from the extraction results, and finally determines, from an early warning level threshold range and the calculated early warning level, whether the intellectual property document case source data needs to be examined and handled. Whether a case should be filed for the intellectual property document case source data is thus identified automatically, which improves case source mining efficiency, mining accuracy and the case-filing rate.

Referring to FIG. 1, in an embodiment of the present application, a method for identifying a case source of intellectual property is provided, the method including:

S1: acquiring intellectual property document case source data to be identified;

S2: inputting the intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data;

S3: acquiring an early warning level calculation rule base, and performing early warning level calculation according to the early warning level calculation rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed corresponding to the intellectual property document case source data;

S4: acquiring an early warning level threshold range, and judging whether the early warning level to be analyzed is within the early warning level threshold range;

S5: when the judgment result is that the early warning level to be analyzed is within the early warning level threshold range, taking the intellectual property document case source data to be identified as target intellectual property document case source data.

In this embodiment, intellectual property document case source data to be identified is input into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data. An early warning level calculation rule base is acquired, and early warning level calculation is performed according to the rule base, the cause-of-action type to be analyzed and the conviction index set to be analyzed, obtaining an early warning level to be analyzed. An early warning level threshold range is acquired, and whether the early warning level to be analyzed is within that range is judged. When it is within the range, the intellectual property document case source data to be identified is taken as target intellectual property document case source data. Whether a case should be filed for the intellectual property document case source data is thus identified automatically, which improves case source mining efficiency, mining accuracy and the case-filing rate.
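The flow of steps S2 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the mining model and the rule base are represented by hypothetical callables, the early warning level is assumed to be an integer number of stars, the threshold range is assumed to be inclusive, and the 50,000 figure in the toy rule is invented for the example.

```python
from typing import Callable, Dict, Optional, Tuple

Indices = Dict[str, float]  # conviction index name -> value (assumed representation)

def identify_case_source(
    document: str,
    mining_model: Callable[[str], Tuple[str, Indices]],
    rule_base: Dict[str, Callable[[Indices], int]],
    threshold_range: Tuple[int, int],
) -> Optional[str]:
    """Return the document if it is a target case source, else None."""
    # S2: extract the cause-of-action type and the conviction index set.
    cause_type, indices = mining_model(document)
    # S3: look up the rule for this cause type and compute the level.
    level = rule_base[cause_type](indices)
    # S4: judge whether the level lies within the (inclusive) threshold range.
    start, end = threshold_range
    # S5: keep the document only when the level is within the range.
    return document if start <= level <= end else None

# Hypothetical stand-ins for the trained model and the expert rule base.
def toy_model(document: str) -> Tuple[str, Indices]:
    amount = float(document.split("amount=")[1])
    return "counterfeit registered trademark", {"illegal_operation_amount": amount}

toy_rule_base = {
    "counterfeit registered trademark":
        lambda idx: 5 if idx["illegal_operation_amount"] >= 50_000 else 1,
}

hit = identify_case_source("penalty decision, amount=60000", toy_model, toy_rule_base, (3, 5))
miss = identify_case_source("penalty decision, amount=1000", toy_model, toy_rule_base, (3, 5))
```

Passing the model and the rule base as parameters mirrors the description, in which both are obtained separately (from user input, a database, or a third-party system) before the calculation.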

For S1, the intellectual property document case source data to be identified may be obtained from user input, from a database, or from a third-party application system. For example, administrative penalty document data in the intellectual property field may be obtained from an administrative authority, or crawled from the Internet by a web crawler, and used as the intellectual property document case source data to be identified.

The intellectual property document case source data to be identified is intellectual property document case source data for which it must be determined whether a case is to be filed, that is, whether it needs to be examined and handled by a prosecutor.

The intellectual property document case source data may be an administrative penalty document, a civil litigation document, or a criminal litigation document concerning intellectual property, and is not specifically limited herein. An administrative penalty document concerning intellectual property is a legally binding written document made by an administrative authority in respect of a party's illegal acts involving intellectual property. On the basis of investigation and evidence collection that establish the illegal acts, it records the party's illegal facts, the reasons and basis for the penalty, the penalty decision, and other matters.

Civil litigation documents and criminal litigation documents concerning intellectual property are documents with legal effect or legal significance that are made according to law by the court, the parties and other litigation participants in the course of hearing cases in the intellectual property field.

The intellectual property includes: trademarks, patents and copyrights.

For S2, the intellectual property document case source data to be identified is input into the case source mining model to extract a cause-of-action type and a conviction index set. The cause-of-action type output by the case source mining model is taken as the cause-of-action type to be analyzed corresponding to the intellectual property document case source data, and the conviction index set output by the case source mining model is taken as the conviction index set to be analyzed corresponding to the intellectual property document case source data.

Cause-of-action types include, but are not limited to: the crime of counterfeiting a registered trademark, the crime of selling goods bearing a counterfeit registered trademark, the crime of illegally manufacturing, or selling illegally manufactured, registered trademark identifiers, the crime of copyright infringement, and the crime of selling infringing copies.

Conviction indexes include, but are not limited to: trademark comparison result, trademark type, amount of illegal business operations, amount of illegal gains, whether unsold goods exist, sales amount, value of unsold goods, number of registered trademark identifiers, number of copies, and value of goods.

It can be understood that each cause-of-action type to be analyzed corresponds to one conviction index set to be analyzed. The conviction index set to be analyzed includes the values of one or more conviction indexes.

For S3, the early warning level calculation rule base may be obtained from user input, from a database, or from a third-party application system.

The early warning level calculation rule base comprises cause-of-action types and early warning level calculation rule sets, each cause-of-action type corresponding to one early warning level calculation rule set. Each early warning level calculation rule set comprises one or more calculation rules.

An early warning level calculation rule set is a set of rules, determined by experts according to the cause-of-action type and the relevant intellectual property laws and regulations, for calculating the early warning level of a case source. For example, when the cause-of-action type is the crime of counterfeiting a registered trademark, the conviction standard elements are set as the amount of illegal business operations and the amount of illegal gains, and the early warning level calculation rules are as follows: five stars, the amount reaches the conviction standard; four stars, the amount deviates from the conviction standard by no more than 10%; three stars, by no more than 20%; two stars, by no more than 35%; one star, by no more than 60%. The example is not specifically limited herein.

According to the cause-of-action type to be analyzed, the corresponding early warning level calculation rule set is retrieved from the early warning level calculation rule base as the early warning level calculation rule set to be analyzed. Early warning level calculation is then performed according to the early warning level calculation rule set to be analyzed and the conviction index set to be analyzed, and the calculated early warning level is taken as the early warning level to be analyzed.
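The star-rating example for the counterfeit registered trademark cause type can be sketched in Python. The sketch rests on several assumptions that the application does not state: the deviation is taken as the relative shortfall below the conviction standard, only one conviction index is evaluated, an amount more than 60% below the standard yields no level (`None`), and the conviction standard of 50,000 and all names (`STAR_BANDS`, `RULE_BASE`, `level_for`) are hypothetical.

```python
from typing import Dict, Optional

# (maximum relative shortfall below the conviction standard, stars), checked in order.
STAR_BANDS = [(0.0, 5), (0.10, 4), (0.20, 3), (0.35, 2), (0.60, 1)]

def early_warning_level(amount: float, conviction_standard: float) -> Optional[int]:
    """Map an amount to 1-5 stars, or None when no band applies."""
    shortfall = max(0.0, (conviction_standard - amount) / conviction_standard)
    for max_shortfall, stars in STAR_BANDS:
        if shortfall <= max_shortfall:
            return stars
    return None  # treated here as a failed early warning level calculation

# Hypothetical rule base: cause-of-action type -> index to evaluate and its standard.
RULE_BASE = {
    "counterfeiting a registered trademark": {
        "index": "illegal_operation_amount",
        "standard": 50_000.0,  # invented figure, for illustration only
    },
}

def level_for(cause_type: str, indices: Dict[str, float]) -> Optional[int]:
    """Select the rule by cause type, then compute the level from the index set."""
    rule = RULE_BASE.get(cause_type)
    if rule is None or rule["index"] not in indices:
        return None
    return early_warning_level(indices[rule["index"]], rule["standard"])
```

For instance, with the assumed standard of 50,000, an amount of 46,000 falls 8% short and is rated four stars, while an amount of 10,000 falls 80% short and yields no level.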

For S4, the early warning level threshold range input by the user can be obtained, the early warning level threshold range can also be obtained from a database, and the early warning level threshold range can also be obtained from a third-party application system.

The early warning level threshold range comprises: early warning grade starting value and early warning grade ending value.

Whether the early warning level to be analyzed is within the early warning level threshold range is then judged. When it is within the range, the intellectual property document case source data to be identified is a case source for which a case should be filed and needs to be pushed to a prosecutor for handling. When it is not within the range, the intellectual property document case source data to be identified is not a case source for which a case should be filed and does not need to be pushed to a prosecutor.

For example, the early warning levels are, from high to low, five stars, four stars, three stars, two stars and one star, and the early warning level threshold range is three stars and above. When the early warning level to be analyzed is three, four or five stars, the judgment result is that the data is a case source; when it is one or two stars, the judgment result is that the data is not a case source. The example is not specifically limited herein.

For S5, when the early warning level to be analyzed is within the early warning level threshold range, the intellectual property document case source data to be identified needs to be handled by a prosecutor. The data is therefore taken as target intellectual property document case source data and pushed to the prosecutor. Whether a case should be filed for the intellectual property document case source data is thus identified automatically.
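The threshold check of S4 and the selection of S5 can be sketched for a batch of documents as follows. The `ThresholdRange` class, its inclusive bounds, the integer star levels and the document identifiers are assumptions for illustration; the application states only that the range comprises a start value and an end value.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ThresholdRange:
    start: int  # early warning level start value
    end: int    # early warning level end value

    def contains(self, level: int) -> bool:
        # Inclusive on both ends (an assumption).
        return self.start <= level <= self.end

def select_targets(levels: Dict[str, int], threshold: ThresholdRange) -> List[str]:
    """Return the IDs of documents whose level lies within the threshold range."""
    return [doc_id for doc_id, level in levels.items() if threshold.contains(level)]

# Hypothetical early warning levels already computed in S3.
levels = {"doc-001": 5, "doc-002": 2, "doc-003": 3}
targets = select_targets(levels, ThresholdRange(start=3, end=5))
```

Only the selected documents would then be pushed for handling; how they are pushed is outside the scope of this sketch.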

In an embodiment, before the step of inputting the intellectual property document case source data to be identified into a case source mining model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data, the method further includes:

S21: obtaining a first training sample set, training an initial model with the first training sample set, and taking the trained initial model as a first model, wherein the first model comprises: cause-of-action type classification rules and conviction index extraction rules;

S22: obtaining a verification sample set, each verification sample in the verification sample set comprising: first intellectual property document sample annotation data and a first cause-of-action type calibration value;

S23: inputting the first intellectual property document sample annotation data of each verification sample into the first model to extract a cause-of-action type and a conviction index set, obtaining a cause-of-action type predicted value and a conviction index set predicted value for each verification sample in the verification sample set;

S24: performing classification correctness judgment and identification validity judgment according to the cause-of-action type predicted values and the conviction index set predicted values, obtaining a document sample data set with failed prediction;

S25: obtaining a second training sample set corresponding to the document sample data set with failed prediction, wherein each second training sample in the second training sample set comprises: second intellectual property document sample annotation data;

S26: updating the feature word library of the first model according to the second training sample set to obtain a second model;

S27: generating feature vectors according to the second model and the second training sample set, obtaining a feature vector set to be processed corresponding to the second training sample set;

S28: adding the feature vector set to be processed to the feature vector library of the second model to obtain the case source mining model.

According to the embodiment, a training sample is adopted to train an initial model to obtain a first model, then a verification sample is adopted to carry out classification correctness judgment and identification validity judgment on the first model, a feature word bank of the first model is updated according to the verification sample failed in judgment to obtain a second model, feature vectors are generated according to the verification sample failed in judgment and the second model, the generated feature vectors are updated to a feature vector bank of the second model, and therefore a case source mining model is obtained.
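Steps S26 to S28 can be sketched in Python as follows. The sketch is a deliberately simplified illustration and not the application's implementation: `RuleModel` is a hypothetical container for the feature word library and the feature vector library, whitespace splitting stands in for the model's word segmentation, and raw term counts stand in for the weighted feature vectors.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class RuleModel:
    feature_lexicon: Set[str] = field(default_factory=set)
    feature_vector_library: List[List[int]] = field(default_factory=list)

def refine(first_model: RuleModel, second_samples: List[str],
           new_feature_words: List[str]) -> RuleModel:
    """Return the refined model; the first model is left unchanged."""
    # S26: extend the feature word library to obtain the second model.
    second_model = RuleModel(
        feature_lexicon=first_model.feature_lexicon | set(new_feature_words),
        feature_vector_library=list(first_model.feature_vector_library),
    )
    # S27: segment each sample and build one vector per sample.
    vocabulary = sorted(second_model.feature_lexicon)
    vectors = []
    for text in second_samples:
        # Whitespace split is a stand-in for real word segmentation.
        tokens = [t for t in text.lower().split() if t in second_model.feature_lexicon]
        vectors.append([tokens.count(word) for word in vocabulary])
    # S28: add the new vectors to the feature vector library.
    second_model.feature_vector_library.extend(vectors)
    return second_model

first = RuleModel(feature_lexicon={"trademark"})
refined = refine(
    first,
    ["Counterfeit trademark sold with counterfeit label"],
    ["counterfeit"],
)
```

In this toy run the vocabulary becomes `["counterfeit", "trademark"]`, so the single sample is encoded as the count vector `[2, 1]`. The sketch does not re-encode vectors already in the library when the vocabulary grows, which a fuller implementation would need to address.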

For S21, a first training sample set input by the user may be obtained, the first training sample set may also be obtained from a database, and the first training sample set may also be obtained from a third-party application system.

Each first training sample in the first set of training samples comprises: the method comprises the steps of marking data of an intellectual property document sample to be analyzed, case type calibration values to be analyzed and acquit index set calibration values to be analyzed. Each first training sample comprises: the system comprises intellectual property document sample marking data to be analyzed, a case type calibration value to be analyzed and an acquit index set calibration value to be analyzed.

After the keywords are manually labeled by the identifiers in the intellectual property document file source data, the labeled identifiers and the intellectual property document file source data are used as intellectual property document file sample labeling data to be analyzed.

The first model includes case type classification rules and conviction index extraction rules, so the first model is a rule model.

The specific method steps of training the initial model by using the first training sample set and using the initial model after training as the first model are not repeated here.

In the same first training sample, the case type calibration value to be analyzed is the calibrated case type of the intellectual property document sample annotation data to be analyzed, and the conviction index set calibration value to be analyzed is the calibrated conviction index set of the intellectual property document sample annotation data to be analyzed.

For S22, a verification sample set input by the user may be obtained, or the verification sample set may be obtained from a database, or the verification sample set may be obtained from a third-party application system.

Keywords in the intellectual property document case source data are manually labeled with identifiers, and the labeled identifiers together with the intellectual property document case source data are used as the first intellectual property document sample annotation data.

Each verification sample includes first intellectual property document sample annotation data and a first case type calibration value.

In the same verification sample, the first case type calibration value is the calibrated case type of the first intellectual property document sample annotation data.

For S23, the first intellectual property document sample annotation data of each verification sample is input into the first model to extract a case type and a conviction index set; each extracted case type is taken as a case type predicted value, and each extracted conviction index set is taken as a conviction index set predicted value.

For S24, classification correctness is judged from the first case type calibration value and the case type predicted value of the same verification sample. When the classification is judged wrong, the verification sample is taken as document sample data with failed prediction. When the classification is judged correct, identification validity is judged from the case type predicted value and the conviction index set predicted value of the verification sample, and when that judgment fails, the verification sample is taken as document sample data with failed identification. The document sample data set with failed prediction is then determined from the document sample data with failed prediction and the document sample data with failed identification.

For S25, a second training sample set determined by the user according to the document sample data set with failed prediction is acquired.

The keywords in the intellectual property document sample annotation data to be adjusted are manually re-labeled with identifiers according to the failure reason of that data, and the adjusted identifiers together with the intellectual property document sample annotation data to be adjusted are used as second intellectual property document sample annotation data. The intellectual property document sample annotation data to be adjusted is any one piece of first intellectual property document sample annotation data in the document sample data set with failed prediction.

The failure reasons of the intellectual property document sample marking data to be adjusted comprise: classification failure and early warning level calculation failure.

When the classification correctness judgment of the intellectual property document sample annotation data to be adjusted is wrong, the failure reason is determined to be classification failure. When the classification correctness judgment is correct but the identification validity judgment fails, the failure reason is determined to be early warning level calculation failure.
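
The mapping from the two judgments to a failure reason can be sketched as follows. This is a minimal illustration rather than the implementation of the application; the `FailureReason` enum and the `failure_reason` function are hypothetical names introduced only for this example.

```python
from enum import Enum
from typing import Optional


class FailureReason(Enum):
    """The two failure reasons named in the description (hypothetical enum)."""
    CLASSIFICATION = "classification failure"
    WARNING_LEVEL = "early warning level calculation failure"


def failure_reason(classification_correct: bool,
                   identification_valid: bool) -> Optional[FailureReason]:
    """Return why a verification sample failed, or None if it passed both checks."""
    if not classification_correct:
        # Identification validity is only judged for correctly classified samples,
        # so a wrong classification decides the reason on its own.
        return FailureReason.CLASSIFICATION
    if not identification_valid:
        return FailureReason.WARNING_LEVEL
    return None
```

A sample that passes both judgments has no failure reason and would not enter the document sample data set with failed prediction.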

For step S26, case source feature words related to intellectual property unlawful acts, criminal names, and the like are extracted from the second intellectual property document sample labeling data of the second training sample set, the extracted case source feature words are updated to the feature word library of the first model, and the updated first model is used as the second model.

It is understood that the feature words in the feature word library of the first model may be optimized, deleted, or new words may be added.

And for S27, performing word segmentation on the second intellectual property document sample labeling data of each second training sample by adopting the second model, performing feature vector generation according to word segmentation results by adopting a vector space model, and taking all generated feature vectors as the feature vector set to be processed. That is, each word in the word segmentation result corresponds to one feature vector in the feature vector set to be processed.

For step S28, the feature vector set to be processed is added to the feature vector library of the second model, thereby updating that library, and the second model with the feature vector set added is used as the case source mining model. The case source mining model thus becomes an element analysis model, and it calls data in the feature vector library when extracting the case type and the conviction index set. Adding the feature vector set to be processed to the feature vector library improves the ability of the second model, and therefore of the case source mining model, to communicate with the machine.

In one embodiment, the step of obtaining a document sample data set with failed prediction by performing classification correctness judgment and identification validity judgment according to the case type predicted value and the conviction index set predicted value comprises:

s241: performing classification judgment according to the first case type calibration value and the case type predicted value of each verification sample, respectively, to obtain a classification judgment result;

s242: when the classification judgment result is wrong, taking the first intellectual property document sample labeling data of all the verification samples with the wrong classification judgment result as a document sample data set with the wrong classification;

s243: when the classification judgment result is correct, acquiring the early warning grade calculation rule base, and performing early warning grade calculation by adopting the early warning grade calculation rule base according to the case type prediction value and the acquit index set prediction value corresponding to each verification sample with the correct classification judgment result to obtain an early warning grade calculation result;

s244: when the early warning level calculation result is failure, taking the first intellectual property document sample labeling data of all the verification samples whose early warning level calculation result is failure as a document sample data set with failure in identification;

s245: and combining the wrongly classified document sample data set and the unsuccessfully identified document sample data set to obtain the unsuccessfully predicted document sample data set.

According to the embodiment, classification correctness judgment is performed first, and then identification validity judgment is performed, so that a document sample data set with prediction failure is identified, and a basis is provided for subsequent accurate model optimization.

For S241, when the first case type calibration value and the case type predicted value of a target verification sample are the same, the classification judgment result corresponding to the target verification sample is determined to be correct; when they are different, the classification judgment result corresponding to the target verification sample is determined to be wrong. The target verification sample is any one of the verification samples.

For S242, a wrong classification judgment result means that the case type extraction performed by the first model was wrong, that is, the case type was not correctly extracted from the first intellectual property document sample annotation data of that verification sample. Therefore, the first intellectual property document sample annotation data of each verification sample whose classification judgment result is wrong is taken as document sample data with a classification error, and all such data form the document sample data set with classification errors.

For S243, a correct classification judgment result means that the case type extraction performed by the first model was correct, that is, the case type was correctly extracted from the first intellectual property document sample annotation data of the verification sample whose classification judgment result is correct. The identification validity judgment can then be performed.

The early warning level calculation rule base input by the user can be obtained, the early warning level calculation rule base can also be obtained from a database, and the early warning level calculation rule base can also be obtained from a third-party application system.

And performing early warning grade calculation according to the case type predicted value and the acquit index set predicted value corresponding to the verification sample to be identified by adopting the early warning grade calculation rule base, determining that the early warning grade calculation result is successful when the early warning grade is successfully calculated, and determining that the early warning grade calculation result is failed when the early warning grade is not successfully calculated, wherein the verification sample to be identified is any one verification sample with the correct classification judgment result.

For S244, when the early warning level calculation result is a failure, it means that the early warning level cannot be successfully calculated, and at this time, the correct acquit index set cannot be extracted, so that the first intellectual property document sample label data of each verification sample whose early warning level calculation result is a failure may be used as a document sample data with a failure in identification, and all document sample data with a failure in identification may be used as a document sample data set with a failure in identification.

And S245, combining the document sample data set with the classification error and the document sample data set with the identification failure into a set, and taking the set obtained by combination as the document sample data set with the prediction failure.
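
Steps S241 to S245 amount to a two-stage partition of the verification samples, which can be sketched as follows. This is a minimal sketch under stated assumptions: the `VerificationSample` record, its field names, and the `level_fn` callback (assumed to return `None` when no early warning level can be calculated) are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class VerificationSample:
    """Hypothetical record holding one verification sample after S23."""
    annotation_data: str             # first IP document sample annotation data
    case_type_label: str             # first case type calibration value
    case_type_pred: str              # case type predicted by the first model
    index_set_pred: FrozenSet[str]   # conviction index set predicted by the first model


def collect_failed_samples(
    samples: List[VerificationSample],
    level_fn: Callable[[str, FrozenSet[str]], Optional[int]],
) -> List[str]:
    """Return the annotation data of every sample whose prediction failed."""
    misclassified: List[str] = []   # S242: wrong case type
    unidentified: List[str] = []    # S244: early warning level not computable
    for s in samples:
        if s.case_type_pred != s.case_type_label:                   # S241
            misclassified.append(s.annotation_data)
        elif level_fn(s.case_type_pred, s.index_set_pred) is None:  # S243
            unidentified.append(s.annotation_data)
    return misclassified + unidentified                             # S245
```

Because the second check runs only in the `elif` branch, a misclassified sample is never passed to the early warning level calculation, which matches the order of judgments described above.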

In an embodiment, the step of obtaining the early warning level calculation rule base when the classification judgment result is correct, and performing early warning level calculation by using the early warning level calculation rule base according to the case type prediction value and the acquit index set prediction value corresponding to each of the verification samples for which the classification judgment result is correct to obtain an early warning level calculation result includes:

s2431: when the classification judgment result is correct, taking the first intellectual property document sample labeling data of all the verification samples with the correct classification judgment result as a document sample data set with correct classification;

s2432: acquiring a first intellectual property document sample marking data from the document sample data set with correct classification as intellectual property document sample marking data to be calculated;

s2433: adopting the early warning grade calculation rule base to carry out early warning grade calculation according to the case type predicted value and the conviction index set predicted value corresponding to the intellectual property document sample marking data to be calculated;

s2434: when the calculation of the early warning grade is successful, determining that the calculation result of the early warning grade corresponding to the annotation data of the intellectual property document sample to be calculated is successful, or else, determining that the calculation result of the early warning grade corresponding to the annotation data of the intellectual property document sample to be calculated is failed;

s2435: and repeatedly executing the step of acquiring the first intellectual property document sample marking data from the correctly classified document sample data set as the intellectual property document sample marking data to be calculated until the acquisition of the first intellectual property document sample marking data in the correctly classified document sample data set is completed.

In this embodiment, the early warning level calculation rule base is used to perform early warning level calculation according to the case type predicted value and the conviction index set predicted value of each verification sample whose classification judgment result is correct, so that whether the correct conviction index set was extracted is accurately judged, providing a basis for subsequent accurate model optimization.

For S2431, a correct classification judgment result means that the case type extraction performed by the first model was correct. The first intellectual property document sample annotation data of each verification sample whose classification judgment result is correct is taken as correctly classified document sample data, and all correctly classified document sample data form the correctly classified document sample data set.

And S2432, sequentially acquiring a piece of first intellectual property document sample labeling data from the correctly classified document sample data set as intellectual property document sample labeling data to be calculated.

And for S2433, the early warning grade calculation rule base is adopted, each early warning grade is calculated according to the case type predicted value and the acquit index set predicted value corresponding to the intellectual property document sample marking data to be calculated, when one early warning grade is calculated successfully, the early warning grade is calculated successfully, and when none of the early warning grades is calculated successfully, the early warning grade is calculated unsuccessfully.

For S2434, when the calculation of the early warning level is successful, it means that the early warning level is successfully calculated, and at this time, it may be determined that the calculation result of the early warning level corresponding to the annotation data of the intellectual property document sample to be calculated is successful; when the calculation of the early warning level fails, it means that the early warning level is not successfully calculated, and at this time, it can be determined that the calculation result of the early warning level corresponding to the annotation data of the intellectual property document sample to be calculated is a failure.

For S2435, steps S2432 to S2434 are repeated until every piece of first intellectual property document sample annotation data in the correctly classified document sample data set has been acquired.
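
The loop of S2432 to S2435 can be sketched as follows. The application does not disclose the form of a calculation rule, so this sketch assumes a hypothetical one: each rule pairs an early warning level with the conviction indices that must all be present, and the highest level whose rule is satisfied is returned. The names `Rule`, `RuleBase`, `compute_warning_level`, and `warning_level_results` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical rule: (early warning level, conviction indices that must all be present).
Rule = Tuple[int, FrozenSet[str]]
# Each case type maps to its own rule set of one or more rules.
RuleBase = Dict[str, List[Rule]]


def compute_warning_level(rule_base: RuleBase, case_type: str,
                          index_set: FrozenSet[str]) -> Optional[int]:
    """S2433: try every rule of the case type; None means no level could be calculated."""
    levels = [level for level, required in rule_base.get(case_type, [])
              if required <= index_set]
    # Assumption: when several rules are satisfied, the highest level wins.
    return max(levels) if levels else None


def warning_level_results(
    rule_base: RuleBase,
    predictions: Dict[str, Tuple[str, FrozenSet[str]]],
) -> Dict[str, str]:
    """S2432 to S2435: visit each correctly classified sample once and record the outcome."""
    results = {}
    for doc_id, (case_type, index_set) in predictions.items():
        level = compute_warning_level(rule_base, case_type, index_set)
        results[doc_id] = "success" if level is not None else "failure"  # S2434
    return results
```

Under this assumed rule form, a sample fails only when none of the rules for its case type is satisfied, which corresponds to the case in S2433 where none of the early warning levels is calculated successfully.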

In an embodiment, the step of updating the feature lexicon of the first model according to the second training sample set to obtain a second model includes:

s261: acquiring a case source feature word set corresponding to the second training sample set;

s262: and updating the case source feature word set to the feature word library of the first model to obtain the second model.

According to the embodiment, the case source feature words are extracted from the second training sample set to update the feature word library of the first model, and a basis is provided for subsequently improving the word segmentation capability of the second model.

And S261, acquiring a case source feature word set determined by the user according to the second training sample set.

The case source feature word set is a representative case source feature word set related to intellectual property illegal behaviors, criminal names and the like.

For step S262, the case source feature word set is added to the feature word library of the first model, and the first model to which the case source feature word set is added is used as the second model, so that the word segmentation capability of the second model on the second training sample set is improved.
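
The effect of the lexicon update on word segmentation can be illustrated with a small sketch. The application does not state which segmentation algorithm the second model uses; forward maximum matching, a common dictionary-based technique, is swapped in here purely for illustration. The function names `update_lexicon` and `segment` and the sample words are hypothetical.

```python
from typing import Iterable, List, Set


def update_lexicon(lexicon: Set[str], new_words: Iterable[str]) -> Set[str]:
    """S262: return a copy of the lexicon with the case source feature words added."""
    return lexicon | {w.strip() for w in new_words if w.strip()}


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a lexicon that lacks the term 注册商标 (registered trademark), the phrase 假冒注册商标 is split into fragments; once the term is added through `update_lexicon`, the same phrase is segmented into 假冒 and 注册商标.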

In an embodiment, the step of performing feature vector generation according to the second model and the second training sample set to obtain a feature vector set to be processed corresponding to the second training sample set includes:

s271: performing word segmentation on the second intellectual property document sample annotation data of each second training sample by adopting the second model to obtain intellectual property document sample word segmentation data corresponding to each second intellectual property document sample annotation data;

s272: and generating a feature vector according to all the word segmentation data of the intellectual property document sample by adopting a vector space model to obtain the feature vector set to be processed.

In this embodiment, a vector space model is adopted, and feature vectors are generated according to the second model and the second training sample set, so that a basis is provided for subsequently updating a feature vector library of the second model.

For S271, the second intellectual property document sample annotation data of the second training sample to be segmented is input into the second model for word segmentation, and the data obtained by the word segmentation is used as the intellectual property document sample word segmentation data corresponding to that second intellectual property document sample annotation data, where the second training sample to be segmented is any one of the second training samples in the second training sample set.

And S272, generating a feature vector of each feature word in all the intellectual property document sample word segmentation data according to all the intellectual property document sample word segmentation data by adopting a vector space model, and taking all the generated feature vectors as the feature vector set to be processed. That is to say, each feature word in all the intellectual property document sample word segmentation data corresponds to one feature vector in the feature vector set to be processed.

In an embodiment, the step of generating a feature vector according to all the word segmentation data of the sample of the intellectual property document by using a vector space model to obtain the feature vector set to be processed includes:

s2721: taking all the word segmentation data of the intellectual property document sample as a set to obtain a feature word set to be analyzed;

s2722: converting the feature words in the feature word set to be analyzed from a high-dimensional, highly sparse space to a low-dimensional dense space by adopting a vector space model and a TF-IDF formula to obtain the feature vector set to be processed, wherein the TF-IDF formula is used for calculating the weights of the feature words in the feature word set to be analyzed.

In this embodiment, a vector space model and a TF-IDF formula are adopted to convert the feature words in the feature word set to be analyzed from a high-dimensional, highly sparse space to a low-dimensional dense space and thereby generate feature vectors, which provides a basis for subsequently improving the ability of the second model to communicate with the machine.

For S2721, taking all the word segmentation data of the intellectual property document sample as a set, and taking the set as a feature word set to be analyzed.

For S2722, a vector space model is adopted to convert the feature words in the feature word set to be analyzed from a high-dimensional, highly sparse space to a low-dimensional dense space. During this conversion, the TF-IDF formula is used to calculate the weights of the feature words in the feature word set to be analyzed. The result is numerical variables that a computer can directly understand and process; each generated numerical variable is taken as a feature vector, and all feature vectors together form the feature vector set to be processed.

The specific steps of converting the feature words in the feature word set to be analyzed from the high-dimensional, highly sparse space to the low-dimensional dense space by using the vector space model are not repeated herein.

It is understood that the feature vector set to be processed is a set of numerical variables that can be understood and processed by a computer.

TF-IDF stands for Term Frequency-Inverse Document Frequency, where TF is the term frequency and IDF is the inverse document frequency.
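
The weighting step can be sketched as follows. The application does not give the exact TF-IDF variant, so this sketch assumes the common textbook form, weight(t, d) = tf(t, d) × log(N / df(t)), where tf is the relative frequency of word t in document d, N is the number of documents, and df(t) is the number of documents containing t. The function name `tfidf_vectors` is hypothetical, and the reduction to a low-dimensional dense space is not covered here because the source does not describe it.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> Dict[str, List[float]]:
    """Map each feature word to its TF-IDF weights, one entry per segmented document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*docs)) if docs else []
    vectors: Dict[str, List[float]] = {}
    for word in vocab:
        df = sum(1 for c in counts if word in c)   # document frequency, always >= 1 here
        idf = math.log(n_docs / df)                # inverse document frequency
        vectors[word] = [
            (c[word] / len(doc)) * idf if doc else 0.0   # term frequency times IDF
            for c, doc in zip(counts, docs)
        ]
    return vectors
```

In this form a word that appears in every document receives a weight of zero, since it does not help distinguish one document from another, while a word concentrated in few documents receives a higher weight.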

Referring to fig. 2, the present application also proposes an intellectual property case source identification apparatus, the apparatus comprising:

a data obtaining module 100, configured to obtain source data of an intellectual property document to be identified;

a case type and conviction index extraction module 200, configured to input the intellectual property document case source data to be identified into a case source mining model for case type and conviction index set extraction, so as to obtain a case type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data;

an early warning level determining module 300, configured to obtain an early warning level calculation rule base, and perform early warning level calculation according to the early warning level calculation rule base, the case type to be analyzed, and the conviction index set to be analyzed, so as to obtain an early warning level to be analyzed corresponding to the intellectual property document case source data;

the judging module 400 is configured to obtain an early warning level threshold range, and judge whether the early warning level to be analyzed is within the early warning level threshold range;

and the target intellectual property case source data determining module 500 is configured to, when the early warning level judgment result is within the early warning level threshold range, use the intellectual property case source data to be identified as the target intellectual property case source data.

In this embodiment, the intellectual property document case source data to be identified is input into a case source mining model for case type and conviction index set extraction, yielding the case type to be analyzed and the conviction index set to be analyzed; an early warning level calculation rule base is obtained, and early warning level calculation is performed according to the rule base, the case type to be analyzed, and the conviction index set to be analyzed, yielding the early warning level to be analyzed; an early warning level threshold range is obtained, and whether the early warning level to be analyzed falls within that range is judged; when it does, the intellectual property document case source data to be identified is taken as target intellectual property case source data. Whether a case should be filed for the intellectual property document case source data is thereby identified automatically, which improves case source mining efficiency, mining accuracy, and the case filing rate.
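
The flow through modules 200 to 500 can be sketched as a single function. This is a minimal sketch, not the apparatus itself: the `extract` and `warning_level` callbacks stand in for the case source mining model and the rule-base calculation, the function name `is_target_case_source` is hypothetical, and treating the threshold range as inclusive at both ends is an assumption.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# (case type to be analyzed, conviction index set to be analyzed)
Extraction = Tuple[str, FrozenSet[str]]


def is_target_case_source(
    document: str,
    extract: Callable[[str], Extraction],                           # stands in for module 200
    warning_level: Callable[[str, FrozenSet[str]], Optional[int]],  # stands in for module 300
    level_start: int,
    level_end: int,
) -> bool:
    """Return True when the document should be kept as target case source data."""
    case_type, index_set = extract(document)
    level = warning_level(case_type, index_set)
    # Modules 400 and 500: keep the document only if a level was calculated
    # and it lies inside the threshold range (assumed inclusive).
    return level is not None and level_start <= level <= level_end
```

Passing the two stages as callbacks keeps the sketch independent of how the model and the rule base are actually implemented.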

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the storage medium. The database of the computer device is used for storing data related to the intellectual property case source identification method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an intellectual property case source identification method.
The intellectual property case source identification method comprises the following steps: acquiring intellectual property document case source data to be identified; inputting the intellectual property document case source data to be identified into a case source mining model for case type and conviction index set extraction to obtain a case type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data; acquiring an early warning level calculation rule base, and performing early warning level calculation according to the early warning level calculation rule base, the case type to be analyzed, and the conviction index set to be analyzed to obtain an early warning level to be analyzed corresponding to the intellectual property document case source data; acquiring an early warning level threshold range, and judging whether the early warning level to be analyzed is within the early warning level threshold range; and when the early warning level to be analyzed is within the early warning level threshold range, taking the intellectual property document case source data to be identified as target intellectual property case source data.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements an intellectual property case source identification method comprising the steps of: acquiring intellectual property document case source data to be identified; inputting the intellectual property document case source data to be identified into a case source mining model for case type and conviction index set extraction to obtain a case type to be analyzed and a conviction index set to be analyzed corresponding to the intellectual property document case source data; acquiring an early warning level calculation rule base, and performing early warning level calculation according to the early warning level calculation rule base, the case type to be analyzed, and the conviction index set to be analyzed to obtain an early warning level to be analyzed corresponding to the intellectual property document case source data; acquiring an early warning level threshold range, and judging whether the early warning level to be analyzed is within the early warning level threshold range; and when the early warning level to be analyzed is within the early warning level threshold range, taking the intellectual property document case source data to be identified as target intellectual property case source data.

In the executed intellectual property case source identification method, the intellectual property document case source data to be identified is input into a case source mining model for case type and conviction index set extraction, yielding the case type to be analyzed and the conviction index set to be analyzed; an early warning level calculation rule base is obtained, and early warning level calculation is performed according to the rule base, the case type to be analyzed, and the conviction index set to be analyzed, yielding the early warning level to be analyzed; an early warning level threshold range is obtained, and whether the early warning level to be analyzed falls within that range is judged; when it does, the intellectual property document case source data to be identified is taken as target intellectual property case source data. Whether a case should be filed for the intellectual property document case source data is thereby identified automatically, which improves case source mining efficiency, mining accuracy, and the case filing rate.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in a process, apparatus, article, or method comprising the element.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all the equivalent structures or equivalent processes that can be directly or indirectly applied to other related technical fields by using the contents of the specification and the drawings of the present application are also included in the scope of the present application.

Claims

1. A case source identification method for intellectual property, the method comprising:

acquiring intellectual property document and case source data to be identified;

when the early warning level judgment result is within the early warning level threshold value range, the intellectual property document case source data to be identified is used as target intellectual property case source data;

the acquiring of the early warning level calculation rule base comprises the following steps:

acquiring an early warning level calculation rule base input by a user, or acquiring the early warning level calculation rule base from a database, or acquiring the early warning level calculation rule base from a third-party application system; wherein the early warning level calculation rule base comprises case types and early warning level calculation rule sets, each case type corresponds to one early warning level calculation rule set, and each early warning level calculation rule set comprises one or more calculation rules;

the obtaining of the early warning level threshold range includes:

acquiring an early warning level threshold range input by a user, or acquiring the early warning level threshold range from a database, or acquiring the early warning level threshold range from a third-party application system; wherein the early warning level threshold range comprises an early warning level start value and an early warning level end value;

before the step of inputting the intellectual property document case source data to be identified into a case source mining model to extract a case type and a conviction index set, obtaining the case type to be analyzed and the conviction index set to be analyzed corresponding to the intellectual property document case source data to be identified, the method further comprises:

obtaining a first training sample set, training an initial model with the first training sample set, and taking the trained initial model as a first model, wherein the first model comprises case type classification rules and conviction index extraction rules;

obtaining a validation sample set, each validation sample in the validation sample set comprising first intellectual property document sample annotation data and a first case type calibration value;

inputting the first intellectual property document sample annotation data corresponding to each validation sample into the first model to extract the case type and the conviction index set, obtaining a case type predicted value and a conviction index set predicted value corresponding to each validation sample in the validation sample set;

performing classification correctness judgment and identification validity judgment according to the case type predicted value and the conviction index set predicted value to obtain a document sample data set with failed prediction;

adding the feature vector set to be processed into a feature vector library of the second model to obtain the case source mining model;

the step of performing classification correctness judgment and identification validity judgment according to the case type predicted value and the conviction index set predicted value to obtain the document sample data set with failed prediction comprises:

when the classification judgment result is wrong, taking the first intellectual property document sample annotation data of all validation samples whose classification judgment result is wrong as a wrongly classified document sample data set;

when the classification judgment result is correct, acquiring the early warning level calculation rule base, and performing early warning level calculation with the early warning level calculation rule base according to the case type predicted value and the conviction index set predicted value corresponding to each validation sample whose classification judgment result is correct, to obtain an early warning level calculation result;

combining the wrongly classified document sample data set with the unsuccessfully identified document sample data set to obtain the document sample data set with failed prediction;

the step of acquiring the early warning level calculation rule base when the classification judgment result is correct, and performing early warning level calculation with the early warning level calculation rule base according to the case type predicted value and the conviction index set predicted value corresponding to each validation sample whose classification judgment result is correct, to obtain an early warning level calculation result comprises:

when the early warning level calculation succeeds, determining that the early warning level calculation result corresponding to the intellectual property document sample annotation data to be calculated is success; otherwise, determining that the early warning level calculation result corresponding to the intellectual property document sample annotation data to be calculated is failure;

and repeating the step of acquiring first intellectual property document sample annotation data from the correctly classified document sample data set as the intellectual property document sample annotation data to be calculated, until all the first intellectual property document sample annotation data in the correctly classified document sample data set has been acquired.
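
For illustration only, the validation steps recited in claim 1 (classification correctness judgment, early warning level calculation for correctly classified samples, and merging of the two failure sets) might be sketched in Python as follows. All names here (ValidationSample, Prediction, compute_level, collect_failed) are hypothetical, and the sketch assumes that an early warning level calculation "succeeds" when some rule in the rule set for the predicted case type returns a level; the claims do not define success in this detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: a calculation rule maps a conviction index set to a level,
# or returns None when the rule does not apply.
Rule = Callable[[Dict[str, float]], Optional[int]]
RuleBase = Dict[str, List[Rule]]  # case type -> early warning level rule set


@dataclass
class ValidationSample:
    annotation: str       # first IP document sample annotation data
    case_type_label: str  # first case type calibration value


@dataclass
class Prediction:
    case_type: str              # case type predicted value
    indices: Dict[str, float]   # conviction index set predicted value


def compute_level(rule_base: RuleBase, case_type: str,
                  indices: Dict[str, float]) -> Optional[int]:
    """Return the first level produced by the rule set, or None on failure."""
    for rule in rule_base.get(case_type, []):
        level = rule(indices)
        if level is not None:
            return level
    return None


def collect_failed(samples: Sequence[ValidationSample],
                   predict: Callable[[str], Prediction],
                   rule_base: RuleBase) -> List[str]:
    """Build the document sample data set with failed prediction."""
    misclassified: List[str] = []
    correct: List[Tuple[ValidationSample, Prediction]] = []
    for sample in samples:
        pred = predict(sample.annotation)
        if pred.case_type != sample.case_type_label:
            misclassified.append(sample.annotation)  # classification is wrong
        else:
            correct.append((sample, pred))           # classification is correct

    # Walk the correctly classified set one sample at a time.
    unidentified = [
        sample.annotation
        for sample, pred in correct
        if compute_level(rule_base, pred.case_type, pred.indices) is None
    ]
    # Merge the wrongly classified and unsuccessfully identified sets.
    return misclassified + unidentified
```

In this sketch the returned list plays the role of the document sample data set with failed prediction; how that set is subsequently used for further training is not shown.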

2. The method of claim 1, wherein the step of updating the feature lexicon of the first model according to the second training sample set to obtain a second model comprises:

3. The method for identifying a case source of intellectual property according to claim 1, wherein the step of generating a feature vector according to the second model and the second training sample set to obtain a set of feature vectors to be processed corresponding to the second training sample set comprises:

4. The method of claim 3, wherein the step of generating feature vectors according to all the sample word segmentation data of the intellectual property document by using a vector space model to obtain the feature vector set to be processed comprises:

5. An intellectual property case source identification device for performing the intellectual property case source identification method according to any one of claims 1-4, the device comprising:

the judging module is used for acquiring an early warning level threshold range and judging whether the early warning level to be analyzed is within the early warning level threshold range;

and the target intellectual property document source data determining module is used for taking the intellectual property document source data to be identified as the target intellectual property document source data when the early warning level judgment result is within the early warning level threshold range.
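
As a non-authoritative illustration, the judging module and the target determining module recited above could be sketched in Python as below. The names ThresholdRange and select_target are hypothetical, and the sketch assumes integer early warning levels and a threshold range that is inclusive at both the start value and the end value; the claims do not state whether the bounds are inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdRange:
    start: int  # early warning level start value
    end: int    # early warning level end value

    def contains(self, level: int) -> bool:
        # Assumption: both bounds are inclusive.
        return self.start <= level <= self.end


def select_target(source_data: str, level: int,
                  threshold: ThresholdRange) -> Optional[str]:
    """Return the source data as target data when the level is in range."""
    return source_data if threshold.contains(level) else None
```

For example, under these assumptions, select_target("doc", 3, ThresholdRange(2, 4)) returns "doc", while a level of 5 yields None, meaning the data is not taken as target case source data.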

6. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 4 when executing the computer program.

7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.