CN110659347B - Associated document determining method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110659347B
Authority
CN
China
Prior art keywords
information
document
adjustment
feedback
determining
Legal status
Active
Application number
CN201910825858.4A
Other languages
Chinese (zh)
Other versions
CN110659347A (en)
Inventor
罗霄
胡文成
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910825858.4A
Publication of CN110659347A
Application granted
Publication of CN110659347B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/35: Clustering; Classification
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a method, a device, computer equipment and a storage medium for determining associated documents. A document to be associated is first acquired, and information extraction is performed on it to obtain its association information. The association information is sent to a client, and the feedback information returned by the client is obtained. The information type of the feedback information is then judged; if it is the substance type, information extraction is performed on the document to be associated again according to the feedback information to obtain adjustment information. Finally, the associated documents of the document to be associated are determined from candidate documents according to the adjustment information. Associated documents are thus determined automatically, which improves efficiency, and the feedback-driven adjustment improves the accuracy of the determination, yielding a more accurate classification.

Description

Associated document determining method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of intelligent arbitration, and in particular, to a method and apparatus for determining associated documents, a computer device, and a storage medium.
Background
With the accelerating informatization of China's judicial sector, a large number of judicial document files are being generated. These files are scattered across web pages and computer file systems and are stored discretely, with no relations recorded between them. Within this mass of judicial document data, a single judicial document often represents only one stage of the whole trial process of a case, yet when consulting a document, judicial personnel often need to know how the other stages of that case were handled. Because judicial documents are scattered, they have to be classified and managed manually, which consumes considerable manpower and material resources.
Disclosure of Invention
The embodiments of the application provide a method, a device, computer equipment and a storage medium for determining associated documents, to address the low efficiency of determining the documents associated with a judicial document.
A method of associated document determination, comprising:
acquiring a document to be associated, and carrying out information extraction processing on the document to be associated to obtain association information of the document to be associated;
sending the association information to a client, and obtaining feedback information returned by the client, wherein the feedback information comprises an information type;
judging the information type of the feedback information, and if the information type is a substance type, carrying out information extraction processing on the document to be associated again according to the feedback information to obtain adjustment information of the document to be associated;
and determining the associated document of the document to be associated from the candidate documents according to the adjustment information.
An associated document determining apparatus comprising:
the associated information acquisition module is used for acquiring the document to be associated, and carrying out information extraction processing on the document to be associated to obtain associated information of the document to be associated;
the feedback information acquisition module is used for sending the associated information to the client and acquiring feedback information returned by the client, wherein the feedback information comprises an information type;
the adjustment information acquisition module is used for judging the information type of the feedback information, and if the information type is a substance type, carrying out information extraction processing on the document to be associated again according to the feedback information to obtain adjustment information of the document to be associated;
and the associated document determining module is used for determining the associated document of the document to be associated from the candidate documents according to the adjustment information.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the associated document determination method described above when executing the computer program.
A computer readable storage medium storing a computer program which when executed by a processor implements the associated document determination method described above.
In the above method, device, computer equipment and storage medium for determining associated documents, a document to be associated is first acquired, and information extraction is performed on it to obtain its association information. The association information is sent to a client, and the feedback information returned by the client is obtained. The information type of the feedback information is judged; if it is the substance type, information extraction is performed on the document to be associated again according to the feedback information to obtain adjustment information. Finally, the associated documents of the document to be associated are determined from the candidate documents according to the adjustment information. Associated documents are thus determined automatically, which improves efficiency, and the feedback-driven adjustment improves the accuracy of the determination, yielding a more accurate classification.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic view of an application environment of a method for determining associated documents according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an exemplary method for determining associated documents in accordance with one embodiment of the present application;
FIG. 3 is another exemplary diagram of a method for determining associated documents in an embodiment of the present application;
FIG. 4 is another exemplary diagram of a method for determining associated documents in an embodiment of the present application;
FIG. 5 is another exemplary diagram of a method for determining associated documents in an embodiment of the present application;
FIG. 6 is another exemplary diagram of a method for determining associated documents in an embodiment of the present application;
FIG. 7 is another exemplary diagram of a method for determining associated documents in an embodiment of the present application;
FIG. 8 is a schematic block diagram of an associated document determination apparatus in accordance with an embodiment of the present application;
FIG. 9 is a functional block diagram of an associated document determining apparatus in an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the application.
The method for determining associated documents provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, in which a client (computer device) communicates with a server over a network. The server obtains the document to be associated sent by the client and performs information extraction on it to obtain its association information; sends the association information to the client and obtains the feedback information returned by the client; if the feedback information is valid information, performs information extraction on the document to be associated again according to the feedback information to obtain its adjustment information; and determines the associated documents of the document to be associated from the candidate documents according to the adjustment information. The client (computer device) may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a method for determining associated documents is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
S10: Acquire the document to be associated, and perform information extraction on it to obtain the association information of the document to be associated.
The document to be associated may be a judicial document. It may be a single judicial document or, alternatively, two or more judicial documents already associated with one another; in either case, this embodiment aims to obtain the other judicial documents associated with it. After the document to be associated is acquired, information extraction is performed on it to obtain its association information. The association information can take the form of labels: specific words in the document are classified under corresponding classification labels, so that the key information of the document is brought out. For example, the association information of a document to be associated may be: Zhang San (name), 445566XXXXXXXXXXXX (identification card number), Nanshan District (location information), Patent Law (law), and so on. Optionally, the association information may include associated vocabulary, associated tags, tag descriptions, and the like.
Specifically, information extraction on the document to be associated can be implemented as follows: data cleaning is performed on the document to generate a corresponding first text; the first text is segmented into words; each word is scored according to a preset rule or algorithm; and the words are ranked by score to obtain the related words. The related words are then associated with their corresponding labels, and labels are attached to the document to be associated up to a preset number of labels.
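The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stop-word list, whitespace-based segmentation (real Chinese text would need a proper segmenter), raw word frequency as the "preset rule or algorithm" for scoring, and the label map are all hypothetical stand-ins, since the text does not specify them.

```python
import re
from collections import Counter

# Hypothetical stop-word list used during data cleaning (not specified in the text).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "by"}


def clean_text(raw: str) -> str:
    """Data cleaning: lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract_related_words(raw: str, top_k: int = 3) -> list:
    """Segment the cleaned first text, score each word, and return the top_k words."""
    words = [w for w in clean_text(raw).split() if w not in STOP_WORDS]
    scores = Counter(words)  # assumed scoring rule: raw frequency
    # Rank by descending score; ties are broken alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def attach_labels(words: list, label_map: dict, max_labels: int = 2) -> list:
    """Attach at most max_labels distinct labels, following the ranking of the words."""
    labels = []
    for word in words:
        label = label_map.get(word)
        if label is not None and label not in labels:
            labels.append(label)
        if len(labels) == max_labels:
            break
    return labels


doc = "Plaintiff sued over a patent. The patent was filed in Nanshan; the plaintiff won."
related = extract_related_words(doc)
print(related)  # ['patent', 'plaintiff', 'filed']
print(attach_labels(related, {"patent": "law", "plaintiff": "party", "nanshan": "location"}))
# ['law', 'party']
```

The `max_labels` cap plays the role of the "preset number of labels"; the scoring function is the natural place to swap in TF-IDF or another weighting.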
Preferably, information extraction can also be implemented with a preset classification model, which obtains the association information from the document to be associated. For example, a large number of sample documents, namely existing judicial documents, may be obtained. Each sample document is segmented, and its segmentation result is taken as its labels. The labels of all sample documents are then counted to obtain statistics that include at least: all labels obtained by segmentation, the number of times each label appears across all sample documents, and the probability of each label appearing in the sample documents. Saving these statistics yields the classification model. For each label in the document to be associated, the classification model gives the number of times that label appears across all sample documents and its probability of appearing in them. The labels with the fewest occurrences, or the lowest probability of occurrence, in the sample documents are selected as the associated labels of the document to be associated.
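A minimal sketch of this statistics-based classification model, assuming each sample document has already been segmented into a list of tokens and that the "probability" of a label is the fraction of sample documents that contain it (the text does not define it precisely). The function names are hypothetical.

```python
from collections import Counter


def build_label_statistics(sample_docs):
    """Count label occurrences over all segmented sample documents.

    Returns (total_counts, probabilities): total_counts[label] is the number of
    times the label occurs across all samples; probabilities[label] is the
    assumed fraction of sample documents that contain the label.
    """
    total_counts = Counter()
    doc_counts = Counter()
    for tokens in sample_docs:
        total_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each label once per document
    n_docs = len(sample_docs)
    probabilities = {label: doc_counts[label] / n_docs for label in doc_counts}
    return total_counts, probabilities


def rarest_labels(doc_tokens, total_counts, num=2):
    """Select the num labels of the new document that occur least often in the samples.

    Labels never seen in the samples are ignored here (an assumption).
    """
    known = {t for t in doc_tokens if t in total_counts}
    return sorted(known, key=lambda t: (total_counts[t], t))[:num]


samples = [
    ["contract", "dispute", "nanshan"],
    ["contract", "loan"],
    ["contract", "dispute", "patent"],
]
counts, probs = build_label_statistics(samples)
print(rarest_labels(["contract", "dispute", "patent", "unseen"], counts))  # ['patent', 'dispute']
```

Choosing the rarest labels favours terms that distinguish a case (a specific statute or party) over terms shared by almost every judicial document.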
S20: Send the association information to the client, and obtain the feedback information returned by the client, wherein the feedback information includes an information type.
After the association information of the document to be associated is obtained, it is sent to the client. The client reviews and audits the association information and returns feedback information, which may be confirmation information or modification information: confirmation information confirms that the association information contains no errors, while modification information modifies its content. The feedback information includes an information type and may modify at least one of the associated vocabulary, the associated tags, or the tag descriptions in the association information. If the feedback modifies the associated vocabulary or an associated tag, its information type is the substance type; if it modifies only a tag description, its information type is the form type. The information type can be represented by words, symbols, letters or numbers. For example, suppose the associated vocabulary, associated tags and tag descriptions are denoted by 00, 01 and 11 respectively, the substance type by X, and the form type by Y. The feedback information is checked for 00 or 01; if either is present, its information type is X, otherwise it is Y.
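The type rule in the example can be written directly in code. The codes 00/01/11 and X/Y come from the text; representing the feedback as the set of field codes it modifies is an assumption made for this sketch.

```python
# Codes follow the illustrative example in the text.
ASSOCIATED_VOCABULARY = "00"
ASSOCIATED_TAG = "01"
TAG_DESCRIPTION = "11"
SUBSTANCE_TYPE = "X"
FORM_TYPE = "Y"


def feedback_information_type(modified_fields):
    """Return X if the feedback touches vocabulary or tags, otherwise Y.

    modified_fields is any iterable of the field codes the client's feedback modifies.
    """
    if set(modified_fields) & {ASSOCIATED_VOCABULARY, ASSOCIATED_TAG}:
        return SUBSTANCE_TYPE
    return FORM_TYPE


print(feedback_information_type({TAG_DESCRIPTION}))                  # Y
print(feedback_information_type({ASSOCIATED_TAG, TAG_DESCRIPTION}))  # X
print(feedback_information_type(set()))                              # Y
```

Under this literal reading, pure confirmation information (no fields modified) also falls into the form type, so it does not trigger re-extraction.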
Preferably, each associated word and each associated tag also carries a corresponding weight. Modification information may modify the associated vocabulary and/or associated tags themselves, and may also modify their corresponding weights, for example by increasing or decreasing a weight.
S30: Judge the information type of the feedback information; if it is the substance type, perform information extraction on the document to be associated again according to the feedback information to obtain the adjustment information of the document to be associated.
In this step, if the feedback information is of the substance type, information extraction is performed on the document to be associated again according to the feedback information to obtain its adjustment information. Specifically, the corresponding stage of the information extraction process is adjusted according to the feedback information. If the feedback adjusts the associated vocabulary, the re-extraction proceeds from the adjusted associated vocabulary in the feedback information. If the feedback adjusts the associated tags, the re-extraction proceeds from the adjusted associated tags. If the feedback adjusts both the associated vocabulary and the associated tags, both stages are adjusted during re-extraction, yielding the adjustment information of the document to be associated. Optionally, the adjustment information may include an adjustment vocabulary and adjustment tags: the adjustment vocabulary reflects the adjusted associated vocabulary, and the adjustment tags reflect the adjusted associated tags.
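The stage-by-stage substitution described above can be sketched as follows. The `Feedback` container, the `re_extract` name and the callable tagger are hypothetical; the sketch only shows how adjusted vocabulary and adjusted tags replace the output of the corresponding extraction stage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical container for substance-type feedback from the client."""
    vocabulary: Optional[Dict[str, float]] = None  # adjusted word -> weight, None if unchanged
    tags: Optional[Dict[str, str]] = None          # adjusted word -> tag, None if unchanged


def re_extract(
    auto_vocabulary: Dict[str, float],
    auto_tagger: Callable[[str], str],
    feedback: Feedback,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Re-run both extraction stages, substituting the reviewer's adjustments."""
    # Stage 1: vocabulary. Use the adjusted vocabulary if the feedback supplies one.
    vocabulary = feedback.vocabulary if feedback.vocabulary is not None else auto_vocabulary
    # Stage 2: tagging. Adjusted tags win; remaining words are tagged automatically.
    overrides = feedback.tags or {}
    tags = {
        word: overrides[word] if word in overrides else auto_tagger(word)
        for word in vocabulary
    }
    return vocabulary, tags  # adjustment vocabulary, adjustment tags


def default_tagger(word: str) -> str:
    """Stand-in for the automatic tagging stage."""
    return "unclassified"


auto_vocabulary = {"patent law": 0.5, "nanshan district": 0.3}
feedback = Feedback(vocabulary={"patent law": 0.6, "zhang san": 0.4},
                    tags={"patent law": "intellectual property law"})
adjusted_vocabulary, adjusted_tags = re_extract(auto_vocabulary, default_tagger, feedback)
print(adjusted_tags)  # {'patent law': 'intellectual property law', 'zhang san': 'unclassified'}
```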
S40: Determine the associated documents of the document to be associated from the candidate documents according to the adjustment information.
The candidate documents are existing judicial documents and may optionally be stored together in a document library for unified management. After the adjustment information is obtained, it is matched against the labels of each candidate document, and the associated documents of the document to be associated are determined from the matching results: a candidate document with a high matching degree can be regarded as an associated document of the document to be associated.
Optionally, the adjustment tags in the adjustment information of the document to be associated can be matched against the feature tags of each candidate document to determine the association degree between the document to be associated and each candidate document. The associated documents are then determined from the candidate documents according to the association degree: candidate documents whose association degree exceeds a preset association degree are taken as associated documents of the document to be associated. It will be appreciated that each candidate document has also undergone information extraction beforehand to obtain its feature tags. Specifically, a preset weight coefficient is first obtained for each adjustment tag. For example, the weight coefficient of each tag may be set according to the weight value of the corresponding adjustment word in the adjustment information, with a higher weight giving a higher weight coefficient; the weight coefficient may also simply equal the weight value. Alternatively, a fixed weight coefficient may be preset for each tag. The association degree G between the document to be associated and each candidate document is then calculated as follows:
$$G = \sum_{i=1}^{N} a_i k_i$$
where $G$ is the association degree between a candidate document and the document to be associated, $N$ is the total number of adjustment tags of the document to be associated, $a_i$ is the matching parameter between feature tag $i$ of the candidate document and adjustment tag $i$ of the document to be associated, and $k_i$ is the weight coefficient of adjustment tag $i$. Specifically, $a_i$ is set to 1 if the word corresponding to feature tag $i$ of the candidate document is the same as the word corresponding to adjustment tag $i$ of the document to be associated, and to 0 otherwise.
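The matching step can be sketched as below, assuming the association degree is the weighted match count $G=\sum_{i=1}^{N} a_i k_i$ and that each document's tags are held as a mapping from tag name (tag $i$) to the word it labels. Function names and data values are hypothetical.

```python
def association_degree(adjust_tags, candidate_tags, weights):
    """G = sum over the N adjustment tags of a_i * k_i.

    a_i is 1 when the candidate's feature tag i labels the same word as
    adjustment tag i, else 0; k_i is weights[tag].
    """
    g = 0.0
    for tag, word in adjust_tags.items():
        a_i = 1 if candidate_tags.get(tag) == word else 0
        g += a_i * weights[tag]
    return g


def associated_documents(adjust_tags, candidates, weights, threshold):
    """Return ids of candidates whose association degree exceeds the preset threshold."""
    return [
        doc_id
        for doc_id, feature_tags in candidates.items()
        if association_degree(adjust_tags, feature_tags, weights) > threshold
    ]


adjust = {"name": "Zhang San", "location": "Nanshan District", "law": "Patent Law"}
weights = {"name": 0.5, "location": 0.2, "law": 0.3}
candidates = {
    "doc1": {"name": "Zhang San", "law": "Patent Law"},
    "doc2": {"location": "Nanshan District"},
    "doc3": {"name": "Li Si"},
}
print(associated_documents(adjust, candidates, weights, threshold=0.5))  # ['doc1']
```

Here doc1 scores 0.5 + 0.3 = 0.8 and is the only candidate above the 0.5 threshold; doc2 matches only the low-weight location tag.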
In this embodiment, a document to be associated is first acquired, and information extraction is performed on it to obtain its association information. The association information is sent to the client, and the feedback information returned by the client is obtained. The information type of the feedback information is judged; if it is the substance type, information extraction is performed on the document to be associated again according to the feedback information to obtain adjustment information. Finally, the associated documents of the document to be associated are determined from the candidate documents according to the adjustment information. Associated documents are thus determined automatically, which improves efficiency, and the feedback-driven adjustment improves the accuracy of the determination, yielding a more accurate classification.
In an embodiment, as shown in FIG. 3, performing information extraction on the document to be associated to obtain its association information includes the following steps:
S11: Perform word segmentation on the document to be associated to obtain initial keywords.
Specifically, the document to be associated is segmented with a word segmentation algorithm, which may be a string-matching-based, understanding-based, or statistics-based segmentation method. Optionally, only part of the document may be segmented, namely a part that better reflects its characteristics, for example the case description. Segmenting the document yields the initial keywords.
S12: Perform data cleaning on the initial keywords to obtain target keywords.
After the initial keywords are obtained, data cleaning is performed on them: words that carry no specific meaning, such as auxiliary words (particles), adverbs and pronouns, are removed. Such words can be identified among the initial keywords by keyword matching and removed, yielding the target keywords.
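Steps S11 and S12 can be illustrated with forward maximum matching, one common string-matching segmentation method, followed by stop-word removal. The tiny dictionary and stop-word list are hypothetical; a production system would use a full lexicon or a statistical segmenter.

```python
def forward_max_match(text, dictionary, max_len):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def clean_keywords(tokens, stop_words):
    """Data cleaning: drop words that carry no specific meaning."""
    return [t for t in tokens if t not in stop_words]


# "原告依据专利法起诉" means "the plaintiff sues under the Patent Law".
dictionary = {"原告", "依据", "专利", "专利法", "起诉"}  # plaintiff, according to, patent, Patent Law, sue
stop_words = {"依据"}  # "according to": treated here as a function word (assumption)
initial = forward_max_match("原告依据专利法起诉", dictionary, max_len=3)
print(initial)                              # ['原告', '依据', '专利法', '起诉']
print(clean_keywords(initial, stop_words))  # ['原告', '专利法', '起诉']
```

Because the longest match wins, "专利法" (Patent Law) is kept whole rather than split into "专利" (patent) plus a stray character.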
S13: Calculate the weight value of each target keyword with the term frequency-inverse document frequency (TF-IDF) algorithm.
The term frequency-inverse document frequency (TF-IDF) algorithm is a weighting technique used in information retrieval and data mining. TF stands for term frequency and IDF for inverse document frequency. Within a given document, the term frequency is the number of times a given word appears in that document. This count is usually normalized, for example by dividing it by the total number of words in the document, to prevent a bias toward long documents: the same word tends to have a higher raw count in a long document than in a short one, regardless of its importance. The inverse document frequency measures the general importance of a word. The IDF of a word is obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient.
Specifically, the term frequency (TF) of each target keyword in the document to be associated and its inverse document frequency (IDF) over a large number of candidate documents are calculated, and their product is taken as the weight value of the target keyword.
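A minimal TF-IDF sketch following the definitions above: TF is the count normalized by document length, and IDF is the logarithm of the number of candidate documents divided by the number containing the word. Giving words absent from the candidate corpus a document count of 1 is an assumption made here to avoid division by zero.

```python
import math
from collections import Counter


def tf_idf_weights(doc_tokens, corpus):
    """Weight = TF in the document to be associated * IDF over the candidate corpus."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    corpus_sets = [set(tokens) for tokens in corpus]
    weights = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)  # length-normalised term frequency
        df = sum(1 for tokens in corpus_sets if word in tokens)
        idf = math.log(n_docs / max(df, 1))  # df floored at 1 (assumption)
        weights[word] = tf * idf
    return weights


corpus = [["patent", "court"], ["court", "loan"], ["court"], ["contract"]]
w = tf_idf_weights(["patent", "patent", "plaintiff", "court"], corpus)
print(sorted(w, key=w.get, reverse=True))  # ['patent', 'plaintiff', 'court']
```

"court" appears in three of the four candidate documents, so its IDF, and therefore its weight, is small even though it occurs in the document to be associated.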
S14: Determine the associated vocabulary from the target keywords according to the weight values.
After the weight value of each target keyword is obtained, the target keywords with larger weight values are selected as the associated vocabulary. Optionally, a weight threshold may be set, and the target keywords whose weight values exceed it are taken as the associated vocabulary. Alternatively, a preset number may be set, and that many target keywords are selected in descending order of weight value as the associated vocabulary.
Further, both a weight threshold and a preset number may be set. If the number of target keywords whose weight values exceed the weight threshold is greater than the preset number, the preset number of them are selected in descending order of weight value as the associated vocabulary. If that number is less than or equal to the preset number, all target keywords whose weight values exceed the weight threshold are taken as the associated vocabulary.
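The combined threshold-and-cap rule can be sketched in a few lines; the function name and sample weights are hypothetical.

```python
def select_associated_vocabulary(weights, threshold, preset_number):
    """Keep keywords whose weight exceeds threshold, capped at preset_number.

    If more than preset_number keywords exceed the threshold, the preset_number
    heaviest are kept; otherwise all keywords above the threshold are kept.
    """
    above = [word for word, value in weights.items() if value > threshold]
    # Descending weight; ties broken alphabetically for a deterministic result.
    above.sort(key=lambda word: (-weights[word], word))
    return above[:preset_number]


weights = {"patent": 0.9, "plaintiff": 0.5, "court": 0.4, "the": 0.1}
print(select_associated_vocabulary(weights, threshold=0.3, preset_number=2))  # ['patent', 'plaintiff']
print(select_associated_vocabulary(weights, threshold=0.3, preset_number=5))  # ['patent', 'plaintiff', 'court']
```

Slicing covers both branches of the rule at once. The threshold-only and top-N-only variants are special cases: pass a `preset_number` at least as large as the vocabulary, or a `threshold` of `float("-inf")`.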
Preferably, after the associated vocabulary is determined, a weight value may be determined again for each associated word to obtain its target weight value. Specifically, the target weight value of each associated word is determined with the following formula:
where $W_i$ is the target weight value, $(\text{TF-IDF})_i$ is the weight value of each associated word, $n$ is the number of associated words, and $i$ is a positive integer. Re-determining the target weight values makes the weight value of each associated word more accurate.
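As an illustration only, the sketch below assumes the target weight normalizes each TF-IDF weight by the sum over all $n$ associated words, $W_i = (\text{TF-IDF})_i / \sum_{j=1}^{n} (\text{TF-IDF})_j$, so that the target weights sum to 1. Other normalizations (for example, dividing by the Euclidean norm) would fit the same description; the function name is hypothetical.

```python
def target_weights(tfidf):
    """Assumed normalisation: W_i = (TF-IDF)_i / sum_j (TF-IDF)_j over the n associated words."""
    total = sum(tfidf.values())
    if total <= 0:
        raise ValueError("TF-IDF weights must have a positive sum")
    return {word: value / total for word, value in tfidf.items()}


print(target_weights({"patent": 3.0, "plaintiff": 1.0}))  # {'patent': 0.75, 'plaintiff': 0.25}
```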
S15: Match the associated vocabulary against a preset associated tag library to obtain the associated tags of the document to be associated.
The associated vocabulary is matched against the preset associated tag library to obtain an associated tag for each associated word. Optionally, the associated tags may include name, region, law, identification number, and so on, and the name tag may be further refined into principal (party), plaintiff, defendant, and the like. Specifically, matching may be performed between the keywords in the associated tag library and the associated vocabulary; for example, if "principal" appears in the associated vocabulary, the word following it can be determined to be a name, that is, a principal.
Preferably, the associated tags of the associated vocabulary may also be determined by various recognition algorithms, such as Chinese name recognition, address recognition, and certificate number recognition algorithms. The associated vocabulary is processed by the corresponding algorithms, and each associated word is assigned an associated tag.
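A sketch combining the three tagging approaches described above: direct lookup in a tag library, a simple recognizer for the 18-character resident ID number format (17 digits followed by a digit or "X"; the checksum is not verified), and the contextual "word after principal is a name" rule. The rule order, the function name and the sample data are assumptions.

```python
import re

# 18-character resident ID number format; the check character is not validated.
ID_NUMBER = re.compile(r"\d{17}[\dX]")


def tag_vocabulary(vocabulary, tag_library, trigger="principal"):
    """Assign an associated tag to each associated word.

    Rule order (an assumption): tag-library lookup, then the ID-number
    pattern, then the contextual rule that the word after the trigger is a name.
    Words matching no rule are left untagged.
    """
    tags = {}
    for i, word in enumerate(vocabulary):
        if word in tag_library:
            tags[word] = tag_library[word]
        elif ID_NUMBER.fullmatch(word):
            tags[word] = "identification number"
        elif i > 0 and vocabulary[i - 1] == trigger:
            tags[word] = "name"
    return tags


library = {"Patent Law": "law", "Nanshan District": "location"}
vocabulary = ["principal", "Zhang San", "Patent Law", "44556619900101123X"]
print(tag_vocabulary(vocabulary, library))
# {'Zhang San': 'name', 'Patent Law': 'law', '44556619900101123X': 'identification number'}
```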
S16: Combine the associated vocabulary and the associated tags into the association information.
After the associated vocabulary and the associated tag are obtained, the associated vocabulary and the associated tag are combined into associated information.
In this embodiment, the document to be associated is segmented to obtain initial keywords; data cleaning is performed on the initial keywords to obtain target keywords; the weight value of each target keyword is calculated with the TF-IDF algorithm; the associated vocabulary is determined from the target keywords according to the weight values; the associated vocabulary is matched against a preset associated tag library to obtain the associated tags of the document to be associated; and the associated vocabulary and associated tags are combined into the association information. Obtaining the association information in this way improves both the accuracy and the efficiency of its determination.
In one embodiment, the feedback information includes adjustment vocabulary information and adjustment tag information.
The adjustment vocabulary information refers to relevant information for adjusting the associated vocabulary, and comprises adjustment of the associated vocabulary and/or adjustment of a weight value of the associated vocabulary. The adjustment tag information refers to related information for adjusting the associated tag, including adjustment of the associated tag itself and/or adjustment of the tag description.
In this embodiment, as shown in fig. 4, re-performing the information extraction processing on the document to be associated according to the feedback information to obtain the adjustment information of the document to be associated includes:
S31: Adjust the associated vocabulary in the associated information using the adjustment vocabulary information to obtain a target vocabulary.
In this step, the associated vocabulary is adjusted according to the adjustment vocabulary information to obtain the target vocabulary, which includes the vocabulary names and the weight value corresponding to each vocabulary name.
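A minimal sketch of step S31, assuming the adjustment vocabulary information arrives as a list of words to remove plus a mapping of words to new weight values. That feedback format and the function name `apply_vocabulary_adjustment` are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional


def apply_vocabulary_adjustment(
    associated_vocabulary: Mapping[str, float],
    remove: Iterable[str] = (),
    set_weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """S31 sketch: return the target vocabulary (vocabulary name -> weight value)."""
    target = dict(associated_vocabulary)  # copy, so the associated information is left unchanged
    for word in remove:
        target.pop(word, None)  # ignore words that are not present
    for word, weight in (set_weights or {}).items():
        target[word] = weight  # adds a new word or overwrites an existing weight
    return target


target = apply_vocabulary_adjustment(
    {"patent law": 0.42, "Zhang San": 0.35, "contract": 0.10},
    remove=["contract"],
    set_weights={"Zhang San": 0.40, "infringement": 0.30},
)
print(target)  # {'patent law': 0.42, 'Zhang San': 0.4, 'infringement': 0.3}
```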
S32: Adjust the preset associated tag library using the adjustment tag information to obtain an adjusted associated tag library.
After the adjustment tag information is obtained, the associated tag library is adjusted accordingly to correct unreasonable tag correspondences in it. For example, consider the associated information: Zhang San (name), 445566XXXXXXXXXXXX (identification card number), Nanshan District (location information), Patent Law (law). If the tag of "Patent Law (law)" is modified to "Patent Law (intellectual property law)", the associated tag library changes the tag corresponding to the vocabulary item "Patent Law" from law to intellectual property law, so that each associated vocabulary item is tagged more precisely.
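The "Patent Law" example above can be expressed as a small library update. This sketch assumes the tag library is a mapping from vocabulary item to tag and that each change arrives as a hypothetical (vocabulary item, old tag, new tag) triple; skipping changes whose old tag no longer matches is a design choice of the sketch, not something the source specifies.

```python
from typing import Dict, Iterable, Mapping, Tuple


def adjust_tag_library(
    library: Mapping[str, str],
    changes: Iterable[Tuple[str, str, str]],
) -> Dict[str, str]:
    """S32 sketch: apply (vocabulary item, old tag, new tag) changes to a copy of the library."""
    adjusted = dict(library)
    for word, old_tag, new_tag in changes:
        # Apply only if the library still holds the tag the reviewer saw,
        # so stale feedback does not overwrite a newer correction.
        if adjusted.get(word) == old_tag:
            adjusted[word] = new_tag
    return adjusted


library = {"Patent Law": "law", "Nanshan District": "location information"}
adjusted = adjust_tag_library(library, [("Patent Law", "law", "intellectual property law")])
print(adjusted["Patent Law"])  # intellectual property law
```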
S33: Match the target vocabulary using the adjusted associated tag library to obtain the adjustment tags of the document to be associated.
After the adjusted associated tag library is obtained, the target vocabulary is matched against it; the matching process may be the same as in step S15 and is not repeated here. Matching the adjusted associated vocabulary (the target vocabulary) against the adjusted associated tag library better ensures the accuracy of the tags obtained subsequently.
S34: Combine the target vocabulary and the adjustment tags into the adjustment information.
In this embodiment, the associated vocabulary is first adjusted using the adjustment vocabulary information to obtain the target vocabulary; the associated tag library is adjusted using the adjustment tag information to obtain the adjusted associated tag library; the target vocabulary is matched using the adjusted associated tag library to obtain the adjustment tags of the document to be associated; and the target vocabulary and the adjustment tags are combined into the adjustment information. Because the associated tags are adjusted through the feedback information and the target vocabulary is then re-matched against the adjusted associated tag library, the accuracy of the tags obtained for the document to be associated is ensured.
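Steps S33 and S34 can be sketched as a re-lookup followed by composition into a record per word. The `AdjustedEntry` record, the exact-match lookup, and the ordering by descending weight are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class AdjustedEntry:
    word: str
    weight: float
    tag: Optional[str]  # None when the adjusted library has no tag for the word


def build_adjustment_info(
    target_vocabulary: Mapping[str, float],
    adjusted_library: Mapping[str, str],
) -> List[AdjustedEntry]:
    """S33 + S34 sketch: re-match every target word, then combine word, weight and tag."""
    entries = [
        AdjustedEntry(word, weight, adjusted_library.get(word))
        for word, weight in target_vocabulary.items()
    ]
    # Order by descending weight so the most informative vocabulary comes first.
    return sorted(entries, key=lambda e: -e.weight)


adjustment_info = build_adjustment_info(
    {"Zhang San": 0.40, "Patent Law": 0.42},
    {"Patent Law": "intellectual property law", "Zhang San": "name"},
)
for entry in adjustment_info:
    print(entry.word, entry.weight, entry.tag)
```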
In an embodiment, as shown in fig. 5, after the feedback information returned by the client is obtained, the associated document determining method further includes:
S21: If the information type of the feedback information is the formal type, obtain the feedback tags in the feedback information.
If the information type of the feedback information is the formal type, the feedback only modifies tag descriptions and does not substantially affect the determination of the final tags of the document to be associated. In this case, the feedback tags in the feedback information are obtained, that is, the specific modifications to the tag descriptions contained in the feedback information.
S22: Calculate the difference ratio between each feedback tag and the corresponding associated tag.
The difference ratio reflects the degree of difference between a feedback tag and its corresponding associated tag. It may be calculated by text comparison, for example with a string matching algorithm or a string similarity algorithm, and may be expressed as a percentage. There may be one or more feedback tags; if there are several, the difference ratio between each feedback tag and its corresponding associated tag is calculated separately.
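One way to realize the string-similarity option is Python's standard `difflib.SequenceMatcher`. Defining the difference ratio as 100 × (1 − similarity) and pairing feedback tags with associated tags by tag name are assumptions of this sketch, not details given in the source.

```python
from difflib import SequenceMatcher
from typing import Dict, Mapping


def difference_ratio(feedback_tag: str, associated_tag: str) -> float:
    """Difference as a percentage: 100 * (1 - SequenceMatcher similarity)."""
    similarity = SequenceMatcher(None, feedback_tag, associated_tag).ratio()
    return 100.0 * (1.0 - similarity)


def difference_ratios(
    feedback_tags: Mapping[str, str],
    associated_tags: Mapping[str, str],
) -> Dict[str, float]:
    """Compute one ratio per feedback tag, pairing tags by tag name."""
    return {
        name: difference_ratio(text, associated_tags[name])
        for name, text in feedback_tags.items()
        if name in associated_tags
    }


ratios = difference_ratios(
    {"law": "intellectual property law", "region": "Nanshan District"},
    {"law": "law", "region": "Nanshan District"},
)
print({name: round(value, 2) for name, value in ratios.items()})
# {'law': 78.57, 'region': 0.0}
```

`SequenceMatcher.ratio()` returns 2·M/T, where M is the number of matched characters and T the combined length, so identical strings give a difference of 0 %.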
S23: If the difference ratio does not exceed the preset ratio, adjust the corresponding tag in the associated information according to the feedback tag to obtain the adjustment information.
The preset ratio is a preset numerical value. If the difference ratio does not exceed it, the tag description is being changed only slightly, so it is sufficient to adjust the corresponding tag in the current associated information, which reduces the system load and improves processing efficiency. Specifically, the corresponding tag in the associated information can be located by the tag name in the feedback tag and modified accordingly to obtain the adjustment information. The adjustment information in this step is therefore the associated information after the tag descriptions of the corresponding tags have been modified. If there are several feedback tags whose difference ratios do not exceed the preset ratio, the corresponding tags are adjusted one by one to obtain the adjustment information.
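A sketch of the local adjustment in S23, assuming tags are represented as tag name → tag description, the difference ratio is computed with `difflib` as a percentage, and `preset_ratio` is expressed in percent. Feedback tags above the threshold are simply left untouched here, since they are handled by steps S23' and S24'.

```python
from difflib import SequenceMatcher
from typing import Dict, Mapping


def apply_formal_feedback(
    associated_tags: Mapping[str, str],
    feedback_tags: Mapping[str, str],
    preset_ratio: float,
) -> Dict[str, str]:
    """S23 sketch: return adjustment information (tag name -> tag description)."""
    adjusted = dict(associated_tags)  # the stored associated information is not mutated
    for name, new_description in feedback_tags.items():
        if name not in adjusted:
            continue  # no corresponding tag to adjust
        similarity = SequenceMatcher(None, new_description, adjusted[name]).ratio()
        if 100.0 * (1.0 - similarity) <= preset_ratio:
            adjusted[name] = new_description  # small change: adjust locally, one tag at a time
    return adjusted


current = {"law": "statute cited in judgment", "region": "court location"}
feedback = {"law": "statutes cited in judgment", "region": "x"}
print(apply_formal_feedback(current, feedback, preset_ratio=20.0))
# {'law': 'statutes cited in judgment', 'region': 'court location'}
```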
S24: Determine the associated document of the document to be associated from the candidate documents according to the adjustment information.
After the adjustment information is obtained, the associated document of the document to be associated is determined from the candidate documents according to the adjustment information. This step is the same as step S40 and is not described again here.
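The details of this step are deferred to step S40. As a stand-in, the sketch below scores each candidate document by the Jaccard overlap between the adjustment tags and the candidate's characteristic tags, then keeps the best-scoring candidates above a threshold. Jaccard scoring, `min_degree`, `top_k`, and the document identifiers are assumptions, not the patented association-degree calculation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_associated_documents(
    adjustment_tags: Iterable[str],
    candidates: Dict[str, Set[str]],
    min_degree: float = 0.2,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return (document id, association degree) pairs, best first."""
    query = set(adjustment_tags)
    scored = [(doc_id, jaccard(query, tags)) for doc_id, tags in candidates.items()]
    scored = [item for item in scored if item[1] >= min_degree]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by document id
    return scored[:top_k]


candidates = {
    "case-001": {"patent law", "Nanshan District"},
    "case-002": {"contract law"},
    "case-003": {"patent law"},
}
ranking = rank_associated_documents({"Zhang San", "patent law", "Nanshan District"}, candidates)
print([doc_id for doc_id, _ in ranking])  # ['case-001', 'case-003']
```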
In a specific embodiment, if the information type of the feedback information is the formal type, the feedback tag may further include a tag type, which indicates whether the modification of a tag description applies only to the current document to be associated or to the descriptions of all instances of the same tag. Optionally, the tag types include a local type and a global type: the local type indicates that only the tag description in the current document to be associated needs to be modified, while the global type indicates that the tag descriptions of all instances of the same tag need to be modified. The tag type may be represented by text, symbols, letters, or numbers.
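The local/global distinction can be sketched as a choice of which documents receive the new tag description. The string encodings `"local"` and `"global"`, the nested-dictionary document store, and the rule that only documents already carrying the tag are updated are all hypothetical choices for illustration.

```python
import copy
from typing import Dict

LOCAL = "local"    # hypothetical encoding of the local tag type
GLOBAL = "global"  # hypothetical encoding of the global tag type


def apply_tag_description(
    documents: Dict[str, Dict[str, str]],
    current_doc: str,
    tag: str,
    description: str,
    tag_type: str,
) -> Dict[str, Dict[str, str]]:
    """Return a copy of documents (doc id -> tag -> description) with the change applied."""
    updated = copy.deepcopy(documents)
    if tag_type == LOCAL:
        targets = [current_doc]
    elif tag_type == GLOBAL:
        targets = list(updated)
    else:
        raise ValueError(f"unknown tag type: {tag_type!r}")
    for doc_id in targets:
        if tag in updated.get(doc_id, {}):  # only documents that already carry the tag
            updated[doc_id][tag] = description
    return updated


docs = {"d1": {"law": "old"}, "d2": {"law": "old"}, "d3": {"region": "x"}}
print(apply_tag_description(docs, "d1", "law", "new", LOCAL))
print(apply_tag_description(docs, "d1", "law", "new", GLOBAL))
```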
In this embodiment, if the information type of the feedback information is the formal type, the feedback tags in the feedback information are obtained; the difference ratio between each feedback tag and the corresponding associated tag is calculated; if the difference ratio does not exceed the preset ratio, the corresponding tag in the associated information is adjusted according to the feedback tag to obtain the adjustment information; and finally, the associated document of the document to be associated is determined from the candidate documents according to the adjustment information. When the difference ratio is judged to be small, only the associated information is adjusted, which avoids burdening the system and improves processing efficiency.
In an embodiment, as shown in fig. 6, after the difference ratio between the feedback tag and the corresponding associated tag is calculated, the associated document determining method further includes:
S23': If the difference ratio exceeds the preset ratio, determine each feedback tag whose difference ratio exceeds the preset ratio as an adjustment tag.
If the difference ratio exceeds the preset ratio, the tag description is being changed substantially, so the corresponding tags already stored in the document database may also be modified to make subsequent processing more accurate. In this case, each feedback tag whose difference ratio exceeds the preset ratio is obtained and determined as an adjustment tag.
S24': Adjust the corresponding tags in the document database according to the adjustment tags.
That is, the tag descriptions of the tags in the document database that have the same tag number as the feedback tag are modified accordingly so that they remain consistent with it.
In this embodiment, if the difference ratio exceeds the preset ratio, each feedback tag whose difference ratio exceeds the preset ratio is determined as an adjustment tag, and the corresponding tags in the document database are adjusted according to the adjustment tags, which ensures the accuracy of the tag data.
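Steps S23' and S24' can be sketched against a relational store. This example uses Python's standard `sqlite3` module with an in-memory database and a hypothetical table `document_tags(doc_id, tag_number, description)`; tags are keyed by tag number as the text describes, and the difference ratio is again computed as a `difflib` percentage, which is an assumption.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import List, Mapping


def escalate_feedback(
    conn: sqlite3.Connection,
    feedback_tags: Mapping[str, str],
    associated_tags: Mapping[str, str],
    preset_ratio: float,
) -> List[str]:
    """S23' + S24' sketch: update the database for tags whose difference ratio exceeds preset_ratio."""
    adjustment_tags: List[str] = []
    for tag_number, new_description in feedback_tags.items():
        old_description = associated_tags.get(tag_number, "")
        similarity = SequenceMatcher(None, new_description, old_description).ratio()
        if 100.0 * (1.0 - similarity) > preset_ratio:
            adjustment_tags.append(tag_number)
            # Every stored tag with the same tag number receives the new description.
            conn.execute(
                "UPDATE document_tags SET description = ? WHERE tag_number = ?",
                (new_description, tag_number),
            )
    conn.commit()
    return adjustment_tags


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_tags (doc_id TEXT, tag_number TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO document_tags VALUES (?, ?, ?)",
    [("d1", "T01", "law"), ("d2", "T01", "law"), ("d2", "T02", "Nanshan District")],
)
changed = escalate_feedback(
    conn,
    {"T01": "intellectual property law", "T02": "Nanshan District, Shenzhen"},
    {"T01": "law", "T02": "Nanshan District"},
    preset_ratio=30.0,
)
print(changed)  # ['T01']
```

Using a parameterized `UPDATE` keyed on the tag number keeps every document that shares the tag consistent in a single statement.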
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
In an embodiment, an associated document determining apparatus is provided that corresponds one-to-one to the associated document determining method in the above embodiments. As shown in fig. 7, the associated document determining apparatus includes an associated information acquisition module 10, a feedback information acquisition module 20, an adjustment information acquisition module 30, and a first associated document determining module 40. The functional modules are described in detail as follows:
the associated information acquisition module 10 is used for acquiring a document to be associated, and extracting information from the document to be associated to obtain associated information of the document to be associated;
the feedback information obtaining module 20 is configured to send the association information to a client, and obtain feedback information returned by the client, where the feedback information includes an information type;
the adjustment information acquisition module 30 is configured to determine the information type of the feedback information and, if the information type is the substantive type, re-perform information extraction processing on the document to be associated according to the feedback information to obtain the adjustment information of the document to be associated;
the first associated document determining module 40 is configured to determine, according to the adjustment information, the associated document of the document to be associated from the candidate documents.
Preferably, as shown in fig. 8, the associated information acquisition module 10 includes an initial keyword acquisition unit 11, a data cleaning unit 12, a weight value calculation unit 13, an associated vocabulary determining unit 14, an associated tag matching unit 15, and an associated information composing unit 16.
An initial keyword obtaining unit 11, configured to perform word segmentation on the document to be associated to obtain an initial keyword;
a data cleaning unit 12, configured to perform data cleaning on the initial keyword to obtain a target keyword;
a weight value calculation unit 13, configured to calculate a weight value of each target keyword using the term frequency-inverse document frequency (TF-IDF) algorithm;
an associated vocabulary determining unit 14, configured to determine an associated vocabulary from the target keywords according to the weight value;
an associated tag matching unit 15, configured to match the associated vocabulary using a preset associated tag library to obtain the associated tags of the document to be associated;
and an associated information composing unit 16, configured to combine the associated vocabulary and the associated tags into the associated information.
Preferably, as shown in fig. 9, the feedback information includes adjustment vocabulary information and adjustment tag information; the adjustment information acquisition module 30 includes a target vocabulary acquisition unit 31, an associated tag library adjustment unit 32, an adjustment tag acquisition unit 33, and an adjustment information composition unit 34.
A target vocabulary acquiring unit 31, configured to adjust the associated vocabulary in the associated information by using the adjustment vocabulary information, so as to obtain a target vocabulary;
an associated tag library adjustment unit 32, configured to adjust the preset associated tag library by using the adjustment tag information, so as to obtain an adjusted associated tag library;
an adjustment tag obtaining unit 33, configured to match the target vocabulary with an adjusted association tag library, so as to obtain an adjustment tag of the document to be associated;
an adjustment information composing unit 34, configured to compose the target vocabulary and the adjustment tag into the adjustment information.
Preferably, the associated document determining apparatus further includes a feedback tag acquisition module, a difference ratio calculation module, an adjustment information acquisition module, and a second associated document determining module.
The feedback tag acquisition module is configured to obtain the feedback tags in the feedback information when the information type of the feedback information is the formal type.
The difference ratio calculation module is configured to calculate the difference ratio between each feedback tag and the corresponding associated tag.
The adjustment information acquisition module is configured to adjust the corresponding tag in the associated information according to the feedback tag when the difference ratio does not exceed the preset ratio, so as to obtain the adjustment information.
The second associated document determining module is configured to determine the associated document of the document to be associated from the candidate documents according to the adjustment information.
Preferably, the associated document determining apparatus is further configured to: when the difference ratio exceeds the preset ratio, determine each feedback tag whose difference ratio exceeds the preset ratio as an adjustment tag; and adjust the corresponding tags in the document database according to the adjustment tags.
For specific limitations of the associated document determining apparatus, reference may be made to the limitations of the associated document determining method above, which are not repeated here. Each module in the above associated document determining apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device stores the data used in the associated document determining method of the above embodiments. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements an associated document determining method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the associated document determining method of the above embodiments.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the associated document determination method in the above embodiments.
Those skilled in the art will appreciate that all or part of the processes of the above methods may be implemented by a computer program stored in a non-volatile computer-readable storage medium; when executed, the program may perform the steps of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division into functional units and modules is used only as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (9)

1. A method for determining associated documents, comprising:
acquiring a document to be associated, and carrying out information extraction processing on the document to be associated to obtain association information of the document to be associated;
the associated information is sent to a client, and feedback information returned by the client is obtained, wherein the feedback information comprises information types;
judging the information type of the feedback information, and if the information type is a substance type, carrying out information extraction processing on the document to be associated again according to the feedback information to obtain adjustment information of the document to be associated;
according to the adjustment information, determining the associated document of the document to be associated from the candidate documents;
after the feedback information returned by the client is obtained, the associated document determining method further comprises the following steps:
if the information type of the feedback information is a form type, acquiring a feedback label in the feedback information;
calculating the difference proportion of the feedback label and the corresponding associated label;
if the difference proportion does not exceed the preset proportion, corresponding labels in the associated information are adjusted according to the feedback labels, and adjustment information is obtained;
and determining the associated document of the document to be associated from the candidate documents according to the adjustment information.
2. The method for determining associated documents according to claim 1, wherein the information extracting process is performed on the documents to be associated to obtain the associated information of the documents to be associated, and the method comprises the following steps:
performing word segmentation on the documents to be associated to obtain initial keywords;
data cleaning is carried out on the initial keywords to obtain target keywords;
calculating the weight value of each target keyword by adopting a term frequency-inverse document frequency algorithm;
determining associated vocabulary from the target keywords according to the weight value;
matching the associated vocabulary by adopting a preset associated tag library to obtain associated tags of the documents to be associated;
and combining the associated vocabulary and the associated tag into the associated information.
3. The associated document determining method according to claim 2, wherein the feedback information includes adjustment vocabulary information and adjustment tag information;
and re-carrying out information extraction processing on the document to be associated according to the feedback information to obtain adjustment information of the document to be associated, wherein the adjustment information comprises the following steps:
adjusting the associated vocabulary in the associated information by adopting the adjustment vocabulary information to obtain a target vocabulary;
adjusting the preset associated tag library by adopting the adjustment tag information to obtain an adjusted associated tag library;
matching the target vocabulary by adopting the adjusted association tag library to obtain an adjustment tag of the document to be associated;
and forming the target vocabulary and the adjustment label into the adjustment information.
4. The associated document determining method according to claim 1, wherein after the calculating of the difference ratio of the feedback tag and the corresponding associated tag, the associated document determining method further comprises:
if the difference proportion exceeds the preset proportion, determining a feedback label in feedback information of which the difference proportion exceeds the preset proportion as an adjustment label;
and adjusting the corresponding label in the document database according to the adjustment label.
5. The associated document determining method according to claim 2, wherein after the associated vocabulary is determined from the target keywords according to the weight value, the associated document determining method further comprises:
determining a target weight value for the associated vocabulary by adopting the following formula:
wherein W_i is the target weight value, (TF-IDF)_i is the weight value of the i-th associated vocabulary item, n is the number of associated vocabulary items, and i is a positive integer.
6. The associated document determining method according to claim 1, wherein the adjustment information includes an adjustment tag;
and determining the associated document of the document to be associated from the candidate documents according to the adjustment information, wherein the method comprises the following steps:
matching the adjustment label with the characteristic label of each candidate document, and determining the association degree of the document to be associated with each candidate document;
and determining the associated document of the document to be associated from the candidate documents according to the association degree.
7. An associated document determining apparatus, comprising:
the associated information acquisition module is used for acquiring the document to be associated, and carrying out information extraction processing on the document to be associated to obtain associated information of the document to be associated;
the feedback information acquisition module is used for sending the associated information to the client and acquiring feedback information returned by the client, wherein the feedback information comprises an information type;
the adjustment information acquisition module is used for judging the information type of the feedback information, and if the information type is a substance type, carrying out information extraction processing on the document to be associated again according to the feedback information to obtain adjustment information of the document to be associated;
the associated document determining module is used for determining the associated document of the document to be associated from the candidate documents according to the adjustment information;
the associated document determining apparatus further includes:
the feedback tag acquisition module is used for acquiring a feedback tag in the feedback information when the information type of the feedback information is a form type;
the difference proportion calculation module is used for calculating the difference proportion of the feedback label and the corresponding associated label;
the adjustment information acquisition module is further configured to adjust a corresponding tag in the associated information according to the feedback tag when the difference proportion does not exceed a preset proportion, so as to obtain adjustment information;
and the second associated document determining module is used for determining the associated document of the document to be associated from the candidate documents according to the adjustment information.
8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the associated document determination method according to any one of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the associated document determination method according to any one of claims 1 to 6.
CN201910825858.4A 2019-09-03 2019-09-03 Associated document determining method, device, computer equipment and storage medium Active CN110659347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910825858.4A CN110659347B (en) 2019-09-03 2019-09-03 Associated document determining method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110659347A CN110659347A (en) 2020-01-07
CN110659347B true CN110659347B (en) 2023-08-18

Family

ID=69036659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910825858.4A Active CN110659347B (en) 2019-09-03 2019-09-03 Associated document determining method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110659347B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507079B (en) * 2020-12-15 2023-01-17 科大讯飞股份有限公司 Document case situation matching method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
JP2003216646A (en) * 2002-01-21 2003-07-31 Ricoh Co Ltd Document retrieval device, method and program, and recording media recording the same
CN108170691A (en) * 2016-12-07 2018-06-15 北京国双科技有限公司 It is associated with the determining method and apparatus of document
CN110019672A (en) * 2017-11-09 2019-07-16 北京国双科技有限公司 A kind of method for pushing of similar case, system, storage medium and processor
CN110083823A (en) * 2019-03-07 2019-08-02 平安科技(深圳)有限公司 Dictionary sheet method for building up and device, computer installation and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10891421B2 (en) * 2016-04-05 2021-01-12 Refinitiv Us Organization Llc Apparatuses, methods and systems for adjusting tagging in a computing environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant