CN115329756B - Execution body extraction method and device, storage medium and electronic equipment - Google Patents


Publication number
CN115329756B
Authority
CN
China
Prior art keywords
text
target
clause
sample
bulletin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211047583.4A
Other languages
Chinese (zh)
Other versions
CN115329756A (en
Inventor
张美�
曲悠杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Tianyanchawei Technology Co ltd
Original Assignee
Yancheng Tianyanchawei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202111229601.6A (CN114048736A)
Application filed by Yancheng Tianyanchawei Technology Co ltd
Publication of CN115329756A
Application granted
Publication of CN115329756B


Abstract

The disclosure relates to a method and an apparatus for extracting an execution subject, a storage medium, and an electronic device, in the technical field of electronic information. The method comprises: acquiring a bulletin text to be processed; extracting a plurality of clauses included in the bulletin text; taking a clause, among the plurality of clauses, that includes subject information as a target clause, and performing preset processing on the target clause to obtain a target text, wherein the target text does not include the subject information; inputting the target text into a pre-trained recognition model to obtain an association result, output by the recognition model, corresponding to the target text; and if the association result indicates that the target text is associated, determining an execution subject of the bulletin text according to the subject information included in the target text. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.

Description

Execution body extraction method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of electronic information technology, and in particular, to a method and apparatus for extracting an execution body, a storage medium, and an electronic device.
Background
A judicial auction website periodically publishes judicial auction bulletins. Some of these bulletins are not linked to the subject company involved in the auction (e.g., the owner of the collateral or of the auctioned property), which makes it difficult to associate the bulletins with that company. There is therefore a need to identify the subject companies in judicial auction bulletins automatically so that users can query them.
At present, subject companies are extracted from judicial auction bulletins in two main ways. The first treats every company named in a bulletin as a subject company, which yields low accuracy. The second extracts subject companies according to pre-established extraction rules; such rules are time-consuming and labor-intensive to establish and can hardly cover every scenario, which results in low recall.
Disclosure of Invention
The purpose of the present disclosure is to provide a method and an apparatus for extracting an execution subject, a storage medium, and an electronic device, so as to improve the accuracy and recall of extracting the execution subject from a bulletin text.
According to a first aspect of embodiments of the present disclosure, there is provided a method of extracting an execution subject, the method including: acquiring a bulletin text to be processed; extracting a plurality of clauses included in the bulletin text; taking a clause, among the plurality of clauses, that includes subject information as a target clause, and performing preset processing on the target clause to obtain a target text, wherein the target text does not include the subject information; inputting the target text into a pre-trained recognition model to obtain an association result, output by the recognition model, corresponding to the target text; and if the association result indicates that the target text is associated, determining an execution subject of the bulletin text according to the subject information included in the target text.
Optionally, the extracting a plurality of clauses included in the bulletin text includes: deleting a designated symbol in the bulletin text to obtain an initial bulletin text, wherein the designated symbol is determined according to the type of the bulletin text; dividing the initial bulletin text according to preset separators to obtain a plurality of clauses.
Optionally, the taking a clause, among the plurality of clauses, that includes subject information as a target clause includes: comparing each clause with a pre-established subject information set, and taking the clause as the target clause if the clause matches the subject information set, wherein the subject information set includes a plurality of pieces of subject information; or performing semantic recognition on each clause to determine whether the clause includes subject information, and taking the clause as the target clause if the clause includes subject information.
Optionally, the performing preset processing on the target clause to obtain a target text includes: for each target clause, deleting invalid words in the target clause to obtain an initial text corresponding to the target clause; performing de-duplication processing on the plurality of initial texts to obtain at least one intermediate text; and deleting the subject information included in the intermediate text to obtain the target text.
Optionally, the recognition model is trained by: acquiring a plurality of sample bulletin texts, and determining a plurality of sample target texts according to the plurality of sample bulletin texts; taking each sample target text as a sample input to obtain a sample input set including a plurality of sample inputs; obtaining a sample output set, wherein the sample output set includes a sample output corresponding to each sample input, and each sample output includes the true association result of the corresponding sample target text; and taking the sample input set as the input of the recognition model and the sample output set as the output of the recognition model to train the recognition model.
Optionally, the determining a plurality of sample target texts according to the plurality of sample bulletin texts includes: extracting a plurality of sample clauses included in each of the sample bulletin texts; and taking a sample clause, among the plurality of sample clauses, that includes subject information as a sample target clause, and performing the preset processing on the sample target clause to obtain the sample target text, wherein the sample target text does not include the subject information.
Optionally, the method further comprises: associating the bulletin text with the execution subject; and outputting the bulletin text in response to a query instruction for the execution subject.
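The association and query steps just described can be pictured with a small in-memory index. This is only a sketch under stated assumptions: the class name `BulletinIndex` and its methods are hypothetical, and the disclosure does not say how the association is stored or how a query instruction is delivered.

```python
from collections import defaultdict


class BulletinIndex:
    """Hypothetical in-memory store linking execution subjects to bulletin texts."""

    def __init__(self):
        # Maps an execution subject to the bulletin texts associated with it.
        self._by_subject = defaultdict(list)

    def associate(self, bulletin_text, execution_subject):
        # Record the association once, even if it is reported repeatedly.
        bulletins = self._by_subject[execution_subject]
        if bulletin_text not in bulletins:
            bulletins.append(bulletin_text)

    def query(self, execution_subject):
        # Respond to a query for an execution subject with its bulletin texts.
        return list(self._by_subject.get(execution_subject, []))
```

A caller would invoke `associate` once per execution subject determined for a bulletin text and `query` when a user asks for the bulletins of a given subject.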
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for extracting an execution subject, the apparatus including: an acquisition module, configured to acquire a bulletin text to be processed; an extraction module, configured to extract a plurality of clauses included in the bulletin text; a processing module, configured to take a clause, among the plurality of clauses, that includes subject information as a target clause, and to perform preset processing on the target clause to obtain a target text, wherein the target text does not include the subject information; a first determining module, configured to input the target text into a pre-trained recognition model to obtain an association result, output by the recognition model, corresponding to the target text; and a second determining module, configured to determine an execution subject of the bulletin text according to the subject information included in the target text if the association result indicates that the target text is associated.
Optionally, the extraction module includes: a first deleting sub-module, configured to delete a designated symbol in the bulletin text to obtain an initial bulletin text, wherein the designated symbol is determined according to the type of the bulletin text; and a dividing sub-module, configured to divide the initial bulletin text according to preset separators to obtain the plurality of clauses.
Optionally, the processing module is configured to compare each clause with a pre-established subject information set, and to take the clause as the target clause if the clause matches the subject information set, wherein the subject information set includes a plurality of pieces of subject information; or to perform semantic recognition on each clause to determine whether the clause includes subject information, and to take the clause as the target clause if the clause includes subject information.
Optionally, the processing module includes: a second deleting sub-module, configured to delete, for each target clause, invalid words in the target clause to obtain an initial text corresponding to the target clause; a de-duplication sub-module, configured to perform de-duplication processing on the plurality of initial texts to obtain at least one intermediate text; and a third deleting sub-module, configured to delete the subject information included in the intermediate text to obtain the target text.
Optionally, the recognition model is trained by the following modules: a sample acquisition module, configured to acquire a plurality of sample bulletin texts and to determine a plurality of sample target texts according to the plurality of sample bulletin texts; a third determining module, configured to take each sample target text as a sample input to obtain a sample input set including a plurality of sample inputs; an output set acquisition module, configured to acquire a sample output set, wherein the sample output set includes a sample output corresponding to each sample input, and each sample output includes the true association result of the corresponding sample target text; and a training module, configured to take the sample input set as the input of the recognition model and the sample output set as the output of the recognition model to train the recognition model.
Optionally, the sample acquisition module includes: an extraction sub-module, configured to extract a plurality of sample clauses included in each of the sample bulletin texts; and a processing sub-module, configured to take a sample clause, among the plurality of sample clauses, that includes subject information as a sample target clause, and to perform the preset processing on the sample target clause to obtain the sample target text, wherein the sample target text does not include the subject information.
Optionally, the apparatus further comprises: the association module is used for associating the bulletin text with the execution subject; and the output module is used for responding to the query instruction aiming at the execution subject and outputting the bulletin text.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects of embodiments of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method according to any one of the first aspect of the embodiments of the present disclosure.
With the above technical solution, a bulletin text to be processed is first acquired; a plurality of clauses included in the bulletin text are extracted; a clause, among the plurality of clauses, that includes subject information is taken as a target clause, and preset processing is performed on the target clause to obtain a target text, wherein the target text does not include the subject information; the target text is input into a pre-trained recognition model to obtain an association result, output by the recognition model, corresponding to the target text; and if the association result indicates that the target text is associated, an execution subject of the bulletin text is determined according to the subject information included in the target text. In other words, the acquired bulletin text is split and subjected to preset processing to obtain the target text, the target text is input into the recognition model to obtain its association result, and the execution subject of the bulletin text is determined according to that association result. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1 is a flowchart illustrating a method of extracting an execution subject according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating another method of extracting an execution subject according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating another method of extracting an execution subject according to an exemplary embodiment;
FIG. 4 is a flowchart illustrating a training method for a recognition model according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating another training method for a recognition model according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating another method of extracting an execution subject according to an exemplary embodiment;
FIG. 7 is a schematic structural diagram of an apparatus for extracting an execution subject according to an exemplary embodiment;
FIG. 8 is a schematic structural diagram of another apparatus for extracting an execution subject according to an exemplary embodiment;
FIG. 9 is a schematic structural diagram of another apparatus for extracting an execution subject according to an exemplary embodiment;
FIG. 10 is a schematic structural diagram of a training apparatus for a recognition model according to an exemplary embodiment;
FIG. 11 is a schematic structural diagram of another training apparatus for a recognition model according to an exemplary embodiment;
FIG. 12 is a schematic structural diagram of another apparatus for extracting an execution subject according to an exemplary embodiment;
FIG. 13 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
In the following description, the words "first," "second," and the like are used merely for distinguishing between the descriptions and not for indicating or implying a relative importance or order.
First, an application scenario of the present disclosure is described. The present disclosure may be applied to scenarios in which an execution subject is extracted from a bulletin text. A judicial auction website periodically publishes judicial auction bulletins. Some of these bulletins are not linked to the subject company involved in the auction (e.g., the owner of the collateral or of the auctioned property), which makes it difficult to associate the bulletins with that company. There is therefore a need to identify the subject companies in judicial auction bulletins automatically so that users can query them. At present, subject companies are extracted from judicial auction bulletins in two main ways: the first treats every company named in a bulletin as a subject company; the second extracts subject companies according to pre-established extraction rules. The inventors found that the first way has low accuracy because it introduces many companies that are not subject companies, and that in the second way the extraction rules are time-consuming and labor-intensive to establish and can hardly cover every scenario, which results in low recall.
To solve the above problems, the present disclosure provides a method and an apparatus for extracting an execution subject, a storage medium, and an electronic device. The acquired bulletin text to be processed is split and subjected to preset processing to obtain a target text, the target text is input into a recognition model to obtain the association result corresponding to the target text, and the execution subject of the bulletin text is determined according to the association result. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.
The present disclosure is described below in connection with specific embodiments.
FIG. 1 is a flowchart illustrating a method of extracting an execution subject according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
Step S101, acquiring a bulletin text to be processed.
For example, the bulletin text to be processed may be obtained from the web page on which the bulletin is published, and the kind of text obtained depends on the page type. If the page is an HTML (HyperText Markup Language) page, HTML text may be obtained from the page as the bulletin text to be processed; if the page is an XML (Extensible Markup Language) page, XML text may be obtained from the page as the bulletin text to be processed. The bulletin text may be any kind of bulletin, for example a judicial auction bulletin or a judicial judgment bulletin.
Step S102, extracting a plurality of clauses included in the bulletin text.
Because the bulletin text acquired in step S101 may contain redundant information that hinders understanding of the text content, the bulletin text may be preliminarily processed by extracting the plurality of clauses it includes. For example, the redundant information may include code symbols and separators (e.g., commas, semicolons, line breaks) in the bulletin text; accordingly, the clauses extracted from the bulletin text do not contain this redundant information.
Step S103, taking a clause, among the plurality of clauses, that includes subject information as a target clause, and performing preset processing on the target clause to obtain a target text.
The target text obtained after the preset processing does not include the subject information.
For example, some of the plurality of clauses may not include subject information. Such clauses are invalid information for extracting the execution subject, so to improve data processing efficiency the clauses may be screened according to whether they include subject information, yielding at least one target clause that includes subject information. Subject information can be understood as information about an execution subject. For example, the execution subject may be a company, a legal person, a stakeholder, etc., and the corresponding subject information may include the name of the execution subject (e.g., a company name or the name of a legal person) and may further include information associated with the execution subject (e.g., the legal person or stakeholders of a company). Screening the clauses in this way avoids missing execution subjects in the bulletin text while avoiding the processing of invalid information, which can improve the recall of execution subject extraction.
Furthermore, the target clauses obtained by screening may still contain invalid words or repeated clauses, so preset processing may be performed on the target clauses to further improve data processing efficiency. In addition, the subject information contained in different target clauses may differ, and whether a target clause is associated with the execution subject does not depend on which particular subject information it contains. In this embodiment, the subject information in each target clause may therefore also be deleted to obtain the target text; that is, the target text includes neither invalid words nor subject information.
Step S104, inputting the target text into a pre-trained recognition model to obtain an association result, output by the recognition model, corresponding to the target text.
The recognition model can be understood as a model that has been trained in advance on a large number of samples and that classifies the target text to determine its association result. The recognition model matches the target text against a plurality of pre-specified association results to determine the matching degree between the target text and each of them. The pre-specified association results may be of two kinds, associated and non-associated. Associated means that the target text is related to the execution subject, that is, the subject information corresponding to the target text indicates the execution subject. Non-associated means that the target text is unrelated to the execution subject, that is, the subject information corresponding to the target text does not indicate the execution subject. The recognition model determines the association result from the matching degrees: if the target text has the higher matching degree with associated, its association result is associated; if it has the higher matching degree with non-associated, its association result is non-associated. The samples used to train the recognition model may include a plurality of positive samples (whose association result is associated) and a plurality of negative samples (whose association result is non-associated). The recognition model may be, for example, a binary classification network or a CNN (Convolutional Neural Network); the present disclosure does not specifically limit its structure.
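To make the idea of choosing between associated and non-associated by matching degree concrete, the sketch below trains a toy two-class text classifier on positive and negative samples. It is a stand-in, not the recognition model of the disclosure: it swaps in a word-count Naive Bayes classifier where the text mentions a binary classification network or a CNN, it assumes whitespace-tokenized text and at least one sample per class, and the class name `TinyNaiveBayes`, the label strings, and the `[MASK]` placeholder are all hypothetical.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Toy two-class classifier used as a stand-in for the recognition model."""

    LABELS = ("associated", "non-associated")

    def fit(self, texts, labels):
        # Count tokens per label; both labels are assumed to occur in the samples.
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = set().union(*self.counts.values())
        return self

    def scores(self, text):
        # Log-probability per label plays the role of the "matching degree".
        total_docs = sum(self.docs.values())
        result = {}
        for label in self.LABELS:
            total = sum(self.counts[label].values())
            logp = math.log(self.docs[label] / total_docs)
            for token in text.split():
                # Laplace smoothing keeps unseen tokens from zeroing the score.
                logp += math.log(
                    (self.counts[label][token] + 1) / (total + len(self.vocab))
                )
            result[label] = logp
        return result

    def predict(self, text):
        # The association result is the label with the higher matching degree.
        s = self.scores(text)
        return max(s, key=s.get)
```

Trained on a handful of masked target texts labeled associated or non-associated, `predict` returns whichever label scores higher for a new target text.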
For example, taking a judicial auction bulletin as the bulletin text: if the content of the target text is "bankruptcy administrator" (the subject information corresponding to the target text is A) and the association result of the target text is associated, then company A is an execution subject of the judicial auction bulletin text; if the content of the target text is "auction institution" (the subject information corresponding to the target text is B) and the association result of the target text is non-associated, then company B is not an execution subject of the judicial auction bulletin text.
It should be noted that the at least one target text obtained in step S103 may be input into the pre-trained recognition model one after another to obtain the association result corresponding to each target text. Because a bulletin text may include one or more execution subjects, the execution subjects of the bulletin text may be determined from all of the association results once they have been obtained for every target text, so that no execution subject in the bulletin text is missed.
Step S105, if the association result indicates that the target text is associated, determining an execution subject of the bulletin text according to the subject information included in the target text.
For example, if the association result output by the recognition model indicates that the target text is associated, the target clause from which the target text was derived contains the subject information of an execution subject of the bulletin text. The execution subject of the bulletin text may then be determined according to the subject information that was deleted from the target text. The subject information corresponding to each target text may, for example, be recorded by a preset page script. The execution subject may be the name of the execution subject company, or information associated with that company, such as its legal person or stakeholders. Determining the execution subject according to the association result that the recognition model outputs for the target text can effectively improve the accuracy of execution subject extraction.
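Steps S104 and S105 together can be sketched as a loop over the target texts that keeps the recorded subject information of every target text classified as associated. The function name `execution_subjects`, the `(target_text, subject_infos)` record layout, and the `classify` callable are hypothetical; the disclosure only states that the deleted subject information is recorded and later used.

```python
def execution_subjects(target_records, classify):
    """Collect the subject information of every target text judged associated.

    target_records: iterable of (target_text, subject_infos) pairs, where
        subject_infos is the subject information deleted from that text.
    classify: callable returning "associated" or "non-associated".
    """
    subjects = []
    for target_text, subject_infos in target_records:
        if classify(target_text) != "associated":
            continue
        for info in subject_infos:
            # A bulletin may name the same subject in several clauses.
            if info not in subjects:
                subjects.append(info)
    return subjects
```

Because every target text is classified before the result is returned, a bulletin with several execution subjects yields all of them.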
The embodiment is described in detail below, taking a judicial auction bulletin as the bulletin text. First, the bulletin text to be processed may be obtained from the web page on which the judicial auction bulletin is published, and the content of the bulletin is obtained after deleting the code symbols in the bulletin text. For example, the content of the bulletin is: "storefront 255 on basement level one of the underground mall in XX photo city, XX County, in the name of executed person C, use: commercial, the floor height of the underground mall storefront is about 5.8 meters, the two above-ground floors are supporting rooms for amusement park management". Next, the bulletin text is split by separators into a plurality of clauses, namely "storefront 255 on basement level one of the underground mall in XX photo city, XX County, in the name of executed person C", "use: commercial", "the floor height of the underground mall storefront is about 5.8 meters" and "the two above-ground floors are supporting rooms for amusement park management". The clauses are then screened according to whether they include subject information, which yields the target clause "storefront 255 on basement level one of the underground mall in XX photo city, XX County, in the name of executed person C". Preset processing is performed on the target clause to obtain the target text "storefront on basement level one of the underground mall in XX photo city, XX County, in the name of the executed person". The target text is then input into the recognition model, and the association result obtained for it is associated. Finally, the execution subject of the bulletin text is determined to be "C" according to the subject information corresponding to the target text.
With the above method, the acquired bulletin text to be processed is split and subjected to preset processing to obtain the target text, the target text is input into the recognition model to obtain the association result corresponding to the target text, and the execution subject of the bulletin text is determined according to the association result. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.
FIG. 2 is a flowchart illustrating another method of extracting an execution subject according to an exemplary embodiment. As shown in FIG. 2, extracting the plurality of clauses included in the bulletin text in step S102 may be implemented by the following steps:
Step S1021, deleting the designated symbol in the bulletin text to obtain an initial bulletin text.
The designated symbol may be determined according to the type of the bulletin text.
For example, the designated symbol can be understood as a code symbol. If the bulletin text is HTML text, the designated symbol may include the HTML elements in the HTML text (i.e., the code from a start tag to an end tag); if the bulletin text is XML text, the designated symbol may be the XML elements in the XML text. The initial bulletin text obtained after deleting the designated symbol contains the entire content of the bulletin (including Chinese and English text and punctuation marks).
Step S1022, dividing the initial bulletin text according to preset separators to obtain the plurality of clauses.
For example, the initial bulletin text may be divided into a plurality of clauses according to preset separators. The separators may include commas, line breaks, question marks, semicolons, periods, exclamation marks, and other punctuation marks capable of separating clauses.
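Steps S1021 and S1022 can be sketched as follows for an HTML bulletin. This is a minimal illustration under stated assumptions: the function names and the separator set are hypothetical, the markup is dropped with the standard-library `html.parser` while the text between tags is kept, and the ASCII period is deliberately left out of the separators so that a decimal such as 5.8 is not split (the text lists periods among the separators, so this is a choice of the sketch, not of the disclosure).

```python
import re
from html.parser import HTMLParser

# Hypothetical separator set: commas, semicolons, full-width period,
# exclamation marks, question marks and line breaks.
SEPARATORS = r"[,，;；。!！?？\n]+"


class _TextOnly(HTMLParser):
    """Collects text nodes and drops every tag (the designated symbols)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(bulletin_text):
    # Step S1021: delete the designated symbols to get the initial bulletin text.
    parser = _TextOnly()
    parser.feed(bulletin_text)
    parser.close()
    return "".join(parser.parts)


def split_clauses(initial_text):
    # Step S1022: divide the initial bulletin text by the preset separators.
    return [c.strip() for c in re.split(SEPARATORS, initial_text) if c.strip()]
```

For an XML bulletin the same split would apply after removing the XML elements with an XML-aware parser instead.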
In step S103, taking a clause, among the plurality of clauses, that includes subject information as the target clause may be implemented in either of two ways in this embodiment:
In one possible implementation, each clause may be compared with a pre-established subject information set, and if the clause matches the subject information set, the clause is taken as the target clause.
The subject information set may be established by collecting a plurality of different pieces of subject information in advance. Each clause may be compared with the pre-established subject information set; if any piece of subject information in the set appears in the clause, the clause is determined to match the subject information set and is taken as a target clause.
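The set-matching implementation amounts to a substring check of every clause against the collected subject information. The sketch below assumes plain string containment is an adequate notion of "matches"; the function name `select_target_clauses` and the returned `(clause, hits)` pairs are hypothetical.

```python
def select_target_clauses(clauses, subject_info_set):
    """Keep the clauses that contain at least one known piece of subject information.

    Returns (clause, hits) pairs, where hits lists the matched subject
    information in sorted order so the result is deterministic.
    """
    targets = []
    for clause in clauses:
        hits = sorted(info for info in subject_info_set if info in clause)
        if hits:
            targets.append((clause, hits))
    return targets
```

Keeping the matched subject information next to each clause makes it available later, when the subject information is deleted from the text and then used to name the execution subject.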
In another possible implementation, semantic recognition may be performed on each clause to determine whether the clause includes subject information, and if the clause includes subject information, the clause is taken as the target clause.
For example, each clause may be input into a pre-trained semantic recognition model to obtain the subject information in the clause output by the semantic recognition model; if the clause includes subject information, the clause may be taken as a target clause. Alternatively, the clause may be recognized by a preset semantic recognition algorithm to obtain the subject information in the clause output by the algorithm; if the clause includes subject information, the clause may be taken as a target clause.
Fig. 3 is a flowchart illustrating another method for extracting an execution body according to an exemplary embodiment. As shown in Fig. 3, performing the preset processing on the target clause in step S103 to obtain the target text may be implemented by the following steps:
Step S1031, for each target clause, deleting invalid words in the target clause to obtain an initial text corresponding to that target clause.
Wherein the invalid words may include stop words, digits, and special symbols (e.g., delta). Deleting the invalid words in the target clause can effectively reduce noise in data processing and improve the efficiency of data processing.
Step S1032, performing de-duplication processing on the plurality of initial texts to obtain at least one intermediate text.
If the same initial text were processed repeatedly, the efficiency of data processing would be reduced. To avoid this, in this embodiment the plurality of initial texts may be subjected to deduplication processing. That is, if no two of the initial texts are identical, the number of intermediate texts equals the number of initial texts; if identical initial texts exist, the number of intermediate texts is smaller than the number of initial texts.
And step S1033, deleting the main body information included in the intermediate text to obtain the target text.
Further, after the intermediate text is obtained, the subject information included in the intermediate text may be deleted. In particular, the subject information may be deleted in a masked manner, for example, a mask may be used instead of the subject information appearing in the intermediate text.
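Steps S1031 to S1033 can be sketched together as follows. This is a minimal illustration under stated assumptions: the stop-word list, the `[MASK]` token, whitespace tokenization, and all function names are hypothetical, and the sketch assumes that subject names contain no digits, symbols, or stop words, so that deleting invalid words first does not damage them.

```python
import re

STOP_WORDS = {"the", "is", "a", "of", "to"}  # hypothetical stop-word list
MASK = "[MASK]"  # hypothetical mask token


def delete_invalid_words(clause):
    """S1031: remove digits and special symbols, then drop stop words."""
    cleaned = re.sub(r"\d|[^\w\s]", "", clause)
    tokens = [t for t in cleaned.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)


def mask_subjects(text, subjects):
    """S1033: replace each subject with the mask, longest names first."""
    for subject in sorted(subjects, key=len, reverse=True):
        text = text.replace(subject, MASK)
    return text


def preset_processing(target_clauses, subject_info_set):
    """Return (target_text, masked_subjects) pairs for the target clauses."""
    initial_texts = [delete_invalid_words(c) for c in target_clauses]
    # S1032: order-preserving deduplication.
    intermediate_texts = list(dict.fromkeys(initial_texts))
    results = []
    for text in intermediate_texts:
        subjects = [s for s in sorted(subject_info_set) if s in text]
        results.append((mask_subjects(text, subjects), subjects))
    return results
```

Keeping the masked subjects next to each target text is a design choice of this sketch: the target text itself no longer contains subject information, so the pairing preserves what is needed to name the execution subject once the recognition model reports an association.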
Fig. 4 is a flowchart illustrating a training method for a recognition model according to an exemplary embodiment. As shown in Fig. 4, the recognition model may be trained by the following steps:
Step S401, a plurality of sample bulletin texts are acquired, and a plurality of sample target texts are determined according to the plurality of sample bulletin texts.
A plurality of sample target texts may be determined from each sample bulletin text. Accordingly, the number of sample target texts determined from the plurality of sample bulletin texts may be greater than the number of sample bulletin texts. The method for determining the sample target text according to the sample bulletin text is the same as the method for determining the target text according to the bulletin text, and will not be repeated here.
Step S402, taking the sample target text as a sample input to obtain a sample input set comprising a plurality of sample inputs.
Step S403, a sample output set is obtained, where the sample output set includes a sample output corresponding to each sample input, and each sample output includes the true association result of the corresponding sample target text.
For example, in training a recognition model, a sample input set first needs to be acquired. The sample input set includes a plurality of sample inputs; each sample input can be a sample target text, and the sample target text can be determined according to the sample bulletin text. After the sample input set is obtained, a sample output set may be obtained. The sample output set includes a sample output corresponding to each sample input, and each sample output includes the true association result of the corresponding sample target text. The true association result is either association or non-association. The true association result of the sample target text corresponding to each sample output can be determined through a preset association rule.
By way of example, the preset association rule may include: if a sample target text contains a keyword that is not associated with the subject information, the true association result of the sample target text is non-association; if a sample target text contains a keyword that is associated with the subject information, the true association result of the sample target text is association. Taking a judicial auction bulletin as an example of the sample bulletin text, keywords not associated with subject information may include: auction institutions, auxiliary auction institutions, opening banks, places of deposit, manufacturers, regulations for XX, etc. extracted from institutions, provision of mortgage loan service for court auction houses, developers, court principals, institutions in address, specialized assessment institutions; keywords associated with the subject information may include: collateral owners, subject matter owners, case principals (e.g., executives, application executives), bankruptcy managers, and the like. For example, if the sample target text includes "auction institution", the sample target text contains a keyword that is not associated with the subject information, which indicates that its true association result is non-association; if the sample target text includes "collateral owner", the sample target text contains a keyword associated with the subject information, which indicates that its true association result is association.
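The preset association rule can be sketched as a keyword lookup that assigns a label to each sample target text. The keyword tuples below are a small subset chosen for illustration, and the function name `label_sample`, the 1/0 encoding, and the handling of two cases the text does not specify (both keyword kinds present, or neither present) are assumptions of this sketch.

```python
# Illustrative subsets of the keyword lists; real lists would be longer.
NON_ASSOCIATED_KEYWORDS = (
    "auction institution",
    "opening bank",
    "developer",
    "assessment institution",
)
ASSOCIATED_KEYWORDS = (
    "collateral owner",
    "subject matter owner",
    "bankruptcy manager",
)


def label_sample(sample_target_text):
    """Return 1 (association), 0 (non-association), or None if no rule fires.

    Assumption: non-association keywords are checked first, so they win
    when both kinds of keyword appear in the same text.
    """
    text = sample_target_text.lower()
    if any(k in text for k in NON_ASSOCIATED_KEYWORDS):
        return 0
    if any(k in text for k in ASSOCIATED_KEYWORDS):
        return 1
    return None


if __name__ == "__main__":
    for sample in ("auction institution [MASK]", "collateral owner [MASK]"):
        print(sample, "->", label_sample(sample))
```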
Step S404, taking the sample input set as the input of the recognition model, and taking the sample output set as the output of the recognition model to train the recognition model.
For example, when training the recognition model, the sample input set may be used as the input of the recognition model and the sample output set as its expected output, so that when the sample input set is input, the output of the recognition model matches the sample output set. For example, a loss may be determined from the output of the recognition model and the sample output set, and the neuron parameters in the recognition model, such as the weights and biases of the neurons, may be corrected by a back propagation algorithm with the aim of reducing the loss. These steps are repeated until the loss meets a preset condition, for example, until the loss is smaller than a preset loss threshold, at which point training of the recognition model is complete. In this embodiment, the structure of the recognition model may be, for example, a binary classification network, or may be a CNN, etc., which is not particularly limited in this disclosure.
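The training loop described above (compute a loss, adjust weights and bias to reduce it, stop once the loss falls below a threshold) can be illustrated with a deliberately small stand-in. The sketch below trains a single-neuron logistic classifier on bag-of-words features with plain gradient descent; it is not the binary classification network or CNN of the embodiment, and the feature scheme, learning rate, threshold, and function names are all assumptions.

```python
import math


def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if w in tokens else 0.0 for w in vocab]


def train(sample_inputs, sample_outputs, loss_threshold=0.1, lr=0.5, max_epochs=5000):
    """Gradient descent on cross-entropy loss until it drops below the threshold."""
    vocab = sorted({w for t in sample_inputs for w in t.lower().split()})
    xs = [featurize(t, vocab) for t in sample_inputs]
    weights, bias, loss = [0.0] * len(vocab), 0.0, float("inf")
    n = len(xs)
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * len(vocab), 0.0, 0.0
        for x, y in zip(xs, sample_outputs):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            loss -= (y * math.log(p) + (1 - y) * math.log(1 - p)) / n
            grad_b += (p - y) / n
            for j, xi in enumerate(x):
                grad_w[j] += (p - y) * xi / n
        if loss < loss_threshold:  # preset stopping condition
            break
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return vocab, weights, bias, loss


def predict(text, model):
    """Probability that the text is 'associated' under the trained model."""
    vocab, weights, bias, _ = model
    x = featurize(text, vocab)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

For a single neuron, back propagation reduces to the direct gradient computed in the inner loop; a multi-layer network would propagate the same error signal through each layer in turn.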
Fig. 5 is a flowchart of another training method for a recognition model according to an exemplary embodiment. As shown in Fig. 5, determining a plurality of sample target texts according to the plurality of sample bulletin texts in step S401 may be implemented by the following steps:
Step S4011, extracting a plurality of sample clauses included in each sample bulletin text.
For each sample bulletin text, an initial sample bulletin text may be obtained by deleting the specified symbols in the sample bulletin text, where the specified symbols may be determined according to the type of the sample bulletin text. The initial sample bulletin text obtained after deleting the specified symbols contains the entire content of the bulletin (including Chinese and English text and punctuation marks). Then, the initial sample bulletin text may be divided according to a preset separator to obtain a plurality of sample clauses.
Step S4012, taking a sample clause including the main body information in the plurality of sample clauses as a sample target clause, and performing the preset processing on the sample target clause to obtain the sample target text, where the sample target text does not include the main body information.
For example, the sample clause including the subject information among the plurality of sample clauses may be obtained as the sample target clause by two implementations:
In one possible implementation, each sample clause may be compared to a pre-established subject information set, and if the sample clause matches the subject information set, the sample clause is taken as the sample target clause, and the subject information set includes a plurality of subject information;
In another possible implementation, semantic recognition may be performed on each sample clause to determine whether the sample clause includes subject information, and if the sample clause includes subject information, the sample clause is used as the sample target clause.
Then, the preset processing can be carried out on the sample target clauses. For each sample target clause, invalid words in the sample target clause can be deleted to obtain an initial sample text corresponding to that sample target clause. Deduplication processing is then performed on the plurality of initial sample texts to obtain at least one intermediate sample text. Finally, the subject information included in the intermediate sample text is deleted to obtain the sample target text.
Fig. 6 is a flowchart illustrating another method of extracting an execution body according to an exemplary embodiment. As shown in Fig. 6, the method further includes the following steps:
Step S106, associating the bulletin text with the execution subject.
Step S107, outputting the bulletin text in response to a query instruction for the execution subject.
After the execution subject of the bulletin text is determined, an association relationship between the bulletin text and the execution subject may be recorded, that is, the bulletin text is associated with the execution subject. If a user needs to query the bulletin information related to an execution subject, the user may input a query instruction that includes the execution subject. Accordingly, after the query instruction is received, the bulletin text (one or more) associated with the execution subject may be retrieved and output as the query result.
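Steps S106 and S107 amount to maintaining a mapping from execution subjects to bulletin texts and looking it up on query. A minimal in-memory sketch follows; the class name `BulletinIndex` and its methods are hypothetical, and a real system would presumably use a database or search index rather than a dictionary.

```python
from collections import defaultdict


class BulletinIndex:
    """In-memory association between execution subjects and bulletin texts."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def associate(self, execution_subject, bulletin_text):
        """S106: record that the bulletin text belongs to the execution subject."""
        texts = self._by_subject[execution_subject]
        if bulletin_text not in texts:  # avoid recording the same bulletin twice
            texts.append(bulletin_text)

    def query(self, execution_subject):
        """S107: return the bulletin texts (zero or more) for the subject."""
        return list(self._by_subject.get(execution_subject, []))


if __name__ == "__main__":
    index = BulletinIndex()
    index.associate("ACME Ltd", "Judicial auction notice A")
    index.associate("ACME Ltd", "Judicial auction notice B")
    print(index.query("ACME Ltd"))
```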
With the above method, the acquired bulletin text to be processed is split into clauses and subjected to the preset processing to obtain the target text, and the target text is input into the recognition model to obtain the association result corresponding to the target text, so that the execution subject of the bulletin text is determined according to the association result. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.
Fig. 7 is a schematic structural view of an extraction apparatus of an execution body according to an exemplary embodiment, and as shown in fig. 7, the apparatus 700 includes:
An acquiring module 701, configured to acquire a bulletin text to be processed;
An extracting module 702, configured to extract a plurality of clauses included in the bulletin text;
A processing module 703, configured to take a clause including the subject information of the multiple clauses as a target clause, and perform preset processing on the target clause to obtain a target text, where the target text does not include the subject information;
A first determining module 704, configured to input the target text to a pre-trained recognition model, and obtain an association result corresponding to the target text output by the recognition model;
And a second determining module 705, configured to determine, according to the subject information included in the target text, an execution subject of the bulletin text if the association result indicates that the target text is associated.
Optionally, Fig. 8 is a schematic structural diagram of another extraction apparatus of an execution body. As shown in Fig. 8, the extraction module 702 includes:
A first deleting submodule 7021, configured to delete a specified symbol in the bulletin text to obtain an initial bulletin text, where the specified symbol is determined according to the type of the bulletin text;
A dividing submodule 7022, configured to divide the initial bulletin text according to a preset separator, so as to obtain a plurality of clauses.
Optionally, the processing module 703 is configured to compare each clause with a pre-established subject information set, and if the clause matches the subject information set, take the clause as the target clause, where the subject information set includes multiple kinds of subject information; or alternatively
And carrying out semantic recognition on each clause to determine whether the clause comprises main body information, and taking the clause as the target clause if the clause comprises the main body information.
Optionally, Fig. 9 is a schematic structural diagram of another extraction apparatus of an execution body. As shown in Fig. 9, the processing module 703 includes:
A second deleting sub-module 7031, configured to delete, for each target clause, an invalid word in the target clause, to obtain an initial text corresponding to each target clause;
A deduplication sub-module 7032, configured to perform deduplication processing on a plurality of the initial texts, so as to obtain at least one intermediate text;
And a third deleting sub-module 7033, configured to delete the subject information included in the intermediate text, to obtain the target text.
Optionally, Fig. 10 is a schematic structural diagram of a training apparatus for a recognition model according to an exemplary embodiment. As shown in Fig. 10, the recognition model may be obtained through training by the following recognition model training apparatus 1000:
A sample acquiring module 1001, configured to acquire a plurality of sample bulletin texts, and determine a plurality of sample target texts according to the plurality of sample bulletin texts;
A third determining module 1002, configured to take the sample target text as a sample input, so as to obtain a sample input set including a plurality of sample inputs;
An output set obtaining module 1003, configured to obtain a sample output set, where the sample output set includes a sample output corresponding to each sample input, and each sample output includes the true association result of the corresponding sample target text;
A training module 1004, configured to train the recognition model by taking the sample input set as an input of the recognition model and the sample output set as an output of the recognition model.
Optionally, Fig. 11 is a schematic structural diagram of another training apparatus for a recognition model. As shown in Fig. 11, the sample acquiring module 1001 includes:
An extraction submodule 10011 for extracting a plurality of sample clauses included in each sample bulletin text;
The processing sub-module 10012 is configured to take a sample clause including the subject information of the plurality of sample clauses as a sample target clause, and perform the preset processing on the sample target clause to obtain the sample target text, where the sample target text does not include the subject information.
Optionally, Fig. 12 is a schematic structural diagram of another extraction apparatus of an execution body according to an exemplary embodiment. As shown in Fig. 12, the apparatus 700 further includes:
An association module 706, configured to associate the bulletin text with the execution body;
And an output module 707 for outputting the bulletin text in response to the query instruction for the execution subject.
With the above apparatus, the acquired bulletin text to be processed is split into clauses and subjected to the preset processing to obtain the target text, and the target text is input into the recognition model to obtain the association result corresponding to the target text, so that the execution subject of the bulletin text is determined according to the association result. In this way, the accuracy and recall of extracting the execution subject from the bulletin text can be effectively improved.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail here.
Fig. 13 is a block diagram of an electronic device 1300, according to an example embodiment. For example, the electronic device 1300 may be provided as a server. Referring to fig. 13, an electronic device 1300 includes a processor 1322, which may be one or more in number, and a memory 1332 for storing computer programs executable by the processor 1322. The computer program stored in the memory 1332 may include one or more modules each corresponding to a set of instructions. Further, the processor 1322 may be configured to execute the computer program to perform the above-described extraction method of the execution body.
In addition, the electronic device 1300 may further include a power component 1326 and a communication component 1350. The power component 1326 may be configured to perform power management of the electronic device 1300, and the communication component 1350 may be configured to enable communication of the electronic device 1300, e.g., wired or wireless communication. The electronic device 1300 may also include an input/output (I/O) interface 1358. The electronic device 1300 may operate based on an operating system stored in the memory 1332, such as Windows Server™, Mac OS X™, Unix™, Linux™, or the like.
In another exemplary embodiment, there is also provided a computer readable storage medium including program instructions which, when executed by a processor, implement the steps of the execution body extraction method described above. For example, the non-transitory computer readable storage medium may be the memory 1332 including program instructions described above that are executable by the processor 1322 of the electronic device 1300 to perform the method of extracting an execution body described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described execution body extraction method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.
Moreover, the various embodiments of the present disclosure may be combined in any manner, and as long as a combination does not depart from the spirit of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.

Claims (8)

1. A method for extracting an execution body, the method comprising:
Acquiring a bulletin text to be processed;
Extracting a plurality of clauses included in the bulletin text;
Taking the clause comprising the main body information in the clauses as a target clause, and carrying out preset processing on the target clause to obtain a target text, wherein the target text does not comprise the main body information;
Inputting the target text into a pre-trained recognition model to obtain an association result corresponding to the target text output by the recognition model;
If the association result indicates that the target text is associated, determining an execution subject of the bulletin text according to the subject information included in the target text; the association is used for representing that the target text is associated with an execution subject, and subject information included in the target text indicates the execution subject;
The step of taking the clause including the main body information of the clauses as a target clause includes:
comparing each clause with a pre-established main body information set, and taking the clause as the target clause if the clause is matched with the main body information set, wherein the main body information set comprises a plurality of main body information; or alternatively
Carrying out semantic recognition on each clause to determine whether the clause comprises main body information, and taking the clause as the target clause if the clause comprises the main body information;
the step of performing preset processing on the target clause to obtain a target text includes:
For each target clause, deleting invalid words in the target clause to obtain an initial text corresponding to each target clause;
performing de-duplication processing on a plurality of initial texts to obtain at least one intermediate text;
and deleting the main body information included in the intermediate text to obtain the target text.
2. The method of claim 1, wherein the extracting the plurality of clauses included in the bulletin text comprises:
deleting a designated symbol in the bulletin text to obtain an initial bulletin text, wherein the designated symbol is determined according to the type of the bulletin text;
dividing the initial bulletin text according to preset separators to obtain a plurality of clauses.
3. The method according to claim 1, wherein the recognition model is trained by:
acquiring a plurality of sample bulletin texts, and determining a plurality of sample target texts according to the plurality of sample bulletin texts;
taking the sample target text as a sample input to obtain a sample input set comprising a plurality of sample inputs;
Obtaining a sample output set, wherein the sample output set comprises a sample output corresponding to each sample input, and each sample output comprises the true association result of the corresponding sample target text;
the sample input set is used as the input of the recognition model, and the sample output set is used as the output of the recognition model to train the recognition model.
4. The method of claim 3, wherein said determining a plurality of sample target texts from a plurality of said sample bulletin texts comprises:
extracting a plurality of sample clauses included in each of the sample bulletin texts;
Taking a sample clause comprising main body information in the plurality of sample clauses as a sample target clause, and carrying out the preset processing on the sample target clause to obtain the sample target text, wherein the sample target text does not comprise the main body information.
5. The method according to any one of claims 1 to 4, further comprising:
Associating the bulletin text with the execution subject;
and outputting the bulletin text in response to a query instruction for the execution subject.
6. An extraction device of an execution body, characterized in that the device comprises:
The acquisition module is used for acquiring the bulletin text to be processed;
The extraction module is used for extracting a plurality of clauses included in the bulletin text;
The processing module is used for taking the clauses comprising the main body information in the clauses as target clauses, and carrying out preset processing on the target clauses to obtain target texts, wherein the target texts do not comprise the main body information;
the first determining module is used for inputting the target text into a pre-trained recognition model to obtain an association result corresponding to the target text output by the recognition model;
the second determining module is used for determining an execution subject of the bulletin text according to the subject information included in the target text if the association result indicates that the target text is associated; the association is used for representing that the target text is associated with an execution subject, and subject information included in the target text indicates the execution subject;
The processing module is used for comparing each clause with a pre-established main body information set, and if the clause is matched with the main body information set, the clause is used as the target clause, and the main body information set comprises a plurality of main body information; or carrying out semantic recognition on each clause to determine whether the clause comprises main body information, and if the clause comprises the main body information, taking the clause as the target clause;
The processing module is used for deleting, for each target clause, invalid words in the target clause to obtain an initial text corresponding to each target clause; performing de-duplication processing on a plurality of initial texts to obtain at least one intermediate text; and deleting the main body information included in the intermediate text to obtain the target text.
7. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the method according to any of claims 1 to 5.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
A processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1 to 5.
CN202211047583.4A 2021-10-21 2022-08-29 Execution body extraction method and device, storage medium and electronic equipment Active CN115329756B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111229601.6A CN114048736A (en) 2021-10-21 2021-10-21 Execution subject extraction method and device, storage medium and electronic equipment
CN2021112296016 2021-10-21

Publications (2)

Publication Number Publication Date
CN115329756A CN115329756A (en) 2022-11-11
CN115329756B true CN115329756B (en) 2024-07-05

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582954A (en) * 2018-01-24 2019-04-05 广州数知科技有限公司 Method and apparatus for output information
CN111145052A (en) * 2019-12-26 2020-05-12 北京法意科技有限公司 Structured analysis method and system of judicial documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230731

Address after: 224008 Rooms 404-405 and 504, Building B-17-1, Big data Industrial Park, Kecheng Street, Yannan High tech Zone, Yancheng, Jiangsu Province

Applicant after: Yancheng Tianyanchawei Technology Co.,Ltd.

Address before: 224008 room 501-503, building b-17-1, Xuehai road big data Industrial Park, Kecheng street, Yannan high tech Zone, Yancheng City, Jiangsu Province (CNK)

Applicant before: Yancheng Jindi Technology Co.,Ltd.

GR01 Patent grant