CN114528409A - Method and device for evaluating extraction result of element information of letter and visit article

Info

Publication number: CN114528409A
Application number: CN202210401206.XA
Authority: CN
Inventors: 陈一朴; 宋琪; 韦崟屹; 吴展环; 冀相冰
Original assignee: Beijing Peking University Software Engineering Co ltd
Current assignee: Beijing Peking University Software Engineering Co ltd
Priority date: 2022-04-18
Filing date: 2022-04-18
Publication date: 2022-05-24

Abstract

The application provides a method and a device for evaluating the extraction result of petition element information. The method comprises: obtaining a first element set of petition element information extracted by a machine and a second element set of petition element information extracted manually; matching each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set; and determining an evaluation result for the manually extracted petition element information based on the score of each piece of element information in the second element set. The method makes it possible to accurately evaluate manually extracted petition element information.

Description

Method and device for evaluating extraction result of element information of letter and visit article

Technical Field

The present application relates to the field of information evaluation, and in particular to a method and an apparatus for evaluating an extraction result of petition (letter-and-visit) element information.

Background

With the development of big data and artificial intelligence technology, deep-learning-based text classification and entity relation extraction models are widely applied and can achieve a high accuracy rate on the task of extracting petition elements.

However, in a considerable number of business scenarios, for example when a machine cannot work or hardware resources do not meet requirements, petition elements have to be extracted manually while the petition case is handled, and the quality of that earlier manual handling therefore needs to be evaluated and verified.

Therefore, how to accurately evaluate the elements extracted during manual handling of petition cases is a technical problem to be solved.

Disclosure of Invention

The embodiments of the present application aim to provide a method for scoring extracted elements; with the technical solution of the embodiments, manually extracted petition element information can be evaluated accurately.

In a first aspect, an embodiment of the present application provides a method for evaluating an extraction result of petition element information, including: obtaining a first element set of petition element information extracted by a machine and a second element set of petition element information extracted manually; matching each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set; and determining an evaluation result for the manually extracted petition element information based on the score of each piece of element information in the second element set.

In the above process, the machine can extract petition element information with very high accuracy, and machine-extracted element information follows a uniform measurement standard. Each piece of manually extracted element information is therefore compared with the corresponding machine-extracted element information, and the manual extraction is evaluated based on the comparison results for all elements, so that manually extracted petition element information can be evaluated accurately.

In one embodiment, the petition element information comprises at least one of the following types of element information:

multi-classification element information, multi-level address element information, name element information, certificate number element information, mobile phone number element information, multi-level multi-classification element information and petition profile element information.

In the above process, petition element information is divided into several types, and the machine-extracted and manually extracted element information are compared type by type, so that the comparison result is more comprehensive and the manually extracted element information can be evaluated accurately.

In one embodiment, when the petition element information includes any one of name element information, certificate number element information, mobile phone number element information, multi-classification element information, multi-level address element information, and multi-level multi-classification element information, matching each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set includes:

matching each piece of element information in the second element set with the corresponding element information in the first element set, and when a piece of element information in the second element set is the same as the corresponding element information in the first element set, giving that piece of element information a full score;

when a piece of element information in the second element set is different from the corresponding element information in the first element set, giving that piece of element information a zero score.

In the above process, by comparing the element information obtained through the two extraction methods, the manually extracted name, certificate number, mobile phone number, multi-classification, multi-level address and multi-level multi-classification element information can be scored simply and accurately: a full score when the two are identical and a zero score when they differ.
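As a non-limiting illustration, the full-score-or-zero rule described above might be sketched as follows. The function names, the dictionary layout of the element sets, and the whitespace stripping before comparison are assumptions made for this example and are not specified in the application.

```python
from typing import Dict, Optional


def score_exact(manual: Optional[str], machine: Optional[str], full_score: float) -> float:
    """Return the full score when the two values are identical, otherwise zero."""
    if manual is None or machine is None:
        # A value that one side did not extract cannot match.
        return 0.0
    # Assumption: surrounding whitespace is ignored before comparison.
    return full_score if manual.strip() == machine.strip() else 0.0


def score_exact_elements(
    manual_set: Dict[str, str],
    machine_set: Dict[str, str],
    full_scores: Dict[str, float],
) -> Dict[str, float]:
    """Score every element of the manual (second) set against the machine (first) set."""
    return {
        name: score_exact(manual_set.get(name), machine_value, full_scores[name])
        for name, machine_value in machine_set.items()
    }


machine = {"name": "Zhang San", "phone": "13800000000"}
manual = {"name": "Zhang San", "phone": "13800000001"}
scores = score_exact_elements(manual, machine, {"name": 10.0, "phone": 5.0})
print(scores)
```

Here the name matches and receives its full score, while the mobile phone number differs in one digit and receives zero.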

In one embodiment, when the petition element information is petition profile element information, matching each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set includes:

matching the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the profile similarity between the two;

and taking the product of the profile similarity and the total score of the corresponding petition profile element information in the second element set as the score of that petition profile element information.

In the above process, for content-description text such as petition profile element information, the degree of similarity of the two texts can be determined from their text similarity, and the petition profile element information can then be scored by applying that similarity to the total score.

In one embodiment, matching the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the similarity between the two includes:

converting the petition profile element information in the second element set and the corresponding petition profile element information in the first element set into vectors through a vector model to obtain a second petition profile element information vector and a first petition profile element information vector;

and calculating the cosine similarity of the second petition profile element information vector and the first petition profile element information vector to obtain the similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set.

In the above process, to judge the similarity of the element information obtained by the two extraction methods, both pieces of element information are converted into vectors and their similarity is judged by the cosine similarity of the two vectors, so that the similarity is expressed directly as a numerical result and the accuracy is higher.
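The vector conversion and cosine similarity steps above might be sketched as follows. This is only an illustrative sketch: the toy word vectors, the averaging of word vectors into a text vector, and all function names are assumptions for the example; the application only requires that some vector model be used, and a real system would load a trained model.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy word vectors standing in for a trained vector model.
TOY_VECTORS: Dict[str, List[float]] = {
    "road": [1.0, 0.0, 0.0],
    "repair": [0.0, 1.0, 0.0],
    "delay": [0.0, 0.0, 1.0],
    "compensation": [0.0, 1.0, 1.0],
}


def text_vector(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known tokens (assumed pooling strategy)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for empty or zero vectors."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def profile_score(manual_tokens: Sequence[str], machine_tokens: Sequence[str],
                  total_score: float,
                  vectors: Dict[str, List[float]] = TOY_VECTORS) -> float:
    """Score = profile similarity x total score of the profile element."""
    similarity = cosine_similarity(
        text_vector(manual_tokens, vectors), text_vector(machine_tokens, vectors)
    )
    # Assumption: negative similarities are clamped to zero before scoring.
    return max(0.0, similarity) * total_score


print(profile_score(["road", "repair"], ["road", "compensation"], 100.0))
```

The inputs are assumed to be already segmented into words; for Chinese text this step would typically be performed by a word segmentation tool before the vectors are looked up.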

In one embodiment, determining the evaluation result for the manually extracted petition element information based on the score of each element in the second element set comprises:

assigning a weight to each element in the second element set according to a preset rule;

and computing the weighted sum of the scores of the elements in the second element set, using the weight of each element, to obtain the evaluation result for the manually extracted petition element information.

In the process, each element is assigned with a weight, so that the total evaluation can be more accurate when manually extracted element evaluation is carried out.
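A minimal sketch of the weighted summation described above is given below. The element names, scores and weights are made-up values for illustration, and the function name is an assumption.

```python
from typing import Dict


def weighted_evaluation(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight x score over all elements of the manually extracted set."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight assigned for elements: {sorted(missing)}")
    return sum(weights[name] * score for name, score in scores.items())


# Hypothetical per-element scores and weights.
example_scores = {"name": 10.0, "phone": 0.0, "profile": 70.0}
example_weights = {"name": 0.2, "phone": 0.3, "profile": 0.5}
print(weighted_evaluation(example_scores, example_weights))
```

With these values the evaluation result is 0.2 x 10 + 0.3 x 0 + 0.5 x 70 = 37.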

In one embodiment, assigning a weight to each element in the second element set according to a preset rule includes:

assigning weights to the elements that increase by a multiple, following the priority of the element information in the second element set from low to high;

or

assigning weights to the elements in descending order from front to back according to the positions of the element information in the second element set;

or

assigning equal weights to the element information in the second element set.

In the above process, a weight may be assigned to each element according to its priority or position, or all elements may be given equal weights; assigning weights by importance or priority in this way makes the final evaluation result more accurate.
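The three weighting rules above might be sketched as follows. The multiplication factor of 2, the linear descent by position, and the normalization of the weights so that they sum to 1 are assumptions for this example; the application does not fix these details.

```python
from typing import Dict, List, Sequence


def _normalize(names: Sequence[str], raw: List[float]) -> Dict[str, float]:
    """Scale raw weights so that they sum to 1 (assumed convention)."""
    if not names:
        return {}
    total = sum(raw)
    return {name: value / total for name, value in zip(names, raw)}


def weights_by_priority(names_low_to_high: Sequence[str], factor: float = 2.0) -> Dict[str, float]:
    """Weights grow by a constant multiple from lowest to highest priority."""
    raw = [factor ** i for i in range(len(names_low_to_high))]
    return _normalize(names_low_to_high, raw)


def weights_by_position(names_in_order: Sequence[str]) -> Dict[str, float]:
    """Weights decrease from the first position to the last."""
    n = len(names_in_order)
    raw = [float(n - i) for i in range(n)]
    return _normalize(names_in_order, raw)


def equal_weights(names: Sequence[str]) -> Dict[str, float]:
    """Every element receives the same weight."""
    return _normalize(names, [1.0] * len(names))


print(weights_by_priority(["phone", "name", "profile"]))
print(weights_by_position(["profile", "name", "phone"]))
print(equal_weights(["profile", "name", "phone"]))
```

Normalizing keeps the weighted sum on the same scale regardless of which rule is chosen, which makes evaluation results under different rules easier to compare.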

In one embodiment, before matching each element in the second element set with the corresponding element in the first element set to obtain a score for each element in the second element set, the method further comprises:

determining a full score for each element in the second element set based on the importance of each element in the second element set.

In the above process, different element information differs in its importance to the petition, so the full score set for each piece of element information differs, with more important element information receiving a higher full score; this makes the final evaluation of the whole petition more accurate.

In a second aspect, an embodiment of the present application provides an apparatus for evaluating an extraction result of petition element information, including:

an acquisition module, used for obtaining a first element set of petition element information extracted by a machine and a second element set of petition element information extracted manually;

a matching module, used for matching each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set;

and a determining module, used for determining an evaluation result for the manually extracted petition element information based on the score of each piece of element information in the second element set.
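One way to picture how the three modules cooperate is the following sketch. The class layout, the callables that supply the element sets, the exact-match scoring inside the matching module and the plain summation in the determining module are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

ElementSet = Dict[str, str]


@dataclass
class AcquisitionModule:
    """Obtains the machine-extracted (first) and manually extracted (second) sets."""
    machine_source: Callable[[], ElementSet]
    manual_source: Callable[[], ElementSet]

    def acquire(self) -> Tuple[ElementSet, ElementSet]:
        return self.machine_source(), self.manual_source()


@dataclass
class MatchingModule:
    """Scores each manual element against the corresponding machine element."""
    full_scores: Dict[str, float]

    def match(self, machine: ElementSet, manual: ElementSet) -> Dict[str, float]:
        # Simplification: exact match gives the full score, anything else zero.
        return {
            name: (self.full_scores[name] if manual.get(name) == value else 0.0)
            for name, value in machine.items()
        }


class DeterminingModule:
    """Turns per-element scores into one evaluation result."""

    def evaluate(self, scores: Dict[str, float]) -> float:
        # Simplification: unweighted sum of the element scores.
        return sum(scores.values())


acquisition = AcquisitionModule(
    machine_source=lambda: {"name": "Li Si", "id": "X1"},
    manual_source=lambda: {"name": "Li Si", "id": "X2"},
)
machine_set, manual_set = acquisition.acquire()
element_scores = MatchingModule({"name": 10.0, "id": 20.0}).match(machine_set, manual_set)
print(DeterminingModule().evaluate(element_scores))
```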

Optionally, the petition element information includes at least one of the following types of element information:

multi-classification element information, multi-level address element information, name element information, certificate number element information, mobile phone number element information, multi-level multi-classification element information and petition profile element information.

Optionally, the matching module is specifically configured to:

when the petition element information comprises any one of name element information, certificate number element information, mobile phone number element information, multi-classification element information, multi-level address element information and multi-level multi-classification element information, match each piece of element information in the second element set with the corresponding element information in the first element set; when a piece of element information in the second element set is the same as the corresponding element information in the first element set, give that piece of element information a full score; and when it is different from the corresponding element information in the first element set, give that piece of element information a zero score.

Optionally, the matching module is specifically configured to:

when the petition element information is petition profile element information, match the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the similarity between the two;

and take the product of that similarity and the total score of the corresponding petition profile element information in the second element set as the score of that petition profile element information.

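Optionally, the matching module is specifically configured to:

convert the petition profile element information in the second element set and the corresponding petition profile element information in the first element set into vectors through a vector model to obtain a second petition profile element information vector and a first petition profile element information vector;

and calculate the cosine similarity of the second petition profile element information vector and the first petition profile element information vector to obtain the similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set.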

Optionally, the determining module is specifically configured to:

assign weights to the elements in the second element set that increase by a multiple, following the priority of the elements from low to high;

or

assign weights to the elements in the second element set in descending order from front to back according to the positions of the elements in the second element set;

or

assign equal weights to the elements in the second element set.

Optionally, the apparatus further comprises:

a second determining module, used for determining the full score of each element in the second element set based on the importance of each element in the second element set, before the matching module matches each element in the second element set with the corresponding element in the first element set to obtain the score of each element in the second element set.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.

Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

Fig. 1 is a flowchart of a method for evaluating an extraction result of petition element information according to an embodiment of the present application;

Fig. 2 is a flowchart of a method for assigning a score to an extraction result of petition element information according to an embodiment of the present application;

Fig. 3 is a flowchart of another method for assigning a score to an extraction result of petition element information according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for calculating the similarity of petition element information extracted in different extraction manners according to an embodiment of the present application;

Fig. 5 is a flowchart of another method for evaluating an extraction result of petition element information according to an embodiment of the present application;

Fig. 6 is a flowchart of a method for assigning weights to an extraction result of petition element information according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a detailed implementation of a method for evaluating an extraction result of petition element information according to an embodiment of the present application;

Fig. 8 is a schematic block diagram of an apparatus for evaluating an extraction result of petition element information according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an apparatus for evaluating an extraction result of petition element information according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

First, some terms referred to in the embodiments of the present application will be described to facilitate understanding by those skilled in the art.

Terminal device: may be a mobile terminal, a fixed terminal, or a portable terminal, such as a mobile handset, station, unit, device, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system device, personal navigation device, personal digital assistant, audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices. It is also contemplated that the terminal device can support any type of user interface (e.g., a wearable device).

Server: may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms.

Regular expression: a common text matching tool in computer science that, given a valid pattern, can be used to retrieve and replace text conforming to that pattern (rule). Elasticsearch (a search server), a common retrieval tool, can also perform text matching tasks.

Jieba word segmentation: a Chinese word segmentation tool that segments text at very high speed with quite high accuracy, striking a good balance between real-time performance and accuracy. Jieba also lets the user add a dictionary to give weight to designated words, so that those words are more readily kept as single words in the segmentation result.

Word2vec (word vector model): a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and predicts the words in adjacent positions; under the bag-of-words assumption in word2vec, the order of the words is unimportant. After training, the word2vec model can map each word to a vector that represents word-to-word relationships; this vector is taken from the hidden layer of the neural network.
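To illustrate the regular-expression term defined above, the following sketch retrieves and replaces text that conforms to a pattern. The pattern for an 11-digit mobile phone number beginning with 1, the masking format and the function names are assumptions chosen for the example only.

```python
import re
from typing import List

# Assumed pattern: 11 digits starting with 1, not embedded in a longer digit run.
MOBILE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def find_mobile_numbers(text: str) -> List[str]:
    """Retrieve every substring that conforms to the pattern."""
    return MOBILE_PATTERN.findall(text)


def mask_mobile_numbers(text: str) -> str:
    """Replace the middle four digits of every matched number with asterisks."""
    return MOBILE_PATTERN.sub(lambda m: m.group()[:3] + "****" + m.group()[7:], text)


sample = "Contact 13812345678 or 13987654321, ID 110101199001011234."
print(find_mobile_numbers(sample))
print(mask_mobile_numbers(sample))
```

The lookbehind and lookahead keep the 18-digit certificate number in the sample from being mistaken for a mobile phone number.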

The present application is applied to text similarity matching scenarios. Specifically, when a large number of petitions are being processed, some have to be handled manually because memory is insufficient or a machine is damaged; afterwards, the manually extracted element information can be evaluated by comparing it with the element information that the machine extracts from the same petitions.

This provides a unified standard for assessing manual petition handling and can effectively improve the efficiency with which the relevant departments of governments at all levels handle petitions manually.

However, in a great number of current business scenarios, petition elements must be extracted manually during petition case handling, and the quality of existing manual handling also needs to be evaluated and verified, so carrying out quality evaluation of manual petition case handling has become a challenging task.

To this end, the present application obtains a first element set of petition element information extracted by a machine and a second element set of petition element information extracted manually; matches each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set; and determines an evaluation result for the manually extracted petition element information based on those scores. In this process, the machine can extract petition element information with very high accuracy, and machine-extracted element information follows a uniform measurement standard. Each piece of manually extracted element information is therefore compared with the corresponding machine-extracted element information, and the manual extraction is evaluated based on the comparison results for all elements, so that manually extracted petition element information can be evaluated accurately.

In the embodiments of the present application, the execution subject may be an apparatus for evaluating the extraction result of petition element information within an evaluation system. In practical applications, the apparatus may be an electronic device such as a terminal device or a server, which is not limited herein.

The method for evaluating the extraction result of petition element information in the embodiments of the present application is described in detail below with reference to Fig. 1.

Referring to Fig. 1, Fig. 1 is a flowchart of a method for evaluating an extraction result of petition element information according to an embodiment of the present application. The method shown in Fig. 1 includes:

Step 110: obtain a first element set of petition element information extracted by a machine and a second element set of petition element information extracted manually.

In the above process, the first element set of machine-extracted petition element information is obtained and used to evaluate the second element set of manually extracted petition element information.

The machine is generally a machine equipped with an element information extraction model. Each piece of element information in the second element set corresponds to one piece of element information in the first element set. The extraction model can reach an accuracy rate of more than 95% on the extraction task, so the petition element information it extracts can serve well as the evaluation standard for manually extracted petition element information.

Specifically, the petition element information comprises at least one of the following types of element information:

The multi-classification element information comprises multi-class element information (the petitioner's certificate type, the purpose of the petition, and the like) and binary-class element information (whether the petition relates to complaints, whether a complaint is raised, and the like); the multi-level address element information comprises the petitioner's address, the petitioner's detailed address, the address of the problem, and the like; the name element information comprises the names of the petitioners and the like; the certificate number element information comprises the petitioner's certificate number; the mobile phone number element information comprises the petitioner's mobile phone number; the multi-level multi-classification element information comprises the content classification; and the petition profile element information comprises the petition profile. The petition element information is not limited to these types.

Step 120: match each piece of element information in the second element set with the corresponding element information in the first element set to obtain a score for each piece of element information in the second element set.

In the above process, the score of each piece of element information in the second element set is determined accurately by matching it against the corresponding element information.

The matching may be text matching or similarity matching. If fewer elements are extracted manually than by the machine, each element that was not extracted manually is directly assigned 0 points. If more element information is extracted manually than by the machine, and the number extracted by the machine equals the total number of such elements in the petition (for example, the petition contains only two mobile phone numbers, the machine extracts two, and three are extracted manually), the manually extracted element information is assigned 0 points. In general, such manual over-extraction tends to occur for element information such as certificate numbers and names.
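The handling of count mismatches described above might be sketched as follows for an element type that can hold several values, such as mobile phone numbers. The text leaves some room for interpretation; this sketch assumes that a manually extracted value scores in full only when it matches a machine-extracted value not yet used, that surplus or wrong manual values score zero, and that machine values with no manual counterpart are recorded with zero. The function name and the return layout are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def score_multi_valued(manual_values: Sequence[str], machine_values: Sequence[str],
                       full_score: float) -> List[Tuple[str, float]]:
    """Return (value, score) pairs covering both manual and missed machine values."""
    remaining = list(machine_values)
    results: List[Tuple[str, float]] = []
    for value in manual_values:
        if value in remaining:
            remaining.remove(value)          # each machine value can be matched once
            results.append((value, full_score))
        else:
            results.append((value, 0.0))     # surplus or incorrect manual value
    for value in remaining:
        results.append((value, 0.0))         # extracted by the machine, missed manually
    return results


machine_phones = ["13800000000", "13900000000"]
print(score_multi_valued(["13800000000", "13900000000", "13700000000"], machine_phones, 5.0))
print(score_multi_valued(["13800000000"], machine_phones, 5.0))
```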

Specifically, before step 120 is performed, the following step may also be performed:

determining a full score for each element in the second element set based on the importance of each element in the second element set.

The full score is the highest score that the evaluation of a piece of element information can receive, and it is set manually. The importance reflects how important each type of element information is to the petition; for example, the abstract part of a petition summarizes its entire content and can be regarded as the most important information.

Specifically, when the petition element information includes any one of name element information, certificate number element information, mobile phone number element information, multi-classification element information, multi-level address element information, and multi-level multi-classification element information, the steps shown in Fig. 2 may be employed when step 120 is executed.

Referring to Fig. 2, Fig. 2 is a flowchart of a method for assigning a score to an extraction result of petition element information according to an embodiment of the present application. The method shown in Fig. 2 includes:

Step 121: match each piece of element information in the second element set with the corresponding element information in the first element set, and when a piece of element information in the second element set is the same as the corresponding element information in the first element set, give that piece of element information a full score.

Step 122: when a piece of element information in the second element set is different from the corresponding element information in the first element set, give that piece of element information a zero score.

Text matching can be completed by means of regular expressions: a full score is given when the result shows a complete match, and a zero score is given when it shows an incomplete match. Matching of element information can also be completed by similarity matching: a similarity of 100% receives the full score, and a similarity below 100% receives a zero score.

Specifically, when the letter element information is petition profile element information, the steps shown in fig. 3 may be adopted when step 120 is executed.

Referring to fig. 3, fig. 3 is a flowchart of another method for assigning a score to an extraction result of a piece of letter element information according to an embodiment of the present application, where the method for assigning a score to an extraction result of a piece of letter element information shown in fig. 3 includes:

Step 1201: Match the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the profile similarity between the two.

Step 1202: Take the product of the profile similarity and the full score of the corresponding petition profile element information in the second element set as the score of that petition profile element information.

The full score of the petition profile element information may be set by the user as required. As an example of the score calculation, if the similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set is 70% and the full score is 100, the score of the petition profile element information is 70. Text similarity can also be computed by segmenting the text at a fine granularity, constructing features for each segment, and then measuring similarity on those features. Common segmentation granularities include the original character string, n-grams, words, syntactic analysis results, topic models, and the like. Common similarity measures include minimum edit distance, word mover's distance, Euclidean distance, cosine distance, Jaccard similarity, Hamming distance, and the like. Common feature construction methods include TF-IDF (a term weighting scheme), BM25 (a relevance scoring function), word vectors, sentence vectors, SimHash (a near-duplicate detection algorithm), and the like.
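As a rough illustration of step 1202 combined with one of the similarity measures listed above, the sketch below uses Jaccard similarity over character n-grams. The choice of character bigrams, the function names, and the handling of empty strings are assumptions for the example; the application does not prescribe a particular measure.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Split a string into its set of character n-grams (bigrams by default)."""
    text = text.strip()
    if len(text) < n:
        # Strings shorter than n are kept whole; empty strings give no n-grams.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # assumption: two empty texts count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def profile_score(similarity: float, full_score: float = 100.0) -> float:
    """Step 1202: the score is the product of similarity and full score."""
    return similarity * full_score
```

With a similarity of 0.7 and a full score of 100, `profile_score` yields a score of about 70, which matches the worked example in the text.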

Specifically, when step 1201 is executed, the steps shown in fig. 4 may also be adopted.

Referring to fig. 4, fig. 4 is a flowchart of a method for calculating similarity of results of extracting the information of the letters by different extraction methods according to an embodiment of the present application, where the method for calculating similarity of results of extracting the information of the letters by different extraction methods shown in fig. 4 includes:

Step 12011: Convert the petition profile element information in the second element set and the corresponding petition profile element information in the first element set into vectors through a vector model to obtain a second petition profile element information vector and a first petition profile element information vector.

Step 12012: Calculate the cosine similarity of the second petition profile element information vector and the first petition profile element information vector to obtain the similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set.
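The cosine similarity of step 12012 can be sketched in plain Python as below. The vectors are assumed to have already been produced by some vector model in step 12011; the function name and the convention of returning 0 for a zero vector are assumptions for the example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector carries no direction, so similarity is 0.
        return 0.0
    return dot / (norm_u * norm_v)
```

The result lies between -1 and 1, so a caller that turns it into a score may want to clamp negative values to 0 before multiplying by the full score.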

After the petition profile element information is segmented into words, a vector representation of each word can be obtained through a Word2Vec model, and the vectors of all the words in the petition profile element information are aggregated to obtain the vector of the petition profile element information. Alternatively, the vector of the petition profile element information can be obtained directly from RoBERTa (a pre-trained model) by BERT-avg (averaging the output token vectors). The pre-trained model can be trained as follows: each piece of petition profile element information is forward-propagated twice through the base model with different Dropout masks, yielding two different vectors; the pair of vectors obtained from the same petition profile element information is used as a positive sample pair, and for each vector, the vectors generated from other petition profile element information are selected as negative samples.

When two different pieces of petition profile element information are compared, sentence vectors are generated by a sentence vector model, but the result can be poor because of anisotropy in the vector space. To address this, BERT-whitening can be used to obtain vectors of the petition profile element information expressed in an orthonormal basis; BERT-flow (a vector transformation model) can be used to transform the BERT vectors of the petition profile element information into a Gaussian distribution through normalizing flows, which alleviates the anisotropy; and SBERT (a semantic similarity model) can be fine-tuned with a bi-encoder (two-tower) architecture to obtain better vector representations of the petition profile element information.

In addition, SimCSE (a contrastive learning method for sentence vectors) can be adopted to compare vector differences. SimCSE has an unsupervised mode and a supervised mode: the unsupervised mode relies on Dropout to generate semantically similar sentence representations for training, while the supervised mode requires a corresponding data set to be constructed. The core idea of SimCSE is contrastive learning, and the aim here is to compare the difference between the vectors of two pieces of petition profile element information.

Step 130: Determine an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set.

In this process, the scores of the individual element information in the second element set make it possible to evaluate accurately the manual extraction of element information for the letter as a whole.

Specifically, in performing step 130, the steps shown in fig. 5 may be employed.

Referring to fig. 5, fig. 5 is a flowchart of another method for evaluating the extraction result of letter element information according to an embodiment of the present application, where the method shown in fig. 5 includes:

Step 131: Assign a weight to each element in the second element set according to a preset rule.

Step 132: Compute the weighted sum of the scores of the elements in the second element set, using the weight of each element, to obtain the evaluation result of manually extracting the element information of the letter.
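A minimal sketch of the weighted sum in step 132 is given below. The dictionary-based interface and the element names used in the usage note are hypothetical; the application does not specify a data structure.

```python
from typing import Mapping


def weighted_evaluation(scores: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Step 132: sum of weight * score over all scored elements."""
    missing = set(scores) - set(weights)
    if missing:
        # Every scored element needs a weight from step 131.
        raise KeyError(f"no weight assigned for: {sorted(missing)}")
    return sum(weights[name] * score for name, score in scores.items())
```

For instance, scores of 100 and 50 for two elements with weights of 0.5 each give an evaluation result of 75.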

Specifically, when step 131 is executed, the steps shown in fig. 6 may also be adopted:

Referring to fig. 6, fig. 6 is a flowchart of a method for assigning weights to the extraction result of letter element information according to an embodiment of the present application, where the method shown in fig. 6 includes:

Step 1311: Assign weights to the elements that increase by a multiple, following the order of priority of the element information in the second element set from low to high.

Step 1312: Or assign weights to the elements that decrease by a multiple from front to back, according to the position of the element information in the second element set.

Step 1313: Or assign equal weights to the element information in the second element set.

The priority order may be the priority in the multi-level content in the element information, the priority determined according to the importance degree, the priority determined according to the appearance sequence, or the like.
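The three weighting rules of steps 1311 to 1313 can be sketched as follows. The multiple of 2 and the normalization of the weights so that they sum to 1 are assumptions for the example; the application only states that the weights increase or decrease by a multiple or are equal.

```python
from typing import List


def increasing_weights(n: int, factor: float = 2.0) -> List[float]:
    """Step 1311: weights grow by `factor` from lowest to highest priority."""
    if n <= 0:
        raise ValueError("n must be positive")
    raw = [factor ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def decreasing_weights(n: int, factor: float = 2.0) -> List[float]:
    """Step 1312: weights shrink by `factor` from the front to the back."""
    return list(reversed(increasing_weights(n, factor)))


def equal_weights(n: int) -> List[float]:
    """Step 1313: every element receives the same weight."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [1.0 / n] * n
```

With three elements and a multiple of 2, `increasing_weights(3)` gives weights in the ratio 1 : 2 : 4, and `decreasing_weights(3)` gives the same weights in reverse order.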

According to the method, a first element set of element information of the letters extracted by a machine and a second element set of element information of the letters extracted manually are obtained; matching each element information in the second element set with each corresponding element information in the first element set to obtain the score of each element information in the second element set; and determining an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set. The method can achieve the effect of accurately evaluating the element information of the manually extracted letters.

The following describes in detail a method for evaluating the extraction result of the letter element information by a specific embodiment.

Referring to fig. 7, fig. 7 is a schematic diagram illustrating a detailed implementation of a method for evaluating an extraction result of a piece of letter element information according to an embodiment of the present application.

In this embodiment, the element information extracted by the model serves as the evaluation standard, against which the manually extracted element information is evaluated.

The manually extracted element information and the machine-extracted element information of the letter each include at least one of multi-class element information, multi-level address element information, name element information, certificate number element information, mobile phone number element information, multi-level multi-class element information, and petition profile element information. The methods for matching against the model-extracted results include equality matching, multi-level classification matching, address matching, multi-content matching, text similarity matching, and the like.

In this embodiment, when the quality of a letter is evaluated, both the manual extraction and the model extraction yield the following 14 elements: the number of visitors, the name of the visitor, the mobile phone number of the visitor, the certificate type of the visitor, the certificate number of the visitor, the name of the addressee, the purpose of the visit, the complaint, whether the complaint is raised, the content classification, the address of the visitor, the detailed address of the visitor, the problem location, and the petition profile.

Here, the visitor object element information comprises the visitor's name, mobile phone number, certificate type, certificate number, address, and detailed address; the multi-class element information covers the certificate type and the purpose of the visit; the binary classification element information covers the complaint and whether the complaint is raised; the multi-level address information covers the address of the visitor, the detailed address of the visitor, and the problem location; and the three-level multi-class element information is the content classification.

In one embodiment, element information such as the visitor's name, mobile phone number, certificate type, and certificate number within the manually extracted visitor object element information may be assigned equal weights; for example, the full score of each element may be set to 100 points. These elements may be scored by equality matching. For example, the manually extracted mobile phone number of the visitor and the corresponding mobile phone number extracted by the model are compared by similarity matching: if the similarity is 100%, the manually extracted mobile phone number is given 100 points, and if the similarity is not 100%, it is given 0 points. Equality matching can also be used to score the purpose of the visit, the complaint, whether the complaint is raised, and the name of the addressee.

In one embodiment, before each element information in the manually extracted multi-level address element information is scored by address matching, a place-name dictionary may be constructed from a place-name library and added to a word segmentation tool (for example, a final word segmentation tool), so that the extracted multi-level address information can be segmented accurately. The place names obtained by segmentation are then either assigned equal weights, or the weight of a particular place name is increased. The weight may be increased for a place name that has a higher importance in the dictionary, or on the basis of matching the address directly against China's administrative division rules. For example, take the address "Haidian District, Beijing", which is segmented into the two place names "Beijing" and "Haidian District", with a full score of 100 points for the address. With equal weights, each place name has a full score of 50 points: a correct place name receives 50 points and an incorrect one receives 0 points. Alternatively, the weight of "Beijing" may be increased, so that the full score of "Beijing" is 60 points and the full score of "Haidian District" is 40 points; likewise, a correct place name receives its full score and an incorrect one receives 0 points. The sum of the scores of all place names is taken as the total score of the address.
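The address scoring described above can be sketched as follows, assuming the address has already been segmented into place names by a separate tool. The function name, the dictionary of per-name full scores, and the place names in the usage note are illustrative assumptions.

```python
from typing import Iterable, Mapping


def address_score(manual_names: Iterable[str],
                  full_scores: Mapping[str, float]) -> float:
    """Sum the full scores of the reference place names found in the manual result.

    `full_scores` maps each place name segmented from the machine-extracted
    address to the full score assigned to it; a name that is missing from the
    manual result contributes 0.
    """
    manual = set(manual_names)
    return sum(score for name, score in full_scores.items() if name in manual)
```

With full scores of 60 for "Beijing" and 40 for "Haidian District", a manual result containing "Beijing" and a wrong district would receive 60 points, and a fully correct result would receive 100 points.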

The administrative regions are divided as follows. The first-level, provincial administrative regions include provinces, autonomous regions, municipalities directly under the central government, and special administrative regions. The second-level, prefecture-level administrative regions include prefecture-level cities, prefectures, autonomous prefectures, and leagues. The third-level, county-level administrative regions include municipal districts, county-level cities, counties, autonomous counties, banners, autonomous banners, special districts, and forestry districts. The fourth-level, township-level administrative regions include subdistricts, towns, townships, sumu, and county-administered districts. Therefore, when address matching is carried out, the matching of each piece of address element information can be completed through these more detailed administrative division rules.

In one embodiment, the manually extracted addressee name element information is ordered by the sequence in which the addressees are mentioned; starting from the first addressee name, the weight of each subsequent name is half that of the previous one, and each name is scored by equality matching.

In one embodiment, the manually extracted three-level multi-class element information is scored by multi-content matching and multi-level classification matching. When weights are assigned, they increase by a multiple from level to level starting from the first level; each level is scored by equality matching, and the scores of the three levels are added to give the total score of the three-level multi-class element information. However, a level can be scored only when the previous level is correct: if the previous level receives 0 points, the next level also receives 0 points even if its own content is correct.
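The level-by-level gating described above can be sketched as follows. The per-level full scores passed in the usage note are hypothetical values chosen for illustration, and the function name is an assumption.

```python
from typing import Sequence


def hierarchical_score(manual_levels: Sequence[str],
                       machine_levels: Sequence[str],
                       level_full_scores: Sequence[float]) -> float:
    """Score a multi-level classification, stopping at the first wrong level."""
    if not (len(manual_levels) == len(machine_levels) == len(level_full_scores)):
        raise ValueError("all three sequences must have the same length")
    total = 0.0
    for manual, machine, full in zip(manual_levels, machine_levels, level_full_scores):
        if manual != machine:
            # A wrong level scores 0 and so do all deeper levels.
            break
        total += full
    return total
```

With hypothetical full scores of 20, 30, and 50 for the three levels, a result that is correct at the first level but wrong at the second receives 20 points, even when its third level happens to match.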

In one embodiment, the manually extracted petition profile element information is scored by text similarity matching, typically by comparing the manually extracted letter abstract with the model-extracted letter abstract. The two abstracts are converted into corresponding vectors, the similarity of the abstracts obtained by the two extraction modes is calculated as the cosine similarity of the two vectors, and the manually extracted petition profile element information is scored in proportion to the similarity. For example, if the similarity is 50% and the full score is 100 points, the manually extracted petition profile element information is given 50 points; if the similarity is 0 or less, it is given 0 points.

In one embodiment, the 14 kinds of element information (the number of visitors, the name of the visitor, the mobile phone number of the visitor, the certificate type of the visitor, the certificate number of the visitor, the name of the addressee, the purpose of the visit, the complaint, whether the complaint is raised, the content classification, the address of the visitor, the detailed address of the visitor, the problem location, and the petition profile) are divided into 8 parts. The visitor's name, mobile phone number, certificate type, certificate number, address, and detailed address together form one part; the number of visitors is not counted; and each of the remaining seven kinds of element information forms a part of its own. The final score is obtained as the equally weighted sum of the 8 parts, which completes the evaluation of the extraction result of the letter element information.
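One possible way to aggregate the 8 parts is sketched below. The element keys are hypothetical names, and two choices are assumptions not stated in the application: the visitor object part is taken as the mean of its six element scores, and the equal weights are taken as 1/8 each, so that the final score stays on the same scale as the element scores.

```python
from statistics import fmean
from typing import Mapping

# Hypothetical keys for the six elements that together form the visitor part.
VISITOR_PART = (
    "visitor_name", "visitor_phone", "visitor_cert_type",
    "visitor_cert_number", "visitor_address", "visitor_detailed_address",
)
# The number of visitors is not counted in the final score.
EXCLUDED = ("visitor_count",)


def final_score(element_scores: Mapping[str, float]) -> float:
    """Equal-weight average of the visitor part and each remaining element."""
    visitor_part = fmean(element_scores[key] for key in VISITOR_PART)
    other_parts = [
        score for key, score in element_scores.items()
        if key not in VISITOR_PART and key not in EXCLUDED
    ]
    return fmean([visitor_part, *other_parts])
```

When all 14 element scores are supplied, `other_parts` holds the seven remaining elements, so the average runs over exactly 8 parts.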

The method of evaluating the extraction result of the letter element information is described above by fig. 1 to 7, and the apparatus of evaluating the extraction result of the letter element information is described below with reference to fig. 8 to 9.

Referring to fig. 8, a schematic block diagram of an apparatus 800 for evaluating an extraction result of a piece of letter element information provided in an embodiment of the present application is shown, where the apparatus 800 may be a module, a program segment, or code on an electronic device. The apparatus 800 corresponds to the above-mentioned embodiment of the method of fig. 1, and can perform various steps related to the embodiment of the method of fig. 1, and specific functions of the apparatus 800 can be referred to the following description, and detailed descriptions are appropriately omitted herein to avoid redundancy.

Optionally, the apparatus 800 includes:

an obtaining module 810, configured to obtain a first element set of machine-extracted letter element information and a second element set of manual-extracted letter element information;

a matching module 820, configured to match each piece of factor information in the second factor set with each piece of factor information corresponding to the first factor set, so as to obtain a score of each piece of factor information in the second factor set;

a determining module 830, configured to determine an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set.

Optionally, the matching module is specifically configured to:

when the letter element information is petition profile element information, matching the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the profile similarity between the two;

and taking the product of the profile similarity and the full score of the corresponding petition profile element information in the second element set as the score of that petition profile element information.

Optionally, the matching module is specifically configured to:

Optionally, the determining module is specifically configured to:

or

Optionally, the apparatus further comprises:

Referring to fig. 9, a schematic block diagram of an apparatus 900 for evaluating the extraction result of the letter element information provided in the embodiment of the present application is shown, and the apparatus may include a memory 910 and a processor 920. Optionally, the apparatus may further include: a communication interface 930, and a communication bus 940. The apparatus corresponds to the above-mentioned embodiment of the method of fig. 1, and can perform various steps related to the embodiment of the method of fig. 1, and specific functions of the apparatus can be referred to the following description.

In particular, memory 910 is used to store computer readable instructions.

A processor 920, configured to execute the readable instructions stored in the memory so as to perform steps 110 to 130 of the method of fig. 1.

A communication interface 930, configured to communicate signaling or data with other node devices, for example with a server, a terminal, or other device nodes; the embodiments of the application are not limited in this respect.

And a communication bus 940 for realizing direct connection communication of the above components.

In this embodiment, the communication interface 930 of the device is used for signaling or data communication with other node devices. The memory 910 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The memory 910 may optionally be at least one storage device located remotely from the processor. The memory 910 stores computer readable instructions, and when the computer readable instructions are executed by the processor 920, the electronic device executes the method process shown in fig. 1. The processor 920 may be used in the apparatus 800 to perform the functions described in the present application. The processor 920 may be, for example, a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and the embodiments of the present application are not limited thereto.

Embodiments of the present application further provide a readable storage medium storing a computer program which, when executed by a processor, performs the method process performed by the electronic device in the method embodiment shown in fig. 1.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.

In summary, the present application provides a method, an apparatus, an electronic device, and a readable storage medium for evaluating an extraction result of a piece of letter element information, where the method includes obtaining a first element set of machine-extracted letter element information and a second element set of manual-extracted letter element information; matching each element information in the second element set with each corresponding element information in the first element set to obtain the score of each element information in the second element set; and determining an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set. The method can achieve the effect of accurately evaluating the element information of the manually extracted letters.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A method for evaluating the extraction result of element information of a letter, which is characterized by comprising the following steps:

acquiring a first element set of machine-extracted letter element information and a second element set of manual-extracted letter element information;

matching each element information in the second element set with each corresponding element information in the first element set to obtain a score of each element information in the second element set;

and determining an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set.

2. The method of claim 1, wherein the letter element information includes at least one of the following element information:

3. The method as claimed in claim 2, wherein when the letter element information includes any one of the name element information, the certificate number element information, the mobile phone number element information, the multi-class element information, the multi-level address element information, and the multi-level multi-class element information, the matching each element information in the second element set with each corresponding element information in the first element set to obtain a score of each element information in the second element set comprises:

and when any element information in the second element set is different from the corresponding element information in the first element set, assigning zero score to any element information.

4. The method of claim 2, wherein when the letter element information is the letter profile element information, the matching of each element information in the second element set with each corresponding element information in the first element set to obtain the score of each element information in the second element set comprises:

matching the petition general information in the second element set with the corresponding petition general information in the first element set to obtain general similarity of the petition general information in the second element set and the corresponding petition general information in the first element set;

5. The method of claim 4, wherein matching the petition profile element information in the second element set with the corresponding petition profile element information in the first element set to obtain the profile similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set comprises:

converting the petition profile element information in the second element set and the corresponding petition profile element information in the first element set into vectors through a vector model to obtain a second petition profile element information vector and a first petition profile element information vector;

and calculating the cosine similarity of the second petition profile element information vector and the first petition profile element information vector to obtain the profile similarity between the petition profile element information in the second element set and the corresponding petition profile element information in the first element set.
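The two steps of claim 5 can be sketched as follows. The claim does not name the vector model, so `char_count_vector` is a hypothetical stand-in that counts characters; a real system would presumably use a trained text-embedding model. Only `cosine_similarity` follows directly from the claim.

```python
import math
from collections import Counter


def char_count_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for the vector model: one character count per vocabulary entry."""
    counts = Counter(text)
    return [float(counts[ch]) for ch in vocabulary]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def profile_similarity(manual_text: str, machine_text: str) -> float:
    """Vectorize both profile texts over a shared vocabulary, then compare them."""
    vocabulary = sorted(set(manual_text) | set(machine_text))
    second_vector = char_count_vector(manual_text, vocabulary)  # manually extracted
    first_vector = char_count_vector(machine_text, vocabulary)  # machine extracted
    return cosine_similarity(second_vector, first_vector)
```

With this stand-in, identical texts give a similarity of 1 and texts sharing no characters give 0; how the similarity is converted into a score is not specified in this claim.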

6. The method according to any one of claims 1 to 5, wherein the determining the evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set comprises:

and computing a weighted sum of the scores of the elements in the second element set, using the weight of each element, to obtain the evaluation result of the manually extracted letter element information.
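The weighted sum of claim 6 can be sketched as below. The function name `evaluate`, the dictionary layout, and the element names and numbers in the usage lines are all hypothetical.

```python
def evaluate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted sum of per-element scores.

    `scores` and `weights` are keyed by element name. Raising an error
    when a scored element has no weight is a choice made for this sketch.
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight assigned for elements: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical scores and weights, for illustration only.
example_scores = {"name": 1.0, "phone": 0.0, "profile": 0.8}
example_weights = {"name": 0.5, "phone": 0.3, "profile": 0.2}
example_result = evaluate(example_scores, example_weights)
```

In this example the result is 0.5 × 1.0 + 0.3 × 0.0 + 0.2 × 0.8, which is approximately 0.66.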

7. The method according to claim 6, wherein said assigning a weight to each element in said second set of elements according to a preset rule comprises:

assigning weights to the elements that increase by a multiple, in order of the priority of each element information in the second element set from low to high;

or

assigning weights to the elements in the second element set that decrease by a multiple, in order of the position of each element information from front to back;

or

assigning an equal weight to each element information in the second element set.
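The three alternative weighting rules of claim 7 can be sketched as follows. The multiple (`ratio=2.0`), the normalization of the weights so that they sum to 1, and all function names are assumptions; the claim fixes none of them.

```python
def _normalize(raw: list[float]) -> list[float]:
    """Scale raw weights so they sum to 1 (an assumption, not required by the claim)."""
    total = sum(raw)
    return [w / total for w in raw]


def weights_by_priority(elements_low_to_high: list[str], ratio: float = 2.0) -> dict[str, float]:
    """Each element's weight is `ratio` times that of the next-lower-priority element."""
    raw = [ratio ** i for i in range(len(elements_low_to_high))]
    return dict(zip(elements_low_to_high, _normalize(raw)))


def weights_by_position(elements_front_to_back: list[str], ratio: float = 2.0) -> dict[str, float]:
    """Each element's weight is 1/`ratio` times that of the element just before it."""
    raw = [ratio ** -i for i in range(len(elements_front_to_back))]
    return dict(zip(elements_front_to_back, _normalize(raw)))


def equal_weights(elements: list[str]) -> dict[str, float]:
    """Every element receives the same weight."""
    return {name: 1.0 / len(elements) for name in elements}
```

For three elements and a multiple of 2, the priority rule gives weights in the proportion 1 : 2 : 4 and the position rule gives 4 : 2 : 1.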

8. The method according to any one of claims 1 to 5, wherein before matching each element information in the second element set with each corresponding element information in the first element set to obtain a score of each element information in the second element set, the method further comprises:

determining a maximum score for each element in the second element set based on the importance of each element in the second element set.

9. An apparatus for evaluating an extraction result of letter element information, comprising:

the acquisition module is used for acquiring a first element set of letter element information extracted by a machine and a second element set of letter element information extracted manually;

and the determining module is used for determining an evaluation result of manually extracting the element information of the letter based on the score of each element information in the second element set.

10. An apparatus for evaluating an extraction result of letter element information, comprising:

a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1 to 8.

11. A computer-readable storage medium, comprising:

a computer program which, when run on a computer, causes the computer to carry out the method according to any one of claims 1 to 8.