CN108197110B

CN108197110B - Method, device and storage medium for acquiring and correcting names and jobs

Info

Publication number: CN108197110B
Application number: CN201810003630.2A
Authority: CN
Inventors: 安华
Original assignee: Beijing Funcun Kaiyuan Technology Development Co ltd
Current assignee: Beijing Funcun Kaiyuan Technology Development Co ltd
Priority date: 2018-01-03
Filing date: 2018-01-03
Publication date: 2021-07-27
Anticipated expiration: 2038-01-03
Also published as: CN108197110A

Abstract

A method, a device and a storage medium for acquiring and proofreading person names and job titles, the method comprising the following steps: S1: acquiring each person name and its position in the text; S2: judging whether any person name was missed and, if so, recording the missed name; S3: comparing the names obtained in step S1 with the names obtained in step S2 one by one, discarding duplicates and recording the rest, to obtain a final name list; S4: performing error correction on the final name list; S5: performing error correction on the job titles corresponding to the person names. The method corrects editing errors in the names and job titles in a text and improves the accuracy with which names and job titles are used. Because the text is checked and corrected by machine instead of manually, the efficiency and accuracy of proofreading are greatly improved.

Description

Method, device and storage medium for acquiring and correcting names and jobs

Technical Field

The invention relates to the field of text proofreading, and in particular to a method and a device for acquiring and proofreading person names and job titles, and a storage medium thereof.

Background

With the rapid development of society, modern information technology has spread into every industry. The publishing industry has gained new opportunities, editing work has been modernized, and the methods and functions of proofreading have changed greatly. Proofreading nevertheless remains a crucial link in modern editing work. Most proofreading is still done manually: editors and proofreaders process a large amount of text every day, and lapses of attention are inevitable during long periods of manual work, so unexpected problems may remain in the text. Person names and job titles are a typical case: manual checking often fails to find or correct every error, and such errors can have serious adverse effects.

To solve the above problems, the present invention provides a proofreading method, specifically a method for acquiring and correcting person names and job titles.

Disclosure of Invention

The invention aims to provide a method and a device for acquiring and correcting names and job titles, and a computer-readable storage medium thereof, which solve the technical problem that manual checking can hardly correct all editing errors in the names and/or job titles in a text.

To solve the above problems, a first aspect of the present invention provides a method for acquiring and correcting names and job titles, the method comprising the steps of:

S1: acquiring a name and the position of the name in the text;

S2: judging whether any name was missed and, if so, recording the missed name;

S3: comparing the names obtained in step S1 with the names obtained in step S2 and judging whether a name is duplicated; if so, discarding it, otherwise recording it; comparing one by one to obtain a final name list;

S4: performing error correction on the final name list;

S5: performing error correction on the job titles corresponding to the names.

Further, step S1 further includes the steps of:

S11: performing word segmentation on the text and marking with a symbol each phrase identified as a person name;

S12: acquiring the name and its position in the text through the marking symbol, matching the acquired name against the name dictionary, and recording the names that match.

Further, the formula for obtaining the name in the text is:

f＝∑_{i=1}^{N} G(x)/G(y_i)， (1)

where G(x) is the pinyin representation of the obtained name and is a fixed quantity; G(y_i) is the pinyin representation of the i-th name in the name dictionary; N is the number of names in the name dictionary, and i ∈ [1, N]. If G(x) and G(y_i) represent the same pinyin, G(x)/G(y_i)＝1; if they represent different pinyin, G(x)/G(y_i)＝0. Therefore, if f is not zero, the obtained segment is determined to be a person name. The formula for obtaining the position of the name in the text is:

T＝l(x)/L， (2)

where l(x) is the length of the text from its beginning to the obtained segment, and L is the total length of the text.
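
The name test and the position ratio described above can be illustrated with a minimal Python sketch. It assumes a tiny hand-written character-to-pinyin table (`PINYIN`) and sample names that are purely hypothetical; the function names are likewise illustrative and not taken from the patent. A real system would use a full pinyin lookup.

```python
# Hypothetical character-to-pinyin table, for illustration only.
PINYIN = {"张": "zhang", "章": "zhang", "三": "san", "李": "li", "四": "si"}

def to_pinyin(name):
    """G(.): convert a name to pinyin syllables (unknown characters are kept as-is)."""
    return tuple(PINYIN.get(ch, ch) for ch in name)

def match_count(candidate, name_dictionary):
    """f: number of dictionary names whose pinyin equals the candidate's pinyin."""
    target = to_pinyin(candidate)
    return sum(1 for entry in name_dictionary if to_pinyin(entry) == target)

def relative_position(text, segment_start):
    """T = l(x) / L: text length before the segment divided by the total length."""
    return segment_start / len(text)

# Assumed sample data.
names = ["张三", "李四"]
text = "教授张三"

is_name = match_count("张三", names) != 0
position = relative_position(text, text.index("张三"))
```

Because the comparison is done on pinyin, a homophone spelling such as "章三" in this sample also yields a non-zero count, which is what lets the later steps flag it as a wrongly written form of a dictionary name.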

Further, step S2 further includes the steps of:

S21: recombining adjacent segments in sequence until all segments have been combined;

S22: judging whether the number of characters in the combination is less than a set threshold; if so, performing step S23, otherwise returning to step S21;

S23: comparing the combination with the name dictionary; if it matches, judging it to be a missed name and recording it, then returning to step S21.

Further, step S4 further includes the steps of:

S41: correcting wrongly written characters in the acquired names;

S42: correcting missing and/or extra characters in the acquired names;

S43: correcting the order of the acquired names;

S44: correcting mismatches between names and job titles.

Further, step S41 further includes the steps of:

S411: comparing the acquired names with the names in the name dictionary to find the names that differ from the dictionary;

S412: screening out, from the differing names of step S411, those that merely contain a wrong character;

S413: converting the names with wrong characters and the names in the name dictionary to pinyin;

S414: comparing the pinyin and correcting the names with wrong characters.

Further, step S42 further includes the steps of:

S421: screening out, from the differing names obtained in step S411, those with missing and/or extra characters;

S422: looping over the leaders' job titles in the job-title dictionary and obtaining the last job title of each leader;

S423: intercepting, before and after the name in the text, fields of the same length as the last job title obtained in step S422;

S424: comparing the fields intercepted in step S423 with the last job title obtained in step S422; if they match, performing step S425, otherwise returning to step S422 until the leader job titles of the different names have been compared;

S425: correcting the missing and/or extra characters of the name according to the name that corresponds to the leader's job title in the job-title dictionary.
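
As a rough illustration of steps S422 to S425, the sketch below looks for a leader whose last job title sits directly before or after a suspicious name and returns that leader's dictionary name as the correction. The dictionary layout (name mapped to an ordered list of titles), the function name and the sample strings are assumptions made for this example.

```python
def correct_name_by_title(text, start, end, title_dictionary):
    """Return the dictionary name whose last title is adjacent to text[start:end].

    Returns None when no leader's last title is found next to the name.
    """
    for leader, titles in title_dictionary.items():
        last = titles[-1]                                # S422: last title of this leader
        before = text[max(0, start - len(last)):start]   # S423: field before the name
        after = text[end:end + len(last)]                # S423: field after the name
        if last in (before, after):                      # S424: compare the fields
            return leader                                # S425: corrected name
    return None

# Assumed sample data: "李建" is missing a character, "王宏伟伟" has an extra one.
title_dictionary = {"李建国": ["党组书记", "校长"], "王宏伟": ["教授"]}
fixed_missing = correct_name_by_title("校长李建出席会议", 2, 4, title_dictionary)
fixed_extra = correct_name_by_title("王宏伟伟教授讲话", 0, 4, title_dictionary)
```

The sketch relies only on the adjacent title to pick the leader; a practical implementation would probably also check that the suspicious name resembles the dictionary name before replacing it.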

Further, step S43 further includes the steps of:

S431: comparing the order in which the names processed in steps S41 and S42 appear in the text with their order in the name dictionary;

S432: if the order differs from that in the name dictionary, correcting it to match.
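
The order check of steps S431 and S432 amounts to sorting the names found in the text by their rank in the dictionary. A minimal sketch, assuming the name dictionary is an ordered list and using hypothetical sample names:

```python
def reorder_names(names_in_text, name_dictionary):
    """Return the names sorted by their order in the dictionary.

    Names that are not in the dictionary keep their relative order
    and are placed last (sorted() is stable).
    """
    rank = {name: i for i, name in enumerate(name_dictionary)}
    return sorted(names_in_text, key=lambda n: rank.get(n, len(rank)))

# Assumed sample data.
name_dictionary = ["李建国", "王宏伟", "尤权"]
found = ["王宏伟", "李建国"]
needs_fix = found != reorder_names(found, name_dictionary)
```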

Further, step S44 further includes the steps of:

S441: acquiring the text between two names according to the names and their positions;

S442: performing word segmentation on that text;

S443: comparing the segmented fields with the job titles of every leader other than the named person in the job-title dictionary; if the comparison yields matching fields, the name and the job title are mismatched and are corrected.

Further, the error correction formula is:

S(T)＝s(t₁,t₂) (3)

F(T)＝f(S(T)) (4)

Z(X)＝f(s(n)) (5)

C＝∑Z(X)/F(T) (6)

where S(T)＝s(t₁,t₂) denotes the text captured between the person names t₁ and t₂; F(T)＝f(S(T)) denotes the word segmentation of the captured text; Z(X)＝f(s(n)) denotes the word segmentation of the leader job titles; and C＝∑Z(X)/F(T) denotes the number of identical leader job titles counted after the segments of the captured text have been compared in order with the leader job titles recorded for the name in the job-title dictionary. If the value of C is greater than a threshold, the job title and the name do not match; the threshold is set to more than half of the number of leader job titles of that name.
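
One possible reading of the count C is sketched below: the segments found between two names are compared with the titles of every other leader, and a leader is flagged when more than half of that leader's titles appear. This combines the formula with step S443 and is only an interpretation; the segment list is assumed to come from an external word segmenter, and all names and data are hypothetical.

```python
def misused_leaders(between_segments, name, title_dictionary):
    """Return the other leaders whose titles dominate the text next to `name`."""
    flagged = []
    for leader, titles in title_dictionary.items():
        if leader == name:
            continue  # only the titles of other leaders are compared
        count = sum(1 for title in titles if title in between_segments)  # C
        if count > len(titles) / 2:  # threshold: more than half of the titles
            flagged.append(leader)
    return flagged

# Assumed sample data.
title_dictionary = {"李建国": ["党组书记", "校长"], "王宏伟": ["教授", "系主任"]}
mismatch = misused_leaders(["教授", "系主任"], "李建国", title_dictionary)
```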

Further, step S5 further includes the steps of:

S51: correcting a job title that follows the name;

S52: correcting a job title that precedes the name;

S53: correcting the order of the job titles: comparing the order in which the job titles processed in steps S51 and S52 appear in the text with their order in the job-title dictionary and, if the order differs, correcting it to match;

S54: correcting misused job titles.

Further, step S51 further includes the steps of:

S511: acquiring the last job title recorded for the name in the job-title dictionary;

S512: intercepting, after the name in the text, a field longer than the last job title obtained in step S511;

S513: comparing the field obtained in step S512 with the last job title obtained in step S511; if the field contains that job title, judging the job title in the text to be wrong and correcting it.

Further, step S52 further includes the steps of:

S521: acquiring the text between two names according to the names and their positions;

S522: splitting the field intercepted in step S521 at enumeration commas (、);

S523: if the field contains several enumeration commas, comparing the text between the commas with each job title in the corresponding dictionary entry and recording the number of characters identical to the dictionary job title; if the number of identical characters exceeds a set threshold, judging the job title to be wrong and performing error correction;

S524: if the field contains no enumeration comma, intercepting fields with the length of each job title in turn and comparing each intercepted field with the job titles in the job-title dictionary; if the number of identical characters exceeds a set threshold, judging the job title to be wrong and performing error correction.

Further, step S54 further includes the steps of:

S541: acquiring, according to the name, all job titles given for the name in the text field;

S542: comparing all the acquired job titles with the job titles of the name in the job-title dictionary and checking whether there are redundant job titles;

S543: if redundant job titles are found in step S542, comparing each of them with the job titles of the other leaders in the job-title dictionary; if an identical job title is found, the job title has been misused and is corrected.

In another aspect, the present invention provides a name and post acquisition and correction apparatus, including a memory and a processor, the memory having stored therein computer-executable instructions operable to perform the above-described method when executed by the processor.

Yet another aspect of the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions operable to perform the above-described method when executed by a computing device.

In summary, the present invention provides a method, an apparatus and a computer-readable storage medium for acquiring and proofreading names and job titles, which locate the person names and job titles that have been edited incorrectly in a text by comparing the text with a name dictionary and a job-title dictionary, and correct them to obtain a text containing the correct names and job titles.

The technical scheme of the invention has the following beneficial technical effects:

(1) editing errors in the names and job titles in a text are corrected, improving the accuracy with which names and job titles are used in the text;

(2) the text is checked and corrected by machine instead of manually, which greatly improves the efficiency and accuracy of proofreading.

Drawings

FIG. 1 is a general flow chart of the present invention;

FIG. 2 is a flow chart of the present invention for obtaining a name and a location of the name;

FIG. 3 is a flow chart of the present invention for determining if there are missing names;

FIG. 4 is a flow chart of the present invention for determining whether there are duplicate names;

FIG. 5 is a general flow chart of the present invention for error correction processing of acquired names of people;

FIG. 6 is a flow chart of the present invention for correcting wrongly written characters of an acquired name;

FIG. 7 is a flow chart of the present invention for correcting missing and/or extra characters in an acquired name;

FIG. 8 is a flow chart of the present invention for performing a ranking correction on an acquired name;

FIG. 9 is a flow chart of the present invention for correcting a mismatch between a person's name and a job title;

FIG. 10 is a general flow chart of the error correction process for a job of the present invention;

FIG. 11 is a flow chart of the present invention for correcting a job title that follows a name;

FIG. 12 is a flow chart of the present invention for correcting a job title that precedes a name;

FIG. 13 is a flow chart of the present invention for ranking and correcting jobs;

FIG. 14 is a flow chart of the present invention for correcting misused jobs.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

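Referring to FIG. 1, FIG. 1 is a general flowchart of the method for acquiring and proofreading names and job titles disclosed in the present invention. The method comprises the following steps:

S1: acquiring a name and the position of the name in the text;

S2: judging whether any name was missed and, if so, recording the missed name;

S3: comparing the names obtained in step S1 with the names obtained in step S2, discarding duplicates and recording the rest, to obtain a final name list;

S4: performing error correction on the final name list;

S5: performing error correction on the job titles corresponding to the names.

Generally, each piece of text processed is a complete sentence, delimited by a sentence break in the text.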

Step S1, acquiring a name and the position of the name in the text, further includes: S11: performing word segmentation on the text and marking with a symbol each phrase identified as a person name; S12: acquiring the name and its position in the text through the marking symbol, matching the acquired name against a name dictionary, and recording the names that match. The input text is segmented by calling a word segmentation interface. After segmentation, the words of the input sentence carry part-of-speech tags such as /nr, /a, /m, /ns, /p and /q, where /nr denotes a person name. Word segmentation here means finding the words of the input sentence with the help of a dictionary. For example, the input sentence "Professor Zhang San" is segmented as: Professor/n Zhang San/nr. After segmentation, the names in the text and their positions are obtained through the tags. The formula for obtaining a name in the text is:

f＝∑_{i=1}^{N} G(x)/G(y_i)， (1)

Formula (1) as a whole handles one person name. G(x) is the pinyin representation of the obtained name and is a fixed quantity; G(y_i) is the pinyin representation of the i-th name in the name dictionary; N is the number of names in the name dictionary, and i ∈ [1, N]. If G(x) and G(y_i) represent the same pinyin, G(x)/G(y_i)＝1; if they represent different pinyin, G(x)/G(y_i)＝0; therefore f ≥ 0. If f is not zero, the obtained segment is determined to be a person name. The formula for obtaining the position of the name in the text is:

T＝l(x)/L， (2)

Step S2: judging whether any names were missed. The step further includes: S21: recombining adjacent segments in sequence until all segments have been combined; S22: judging whether the number of characters in the combination is less than a set threshold; if so, performing step S23, otherwise returning to step S21; S23: comparing the combination with the name dictionary; if it matches, judging it to be a missed name and recording it, then returning to step S21. In other words, the segments produced by word segmentation are recombined to check for missed names, and the character count of each combination is tested against a threshold. The threshold is generally 2, 3 or 4, because Chinese names generally have 2, 3 or 4 characters. If the character count is below the threshold, the combination is compared with the name dictionary; on a match it is judged to be a missed name and recorded, and the process continues with the next adjacent segment until all missed names have been found and recorded. As a specific example, take the original text "This meeting was attended by leaders such as Li Jianguo, Wang Hongwei and You Quan", and assume that word segmentation yields: this \ meeting \ by \ Li Jianguo \ Wang Hongwei \ and \ You \ Quan \ etc. \ leaders \ attended. The name "You Quan" has not been segmented correctly, so it is a missed name.

The segments are combined in twos or threes. Assume the threshold is 3 (character counts refer to the Chinese original). Combining "this" and "meeting" gives more than 3 characters, so that combination is not looked up; a combination of exactly 3 characters is checked against the name dictionary; and so on, until "You" and "Quan" are combined into "You Quan", whose 2 characters are below the threshold of 3. The name dictionary is then checked for "You Quan", and if it is present the name is recorded as a missed name. If the name dictionary did not contain "You Quan", then, because 2 is less than 3, the next segment would be added to give "You Quan etc." and the dictionary checked again. If that is absent too, the character count has reached the threshold and any further combination would exceed 3, so this cycle ends and combination restarts from the next segment, "Quan".
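
The recombination loop of steps S21 to S23 can be sketched as follows. This is an illustrative reading rather than the patent's implementation: the Chinese sample segments are a hypothetical rendering of the example sentence, and the sketch assumes that a combination is looked up whenever its length does not exceed the threshold, which is how the worked example behaves.

```python
def find_missed_names(segments, name_dictionary, max_len=3):
    """Recombine adjacent segments and look the combinations up in the dictionary.

    Assumption: a combination is checked while its character count is at most
    `max_len`; longer combinations end the current cycle.
    """
    missed = []
    for start in range(len(segments)):
        combined = segments[start]
        for nxt in segments[start + 1:]:
            combined += nxt
            if len(combined) > max_len:
                break  # any longer combination would also exceed the limit
            if combined in name_dictionary and combined not in missed:
                missed.append(combined)
    return missed

# Assumed sample data modelled on the example: the name 尤权 was split in two.
segments = ["本次", "会议", "由", "李建国", "王宏伟", "及", "尤", "权", "等", "领导", "参加"]
name_dictionary = {"李建国", "王宏伟", "尤权"}
missed = find_missed_names(segments, name_dictionary)
```

Single segments are not looked up here because correctly segmented names are already handled in step S1.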

Step S3: judging whether there are duplicate names. The step further includes: S31: comparing the names obtained in step S1 with the names obtained in step S2 and judging whether a name is duplicated; if so, discarding it, otherwise recording it; S32: forming the final list of all names in the text.
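
Step S3 is essentially an order-preserving merge without duplicates. A minimal sketch with hypothetical names:

```python
def merge_name_lists(tagged_names, missed_names):
    """Combine the names from S1 and S2, discarding names already recorded."""
    final = []
    for name in list(tagged_names) + list(missed_names):
        if name not in final:  # S31: discard duplicates, record the rest
            final.append(name)
    return final               # S32: final name list

# Assumed sample data.
final_names = merge_name_lists(["李建国", "王宏伟"], ["尤权", "王宏伟"])
```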

Step S4: performing error correction on the acquired names. Step S4 further includes the following steps.

S41: correcting wrongly written characters in the acquired names. Step S41 further includes: S411: comparing the acquired names with the names in the name dictionary to find the names that differ from the dictionary; S412: screening out, from the differing names of step S411, those that merely contain a wrong character; S413: converting the names with wrong characters and the names in the name dictionary to pinyin; S414: comparing the pinyin and correcting the names with wrong characters.

S42: correcting missing and/or extra characters in the acquired names. Step S42 further includes: S421: screening out, from the differing names obtained in step S411, those with missing and/or extra characters; S422: looping over the leaders' job titles in the job-title dictionary and obtaining the last job title of each leader (a leader may hold several job titles, and the last one is generally the overall title or the commonly used title); S423: intercepting, before and after the name in the text, fields of the same length as the last job title obtained in step S422; S424: comparing the fields intercepted in step S423 with the last job title obtained in step S422; if they match, performing step S425, otherwise returning to step S422 until the leader job titles of the different names have been compared; S425: correcting the missing and/or extra characters of the name according to the name that corresponds to the leader's job title in the job-title dictionary.

S43: correcting the order of the acquired names. Step S43 further includes: S431: comparing the order in which the names processed in steps S41 and S42 appear in the text with their order in the name dictionary; S432: if the order differs from that in the name dictionary, correcting it to match. This step is mainly used where the order of the names is subject to requirements.

S44: correcting mismatches between names and job titles. Step S44 further includes: S441: acquiring the text between two names according to the names and their positions; S442: performing word segmentation on that text; S443: comparing each segmented field with the job titles of every leader other than the named person in the job-title dictionary; if the comparison yields matching fields, the job title given for the name in the text has been taken from another leader, that is, the name and the job title are mismatched, and the error is corrected.

Preferably, the error correction formula is:

S(T)＝s(t₁,t₂) (3)

F(T)＝f(S(T)) (4)

Z(X)＝f(s(n)) (5)

C＝∑Z(X)/F(T) (6)

where S(T)＝s(t₁,t₂) denotes the text captured between the person names t₁ and t₂; F(T)＝f(S(T)) denotes the result of word segmentation of the captured text; Z(X)＝f(s(n)) denotes the result of word segmentation of the leader job titles; and C＝∑Z(X)/F(T) denotes the number of identical leader job titles counted after the segments of the captured text have been compared in order with the segmented leader job titles recorded for the name in the job-title dictionary. If the value of C is greater than a threshold, the job title and the name do not match; the threshold is set to more than half of the number of leader job titles of that name.

S5: performing error correction on the job titles. The following steps are carried out when a job title is judged to be wrong; if there is no error, no correction is performed.

Step S5 further includes the following steps. S51: correcting a job title that follows the name. Step S51 further includes: S511: acquiring the last job title recorded for the name in the job-title dictionary; S512: intercepting, after the name in the text, a field longer than the last job title obtained in step S511; S513: comparing the field obtained in step S512 with the last job title obtained in step S511; if the field contains that job title, judging the job title in the text to be wrong and correcting it. For example, suppose the last job title corresponding to the name is "professor", which is two characters in Chinese, and the field intercepted after the name in the text is the three-character "associate professor". Compared with "professor", the field contains the job title "professor", but the job title in the text is incorrect and needs to be corrected.
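
The check in steps S511 to S513 can be sketched as below. The sketch adds one assumption that the text does not state: a field is reported only when it contains the dictionary title without starting with it, so that a correct title followed by ordinary text is not flagged. The sample strings and the `extra` parameter are hypothetical.

```python
def check_title_after_name(text, name, last_title, extra=1):
    """Return the suspicious field that follows `name`, or None if it looks fine."""
    start = text.find(name)
    if start < 0:
        return None
    begin = start + len(name)
    # S512: intercept a field slightly longer than the dictionary title.
    field = text[begin:begin + len(last_title) + extra]
    # S513: the field contains the title but is not the title itself.
    if last_title in field and not field.startswith(last_title):
        return field
    return None

# Assumed sample data: the dictionary title is 教授, the text says 副教授.
wrong_field = check_title_after_name("张三副教授发言", "张三", "教授")
```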

S52: correcting a job title that precedes the name. Step S52 further includes: S521: acquiring the text between two names according to the names and their positions; S522: splitting the field intercepted in step S521 at enumeration commas (、); S523: if the field contains several enumeration commas, comparing the text between the commas with each job title in the corresponding dictionary entry and recording the number of characters identical to the dictionary job title; if the number of identical characters exceeds a set threshold, the job title is considered wrong and error correction is performed. For example, in normal writing, when one person in a sentence carries several job titles, the titles are separated by enumeration commas, as in "Party Group Secretary, Sunshine Primary School Vice Principal, Moral Education Group Leader, Junior High First-Grade Director Li Hong". Splitting at the enumeration commas yields four clauses: 1. Party Group Secretary; 2. Sunshine Primary School Vice Principal; 3. Moral Education Group Leader; 4. Junior High First-Grade Director. In the job-title dictionary, the job titles of Li Hong are "Party Group Secretary, Sunshine Primary School Principal, Moral Education Group Leader, Junior High First-Grade Director", so an error such as "Sunshine Primary School Vice Principal" in place of "Sunshine Primary School Principal" can be corrected.

Each of the 4 clauses is compared with each job title recorded for Li Hong in the job-title dictionary, which gives the number of characters each clause shares with each job title (character counts refer to the Chinese original). For example, "Party Group Secretary" shares 4 characters with clause 1 and is identical to it, so no error is reported. The job title "Sunshine Primary School Principal", which corresponds to "Sunshine Primary School Vice Principal", has 6 characters, whereas the clause in the text has 7, so the lengths differ; since the number of shared characters, 6, is greater than the set threshold (assumed to be 4), the clause "Sunshine Primary School Vice Principal" is considered wrong. The threshold decides whether two job titles of equal length are the same title: for example, two job titles that both have 5 characters but share only 1 character, which is below the threshold (assumed to be 3), are treated as different job titles even though their lengths are equal.
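
The clause comparison of steps S522 and S523 can be sketched as follows. The Chinese strings are a hypothetical rendering of the example, the shared-character count is computed as a multiset intersection (one possible way to count identical characters), and the function names are illustrative.

```python
from collections import Counter

def shared_chars(a, b):
    """Number of characters two strings have in common (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())

def check_titles_before_name(field, titles, threshold=4):
    """Map each wrong clause to the dictionary title it most likely stands for."""
    corrections = {}
    for clause in field.split("、"):       # S522: split at enumeration commas
        if clause in titles:
            continue                        # identical to a dictionary title
        for title in titles:                # S523: compare with each title
            if shared_chars(clause, title) > threshold:
                corrections[clause] = title
                break
    return corrections

# Assumed sample data modelled on the Li Hong example.
titles = ["党组书记", "阳光小学校长", "德育组组长", "初中一年级年级主任"]
field = "党组书记、阳光小学副校长、德育组组长、初中一年级年级主任"
corrections = check_titles_before_name(field, titles)
```

With these sample strings the second clause shares 6 characters with the dictionary title, which exceeds the assumed threshold of 4, so it is reported together with the title that should replace it.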

S53: correcting the order of the job titles. Step S53 further includes: S531: comparing the order in which the job titles processed in steps S51 and S52 appear in the text with their order in the job-title dictionary; S532: if the order differs from that in the job-title dictionary, correcting it to match.

S54: correcting misused job titles. Step S54 further includes: S541: acquiring, according to the name, all job titles given for the name in the field; S542: comparing all the acquired job titles with the job titles of the name in the job-title dictionary and checking whether there are redundant job titles; S543: if redundant job titles are found in step S542, comparing each of them with the job titles of the other leaders in the job-title dictionary; if an identical job title is found, the job title has been misused and is corrected.
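
Steps S541 to S543 can be sketched as a two-stage lookup: first find the titles that do not belong to the named person, then check whether they belong to another leader. The dictionary layout and the sample data are assumptions made for this illustration.

```python
def find_misused_titles(titles_in_text, name, title_dictionary):
    """Map each title borrowed from another leader to that leader's name."""
    own = set(title_dictionary.get(name, []))
    redundant = [t for t in titles_in_text if t not in own]   # S542
    misused = {}
    for title in redundant:                                   # S543
        for leader, titles in title_dictionary.items():
            if leader != name and title in titles:
                misused[title] = leader
                break
    return misused

# Assumed sample data: 教授 belongs to 张三, not to 李红.
title_dictionary = {"李红": ["党组书记", "校长"], "张三": ["教授"]}
misused = find_misused_titles(["党组书记", "教授"], "李红", title_dictionary)
```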

It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A method for name and post acquisition and proofing, the method comprising the steps of:

s1: acquiring a name and a position of the name in a text word;

s4: carrying out error correction processing on the final name list;

s5: carrying out error correction processing on the jobs corresponding to the names of the persons;

step S1 further includes the steps of:

s12: acquiring the names and the positions of the names in the text characters through the set symbols, matching the acquired names with a name dictionary, and recording the names which are successfully matched;

the formula for obtaining the name in the text is:

f＝∑_{i=1}^{N} G(x)/G(y_i)， (1)

where G(x) is the pinyin representation of the obtained name and is a fixed quantity; G(y_i) is the pinyin representation of the i-th name in the name dictionary; N is the number of names in the name dictionary, and i ∈ [1, N]; if G(x) and G(y_i) represent the same pinyin, G(x)/G(y_i)＝1; if G(x) and G(y_i) represent different pinyin, G(x)/G(y_i)＝0; therefore, if f is not zero, the obtained segment is determined to be a person name;

the formula for obtaining the position of the name in the text is as follows:

T＝l(x)/L， (2)

2. The method of claim 1, wherein step S2 further comprises the steps of:

3. The method of claim 1, wherein step S4 further comprises the steps of:

s41: correcting wrongly written characters of the acquired name;

s43: sequencing and correcting the acquired names;

s44: and correcting the unmatched name and job title.

4. The method of claim 1, wherein step S41 further comprises the steps of:

5. The method according to any one of claims 3 to 4, wherein the step S42 further comprises the steps of:

6. The method of claim 3, wherein the step S43 further comprises the steps of:

7. The method of claim 3, wherein the step S44 further comprises the steps of:

s442: performing word segmentation processing on the characters;

8. The method of claim 7, wherein the error correction formula is:

S(T)＝s(t₁,t₂) (3)

F(T)＝f(S(T)) (4)

Z(X)＝f(s(n)) (5)

C＝∑Z(X)/F(T) (6)

wherein S(T)＝s(t₁,t₂) denotes the text captured between the person names t₁ and t₂; F(T)＝f(S(T)) denotes the result of word segmentation of the captured text; Z(X)＝f(s(n)) denotes the result of word segmentation of the leader job titles; C＝∑Z(X)/F(T) denotes the number of identical leader job titles counted after the segments of the captured text have been compared in order with the segmented leader job titles recorded for the name in the job-title dictionary; if the value of C is greater than a threshold, the job title and the name do not match; the threshold is set to more than half of the number of leader job titles of that name.

9. The method of claim 1, wherein step S5 further comprises the steps of:

s51: correcting the post-name job;

s52: correcting the post before the name;

s54: and correcting the error of the misuse duty.

10. The method of claim 9, wherein step S51 further comprises the steps of:

s511: acquiring the last job of the corresponding name in the job dictionary;

11. The method of claim 10, wherein step S52 further comprises the steps of:

12. The method of claim 9, wherein step S54 further comprises the steps of:

13. An apparatus for name and job acquisition and proofing comprising a memory and a processor, the memory having stored therein computer-executable instructions operable to perform the method of any one of claims 1-12 when executed by the processor.

14. A computer-readable storage medium having stored thereon computer-executable instructions operable, when executed by a computing device, to perform the method of any of claims 1-12.