CN113673238B - Word segmentation correction method and system based on hypernym, electronic device and storage medium

Publication number
CN113673238B
Authority
CN
China
Prior art keywords
word segmentation
subject
hypernym
segmentation result
verb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111237607.8A
Other languages
Chinese (zh)
Other versions
CN113673238A (en)
Inventor
赵鹏阳
杨红飞
程东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huoshi Creation Technology Co ltd
Original Assignee
Hangzhou Firestone Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Firestone Technology Co ltd filed Critical Hangzhou Firestone Technology Co ltd
Priority to CN202111237607.8A priority Critical patent/CN113673238B/en
Publication of CN113673238A publication Critical patent/CN113673238A/en
Application granted granted Critical
Publication of CN113673238B publication Critical patent/CN113673238B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a word segmentation correction method, a word segmentation correction system, an electronic device and a storage medium based on hypernyms, wherein the word segmentation result of a target text is obtained through a word segmentation tool, and comprises a plurality of words output by the word segmentation tool and corresponding parts of speech; obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object; obtaining verbs in the target text according to the word segmentation result, and obtaining subject hypernym constraints of the verbs, wherein the subject hypernym constraints of the verbs are used for indicating that the subjects are people or objects; and judging whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb, if so, determining that the word segmentation result is correct, otherwise, determining that the word segmentation result is incorrect, and re-segmenting the target text under the condition that the word segmentation result is incorrect, so that the word segmentation accuracy is improved.

Description

Word segmentation correction method and system based on hypernym, electronic device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a hypernym-based segmentation correction method, system, electronic device, and storage medium.
Background
With the continuous development of computer technology, word segmentation technology has been widely applied in fields such as search engines, machine translation, speech synthesis, and automatic summarization. Chinese word segmentation refers to segmenting a sequence of Chinese characters into individual words. It is the basis of text mining: once an input passage of Chinese has been segmented successfully, a computer can automatically identify the meaning of its sentences. In practical applications, because Chinese is ambiguous, the word segmentation result obtained for a sentence or passage may contain wrong word boundaries. In the related art, after word segmentation is performed by a word segmentation tool, it cannot be judged whether the word segmentation result contains errors, so the result cannot be corrected and the word segmentation accuracy is low.
At present, no effective solution is provided for the problems that in the related art, after word segmentation is carried out through a word segmentation tool, whether a word segmentation result has errors or not cannot be judged, and the word segmentation accuracy rate is low.
Disclosure of Invention
The embodiment of the application provides a word segmentation correction method, a word segmentation correction system, an electronic device and a storage medium based on hypernyms, and aims to at least solve the problems that in the related art, after word segmentation is carried out through a word segmentation tool, whether a word segmentation result has errors or not cannot be judged, and the word segmentation accuracy is low.
In a first aspect, an embodiment of the present application provides a hypernym-based word segmentation correction method, where the method includes:
obtaining a word segmentation result of a word segmentation tool on a target text, wherein the word segmentation result comprises a plurality of words output by the word segmentation tool and corresponding parts of speech;
obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object;
obtaining verbs in the target text according to the word segmentation result, and obtaining subject hypernym constraints of the verbs, wherein the subject hypernym constraints of the verbs are used for indicating that the subjects are people or objects;
and judging whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb; if so, the word segmentation result is correct, otherwise the word segmentation result is incorrect; and re-segmenting the target text in the case that the word segmentation result is incorrect.
In some embodiments, in a case that the segmentation result is incorrect, re-segmenting the target text includes:
acquiring a full word segmentation result of a target text by a full word segmentation mode of a word segmentation tool;
removing the verb and the words after the verb from the full word segmentation result to obtain a first full word segmentation result; removing from the first full word segmentation result the words that precede the verb in the word segmentation result to obtain a second full word segmentation result; and removing from the second full word segmentation result the words that can be spliced into the subject of the word segmentation result to obtain a third full word segmentation result;
and, in the target text, acquiring the characters between the last word of the third full word segmentation result and the verb as a new subject, and obtaining a corrected word segmentation result from the third full word segmentation result, the new subject, and the verb and the words after the verb in the full word segmentation result.
In some embodiments, obtaining the subject in the target text according to the word segmentation result, and obtaining the final hypernym of the subject includes:
obtaining English words of the subject;
and querying a dictionary, acquiring a subject interpretation list of English words of the subject, and acquiring the final hypernym of the subject according to the subject interpretation list.
In some embodiments, obtaining the final hypernym of the subject from the subject interpretation list comprises:
checking whether an interpretation sentence containing "is something" exists in the subject interpretation list, and if so, the final hypernym of the subject is an object;
if not, taking the word after "is a" in the interpretation sentence as a hypernym, and acquiring the subject interpretation list of the hypernym;
and repeating the preceding steps on the subject interpretation list of the hypernym until the final hypernym of the subject is obtained, wherein the repetition is stopped when the number of repetitions reaches a threshold value, in which case the final hypernym of the subject is a person.
In some embodiments, obtaining a verb in the target text according to the word segmentation result, and obtaining a hypernym constraint of the subject of the verb includes:
obtaining English words of the verbs;
and querying a dictionary, acquiring a verb explanation list of the English word of the verb, and acquiring the subject hypernym constraint of the verb according to the verb explanation list.
In some embodiments, obtaining the subject hypernym constraint of the verb from the verb interpretation list comprises:
in the verb interpretation list, checking whether the number of interpretation sentences containing "If you" or "If one person" exceeds a preset value; if so, the subject hypernym constraint of the verb is a person, and if not, the subject hypernym constraint of the verb is an object.
In some of these embodiments, obtaining the English word or the list of interpretations of the English word comprises:
splicing the jump address of the dictionary and the word to be queried into a URL address to be queried, and issuing an HTTP request to the URL address to obtain the English word or the interpretation list of the English word, wherein the interpretation list comprises a subject interpretation list and a verb interpretation list.
In a second aspect, an embodiment of the present application provides a hypernym-based segmentation correction system, which includes an obtaining module and a determining module,
the acquisition module is used for acquiring word segmentation results of the word segmentation tool on the target text, wherein the word segmentation results comprise a plurality of words output by the word segmentation tool and corresponding parts of speech;
obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object;
obtaining verbs in the target text according to the word segmentation result, and obtaining subject hypernym constraints of the verbs, wherein the subject hypernym constraints of the verbs are used for indicating that the subjects are people or objects;
the judging module is used for judging whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb; if so, the word segmentation result is correct, and if not, the word segmentation result is incorrect.
In a third aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the hypernym-based word segmentation correction method according to the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the program is executed by a processor to implement the hypernym-based word segmentation correction method according to the first aspect.
Compared with the related art, the hypernym-based word segmentation correction method provided by the embodiment of the application obtains the word segmentation result of the word segmentation tool on the target text, wherein the word segmentation result comprises a plurality of words output by the word segmentation tool and corresponding parts of speech; obtains a subject in the target text according to the word segmentation result, and obtains a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object; obtains a verb in the target text according to the word segmentation result, and obtains a subject hypernym constraint of the verb, wherein the subject hypernym constraint of the verb is used for indicating that the subject is a person or an object; and judges whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb. If so, the word segmentation result is correct; otherwise the word segmentation result is incorrect, and the target text is re-segmented, so that the word segmentation accuracy is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a hypernym-based segmentation correction method according to an embodiment of the present application;
FIG. 2 is a flowchart of another hypernym-based segmentation correction method according to an embodiment of the present application;
FIG. 3 is a block diagram of a hypernym-based word segmentation correction system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that such a development effort might be complex and tedious, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill in the art having the benefit of this disclosure, without departing from the scope of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The present embodiment provides a hypernym-based word segmentation correction method. FIG. 1 is a flowchart of a hypernym-based word segmentation correction method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, obtaining a word segmentation result of a word segmentation tool on a target text, wherein the word segmentation result comprises a plurality of words output by the word segmentation tool and their corresponding parts of speech. Word segmentation tools include jieba, the Chinese language processing toolkits HanLP and FoolNLTK, and the like. jieba supports three word segmentation modes: the accurate mode cuts the sentence as precisely as possible and is suitable for text analysis; the full word segmentation mode scans out all the words that can be formed in the sentence and is very fast; the search engine mode re-segments long words on the basis of the accurate mode to improve the recall rate.
In this embodiment, the target text may be segmented using the accurate mode of jieba. For example, if the target text is "南京市长江大桥参加了这次会议", the word segmentation result is [pair('南京市', 'ns'), pair('长江大桥', 'ns'), pair('参加', 'v'), pair('了', 'ul'), pair('这次', 'r'), pair('会议', 'n')], that is, 'Nanjing City', 'Yangtze River Bridge', 'join', the particle 'le', 'this time', and 'conference'. Each pair is a word and its part of speech. The first word labeled 'v' is regarded as the verb, and the first noun before the verb, that is, a word whose part of speech is one of 'n', 'f', 's', 't', 'nr', 'ns', 'nt', 'nw', and 'nz', is regarded as the subject.
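The selection of the verb and the subject from the tagged output can be sketched as follows. This is a minimal illustration, assuming the tool's output has already been converted to plain (word, tag) tuples. The function name `find_verb_and_subject`, the backward scan from the verb, and the Chinese tokens (a reconstruction of the running example) are assumptions rather than details taken from the application.

```python
from typing import List, Optional, Tuple

# Part-of-speech tags treated as nouns, as listed in the text above.
NOUN_TAGS = {"n", "f", "s", "t", "nr", "ns", "nt", "nw", "nz"}


def find_verb_and_subject(
    pairs: List[Tuple[str, str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (verb, subject) from a list of (word, tag) pairs.

    The verb is the first word tagged 'v'. The subject is taken to be the
    closest noun-tagged word before that verb, which is an assumed reading
    of "the first noun before the verb".
    """
    verb_index = next((i for i, (_, tag) in enumerate(pairs) if tag == "v"), None)
    if verb_index is None:
        return None, None
    verb = pairs[verb_index][0]
    # Scan backwards from the verb to find the nearest noun.
    for word, tag in reversed(pairs[:verb_index]):
        if tag in NOUN_TAGS:
            return verb, word
    return verb, None


# Assumed reconstruction of the running example's segmentation result.
segmentation = [
    ("南京市", "ns"), ("长江大桥", "ns"), ("参加", "v"),
    ("了", "ul"), ("这次", "r"), ("会议", "n"),
]
verb, subject = find_verb_and_subject(segmentation)
print(verb, subject)
```

Scanning backwards from the verb selects "长江大桥" rather than "南京市", which matches the subject named in the example.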
Step S102, obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object. For ease of understanding, continuing the above example, "Yangtze River Bridge" is the subject of the target text, and since a bridge is an object, the final hypernym of the subject "Yangtze River Bridge" is an object.
Step S103, obtaining a verb in the target text according to the word segmentation result, and obtaining a subject hypernym constraint of the verb, wherein the subject hypernym constraint of the verb is used for indicating that the subject is a person or an object. For ease of understanding, continuing the above example, "join" is the verb of the target text, and since the subject of "join" can only be a person, the subject hypernym constraint of the verb "join" is a person.
Step S104, judging whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb; if so, the word segmentation result is correct, otherwise the word segmentation result is incorrect; and re-segmenting the target text in the case that the word segmentation result is incorrect. For ease of understanding, continuing the above example, the final hypernym of the subject, "object", is not the same as the subject hypernym constraint of the verb, "person", so the word segmentation result for the target text is incorrect.
In the case that the word segmentation result is incorrect, the target text may be re-segmented by another word segmentation tool, or the dictionary of the current word segmentation tool may be expanded and the target text re-segmented by the current tool. After the target text is re-segmented, the new word segmentation result is judged again through the above steps, and this is repeated until a correct word segmentation result is obtained.
Illustratively, the target text is re-segmented, and the obtained word segmentation result is [pair('南京', 'ns'), pair('市长', 'ns'), pair('江大桥', 'ns'), pair('参加', 'v'), pair('了', 'ul'), pair('这次', 'r'), pair('会议', 'n')], that is, 'Nanjing', 'mayor', the personal name 'Jiang Daqiao', 'join', the particle 'le', 'this time', and 'conference'. The subject in the target text is then "Jiang Daqiao", whose final hypernym is a person; the verb is "join", whose subject hypernym constraint is also a person. Since the final hypernym of the subject is the same as the subject hypernym constraint of the verb, the word segmentation result is correct.
In the related art, after the word segmentation result is obtained through the word segmentation tool, it cannot be judged whether the result is wrong, so the result cannot be corrected and the word segmentation accuracy is low. In contrast, through steps S101 to S104, the present embodiment judges whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb and re-segments the target text when they differ, so that the word segmentation accuracy is improved.
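The check-and-retry behaviour described above can be sketched as a loop over candidate segmenters. This is a minimal illustration under stated assumptions: `tool_a` and `tool_b` are hypothetical stand-ins for two word segmentation tools, the two lookup tables stand in for the dictionary-based hypernym steps, the subject is simplified to the word immediately before the verb, and the Chinese tokens are a reconstruction of the running example.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pairs = List[Tuple[str, str]]


def first_consistent_segmentation(
    text: str,
    segmenters: Iterable[Callable[[str], Pairs]],
    subject_hypernym: Callable[[str], str],
    verb_constraint: Callable[[str], str],
) -> Optional[Pairs]:
    """Try each segmenter in turn and return the first consistent result."""
    for segment in segmenters:
        pairs = segment(text)
        verb_index = next((i for i, (_, tag) in enumerate(pairs) if tag == "v"), None)
        if verb_index is None or verb_index == 0:
            continue  # assumption: nothing to compare, so try the next tool
        verb = pairs[verb_index][0]
        subject = pairs[verb_index - 1][0]  # simplification: word before the verb
        if subject_hypernym(subject) == verb_constraint(verb):
            return pairs
    return None


# Hypothetical stand-ins for two tools that segment the same text differently.
def tool_a(text: str) -> Pairs:
    return [("南京市", "ns"), ("长江大桥", "ns"), ("参加", "v"),
            ("了", "ul"), ("这次", "r"), ("会议", "n")]


def tool_b(text: str) -> Pairs:
    return [("南京", "ns"), ("市长", "ns"), ("江大桥", "ns"), ("参加", "v"),
            ("了", "ul"), ("这次", "r"), ("会议", "n")]


# Hypothetical lookup tables replacing the dictionary queries.
SUBJECT_HYPERNYMS = {"长江大桥": "object", "江大桥": "person"}
VERB_CONSTRAINTS = {"参加": "person"}

result = first_consistent_segmentation(
    "南京市长江大桥参加了这次会议",
    [tool_a, tool_b],
    SUBJECT_HYPERNYMS.__getitem__,
    VERB_CONSTRAINTS.__getitem__,
)
print([word for word, _ in result])
```

The first tool's result is rejected because "object" differs from "person", and the second tool's result is accepted.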
In some embodiments, FIG. 2 is a flowchart of another hypernym-based word segmentation correction method according to an embodiment of the present application. As shown in FIG. 2, in the case that the word segmentation result is incorrect, re-segmenting the target text includes the following steps:
Step S201, acquiring a full word segmentation result of the target text through the full word segmentation mode of the word segmentation tool. Continuing the above example, the full word segmentation mode of the jieba word segmentation tool is used to segment the target text "南京市长江大桥参加了这次会议", and the full word segmentation result is: '南京' (Nanjing), '南京市' (Nanjing City), '市长' (mayor), '长江' (Yangtze River), '长江大桥' (Yangtze River Bridge), '大桥' (bridge), '参加' (join), '了', '这次' (this time), '会议' (conference).
Step S202, removing the verb and the words after the verb from the full word segmentation result to obtain a first full word segmentation result; removing from the first full word segmentation result the words that precede the verb in the word segmentation result to obtain a second full word segmentation result; and removing from the second full word segmentation result the words that can be spliced into the subject of the word segmentation result to obtain a third full word segmentation result. Continuing the above example, removing the verb '参加' and the words after it gives the first full word segmentation result: '南京', '南京市', '市长', '长江', '长江大桥', '大桥'. The words before the verb in the word segmentation result are '南京市' and '长江大桥', so the second full word segmentation result is: '南京', '市长', '长江', '大桥'. Removing the words that can be spliced into the subject '长江大桥' of the word segmentation result, namely '长江' and '大桥', gives the third full word segmentation result: '南京', '市长'.
Step S203, in the target text, acquiring the characters between the last word of the third full word segmentation result and the verb as a new subject, and obtaining a corrected word segmentation result from the third full word segmentation result, the new subject, and the verb and the words after the verb in the full word segmentation result. Continuing the above example, in the target text "南京市长江大桥参加了这次会议", the characters between the last word '市长' of the third full word segmentation result and the verb '参加' are '江大桥' (the personal name Jiang Daqiao), so '江大桥' is taken as the new subject. The third full word segmentation result '南京', '市长', the new subject '江大桥', and the verb and the words after it in the full word segmentation result, '参加', '了', '这次', '会议', then constitute the final word segmentation result, that is, the corrected word segmentation result is: '南京', '市长', '江大桥', '参加', '了', '这次', '会议'.
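Steps S201 to S203 can be sketched as list operations on the full word segmentation result. This is a minimal illustration, not the application's implementation: the function names are hypothetical, "words that can be spliced into the subject" is interpreted as a contiguous run of words whose concatenation equals the subject, and the Chinese tokens are an assumed reconstruction of the running example.

```python
from typing import List


def remove_spliceable(words: List[str], subject: str) -> List[str]:
    """Drop the first contiguous run of words that concatenates to `subject`."""
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            if "".join(words[i:j]) == subject:
                return words[:i] + words[j:]
    return list(words)


def correct_segmentation(
    text: str,
    full_result: List[str],
    original_result: List[str],
    verb: str,
    subject: str,
) -> List[str]:
    # Step S202, part 1: drop the verb and everything after it.
    verb_pos = full_result.index(verb)
    first = full_result[:verb_pos]
    tail = full_result[verb_pos:]

    # Part 2: drop words the original segmentation placed before the verb.
    before_verb = set(original_result[: original_result.index(verb)])
    second = [w for w in first if w not in before_verb]

    # Part 3: drop words that can be spliced into the original subject.
    third = remove_spliceable(second, subject)

    # Step S203: the text between the last remaining word and the verb
    # becomes the new subject.
    start = text.index(third[-1]) + len(third[-1]) if third else 0
    end = text.index(verb, start)
    new_subject = text[start:end]
    return third + [new_subject] + tail


text = "南京市长江大桥参加了这次会议"
full = ["南京", "南京市", "市长", "长江", "长江大桥", "大桥",
        "参加", "了", "这次", "会议"]
original = ["南京市", "长江大桥", "参加", "了", "这次", "会议"]

corrected = correct_segmentation(text, full, original, "参加", "长江大桥")
print(corrected)
```

The sketch locates the last remaining word by its first occurrence in the text, which is sufficient for this example but would need care if the same word occurred more than once.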
In the related art, word segmentation tools rely on either dictionary-based word segmentation algorithms or statistics-based machine learning algorithms. A dictionary-based algorithm matches the character string to be segmented against the entries of an established "sufficiently large" dictionary according to a certain strategy; if an entry is found, the match succeeds and the word is recognized. However, ambiguous words and words not included in the dictionary are difficult to recognize, so the word segmentation result may contain errors. A statistics-based machine learning algorithm is trained on labeled Chinese characters and considers not only the frequency of words but also their context; it has better learning capability and therefore handles ambiguous words and unknown words well. However, manually labeled data is scarce, training a word segmentation model is time-consuming and labor-intensive, and the quality of word segmentation is difficult to guarantee.
In some embodiments, obtaining the subject in the target text according to the word segmentation result, and obtaining the final hypernym of the subject includes:
obtaining English words of a subject;
and querying the dictionary, acquiring a subject interpretation list of English words of the subject, and acquiring the final hypernym of the subject according to the subject interpretation list.
In this embodiment, the subject may be translated into an English word via a dictionary. If there are multiple English words, the first one is taken; when the subject is translated into a phrase, the last English word of the phrase is taken; if the first letter of the English word is capitalized, it is changed to lowercase. Illustratively, the target text is "南京市长江大桥参加了这次会议" and the word segmentation result is [pair('南京市', 'ns'), pair('长江大桥', 'ns'), pair('参加', 'v'), pair('了', 'ul'), pair('这次', 'r'), pair('会议', 'n')]. The subject "长江大桥" is translated into "Yangtze River Bridge", so the obtained English word of the subject is "bridge". Querying "bridge" in the Collins dictionary yields the following subject interpretation list:
1. N-COUNT A bridge is a structure that is built over a railway, river, or road so that people or vehicles can cross from one side to the other;
2. N-COUNT A bridge between two places is a piece of land that joins or connects them.
According to the subject interpretation list, the final hypernym of the subject is an object.
Optionally, obtaining the final hypernym of the subject according to the subject interpretation list includes: checking whether an interpretation sentence containing "is something" exists in the subject interpretation list, and if so, the final hypernym of the subject is an object;
if not, taking the word after "is a" in the interpretation sentence as a hypernym, and acquiring the subject interpretation list of the hypernym;
and repeating the preceding steps on the subject interpretation list of the hypernym until the final hypernym of the subject is obtained, wherein the repetition is stopped when the number of repetitions reaches a threshold value, in which case the final hypernym of the subject is a person.
For example, no interpretation sentence in the subject interpretation list of "bridge" contains "is something", so the word "structure" after "is a" in the first interpretation sentence is taken as the hypernym, and the hypernym of this hypernym is then searched for in the same way, that is, the subject interpretation list of "structure" is obtained:
1. N-VAR The structure of something is the way in which it is made, built, or organized;
2. N-COUNT A structure is something that consists of parts connected together in an ordered way; and so on.
At this point, the second interpretation sentence in the subject interpretation list contains "is something", which indicates that the hypernym of "structure" is an object, that is, the final hypernym of the subject is an object. If the threshold is set to 10, then when the number of times a subject interpretation list has been obtained reaches 10 without the final hypernym of the subject being found, the search for further hypernyms stops and the final hypernym of the subject is taken to be a person.
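The hypernym chain described above can be sketched as a bounded loop over dictionary lookups. This is a minimal illustration under stated assumptions: `lookup` is a hypothetical callable that returns the interpretation sentences for a word, `TOY_DICTIONARY` is an in-memory stand-in for the online dictionary with reconstructed sentences, and falling back to "person" when no "is a" pattern is found is an assumption the text does not spell out.

```python
import re
from typing import Callable, Dict, List

# In-memory stand-in for the dictionary (sentences are a reconstruction).
TOY_DICTIONARY: Dict[str, List[str]] = {
    "bridge": [
        "A bridge is a structure that is built over a railway, river, or road.",
        "A bridge between two places is a piece of land that joins them.",
    ],
    "structure": [
        "The structure of something is the way in which it is made.",
        "A structure is something that consists of parts connected together.",
    ],
}


def final_subject_hypernym(
    word: str,
    lookup: Callable[[str], List[str]],
    max_lookups: int = 10,
) -> str:
    """Return "object" or "person" by following "is a" hypernyms."""
    current = word
    for _ in range(max_lookups):
        sentences = lookup(current)
        # An "is something" sentence marks the word as an object.
        if any("is something" in s for s in sentences):
            return "object"
        # Otherwise follow the first word found after "is a".
        match = next(
            (m for m in (re.search(r"\bis a (\w+)", s) for s in sentences) if m),
            None,
        )
        if match is None:
            break  # assumption: no hypernym to follow, treat as unresolved
        current = match.group(1)
    # Threshold reached (or chain ended) without finding "is something".
    return "person"


print(final_subject_hypernym("bridge", lambda w: TOY_DICTIONARY.get(w, [])))
```

For "bridge" the loop follows "structure" and stops at its second sentence, which contains "is something".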
In some embodiments, obtaining the verb in the target text according to the word segmentation result, and obtaining the subject hypernym constraint of the verb comprises:
obtaining English words of verbs;
and inquiring the dictionary, acquiring a verb explanation list of the English word of the verb, and acquiring the subject hypernym constraint of the verb according to the verb explanation list.
In this embodiment, the verb may be translated into an English word through the Youdao dictionary; when there are multiple translated English words, the first one may be taken. Continuing the above example, the first English word of the verb "参加" is "join". For the translated verb, its interpretations in the Collins dictionary, that is, the interpretations marked "V-T" or "V-I", are queried. For example, the verb interpretation list of "join" is obtained as:
1. V-T If one person or vehicle joins another, they move or go to the same place, for example so that both of them can do something together …;
2. V-T If you join an organization, you become a member of it or start work as an employee of it.;
3. V-T/V-I If you join an activity that other people are doing, you take part in it or become involved with it.;
4. V-T If you join a line, you stand at the end of it so that you are part of it.;
5. V-T To join two things means to attach or fasten them together.
6. V-T If something such as a line or path joins two things, it connects them.
According to the verb interpretation list, the subject hypernym constraint of the verb is a person.
Optionally, obtaining the subject hypernym constraint of the verb according to the verb interpretation list includes: in the verb interpretation list, whether the number of the 'If you' or 'If one person' in the interpretation sentence exceeds a preset value is checked, If so, the subject hypernym of the verb is restricted to be a person, and If not, the subject hypernym of the verb is restricted to be a substance. In this embodiment, the preset value may be set according to the number of interpretive sentences in the verb interpretive list, for example, the preset value may be set to be half of the number of interpretive sentences in the verb interpretive list, and continuing to the above, for example, there are 6 interpretive sentences marked as "V-T" or "V-I" in the verb interpretive list, where there are 4 interpretive sentences in which "If you" or "If one person" exists, that is, the number of interpretive sentences in which "If you" or "If one person" exists exceeds the preset value, and then the subject hypernym of the verb is restricted to a person.
Optionally, the verb is translated into English by using a dictionary, and when the translation yields several English words, the first two may be taken. For example, the first two English words of the verb "join" are "join" and "attribute", and a verb interpretation list is obtained for each of them. When the number of interpretation sentences containing "If you" or "If one person" exceeds the preset value in the verb interpretation list of "join", or exceeds the preset value in the verb interpretation list of "attribute", the subject hypernym constraint of the verb is a person.
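The counting rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `subject_constraint` and `subject_constraint_any` are hypothetical, the label "object" stands for the non-person case, and the sample sentences are a cleaned-up rendering of the "join" list used only as test data.

```python
# Markers that, per the text, indicate a human subject in an interpretation sentence.
PERSON_MARKERS = ("If you", "If one person")


def subject_constraint(interpretations, preset=None):
    """Return "person" when the number of marker sentences exceeds the preset value."""
    if preset is None:
        # Assumption taken from the example: half of the number of sentences.
        preset = len(interpretations) / 2
    hits = sum(1 for s in interpretations if any(m in s for m in PERSON_MARKERS))
    return "person" if hits > preset else "object"


def subject_constraint_any(lists):
    """Two-word variant: "person" if the list of either candidate word passes."""
    passed = any(subject_constraint(lst) == "person" for lst in lists)
    return "person" if passed else "object"


join_list = [
    "V-T If one person or vehicle joins another, they move or go to the same place.",
    "V-T If you join an organization, you become a member of it.",
    "V-T/V-I If you join an activity, you take part in it.",
    "V-T If you join a line, you stand at the end of it.",
    "V-T To join two things means to attach or fasten them together.",
    "V-T If something such as a line or path joins two things, it connects them.",
]

print(subject_constraint(join_list))  # 4 of 6 sentences match, and 4 > 3
```

Passing `preset` explicitly allows a threshold other than half of the list, which the text leaves open.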
In some of these embodiments, obtaining the English word or the interpretation list of the English word comprises: concatenating the jump address of the dictionary and the word to be queried into the URL address to be queried, calling an http request according to the URL address, and obtaining the English word or the interpretation list of the English word, wherein the interpretation list comprises a subject interpretation list and a verb interpretation list.
In this embodiment, the English word, the subject interpretation list, and the verb interpretation list may be obtained automatically by a computer. For example, if the jump address of the dictionary is "https://dict.youdao.com/w/" and the word to be queried is "join", then "https://dict.youdao.com/w/" and "join" are concatenated into the URL address to be queried, an http request is called according to the URL address, and the content of each <a class="search-js"> tag in the response body returned by the http request is extracted, that is, the translated English words. When the subject interpretation list of "bridge" is to be acquired, "https://dict.youdao.com/w/" and "bridge" are concatenated into the URL address to be queried, an http request is called with the URL address, and all <p> sub-tags of the <div class="collinsMajorTrans"> tags in the response body returned by the http request are extracted; the content of each <p> sub-tag is one interpretation sentence, and together they form the subject interpretation list of "bridge".
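The query procedure can be sketched with the Python standard library as below. This is an illustration under stated assumptions: `build_query_url`, `fetch_html`, `InterpretationParser`, and `parse_interpretations` are hypothetical names, the base address `https://dictionary.example/w/` is a placeholder rather than a real service, and the class name passed to the parser is taken from the description and may not match the markup a dictionary site actually serves.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(base_url, word):
    """Concatenate the dictionary jump address and the percent-encoded query word."""
    return base_url + quote(word)


def fetch_html(url, timeout=10):
    """Issue the http request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class InterpretationParser(HTMLParser):
    """Collect the text of every <p> nested inside a <div> carrying target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.div_depth = 0   # greater than 0 while inside a matching <div>
        self.in_p = False
        self.buffer = []
        self.sentences = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self.div_depth > 0 or self.target_class in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth > 0:
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.buffer).split())
            if text:
                self.sentences.append(text)

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)


def parse_interpretations(html, target_class):
    """Return one interpretation sentence per <p> sub-tag of the target <div>."""
    parser = InterpretationParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.sentences


# Offline usage on an invented response body (no network access needed).
sample_body = (
    '<div class="collinsMajorTrans"><p><span>V-T</span> If you join a line, '
    'you stand at the end of it.</p></div><div class="other"><p>skip</p></div>'
)
print(build_query_url("https://dictionary.example/w/", "join"))
print(parse_interpretations(sample_body, "collinsMajorTrans"))
```

In a live setting, the string returned by `fetch_html(build_query_url(...))` would be passed to `parse_interpretations`; keeping parsing separate from fetching lets the extraction logic be checked without a network connection.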
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment further provides a hypernym-based word segmentation correction system, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used below, the terms "module," "unit," "sub-unit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of a hypernym-based word segmentation correction system according to an embodiment of the present application. As shown in Fig. 3, the system includes an obtaining module 31 and a judging module 32. The obtaining module 31 is configured to: obtain a word segmentation result of a word segmentation tool for a target text, where the word segmentation result includes a plurality of words output by the word segmentation tool and their corresponding parts of speech; obtain a subject in the target text according to the word segmentation result, and obtain a final hypernym of the subject, where the final hypernym of the subject indicates whether the subject is a person or an object; and obtain a verb in the target text according to the word segmentation result, and obtain a subject hypernym constraint of the verb, where the subject hypernym constraint of the verb indicates whether the subject is required to be a person or an object. The judging module 32 is configured to judge whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb; if so, the word segmentation result is correct, and if not, the word segmentation result is incorrect. When the word segmentation result is incorrect, the target text is segmented again until a correct word segmentation result is obtained, which improves word segmentation accuracy.
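The division of work between the two modules can be sketched as below. This is only an illustration: the class names mirror the modules in the text, but the five injected callables, the `(word, part of speech)` token format, and the toy lookup tables are assumptions made for the example, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class ObtainingModule:
    # Hypothetical stand-ins for the steps named in the text.
    segment: Callable[[str], List[Token]]
    subject_of: Callable[[List[Token]], str]
    verb_of: Callable[[List[Token]], str]
    final_hypernym: Callable[[str], str]    # expected to return "person" or "object"
    verb_constraint: Callable[[str], str]   # expected to return "person" or "object"

    def obtain(self, text: str) -> Tuple[List[Token], str, str]:
        tokens = self.segment(text)
        hypernym = self.final_hypernym(self.subject_of(tokens))
        constraint = self.verb_constraint(self.verb_of(tokens))
        return tokens, hypernym, constraint


class JudgingModule:
    @staticmethod
    def is_correct(hypernym: str, constraint: str) -> bool:
        """The segmentation is accepted only when both labels agree."""
        return hypernym == constraint


def first_with_tag(tag):
    """Toy selector: the first word whose part-of-speech tag equals `tag`."""
    return lambda tokens: next(word for word, pos in tokens if pos == tag)


# Invented data used purely to exercise the sketch.
module = ObtainingModule(
    segment=lambda text: [("teacher", "n"), ("join", "v"), ("club", "n")],
    subject_of=first_with_tag("n"),
    verb_of=first_with_tag("v"),
    final_hypernym={"teacher": "person", "club": "object"}.get,
    verb_constraint={"join": "person"}.get,
)
tokens, hypernym, constraint = module.obtain("teacher join club")
print(JudgingModule.is_correct(hypernym, constraint))
```

Injecting the lookups as callables keeps the comparison logic independent of any particular segmentation tool or dictionary; a caller would re-segment the text whenever `is_correct` returns `False`.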
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the hypernym-based word segmentation correction method in the above embodiments, an embodiment of the present application may provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the hypernym-based word segmentation correction methods in the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device comprises a processor, a memory, a network interface, a display screen, and an input device, which are connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a hypernym-based word segmentation correction method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims (9)

1. A word segmentation correction method based on hypernyms is characterized by comprising the following steps:
obtaining a word segmentation result of a word segmentation tool on a target text, wherein the word segmentation result comprises a plurality of words output by the word segmentation tool and corresponding parts of speech, and the word segmentation result is obtained by segmenting the target text through an accurate mode of the word segmentation tool;
obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object;
obtaining verbs in the target text according to the word segmentation result, and obtaining subject hypernym constraints of the verbs, wherein the subject hypernym constraints of the verbs are used for indicating that the subjects are people or objects;
judging whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb; if so, the word segmentation result is correct; if not, the word segmentation result is incorrect and the target text is re-segmented, wherein re-segmenting the target text in the case that the word segmentation result is incorrect comprises the following steps:
acquiring a full word segmentation result of a target text by a full word segmentation mode of a word segmentation tool;
removing verbs and words behind the verbs in the full word segmentation result to obtain a first full word segmentation result, removing words in front of the verbs in the word segmentation result in the first full word segmentation result to obtain a second full word segmentation result, and removing words which can be spliced into a subject of the word segmentation result in the second full word segmentation result to obtain a third full word segmentation result;
and in the target text, acquiring a word between the last word and the verb in the third full word segmentation result as a new subject, and acquiring a corrected word segmentation result according to the third full word segmentation result, the new subject and the word after the verb and the verb in the full word segmentation result.
2. The method of claim 1, wherein obtaining the subject in the target text according to the segmentation result, and obtaining the final hypernym of the subject comprises:
obtaining English words of the subject;
and querying a dictionary, acquiring a subject interpretation list of English words of the subject, and acquiring the final hypernym of the subject according to the subject interpretation list.
3. The method of claim 2, wherein obtaining the final hypernym of the subject from the subject interpretation list comprises:
checking whether an interpretation sentence exists in the subject interpretation list, if so, the final hypernym of the subject is a thing;
if not, taking the word following "is a" in the interpretation sentence as a hypernym, and acquiring a subject interpretation list of the hypernym;
and repeating the last execution step according to the subject interpretation list of the hypernyms until the final hypernym of the subject is obtained, wherein the last execution step is stopped when the repetition times reach a threshold value, and the final hypernym of the subject is a person.
4. The method of claim 1, wherein obtaining verbs in the target text according to the segmentation result, and wherein obtaining subject hypernym constraints of the verbs comprises:
obtaining English words of the verbs;
and querying a dictionary, acquiring a verb explanation list of the English word of the verb, and acquiring the subject hypernym constraint of the verb according to the verb explanation list.
5. The method of claim 4, wherein obtaining the subject hypernym constraint of the verb from the verb interpretation list comprises:
checking whether the number of interpretation sentences in the verb interpretation list that contain "If you" or "If one person" exceeds a preset value; if so, the subject hypernym constraint of the verb is a person, and if not, the subject hypernym constraint of the verb is an object.
6. The method of claim 2 or 4, wherein obtaining the English word or the list of interpretations of the English word comprises:
concatenating the jump address of the dictionary and the word to be queried into the URL address to be queried, calling an http request according to the URL address, and obtaining the English word or the interpretation list of the English word, wherein the interpretation list comprises a subject interpretation list and a verb interpretation list.
7. A word segmentation correction system based on hypernyms is characterized by comprising an acquisition module and a judgment module,
the acquisition module is used for acquiring word segmentation results of a word segmentation tool on a target text, wherein the word segmentation results comprise a plurality of words output by the word segmentation tool and corresponding parts of speech, and the word segmentation results are obtained by performing word segmentation on the target text through an accurate mode of the word segmentation tool;
obtaining a subject in the target text according to the word segmentation result, and obtaining a final hypernym of the subject, wherein the final hypernym of the subject is used for indicating that the subject is a person or an object;
obtaining verbs in the target text according to the word segmentation result, and obtaining subject hypernym constraints of the verbs, wherein the subject hypernym constraints of the verbs are used for indicating that the subjects are people or objects;
the judging module is configured to judge whether the final hypernym of the subject is the same as the subject hypernym constraint of the verb, if so, the word segmentation result is correct, and if not, the word segmentation result is incorrect, where, under the condition that the word segmentation result is incorrect, re-segmenting the target text includes:
acquiring a full word segmentation result of a target text by a full word segmentation mode of a word segmentation tool;
removing verbs and words behind the verbs in the full word segmentation result to obtain a first full word segmentation result, removing words in front of the verbs in the word segmentation result in the first full word segmentation result to obtain a second full word segmentation result, removing words which can be spliced into a subject of the word segmentation result in the second full word segmentation result to obtain a third full word segmentation result;
and in the target text, acquiring a word between the last word and the verb in the third full word segmentation result as a new subject, and acquiring a corrected word segmentation result according to the third full word segmentation result, the new subject and the word after the verb and the verb in the full word segmentation result.
8. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the hypernym-based segmentation correction method according to any one of claims 1 to 6.
9. A storage medium, in which a computer program is stored, wherein the computer program is configured to execute the hypernym-based segmentation correction method according to any one of claims 1 to 6 when running.
CN202111237607.8A 2021-10-25 2021-10-25 Word segmentation correction method and system based on hypernym, electronic device and storage medium Active CN113673238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111237607.8A CN113673238B (en) 2021-10-25 2021-10-25 Word segmentation correction method and system based on hypernym, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111237607.8A CN113673238B (en) 2021-10-25 2021-10-25 Word segmentation correction method and system based on hypernym, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113673238A CN113673238A (en) 2021-11-19
CN113673238B true CN113673238B (en) 2022-05-06

Family

ID=78551052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111237607.8A Active CN113673238B (en) 2021-10-25 2021-10-25 Word segmentation correction method and system based on hypernym, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113673238B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117034911B (en) * 2023-09-28 2023-12-22 通用技术集团健康数字科技(北京)有限公司 Correction method and device for hospital diagnosis dictionary, server and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5418716A (en) * 1990-07-26 1995-05-23 Nec Corporation System for recognizing sentence patterns and a system for recognizing sentence patterns and grammatical cases
CN109582962A (en) * 2018-11-28 2019-04-05 北京创鑫旅程网络技术有限公司 Segmenting method and device
CN112668311A (en) * 2019-09-29 2021-04-16 北京国双科技有限公司 Text error detection method and device

Also Published As

Publication number Publication date
CN113673238A (en) 2021-11-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310000 7th floor, building B, No. 482, Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Huoshi Creation Technology Co.,Ltd.

Address before: 310000 7th floor, building B, No. 482, Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU FIRESTONE TECHNOLOGY Co.,Ltd.
