CN110162788B - Entity dependency relationship determination method and device - Google Patents


Info

Publication number
CN110162788B
Authority
CN
China
Prior art keywords
text
sub
entities
entity
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910372285.4A
Other languages
Chinese (zh)
Other versions
CN110162788A (en)
Inventor
王卓然
亓超
马宇驰
王东亮
陈华荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910372285.4A
Publication of CN110162788A
Application granted
Publication of CN110162788B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/288: Entity relationship models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a method and a device for determining entity dependency relationships. The method comprises: splitting a text to be recognized into a plurality of sub-texts; identifying the entities in each of the plurality of sub-texts; and determining the dependency relationships between the entities in the text to be recognized according to the dependency relationships of the entities in each sub-text and/or the entities in adjacent sub-texts. Compared with directly determining the dependency relationships among all the entities in the text to be recognized, the method and device first split the text into a plurality of sub-texts and only then determine the dependency relationships from the entities in each sub-text and/or in adjacent sub-texts. This prevents unrelated entities in two different sub-texts from being linked together and improves the accuracy of determining the dependency relationships between entities in the text.

Description

Entity dependency relationship determination method and device
Technical Field
The embodiments of the invention relate to the technical field of the internet, and in particular to a method and a device for determining entity dependency relationships.
Background
The prior art includes a method of displaying named entities (also called entities; the task is known as named entity recognition, NER) identified from a piece of text, in which each named entity is displayed separately. For example, in the text "I am going to the Seven Days Hotel at Chaoyang District Xiaoying Road No. 8" there are two entities, "Chaoyang District Xiaoying Road No. 8" and "Seven Days Hotel", and a card for each of the two entities is displayed on the user terminal. In one possible implementation, to improve user experience, entities that have a dependency relationship are generally displayed in a single card.
In the process of implementing the embodiments of the present invention, the inventors found that, for a text with relatively complex content, existing text recognition technology cannot accurately determine the dependency relationships between the entities in the text. For example, from the text "You go to the KFC at location A, I go to the McDonald's at location B", four entities can be identified: location A, KFC, location B and McDonald's. Dependency analysis of these four entities may conclude that location A describes McDonald's and location B describes KFC, whereas in fact location A describes KFC and location B describes McDonald's. As a result, the user cannot obtain accurate recommendations when subsequently selecting a card.
Disclosure of Invention
In view of the foregoing problems, an object of the embodiments of the present invention is to provide a method and an apparatus for determining entity dependencies, which are capable of accurately determining dependencies between entities in a text.
In a first aspect, an embodiment of the present invention provides a method for determining an entity dependency relationship, where the method includes: splitting a text to be recognized into a plurality of sub-texts; identifying an entity in each of the plurality of sub-texts; and determining the dependency relationship between the entities in the text to be recognized according to the dependency relationship of the entities in each sub-text and/or the entities in the adjacent sub-texts.
In a second aspect, an embodiment of the present invention provides an apparatus for determining entity dependencies, where the apparatus includes: the splitting module is configured to split the text to be recognized into a plurality of sub-texts; an identification module configured to identify an entity in each of the plurality of sub-texts; and the determining module is configured to determine the dependency relationship between the entities in the text to be recognized according to the dependency relationship of the entities in each sub-text and/or the entities in the adjacent sub-texts.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes: at least one processor; and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call the program instructions in the memory to perform the method according to one or more of the above technical solutions.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to perform a method in one or more of the above technical solutions.
According to the method and the device for determining the entity dependency relationship, after the text to be recognized is obtained, the text is first split into a plurality of sub-texts; an entity in each of the plurality of sub-texts is then identified; and finally the dependency relationship among the entities in the text to be recognized is determined according to the dependency relationship of the entities in each sub-text and/or the entities in the adjacent sub-texts. Compared with directly determining the dependency relationship among all the entities in the text to be recognized, splitting the text into sub-texts before determining the dependency relationship prevents unrelated entities in two different sub-texts from being linked together, which improves the accuracy of determining the dependency relationship among the entities in the text.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a first flowchart illustrating a method for determining entity dependencies according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a text to be recognized after being split into multiple sub-texts in an embodiment of the present invention;
FIG. 3 is a second flowchart illustrating a method for determining entity dependencies according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating splitting a text to be recognized according to punctuation marks in an embodiment of the present invention;
FIG. 5 is a first schematic diagram illustrating splitting a text to be recognized according to a preset word count in an embodiment of the present invention;
FIG. 6 is a second schematic diagram illustrating splitting a text to be recognized according to a preset word count in an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating splitting a text to be recognized according to punctuation marks and then according to a preset word count in an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating splitting a text to be recognized according to a preset word count and then according to punctuation marks in an embodiment of the present invention;
FIG. 9 is a first diagram illustrating determining entity dependencies in a text to be recognized according to an embodiment of the present invention;
FIG. 10 is a second diagram illustrating determining entity dependencies in a text to be recognized according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating merging entities with dependencies in an embodiment of the invention;
FIG. 12 is a schematic structural diagram of an entity dependency relationship determining apparatus according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An embodiment of the present invention provides a method for determining an entity dependency relationship, where fig. 1 is a first flowchart of the method for determining an entity dependency relationship in the embodiment of the present invention, and as shown in fig. 1, the method may include:
S101: Split the text to be recognized into a plurality of sub-texts.
In practical applications, the text to be recognized may be text in a chat interface of instant messaging software such as WeChat, text in a notepad on a terminal device such as a mobile phone, or text in another application with a text recording function, which is not limited herein.
Here, the text to be recognized may be a piece of text with complex content, in which a plurality of entities exist and the relationships between them are complicated. For example, in the text "You go to the KFC at Wudaokou, I go to the McDonald's at Xi'erqi" there are four entities, "Wudaokou", "KFC", "Xi'erqi" and "McDonald's". Among these four entities, Wudaokou describes KFC, Xi'erqi describes McDonald's, and KFC is parallel to McDonald's.
Because the content in the text to be recognized is complex, the text to be recognized needs to be split into a plurality of sub-texts. Specifically, firstly, a splitting position is determined, where the splitting position may be any one position in the text to be recognized, one sub-text is located on one side of the splitting position, and the other sub-text is located on the other side of the splitting position; and then, splitting the text to be recognized into a plurality of sub-texts according to the splitting position.
It should be noted that: the number of the splitting positions may be one or more. When the splitting position is one, the text to be recognized is split into two sub-texts, and when the splitting position is N, the text to be recognized is split into N +1 sub-texts. The number of the splitting positions can be determined according to the actual situation of the text to be recognized, and is not limited herein.
Fig. 2 is a schematic diagram of a text to be recognized that is split into multiple sub-texts according to an embodiment of the present invention, as shown in fig. 2, the text 201 to be recognized is a text with relatively complex content, and first, a splitting position 202 is determined, and then, according to the splitting position 202, the text 201 to be recognized is split into two sub-texts, namely a sub-text 203 and a sub-text 204.
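The relationship between splitting positions and sub-texts described above can be sketched as follows. This is a minimal illustration only: the function name `split_at_positions` and the use of character offsets as splitting positions are assumptions made for the sketch, not details given in the text.

```python
def split_at_positions(text: str, positions: list[int]) -> list[str]:
    """Split text at character offsets; N distinct interior offsets give N + 1 sub-texts."""
    # Assumption: offsets at 0, at len(text), or duplicated are ignored,
    # so that no empty sub-text is produced.
    cuts = sorted({p for p in positions if 0 < p < len(text)})
    bounds = [0, *cuts, len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


# One splitting position yields two sub-texts, two positions yield three.
print(split_at_positions("abcdef", [2]))
print(split_at_positions("abcdef", [2, 4]))
```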
S102: an entity in each of the plurality of sub-texts is identified.
In particular, a named entity recognition model may be employed to identify entities in each of the sub-texts. Here, the named entity recognition model is an existing trained model for recognizing entities in a text, and thus, details of a specific training process of the named entity recognition model are not repeated herein.
Here, the process of identifying an entity in each sub-text, and the identification result, are described by taking one sub-text as an example. Assuming that a certain sub-text is "I am going to the Seven Days Hotel at Chaoyang District Xiaoying Road No. 8", two entities, "Chaoyang District Xiaoying Road No. 8" and "Seven Days Hotel", can be identified from it using a named entity recognition model.
S103: Determine the dependency relationship between the entities in the text to be recognized according to the dependency relationship of the entities in each sub-text and/or the entities in the adjacent sub-texts.
Here, the dependency relationship between entities in the text to be recognized may be determined according to the completeness of the content of each sub-text, in ways that include but are not limited to the following two.
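Per-sub-text recognition can be pictured with the sketch below. The text relies on an existing trained named entity recognition model; the dictionary lookup used here (`recognize_entities` with a hand-written `gazetteer`) is only a hypothetical stand-in so that the example runs on its own, and is not the model the text refers to.

```python
def recognize_entities(sub_text: str, gazetteer: list[str]) -> list[str]:
    """Toy recognizer (assumption): return gazetteer names found in sub_text, in reading order."""
    found = []
    for name in gazetteer:
        idx = sub_text.find(name)
        if idx != -1:
            found.append((idx, name))
    return [name for _, name in sorted(found)]


def recognize_per_sub_text(sub_texts: list[str], gazetteer: list[str]) -> list[list[str]]:
    """Run the recognizer on every sub-text independently, as in step S102."""
    return [recognize_entities(s, gazetteer) for s in sub_texts]


names = ["KFC", "Wudaokou", "McDonald's", "Xi'erqi"]
subs = ["you go to the KFC at Wudaokou", "I go to the McDonald's at Xi'erqi"]
print(recognize_per_sub_text(subs, names))
```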
The first mode is as follows: if the content of each sub-text is complete, the dependency relationship between the entities in the text to be recognized can be determined according to the dependency relationship of the entities in each sub-text.
Illustratively, the text to be recognized "You go to the KFC at Wudaokou, I go to the McDonald's at Xi'erqi" is split into two sub-texts, "You go to the KFC at Wudaokou" and "I go to the McDonald's at Xi'erqi". Analysis shows that the contents of both sub-texts are complete. From the sub-text "You go to the KFC at Wudaokou" it is identified that Wudaokou describes KFC, and from the sub-text "I go to the McDonald's at Xi'erqi" it is identified that Xi'erqi describes McDonald's. The dependency relationship among the entities in the text to be recognized is therefore that Wudaokou describes KFC and Xi'erqi describes McDonald's.
The second mode is as follows: if any of the sub-texts has incomplete content, the dependency relationship between the entities in the text to be recognized can be determined according to the dependency relationship between the entities in each sub-text and the entities in the adjacent sub-texts.
Illustratively, the text to be recognized "The Baidu Netdisk link is https://pan.baidu.com/s/Pw51qOA" is split into two sub-texts, "The Baidu Netdisk link is" and "https://pan.baidu.com/s/Pw51qOA". Analysis shows that the content of the sub-text "The Baidu Netdisk link is" is incomplete. The entity "Baidu Netdisk" is identified from the sub-text "The Baidu Netdisk link is", and the entity "https://pan.baidu.com/s/Pw51qOA" is identified from the sub-text "https://pan.baidu.com/s/Pw51qOA". The dependency relationship in the text to be recognized is then determined from these two entities, "Baidu Netdisk" and "https://pan.baidu.com/s/Pw51qOA", which lie in adjacent sub-texts.
In the above two examples, for analyzing whether the content in the sub-text is complete, whether the content in the sub-text is complete can be determined by the semantics in the sub-text. And if the complete meaning can be obtained through the sub-text, determining that the content of the sub-text is complete, and if the complete meaning cannot be obtained through the sub-text, determining that the content of the sub-text is incomplete. Of course, whether the content in the sub-text is complete may also be determined in other ways, and is not limited herein.
Thus, the dependency relationship among the entities in the text to be recognized is determined.
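The two modes can be illustrated with a small sketch. It assumes that each sub-text arrives with its recognized entities and a completeness flag supplied by the caller, and it represents a dependency simply as a group of entities that belong together. Carrying the entities of an incomplete sub-text forward into the following sub-text is a simplification chosen for this sketch; the text only says that adjacent sub-texts are considered.

```python
def group_dependent_entities(sub_texts):
    """sub_texts: list of (entities, is_complete) tuples in reading order.

    Mode 1: a complete sub-text forms its own group of related entities.
    Mode 2: entities of an incomplete sub-text are grouped with those of
    the next adjacent sub-text (an assumption of this sketch).
    """
    groups = []
    carry = []  # entities waiting for the next adjacent sub-text
    for entities, is_complete in sub_texts:
        current = carry + list(entities)
        if is_complete:
            if current:
                groups.append(current)
            carry = []
        else:
            carry = current
    if carry:  # trailing incomplete sub-text with no successor
        groups.append(carry)
    return groups


# Mode 1: both sub-texts are complete, so entities stay in separate groups.
print(group_dependent_entities([(["Wudaokou", "KFC"], True),
                                (["Xi'erqi", "McDonald's"], True)]))
# Mode 2: the first sub-text is incomplete, so it is grouped with its neighbour.
print(group_dependent_entities([(["Baidu Netdisk"], False),
                                (["https://example.com/s/abc"], True)]))
```

The URL `https://example.com/s/abc` is a placeholder, not the link from the example above.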
As can be seen from the above, in the method for determining an entity dependency relationship provided in the embodiment of the present invention, after the text to be recognized is obtained, the text is first split into a plurality of sub-texts; an entity in each of the plurality of sub-texts is then identified; and finally the dependency relationship among the entities in the text to be recognized is determined according to the dependency relationship of the entities in each sub-text and/or the entities in the adjacent sub-texts. Compared with directly determining the dependency relationship among all the entities in the text to be recognized, splitting the text into sub-texts first prevents unrelated entities in two different sub-texts from being linked together, which improves the accuracy of determining the dependency relationship among the entities in the text.
Based on the foregoing embodiment, as a refinement and extension of the method shown in fig. 1, the embodiment of the present invention further provides a method for determining an entity dependency relationship. Fig. 3 is a flowchart illustrating a second method for determining entity dependencies in the embodiment of the present invention, and referring to fig. 3, the method may include:
S301: Split the text to be recognized into a plurality of sub-texts.
In a specific implementation process, the text to be recognized can be split into a plurality of sub-texts according to punctuation marks, preset word numbers and the like. Specifically, the following three modes can be included but not limited to:
The first mode is as follows: the text to be recognized is split into a plurality of sub-texts according to punctuation marks only.
Specifically, the following two steps may be included:
Step one: recognize punctuation marks in the text to be recognized.
Step two: divide the text to be recognized into a plurality of sub-texts, using the punctuation marks as separators.
For example, FIG. 4 is a schematic diagram of splitting a text to be recognized according to punctuation marks in an embodiment of the present invention. As shown in FIG. 4, for the text to be recognized 400 "You go to the KFC at Wudaokou, I go to the McDonald's at Xi'erqi", the punctuation mark 401 "," between "KFC at Wudaokou" and "I" is recognized in the text to be recognized 400, and, using the punctuation mark 401 as the separator, the text to be recognized 400 is split into a sub-text 402 "You go to the KFC at Wudaokou" and a sub-text 403 "I go to the McDonald's at Xi'erqi".
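The two steps of the first mode can be sketched with a regular expression. The set of punctuation marks in `PUNCTUATION` is an assumption chosen for illustration; the text does not list which marks are used.

```python
import re

# Assumed separator set: common ASCII and full-width Chinese punctuation.
PUNCTUATION = r"[,.;!?，。；！？]"


def split_by_punctuation(text: str) -> list[str]:
    """Split text at punctuation marks and drop empty pieces."""
    parts = re.split(PUNCTUATION, text)
    return [p.strip() for p in parts if p.strip()]


print(split_by_punctuation(
    "You go to the KFC at Wudaokou, I go to the McDonald's at Xi'erqi"))
```

Note that a separator set containing "." would also cut URLs apart, so a practical splitter would need to treat such tokens specially.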
The second mode is as follows: the text to be recognized is split into a plurality of sub-texts according to a preset word number only.
Specifically, the text to be recognized is split into a plurality of sub-texts according to the second preset word number.
Here, the second preset word number is a fixed word number, for example 10 words. A second preset word number of 10 words means that the text to be recognized is split every 10 words, so that each resulting sub-text contains 10 words.
For example, FIG. 5 is a first schematic diagram of splitting a text to be recognized according to a preset word number in an embodiment of the present invention. Referring to FIG. 5, for the text to be recognized 500 "At two this afternoon I want to go to Chaoyang District Xiaoying Road No. 8 Seven Days Hotel", assuming that the second preset word number is 9 words (counted in the characters of the original Chinese text), the text to be recognized 500 is split into a sub-text 501 "At two this afternoon I want to go to", a sub-text 502 "Chaoyang District Xiaoying Road No. 8" and a sub-text 503 "Seven Days Hotel".
It should be noted that: the number of words of the text to be recognized is not necessarily an integer multiple of the second preset number of words, so the number of words of the last sub-text of the plurality of sub-texts is not necessarily the second preset number of words, but the number of words of the other sub-texts except the last sub-text is the second preset number of words.
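Fixed-length splitting, including the shorter final sub-text noted above, can be sketched as follows. Counting by Python string characters is an assumption of this sketch; the unit of a "word" depends on the language of the text.

```python
def split_by_count(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of `size` characters; the last chunk may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = split_by_count("abcdefghij", 4)
print(chunks)                      # every chunk except possibly the last has 4 characters
print([len(c) for c in chunks])
```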
After the text to be recognized is split into a plurality of sub-texts in the second mode, one entity may be split into two parts that lie in two adjacent sub-texts, one before the other. The split entity may then fail to be recognized, so that the dependency relationship between the entities in the text to be recognized cannot be accurately determined. To avoid this situation and improve the accuracy of determining the entity dependency relationship, after the text to be recognized is split into a plurality of sub-texts according to the second preset word number, the following two steps may be performed:
Step A: determine whether a split entity exists between two adjacent sub-texts among the plurality of sub-texts; if so, perform step B; if not, keep the sub-texts as they are.
In a specific implementation, one way of determining whether a split entity exists between two adjacent sub-texts is as follows: before step S301 is executed, that is, before the text to be recognized is split into a plurality of sub-texts, all entities in the text to be recognized are recognized; the text to be recognized is then split into a plurality of sub-texts; the entities in each sub-text are then identified; and finally all entities in the text to be recognized are compared with the entities in the sub-texts. If the entities recognized from the text to be recognized do not correspond to the entities recognized from the plurality of sub-texts, or the number of entities recognized from the text to be recognized is greater than the number of entities recognized from the plurality of sub-texts, it is determined that a split entity exists between two adjacent sub-texts; otherwise, it is determined that no split entity exists between two adjacent sub-texts.
For example, for the text to be recognized "I want to go to Chaoyang District Xiaoying Road No. 8 Seven Days Hotel", all entities in the text are first identified, namely the two entities "Chaoyang District Xiaoying Road No. 8" and "Seven Days Hotel". The text is then split into two sub-texts, "I want to go to Chaoyang District Xiaoying" and "Road No. 8 Seven Days Hotel". To judge whether a split entity exists between the two sub-texts, the entities in the two sub-texts are identified, which yields a single entity, "Seven Days Hotel". Finally, the entities in the text to be recognized are compared with the entities in the two sub-texts: the number of entities in the text to be recognized is greater than the number of entities in the two sub-texts, so it is determined that a split entity exists between the two sub-texts, and the comparison shows that the split entity is "Chaoyang District Xiaoying Road No. 8".
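The comparison-based check can be sketched as a multiset difference between the entities found in the whole text and those found in the sub-texts. The `recognize` argument is a hypothetical callable standing in for the named entity recognition model, and the lookup-based `toy_recognize` below exists only to make the example runnable.

```python
from collections import Counter


def find_split_entities(text, sub_texts, recognize):
    """Return entities found in the whole text but missing from the sub-texts."""
    whole = Counter(recognize(text))
    parts = Counter(e for s in sub_texts for e in recognize(s))
    # Counter subtraction keeps only entities that occur more often in the whole text.
    return list((whole - parts).elements())


# Toy recognizer (assumption): look up known names by substring match.
KNOWN = ["Xiaoying Road", "Seven Days Hotel"]


def toy_recognize(s):
    return [name for name in KNOWN if name in s]


text = "go to Xiaoying Road Seven Days Hotel"
subs = ["go to Xiaoying", " Road Seven Days Hotel"]
print(find_split_entities(text, subs, toy_recognize))
```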
Another way of determining whether a split entity exists between two adjacent sub-texts is as follows. For two adjacent sub-texts, a front sub-text and a rear sub-text, first obtain several characters at the end of the front sub-text and several characters at the start of the rear sub-text; next, splice the characters from the end of the front sub-text with the characters from the start of the rear sub-text; finally, judge whether an entity exists in the spliced characters. If so, it is determined that a split entity exists between the front sub-text and the rear sub-text; if not, it is determined that no split entity exists between them.
For example, the text to be recognized "I want to go to Chaoyang District Xiaoying Road No. 8 Seven Days Hotel" is split into a front sub-text "I want to go to Chaoyang District Xiaoying" and a rear sub-text "Road No. 8 Seven Days Hotel". First, the last 5 characters of the front sub-text, "Chaoyang District Xiaoying", and the first 4 characters of the rear sub-text, "Road No. 8", are obtained (the character counts refer to the original Chinese text). Then "Chaoyang District Xiaoying" is spliced with "Road No. 8" to obtain "Chaoyang District Xiaoying Road No. 8". Finally, it is judged that an entity exists in the spliced characters, namely "Chaoyang District Xiaoying Road No. 8", so it is determined that a split entity exists between "I want to go to Chaoyang District Xiaoying" and "Road No. 8 Seven Days Hotel".
Here, the number of characters taken from the end of the front sub-text and the number taken from the start of the rear sub-text are not particularly limited. If no entity is found in the spliced characters on the first attempt, the number of characters taken from the end of the front sub-text and/or from the start of the rear sub-text can be adjusted, and the spliced characters can then be checked for an entity again.
It should be noted that these are only two specific ways of determining whether a split entity exists between two adjacent sub-texts among the plurality of sub-texts. Other ways of making this determination exist, and no limitation is imposed herein.
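The splice-and-check procedure, including the retry with a different number of characters, can be sketched as below. The `recognize` callable is again a hypothetical stand-in for the recognition model, and widening both windows by one character per attempt is just one possible adjustment strategy. The check that a found entity actually straddles the boundary is an extra safeguard added in this sketch, not something stated in the text.

```python
def find_boundary_entity(front, back, recognize, start=2, max_window=8):
    """Splice the tail of `front` with the head of `back` and look for an entity.

    The window grows from `start` to `max_window` characters on each side
    (an assumed retry strategy). Returns the entity or None.
    """
    for n in range(start, max_window + 1):
        tail, head = front[-n:], back[:n]
        for entity in recognize(tail + head):
            # Extra check (assumption): ignore entities that lie wholly on one side.
            if entity not in tail and entity not in head:
                return entity
    return None


KNOWN = ["Xiaoying Road", "KFC"]


def toy_recognize(s):
    """Toy recognizer (assumption): substring lookup in a fixed name list."""
    return [name for name in KNOWN if name in s]


print(find_boundary_entity("I go to Xiaoying", " Road 8 hotel", toy_recognize))
print(find_boundary_entity("I like KFC", " and tea", toy_recognize))
```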
Step B: assign the split entity to one of the two adjacent sub-texts.
FIG. 6 is a second schematic diagram of splitting a text to be recognized according to a preset word number in an embodiment of the invention. As shown in FIG. 6, assume that the text to be recognized 600 is "At two this afternoon I want to go to Chaoyang District Xiaoying Road No. 8 Seven Days Hotel" and that the second preset word number is 15 words. According to the second preset word number, the text to be recognized 600 is split into a sub-text 601 "At two this afternoon I want to go to Chaoyang District Xiaoying" and a sub-text 602 "Road No. 8 Seven Days Hotel". The judgment shows that a split entity 603 "Chaoyang District Xiaoying Road No. 8" exists between the sub-text 601 and the sub-text 602, so the entity 603 is assigned to either the sub-text 601 or the sub-text 602. If the entity 603 is assigned to the sub-text 601, a sub-text 604 "At two this afternoon I want to go to Chaoyang District Xiaoying Road No. 8" and a sub-text 605 "Seven Days Hotel" are obtained; if the entity 603 is assigned to the sub-text 602, a sub-text 606 "At two this afternoon I want to go to" and a sub-text 607 "Chaoyang District Xiaoying Road No. 8 Seven Days Hotel" are obtained.
Whether the split entity is assigned to the preceding or the following of the two adjacent sub-texts may be determined by the number of words in each of the two parts of the split entity. If the first part contains fewer words than the second part, the split entity may be assigned to the following sub-text; if the first part contains more words than the second part, it may be assigned to the preceding sub-text; and if the two parts contain the same number of words, it may be assigned to the following sub-text. Of course, the choice between the preceding and the following sub-text may also be made according to other conditions, which is not limited herein.
Here, splitting the text to be recognized into multiple sub-texts in the first mode may produce sub-texts that contain no entities, and continuing to recognize entities in such sub-texts reduces the efficiency of determining entity dependencies. For example, the text to be recognized A00 "Go to the Wudaokou cinema at two tomorrow afternoon to watch Aquaman, noted" is split into a sub-text A10 "Go to the Wudaokou cinema at two tomorrow afternoon to watch Aquaman" and a sub-text A20 "noted". The sub-text A20 contains no entities, so recognizing entities in it reduces the efficiency of determining the entity dependency relationship.
Moreover, splitting the text to be recognized into a plurality of sub-texts in the second mode may leave some sub-texts with incomplete meanings and cause the entity dependency relationship to be misjudged, which affects the accuracy of the determination. For example, the text to be recognized B00 "This afternoon I want to go to the KFC at Wudaokou, not the McDonald's at Xi'erqi" is split into a sub-text B10 "This afternoon I want to go", a sub-text B20 "to the KFC at Wudaokou, not" and a sub-text B30 "the McDonald's at Xi'erqi". From the sub-text B10 only the entity B11 "this afternoon" can be recognized, and it is not possible to know what is to be done this afternoon. From the sub-text B20 an entity B21 "Wudaokou" and an entity B22 "KFC" can be identified, and from the sub-text B30 an entity B31 "Xi'erqi" and an entity B32 "McDonald's" can be identified. This leads to two results, one being to go to KFC this afternoon and the other being to go to McDonald's this afternoon. The latter result is not the true meaning of the text to be recognized, so the entity dependency relationship is determined incorrectly.
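The length-based rule for step B can be sketched as follows. The function name `assign_split_entity` is hypothetical, lengths are counted in string characters, and sending a tie to the following sub-text is an assumption of this sketch.

```python
def assign_split_entity(front: str, back: str, entity: str) -> tuple[str, str]:
    """Move a boundary-straddling entity wholly into one of two adjacent sub-texts.

    The entity goes to the side that already holds its longer part;
    on a tie it goes to the following sub-text (assumption).
    """
    for k in range(1, len(entity)):
        if front.endswith(entity[:k]) and back.startswith(entity[k:]):
            first, second = k, len(entity) - k
            if first > second:
                # Longer part is in the front sub-text: pull the rest forward.
                return front + entity[k:], back[second:]
            # Longer (or equal) part is in the rear sub-text: push the start back.
            return front[:-k], entity[:k] + back
    raise ValueError("entity does not straddle the boundary")


print(assign_split_entity("go to Xiaoying", " Road hotel", "Xiaoying Road"))
print(assign_split_entity("go to Xi", "aoying Road hotel", "Xiaoying Road"))
```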
In order to solve the problems of the first and second modes, a third mode is also proposed.
The third mode is as follows: the text to be recognized is split into a plurality of sub-texts according to both punctuation marks and a preset number of words.
Here, depending on the order in which the punctuation marks and the preset number of words are applied, the text to be recognized may be split into a plurality of sub-texts in either of two aspects.
First aspect: the text to be recognized is first split into a plurality of sub-texts according to punctuation marks, and the content of each sub-text is then adjusted according to the preset number of words.
In a specific implementation process, the method can comprise the following five steps:
Step one: recognize punctuation marks in the text to be recognized.
Step two: split the text to be recognized into a plurality of sub-texts, using the punctuation marks as separators.
Step three: determine the number of words in each of the plurality of sub-texts.
Step four: judge whether any of the plurality of sub-texts has fewer words than a first preset number of words; if so, execute step five, and if not, leave the sub-texts unchanged.
Step five: merge the sub-text whose number of words is smaller than the first preset number of words into an adjacent sub-text.
Step one and step two are the same as step one and step two of the first mode and are not described again here.
The following mainly explains step three, step four and step five.
Generally, the fewer words a sub-text contains, the less likely it is to contain an entity or to express a complete meaning. After the number of words in each sub-text is determined, it is judged whether any of the plurality of sub-texts has fewer words than a first preset number of words. The first preset number of words is a fixed value that can be set according to the practical situation and is not limited herein; in practical applications it should not be too large and may be any positive integer from 1 to 5. If such a sub-text exists, it is merged into a sub-text adjacent to it, which reduces the number of recognition passes that find no entity and thereby improves the efficiency of determining the entity dependency relationship.
For example, Fig. 7 is a schematic diagram of splitting a text to be recognized first according to punctuation marks and then according to a preset number of words in the embodiment of the present invention. As shown in Fig. 7, assume that the text to be recognized 700 is "you go to the KFC at Wudaokou, I go to the McDonald's at Xi'erqi, remember", and that the first preset number of words is 5. Through step one and step two, the text 700 is split into a sub-text 701 "you go to the KFC at Wudaokou", a sub-text 702 "I go to the McDonald's at Xi'erqi" and a sub-text 703 "remember". First, it is determined that sub-text 701 contains 9 words, sub-text 702 contains 9 words, and sub-text 703 contains 3 words, counted in characters of the original text. Then, the number of words in each of sub-text 701, sub-text 702 and sub-text 703 is compared with the first preset number of words, and the 3 words of sub-text 703 are found to be fewer than the first preset number of 5 words. Finally, sub-text 703 is merged into sub-text 702, yielding a sub-text 704 "you go to the KFC at Wudaokou" and a sub-text 705 "I go to the McDonald's at Xi'erqi, remember".
It should be noted that, after a sub-text with fewer words than the first preset number of words has been found, whether it is merged into the adjacent preceding sub-text or the adjacent following sub-text may be determined according to the type of punctuation mark between the sub-text and its neighbors. For example, if the punctuation mark between the sub-text and the adjacent preceding sub-text is a period, the meaning of the preceding sub-text has already been fully expressed, so the sub-text is merged into the adjacent following sub-text. As another example, if the punctuation mark between the sub-text and the adjacent following sub-text is a period, a complete meaning has been expressed by the end of the sub-text, so the sub-text is merged into the adjacent preceding sub-text. Of course, the direction of the merge may also be determined according to other factors, which are not limited herein.
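The first aspect (split on punctuation, then merge sub-texts that are too short) can be sketched as follows. This is an illustrative sketch under stated assumptions: the punctuation sets `PUNCT` and `SENTENCE_END`, the helper names, and the fallback of merging backward when no period is involved are choices made for the example and are not taken from the source. Length is counted in characters, ignoring whitespace and punctuation.

```python
import re
from typing import List

PUNCT = ",.!?;，。！？；"       # assumed set of splitting punctuation marks
SENTENCE_END = ".!?。！？"     # marks assumed to close a complete meaning


def split_on_punctuation(text: str) -> List[str]:
    # Each piece keeps the punctuation mark that terminates it.
    return re.findall(r"[^,.!?;，。！？；]+[,.!?;，。！？；]?", text)


def size(sub_text: str) -> int:
    # Count characters, ignoring whitespace and punctuation.
    return sum(1 for ch in sub_text if not ch.isspace() and ch not in PUNCT)


def merge_short(sub_texts: List[str], min_len: int) -> List[str]:
    out: List[str] = []
    carry = ""                        # short piece waiting for the following sub-text
    for piece in sub_texts:
        piece = carry + piece
        carry = ""
        if size(piece) >= min_len:
            out.append(piece)
        elif out and out[-1].rstrip()[-1] not in SENTENCE_END:
            out[-1] += piece          # preceding sub-text is still open: merge backward
        else:
            carry = piece             # preceding sub-text ended a sentence, or none exists
    if carry:                         # short piece at the very end of the text
        if out:
            out[-1] += carry
        else:
            out.append(carry)
    return [s.strip() for s in out]
```

For instance, `merge_short(split_on_punctuation("you go to the KFC, I go to the McDonald's, ok"), 5)` folds the short tail into the second sub-text, while in `"Finished. ok, then we leave now"` the short piece follows a period and is therefore merged forward.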
Second aspect: the text to be recognized is first split into a plurality of sub-texts according to the preset number of words, and the content of each sub-text is then adjusted according to punctuation marks.
In a specific implementation process, the method can comprise the following five steps:
Step one: split the text to be recognized into a plurality of sub-texts according to the second preset number of words.
Step two: determine, from the plurality of sub-texts, a target sub-text that contains a punctuation mark.
Step three: obtain a first word count between the beginning of the target sub-text and the punctuation mark, and a second word count between the punctuation mark and the end of the target sub-text, and compare them; if the first word count is larger than the second word count, execute step four, and if the first word count is smaller than the second word count, execute step five.
Step four: move the text after the punctuation mark in the target sub-text into the sub-text that follows the target sub-text.
Step five: move the text before the punctuation mark in the target sub-text into the sub-text that precedes the target sub-text.
The specific implementation process of step one is the same as that of the second mode and is not described again here.
The following mainly explains step two, step three, step four and step five.
Generally, if a sub-text contains a punctuation mark, part of its content may belong to the preceding sub-text or to the following sub-text. Therefore, in a target sub-text that contains a punctuation mark, the first word count between the beginning of the target sub-text and the punctuation mark is compared with the second word count between the punctuation mark and the end of the target sub-text, and the shorter part is moved into the corresponding neighboring sub-text. In this way the content of each sub-text is kept as complete as possible, which further improves the accuracy of determining the entity dependency relationship. Here, the first word count is the number of words between the beginning of the target sub-text and the punctuation mark, and the second word count is the number of words between the punctuation mark and the end of the target sub-text.
For example, Fig. 8 is a schematic diagram of splitting a text to be recognized first according to a preset number of words and then according to punctuation marks. As shown in Fig. 8, assume that the text to be recognized 800 is "you go to the Wudaokou cinema at half past six tomorrow evening for a meeting, and bring the meeting materials and a notebook computer", and that the second preset number of words is 15. Through step one above, the text 800 is split into a sub-text 801 "you go to the Wudaokou cinema at half past six tomorrow evening" and a sub-text 802 "for a meeting, and bring the meeting materials and a notebook computer". Sub-text 802 contains a punctuation mark 803 ",". There are 2 words between the beginning of sub-text 802 and punctuation mark 803, and 13 words between punctuation mark 803 and the end of sub-text 802. Since 2 is less than 13, that is, the number of words before punctuation mark 803 is smaller than the number of words after it, the words between the beginning of sub-text 802 and punctuation mark 803 are moved into sub-text 801. The final result is a sub-text 804 "you go to the Wudaokou cinema at half past six tomorrow evening for a meeting" and a sub-text 805 "and bring the meeting materials and a notebook computer".
Here, it should be noted that the first word count and the second word count of the target sub-text may be equal. In that case: if a split entity exists between the target sub-text and the preceding sub-text, the words between the beginning of the target sub-text and the punctuation mark are moved into the preceding sub-text; if a split entity exists between the target sub-text and the following sub-text, the words between the punctuation mark and the end of the target sub-text are moved into the following sub-text; and if no split entity exists on either side, either the words before the punctuation mark may be moved into the preceding sub-text or the words after the punctuation mark may be moved into the following sub-text, which is not limited herein.
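The second aspect (split by a fixed length, then rebalance around punctuation) can be sketched as follows. The sketch rests on several assumptions that the source does not specify: only the first punctuation mark in each sub-text is considered, the punctuation mark stays attached to the text before it, and ties are left unchanged because resolving them requires split-entity information that this example does not model. The names `split_fixed`, `rebalance` and `PUNCT` are hypothetical.

```python
from typing import List

PUNCT = ",.!?;，。！？；"   # assumed set of punctuation marks


def split_fixed(text: str, size: int) -> List[str]:
    # Step one: cut the text every `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]


def rebalance(sub_texts: List[str]) -> List[str]:
    subs = list(sub_texts)
    for i in range(len(subs)):
        s = subs[i]
        # Step two: find the first punctuation mark, if any.
        idx = next((k for k, ch in enumerate(s) if ch in PUNCT), -1)
        if idx < 0:
            continue
        # Step three: characters before and after the punctuation mark.
        first, second = idx, len(s) - idx - 1
        if first < second and i > 0:
            # Step five: the shorter head joins the preceding sub-text.
            subs[i - 1] += s[:idx + 1]
            subs[i] = s[idx + 1:]
        elif first > second and i + 1 < len(subs):
            # Step four: the shorter tail joins the following sub-text.
            subs[i + 1] = s[idx + 1:] + subs[i + 1]
            subs[i] = s[:idx + 1]
    return [s for s in subs if s]
```

With the placeholder text `"abcdefgh,ijklmnopqrs"` and a preset length of 6, the initial split is `["abcdef", "gh,ijk", "lmnopq", "rs"]`; the two characters before the comma are shorter than the three after it, so `rebalance` returns `["abcdefgh,", "ijk", "lmnopq", "rs"]`.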
So far, the text to be recognized can be split into a plurality of sub-texts by any one of the three ways.
S302: identify the entities in each of the plurality of sub-texts.
S303: determine the number of entities in each sub-text.
S304: judge whether the sub-texts include a single sub-text, that is, a sub-text containing exactly one entity; if so, execute S305 and S307, and if not, execute S306 and S307.
S305: identify the dependency relationship between the entity in the single sub-text and the entities in a sub-text adjacent to the single sub-text.
S306: identify the dependency relationship between the entities in each sub-text.
S307: determine the dependency relationship between the entities in the text to be recognized according to the identified dependency relationships.
The following two specific examples describe how, after the text to be recognized has been split into a plurality of sub-texts, the dependency relationship between the entities in the text to be recognized is determined from those sub-texts.
Example one: Fig. 9 is a first schematic diagram of determining entity dependency relationships in a text to be recognized in the embodiment of the present invention. As shown in Fig. 9, assume that the text to be recognized 900 is "I go to the KFC at Wudaokou and you go to the McDonald's at Xi'erqi", the split sub-text 901 is "I go to the KFC at Wudaokou", and the split sub-text 902 is "you go to the McDonald's at Xi'erqi". Sub-text 901 contains an entity 903 "Wudaokou" and an entity 904 "KFC", and sub-text 902 contains an entity 905 "Xi'erqi" and an entity 906 "McDonald's". Each sub-text contains two entities, so no single sub-text exists. The dependency relationship between entity 903 and entity 904 in sub-text 901 is then identified, giving that "Wudaokou" describes "KFC"; the dependency relationship between entity 905 and entity 906 in sub-text 902 is identified, giving that "Xi'erqi" describes "McDonald's"; and finally the analysis finds that "KFC" and "McDonald's" are parallel. The dependency relationships finally determined from the text 900 are therefore: "Wudaokou" describes "KFC", and "Xi'erqi" describes "McDonald's".
Example two: Fig. 10 is a second schematic diagram of determining entity dependency relationships in a text to be recognized in the embodiment of the present invention. Referring to Fig. 10, assume that the text to be recognized 1000 is "at nine o'clock tomorrow morning, go to the Capital Airport to pick up Xiaoming, then go to the Wudaokou cinema for a meeting", the split sub-text 1001 is "at nine o'clock tomorrow morning", the split sub-text 1002 is "go to the Capital Airport to pick up Xiaoming", and the split sub-text 1003 is "go to the Wudaokou cinema for a meeting". Sub-text 1001 contains an entity 1004 "nine o'clock tomorrow morning"; sub-text 1002 contains an entity 1005 "Capital Airport" and an entity 1006 "pick up Xiaoming"; and sub-text 1003 contains an entity 1007 "Wudaokou cinema" and an entity 1008 "meeting". Sub-text 1001 thus contains one entity, while sub-text 1002 and sub-text 1003 contain two entities each, so sub-text 1001 is a single sub-text. The dependency relationship among entity 1004 in sub-text 1001 and entity 1005 and entity 1006 in the adjacent sub-text 1002 is then identified, giving that "nine o'clock tomorrow morning" and "Capital Airport" both describe "pick up Xiaoming"; the dependency relationship between entity 1007 and entity 1008 in sub-text 1003 is identified, giving that "Wudaokou cinema" describes "meeting"; and finally the analysis finds that "pick up Xiaoming" and "meeting" are parallel. The dependency relationships determined from the text 1000 are therefore: "nine o'clock tomorrow morning" and "Capital Airport" describe "pick up Xiaoming", and "Wudaokou cinema" describes "meeting".
It should be noted that, if a single sub-text exists among the plurality of sub-texts, whether the dependency relationship is determined between the entity in the single sub-text and the entities in the preceding sub-text, or between the entity in the single sub-text and the entities in the following sub-text, may be decided according to the semantics of the preceding sub-text or of the following sub-text. Of course, the decision may also be made according to other conditions, which is not limited herein.
Thus, the dependency relationship among the entities in the text to be recognized is determined.
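Steps S303 to S307 can be sketched as follows. The sketch assumes that entity recognition (S302) has already produced a list of entities per sub-text, and it replaces the real dependency recognizer with a toy heuristic, `last_entity_is_head`, which treats the last entity of a group as the one being described. That heuristic, the function names, and the choice to pair a single sub-text with the following sub-text when one exists are assumptions for illustration only; the source leaves the pairing direction to semantics or other conditions.

```python
from typing import Callable, List, Tuple

Dependency = Tuple[str, str]   # (describing entity, described entity)


def last_entity_is_head(entities: List[str]) -> List[Dependency]:
    # Toy stand-in for a real dependency recognizer (assumption).
    *modifiers, head = entities
    return [(m, head) for m in modifiers]


def group_for_analysis(entities_per_sub_text: List[List[str]]) -> List[List[str]]:
    groups: List[List[str]] = []
    pending: List[str] = []            # entities from single sub-texts (S304)
    for entities in entities_per_sub_text:
        if len(entities) == 1:
            pending.extend(entities)   # wait for an adjacent sub-text (S305)
        elif entities:
            groups.append(pending + list(entities))
            pending = []
    if pending:                        # no following sub-text: fall back to the preceding one
        if groups:
            groups[-1].extend(pending)
        else:
            groups.append(pending)
    return groups


def determine_dependencies(
    entities_per_sub_text: List[List[str]],
    find_dependencies: Callable[[List[str]], List[Dependency]] = last_entity_is_head,
) -> List[Dependency]:
    result: List[Dependency] = []
    for group in group_for_analysis(entities_per_sub_text):
        result.extend(find_dependencies(group))   # S305 / S306
    return result                                 # S307
```

Applied to entity lists shaped like Example two, `[["9 a.m. tomorrow"], ["airport", "pickup"], ["cinema", "meeting"]]`, the single sub-text is analyzed together with its neighbor, so both the time and the airport end up describing the pickup, and the cinema describes the meeting.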
In order to enable a user to directly and clearly know the dependency relationship among the entities in the text to be recognized, the determined dependency relationship can be displayed.
S308: merge and display the entities in the text to be recognized that have dependency relationships with each other.
For example, Fig. 11 is a schematic diagram of merging and displaying entities with dependency relationships in an embodiment of the present invention. Referring to Fig. 11, the text 1101 to be recognized is "go to the Wudaokou cinema for a meeting at half past six in the evening", and the dependency relationship determined from the text 1101 is: "half past six in the evening" and "Wudaokou cinema" both describe "meeting". The entities "half past six in the evening", "Wudaokou cinema" and "meeting" are merged and displayed, and the merged display result is the content of the note 1102, where the "meeting reminder" in Fig. 11 corresponds to "meeting" in this example.
In particular implementation, in order to enable a user to quickly know the information level in the text to be recognized, entities with dependency relationships can be displayed hierarchically.
Specifically, the method can comprise the following steps:
Step one: identify the types of the entities in the text to be recognized that have dependency relationships with each other.
The types include a subject and an object, where the subject is the entity described in the dependency relationship and the object is used for describing the subject. For example, for the text to be recognized "go to the Wudaokou cinema for a meeting at half past six in the evening", the entities with dependency relationships are "half past six in the evening", "Wudaokou cinema" and "meeting"; the entity of type subject is "meeting", and the entities of type object are "half past six in the evening" and "Wudaokou cinema".
Step two: style the entity of type subject differently from the entities of type object, so that the entity of type subject stands out.
Specifically, the entity of type subject may be displayed in a larger font, in a different color, or with a flashing font. Of course, the entity of type subject may also be highlighted in other manners, which is not limited herein.
Step three: merge and display the differently styled entity of type subject and entities of type object.
It should be noted that the execution order of step two and step three may be reversed. That is, after the types of the entities with dependency relationships have been identified in step one, the entity of type subject and the entities of type object may first be merged, and then displayed with distinguishing styles so that the entity of type subject is highlighted.
For example, still referring to Fig. 11, the entities "half past six in the evening", "Wudaokou cinema" and "meeting reminder" are merged and displayed in the note 1102, and the font of "meeting reminder" is larger than the fonts of "half past six in the evening" and "Wudaokou cinema". When looking at the note 1102, the user therefore sees "meeting reminder" first and directly understands that the note 1102 is a meeting reminder, and then sees "half past six in the evening" and "Wudaokou cinema" and learns the time and place of the meeting. It can be seen that highlighting the entity of type subject enables the user to quickly grasp the information hierarchy of the text to be recognized.
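The merged, hierarchical display can be sketched as follows. The sketch assumes dependencies arrive as `(object, subject)` pairs and uses upper-case text as a stand-in for the larger font of the subject, since plain text has no font sizes; the function names `merge_for_display` and `render` are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_for_display(dependencies: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Group the describing entities (objects) under the entity they describe (subject).
    notes: Dict[str, List[str]] = {}
    for obj, subject in dependencies:
        notes.setdefault(subject, []).append(obj)
    return notes


def render(notes: Dict[str, List[str]]) -> str:
    lines: List[str] = []
    for subject, objects in notes.items():
        lines.append(subject.upper())             # highlighted subject (assumed styling)
        lines.extend("  " + obj for obj in objects)
    return "\n".join(lines)
```

For the dependencies `[("6:30 p.m.", "meeting"), ("cinema", "meeting")]`, `render(merge_for_display(...))` yields a three-line note with `MEETING` on the first line and the time and place indented beneath it.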
Thus, the whole process of determining the dependency relationship among the entities in the text to be recognized and performing combined display on the entities with the dependency relationship among each other is completed.
Based on the same inventive concept, as an implementation of the method, the embodiment of the present invention further provides a device for determining an entity dependency relationship. Fig. 12 is a schematic structural diagram of an entity dependency relationship determining apparatus in an embodiment of the present invention, and referring to fig. 12, the apparatus 1200 may include: a splitting module 1201 configured to split a text to be recognized into a plurality of sub-texts; an identification module 1202 configured to identify an entity in each of a plurality of sub-texts; a determining module 1203 configured to determine dependency relationships between entities in the text to be recognized according to dependency relationships of the entities in each sub-text and/or entities in adjacent sub-texts.
Based on the foregoing embodiment, the splitting module is configured to recognize punctuation marks from a text to be recognized; and dividing the text to be recognized into a plurality of sub-texts by taking punctuation marks as intervals.
Based on the foregoing embodiment, the apparatus may further include: a first adjustment module configured to determine a number of words of each of a plurality of sub-texts; judging whether a sub text with the word number smaller than a first preset word number exists in the plurality of sub texts; if yes, the sub-text with the number of words smaller than the first preset number of words is merged into the sub-text adjacent to the sub-text.
Based on the foregoing embodiment, the splitting module is configured to split the text to be recognized into a plurality of sub-texts according to a second preset word count.
Based on the foregoing embodiment, the apparatus may further include: a second adjusting module configured to judge whether a split entity exists between two adjacent sub-texts among the plurality of sub-texts; and, if so, to assign the split entity to one of the two adjacent sub-texts.
Based on the foregoing embodiment, the apparatus may further include: a third adjusting module configured to determine, from the plurality of sub-texts, a target sub-text that contains a punctuation mark; obtain a first word count between the beginning of the target sub-text and the punctuation mark, and a second word count between the punctuation mark and the end of the target sub-text; if the first word count is larger than the second word count, move the text after the punctuation mark in the target sub-text into the sub-text that follows the target sub-text; and if the first word count is smaller than the second word count, move the text before the punctuation mark in the target sub-text into the sub-text that precedes the target sub-text.
Based on the foregoing embodiment, the determining module is configured to determine the number of entities in each sub-text; judge whether the sub-texts include a single sub-text containing exactly one entity; if so, identify the dependency relationship between the entity in the single sub-text and the entities in a sub-text adjacent to the single sub-text; if not, identify the dependency relationship between the entities in each sub-text; and determine the dependency relationship between the entities in the text to be recognized according to the identified dependency relationships.
Based on the foregoing embodiment, the apparatus may further include: a display module configured to merge and display the entities in the text to be recognized that have dependency relationships with each other.
Based on the foregoing embodiment, the display module is configured to identify the types of the entities in the text to be recognized that have dependency relationships with each other, where the types include a subject and an object, the subject being the entity described in the dependency relationship and the object being used for describing the subject; and, among the entities having dependency relationships with each other, to merge and display the entity of type subject and the entities of type object with distinguishing styles, so that the entity of type subject is highlighted.
Here, it should be noted that: the above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus according to the invention, reference is made to the description of the embodiments of the method according to the invention for understanding.
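The module structure of the apparatus 1200 can be pictured as three pluggable components chained in sequence. The sketch below is only an illustration of that composition: the class name `DependencyApparatus`, the use of plain callables for the modules, and the toy callables in the usage note are all assumptions and do not reflect how the modules are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DependencyApparatus:
    split: Callable[[str], List[str]]                                # splitting module 1201
    identify: Callable[[str], List[str]]                             # identification module 1202
    determine: Callable[[List[List[str]]], List[Tuple[str, str]]]    # determining module 1203

    def run(self, text: str) -> List[Tuple[str, str]]:
        sub_texts = self.split(text)
        entities = [self.identify(s) for s in sub_texts]
        return self.determine(entities)


# Toy modules for demonstration only (assumptions, not real recognizers):
# split on commas, treat capitalized words as entities, and link the first
# entity of each sub-text to its last entity.
toy = DependencyApparatus(
    split=lambda t: t.split(","),
    identify=lambda s: [w for w in s.split() if w[0].isupper()],
    determine=lambda groups: [(g[0], g[-1]) for g in groups if len(g) > 1],
)
```

Calling `toy.run("at Cinema a Meeting, bring Laptop")` passes the text through the three modules in order and returns `[("Cinema", "Meeting")]`; the adjusting and display modules described above could be added as further optional callables in the same way.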
Based on the same inventive concept, the embodiment of the invention also provides an electronic device. Fig. 13 is a schematic structural diagram of an electronic device in an embodiment of the present invention. Referring to Fig. 13, the electronic device 1300 may include: at least one processor 1301; and at least one memory 1302 and a bus 1303 connected to the processor 1301. The processor 1301 and the memory 1302 communicate with each other through the bus 1303, and the processor 1301 is configured to invoke program instructions in the memory 1302 to perform the methods in one or more of the embodiments described above.
Here, it should be noted that: the above description of the embodiments of the electronic device is similar to the description of the embodiments of the method described above, and has similar advantageous effects to the embodiments of the method. For technical details not disclosed in the embodiments of the electronic device according to the embodiments of the present invention, please refer to the description of the method embodiments of the present invention.
Based on the same inventive concept, the embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the method in one or more embodiments described above.
Here, it should be noted that: the above description of the computer-readable storage medium embodiments is similar to the description of the method embodiments described above, with similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the computer-readable storage medium of the embodiments of the present invention, reference is made to the description of the method embodiments of the present invention for understanding.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (18)

1. A method for determining entity dependencies, the method comprising:
splitting a text to be recognized into a plurality of sub-texts;
determining a number of words for each of the plurality of sub-texts;
judging whether a sub-text whose word count is smaller than a first preset word count exists among the plurality of sub-texts;
if so, merging the sub-text whose word count is smaller than the first preset word count into a sub-text adjacent to it;
identifying an entity in each of the plurality of sub-texts;
and determining the dependency relationship between the entities in the text to be recognized according to the dependency relationship between the entities in each sub-text and/or the entities in the adjacent sub-texts.
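
The merging step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `merge_short_subtexts` is hypothetical, the word count is assumed to be the character count of a sub-text, and a short sub-text is assumed to join its predecessor (the claim only requires merging into one adjacent sub-text).

```python
def merge_short_subtexts(subtexts, min_words):
    """Merge every sub-text shorter than min_words into an adjacent sub-text."""
    merged = []
    for piece in subtexts:
        # Assumption: the word count is the number of characters, and a short
        # piece is appended to the sub-text that precedes it.
        if merged and len(piece) < min_words:
            merged[-1] += piece
        else:
            merged.append(piece)
    # A short leading piece has no predecessor, so fold it into its successor.
    if len(merged) > 1 and len(merged[0]) < min_words:
        merged[1] = merged[0] + merged[1]
        merged.pop(0)
    return merged
```

For example, with a threshold of 3, the sub-texts "abcdef", "gh", "ijklmn" would become "abcdefgh" and "ijklmn".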
2. The method of claim 1, wherein splitting the text to be recognized into a plurality of sub-texts comprises:
identifying punctuation marks in the text to be recognized;
and splitting the text to be recognized into a plurality of sub-texts by taking the punctuation marks as separators.
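
As a rough illustration of claim 2, the sketch below splits a text at punctuation marks using Python's standard `re` module. The set of marks in `PUNCTUATION` and the function name are assumptions made for the example; the claim does not specify which marks are used.

```python
import re

# Assumed set of Chinese and Western punctuation marks used as separators.
PUNCTUATION = "，。！？；,.!?;"

def split_on_punctuation(text):
    """Split text into sub-texts, using punctuation marks as separators."""
    pieces = re.split("[" + re.escape(PUNCTUATION) + "]", text)
    # Drop empty pieces produced by consecutive or trailing punctuation.
    return [p.strip() for p in pieces if p.strip()]
```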
3. The method of claim 1, wherein splitting the text to be recognized into a plurality of sub-texts comprises:
and splitting the text to be recognized into a plurality of sub-texts according to a second preset word number.
4. The method of claim 3, wherein after the splitting the text to be recognized into a plurality of sub-texts according to the second preset word number, the method further comprises:
judging whether an entity has been split between two adjacent sub-texts of the plurality of sub-texts;
if so, assigning the split entity as a whole to one of the two adjacent sub-texts.
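
Claims 3 and 4 together can be sketched as follows. This is only one possible reading: the entity positions are assumed to be known in advance as non-overlapping `(start, end)` character spans (for instance from a dictionary lookup), the preset word count is treated as a character count, and a split entity is assumed to go to the earlier of the two sub-texts. The function name is hypothetical.

```python
def split_fixed_keep_entities(text, size, entity_spans):
    """Cut text every `size` characters without cutting through an entity."""
    pieces, start = [], 0
    while start < len(text):
        cut = min(start + size, len(text))
        for ent_start, ent_end in entity_spans:
            # The cut falls strictly inside this entity, so it would be split.
            if ent_start < cut < ent_end:
                # Assumption: the whole entity is kept in the earlier sub-text.
                cut = ent_end
                break
        pieces.append(text[start:cut])
        start = cut
    return pieces
```

With `size=4` and an entity spanning characters 3 to 6 of "abcdefghij", the first cut moves from position 4 to position 6, giving "abcdef" and "ghij".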
5. The method of claim 3, wherein after the splitting the text to be recognized into a plurality of sub-texts according to the second preset word number, the method further comprises:
determining, from the plurality of sub-texts, a target sub-text that contains a punctuation mark;
acquiring a first word count between the punctuation mark in the target sub-text and the beginning of the target sub-text and a second word count between the punctuation mark in the target sub-text and the end of the target sub-text;
if the first word count is larger than the second word count, moving the text after the punctuation mark in the target sub-text into the sub-text that follows the target sub-text;
and if the first word count is smaller than the second word count, moving the text before the punctuation mark in the target sub-text into the sub-text that precedes the target sub-text.
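
The adjustment of claim 5 might look like the sketch below. Several details are assumptions not stated in the claim: only the first punctuation mark in the target sub-text is considered, the mark itself stays with the text before it, word counts are character counts, and nothing is moved when the two counts are equal or when the required neighbour does not exist. The function name and the default `marks` are hypothetical.

```python
def rebalance_at_punctuation(subtexts, index, marks="，。；,.;"):
    """Return a new list in which sub-text `index` is cut at its first mark."""
    result = list(subtexts)
    target = result[index]
    pos = next((i for i, ch in enumerate(target) if ch in marks), -1)
    if pos < 0:
        return result  # no punctuation mark: nothing to adjust
    # Assumption: the punctuation mark stays attached to the text before it.
    before, after = target[:pos + 1], target[pos + 1:]
    first_count, second_count = pos, len(after)
    if first_count > second_count and index + 1 < len(result):
        # The longer part precedes the mark: push the tail to the next sub-text.
        result[index] = before
        result[index + 1] = after + result[index + 1]
    elif first_count < second_count and index > 0:
        # The longer part follows the mark: push the head to the previous one.
        result[index - 1] = result[index - 1] + before
        result[index] = after
    return result
```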
6. The method according to any one of claims 1 to 5, wherein the determining the dependency relationship between the entities in the text to be recognized according to the dependency relationship between the entities in each sub-text and/or the entities in the adjacent sub-texts comprises:
determining the number of entities in each sub-text;
judging whether a single-entity sub-text, that is, a sub-text containing exactly one entity, exists among the sub-texts;
if so, identifying the dependency relationship between the entity in the single-entity sub-text and an entity in a sub-text adjacent to the single-entity sub-text;
if not, identifying the dependency relationship between the entities in each sub-text;
and determining the dependency relationship among the entities in the text to be recognized according to the recognized dependency relationship.
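
The branching in claim 6 can be sketched as the generation of candidate entity pairs, which a dependency recognizer (not shown, and not specified here) would then accept or reject. The sketch assumes one reading of the claim: the test is applied per sub-text, so a sub-text with exactly one entity is paired with the entities of its neighbours, and any other sub-text is paired internally. The function name is hypothetical.

```python
from itertools import combinations

def candidate_pairs(entities_per_subtext):
    """List the entity pairs whose dependency relationship should be checked."""
    pairs = []
    for i, entities in enumerate(entities_per_subtext):
        if len(entities) == 1:
            # Single-entity sub-text: look for partners in adjacent sub-texts.
            for j in (i - 1, i + 1):
                if 0 <= j < len(entities_per_subtext):
                    pairs.extend((entities[0], other)
                                 for other in entities_per_subtext[j])
        else:
            # Otherwise pair the entities found inside the same sub-text.
            pairs.extend(combinations(entities, 2))
    return pairs
```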
7. The method according to claim 6, wherein after determining the dependency relationship between the entities in the text to be recognized according to the identified dependency relationship, the method further comprises:
and merging and displaying the entities in the text to be recognized that have a dependency relationship with each other.
8. The method according to claim 7, wherein the merging and displaying the entities having dependency relationships with each other in the text to be recognized comprises:
identifying the types of the entities in the text to be recognized that have a dependency relationship with each other, wherein the types comprise a subject and an object, the subject being the entity that is described in the dependency relationship and the object being the entity that describes the subject;
rendering the entity whose type is subject differently from the entity whose type is object, so that the entity whose type is subject stands out;
and merging and displaying the differently rendered entity whose type is subject and entity whose type is object.
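
A minimal sketch of the merged display in claim 8 is given below. The input format (a list of `(subject, object)` pairs) and the use of square brackets to make the subject stand out are assumptions chosen for a plain-text example; an actual display could equally use colour, font weight, or position.

```python
def merge_for_display(dependencies):
    """Group objects under their subject and mark the subject with brackets."""
    grouped = {}
    for subject, obj in dependencies:
        grouped.setdefault(subject, []).append(obj)
    # Assumption: brackets serve as the distinguishing style for the subject.
    return ["[" + subject + "] " + ", ".join(objs)
            for subject, objs in grouped.items()]
```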
9. An apparatus for determining entity dependencies, the apparatus comprising:
a splitting module configured to split a text to be recognized into a plurality of sub-texts;
a first adjustment module configured to: determine a word count for each of the plurality of sub-texts; judge whether a sub-text whose word count is smaller than a first preset word count exists among the plurality of sub-texts; and, if so, merge the sub-text whose word count is smaller than the first preset word count into a sub-text adjacent to it;
an identification module configured to identify an entity in each of the plurality of sub-texts;
and a determining module configured to determine the dependency relationship between the entities in the text to be recognized according to the dependency relationship between the entities in each sub-text and/or the entities in the adjacent sub-texts.
10. The apparatus of claim 9, wherein the splitting module is configured to identify punctuation marks in the text to be recognized, and to split the text to be recognized into a plurality of sub-texts by taking the punctuation marks as separators.
11. The apparatus of claim 9, wherein the splitting module is configured to split the text to be recognized into a plurality of sub-texts according to a second preset word count.
12. The apparatus of claim 11, further comprising:
a second adjustment module configured to judge whether an entity has been split between two adjacent sub-texts of the plurality of sub-texts and, if so, to assign the split entity as a whole to one of the two adjacent sub-texts.
13. The apparatus of claim 11, further comprising:
a third adjustment module configured to: determine, from the plurality of sub-texts, a target sub-text that contains a punctuation mark; acquire a first word count between the punctuation mark in the target sub-text and the beginning of the target sub-text and a second word count between the punctuation mark in the target sub-text and the end of the target sub-text; if the first word count is larger than the second word count, move the text after the punctuation mark in the target sub-text into the sub-text that follows the target sub-text; and if the first word count is smaller than the second word count, move the text before the punctuation mark in the target sub-text into the sub-text that precedes the target sub-text.
14. The apparatus according to any of claims 9 to 13, wherein the determining module is configured to: determine the number of entities in each of the sub-texts; judge whether a single-entity sub-text, that is, a sub-text containing exactly one entity, exists among the sub-texts; if so, identify the dependency relationship between the entity in the single-entity sub-text and an entity in a sub-text adjacent to the single-entity sub-text; if not, identify the dependency relationship between the entities in each sub-text; and determine the dependency relationship among the entities in the text to be recognized according to the identified dependency relationships.
15. The apparatus of claim 14, further comprising:
a display module configured to merge and display the entities in the text to be recognized that have a dependency relationship with each other.
16. The apparatus according to claim 15, wherein the display module is configured to: identify the types of the entities in the text to be recognized that have a dependency relationship with each other, wherein the types comprise a subject and an object, the subject being the entity that is described in the dependency relationship and the object being the entity that describes the subject; render the entity whose type is subject differently from the entity whose type is object, so that the entity whose type is subject stands out; and merge and display the differently rendered entity whose type is subject and entity whose type is object.
17. An electronic device, characterized in that the electronic device comprises: at least one processor; and at least one memory and a bus connected to the processor; wherein the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 8.
18. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when executed, controls an apparatus on which the storage medium is located to perform the method according to any of claims 1 to 8.
CN201910372285.4A 2019-05-06 2019-05-06 Entity dependency relationship determination method and device Active CN110162788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910372285.4A CN110162788B (en) 2019-05-06 2019-05-06 Entity dependency relationship determination method and device

Publications (2)

Publication Number Publication Date
CN110162788A CN110162788A (en) 2019-08-23
CN110162788B CN110162788B (en) 2021-02-09

Family

ID=67633489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910372285.4A Active CN110162788B (en) 2019-05-06 2019-05-06 Entity dependency relationship determination method and device

Country Status (1)

Country Link
CN (1) CN110162788B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10503833B2 (en) * 2016-12-06 2019-12-10 Siemens Aktiengesellschaft Device and method for natural language processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
汉语依存句法分析关键技术研究;李正华;《中国博士学位论文全文数据库》;20140115;正文1-39页 *
长句切割在依存句法分析中的应用;汤光旭;《中国优秀硕士学位论文全文数据库》;20160815;全文 *

Also Published As

Publication number Publication date
CN110162788A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200728

Address after: 518000 Nanshan District science and technology zone, Guangdong, Zhejiang Province, science and technology in the Tencent Building on the 1st floor of the 35 layer

Applicant after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Address before: 100029, Beijing, Chaoyang District new East Street, building No. 2, -3 to 25, 101, 8, 804 rooms

Applicant before: Tricorn (Beijing) Technology Co.,Ltd.

GR01 Patent grant