CN114970491A - Text connectivity judgment method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114970491A (application number CN202210919249.7A)
- Authority
- CN
- China
- Prior art keywords
- preset
- task key
- determining
- key
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the invention relate to the technical field of computers, and in particular to a text connectivity judgment method and device, electronic equipment and a storage medium. The method includes: acquiring a target text; analyzing the target text to obtain task key segments of the target text; obtaining tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments; and determining a connectivity judgment result between the task key segments based on the tagged named entities. By first locating the task key segments, then extracting the tagged named entities within them and using those entities to compute the connectivity between segments, the method can fully assess how the segments of a text connect to one another, in particular whether a plan described later in the text addresses a problem raised earlier, thereby improving working efficiency.
Description
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a text connectivity judgment method and device, electronic equipment and a storage medium.
Background
With the development of artificial intelligence, AI is increasingly able to understand text content. In the prior art, AI can be used to identify properties of texts such as their similarity and consistency.
However, prior-art AI can only distinguish whether texts discuss the same problem. In the field of emergency plans in particular, the requirement goes beyond this: it is more important to judge whether a plan set out later in a text can solve a problem stated earlier. This involves judging the connectivity of the text, that is, its continuity and practicability.
Therefore, a text connectivity determination method is needed to solve the above problems.
Disclosure of Invention
In view of this, to solve the above technical problems in the prior art, embodiments of the present invention provide a text connectivity determining method, apparatus, electronic device and storage medium.
In a first aspect, an embodiment of the present invention provides a text connectivity judgment method, including: acquiring a target text; analyzing the target text to obtain task key segments of the target text; obtaining tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments; and determining a connectivity judgment result between the task key segments based on the tagged named entities.
Optionally, analyzing the target text to obtain the task key segments includes: inputting the target text into a preset initial analysis model to determine an initial analysis result; determining at least two process segments based on a preset knowledge base and the initial analysis result; performing key phrase extraction on each process segment using a preset key phrase extraction model to determine a key phrase extraction result; and obtaining the task key segments of the target text from the key phrase extraction result.
Optionally, performing key phrase extraction on each process segment using a preset key phrase extraction model to determine the key phrase extraction result includes: performing word segmentation on the process segments based on a preset word segmentation model to obtain segmentation results; determining a weight for each segmentation result based on the segmentation results and a preset weight rule; and determining the key phrase extraction result based on these weights and a preset selection rule.
Optionally, obtaining the tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments includes: inputting the task key segments into a preset part-of-speech tagging model to determine a part-of-speech tagging result; retaining, based on the tagging result and preset target parts of speech, the target words that match the target parts of speech; and inputting the target words into the preset named entity recognition model to obtain the tagged named entities in the task key segments.
Optionally, determining the connectivity judgment result between the task key segments based on the tagged named entities includes: inputting the tagged named entities into a preset semantic evaluation model to determine the semantic similarity between them; determining, based on the semantic similarity, whether a connection exists between tagged named entities; obtaining, for each task key segment, the number of its tagged named entities that are connected; obtaining, for each task key segment, the total number of its tagged named entities (the element count); and determining the connectivity judgment result between the task key segments based on the element counts and the connection counts.
Optionally, determining whether a connection exists between tagged named entities based on the semantic similarity includes: when the semantic similarity is greater than a preset first threshold, determining that a connection exists between the tagged named entities; otherwise, determining that no connection exists.
In a second aspect, an embodiment of the present invention provides a text connectivity judgment device, including: an acquisition module configured to acquire a target text; an analysis module configured to analyze the target text to obtain task key segments of the target text; a first processing module configured to obtain tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments; and a second processing module configured to determine a connectivity judgment result between the task key segments based on the tagged named entities.
In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method as described in the first aspect or any of the possible embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as described in the first aspect or any of the possible embodiments of the first aspect.
The invention provides a text connectivity judgment method and device, electronic equipment and a storage medium. The method includes: acquiring a target text; analyzing the target text to obtain task key segments of the target text; obtaining tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments; and determining a connectivity judgment result between the task key segments based on the tagged named entities. By first locating the task key segments, then extracting the tagged named entities within them and using those entities to compute the connectivity between segments, the method can fully assess how the segments of a text connect to one another, in particular whether a plan described later in the text addresses a problem raised earlier, thereby improving working efficiency.
Drawings
Fig. 1 is a schematic flow chart of a text connectivity judgment method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a text connectivity judgment method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a text connectivity judgment device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for text connectivity judgment according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the purpose of facilitating understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
Fig. 1 is a schematic flow chart of a text connectivity judgment method provided in an embodiment of the present invention. Referring to Fig. 1, the method includes the following steps:
S110: Acquire a target text.
Illustratively, the target text may be any type of text, including but not limited to emergency plans and emergency and disaster-relief duty documents; this is not limited here. The file format is likewise not limited and includes, but is not limited to, .doc and .docx files.
In an optional embodiment, files in other formats may be converted to .docx with a file format conversion tool, so that all subsequent processing operates uniformly on .docx files.
S120: Analyze the target text to obtain the task key segments of the target text.
Exemplarily, after the target text is obtained, it is input into a preset initial analysis model to determine an initial analysis result; at least two process segments are determined based on a preset knowledge base and the initial analysis result; key phrases are extracted from the process segments using a preset key phrase extraction model to determine a key phrase extraction result; and the task key segments of the target text are obtained from the key phrase extraction result.
In an optional embodiment, the preset initial analysis model reads the stored entities sequentially from top to bottom, where the attributes of each entity are a title index, the title content, the title level, the index of the superior (parent) title, and the body text. Once reading is finished, a flat entity set is obtained and the target text is divided into levels. For example, suppose the target text contains two items, "Organization and responsibilities" and "Monitoring, early warning and forecasting", with a sub-item "Emergency organization and responsibilities" under the former and a sub-item "Monitoring of geological disasters" under the latter. Clearly, "Organization and responsibilities" and "Monitoring, early warning and forecasting" are at the same level, and "Emergency organization and responsibilities" and "Monitoring of geological disasters" are at the same level, one level below them. Following this scheme, the preset initial analysis model divides the target text into several levels, and these levels constitute the initial analysis result.
Further, after the levels are divided, at least two process segments are determined based on the preset knowledge base and the initial analysis result.
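As a rough illustration of this level division, the sketch below links a flat entity set into a tree and groups titles by level. It is a minimal sketch under assumptions: the class `TitleEntity` and the function `build_levels` are hypothetical names, the five fields mirror the attributes listed above, and the entities are assumed to be read in top-to-bottom order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TitleEntity:
    """One entity read from the stored document (field names are hypothetical)."""
    index: int                   # title index
    title: str                   # title content
    level: int                   # title level, 1 = top level
    parent_index: Optional[int]  # index of the superior title; None at top level
    text: str = ""               # body text under this title
    children: List["TitleEntity"] = field(default_factory=list)


def build_levels(entities: List[TitleEntity]) -> Dict[int, List[str]]:
    """Attach each entity to its parent and group title contents by level.

    Entities are assumed to be in top-to-bottom reading order, which keeps
    the titles within each level in document order.
    """
    by_index = {e.index: e for e in entities}
    levels: Dict[int, List[str]] = {}
    for e in entities:
        if e.parent_index is not None:
            by_index[e.parent_index].children.append(e)
        levels.setdefault(e.level, []).append(e.title)
    return levels


entities = [
    TitleEntity(1, "Organization and responsibilities", 1, None),
    TitleEntity(2, "Emergency organization and responsibilities", 2, 1),
    TitleEntity(3, "Monitoring, early warning and forecasting", 1, None),
    TitleEntity(4, "Monitoring of geological disasters", 2, 3),
]
levels = build_levels(entities)
```

Here `levels[1]` holds the two top-level items and `levels[2]` the two sub-items, matching the example in the text.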
In an optional embodiment, the preset knowledge base includes a chapter knowledge base and an organizational structure knowledge base. Because different special plans describe their organizational arrangements differently, the chapter knowledge base is established to make the corresponding content easy to find. Different kinds of text require different chapter knowledge base contents; for example, when processing emergency plan texts, the chapter knowledge base may be as shown in Table 1 below:
TABLE 1
In an optional embodiment, suppose there is a target text. To locate the process segments, the text of "Organization and responsibilities" and the "Emergency response" text for each response level are located first, i.e., all text contained under "Organization and responsibilities" and under each response level is found. Then, based on this result, the "member unit" text within "Organization and responsibilities" and the text of each response level within "Emergency response" are located. Finally, the text corresponding to each "member unit" within each response level is located.
Further, because many institutions are referred to in real texts by abbreviations or variant names, an organizational structure knowledge base needs to be established so that member units can still be located accurately when an abbreviation or variant name appears.
In practice, to make the process segments easy to locate, abbreviations or variant names of institutions can be replaced with their standard names, which is typically implemented with the FlashText algorithm. Note, however, that the method for finding or replacing abbreviations or variant names of institutions is not limited to FlashText; this example is illustrative only, is not limiting, and is subject to the actual application.
Further, after the process segments are determined, key phrases are extracted from them, and the sentences containing the key phrases are taken as the task key segments.
Illustratively, key phrases may be extracted as follows: perform word segmentation on the process segments based on a preset word segmentation model to obtain segmentation results; determine the weight of each segmentation result based on the segmentation results and a preset weight rule; and determine the key phrase extraction result based on these weights and a preset selection rule.
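FlashText's core idea is to store all keywords in a character trie and replace matches in a single left-to-right pass, preferring the longest match at each position. The sketch below is a hypothetical, self-contained rendering of that idea (`AliasNormalizer` is an illustrative name, not the API of the `flashtext` package). It skips word-boundary checks on the assumption that the input is unsegmented Chinese text, and the alias table in the example is made up.

```python
from typing import Dict


class AliasNormalizer:
    """Single-pass, longest-match alias replacement using a character trie."""

    _END = "__canonical__"  # multi-character key, so it cannot clash with a single character

    def __init__(self, aliases: Dict[str, str]):
        self._trie: dict = {}
        for alias, canonical in aliases.items():
            node = self._trie
            for ch in alias:
                node = node.setdefault(ch, {})
            node[self._END] = canonical

    def normalize(self, text: str) -> str:
        out, i = [], 0
        while i < len(text):
            node, j, match, match_end = self._trie, i, None, i
            # Walk the trie as far as the text allows, remembering the longest complete alias.
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if self._END in node:
                    match, match_end = node[self._END], j
            if match is not None:
                out.append(match)
                i = match_end
            else:
                out.append(text[i])
                i += 1
        return "".join(out)


normalizer = AliasNormalizer({
    "Municipal EMB": "Municipal Emergency Management Bureau",
    "EMB": "Emergency Management Bureau",
})
result = normalizer.normalize("Assist the Municipal EMB and the district EMB")
# result: "Assist the Municipal Emergency Management Bureau and the district Emergency Management Bureau"
```

Because the longer alias "Municipal EMB" is matched as a whole, it is not partially rewritten by the shorter alias "EMB".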
In an optional embodiment, key phrase extraction first cleans the target text, removing noise such as abnormal characters, redundant characters, special characters and brackets of all kinds. The text is then segmented: a word segmentation model performs segmentation and part-of-speech tagging, with a domain-specific dictionary for emergency plans loaded so that domain terms are not split apart. For example, proper terms in the emergency domain such as "zone defense", "rescue authorities" and "lead units" must not be split in emergency plan texts. Next, word frequencies are computed over the segmented words and a weight is calculated for each word. Word weights may be assigned from preset data or calculated with a weight calculation model; this is not limited here and is subject to the actual application. Finally, suitable phrases are selected according to a preset selection rule, and the weight of each phrase is calculated according to a preset calculation rule.
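The cleaning and weighting steps might look roughly like the following minimal sketch. The character classes treated as noise and the fallback to relative term frequency when no preset weight exists are assumptions, since the text leaves both open; `clean_text` and `word_weights` are hypothetical names, and segmentation with the domain dictionary is assumed to happen upstream.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Assumed noise rules: strip ASCII and full-width brackets, then drop any
# character that is not a word character, whitespace or basic punctuation.
_BRACKETS = re.compile(r"[()\[\]{}<>（）【】《》〈〉「」]")
_NOISE = re.compile(r"[^\w\s，。；、：,.;:]")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove brackets and special characters and collapse runs of whitespace."""
    text = _BRACKETS.sub("", text)
    text = _NOISE.sub("", text)
    return _SPACES.sub(" ", text).strip()


def word_weights(tokens: List[str], preset: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Use a preset weight where one exists, else the word's relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    preset = preset or {}
    return {w: preset.get(w, c / total) for w, c in counts.items()}
```

In Python 3, `\w` matches Unicode word characters, so Chinese characters survive cleaning while full-width symbols such as "！" are removed.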
In practical applications, the phrase selection rules may be set as follows:
- Rule 1: a phrase cannot exceed 25 characters;
- Rule 2: a phrase cannot exceed 12 tokens;
- Rule 3: a phrase cannot contain more than one particle;
- Rule 4: a phrase cannot begin or end with a function word or stop word, and cannot end with a verb;
- Rule 5: a candidate phrase cannot contain more than the specified number (1) of stop words;
- Rule 6: the first word of a candidate phrase must be a verb (v), an adverb (d) or a preposition (p);
- Rule 7: a candidate phrase cannot be a noun.
All of the above rules were derived from plan text data. The weight calculation rule can be designed as follows: the weight of a candidate phrase is the product of its phrase weight, its phrase length weight and its part-of-speech weight, where the phrase weight is the sum of the weights of all words in the phrase. For example, for the phrase [('organize', 'v', 0.6762), ('do well', 'v', 0.8136), ('medical', 'n', 4.4245), ('rescue', 'v', 1.5946)], the phrase weight is 0.6762 + 0.8136 + 4.4245 + 1.5946 = 7.5089.
Shorter phrases are generally assumed to deserve more weight, so the phrase length weights were determined empirically through repeated validation; the final values are {1: 1, 2: 5.6, 3: 1.1, 4: 2.0, 5: 0.7, 6: 0.9, 7: 0.48, 8: 0.43, 9: 0.24, 10: 0.15, 11: 0.07, 12: 0.05}. The part-of-speech weight is the weight of the part-of-speech transition from the first word to the last word of the phrase, for example {"v|n": 0.6575342465753424, "n": 0.9154147615937296}. Finally, the phrases are sorted by weight and selected according to a preset rule. For example, suppose there are five phrases A, B, C, D and E with weights 0.1, 0.2, 0.3, 0.4 and 0.5 respectively; if only three phrases are needed, phrases C, D and E are selected.
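The selection rules and the weight formula above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: single-letter part-of-speech tags as in the examples, `u` standing for a particle, phrase length counted in tokens (matching the 12-token limit and the 12 length-weight entries), a part-of-speech key of the form `first|last` (or just the tag for a one-word phrase), a neutral weight of 1.0 for keys not in the table, and rule 7 read as "not a lone noun". Only the example part-of-speech weights quoted above are included, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Word = Tuple[str, str, float]  # (word, POS tag, word weight)

START_TAGS = {"v", "d", "p"}  # rule 6
LENGTH_WEIGHTS = {1: 1.0, 2: 5.6, 3: 1.1, 4: 2.0, 5: 0.7, 6: 0.9,
                  7: 0.48, 8: 0.43, 9: 0.24, 10: 0.15, 11: 0.07, 12: 0.05}
POS_WEIGHTS = {"v|n": 0.6575342465753424, "n": 0.9154147615937296}  # examples only


def passes_rules(phrase: Sequence[Word], stop_words: Set[str]) -> bool:
    """Check a candidate phrase against selection rules 1-7."""
    if not phrase:
        return False
    words = [w for w, _, _ in phrase]
    tags = [t for _, t, _ in phrase]
    return (len("".join(words)) <= 25                                   # rule 1
            and len(phrase) <= 12                                       # rule 2
            and tags.count("u") <= 1                                    # rule 3
            and words[0] not in stop_words and words[-1] not in stop_words
            and tags[-1] != "v"                                         # rule 4
            and sum(w in stop_words for w in words) <= 1                # rule 5
            and tags[0] in START_TAGS                                   # rule 6
            and tags != ["n"])                                          # rule 7 (assumed reading)


def phrase_score(phrase: Sequence[Word], default_pos_weight: float = 1.0) -> float:
    """Phrase weight x length weight x part-of-speech weight."""
    phrase_weight = sum(w for _, _, w in phrase)
    length_weight = LENGTH_WEIGHTS.get(len(phrase), 0.0)
    first, last = phrase[0][1], phrase[-1][1]
    key = first if len(phrase) == 1 else f"{first}|{last}"  # assumed key format
    return phrase_weight * length_weight * POS_WEIGHTS.get(key, default_pos_weight)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Sort phrases by weight, largest first, and keep the first k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


example = [("organize", "v", 0.6762), ("do well", "v", 0.8136),
           ("medical", "n", 4.4245), ("rescue", "v", 1.5946)]
```

Under these assumptions the example phrase has a phrase weight of 7.5089 and a length weight of 2.0; its `v|v` key is not in the quoted table, so the assumed default of 1.0 applies. `top_k` reproduces the selection of C, D and E from the five-phrase example, returned largest first.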
S130: Obtain the tagged named entities in the task key segments based on a preset named entity recognition model and the task key segments.
Illustratively, a named entity may be any entity, including but not limited to a role, a location or an organization; this is not limited here. A task key segment may likewise be a text fragment of any length.
Exemplarily, after the task key segments are obtained, they are input into a preset part-of-speech tagging model to determine a part-of-speech tagging result; based on this result and the preset target parts of speech, the target words matching the target parts of speech are retained; and the target words are input into the preset named entity recognition model to obtain the tagged named entities in the task key segments.
In an optional embodiment, before part-of-speech tagging, the entities in a task key segment are labeled. Labeling methods include, but are not limited to, BIEO labeling. With BIEO labeling, suppose the task key segment is: "Responsible for assisting the municipal emergency administration in handling water works emergencies that occur while a typhoon is in effect, and for providing emergency technical support for municipal defense. Responsible for hydrological observation, early warning and forecasting, scheduling, operation and emergency repair of water works, clearing and dredging river channels, and draining accumulated water." The labeling results are shown in Table 2 below:
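BIEO labeling can be illustrated with a small hypothetical helper, `bieo_tags`, that turns entity spans into per-token tags: B- marks the first token of an entity, I- the inside tokens, E- the last token, and O everything outside an entity. Since the scheme as named has no single-token tag, the sketch assumes a one-token entity is tagged B-; the tokens and labels in the example are illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity label)


def bieo_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Assign B-/I-/E- tags to entity spans and O to all other tokens."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        if end - start > 1:  # a one-token entity keeps only its B- tag (assumption)
            tags[end - 1] = f"E-{label}"
    return tags


tokens = ["assist", "municipal", "emergency", "management", "bureau",
          "in", "handling", "emergencies"]
tags = bieo_tags(tokens, [(1, 5, "ORG")])
# tags: ["O", "B-ORG", "I-ORG", "I-ORG", "E-ORG", "O", "O", "O"]
```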
TABLE 2
Further, an entity dictionary is established from the task key segments; it records the part of speech of each entity. Suppose the following entities exist: "monitoring and early warning", "draining accumulated water", "light traffic police team", "city ecological environment bureau", "teaching place" and "tourist attraction"; the corresponding entity dictionary is shown in Table 3 below:
TABLE 3
In an optional embodiment, after the entity dictionary is established, it is used as a custom dictionary for word segmentation: the responsibility task text is segmented and part-of-speech tagged, and the tagging results are analyzed. Entities with the Duty, LOC and ORG tags in the entity dictionary are then expanded using part-of-speech combinations such as "action noun + noun" and "noun + action noun". Suppose there is a task key segment: "Responsible for overall planning and guidance of publicity and reporting on major dangers and disasters, and for overall planning and guidance of public opinion guidance and handling in emergency rescue and disaster relief." After processing, the part-of-speech tagging results are shown in Table 4 below:
TABLE 4
Further, after the tagging result is obtained, modifying words such as adjectives, adverbs, time adverbs and other modifiers are filtered out of the part-of-speech tagging result of the task key segment, and the task-related core words, such as verbs, common nouns and responsibility tag entity words, are retained. Function tag phrases are then extracted from these core words. Any phrase extraction method may be used, including but not limited to an NLTK regular-expression chunker.
After phrase extraction is complete, the phrases are collated to obtain the tagged named entities corresponding to the task key segments. Suppose there are two text segments: "Responsible for rescuing people in distress, transferring and evacuating trapped people, handling secondary disasters caused by typhoons, and assisting relevant departments with post-disaster reconstruction." and "Organize emergency rescue teams, dispatch health technical forces, and treat the sick and wounded; carry out sanitation and epidemic prevention in the disaster area to prevent the spread of epidemics and diseases there." Through the above extraction, the tagged named entities shown in Table 5 below can be obtained:
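The filtering and chunking step can be approximated without NLTK by a hand-written pattern matcher, sketched below. The tag names (`v`, `vn`, `n` and the Duty/LOC/ORG entity tags), the choice of which tags count as core words, and the single "verb followed by noun-like words" pattern are all assumptions made for illustration; `keep_core` and `chunk_verb_noun` are hypothetical names. An NLTK `RegexpParser` grammar could express a similar pattern declaratively.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs

# Assumed tags: "v" verb, "vn" action noun, "n" common noun, plus the custom
# entity tags Duty / LOC / ORG. Any other tag is treated as a modifier and dropped.
CORE_TAGS = {"v", "vn", "n", "Duty", "LOC", "ORG"}
NOUN_LIKE = CORE_TAGS - {"v"}


def keep_core(tagged: Tagged) -> Tagged:
    """Drop modifiers, keeping only task-related core words."""
    return [(w, t) for w, t in tagged if t in CORE_TAGS]


def chunk_verb_noun(tagged: Tagged) -> List[str]:
    """Extract maximal 'verb + one or more noun-like words' chunks."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "v":
            j = i + 1
            while j < len(tagged) and tagged[j][1] in NOUN_LIKE:
                j += 1
            if j > i + 1:  # at least one noun-like word followed the verb
                chunks.append(" ".join(w for w, _ in tagged[i:j]))
                i = j
                continue
        i += 1
    return chunks


sentence = [("organize", "v"), ("quickly", "d"), ("rescue", "vn"), ("teams", "n"),
            (",", "w"), ("treat", "v"), ("sick", "a"), ("wounded", "n")]
phrases = chunk_verb_noun(keep_core(sentence))
# phrases: ["organize rescue teams", "treat wounded"]
```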
TABLE 5
S140: Determine the connectivity judgment result between the task key segments based on the tagged named entities.
Exemplarily, after the tagged named entities are obtained, whether connections exist between them is determined based on semantic similarity; for each task key segment, the number of its tagged named entities that are connected and the total number of its tagged named entities (the element count) are obtained; and the connectivity judgment result between the task key segments is determined from the element counts and the connection counts.
In an optional embodiment, the semantic similarity may be determined with any semantic discrimination model; this is not limited here. After the semantic similarity between tagged named entities is determined, it is compared with a preset first threshold, and a connection between two tagged named entities is deemed to exist only when their semantic similarity is greater than the first threshold.
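A minimal sketch of the connection test follows. The semantic model is unspecified, so this assumes each tagged named entity already has an embedding vector from some model and uses cosine similarity as a stand-in; `find_connections` is a hypothetical name, and the vectors and threshold in the example are made-up values. A connection is recorded only when the similarity is strictly greater than the first threshold, as described above.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_connections(entities_a: Dict[str, Sequence[float]],
                     entities_b: Dict[str, Sequence[float]],
                     first_threshold: float) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs whose similarity is strictly greater than the threshold."""
    return {(a, b)
            for a, vec_a in entities_a.items()
            for b, vec_b in entities_b.items()
            if cosine(vec_a, vec_b) > first_threshold}


segment_a = {"rescue trapped people": [1.0, 0.0], "report disaster": [0.0, 1.0]}
segment_b = {"organize rescue teams": [0.9, 0.1], "treat wounded": [0.6, 0.8]}
links = find_connections(segment_a, segment_b, first_threshold=0.85)
# links: {("rescue trapped people", "organize rescue teams")}
```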
After the connections between the tagged named entities are confirmed, the connectivity between task key segments can be calculated according to the following formula:
where the quantities involved are: the degree of connectivity between task key segment A and task key segment B; the number of tagged named entities in task key segment A that are connected to task key segment B; the total number of tagged named entities in task key segment A; the number of tagged named entities in task key segment B that are connected to task key segment A; and the total number of tagged named entities in task key segment B. The larger the degree of connectivity, the better task key segment A and task key segment B connect.
As shown in Fig. 2, suppose task key segment A contains 5 tagged named entities and task key segment B contains 6; 2 tagged named entities in A are connected to B, and 2 tagged named entities in B are connected to A. The degree of connectivity between task key segment A and task key segment B is then:
By first locating the task key segments, then extracting the tagged named entities within them and using those entities to compute the connectivity between segments, the invention can fully assess how the segments of a text connect to one another, in particular whether a plan described later in the text addresses a problem raised earlier, thereby improving working efficiency.
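The four counts that enter the formula can be derived from the detected connections as in the sketch below (`connection_counts` is a hypothetical name). How the formula combines them is not legible in this text, so the sketch stops at the counts. The entity IDs and connections in the example are invented to match the Fig. 2 scenario; one entity in A connects to two entities in B, showing that entities, not connections, are counted.

```python
from typing import Iterable, Set, Tuple


def connection_counts(entities_a: Set[str], entities_b: Set[str],
                      connections: Iterable[Tuple[str, str]]) -> Tuple[int, int, int, int]:
    """Return (connected in A, total in A, connected in B, total in B).

    An entity counts once, however many connections it has.
    """
    pairs = [(a, b) for a, b in connections if a in entities_a and b in entities_b]
    connected_a = {a for a, _ in pairs}
    connected_b = {b for _, b in pairs}
    return len(connected_a), len(entities_a), len(connected_b), len(entities_b)


segment_a = {f"a{i}" for i in range(1, 6)}  # 5 tagged named entities
segment_b = {f"b{i}" for i in range(1, 7)}  # 6 tagged named entities
counts = connection_counts(segment_a, segment_b, [("a1", "b1"), ("a1", "b2"), ("a2", "b2")])
# counts: (2, 5, 2, 6)
```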
The embodiment of the present invention further discloses a device for determining text connectivity, as shown in fig. 3, including:
an obtaining module 301, configured to obtain a target text;
For details, refer to the related description of step S110 in any of the above embodiments; they are not repeated here.
an analysis module 302, configured to analyze the target text to obtain the task key language segments of the target text;
For details, refer to the related description of step S120 in any of the above embodiments; they are not repeated here.
a first processing module 303, configured to obtain the tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments;
For details, refer to the related description of step S130 in any of the above embodiments; they are not repeated here.
a second processing module 304, configured to determine the connectivity judgment result between the task key language segments based on the tag named entities.
For details, refer to the related description of step S140 in any of the above embodiments; they are not repeated here.
As an optional embodiment of the present application, the analysis module 302 is configured to: input the target text into a preset initial analysis model to determine an initial analysis result; determine at least two process language segments based on a preset knowledge base and the initial analysis result; extract key phrases from each process language segment by using a preset key phrase extraction model to determine a key phrase extraction result; and obtain the task key language segments of the target text according to the key phrase extraction result.
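The flow of the analysis module can be sketched as follows. This is a minimal sketch under stated assumptions: the hypothetical `initial_parse` stands in for the preset initial analysis model by splitting sentences at terminal punctuation, the knowledge base is assumed to be a set of process terms, and a process segment is assumed to become a task key language segment when at least one key phrase is extracted from it. The key phrase extractor is passed in as a callable because the preset model is not specified.

```python
import re
from typing import Callable, Dict, List, Set


def initial_parse(text: str) -> List[str]:
    # HYPOTHETICAL stand-in for the preset initial analysis model:
    # split the text into sentences at ASCII or full-width terminal punctuation.
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def process_segments(sentences: List[str], knowledge_base: Set[str]) -> List[str]:
    # HYPOTHETICAL knowledge-base rule: a sentence is a process segment
    # if it mentions at least one term from the knowledge base.
    segments = [s for s in sentences if any(term in s for term in knowledge_base)]
    if len(segments) < 2:
        raise ValueError("at least two process language segments are required")
    return segments


def task_key_segments(
    text: str,
    knowledge_base: Set[str],
    extract_key_phrases: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each task key language segment to its extracted key phrases."""
    result: Dict[str, List[str]] = {}
    for segment in process_segments(initial_parse(text), knowledge_base):
        phrases = extract_key_phrases(segment)
        # ASSUMPTION: a process segment with at least one key phrase is kept
        # as a task key language segment.
        if phrases:
            result[segment] = phrases
    return result


kb = {"pump", "valve"}
text = "The pump failed. Replace the valve. The weather is nice."
print(task_key_segments(text, kb, lambda s: [w for w in s.split() if w in kb]))
```

In this toy run the third sentence mentions no knowledge-base term, so only the first two sentences survive as process segments.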
As an optional embodiment of the present application, the analysis module 302 is configured to: perform word segmentation on the process language segments based on a preset word segmentation model to obtain word segmentation results; determine the weight of each word segmentation result based on a preset weight rule; and determine the key phrase extraction result based on these weights and a preset selection rule.
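The segmentation, weighting, and selection steps can be illustrated with a small sketch. The preset rules are not specified, so this sketch assumes TF-IDF (with a smoothed IDF) as the weight rule and "keep the `top_k` highest-weighted words" as the selection rule; the hypothetical `segment_words` is a whitespace tokenizer standing in for the preset word segmentation model, and Chinese text would need a real word segmenter instead.

```python
import math
from collections import Counter
from typing import List


def segment_words(text: str) -> List[str]:
    # HYPOTHETICAL stand-in for the preset word segmentation model:
    # whitespace tokenization with surrounding punctuation stripped.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def extract_key_phrases(segments: List[str], top_k: int = 2) -> List[List[str]]:
    """Return the top_k highest-weighted words for each process segment."""
    tokenized = [segment_words(s) for s in segments]
    n_docs = len(tokenized)
    # Document frequency: in how many segments each word appears.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # ASSUMED weight rule: term frequency times smoothed inverse document frequency.
        weights = {
            w: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        }
        # ASSUMED selection rule: keep the top_k words, ties broken alphabetically.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        results.append(ranked[:top_k])
    return results


segments = ["The pump failed", "The valve failed", "The valve leaked"]
print(extract_key_phrases(segments))
```

Words that occur in every segment, such as "the", receive the lowest IDF and drop out of the selection, while words specific to one segment rise to the top.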
As an optional implementation of the present application, the first processing module 303 is configured to: input the task key language segments into a preset part-of-speech tagging model to determine a part-of-speech tagging result; based on the tagging result and a preset target part of speech, retain the target vocabulary that matches the target part of speech; and input the target vocabulary into the preset named entity recognition model to obtain the tag named entities in the task key language segments.
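The filter-then-recognize order can be sketched as below. The sketch assumes the part-of-speech tagger returns `(word, tag)` pairs and that the named entity recognition model can be called on a single word, returning a label or `None`; the tag names, the dictionary lookup used as a recognizer, and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag)


def filter_by_pos(tagged: Sequence[TaggedWord], target_pos: Set[str]) -> List[str]:
    """Keep only the words whose tag is one of the target parts of speech."""
    return [word for word, tag in tagged if tag in target_pos]


def tag_named_entities(
    tagged: Sequence[TaggedWord],
    target_pos: Set[str],
    recognize: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Run the (hypothetical) recognizer only on the retained target vocabulary."""
    entities = []
    for word in filter_by_pos(tagged, target_pos):
        label = recognize(word)  # None means "not a named entity"
        if label is not None:
            entities.append((word, label))
    return entities


# Toy tagger output and a dictionary lookup standing in for the NER model.
tagged = [("pump", "n"), ("failed", "v"), ("Shenzhen", "ns"), ("quickly", "d")]
lexicon = {"pump": "EQUIPMENT", "Shenzhen": "LOCATION"}
print(tag_named_entities(tagged, {"n", "ns"}, lexicon.get))
```

Filtering first means the recognizer never sees verbs or adverbs, which keeps the entity lists short before the pairwise similarity step.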
As an optional implementation of the present application, the second processing module 304 is configured to: input each tag named entity into a preset semantic evaluation model to determine the semantic similarity between tag named entities; determine, based on the semantic similarity, whether a connection exists between tag named entities; obtain the number of connected tag named entities for each task key language segment; obtain the number of tag named entities (elements) in each task key language segment; and determine the connectivity judgment result between the task key language segments based on the numbers of elements and connections.
As an optional implementation of the present application, the second processing module 304 is configured to: when the semantic similarity is greater than a preset first threshold, determine that a connection exists between the tag named entities; otherwise, determine that no connection exists between them.
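The threshold test and the per-segment counts can be sketched as follows; the four numbers returned are the inputs to the engagement formula given earlier in the description. The sketch assumes that an entity counts as connected when its similarity to at least one entity of the other segment is strictly greater than the first threshold. The hypothetical `char_jaccard` (character-set overlap) only stands in for the preset semantic evaluation model, and the threshold of 0.5 is an arbitrary illustrative value.

```python
from typing import Callable, Sequence, Tuple

Similarity = Callable[[str, str], float]


def char_jaccard(x: str, y: str) -> float:
    # HYPOTHETICAL stand-in for the preset semantic evaluation model:
    # Jaccard overlap of the two entities' character sets.
    sx, sy = set(x.lower()), set(y.lower())
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def connected_count(
    source: Sequence[str], target: Sequence[str], similarity: Similarity, threshold: float
) -> int:
    # ASSUMPTION: an entity is connected if its similarity to at least one entity
    # of the other segment is strictly greater than the first threshold.
    return sum(1 for s in source if any(similarity(s, t) > threshold for t in target))


def segment_counts(
    entities_a: Sequence[str],
    entities_b: Sequence[str],
    similarity: Similarity = char_jaccard,
    threshold: float = 0.5,  # hypothetical first threshold
) -> Tuple[int, int, int, int]:
    """Return (connections A->B, elements in A, connections B->A, elements in B)."""
    return (
        connected_count(entities_a, entities_b, similarity, threshold),
        len(entities_a),
        connected_count(entities_b, entities_a, similarity, threshold),
        len(entities_b),
    )


print(segment_counts(["pump", "valve", "fire"], ["pumps", "smoke"]))
```

Here only "pump" and "pumps" clear the threshold (similarity 0.75), so one entity is connected in each direction. A real semantic model would replace `char_jaccard` without changing the counting logic.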
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an optional embodiment of the present invention. As shown in fig. 4, the electronic device may include at least one processor 41, such as a CPU (Central Processing Unit), at least one communication interface 43, a memory 44, and at least one communication bus 42. The communication bus 42 enables connection and communication among these components. The communication interface 43 may include a display and a keyboard, and may optionally also include a standard wired interface and a standard wireless interface. The memory 44 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory; optionally, the memory 44 may also be at least one storage device located remotely from the processor 41. The processor 41 may be combined with the apparatus described in fig. 4: an application program is stored in the memory 44, and the processor 41 calls the program code stored in the memory 44 to perform any of the method steps described above.
The communication bus 42 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 42 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory 44 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 44 may also comprise a combination of the above kinds of memory.
The processor 41 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 44 is also used to store program instructions. Processor 41 may invoke program instructions to implement a text engagement determination method as shown in any of the embodiments of the present application.
An embodiment of the invention also provides a non-transitory computer storage medium storing computer-executable instructions that can execute the text connectivity judgment method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.
Claims (10)
1. A text connectivity judgment method, comprising the following steps:
acquiring a target text;
analyzing the target text to obtain the task key language segments of the target text;
obtaining a tag named entity in the task key language segment based on a preset named entity recognition model and the task key language segment;
and determining the result of the judgment of the connectivity among the task key language segments based on the tag named entity.
2. The method according to claim 1, wherein analyzing the target text to obtain the task key language segments of the target text comprises:
inputting the target text into a preset initial analysis model, and determining an initial analysis result;
determining at least two process language segments based on a preset knowledge base and the initial analysis result;
extracting key phrases from the process language segments by using a preset key phrase extraction model to determine a key phrase extraction result;
and obtaining the task key language segments of the target text according to the key phrase extraction result.
3. The method according to claim 2, wherein extracting key phrases from each process language segment by using the preset key phrase extraction model to determine the key phrase extraction result comprises:
performing word segmentation on the process language segments based on a preset word segmentation model to obtain word segmentation results;
determining the weight corresponding to each word segmentation result based on the word segmentation results and a preset weight rule;
and determining a key phrase extraction result based on the weight corresponding to each word segmentation result and a preset selection rule.
4. The method according to claim 1, wherein obtaining the tag named entities in the task key language segments based on the preset named entity recognition model and the task key language segments comprises:
inputting the task key language segment into a preset part-of-speech tagging model, and determining a part-of-speech tagging result;
based on the part-of-speech tagging result and a preset target part-of-speech, reserving a target vocabulary according with the target part-of-speech;
and inputting the target vocabulary into a preset named entity recognition model to obtain the tag named entity in the task key language segment.
5. The method according to claim 1, wherein determining the connectivity judgment result between the task key language segments based on the tag named entities comprises:
inputting each tag named entity into a preset semantic evaluation model, and determining semantic similarity between each tag named entity;
determining whether a connection exists between the tag named entities based on the semantic similarity;
acquiring the connection number of the label named entities corresponding to each task key language segment;
acquiring the number of elements of the tag named entity corresponding to each task key language segment;
and determining a result of the judgment of the connectivity between the task key language segments based on the number of the elements and the number of the connections.
6. The method according to claim 5, wherein determining whether a connection exists between the tag named entities based on the semantic similarity comprises:
when the semantic similarity is larger than a preset first threshold value, determining that connection exists between the tag named entities;
otherwise, no connection exists between the tag named entities.
7. A text connectivity judgment device, comprising:
an acquisition module, configured to acquire a target text;
an analysis module, configured to analyze the target text to obtain the task key language segments of the target text;
the first processing module is used for obtaining a tag named entity in the task key language segment based on a preset named entity recognition model and the task key language segment;
and the second processing module is used for determining the result of the connectivity judgment between the task key language segments based on the tag named entity.
8. The device according to claim 7, wherein the analysis module is further configured for:
inputting the target text into a preset initial analysis model to determine an initial analysis result;
determining at least two process language segments based on a preset knowledge base and the initial analysis result;
extracting key phrases from the process language segments by using a preset key phrase extraction model to determine a key phrase extraction result;
and obtaining the task key language segments of the target text according to the key phrase extraction result.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method of any one of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1-6.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210919249.7A CN114970491B (en) | 2022-08-02 | 2022-08-02 | Text connectivity judgment method and device, electronic equipment and storage medium |
PCT/CN2022/135015 WO2023098658A1 (en) | 2022-08-02 | 2022-11-29 | Text cohesion determination method and apparatus, and electronic device and storage medium |
ZA2023/01703A ZA202301703B (en) | 2022-08-02 | 2023-02-10 | Text cohesion judgment methods, devices, electronic equipment and storage media |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210919249.7A CN114970491B (en) | 2022-08-02 | 2022-08-02 | Text connectivity judgment method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114970491A | 2022-08-30 |
CN114970491B | 2022-10-04 |
Family
ID=82969946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210919249.7A Active CN114970491B (en) | 2022-08-02 | 2022-08-02 | Text connectivity judgment method and device, electronic equipment and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN114970491B (en) |
WO (1) | WO2023098658A1 (en) |
ZA (1) | ZA202301703B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023098658A1 (en) * | 2022-08-02 | 2023-06-08 | 深圳市城市公共安全技术研究院有限公司 | Text cohesion determination method and apparatus, and electronic device and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116678162B (en) * | 2023-08-02 | 2023-09-26 | 八爪鱼人工智能科技(常熟)有限公司 | Cold storage operation information management method, system and storage medium based on artificial intelligence |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103294663A (en) * | 2013-05-03 | 2013-09-11 | 苏州大学 | Text coherence detection method and device |
CN110287497A (en) * | 2019-07-03 | 2019-09-27 | 桂林电子科技大学 | A kind of coherent analysis method of the semantic structure of English text |
CN110442872A (en) * | 2019-08-06 | 2019-11-12 | 中科鼎富(北京)科技发展有限公司 | A kind of text elements integrality checking method and device |
CN111428470A (en) * | 2020-03-23 | 2020-07-17 | 北京世纪好未来教育科技有限公司 | Text continuity judgment method, text continuity judgment model training method, electronic device and readable medium |
CN112597309A (en) * | 2020-12-25 | 2021-04-02 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Detection system for identifying microblog data stream of sudden event in real time |
CN113297367A (en) * | 2021-06-29 | 2021-08-24 | 中国平安人寿保险股份有限公司 | Method for generating user conversation linking language and related equipment |
CN113553830A (en) * | 2021-08-11 | 2021-10-26 | 桂林电子科技大学 | Graph-based English text sentence language piece coherent analysis method |
CN113743125A (en) * | 2021-09-07 | 2021-12-03 | 广州晓阳智能科技有限公司 | Text continuity analysis method and device |
CN113869033A (en) * | 2021-09-24 | 2021-12-31 | 厦门大学 | Graph neural network sentence sequencing method integrated with iterative sentence pair relation prediction |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10120844B2 (en) * | 2014-10-23 | 2018-11-06 | International Business Machines Corporation | Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process |
CN110147421B (en) * | 2019-05-10 | 2022-06-21 | 腾讯科技(深圳)有限公司 | Target entity linking method, device, equipment and storage medium |
WO2021183159A1 (en) * | 2020-03-13 | 2021-09-16 | Google Llc | Re-ranking results from semantic natural language processing machine learning algorithms for implementation in video games |
CN111931509A (en) * | 2020-08-28 | 2020-11-13 | 北京百度网讯科技有限公司 | Entity chain finger method, device, electronic equipment and storage medium |
CN112380866A (en) * | 2020-11-25 | 2021-02-19 | 厦门市美亚柏科信息股份有限公司 | Text topic label generation method, terminal device and storage medium |
CN114970491B (en) * | 2022-08-02 | 2022-10-04 | 深圳市城市公共安全技术研究院有限公司 | Text connectivity judgment method and device, electronic equipment and storage medium |
- 2022-08-02: CN202210919249.7A (CN114970491B), Active
- 2022-11-29: PCT/CN2022/135015 (WO2023098658A1), status unknown
- 2023-02-10: ZA2023/01703A (ZA202301703B), status unknown
Non-Patent Citations (2)
Title |
---|
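MIRELLA LAPATA: "Automatic Evaluation of Text Coherence: Models and Representations", PROCEEDINGS OF IJCAI * |
杨秋红 (Yang Qiuhong): "面向新闻话题的社交媒体文本上下文衔接研究" (Research on Contextual Cohesion of Social Media Texts for News Topics), China Master's Theses Full-text Database, Information Science and Technology * |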
Also Published As
Publication number | Publication date |
---|---|
CN114970491B (en) | 2022-10-04 |
WO2023098658A1 (en) | 2023-06-08 |
ZA202301703B (en) | 2023-07-26 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |