CN114970491B - Text connectivity judgment method and device, electronic equipment and storage medium - Google Patents

Publication number
CN114970491B
Authority
CN
China
Prior art keywords
task key
determining
preset
language
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210919249.7A
Other languages
Chinese (zh)
Other versions
CN114970491A (en)
Inventor
徐大用
习树峰
蒋会春
沈赣苏
张少标
房龄航
秦宇
张�杰
凌君
张波
焦圆圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Technology Institute of Urban Public Safety Co Ltd
Original Assignee
Shenzhen Technology Institute of Urban Public Safety Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Technology Institute of Urban Public Safety Co Ltd
Priority to CN202210919249.7A
Publication of CN114970491A
Application granted
Publication of CN114970491B
Priority to PCT/CN2022/135015 (WO2023098658A1)
Priority to ZA2023/01703A (ZA202301703B)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the invention relate to the field of computer technology, and in particular to a text connectivity judgment method and device, electronic equipment and a storage medium. The method comprises: acquiring a target text; analyzing the target text to obtain task key language segments of the target text; obtaining the tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments; and determining a connectivity judgment result between the task key language segments based on the tag named entities. By obtaining the tag named entities after the task key language segments have been located, and using those entities to calculate the connectivity between the segments, the method can judge the connectivity relation between the language segments of a text, and hence whether a plan given later in the text can solve a problem raised earlier, thereby improving working efficiency.

Description

Text connectivity judgment method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a text connectivity judgment method and device, electronic equipment and a storage medium.
Background
With the development of artificial intelligence, machines are increasingly able to understand text content. In the prior art, artificial intelligence can be used to identify properties of texts such as similarity and consistency.
However, in the prior art artificial intelligence can only determine whether texts address the same problem. In the field of emergency plans in particular, this is not enough: more importantly, it must be judged whether a plan given later in a text can solve a problem raised earlier in the text. This involves judging text connectivity, as well as text continuity and practicability.
Therefore, a text connectivity judgment method is needed to solve the above problems.
Disclosure of Invention
In view of this, to solve the above technical problems in the prior art, embodiments of the present invention provide a text connectivity judgment method, apparatus, electronic device and storage medium.
In a first aspect, an embodiment of the present invention provides a text connectivity judgment method, the method comprising: acquiring a target text; analyzing the target text to obtain task key language segments of the target text; obtaining tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments; and determining a connectivity judgment result between the task key language segments based on the tag named entities.
Optionally, analyzing the target text to obtain task key language segments of the target text includes: inputting the target text into a preset initial analysis model and determining an initial analysis result; determining at least two process language segments based on a preset knowledge base and the initial analysis result; extracting key phrases from each process language segment using a preset key phrase extraction model and determining a key phrase extraction result; and obtaining the task key language segments of the target text according to the key phrase extraction result.
Optionally, extracting key phrases from each process language segment using a preset key phrase extraction model and determining a key phrase extraction result includes: performing word segmentation on the process language segments based on a preset word segmentation model to obtain word segmentation results; determining the weight corresponding to each word segmentation result based on the word segmentation results and a preset weight rule; and determining the key phrase extraction result based on the weight corresponding to each word segmentation result and a preset selection rule.
Optionally, obtaining the tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments includes: inputting the task key language segment into a preset part-of-speech tagging model and determining a part-of-speech tagging result; retaining, based on the part-of-speech tagging result and a preset target part of speech, the target vocabulary that conforms to the target part of speech; and inputting the target vocabulary into the preset named entity recognition model to obtain the tag named entities in the task key language segment.
Optionally, determining the connectivity judgment result between the task key language segments based on the tag named entities includes: inputting each tag named entity into a preset semantic evaluation model and determining the semantic similarity among the tag named entities; determining whether a connection exists between the tag named entities based on the semantic similarity; obtaining the number of connections of the tag named entities corresponding to each task key language segment; obtaining the number of elements of the tag named entities corresponding to each task key language segment; and determining the connectivity judgment result between the task key language segments based on the number of elements and the number of connections.
Optionally, determining whether a connection exists between the tag named entities based on the semantic similarity includes: when the semantic similarity is greater than a preset first threshold, determining that a connection exists between the tag named entities; otherwise, determining that no connection exists between the tag named entities.
In a second aspect, an embodiment of the present invention provides a text connectivity judgment device, including: an acquisition module, configured to acquire a target text; an analysis module, configured to analyze the target text to obtain task key language segments of the target text; a first processing module, configured to obtain tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments; and a second processing module, configured to determine a connectivity judgment result between the task key language segments based on the tag named entities.
In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method as described in the first aspect or any of the possible embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as described in the first aspect or any of the possible embodiments of the first aspect.
The invention provides a text connectivity judgment method and device, electronic equipment and a storage medium. The method comprises: acquiring a target text; analyzing the target text to obtain task key language segments of the target text; obtaining the tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments; and determining a connectivity judgment result between the task key language segments based on the tag named entities. By obtaining the tag named entities after the task key language segments have been located, and using those entities to calculate the connectivity between the segments, the method can judge the connectivity relation between the language segments of a text, and hence whether a plan given later in the text can solve a problem raised earlier, thereby improving working efficiency.
Drawings
Fig. 1 is a schematic flow chart of a text connectivity judgment method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a text connectivity judgment method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a text connectivity judgment apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for text connectivity judgment according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, they are further explained below with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
Fig. 1 is a schematic flow chart of a text connectivity judgment method provided in an embodiment of the present invention. With reference to Fig. 1, the method includes:
S110: acquiring a target text.
Illustratively, the target text may be any type of text, including but not limited to emergency plans and emergency and disaster relief duty documents. The data format of the text is likewise not limited, and includes but is not limited to files in doc, docx and other formats.
In an optional embodiment, files obtained in other formats may be converted to the docx format by a file format conversion tool, so that subsequent processing operates uniformly on docx files.
S120: analyzing the target text to obtain task key language segments of the target text.
Exemplarily, after the target text is obtained, it is input into a preset initial analysis model to determine an initial analysis result; at least two process language segments are determined based on a preset knowledge base and the initial analysis result; key phrases are extracted from each process language segment using a preset key phrase extraction model to determine a key phrase extraction result; and the task key language segments of the target text are obtained according to the key phrase extraction result.
In an optional embodiment, the preset initial analysis model reads the entity attributes of the stored data in order from top to bottom, where the entity attributes comprise a title index, title content, title level, superior title index and body text. When reading is finished, a flat entity set is obtained and the target text is divided into levels. For example, suppose the target text includes two items, "organization and responsibility" and "monitoring and early warning forecast", with a sub-item "emergency organization and responsibility" under the former and a sub-item "monitoring of geological disaster" under the latter. Then "organization and responsibility" and "monitoring and early warning forecast" are at the same level, while "emergency organization and responsibility" and "monitoring of geological disaster" are at the same level as each other, one level below. Dividing the target text in this way with the preset initial analysis model yields several levels, and these levels are the initial analysis result.
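The level division described above can be illustrated with a minimal sketch. It assumes the headings have already been read from the document as (level, title, text) rows; the class name `HeadingEntity`, the function `build_flat_entities` and the row format are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HeadingEntity:
    index: int                   # title index
    title: str                   # title content
    level: int                   # title level (1 = top level)
    parent_index: Optional[int]  # superior title index, None for a top-level title
    text: str                    # body text under this title


def build_flat_entities(rows: List[Tuple[int, str, str]]) -> List[HeadingEntity]:
    """Read (level, title, text) rows top to bottom and return a flat entity set."""
    entities: List[HeadingEntity] = []
    open_titles: List[HeadingEntity] = []  # stack of titles enclosing the current row
    for index, (level, title, text) in enumerate(rows):
        # Close every title at the same level as, or deeper than, the new one.
        while open_titles and open_titles[-1].level >= level:
            open_titles.pop()
        parent = open_titles[-1].index if open_titles else None
        entity = HeadingEntity(index, title, level, parent, text)
        entities.append(entity)
        open_titles.append(entity)
    return entities


rows = [
    (1, "organization and responsibility", ""),
    (2, "emergency organization and responsibility", "..."),
    (1, "monitoring and early warning forecast", ""),
    (2, "monitoring of geological disaster", "..."),
]
entities = build_flat_entities(rows)
levels = {e.title: e.level for e in entities}
```

Here the two top-level items share level 1 and each sub-item records its parent through `parent_index`, which mirrors the flat entity set with a superior title index described in the text.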
Further, after the division into levels, at least two process language segments are determined based on a preset knowledge base and the initial analysis result.
In an optional embodiment, the preset knowledge bases comprise a "chapter knowledge base" and an "organizational structure knowledge base". The chapter knowledge base is needed because different special-purpose plans describe their organization differently, and it makes the corresponding content easier to find. The chapter knowledge base must hold different content for different texts; for example, when a text related to an emergency plan is processed, the chapter knowledge base may be as shown in Table 1 below:
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced here.]
In an optional embodiment, given a target text, the process language segments are located as follows. First, the "organization and responsibility" text and the "emergency response" text containing each response level are located, that is, all text passages containing them are found. Next, based on this first result, the "member unit" text within "organization and responsibility" and the text of each response level within "emergency response" are located. Finally, the text corresponding to each "member unit" within each response level is located.
Further, since many organizational units appear in actual texts in shorthand or variant form, an "organizational structure knowledge base" needs to be established so that member units can still be located accurately when such shorthand or variants occur.
In practical applications, to make the process language segments easier to locate, the shorthand or variant of an organizational unit can be replaced, typically using the FlashText algorithm. Note, however, that the method for finding or replacing such shorthand or variants is not limited to FlashText; this example is for explanation only, and the choice depends on the actual application.
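As a rough illustration of replacing shorthand with full names, the sketch below uses a plain regular-expression alternation from the Python standard library. This is a stand-in for the FlashText algorithm named above, not an implementation of it (FlashText uses a trie and matches on word boundaries); the alias entries and the helper name `build_replacer` are hypothetical.

```python
import re
from typing import Callable, Dict


def build_replacer(aliases: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that rewrites every known shorthand or variant to its full name."""
    if not aliases:
        return lambda text: text
    # Longest aliases first, so a short alias never pre-empts a longer one containing it.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(alias) for alias in ordered))

    def replace(text: str) -> str:
        return pattern.sub(lambda match: aliases[match.group(0)], text)

    return replace


# Hypothetical entries of an "organizational structure knowledge base".
aliases = {
    "Emergency Bureau": "Municipal Emergency Management Bureau",
    "Eco Bureau": "Municipal Ecological Environment Bureau",
}
normalize = build_replacer(aliases)
result = normalize("The Emergency Bureau coordinates with the Eco Bureau.")
```

After normalization every member unit appears under a single name, so locating "member unit" text reduces to an exact string search.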
Further, after the process language segments are determined, key phrases are extracted from them, and the sentence containing each key phrase is taken as a task key language segment.
Illustratively, key phrases may be extracted by performing word segmentation on the process language segments based on a preset word segmentation model to obtain word segmentation results, determining the weight corresponding to each word segmentation result based on the word segmentation results and a preset weight rule, and determining the key phrase extraction result based on those weights and a preset selection rule.
In an optional embodiment, key phrase extraction proceeds as follows. First, the target text is cleaned to remove impurity data such as abnormal characters, redundant characters, special characters and brackets of all kinds. Next, the text is segmented: a word segmentation model performs word segmentation and part-of-speech tagging, and a dictionary specific to the emergency plan field is loaded so that domain terms are not split. For example, emergency-domain terms such as "zone defense", "rescue authorities" and "lead units" must not be split in emergency plan text. Then word frequencies are counted over the segmented words and the weight of each word is calculated. Word weights may be assigned from preset data or calculated with a weight calculation model; this is not limited here and depends on the actual application. Finally, suitable phrases are selected according to a preset selection rule, and the weight of each phrase is calculated according to a preset calculation rule.
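The cleaning, segmentation and word-frequency steps can be sketched as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for a real word segmentation model, relative frequency stands in for the unspecified weight rule, and the function names and sample sentence are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def clean(text: str) -> str:
    """Replace brackets, punctuation and other special characters with spaces."""
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str, domain_terms: Set[str]) -> List[str]:
    """Split on whitespace, but keep multi-word domain terms together."""
    words = text.lower().split()
    max_len = max((len(term.split()) for term in domain_terms), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(words):
        # Prefer the longest domain term that starts at position i.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in domain_terms:
                tokens.append(candidate)
                i += n
                break
    return tokens


def term_frequency(tokens: List[str]) -> Dict[str, float]:
    """Relative frequency of each token (an assumed, simple weight rule)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


domain_terms = {"lead units", "rescue authorities"}
text = "Lead units (see annex) notify rescue authorities; lead units report."
tokens = tokenize(clean(text), domain_terms)
weights = term_frequency(tokens)
```

Loading the domain dictionary before segmentation is what keeps "lead units" and "rescue authorities" as single tokens, so their frequencies are counted for the whole term rather than for its parts.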
In practical applications, the phrase selection rules may be set as follows:
Rule 1: a phrase cannot exceed 25 characters;
Rule 2: a phrase cannot exceed 12 tokens;
Rule 3: a phrase cannot contain more than one particle;
Rule 4: a phrase cannot begin or end with a function word or stop word, and cannot end with a verb;
Rule 5: the number of stop words in a candidate phrase cannot exceed the specified number (1);
Rule 6: the first word of a candidate phrase must be a verb (v), an adverb (d) or a preposition (p);
Rule 7: a candidate phrase must not be a noun.
These rules were all formulated from plan text data. The weight calculation rule can be designed as follows: the weight of a candidate phrase is the product of the phrase weight, the phrase length weight and the part-of-speech weight, where the phrase weight is the sum of the weights of all words in the phrase. For example, for the phrase [('tissue', 'v', 0.6762), ('done', 'v', 0.8136), ('medical', 'n', 4.4245), ('rescue', 'v', 1.5946)], the phrase weight is:
0.6762 + 0.8136 + 4.4245 + 1.5946 = 7.5089
It is generally assumed that shorter phrases should carry more weight, so the phrase length weights are values obtained through repeated verification; the final weight values are {1: 1, 2: 5.6, 3:1.1, 4. The part-of-speech weight is the part-of-speech transition weight from the first word of the phrase to the last word, for example: {"v|n": 0.6575342465753424, "n": 0.9154147615937296}. Finally, the phrases are sorted by weight and selected according to a preset rule. For example, suppose there are five phrases A, B, C, D and E with weights 0.1, 0.2, 0.3, 0.4 and 0.5 respectively, and only three phrases are needed; then phrases C, D and E are selected.
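The weight calculation and top-k selection above can be checked with a short sketch. The token tuples and the weights 5.6 and 0.6575342465753424 are taken from the text; the lookup key format "first tag|last tag", the fallback weight of 1.0 for missing table entries, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str, float]  # (word, part-of-speech tag, word weight)


def phrase_weight(tokens: List[Token]) -> float:
    """Sum of the weights of all words in the phrase."""
    return sum(weight for _, _, weight in tokens)


def candidate_weight(tokens: List[Token],
                     length_weights: Dict[int, float],
                     pos_weights: Dict[str, float],
                     default: float = 1.0) -> float:
    """Phrase weight x phrase length weight x part-of-speech weight."""
    pos_key = f"{tokens[0][1]}|{tokens[-1][1]}"  # assumed key: first tag | last tag
    return (phrase_weight(tokens)
            * length_weights.get(len(tokens), default)
            * pos_weights.get(pos_key, default))


def top_k(weights: Dict[str, float], k: int) -> List[str]:
    """Names of the k highest-weighted phrases, heaviest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


phrase = [("tissue", "v", 0.6762), ("done", "v", 0.8136),
          ("medical", "n", 4.4245), ("rescue", "v", 1.5946)]
base = phrase_weight(phrase)  # approximately 7.5089

two_word = [("guide", "v", 1.0), ("rescue", "n", 2.0)]  # hypothetical phrase
weighted = candidate_weight(two_word, {2: 5.6}, {"v|n": 0.6575342465753424})

selected = top_k({"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "E": 0.5}, 3)
```

With three phrases requested, `selected` contains E, D and C, the same set as in the example in the text, listed from heaviest to lightest.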
S130: obtaining the tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments.
Illustratively, a named entity can be any entity, including but not limited to a role, a site or an organization; this is not limited here. A task key language segment may likewise be of any length.
Exemplarily, after a task key language segment is obtained, it is input into a preset part-of-speech tagging model to determine a part-of-speech tagging result; the target vocabulary conforming to a preset target part of speech is retained based on the part-of-speech tagging result; and the target vocabulary is input into the preset named entity recognition model to obtain the tag named entities in the task key language segment.
In an optional embodiment, before part-of-speech tagging, the entities in the task key language segment are labeled. The entity labeling method includes, but is not limited to, BIEO labeling. When BIEO labeling is adopted, suppose the task key language segment is: "The system is responsible for assisting municipal emergency administration in handling water works emergencies occurring during typhoon influence and providing emergency technical support for municipal defense. The system is responsible for hydrological observation, early warning and forecasting, scheduling operation and emergency repair of water works, clearing and dredging river channels and draining accumulated water." The labeling results are then as shown in Table 2 below:
TABLE 2
[Table 2 is provided as an image in the original publication and is not reproduced here.]
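For readers unfamiliar with BIEO labeling, the sketch below shows the general idea: the first token of an entity is tagged B (begin), interior tokens I (inside), the last token E (end), and all other tokens O (outside). The label name "ORG", the token boundaries and the choice to tag a single-token entity as B are assumptions; the actual labels of Table 2 are not available.

```python
from typing import List, Tuple


def bieo_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag tokens with B/I/E/O given entity spans (start, end_exclusive, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        if end - start > 1:
            tags[end - 1] = f"E-{label}"
    return tags


tokens = ["assist", "municipal", "emergency", "administration",
          "in", "handling", "emergencies"]
tags = bieo_tags(tokens, [(1, 4, "ORG")])  # hypothetical entity span
```

In this example the three tokens of the organization name receive B-ORG, I-ORG and E-ORG, and every other token receives O.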
Further, an "entity dictionary" is established from the task key language segments; the entity dictionary indicates the part of speech corresponding to each entity. Suppose the following entities exist: "monitoring and early warning", "draining accumulated water", "light traffic police team", "city ecological environment bureau", "teaching place", "tourist attraction". The corresponding entity dictionary is then as shown in Table 3 below:
TABLE 3
[Table 3 is provided as an image in the original publication and is not reproduced here.]
In an optional embodiment, after the entity dictionary is established, it is used as a custom dictionary for word segmentation; word segmentation and part-of-speech tagging are performed on the responsibility task text, and the tagging results are analyzed. The entities with the Duty, LOC and ORG parts of speech in the entity dictionary are expanded using part-of-speech combinations such as "action noun + noun" and "noun + action noun". Suppose the task key language segment is: "The system is responsible for overall planning and guiding major dangerous situation and disaster propaganda and reporting, and is responsible for overall planning and guiding rescue and relief public opinion guiding and dealing work." After processing, the part-of-speech tagging results are as shown in Table 4 below:
TABLE 4
[Table 4 is provided as an image in the original publication and is not reproduced here.]
Further, after the tagging result is obtained, modifying content such as adjectives, adverbs, time adverbs and other modifiers is filtered out of the part-of-speech tagging result of the task key language segment, while task-related core words such as verbs, common nouns and responsibility-label entity words are retained; function label phrases are then extracted from these core words. Any phrase extraction method may be used, including but not limited to the NLTK regular-expression chunker.
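The filtering step can be sketched in a few lines. Grouping consecutive retained words into a phrase is a deliberately simple stand-in for a chunker such as the NLTK one mentioned above, not a reproduction of it; the tags "v", "n", "d" and "p" and the "Duty" label follow the text, while the tag "a" for adjectives, the sample tokens and the function name are assumptions.

```python
from typing import List, Tuple

# Verbs, common nouns and responsibility-label entity words are kept.
KEEP_TAGS = {"v", "n", "Duty"}


def core_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Drop modifiers and join each run of consecutive core words into one phrase."""
    phrases: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in KEEP_TAGS:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tagged = [("actively", "d"), ("guide", "v"), ("public opinion", "n"),
          ("in", "p"), ("major", "a"), ("disaster reporting", "Duty")]
phrases = core_phrases(tagged)
```

The adverb, preposition and adjective are discarded, leaving two candidate phrases built only from core words.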
After phrase extraction is finished, the phrases are collated to obtain the tag named entities corresponding to the task key language segments. Suppose the text segments are: "The system is responsible for rescuing the persons in distress, transferring and evacuating the trapped masses, dealing with secondary disasters caused by typhoons and assisting related departments in carrying out related work in post-disaster reconstruction." and "Organize the assault rescue team, schedule the technical strength of hygiene, and rescue the sick and wounded; carry out sanitation and epidemic prevention work in the disaster area and prevent the spread of epidemic disease in the disaster area." Extraction in the above manner finally yields the tag named entities shown in Table 5 below:
TABLE 5
[Table 5 is provided as an image in the original publication and is not reproduced here.]
S140: determining the connectivity judgment result between the task key language segments based on the tag named entities.
Exemplarily, after the tag named entities are obtained, whether connections exist among them is determined based on semantic similarity; the number of connections of the tag named entities corresponding to each task key language segment is obtained; the number of elements of the tag named entities corresponding to each task key language segment is obtained; and the connectivity judgment result between the task key language segments is determined based on the number of elements and the number of connections.
In an optional embodiment, the semantic similarity may be determined by any semantic discrimination model, which is not limited here. After the semantic similarity between tag named entities is determined, it is compared with a preset first threshold; a connection is determined to exist between the tag named entities only when the semantic similarity is greater than the preset first threshold.
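The thresholding and counting steps can be sketched as follows. Several points are assumptions rather than the patent's method: word-set overlap stands in for a real semantic model, an entity is counted as connected when it has at least one above-threshold match on the other side, and the threshold value, entity strings and function names are hypothetical.

```python
from typing import Callable, List, Tuple


def token_overlap(x: str, y: str) -> float:
    """Toy stand-in for a semantic model: Jaccard overlap of lower-cased word sets."""
    sx, sy = set(x.lower().split()), set(y.lower().split())
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def connected_counts(entities_a: List[str],
                     entities_b: List[str],
                     similarity: Callable[[str, str], float],
                     threshold: float) -> Tuple[int, int]:
    """Count the entities on each side that connect to at least one entity on the other."""
    linked_a = sum(any(similarity(a, b) > threshold for b in entities_b)
                   for a in entities_a)
    linked_b = sum(any(similarity(b, a) > threshold for a in entities_a)
                   for b in entities_b)
    return linked_a, linked_b


a = ["rescue persons in distress", "evacuate trapped masses",
     "post-disaster reconstruction"]
b = ["rescue the wounded persons", "epidemic prevention"]
linked_a, linked_b = connected_counts(a, b, token_overlap, threshold=0.3)
element_counts = (len(a), len(b))
```

The pair of connection counts and the pair of element counts are the inputs from which the connectivity judgment result is then determined.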
After the relationships between the tag named entities are confirmed, the connectivity between task key language segments can be calculated by a formula that appears only as an image in the original publication and is not reproduced here. The formula is expressed in terms of the following quantities, whose symbols are likewise given only as images:
the degree of connectivity between task key language segment A and task key language segment B;
the number of tag named entities of task key language segment A that are connected to task key language segment B;
the number of all tag named entities in task key language segment A;
the number of tag named entities of task key language segment B that are connected to task key language segment A;
the number of all tag named entities in task key language segment B.
The larger the degree of connectivity, the better the connection between task key language segment A and task key language segment B.
As shown in Fig. 2, suppose there are 5 tag named entities in task key language segment A and 6 in task key language segment B, the number of tag named entities of A connected to B is 2, and the number of tag named entities of B connected to A is 2. The resulting degree of connectivity between A and B is given in the original publication as a formula image, which is not reproduced here.
By obtaining the tag named entities in the task key language segments after those segments have been located, and using the tag named entities to calculate the connectivity between the segments, the invention can judge the connectivity relation between the language segments of a text, and hence whether a plan given later in the text can solve a problem raised earlier, thereby improving working efficiency.
An embodiment of the present invention further discloses a text connectivity judgment apparatus, as shown in Fig. 3, including:
an obtaining module 301, configured to obtain a target text;
For details, refer to the description of step S110 in any of the above embodiments, which is not repeated here.
an analysis module 302, configured to analyze the target text to obtain task key language segments of the target text;
For details, refer to the description of step S120 in any of the above embodiments, which is not repeated here.
a first processing module 303, configured to obtain tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments;
For details, refer to the description of step S130 in any of the above embodiments, which is not repeated here.
a second processing module 304, configured to determine a connectivity judgment result between the task key language segments based on the tag named entities.
For details, refer to the description of step S140 in any of the above embodiments, which is not repeated here.
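The four-module structure can be pictured as a simple pipeline. The sketch below is only a structural illustration: the class name, the callable signatures and the trivial stub functions are hypothetical, and real modules would wrap the models described for steps S110 to S140.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class TextConnectivityApparatus:
    acquire: Callable[[str], str]                 # obtaining module 301
    parse: Callable[[str], List[str]]             # analysis module 302
    extract_entities: Callable[[str], List[str]]  # first processing module 303
    judge: Callable[[Dict[str, List[str]]], Any]  # second processing module 304

    def run(self, source: str) -> Any:
        text = self.acquire(source)
        segments = self.parse(text)
        entities = {segment: self.extract_entities(segment) for segment in segments}
        return self.judge(entities)


# Trivial stubs that only demonstrate how data flows between the modules.
apparatus = TextConnectivityApparatus(
    acquire=lambda source: source,
    parse=lambda text: [part.strip() for part in text.split(".") if part.strip()],
    extract_entities=lambda segment: segment.split(),
    judge=lambda entities: {segment: len(found) for segment, found in entities.items()},
)
result = apparatus.run("rescue team. medical rescue team")
```

Keeping each module behind a callable makes it possible to swap in a different segmentation, recognition or similarity model without changing the flow between modules.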
By obtaining the tag named entities in the task key language segments after those segments have been located, and using the tag named entities to calculate the connectivity between the segments, the apparatus can judge the connectivity relation between the language segments of a text, and hence whether a plan given later in the text can solve a problem raised earlier, thereby improving working efficiency.
As an optional embodiment of the present application, the parsing module 302 is configured to: inputting the target text into a preset initial analysis model, and determining an initial analysis result; determining at least two process language segments based on a preset knowledge base and an initial analysis result; extracting key phrases from each process speech segment by using a preset key phrase extraction model, and determining a key phrase extraction result; and obtaining a task key phrase segment of the target text according to the key phrase extraction result.
As an optional embodiment of the present application, the analysis module 302 is configured to: perform word segmentation on each process language segment based on a preset word segmentation model to obtain word segmentation results; determine the weight corresponding to each word segmentation result based on the word segmentation results and a preset weight rule; and determine the key phrase extraction result based on the weight corresponding to each word segmentation result and a preset selection rule.
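The three steps of this embodiment (word segmentation, weighting, selection) can be illustrated with a minimal sketch. The embodiment does not specify the preset models or rules, so everything below is an assumption: whitespace splitting stands in for the word segmentation model, relative term frequency for the weight rule, and top-k selection for the selection rule. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

PUNCTUATION = ".,;:!?"


def segment_words(segment: str) -> List[str]:
    """Stand-in for the preset word segmentation model (assumption: whitespace split)."""
    words = (w.strip(PUNCTUATION).lower() for w in segment.split())
    return [w for w in words if w]


def weight_words(words: List[str]) -> Dict[str, float]:
    """Stand-in for the preset weight rule (assumption: relative term frequency)."""
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def select_key_phrases(weights: Dict[str, float], top_k: int = 3) -> List[str]:
    """Stand-in for the preset selection rule (assumption: keep the top_k weights)."""
    # Ties are broken alphabetically so that the result is deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


def extract_key_phrases(segment: str, top_k: int = 3) -> List[str]:
    """Run segmentation, weighting and selection on one process language segment."""
    return select_key_phrases(weight_words(segment_words(segment)), top_k)
```

For example, `extract_key_phrases("pump failure, pump leak; replace pump seal", top_k=2)` ranks "pump" first because it occurs three times among the seven words.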
As an optional implementation of the present application, the first processing module 303 is configured to: input the task key language segment into a preset part-of-speech tagging model and determine a part-of-speech tagging result; retain, based on the part-of-speech tagging result and a preset target part of speech, the target vocabulary that conforms to the target part of speech; and input the target vocabulary into a preset named entity recognition model to obtain the tag named entities in the task key language segment.
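The filtering order of this implementation (part-of-speech tagging first, then named entity recognition on the retained vocabulary only) can be sketched as below. The two small lexicons are hypothetical stand-ins for the preset part-of-speech tagging model and the preset named entity recognition model, and the tag names are assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy lexicons; a real system would call trained models instead.
TOY_POS_LEXICON: Dict[str, str] = {
    "the": "DET", "pump": "NOUN", "valve": "NOUN",
    "leaks": "VERB", "quickly": "ADV",
}
TOY_ENTITY_LEXICON: Dict[str, str] = {
    "pump": "EQUIPMENT", "valve": "EQUIPMENT", "leaks": "FAULT",
}


def tag_parts_of_speech(segment: str) -> List[Tuple[str, str]]:
    """Stand-in for the preset part-of-speech tagging model."""
    return [(w, TOY_POS_LEXICON.get(w, "X")) for w in segment.lower().split()]


def keep_target_words(tagged: List[Tuple[str, str]], target_pos: FrozenSet[str]) -> List[str]:
    """Retain only the words whose part of speech is a preset target part of speech."""
    return [word for word, pos in tagged if pos in target_pos]


def recognize_entities(words: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for the preset named entity recognition model."""
    return [(w, TOY_ENTITY_LEXICON[w]) for w in words if w in TOY_ENTITY_LEXICON]


def tag_named_entities(
    segment: str, target_pos: FrozenSet[str] = frozenset({"NOUN", "VERB"})
) -> List[Tuple[str, str]]:
    """Tag, filter by target part of speech, then recognize entities."""
    tagged = tag_parts_of_speech(segment)
    return recognize_entities(keep_target_words(tagged, target_pos))
```

Filtering before recognition means that words outside the target parts of speech never reach the recognition step, whatever the entity lexicon contains.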
As an optional implementation of the present application, the second processing module 304 is configured to: input each tag named entity into a preset semantic evaluation model and determine the semantic similarity between the tag named entities; determine, based on the semantic similarity, whether a connection exists between the tag named entities; acquire the number of connections of the tag named entities corresponding to each task key language segment; acquire the number of elements of the tag named entities corresponding to each task key language segment; and determine the connectivity judgment result between the task key language segments based on the number of elements and the number of connections.
As an optional implementation of the present application, the second processing module 304 is configured to: determine that a connection exists between the tag named entities when the semantic similarity is greater than a preset first threshold; and otherwise determine that no connection exists between the tag named entities.
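The threshold test and the two counts can be sketched as follows. Several points are assumptions and not the patented method: the string similarity from Python's `difflib` stands in for the preset semantic evaluation model, the value 0.8 stands in for the preset first threshold, and, because the exact connectivity formula is not reproduced in this text, the final ratio of connected entities to all entities is only one plausible way of combining the number of connections with the number of elements.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def string_similarity(a: str, b: str) -> float:
    """Stand-in for the preset semantic evaluation model (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def count_connected(
    source: Sequence[str],
    target: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> int:
    """Count entities in `source` connected to at least one entity in `target`.

    A connection exists only when the similarity is strictly greater than
    the threshold, as stated in the embodiment.
    """
    return sum(
        1 for s in source if any(similarity(s, t) > threshold for t in target)
    )


def connectivity(
    entities_a: Sequence[str],
    entities_b: Sequence[str],
    similarity: Callable[[str, str], float] = string_similarity,
    threshold: float = 0.8,  # hypothetical value for the preset first threshold
) -> float:
    """Assumed combination: connected entities on both sides over all entities."""
    n_a, n_b = len(entities_a), len(entities_b)
    if n_a + n_b == 0:
        return 0.0
    c_ab = count_connected(entities_a, entities_b, similarity, threshold)
    c_ba = count_connected(entities_b, entities_a, similarity, threshold)
    return (c_ab + c_ba) / (n_a + n_b)
```

Under this assumed combination the result lies between 0 and 1, and a larger value indicates a better connection between the two segments, which matches the direction stated in the claims.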
Referring to fig. 4, which is a schematic structural diagram of an electronic device according to an optional embodiment of the present invention, the electronic device may include: at least one processor 41, such as a CPU (Central Processing Unit), at least one communication interface 43, a memory 44, and at least one communication bus 42. The communication bus 42 is used to enable connection and communication between these components. The communication interface 43 may include a display and a keyboard, and optionally may further include a standard wired interface and a standard wireless interface. The memory 44 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 44 may also be at least one storage device located remotely from the processor 41. The processor 41 may be used in connection with the apparatus described in fig. 4; an application program is stored in the memory 44, and the processor 41 calls the program code stored in the memory 44 to perform any of the above method steps.
The communication bus 42 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 42 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The memory 44 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 44 may also comprise a combination of the above kinds of memory.
The processor 41 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 44 is also used to store program instructions. The processor 41 may call the program instructions to implement the text connectivity judgment method shown in any of the embodiments of the present application.
An embodiment of the invention also provides a non-transitory computer storage medium storing computer-executable instructions that can execute the text connectivity judgment method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (8)

1. A text connectivity judgment method, characterized by comprising the following steps:
acquiring a target text;
analyzing the target text to obtain task key language segments of the target text;
obtaining tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments;
determining a connectivity judgment result between the task key language segments based on the tag named entities;
wherein the determining a connectivity judgment result between the task key language segments based on the tag named entities comprises:
inputting each tag named entity into a preset semantic evaluation model, and determining the semantic similarity between the tag named entities;
determining whether a connection exists between the tag named entities based on the semantic similarity;
acquiring the number of connections of the tag named entities corresponding to each task key language segment;
acquiring the number of elements of the tag named entities corresponding to each task key language segment;
determining the connectivity judgment result between the task key language segments based on the number of elements and the number of connections;
wherein the determining whether a connection exists between the tag named entities based on the semantic similarity comprises:
when the semantic similarity is greater than a preset first threshold, determining that a connection exists between the tag named entities;
otherwise, determining that no connection exists between the tag named entities;
wherein the connectivity between the task key language segments is calculated according to the following formula:
[Formula image not reproduced in the source text]
wherein the quantities in the formula represent, in order: the degree of connection between task key language segment A and task key language segment B; the number of tag named entities in task key language segment A that are connected with task key language segment B; the number of all tag named entities in task key language segment A; the number of tag named entities in task key language segment B that are connected with task key language segment A; and the number of all tag named entities in task key language segment B;
wherein the larger the value of the degree of connection, the better the connection between task key language segment A and task key language segment B.
2. The method according to claim 1, wherein the analyzing the target text to obtain task key language segments of the target text comprises:
inputting the target text into a preset initial analysis model, and determining an initial analysis result;
determining at least two process language segments based on a preset knowledge base and the initial analysis result;
extracting key phrases from each process language segment by using a preset key phrase extraction model, and determining a key phrase extraction result;
and obtaining the task key language segments of the target text according to the key phrase extraction result.
3. The method according to claim 2, wherein the extracting key phrases from each process language segment by using a preset key phrase extraction model, and determining a key phrase extraction result comprises:
performing word segmentation on the process language segment based on a preset word segmentation model to obtain word segmentation results;
determining the weight corresponding to each word segmentation result based on the word segmentation results and a preset weight rule;
and determining the key phrase extraction result based on the weight corresponding to each word segmentation result and a preset selection rule.
4. The method according to claim 1, wherein the obtaining tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments comprises:
inputting the task key language segment into a preset part-of-speech tagging model, and determining a part-of-speech tagging result;
retaining, based on the part-of-speech tagging result and a preset target part of speech, the target vocabulary that conforms to the target part of speech;
and inputting the target vocabulary into a preset named entity recognition model to obtain the tag named entities in the task key language segment.
5. A text connectivity judgment device, comprising:
an acquisition module, configured to acquire a target text;
an analysis module, configured to analyze the target text to obtain task key language segments of the target text;
a first processing module, configured to obtain tag named entities in the task key language segments based on a preset named entity recognition model and the task key language segments;
a second processing module, configured to determine a connectivity judgment result between the task key language segments based on the tag named entities;
wherein the second processing module is specifically configured to perform:
inputting each tag named entity into a preset semantic evaluation model, and determining the semantic similarity between the tag named entities;
determining whether a connection exists between the tag named entities based on the semantic similarity;
acquiring the number of connections of the tag named entities corresponding to each task key language segment;
acquiring the number of elements of the tag named entities corresponding to each task key language segment;
determining the connectivity judgment result between the task key language segments based on the number of elements and the number of connections;
wherein the determining whether a connection exists between the tag named entities based on the semantic similarity comprises:
when the semantic similarity is greater than a preset first threshold, determining that a connection exists between the tag named entities;
otherwise, determining that no connection exists between the tag named entities;
wherein the connectivity between the task key language segments is calculated according to the following formula:
[Formula image not reproduced in the source text]
wherein the quantities in the formula represent, in order: the degree of connection between task key language segment A and task key language segment B; the number of tag named entities in task key language segment A that are connected with task key language segment B; the number of all tag named entities in task key language segment A; the number of tag named entities in task key language segment B that are connected with task key language segment A; and the number of all tag named entities in task key language segment B;
wherein the larger the value of the degree of connection, the better the connection between task key language segment A and task key language segment B.
6. The device according to claim 5, wherein the analysis module is further configured to perform:
inputting the target text into a preset initial analysis model, and determining an initial analysis result;
determining at least two process language segments based on a preset knowledge base and the initial analysis result;
extracting key phrases from each process language segment by using a preset key phrase extraction model, and determining a key phrase extraction result;
and obtaining the task key language segments of the target text according to the key phrase extraction result.
7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method of any one of claims 1-4.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1-4.
CN202210919249.7A 2022-08-02 2022-08-02 Text connectivity judgment method and device, electronic equipment and storage medium Active CN114970491B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210919249.7A CN114970491B (en) 2022-08-02 2022-08-02 Text connectivity judgment method and device, electronic equipment and storage medium
PCT/CN2022/135015 WO2023098658A1 (en) 2022-08-02 2022-11-29 Text cohesion determination method and apparatus, and electronic device and storage medium
ZA2023/01703A ZA202301703B (en) 2022-08-02 2023-02-10 Text cohesion judgment methods, devices, electronic equipment and storage media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210919249.7A CN114970491B (en) 2022-08-02 2022-08-02 Text connectivity judgment method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114970491A CN114970491A (en) 2022-08-30
CN114970491B true CN114970491B (en) 2022-10-04

Family

ID=82969946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210919249.7A Active CN114970491B (en) 2022-08-02 2022-08-02 Text connectivity judgment method and device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN114970491B (en)
WO (1) WO2023098658A1 (en)
ZA (1) ZA202301703B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970491B (en) * 2022-08-02 2022-10-04 深圳市城市公共安全技术研究院有限公司 Text connectivity judgment method and device, electronic equipment and storage medium
CN116678162B (en) * 2023-08-02 2023-09-26 八爪鱼人工智能科技(常熟)有限公司 Cold storage operation information management method, system and storage medium based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294663A (en) * 2013-05-03 2013-09-11 苏州大学 Text coherence detection method and device
CN110287497A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 A kind of coherent analysis method of the semantic structure of English text
CN110442872A (en) * 2019-08-06 2019-11-12 中科鼎富(北京)科技发展有限公司 A kind of text elements integrality checking method and device
CN111428470A (en) * 2020-03-23 2020-07-17 北京世纪好未来教育科技有限公司 Text continuity judgment method, text continuity judgment model training method, electronic device and readable medium
CN112597309A (en) * 2020-12-25 2021-04-02 西南电子技术研究所(中国电子科技集团公司第十研究所) Detection system for identifying microblog data stream of sudden event in real time
CN113297367A (en) * 2021-06-29 2021-08-24 中国平安人寿保险股份有限公司 Method for generating user conversation linking language and related equipment
CN113553830A (en) * 2021-08-11 2021-10-26 桂林电子科技大学 Graph-based English text sentence language piece coherent analysis method
CN113743125A (en) * 2021-09-07 2021-12-03 广州晓阳智能科技有限公司 Text continuity analysis method and device
CN113869033A (en) * 2021-09-24 2021-12-31 厦门大学 Graph neural network sentence sequencing method integrated with iterative sentence pair relation prediction

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120844B2 (en) * 2014-10-23 2018-11-06 International Business Machines Corporation Determining the likelihood that an input descriptor and associated text content match a target field using natural language processing techniques in preparation for an extract, transform and load process
CN110147421B (en) * 2019-05-10 2022-06-21 腾讯科技(深圳)有限公司 Target entity linking method, device, equipment and storage medium
EP4010840A1 (en) * 2020-03-13 2022-06-15 Google LLC Re-ranking results from semantic natural language processing machine learning algorithms for implementation in video games
CN111931509A (en) * 2020-08-28 2020-11-13 北京百度网讯科技有限公司 Entity chain finger method, device, electronic equipment and storage medium
CN112380866A (en) * 2020-11-25 2021-02-19 厦门市美亚柏科信息股份有限公司 Text topic label generation method, terminal device and storage medium
CN114970491B (en) * 2022-08-02 2022-10-04 深圳市城市公共安全技术研究院有限公司 Text connectivity judgment method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
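Automatic Evaluation of Text Coherence: Models and Representations; Mirella Lapata; Proceedings of IJCAI; 20151231; full text *
Research on Contextual Cohesion of Social Media Text for News Topics; Yang Qiuhong; China Master's Theses Full-text Database, Information Science and Technology; 20180215 (No. 02, 2018); full text *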

Also Published As

Publication number Publication date
WO2023098658A1 (en) 2023-06-08
CN114970491A (en) 2022-08-30
ZA202301703B (en) 2023-07-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant