WO2023098658A1 - Text cohesion judgment method, apparatus, electronic device, and storage medium - Google Patents

Text cohesion judgment method, apparatus, electronic device, and storage medium

Info

Publication number
WO2023098658A1
Authority
WO
WIPO (PCT)
Prior art keywords
mission
critical
preset
segment
named entities
Prior art date
Application number
PCT/CN2022/135015
Other languages
English (en)
French (fr)
Inventor
徐大用
岳清瑞
蒋会春
沈赣苏
秦宇
董方
习树峰
施钟淇
凌君
Original Assignee
深圳市城市公共安全技术研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市城市公共安全技术研究院有限公司
Priority to ZA2023/01703A (patent ZA202301703B)
Publication of WO2023098658A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G06F40/30: Semantic analysis

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to a text cohesion judgment method, apparatus, electronic device, and storage medium.
  • With the development of artificial intelligence, AI has gradually become able to understand text content.
  • In the prior art, artificial intelligence can be used to identify the similarity, consistency, and so on of texts.
  • In view of this, embodiments of the present application provide a text cohesion judgment method, apparatus, electronic device, and storage medium.
  • In a first aspect, an embodiment of the present application provides a text cohesion judgment method, comprising: obtaining a target text; parsing the target text to obtain the task-critical segments of the target text; obtaining the tag named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments; and determining, based on the tag named entities, the cohesion judgment result between the task-critical segments.
  • Optionally, parsing the target text to obtain its task-critical segments includes: inputting the target text into a preset initial analysis model to determine an initial analysis result; determining at least two process segments based on a preset knowledge base and the initial analysis result; extracting key phrases from each process segment with a preset key phrase extraction model to determine a key phrase extraction result; and obtaining the task-critical segments of the target text according to the key phrase extraction result.
  • Optionally, extracting key phrases from each process segment with the preset key phrase extraction model includes: performing word segmentation on the process segment with a preset word segmentation model to obtain word segmentation results; determining the weight of each word segmentation result based on the word segmentation results and a preset weight rule; and determining the key phrase extraction result based on those weights and a preset selection rule.
  • Optionally, obtaining the tag named entities in the task-critical segments includes: inputting the task-critical segment into a preset part-of-speech tagging model to determine a part-of-speech tagging result; retaining, based on the tagging result and a preset target part of speech, the target words that match the target part of speech; and inputting the target words into the preset named entity recognition model to obtain the tag named entities in the task-critical segment.
  • Optionally, determining the cohesion judgment result between the task-critical segments includes: inputting the tag named entities into a preset semantic evaluation model to determine the semantic similarity between tag named entities; determining, based on the semantic similarity, whether a connection exists between tag named entities; obtaining the number of connections of the tag named entities corresponding to each task-critical segment; obtaining the number of elements of the tag named entities corresponding to each task-critical segment; and determining the cohesion judgment result between the task-critical segments based on the number of elements and the number of connections.
  • Optionally, determining whether a connection exists between tag named entities includes: when the semantic similarity is greater than a preset first threshold, a connection is deemed to exist between the tag named entities; otherwise, no connection is deemed to exist between them.
  • In a second aspect, an embodiment of the present application provides a text cohesion judgment apparatus, comprising: an acquisition module for obtaining the target text; a parsing module for parsing the target text to obtain its task-critical segments; a first processing module for obtaining the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments; and a second processing module for determining, based on the tag named entities, the cohesion judgment result between the task-critical segments.
  • In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause it to perform the steps of the method of the first aspect or of any possible implementation of the first aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method of the first aspect or of any possible implementation of the first aspect are implemented.
  • In the text cohesion judgment method, apparatus, electronic device, and storage medium provided by the present application, the method comprises: obtaining the target text; parsing the target text to obtain its task-critical segments; obtaining the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments; and determining, based on the tag named entities, the cohesion judgment result between the task-critical segments.
  • FIG. 1 is a schematic flow chart of a method for judging text cohesion provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a method for judging text cohesion provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a text cohesion judging device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device for providing text cohesion judgment according to an embodiment of the present application.
  • Fig. 1 is a schematic flow chart of the text cohesion judgment method provided by an embodiment of the present application. For the execution of the method steps, see Fig. 1. The method includes:
  • S110: obtain the target text.
  • For example, the target text can be any type of text, including but not limited to emergency plans, rescue and disaster-relief responsibilities, and so on. The data format of the text is likewise not limited and includes, but is not limited to, doc and docx files.
  • In an optional embodiment, after a file in another format is obtained, it can be converted to docx format with a file format conversion tool, so that subsequent processing handles docx files uniformly.
  • S120: parse the target text to obtain the task-critical segments of the target text.
  • For example, after the target text is obtained, it is input into the preset initial analysis model to determine the initial analysis result; at least two process segments are determined based on the preset knowledge base and the initial analysis result; the preset key phrase extraction model extracts key phrases from each process segment to determine the key phrase extraction result; and the task-critical segments of the target text are obtained according to the key phrase extraction result.
  • In an optional embodiment, the preset initial analysis model reads the stored data sequentially from top to bottom. Each entity's attributes include the title index, title content, title level, parent title index, and body text. When reading finishes, sets of same-level entities are obtained and the target text is divided into levels.
  • For example, suppose the target text contains the two items "Organization and Responsibilities" and "Monitoring, Early Warning and Forecasting", with the sub-item "Emergency Organization and Responsibilities" under the first and the sub-item "Monitoring of Geological Hazards" under the second. The two items are then at the same level, and the two sub-items are at the same level, one below that of the items. Dividing the target text into levels in this way yields the initial analysis result.
  • In an optional embodiment, the preset knowledge base includes a "chapter knowledge base" and an "organization knowledge base". Because different special-purpose plans describe organizations differently, a "chapter knowledge base" is needed to make the corresponding content easy to find.
  • The "chapter knowledge base" must hold different content for different texts. For example, when processing texts related to emergency plans, the "chapter knowledge base" can be as shown in Table 1 below:
  • In practice, to make process segments easier to locate, abbreviations or variants of organization names can also be replaced, which can usually be done with the FlashText algorithm.
  • The method for replacing or finding abbreviations or variants of organization names is not limited to the FlashText algorithm; this example is for explanation only, imposes no limitation, and the actual application prevails.
  • After the process segments are determined, the key phrases of each process segment are extracted, and the sentence containing a key phrase is taken as a task-critical segment.
  • For example, key phrases can be extracted as follows: perform word segmentation on the process segment with the preset word segmentation model to obtain word segmentation results; determine the weight of each word segmentation result based on the word segmentation results and the preset weight rule; and determine the key phrase extraction result based on those weights and the preset selection rule.
  • In an optional embodiment, key phrase extraction first cleans the target text, removing noise such as abnormal characters, redundant characters, special characters, and brackets of all kinds. The text is then split into sentences, and the word segmentation model performs word segmentation and part-of-speech tagging. A domain-specific dictionary for emergency plans is loaded at the same time so that domain terms are not split. For example, in emergency plan texts, domain terms such as "district defense command", "rescue agency", and "lead unit" must not be split. Next, word frequencies are counted over the segmented words and the weight of each word is calculated.
  • The weight of a word can be assigned from preset data or calculated with a weight calculation model; this is not limited here and the actual application prevails. Finally, suitable phrases are selected according to a preset selection rule, and the weight of each phrase is calculated according to a preset calculation rule.
  • In practice, the phrase selection rules can be set as follows. Rule 1: a phrase cannot exceed 25 characters. Rule 2: a phrase cannot exceed 12 tokens (words). Rule 3: a phrase cannot contain more than one function word. Rule 4: a phrase cannot begin or end with a function word or stop word, and cannot end with a verb. Rule 5: a candidate phrase cannot contain more than the specified number (1) of stop words. Rule 6: the first word of a candidate phrase must be a verb (v), adverb (d), or preposition (p). Rule 7: a candidate phrase must not be a single noun. These rules were formulated from the text data of the plans.
  • The weight calculation rule can be designed as follows: the weight of a candidate phrase is the product of the phrase weight, the phrase-length weight, and the part-of-speech weight, where the phrase weight is the sum of the weights of the words in the phrase.
  • For example, for the phrase [('组织', 'v', 0.6762), ('做好', 'v', 0.8136), ('医疗', 'n', 4.4245), ('救护', 'v', 1.5946)], the phrase weight is 0.6762 + 0.8136 + 4.4245 + 1.5946 = 7.5089.
  • Shorter phrases are generally considered to deserve more weight, so the phrase-length weights are values obtained through repeated validation; the final weights are {1: 1, 2: 5.6, 3: 1.1, 4: 2.0, 5: 0.7, 6: 0.9, 7: 0.48, 8: 0.43, 9: 0.24, 10: 0.15, 11: 0.07, 12: 0.05}.
  • The part-of-speech weight is the part-of-speech transition weight from the first word of the phrase to the last, for example {"v|n": 0.6575342465753424, "n": 0.9154147615937296}.
  • Finally, the phrases are sorted by weight and selected according to a preset rule.
  • For example, suppose there are 5 phrases, A, B, C, D, and E, with weights 0.1, 0.2, 0.3, 0.4, and 0.5 respectively. If only 3 phrases are needed, phrases C, D, and E are selected.
  • S130: obtain the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments.
  • For example, a named entity may be any entity, including but not limited to a responsibility, a location, or an organization; this is not limited here and the actual application prevails.
  • A task-critical segment can likewise be a passage of any length, which is not limited here.
  • For example, after the task-critical segment is obtained, it is input into the preset part-of-speech tagging model to determine the part-of-speech tagging result; based on that result and the preset target part of speech, the target words that match the target part of speech are retained; and the target words are input into the preset named entity recognition model to obtain the tag named entities in the task-critical segment.
  • In an optional embodiment, before part-of-speech tagging, the entities in the task-critical segment must first be annotated.
  • Entity annotation methods include but are not limited to BIEO tagging.
  • With BIEO tagging, suppose the task-critical segment is: "Responsible for assisting the Municipal Emergency Management Bureau in handling water-engineering emergencies that occur during typhoons, and for providing emergency technical support to the Municipal Defense Command. Responsible for hydrological observation, early warning and forecasting, water-engineering dispatch and operation, emergency repairs, and organizing river dredging and the pumping and drainage of accumulated water." The annotation results are shown in Table 2 below:
  • Further, an "entity dictionary" is built from the task-critical segments. The entity dictionary records the part of speech of each entity. Suppose the entities are "Monitoring and Early Warning", "Pumping and Drainage of Accumulated Water", "Guangming Traffic Police Brigade", "Municipal Ecological Environment Bureau", "Teaching Places", and "Tourist Attractions"; the corresponding "entity dictionary" is shown in Table 3 below:
  • The entity dictionary is used as a custom dictionary for word segmentation; word segmentation and part-of-speech tagging are performed on the responsibility-task text, and the part-of-speech tagging results are analyzed.
  • Phrase extraction may use any phrase extraction method, including but not limited to the NLTK regular-expression chunker.
  • The phrases are then sorted out to obtain the tag named entities corresponding to the task-critical segments. Suppose there are the segments "Responsible for rescuing people in distress, transferring and evacuating trapped people, handling secondary disasters caused by typhoons, and assisting local people's governments in post-disaster reconstruction." and "Organizing assault rescue teams and dispatching health technical personnel to rescue the wounded and sick; carrying out sanitation and epidemic prevention in the disaster area, and preventing the spread of epidemics in the disaster area." After extraction by the above method, the tag named entities shown in Table 5 below are obtained:
  • S140: determine, based on the tag named entities, the cohesion judgment result between the task-critical segments.
  • For example, after the tag named entities are obtained, whether a connection exists between tag named entities is determined based on semantic similarity; the number of connections of the tag named entities corresponding to each task-critical segment is obtained; the number of elements of the tag named entities corresponding to each task-critical segment is obtained; and the cohesion judgment result between the task-critical segments is determined from the number of elements and the number of connections.
  • The semantic similarity can be determined with any semantic discrimination model, which is not limited here. After the semantic similarity between tag named entities is determined, it is compared with a preset first threshold; a connection between the tag named entities is deemed to exist only when the semantic similarity is greater than the preset first threshold.
  • The cohesion between task-critical segments can be calculated according to the following formula:
  • R(A, B) represents the degree of cohesion between task-critical segment A and task-critical segment B;
  • L(A) represents the number of tag named entities in task-critical segment A that are connected with task-critical segment B;
  • N(A) represents the number of all tag named entities in task-critical segment A;
  • L(B) represents the number of tag named entities in task-critical segment B that are connected with task-critical segment A;
  • N(B) represents the number of all tag named entities in task-critical segment B.
  • The larger the value of R(A, B), the better the cohesion between task-critical segment A and task-critical segment B.
  • After identifying the task-critical segments, the application further obtains the tag named entities in them and uses these tag named entities to calculate the cohesion between the task-critical segments. The cohesion relationships between the segments of a text thus make it possible to fully judge whether the plan in the later part of the text can solve the problems raised in the earlier part, which improves work efficiency.
  • An embodiment of the present application also discloses a text cohesion judgment apparatus, as shown in FIG. 3, including:
  • an acquisition module 301, configured to obtain the target text;
  • For details, refer to the description of step S110 in any of the foregoing embodiments; they are not repeated here.
  • a parsing module 302, configured to parse the target text and obtain the task-critical segments of the target text;
  • For details, refer to the description of step S120 in any of the foregoing embodiments; they are not repeated here.
  • a first processing module 303, configured to obtain the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments;
  • For details, refer to the description of step S130 in any of the foregoing embodiments; they are not repeated here.
  • a second processing module 304, configured to determine the cohesion judgment result between the task-critical segments based on the tag named entities.
  • For details, refer to the description of step S140 in any of the foregoing embodiments; they are not repeated here.
  • After identifying the task-critical segments, the apparatus further obtains the tag named entities in them and uses these tag named entities to calculate the cohesion between the task-critical segments. The cohesion relationships between the segments of a text thus make it possible to fully judge whether the plan in the later part of the text can solve the problems raised in the earlier part, which improves work efficiency.
  • The parsing module 302 is configured to: input the target text into the preset initial analysis model and determine the initial analysis result; determine at least two process segments based on the preset knowledge base and the initial analysis result; extract key phrases from each process segment with the preset key phrase extraction model and determine the key phrase extraction result; and obtain the task-critical segments of the target text according to the key phrase extraction result.
  • The parsing module 302 is configured to: perform word segmentation on the process segment with the preset word segmentation model to obtain word segmentation results; determine the weight of each word segmentation result based on the word segmentation results and the preset weight rule; and determine the key phrase extraction result based on those weights and the preset selection rule.
  • The first processing module 303 is configured to: input the task-critical segment into the preset part-of-speech tagging model to determine the part-of-speech tagging result; retain, based on the tagging result and the preset target part of speech, the target words that match the target part of speech; and input the target words into the preset named entity recognition model to obtain the tag named entities in the task-critical segment.
  • The second processing module 304 is configured to: input the tag named entities into the preset semantic evaluation model and determine the semantic similarity between tag named entities; determine, based on the semantic similarity, whether a connection exists between tag named entities; obtain the number of connections of the tag named entities corresponding to each task-critical segment; obtain the number of elements of the tag named entities corresponding to each task-critical segment; and determine the cohesion judgment result between the task-critical segments based on the number of elements and the number of connections.
  • The second processing module 304 is configured to: when the semantic similarity is greater than the preset first threshold, determine that a connection exists between the tag named entities; otherwise, determine that no connection exists between them.
  • Fig. 4 is a schematic structural diagram of an electronic device provided in an optional embodiment of the present application.
  • the electronic device may include: at least one processor 41, such as a CPU (Central Processing Unit, central processor), at least one communication interface 43, memory 44, and at least one communication bus 42.
  • the communication bus 42 is used to realize connection and communication between these components.
  • The communication interface 43 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 43 may also include a standard wired interface and a wireless interface.
  • the memory 44 can be a high-speed RAM memory (Random Access Memory, volatile random access memory), or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 44 may also be at least one storage device located away from the aforementioned processor 41 .
  • the processor 41 may be combined with the device described in FIG. 4 , the memory 44 stores an application program, and the processor 41 invokes the program code stored in the memory 44 to execute any of the above method steps.
  • the communication bus 42 may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (EISA for short) bus or the like.
  • the communication bus 42 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 4 , but it does not mean that there is only one bus or one type of bus.
  • The memory 44 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 44 may also include a combination of the above types of memory.
  • The processor 41 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • the processor 41 may further include a hardware chip.
  • The aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • memory 44 is also used to store program instructions.
  • the processor 41 may invoke program instructions to implement the method for judging text cohesion as shown in any embodiment of the present application.
  • An embodiment of the present application also provides a non-transitory computer storage medium storing computer-executable instructions that can execute the text cohesion judgment method of any of the above method embodiments.
  • The storage medium can be a magnetic disk, an optical disc, read-only memory (ROM), random-access memory (RAM), flash memory, a hard disk drive (HDD), a solid-state drive (SSD), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

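The embodiments of the present application relate to the field of computer technology, and in particular to a text cohesion judgment method, apparatus, electronic device, and storage medium. The method includes: obtaining and parsing a target text to obtain the task-critical segments of the target text; obtaining the tag named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments; and determining, based on the tag named entities, the cohesion judgment result between the task-critical segments. After the task-critical segments are identified, the tag named entities in them are further obtained and used to calculate the cohesion between task-critical segments, which makes it possible to fully judge whether the plan in the later part of the text can solve the problems raised in the earlier part.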

Description

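Text cohesion judgment method, apparatus, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210919249.7, titled "A text cohesion judgment method, apparatus, electronic device, and storage medium", filed with the China Patent Office on August 2, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to a text cohesion judgment method, apparatus, electronic device, and storage medium.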
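Background
With the development of artificial intelligence, AI has gradually become able to understand text content. In the prior art, artificial intelligence can be used to identify the similarity, consistency, and so on of texts.
However, in the prior art, artificial intelligence can only tell whether texts discuss the same issue. In the field of emergency plans in particular, the requirement is not limited to identifying whether a text discusses the same issue. More importantly, it is necessary to judge whether the plan in the later part of a text can solve the problems raised in the earlier part. This requires judging the cohesion of the text, and with it the coherence and practicality of the text.
Therefore, a text cohesion judgment method is needed to solve the above problem.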
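Summary
In view of this, to solve the above technical problem in the prior art, embodiments of the present application provide a text cohesion judgment method, apparatus, electronic device, and storage medium.
In a first aspect, an embodiment of the present application provides a text cohesion judgment method, comprising: obtaining a target text; parsing the target text to obtain the task-critical segments of the target text; obtaining the tag named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments; and determining, based on the tag named entities, the cohesion judgment result between the task-critical segments.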
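Optionally, parsing the target text to obtain its task-critical segments includes: inputting the target text into a preset initial analysis model to determine an initial analysis result; determining at least two process segments based on a preset knowledge base and the initial analysis result; extracting key phrases from each process segment with a preset key phrase extraction model to determine a key phrase extraction result; and obtaining the task-critical segments of the target text according to the key phrase extraction result.
Optionally, extracting key phrases from each process segment with the preset key phrase extraction model to determine the key phrase extraction result includes: performing word segmentation on the process segment with a preset word segmentation model to obtain word segmentation results; determining the weight of each word segmentation result based on the word segmentation results and a preset weight rule; and determining the key phrase extraction result based on those weights and a preset selection rule.
Optionally, obtaining the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments includes: inputting the task-critical segment into a preset part-of-speech tagging model to determine a part-of-speech tagging result; retaining, based on the tagging result and a preset target part of speech, the target words that match the target part of speech; and inputting the target words into the preset named entity recognition model to obtain the tag named entities in the task-critical segment.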
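Optionally, determining the cohesion judgment result between the task-critical segments based on the tag named entities includes: inputting the tag named entities into a preset semantic evaluation model to determine the semantic similarity between tag named entities; determining, based on the semantic similarity, whether a connection exists between tag named entities; obtaining the number of connections of the tag named entities corresponding to each task-critical segment; obtaining the number of elements of the tag named entities corresponding to each task-critical segment; and determining the cohesion judgment result between the task-critical segments based on the number of elements and the number of connections.
Optionally, determining whether a connection exists between tag named entities based on the semantic similarity includes: when the semantic similarity is greater than a preset first threshold, a connection is deemed to exist between the tag named entities; otherwise, no connection is deemed to exist between them.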
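The thresholding and counting steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `char_similarity` is a surface-string stand-in for the semantic evaluation model, the reading of a "connection count" as the number of entities in one segment linked to at least one entity in the other is an interpretation, and the final ratio in `cohesion_score` is a hypothetical aggregate because the source formula is not reproduced in this text.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Similarity = Callable[[str, str], float]


def char_similarity(a: str, b: str) -> float:
    """Stand-in for the semantic evaluation model (assumption):
    surface string similarity in [0, 1], not true semantic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def count_connections(
    entities_a: List[str],
    entities_b: List[str],
    threshold: float,
    similarity: Similarity = char_similarity,
) -> Tuple[int, int]:
    """Return (L_A, L_B): how many entities of each segment are connected
    to at least one entity of the other segment. A connection requires a
    similarity strictly greater than the threshold, as in the text."""
    linked_a = sum(
        any(similarity(a, b) > threshold for b in entities_b) for a in entities_a
    )
    linked_b = sum(
        any(similarity(a, b) > threshold for a in entities_a) for b in entities_b
    )
    return linked_a, linked_b


def cohesion_score(
    entities_a: List[str],
    entities_b: List[str],
    threshold: float = 0.6,  # hypothetical first threshold
    similarity: Similarity = char_similarity,
) -> float:
    """Hypothetical aggregate (assumption): connected entities divided by
    all entities, i.e. (L_A + L_B) / (N_A + N_B)."""
    linked_a, linked_b = count_connections(entities_a, entities_b, threshold, similarity)
    total = len(entities_a) + len(entities_b)
    return (linked_a + linked_b) / total if total else 0.0
```

Passing the similarity function as a parameter keeps the counting logic independent of whichever semantic model is actually used.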
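In a second aspect, an embodiment of the present application provides a text cohesion judgment apparatus, comprising: an acquisition module for obtaining the target text; a parsing module for parsing the target text to obtain its task-critical segments; a first processing module for obtaining the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments; and a second processing module for determining, based on the tag named entities, the cohesion judgment result between the task-critical segments.
In a third aspect, the present application provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause it to perform the steps of the method of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method of the first aspect or of any possible implementation of the first aspect are implemented.
In the text cohesion judgment method, apparatus, electronic device, and storage medium provided by the present application, the method comprises: obtaining the target text; parsing the target text to obtain its task-critical segments; obtaining the tag named entities in the task-critical segments based on the preset named entity recognition model and the task-critical segments; and determining, based on the tag named entities, the cohesion judgment result between the task-critical segments. After the task-critical segments are identified, the tag named entities in them are further obtained and used to calculate the cohesion between the task-critical segments. The cohesion relationships between the segments of a text thus make it possible to fully judge whether the plan in the later part of the text can solve the problems raised in the earlier part, which improves work efficiency.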
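Brief description of the drawings
FIG. 1 is a schematic flow chart of the text cohesion judgment method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the text cohesion judgment method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the text cohesion judgment apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the electronic device for text cohesion judgment provided by an embodiment of the present application.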
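Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
To aid understanding of the embodiments of the present application, specific embodiments are explained further below with reference to the drawings; the embodiments do not limit the embodiments of the present application.
FIG. 1 is a schematic flow chart of the text cohesion judgment method provided by an embodiment of the present application. For the execution of the method steps, see FIG. 1. The method includes: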
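S110: obtain the target text.
For example, the target text can be any type of text, including but not limited to emergency plans, rescue and disaster-relief responsibilities, and so on. The data format of the text is likewise not limited and includes, but is not limited to, doc and docx files.
In an optional embodiment, after a file in another format is obtained, it can be converted to docx format with a file format conversion tool, so that subsequent processing handles docx files uniformly.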
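S120: parse the target text to obtain the task-critical segments of the target text.
For example, after the target text is obtained, it is input into the preset initial analysis model to determine the initial analysis result; at least two process segments are determined based on the preset knowledge base and the initial analysis result; the preset key phrase extraction model extracts key phrases from each process segment to determine the key phrase extraction result; and the task-critical segments of the target text are obtained according to the key phrase extraction result.
In an optional embodiment, the preset initial analysis model reads the stored data sequentially from top to bottom. Each entity's attributes include the title index, title content, title level, parent title index, and body text. When reading finishes, sets of same-level entities are obtained and the target text is divided into levels. For example, suppose the target text contains the two items "Organization and Responsibilities" and "Monitoring, Early Warning and Forecasting", with the sub-item "Emergency Organization and Responsibilities" under the first and the sub-item "Monitoring of Geological Hazards" under the second. Clearly, "Organization and Responsibilities" and "Monitoring, Early Warning and Forecasting" are at the same level, and "Emergency Organization and Responsibilities" and "Monitoring of Geological Hazards" are at the same level, one below that of the two items. Dividing in this way, the preset initial analysis model splits the target text into several levels, and these levels are the initial analysis result.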
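The level division described above can be sketched as follows. The `Heading` record and the helper names are hypothetical; only the five attributes (title index, title content, title level, parent title index, body text) come from the text, and how the model extracts them from a docx file is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Heading:
    index: int             # title index
    title: str             # title content
    level: int             # title level (1 = top level)
    parent: Optional[int]  # parent title index, None at the top level
    text: str = ""         # body text under this title


def group_by_level(headings: List[Heading]) -> Dict[int, List[Heading]]:
    """Read the records top to bottom and collect same-level entities."""
    levels: Dict[int, List[Heading]] = defaultdict(list)
    for heading in headings:  # document order is preserved within a level
        levels[heading.level].append(heading)
    return dict(levels)


def children_of(headings: List[Heading], parent_index: int) -> List[Heading]:
    """Return the sub-items that sit directly under one title."""
    return [h for h in headings if h.parent == parent_index]


# The four titles used as the example in the text.
headings = [
    Heading(1, "Organization and Responsibilities", 1, None),
    Heading(2, "Emergency Organization and Responsibilities", 2, 1),
    Heading(3, "Monitoring, Early Warning and Forecasting", 1, None),
    Heading(4, "Monitoring of Geological Hazards", 2, 3),
]
levels = group_by_level(headings)
```

Here `levels[1]` holds the two top-level items and `levels[2]` the two sub-items, which mirrors the same-level sets described in the paragraph.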
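Further, after the levels are divided, at least two process segments are determined based on the preset knowledge base and the initial analysis result.
In an optional embodiment, the preset knowledge base includes a "chapter knowledge base" and an "organization knowledge base". Because different special-purpose plans describe organizations differently, a "chapter knowledge base" is needed to make the corresponding content easy to find. The "chapter knowledge base" must hold different content for different texts. For example, when processing texts related to emergency plans, the "chapter knowledge base" can be as shown in Table 1 below:
Table 1
(Table 1 appears as images PCTCN2022135015-appb-000001 and PCTCN2022135015-appb-000002 in the original publication.)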
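In an optional embodiment, suppose there is a target text whose process segments are to be located. The first pass locates the "Organization and Responsibilities" (组织机构与职责) text and the "Emergency Response" (应急响应) text in which the response levels appear, that is, it finds all text in passages containing "Organization and Responsibilities" or "Emergency Response". After the first pass, based on its result, the "member units" text within "Organization and Responsibilities" and the "response levels" text within "Emergency Response" are located. Finally, the text corresponding to the "member units" within each response level is located.
Further, because many organization names in real texts appear as abbreviations or variants, an "organization knowledge base" needs to be built so that member units can still be located precisely when such abbreviations or variants occur.
In practice, to make the process segments easier to locate, abbreviations or variants of organization names may also be replaced, which can typically be done with the FlashText algorithm. It should be noted that the method of replacing or finding abbreviations or variants of organization names is not limited to the FlashText algorithm; it is mentioned in this example only for explanation, no limitation is imposed here, and the actual application prevails.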
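As an illustration of alias replacement, the following sketch normalizes organization abbreviations with a plain longest-match dictionary lookup. It is a simple stand-in written with the standard library only, not the FlashText algorithm itself, and the full names in `ORG_ALIASES` are hypothetical expansions chosen for the example.

```python
from typing import Dict


def normalize_aliases(text: str, alias_to_canonical: Dict[str, str]) -> str:
    """Replace each alias with its canonical name, preferring longer aliases."""
    aliases = sorted((a for a in alias_to_canonical if a), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for alias in aliases:
            if text.startswith(alias, i):
                out.append(alias_to_canonical[alias])
                i += len(alias)
                break
        else:
            # No alias starts at position i: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical organization knowledge base: abbreviation -> full name.
ORG_ALIASES = {
    "市防指": "市三防指挥部",
    "区防指": "区三防指挥部",
}

print(normalize_aliases("为市防指提供抢险技术支撑", ORG_ALIASES))
```

This scan checks every alias at every position, which is adequate for a small knowledge base; a trie-based matcher such as FlashText would be the natural choice when the alias list is large.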
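Further, after the process segments are determined, their key phrases are extracted, and the sentences containing the key phrases are taken as the task-critical segments.
Illustratively, key phrases may be extracted as follows: the process segments are segmented into words by a preset word segmentation model to obtain a segmentation result; a weight is determined for each segmented word based on the segmentation result and a preset weight rule; and the key phrase extraction result is determined based on these weights and a preset selection rule.
In an optional embodiment, key phrase extraction starts by cleaning the target text, removing noise such as abnormal characters, redundant characters, special characters, and brackets of all kinds. The text is then split into sentences, and a word segmentation model performs word segmentation and part-of-speech tagging with a domain-specific dictionary for emergency plans loaded, so that domain terms are not split apart. For example, in emergency plan texts, domain-specific terms such as "区防指" (an abbreviated name of a district-level command headquarters), "救援机构" (rescue agency), and "牵头单位" (lead unit) must not be split. Next, word frequencies are counted over the segmented words and a weight is computed for each word. Word weights may be assigned from preset data or computed with a weight calculation model; no limitation is imposed here, and the actual application prevails. Finally, suitable phrases are selected according to the preset selection rules, and the weight of each phrase is computed according to the preset calculation rules.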
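In practice, the phrase selection rules may be set as follows:
Rule 1: a phrase must not exceed 25 characters (char);
Rule 2: a phrase must not exceed 12 tokens (words);
Rule 3: a phrase must not contain more than one function word;
Rule 4: a phrase must not begin or end with a function word or a stop word, and must not end with a verb;
Rule 5: a candidate phrase must not contain more than the specified number (1) of stop words;
Rule 6: the first word of a candidate phrase must be a verb (v), an adverb (d), or a preposition (p);
Rule 7: a candidate phrase must not be a single noun.
These rules were all formulated from emergency plan text data. The weight calculation rule may be designed as follows: the weight of a candidate phrase is the product of three terms, namely the phrase weight, the phrase length weight, and the part-of-speech weight, where the phrase weight is the sum of the weights of the words in the phrase. For example, for the phrase [('组织','v',0.6762),('做好','v',0.8136),('医疗','n',4.4245),('救护','v',1.5946)], the phrase weight is:
0.6762+0.8136+4.4245+1.5946=7.5089
It is generally considered that shorter spans should carry more weight, so the phrase length weights are values obtained through repeated validation; the final values are {1:1,2:5.6,3:1.1,4:2.0,5:0.7,6:0.9,7:0.48,8:0.43,9:0.24,10:0.15,11:0.07,12:0.05}. The part-of-speech weight is the transition weight from the part of speech of the first word of the phrase to that of the last word, for example {"v|n":0.6575342465753424,"n":0.9154147615937296}. Finally, the phrases are sorted by weight and selected according to a preset rule. For example, suppose there are five phrases A, B, C, D, and E with weights 0.1, 0.2, 0.3, 0.4, and 0.5 respectively; if only three phrases are needed, phrases C, D, and E are selected.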
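The weight calculation and the final selection can be sketched as follows. The length weights and the two part-of-speech weights are the values given above. Several details are assumptions made for this sketch: the length weight is looked up by token count, a one-word phrase uses its own tag as the key, and a missing part-of-speech key falls back to 1.0 (the text does not give a value for "v|v"). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str, float]  # (token, part-of-speech tag, word weight)

LENGTH_WEIGHT = {1: 1, 2: 5.6, 3: 1.1, 4: 2.0, 5: 0.7, 6: 0.9,
                 7: 0.48, 8: 0.43, 9: 0.24, 10: 0.15, 11: 0.07, 12: 0.05}
POS_WEIGHT = {"v|n": 0.6575342465753424, "n": 0.9154147615937296}


def within_length_rules(phrase: List[Word]) -> bool:
    """Rules 1 and 2: at most 25 characters and at most 12 tokens."""
    return sum(len(tok) for tok, _, _ in phrase) <= 25 and len(phrase) <= 12


def word_weight_sum(phrase: List[Word]) -> float:
    """Phrase weight: the sum of the weights of the words in the phrase."""
    return sum(weight for _, _, weight in phrase)


def pos_key(phrase: List[Word]) -> str:
    """Key 'first|last' for multi-word phrases, the single tag otherwise (assumed)."""
    first, last = phrase[0][1], phrase[-1][1]
    return first if len(phrase) == 1 else f"{first}|{last}"


def candidate_weight(phrase: List[Word], default_pos_weight: float = 1.0) -> float:
    """Product of phrase weight, length weight, and part-of-speech weight."""
    pos_weight = POS_WEIGHT.get(pos_key(phrase), default_pos_weight)  # fallback is assumed
    return word_weight_sum(phrase) * LENGTH_WEIGHT[len(phrase)] * pos_weight


def top_k(weights: Dict[str, float], k: int) -> List[str]:
    """Sort phrases by weight, largest first, and keep the first k."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


phrase = [("组织", "v", 0.6762), ("做好", "v", 0.8136),
          ("医疗", "n", 4.4245), ("救护", "v", 1.5946)]
print(within_length_rules(phrase), round(word_weight_sum(phrase), 4))
print(top_k({"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "E": 0.5}, 3))
```

Rules 3 to 7 depend on function-word and stop-word lists that the text does not enumerate, so they are left out of the sketch.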
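S130: obtain the labeled named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments.
Illustratively, a named entity may be any entity, including but not limited to a duty, a location, or an organization; no limitation is imposed here, and the actual application prevails. A task-critical segment may likewise be a passage of any length, which is not limited here.
Illustratively, after the task-critical segments are obtained, they are input into a preset part-of-speech tagging model to determine a part-of-speech tagging result; based on the tagging result and preset target parts of speech, the target words that match the target parts of speech are retained; and the target words are input into the preset named entity recognition model to obtain the labeled named entities in the task-critical segments.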
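In an optional embodiment, before part-of-speech tagging is performed, the entities in the task-critical segment are first annotated. Entity annotation methods include but are not limited to BIEO tagging. When BIEO tagging is used, suppose the task-critical segment is: "负责协助市应急管理局处置台风影响期间发生的水务工程突发事件,为市防指提供抢险技术支撑。负责水文观测和预警预报,水务工程调度运行及抢险抢修,组织清疏河道、抽排积水。" (Responsible for assisting the Municipal Emergency Management Bureau in handling water engineering emergencies that occur while a typhoon is affecting the area, and for providing technical support for emergency rescue to the municipal command headquarters. Responsible for hydrological observation, early warning and forecasting, the dispatch and operation of water engineering works and their emergency repair, and for organizing the dredging of river channels and the pumping and draining of accumulated water.) The annotation result is then as shown in Table 2 below:
Table 2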
Figure PCTCN2022135015-appb-000003
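A character-level BIEO annotation of the kind described above can be sketched as follows. The label format (`B-ORG`, `I-ORG`, `E-ORG`, `O`), the treatment of single-character entities, and the function name `bieo_tags` are assumptions made for this illustration; Table 2 is available only as an image, so its exact labels may differ.

```python
from typing import List, Tuple


def bieo_tags(text: str, entities: List[Tuple[str, str]]) -> List[str]:
    """Tag each character: B = begin, I = inside, E = end of an entity, O = outside."""
    tags = ["O"] * len(text)
    for surface, label in entities:
        start = text.find(surface)
        while start != -1:
            end = start + len(surface) - 1
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            if end > start:  # a one-character entity keeps only its B tag (assumption)
                tags[end] = f"E-{label}"
            start = text.find(surface, end + 1)
    return tags


text = "协助市应急管理局处置"
tags = bieo_tags(text, [("市应急管理局", "ORG")])
for ch, tag in zip(text, tags):
    print(ch, tag)
```

Each occurrence of an entity string is tagged, and characters outside any entity keep the tag O.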
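Further, an "entity dictionary" is built from the task-critical segments. The entity dictionary records the part of speech assigned to each entity. Suppose the entities are "监测预警" (monitoring and early warning), "抽排积水" (pumping and draining accumulated water), "光明交警大队" (Guangming Traffic Police Brigade), "市生态环境局" (Municipal Ecology and Environment Bureau), "教学场所" (teaching venues), and "旅游景区" (tourist attractions); the corresponding entity dictionary is then as shown in Table 3 below:
Table 3
Functional label entity    Part of speech
监测预警                    Duty (duty)
抽排积水                    Duty
光明交警大队                ORG (organization)
市生态环境局                ORG
教学场所                    LOC (location)
旅游景区                    LOC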
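In an optional embodiment, after the entity dictionary is built, it is used as the custom dictionary for word segmentation; the duty and task text is segmented and tagged with parts of speech, and the tagging results are analyzed. Part-of-speech combinations such as "verbal noun + noun", "noun + noun", and "noun + verbal noun" are used to expand the Duty, LOC, and ORG entities in the entity dictionary. Suppose the task-critical segment is: "负责统筹指导重大险情灾情宣传报道,负责统筹指导抢险救灾舆情引导应对工作。" (Responsible for coordinating and guiding publicity and news coverage of major dangers and disasters, and for coordinating and guiding public-opinion guidance and response work for rescue and disaster relief.) After processing, the part-of-speech tagging result is as shown in Table 4 below:
Table 4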
Figure PCTCN2022135015-appb-000004
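Further, after the tagging result is obtained, modifiers such as adjectives, adverbs, temporal adverbs, and other modifying words are filtered out of the part-of-speech tagging result of the task-critical segment. Core words are retained, namely task-related verbs, common nouns, and duty label entity words, and functional label phrases are extracted from these core words. Phrase extraction may use any phrase extraction method, including but not limited to the NLTK regular-expression chunker.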
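The filtering step can be sketched as a simple lookup against a set of retained tags. The tag names (`v`, `n`, `vn`, `a`) follow a common Chinese tag set and, like the tags assigned to the sample words, are assumptions for this sketch; Table 4 is available only as an image, so the actual tagging may differ.

```python
from typing import List, Set, Tuple

# Tags to keep: verbs, common nouns, verbal nouns, and entity-dictionary labels (assumed set).
KEEP_POS: Set[str] = {"v", "n", "vn", "Duty", "ORG", "LOC"}


def keep_core_words(tagged: List[Tuple[str, str]],
                    keep: Set[str] = KEEP_POS) -> List[Tuple[str, str]]:
    """Drop modifiers (adjectives, adverbs, ...) and keep only the core words."""
    return [(word, pos) for word, pos in tagged if pos in keep]


# Hypothetical tagging of part of the sample segment.
tagged = [("负责", "v"), ("统筹", "v"), ("指导", "v"), ("重大", "a"),
          ("险情", "n"), ("灾情", "n"), ("宣传", "vn"), ("报道", "v")]
core = keep_core_words(tagged)
print(core)
```

The retained (word, tag) pairs are the input on which a chunker would then extract the functional label phrases.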
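After phrase extraction is completed, the phrases are collated to obtain the labeled named entities corresponding to the task-critical segments. Suppose the segments are: "负责抢救遇险人员,转移和疏散被困群众,处置台风引发的次生灾害,协助地方人民政府开展灾后重建中的相关工作。" (Responsible for rescuing people in distress, relocating and evacuating trapped residents, handling secondary disasters caused by typhoons, and assisting local people's governments with post-disaster reconstruction work.) and "组织突击救护队伍,调度卫生技术力量,抢救受灾伤病人员;做好灾区卫生防疫工作,防止灾区疫情、疫病的传播蔓延。" (Organize emergency medical rescue teams, dispatch health technical personnel, and rescue the injured and sick among disaster victims; carry out sanitation and epidemic prevention in the disaster area to prevent the spread of epidemics and diseases there.) After extraction in the manner described above, the labeled named entities finally obtained are as shown in Table 5 below:
Table 5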
Figure PCTCN2022135015-appb-000005
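S140: determine the cohesion judgment result between the task-critical segments based on the labeled named entities.
Illustratively, after the labeled named entities are obtained, whether a link exists between labeled named entities is determined based on semantic similarity; the number of links of the labeled named entities corresponding to each task-critical segment is obtained; the number of elements of the labeled named entities corresponding to each task-critical segment is obtained; and the cohesion judgment result between the task-critical segments is determined based on the number of elements and the number of links.
In an optional embodiment, the semantic similarity may be determined with any semantic discrimination model, which is not limited here. After the semantic similarity between labeled named entities is determined, it is compared with a preset first threshold; a link between the labeled named entities is recognized only when the semantic similarity is greater than the preset first threshold.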
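The threshold test can be sketched as follows. The similarity function here is a character-level Jaccard overlap, used purely as a stand-in so that the example runs without a trained model; it is not the semantic model of the application, and the names `char_jaccard` and `find_links` as well as the threshold values are hypothetical.

```python
from typing import Callable, List, Tuple


def char_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: shared characters over all distinct characters."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find_links(entities_a: List[str], entities_b: List[str],
               similarity: Callable[[str, str], float],
               threshold: float) -> List[Tuple[str, str]]:
    """A pair is linked only if its similarity is strictly greater than the threshold."""
    return [(a, b) for a in entities_a for b in entities_b
            if similarity(a, b) > threshold]


seg_a = ["抢救遇险人员", "转移和疏散被困群众"]
seg_b = ["抢救受灾伤病人员", "卫生防疫工作"]
links = find_links(seg_a, seg_b, char_jaccard, threshold=0.3)
print(links)
```

Because the comparison is strict, a pair whose similarity equals the threshold exactly is not treated as linked, which matches the "greater than" wording above.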
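After the relationships between the labeled named entities are confirmed, the cohesion between task-critical segments can be calculated according to the following formula: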
Figure PCTCN2022135015-appb-000006
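Here, R(A,B) denotes the cohesion degree between task-critical segment A and task-critical segment B; L(A) denotes the number of labeled named entities in task-critical segment A that are linked to task-critical segment B; N(A) denotes the total number of labeled named entities in task-critical segment A; L(B) denotes the number of labeled named entities in task-critical segment B that are linked to task-critical segment A; and N(B) denotes the total number of labeled named entities in task-critical segment B. The larger the value of R(A,B), the better the cohesion between task-critical segment A and task-critical segment B.
Referring to FIG. 2, suppose task-critical segment A contains 5 labeled named entities and task-critical segment B contains 6, the number of labeled named entities in A linked to B is 2, and the number of labeled named entities in B linked to A is 2. The cohesion degree between task-critical segment A and task-critical segment B is then: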
Figure PCTCN2022135015-appb-000007
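The counts that enter the calculation can be derived from the list of linked pairs, as in the sketch below. The formula itself appears in the publication only as an image and is not reproduced in this text, so the ratio used in `cohesion` is an assumption chosen to be consistent with the symbol definitions (more links relative to more entities gives a larger value); it should not be read as the formula of the application. All function names are hypothetical.

```python
from typing import List, Tuple


def count_linked(links: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (L(A), L(B)): distinct linked entities on the A side and on the B side."""
    linked_a = {a for a, _ in links}
    linked_b = {b for _, b in links}
    return len(linked_a), len(linked_b)


def cohesion(l_a: int, n_a: int, l_b: int, n_b: int) -> float:
    """ASSUMED form: (L(A) + L(B)) / (N(A) + N(B)); the source formula is an image."""
    total = n_a + n_b
    return (l_a + l_b) / total if total else 0.0


# Counts from the FIG. 2 example: N(A) = 5, N(B) = 6, L(A) = 2, L(B) = 2.
links = [("a1", "b1"), ("a2", "b2")]
l_a, l_b = count_linked(links)
print(l_a, l_b, round(cohesion(l_a, 5, l_b, 6), 4))
```

Under this assumed form, the example counts give 4/11; a different form of the formula would give a different number from the same counts.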
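In the present application, after the task-critical segments are located, the labeled named entities in them are obtained and used to compute the cohesion between the task-critical segments. This makes the cohesion relationships between the segments of a text explicit, so that it can be adequately determined whether the plan given later in the text can resolve the problem raised earlier in the text, which improves working efficiency.
An embodiment of the present application further discloses a text cohesion judgment apparatus which, as shown in FIG. 3, comprises:
an acquisition module 301, configured to acquire a target text;
for details, see the description of step S110 in any of the above embodiments, which is not repeated here.
a parsing module 302, configured to parse the target text to obtain the task-critical segments of the target text;
for details, see the description of step S120 in any of the above embodiments, which is not repeated here.
a first processing module 303, configured to obtain the labeled named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments;
for details, see the description of step S130 in any of the above embodiments, which is not repeated here.
a second processing module 304, configured to determine the cohesion judgment result between the task-critical segments based on the labeled named entities.
For details, see the description of step S140 in any of the above embodiments, which is not repeated here.
In the present application, after the task-critical segments are located, the labeled named entities in them are obtained and used to compute the cohesion between the task-critical segments. This makes the cohesion relationships between the segments of a text explicit, so that it can be adequately determined whether the plan given later in the text can resolve the problem raised earlier in the text, which improves working efficiency.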
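As an optional implementation of the present application, the parsing module 302 is configured to: input the target text into a preset initial analysis model to determine an initial analysis result; determine at least two process segments based on a preset knowledge base and the initial analysis result; extract key phrases from each process segment using a preset key phrase extraction model to determine a key phrase extraction result; and obtain the task-critical segments of the target text according to the key phrase extraction result.
As an optional implementation of the present application, the parsing module 302 is configured to: segment the process segments into words based on a preset word segmentation model to obtain a segmentation result; determine a weight for each segmented word based on the segmentation result and a preset weight rule; and determine the key phrase extraction result based on these weights and a preset selection rule.
As an optional implementation of the present application, the first processing module 303 is configured to: input the task-critical segments into a preset part-of-speech tagging model to determine a part-of-speech tagging result; retain, based on the tagging result and preset target parts of speech, the target words that match the target parts of speech; and input the target words into the preset named entity recognition model to obtain the labeled named entities in the task-critical segments.
As an optional implementation of the present application, the second processing module 304 is configured to: input the labeled named entities into a preset semantic evaluation model to determine the semantic similarity between the labeled named entities; determine, based on the semantic similarity, whether a link exists between labeled named entities; obtain the number of links of the labeled named entities corresponding to each task-critical segment; obtain the number of elements of the labeled named entities corresponding to each task-critical segment; and determine the cohesion judgment result between the task-critical segments based on the number of elements and the number of links.
As an optional implementation of the present application, the second processing module 304 is configured to: recognize a link between labeled named entities when the semantic similarity is greater than the preset first threshold; and otherwise recognize that no link exists between the labeled named entities.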
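The division into four modules can be pictured as a small pipeline object that wires four callables together. This is only a structural sketch: the class name `CohesionJudgmentDevice`, its parameter names, and the stub functions used in the example are hypothetical and stand in for the models described above.

```python
from typing import Any, Callable, List


class CohesionJudgmentDevice:
    """Chains the acquisition, parsing, first processing, and second processing steps."""

    def __init__(self,
                 acquire: Callable[[Any], str],
                 parse: Callable[[str], List[str]],
                 extract_entities: Callable[[str], List[str]],
                 judge: Callable[[List[List[str]]], Any]) -> None:
        self.acquire = acquire                    # acquisition module (S110)
        self.parse = parse                        # parsing module (S120)
        self.extract_entities = extract_entities  # first processing module (S130)
        self.judge = judge                        # second processing module (S140)

    def run(self, source: Any) -> Any:
        text = self.acquire(source)
        segments = self.parse(text)
        entities = [self.extract_entities(seg) for seg in segments]
        return self.judge(entities)


# Stub modules, used only to show the data flow.
device = CohesionJudgmentDevice(
    acquire=lambda src: src,
    parse=lambda text: [s for s in text.split("。") if s],
    extract_entities=lambda seg: seg.split(","),
    judge=lambda entity_lists: [len(e) for e in entity_lists],
)
result = device.run("抢救人员,疏散群众。调度力量。")
print(result)
```

Passing the modules in as callables keeps each step replaceable, which mirrors the way the description leaves the concrete models open.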
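Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an electronic device provided by an optional embodiment of the present application. As shown in FIG. 4, the electronic device may include: at least one processor 41, for example a CPU (Central Processing Unit), at least one communication interface 43, a memory 44, and at least one communication bus 42. The communication bus 42 is used to implement connection and communication between these components. The communication interface 43 may include a display (Display) and a keyboard (Keyboard), and may optionally also include standard wired and wireless interfaces. The memory 44 may be a high-speed RAM (Random Access Memory, a volatile memory) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 44 may also be at least one storage device located remotely from the processor 41. The processor 41 may be combined with the apparatus described with reference to FIG. 3; the memory 44 stores an application program, and the processor 41 calls the program code stored in the memory 44 to perform any of the above method steps.
The communication bus 42 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus 42 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The memory 44 may include a volatile memory, for example a random-access memory (RAM); the memory may also include a non-volatile memory, for example a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 44 may also include a combination of the above kinds of memory.
The processor 41 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, the memory 44 is further used to store program instructions. The processor 41 may call the program instructions to implement the text cohesion judgment method shown in any embodiment of the present application.
An embodiment of the present application further provides a non-transitory computer storage medium storing computer-executable instructions that can execute the text cohesion judgment method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above kinds of memory.
Although the embodiments of the present application have been described with reference to the drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present application, and such modifications and variations all fall within the scope defined by the appended claims.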

Claims (8)

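  1. A text cohesion judgment method, characterized by comprising:
    acquiring a target text;
    parsing the target text to obtain task-critical segments of the target text;
    obtaining labeled named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments;
    determining, based on the labeled named entities, a cohesion judgment result between the task-critical segments;
    wherein the determining, based on the labeled named entities, a cohesion judgment result between the task-critical segments comprises:
    inputting the labeled named entities into a preset semantic evaluation model to determine the semantic similarity between the labeled named entities;
    determining, based on the semantic similarity, whether a link exists between the labeled named entities;
    obtaining the number of links of the labeled named entities corresponding to each of the task-critical segments;
    obtaining the number of elements of the labeled named entities corresponding to each of the task-critical segments;
    determining the cohesion judgment result between the task-critical segments based on the number of elements and the number of links;
    wherein the determining, based on the semantic similarity, whether a link exists between the labeled named entities comprises:
    recognizing that a link exists between the labeled named entities when the semantic similarity is greater than a preset first threshold;
    otherwise, recognizing that no link exists between the labeled named entities;
    wherein the cohesion between task-critical segments is calculated according to the following formula:
    Figure PCTCN2022135015-appb-100001
    wherein R(A,B) denotes the cohesion degree between task-critical segment A and task-critical segment B; L(A) denotes the number of labeled named entities in task-critical segment A that are linked to task-critical segment B; N(A) denotes the total number of labeled named entities in task-critical segment A; L(B) denotes the number of labeled named entities in task-critical segment B that are linked to task-critical segment A;
    N(B) denotes the total number of labeled named entities in task-critical segment B; and the larger the value of R(A,B), the better the cohesion between task-critical segment A and task-critical segment B.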
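  2. The method according to claim 1, characterized in that the parsing the target text to obtain task-critical segments of the target text comprises:
    inputting the target text into a preset initial analysis model to determine an initial analysis result;
    determining at least two process segments based on a preset knowledge base and the initial analysis result;
    extracting key phrases from each of the process segments using a preset key phrase extraction model to determine a key phrase extraction result;
    obtaining the task-critical segments of the target text according to the key phrase extraction result.
  3. The method according to claim 2, characterized in that the extracting key phrases from each of the process segments using a preset key phrase extraction model to determine a key phrase extraction result comprises:
    segmenting the process segments into words based on a preset word segmentation model to obtain a segmentation result;
    determining a weight corresponding to each segmented word based on the segmentation result and a preset weight rule;
    determining the key phrase extraction result based on the weight corresponding to each segmented word and a preset selection rule.
  4. The method according to claim 1, characterized in that the obtaining labeled named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments comprises:
    inputting the task-critical segments into a preset part-of-speech tagging model to determine a part-of-speech tagging result;
    retaining, based on the part-of-speech tagging result and preset target parts of speech, the target words that match the target parts of speech;
    inputting the target words into the preset named entity recognition model to obtain the labeled named entities in the task-critical segments.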
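  5. A text cohesion judgment apparatus, characterized by comprising:
    an acquisition module, configured to acquire a target text;
    a parsing module, configured to parse the target text to obtain task-critical segments of the target text;
    a first processing module, configured to obtain labeled named entities in the task-critical segments based on a preset named entity recognition model and the task-critical segments;
    a second processing module, configured to determine, based on the labeled named entities, a cohesion judgment result between the task-critical segments;
    wherein the second processing module is specifically configured to:
    input the labeled named entities into a preset semantic evaluation model to determine the semantic similarity between the labeled named entities;
    determine, based on the semantic similarity, whether a link exists between the labeled named entities;
    obtain the number of links of the labeled named entities corresponding to each of the task-critical segments;
    obtain the number of elements of the labeled named entities corresponding to each of the task-critical segments;
    determine the cohesion judgment result between the task-critical segments based on the number of elements and the number of links;
    wherein the determining, based on the semantic similarity, whether a link exists between the labeled named entities comprises:
    recognizing that a link exists between the labeled named entities when the semantic similarity is greater than a preset first threshold;
    otherwise, recognizing that no link exists between the labeled named entities;
    wherein the cohesion between task-critical segments is calculated according to the following formula:
    Figure PCTCN2022135015-appb-100002
    wherein R(A,B) denotes the cohesion degree between task-critical segment A and task-critical segment B; L(A) denotes the number of labeled named entities in task-critical segment A that are linked to task-critical segment B; N(A) denotes the total number of labeled named entities in task-critical segment A; L(B) denotes the number of labeled named entities in task-critical segment B that are linked to task-critical segment A;
    N(B) denotes the total number of labeled named entities in task-critical segment B; and the larger the value of R(A,B), the better the cohesion between task-critical segment A and task-critical segment B.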
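  6. The apparatus according to claim 5, characterized in that the parsing module is further configured to:
    input the target text into a preset initial analysis model to determine an initial analysis result;
    determine at least two process segments based on a preset knowledge base and the initial analysis result;
    extract key phrases from each of the process segments using a preset key phrase extraction model to determine a key phrase extraction result;
    obtain the task-critical segments of the target text according to the key phrase extraction result.
  7. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the steps of the method according to any one of claims 1 to 4.
  8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.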
PCT/CN2022/135015 2022-08-02 2022-11-29 Text cohesion judgment method, apparatus, electronic device and storage medium WO2023098658A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
ZA2023/01703A ZA202301703B (en) 2022-08-02 2023-02-10 Text cohesion judgment methods, devices, electronic equipment and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210919249.7 2022-08-02
CN202210919249.7A CN114970491B (zh) 2022-08-02 2022-08-02 Text cohesion judgment method, apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023098658A1 true WO2023098658A1 (zh) 2023-06-08

Family

ID=82969946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135015 WO2023098658A1 (zh) 2022-08-02 2022-11-29 Text cohesion judgment method, apparatus, electronic device and storage medium

Country Status (3)

Country Link
CN (1) CN114970491B (zh)
WO (1) WO2023098658A1 (zh)
ZA (1) ZA202301703B (zh)


Also Published As

Publication number Publication date
CN114970491A (zh) 2022-08-30
CN114970491B (zh) 2022-10-04
ZA202301703B (en) 2023-07-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900462

Country of ref document: EP

Kind code of ref document: A1