WO2022171093A1 - Method, apparatus, and electronic device for constructing a person relationship graph - Google Patents

Method, apparatus, and electronic device for constructing a person relationship graph

Info

Publication number
WO2022171093A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
personnel
text sentence
person
entity word
Prior art date
Application number
PCT/CN2022/075564
Other languages
English (en)
French (fr)
Inventor
肖楠
顾松庠
Original Assignee
京东科技控股股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技控股股份有限公司
Publication of WO2022171093A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The present application relates to the field of computer technology, and in particular to a method, apparatus, and electronic device for constructing a person relationship graph.
  • A person relationship graph is a knowledge graph built around the "person" entity and the social relationships between people. According to the "six degrees of separation" theory, any two strangers can establish contact through at most five friends. In this sense, everyone in the world can be connected through personal networks.
  • In the related art, a person relationship graph is built in one of two ways: the first is to build it manually; the second is to collect a structured personnel database and convert it into a graph format.
  • With the first method, construction is costly and inefficient; with the second, the structured personnel information in the database is relatively one-sided and omits a large amount of textual information, so construction is also inefficient.
  • The present application aims to solve, at least to some extent, one of the technical problems in the related art.
  • The present application proposes a method, apparatus, and electronic device for constructing a person relationship graph, so as to solve the technical problems of high cost and low efficiency in related-art methods for constructing such a graph.
  • An embodiment of the first aspect of the present application proposes a method for constructing a person relationship graph, including: grabbing each text sentence used to construct the person relationship graph; for each text sentence, extracting the person entity words in the text sentence; extracting the relational role words in the text sentence in combination with the text sentence and the person entity words; generating tuple information corresponding to the text sentence in combination with the person entity words and the relational role words in the text sentence; and constructing the person relationship graph according to the tuple information corresponding to each text sentence.
  • An embodiment of the second aspect of the present application proposes an apparatus for constructing a person relationship graph, including: a grabbing module for grabbing each text sentence used to construct the person relationship graph; a first extraction module for extracting, for each text sentence, the person entity words in the text sentence; a second extraction module for extracting the relational role words in the text sentence in combination with the text sentence and the person entity words; a generation module for generating the tuple information corresponding to the text sentence in combination with the person entity words and the relational role words in the text sentence; and a building module for constructing the person relationship graph according to the tuple information corresponding to each text sentence.
  • An embodiment of the third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method for constructing a person relationship graph according to the embodiment of the first aspect of the present application.
  • An embodiment of the fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method for constructing a person relationship graph proposed in the embodiment of the first aspect of the present application.
  • An embodiment of the fifth aspect of the present application provides a computer program product, including a computer program that, when executed by a processor, implements the method for constructing a person relationship graph proposed in the embodiment of the first aspect of the present application.
  • Each text sentence used to construct the person relationship graph is grabbed; for each text sentence, the person entity words are extracted first; the relational role words are then extracted in combination with the text sentence and the person entity words; tuple information corresponding to the text sentence is generated from the person entity words and relational role words; and the person relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information is thus extracted automatically and the graph is constructed automatically, which improves the construction efficiency of the person relationship graph and reduces its construction cost.
  • FIG. 1 is a schematic flowchart of a method for constructing a person relationship graph provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a method for constructing a person relationship graph provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic structural diagram of an apparatus for constructing a person relationship graph provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic structural diagram of an apparatus for constructing a person relationship graph provided in Embodiment 4 of the present application;
  • FIG. 5 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
  • In the related art, the person relationship graph is usually constructed manually, or a structured personnel database is collected and converted into a graph format. Manual construction is costly and inefficient, while the structured personnel information in such a database is relatively one-sided and lacks a large amount of textual information, so construction efficiency is poor.
  • The present application mainly addresses the technical problems of high cost and poor efficiency in related-art methods for constructing a person relationship graph, and proposes a method for constructing a person relationship graph.
  • With the method for constructing a person relationship graph in the embodiments of the present application, each text sentence used to construct the graph is grabbed; for each text sentence, the person entity words are extracted first; the relational role words are then extracted in combination with the text sentence and the person entity words; tuple information corresponding to the text sentence is generated from the person entity words and relational role words; and the person relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information is thus extracted automatically and the graph is constructed automatically, which improves construction efficiency and reduces construction cost.
  • FIG. 1 is a schematic flowchart of a method for constructing a personnel relationship graph according to Embodiment 1 of the present application.
  • The embodiments of the present application are described taking as an example the case where the method for constructing a person relationship graph is configured in an apparatus for constructing a person relationship graph. The apparatus can be applied to any electronic device, so that the electronic device can perform the function of constructing a person relationship graph.
  • The electronic device may be a personal computer (PC), a cloud device, a mobile device, etc.
  • The mobile device may be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, a vehicle-mounted device, or another hardware device with an operating system, a touch screen, and/or a display screen.
  • the method for constructing the personnel relationship graph may include the following steps 101 to 105 .
  • Step 101 grab each text sentence used for constructing a person relationship graph.
  • The apparatus may perform step 101 by, for example: determining each piece of personnel information and the scraping source used to construct the person relationship graph; grabbing, from the pages corresponding to the scraping source, the text sentences related to each piece of personnel information; and taking those text sentences as the text sentences used to construct the person relationship graph.
  • The scraping source may be, for example, a paper library, an established library, a webpage of a website, or the like.
  • The personnel information may include, for example, parameters such as a person's name, address, and mobile phone number; a text sentence related to the personnel information may be a text sentence that includes any parameter of the personnel information.
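  • The rule that a related text sentence is one containing any parameter of the personnel information can be sketched as follows. This is a minimal illustration only: the `PersonInfo` class, the `split_sentences` and `related_sentences` helpers, and the punctuation-based sentence splitter are assumptions made for the example and are not described in the application.

```python
import re
from dataclasses import dataclass


@dataclass
class PersonInfo:
    """Hypothetical container for the parameters named in the text."""
    name: str
    address: str = ""
    phone: str = ""

    def parameters(self) -> list[str]:
        # Only non-empty parameters can be matched against a sentence.
        return [p for p in (self.name, self.address, self.phone) if p]


def split_sentences(page_text: str) -> list[str]:
    # Naive splitter on Chinese and Western sentence-ending punctuation.
    parts = re.split(r"(?<=[。！？.!?])\s*", page_text)
    return [p.strip() for p in parts if p.strip()]


def related_sentences(page_text: str, people: list[PersonInfo]) -> list[str]:
    # Keep a sentence if it contains any parameter of any person.
    keys = [k for person in people for k in person.parameters()]
    return [s for s in split_sentences(page_text) if any(k in s for k in keys)]
```

  • In this sketch the page text is assumed to have been downloaded already; fetching pages from the scraping source is left out.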
  • Step 102 for each text sentence, extract the person entity word in the text sentence.
  • The apparatus may perform step 102 by, for example: determining the text vector corresponding to the text sentence; and inputting the text vector into a preset person entity word extraction model to extract the person entity words in the text sentence.
  • The apparatus may determine the text vector corresponding to the text sentence by, for example: obtaining each word in the text sentence; determining the word vector corresponding to each word in combination with a semantic representation model and the text sentence; and determining the text vector corresponding to the text sentence in combination with the word vector of each word, the text content, a preset syntactic dependency tree, and a graph vector model.
  • The word vector corresponding to each word may be determined by, for example, inputting the text sentence and each word in the text sentence into the semantic representation model to obtain the word vector corresponding to each word.
  • The semantic representation model may be a model pre-trained on large-scale data, so that the word vectors carry a large amount of language knowledge.
  • Because the text vector is determined in combination with the text content and the syntactic dependency tree, it can capture the dependencies between words in the text sentence, which improves the accuracy of the text vector.
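  • The application does not specify the graph vector model, so the following is only a rough sketch of how dependency structure could enter a sentence vector. It assumes one round of neighbor averaging over the dependency edges followed by mean pooling; the function name `sentence_vector` and the edge format (pairs of word indices) are hypothetical.

```python
def sentence_vector(word_vectors: list[list[float]],
                    dependency_edges: list[tuple[int, int]]) -> list[float]:
    """Mix each word vector with its dependency neighbors, then mean-pool."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    # Each word keeps its own vector and also sees its head and dependents.
    neighbors = [{i} for i in range(n)]
    for head, dependent in dependency_edges:
        neighbors[head].add(dependent)
        neighbors[dependent].add(head)
    # One aggregation round: average over the word and its neighbors.
    updated = [
        [sum(word_vectors[j][d] for j in neighbors[i]) / len(neighbors[i])
         for d in range(dim)]
        for i in range(n)
    ]
    # Mean pooling over words gives the sentence-level vector.
    return [sum(vec[d] for vec in updated) / n for d in range(dim)]
```

  • A trained graph neural network would replace the fixed averaging with learned weights; the sketch only shows where the dependency edges are used.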
  • The preset person entity word extraction model may be obtained by, for example: acquiring training data that includes a large number of sample text sentences and the corresponding person entity words; and training an initial person entity word extraction model on the training data to obtain the preset person entity word extraction model.
  • The person entity word extraction model may specifically be a sequence labeling model.
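  • A sequence labeling model assigns one tag to each token, and the entity words are then read off the tag sequence. The sketch below assumes the common BIO scheme with `B-PER`, `I-PER`, and `O` tags, which the application does not name; `decode_person_entities` is a hypothetical helper, and the tags are taken as given rather than predicted by a model.

```python
def decode_person_entities(tokens: list[str],
                           tags: list[str]) -> list[tuple[str, int, int]]:
    """Collect (entity_text, start, end) spans from B-PER / I-PER / O tags."""
    entities = []
    start = None
    # The trailing "O" is a sentinel that flushes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != "I-PER":
            entities.append((" ".join(tokens[start:i]), start, i))
            start = None
        if tag == "B-PER":
            start = i
    return entities
```

  • The end index is exclusive, so a span `(text, start, end)` covers `tokens[start:end]`.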
  • Step 103 extracting relational role words in the text sentence in combination with the text sentence and the person entity word.
  • The apparatus may perform step 103 by, for example: determining the text vector corresponding to the text sentence; and inputting the text vector and the person entity words into a preset relational role word extraction model to obtain the relational role words in the text sentence.
  • The relational role word extraction model may obtain the relational role words by, for example: performing fully connected network layer encoding on the text vector and the person entity word to obtain a semantic encoding matrix for the person entity word; and processing the semantic encoding matrix through a convolutional neural network layer to obtain the start position and end position of each relational role word in the text sentence, and thereby each relational role word in the text sentence.
  • The preset relational role word extraction model may be obtained by, for example: acquiring training data that includes a large number of sample text sentences and the corresponding person entity words and relational role words; and training an initial relational role word extraction model on the training data to obtain the preset relational role word extraction model.
  • The relational role word extraction model may specifically be a sequence labeling model.
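  • Once a model has scored each token as a possible start or end of a relational role word, the spans still have to be decoded. The sketch below assumes per-token start and end probabilities and a fixed threshold, neither of which is specified in the application; `decode_spans` is a hypothetical helper that pairs each start with the nearest end at or after it.

```python
def decode_spans(start_probs: list[float],
                 end_probs: list[float],
                 threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return (start, end) token index pairs, with end inclusive."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        # Pair the start with the closest end position that is not before it.
        e = next((e for e in ends if e >= s), None)
        if e is not None:
            spans.append((s, e))
    return spans
```

  • For the tokens of "Zhang San is the father of Zhang Xiaoming", a span `(4, 4)` would select the single token "father".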
  • Step 104 generating tuple information corresponding to the text sentence in combination with the person entity word and the relational role word in the text sentence.
  • The apparatus can determine the relationships between the person entity words according to the positions of the person entity words and the positions of the relational role words in the text sentence, and then generate the tuple information.
  • The tuple information may be a triple, a quadruple, or a tuple with more elements. Taking a triple as an example, the tuple information may include: person entity word A, person entity word B, and the relationship between A and B. Taking a tuple with more elements as an example, the tuple information may include: person entity word A, person entity word B, the relationship between A and B, person entity word A, person entity word C, and the relationship between A and C.
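  • The application states that relationships are determined from the positions of the person entity words and relational role words but does not give the rule. The sketch below assumes one simple heuristic: each role word links the nearest person entity before it to the nearest person entity after it. The function `build_triples` and the `(text, start, end)` span format with an exclusive end are assumptions for the example.

```python
Span = tuple[str, int, int]  # (text, start, end) with end exclusive


def build_triples(entities: list[Span], roles: list[Span]) -> list[tuple[str, str, str]]:
    """Return (person A, person B, relationship) triples for one sentence."""
    triples = []
    for role_text, r_start, r_end in roles:
        before = [e for e in entities if e[2] <= r_start]
        after = [e for e in entities if e[1] >= r_end]
        if before and after:
            left = max(before, key=lambda e: e[2])   # closest entity on the left
            right = min(after, key=lambda e: e[1])   # closest entity on the right
            triples.append((left[0], right[0], role_text))
    return triples
```

  • Sentences with other word orders (for example, both persons before the role word) would need a different rule or a learned classifier.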
  • Step 105 construct a personnel relationship graph according to the tuple information corresponding to each text sentence.
  • The apparatus can construct the person relationship graph according to the person entity words in the tuple information corresponding to each text sentence and the relationships between them.
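  • As a minimal sketch of this step, the triples can be folded into an adjacency structure in which each person entity word is a node and each relationship is a labeled edge. The dictionary layout and the function name `build_graph` are assumptions; the application does not prescribe a storage format.

```python
def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each person to a list of (related person, relationship) edges."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for head, tail, relation in triples:
        edges = graph.setdefault(head, [])
        # The same fact may be extracted from several sentences; store it once.
        if (tail, relation) not in edges:
            edges.append((tail, relation))
        # Register the tail as a node even if it has no outgoing edges yet.
        graph.setdefault(tail, [])
    return graph
```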
  • With the method for constructing a person relationship graph in the embodiments of the present application, each text sentence used to construct the graph is grabbed; for each text sentence, the person entity words are extracted first; the relational role words are then extracted in combination with the text sentence and the person entity words; tuple information corresponding to the text sentence is generated from the person entity words and relational role words; and the person relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information is thus extracted automatically and the graph is constructed automatically, which improves construction efficiency and reduces construction cost.
  • FIG. 2 is a schematic flowchart of a method for constructing a personnel relationship graph according to Embodiment 2 of the present application. As shown in FIG. 2 , on the basis of the embodiment shown in FIG. 1 , after step 105 , the method may further include the following steps 201 to 203 .
  • Step 201 Receive a query request, where the query request includes: the information of the person to be queried.
  • the personnel information to be queried may include, for example, the name of the personnel to be queried, and the like.
  • the personnel information to be queried may be input by the user in the query box, or obtained by recognizing the voice after the user's voice input.
  • Step 202 Query the personnel relationship graph according to the personnel information to be queried to obtain a first personnel entity word matching the personnel information to be queried, and a second personnel entity word that is related to the first personnel entity word.
  • The first person entity word matching the personnel information to be queried may be a person entity word that is included in the personnel information to be queried, or one whose similarity to the personnel information to be queried exceeds a certain threshold.
  • the relationship between the first person entity word and the second person entity word may be, for example, a parent-child relationship, a colleague relationship, a relative relationship, a customer relationship, and the like.
  • the number of the second person entity words may be one or more.
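  • The matching rule for step 202 (containment in the query, or similarity above a threshold) can be sketched as follows. The similarity measure is not specified in the application; this example assumes the character-level ratio from Python's standard `difflib.SequenceMatcher` and a hypothetical threshold of 0.8. The graph is represented simply as a list of (person A, person B, relationship) triples.

```python
from difflib import SequenceMatcher

Triple = tuple[str, str, str]


def match_first_entities(query: str, names: list[str], threshold: float = 0.8) -> list[str]:
    """Names contained in the query, or sufficiently similar to it."""
    matched = []
    for name in names:
        similarity = SequenceMatcher(None, query, name).ratio()
        if name in query or similarity >= threshold:
            matched.append(name)
    return matched


def query_relations(query: str, triples: list[Triple], threshold: float = 0.8) -> list[Triple]:
    """Return every stored triple in which a matched person takes part."""
    names = sorted({t[0] for t in triples} | {t[1] for t in triples})
    firsts = set(match_first_entities(query, names, threshold))
    return [t for t in triples if t[0] in firsts or t[1] in firsts]
```

  • Each returned triple carries both the second person entity word and the relationship, which is what step 203 displays.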
  • The method further includes: if the first person entity word is not found, or if no second person entity word related to the first person entity word is found, grabbing text sentences related to the personnel information to be queried; extracting tuple information from those text sentences and updating the person relationship graph with the extracted tuple information; and querying the updated person relationship graph according to the personnel information to be queried to obtain the first person entity word that matches the personnel information to be queried and the second person entity word that is related to the first person entity word.
  • the update trigger condition of the personnel relationship graph may include: for a certain personnel information to be queried, the first personnel entity word is not queried, or the second personnel entity word that has a relationship with the first personnel entity word is not queried.
  • the update triggering condition of the personnel relationship graph may further include: periodic triggering, for example, triggering the update of the personnel relationship graph every preset time period.
  • The update process of the person relationship graph may be similar to the construction process, except that the last step updates the existing person relationship graph instead of rebuilding it.
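  • The query-miss trigger can be sketched as a lookup that falls back to extraction and an in-place update. In this example `fetch_triples` is a hypothetical callable standing in for the grab-and-extract steps, matching is simplified to exact name equality, and the graph is again a plain list of triples; none of these details come from the application.

```python
from typing import Callable

Triple = tuple[str, str, str]


def query_with_update(query: str,
                      triples: list[Triple],
                      fetch_triples: Callable[[str], list[Triple]]) -> list[Triple]:
    """Look up a person; on a miss, extend the existing graph and retry."""
    def lookup() -> list[Triple]:
        return [t for t in triples if query in (t[0], t[1])]

    hits = lookup()
    if not hits:
        # Update the existing graph in place rather than rebuilding it.
        for t in fetch_triples(query):
            if t not in triples:
                triples.append(t)
        hits = lookup()
    return hits
```

  • A periodic trigger could call the same update path on a timer instead of on a query miss.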
  • Step 203 displaying the second person entity word and the relationship between the first person entity word and the second person entity word.
  • The apparatus may display the result by, for example, sending the second person entity word and the relationship between the first person entity word and the second person entity word to the terminal device used by the user, where they are shown on the display screen of the terminal device.
  • With the method for constructing a person relationship graph in the embodiments of the present application, each text sentence used to construct the graph is grabbed; for each text sentence, the person entity words are extracted first; the relational role words are then extracted in combination with the text sentence and the person entity words; tuple information corresponding to the text sentence is generated from the person entity words and relational role words; and the person relationship graph is constructed according to the tuple information corresponding to each text sentence. A query request including the personnel information to be queried is then received; the person relationship graph is queried according to that information to obtain a first person entity word matching the personnel information to be queried and a second person entity word related to the first person entity word; and the second person entity word and the relationship between the first and second person entity words are displayed. Personnel information can thus be queried using the automatically constructed person relationship graph, which improves the efficiency of personnel information queries.
  • FIG. 3 is a schematic structural diagram of an apparatus for constructing a personnel relationship graph provided in Embodiment 3 of the present application.
  • the apparatus 300 for constructing a person relationship graph may include: a grabbing module 310 , a first extracting module 320 , a second extracting module 330 , a generating module 340 and a constructing module 350 .
  • The grabbing module 310 is used for grabbing each text sentence used for constructing a person relationship graph;
  • the first extraction module 320 is used for extracting, for each text sentence, the person entity words in the text sentence;
  • the second extraction module 330 is used for extracting the relational role words in the text sentence in combination with the text sentence and the person entity words;
  • the generating module 340 is used for generating the tuple information corresponding to the text sentence in combination with the person entity words and the relational role words in the text sentence;
  • the building module 350 is used for constructing the person relationship graph according to the tuple information corresponding to each text sentence.
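  • The division into five modules can be sketched as a small pipeline in which each module is a pluggable callable. The class name `GraphBuilder`, the field names, and the callable signatures are all hypothetical and chosen only to mirror the module list above; the extraction models themselves are stubbed out in the usage example.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class GraphBuilder:
    grab: Callable[[], list[str]]                                    # grabbing module
    extract_persons: Callable[[str], list[str]]                      # first extraction module
    extract_roles: Callable[[str, list[str]], list[str]]             # second extraction module
    generate: Callable[[str, list[str], list[str]], list[Triple]]    # generating module

    def build(self) -> dict[str, list[tuple[str, str]]]:
        # Building module: fold the tuples of every sentence into one graph.
        graph: dict[str, list[tuple[str, str]]] = {}
        for sentence in self.grab():
            persons = self.extract_persons(sentence)
            roles = self.extract_roles(sentence, persons)
            for head, tail, relation in self.generate(sentence, persons, roles):
                graph.setdefault(head, []).append((tail, relation))
                graph.setdefault(tail, [])
        return graph
```

  • A usage example with stub modules:

```python
builder = GraphBuilder(
    grab=lambda: ["Zhang San is the father of Zhang Xiaoming"],
    extract_persons=lambda s: ["Zhang San", "Zhang Xiaoming"],
    extract_roles=lambda s, persons: ["father"],
    generate=lambda s, persons, roles: [(persons[0], persons[1], roles[0])],
)
graph = builder.build()
```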
  • The grabbing module 310 is specifically configured to: determine each piece of personnel information and the scraping source used for constructing the person relationship graph; grab, from the pages corresponding to the scraping source, the text sentences related to each piece of personnel information; and take those text sentences as the text sentences used for constructing the person relationship graph.
  • The first extraction module 320 is specifically configured to: determine the text vector corresponding to the text sentence; and input the text vector into a preset person entity word extraction model to extract the person entity words in the text sentence.
  • The first extraction module 320 is specifically configured to: obtain each word in the text sentence; determine the word vector corresponding to each word in combination with the semantic representation model and the text sentence; and determine the text vector corresponding to the text sentence in combination with the word vector of each word, the text content, a preset syntactic dependency tree, and a graph vector model.
  • The second extraction module 330 is specifically configured to: determine the text vector corresponding to the text sentence; and input the text vector and the person entity words into a preset relational role word extraction model to obtain the relational role words in the text sentence.
  • The apparatus further includes: a receiving module 360, a query module 370, and a display module 380. The receiving module 360 is configured to receive a query request, where the query request includes the personnel information to be queried; the query module 370 is configured to query the person relationship graph according to the personnel information to be queried, so as to obtain a first person entity word matching the personnel information to be queried and a second person entity word that is related to the first person entity word; the display module 380 is configured to display the second person entity word and the relationship between the first person entity word and the second person entity word.
  • The apparatus further includes an update module. The grabbing module 310 is further configured to grab text sentences related to the personnel information to be queried when the first person entity word is not found, or when no second person entity word related to the first person entity word is found; the update module is configured to extract tuple information from those text sentences and update the person relationship graph with the extracted tuple information; the query module 370 is further configured to query the updated person relationship graph according to the personnel information to be queried, so as to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that is related to the first person entity word.
  • The apparatus for constructing a person relationship graph in the embodiments of the present application grabs each text sentence used to construct the graph; for each text sentence, first extracts the person entity words; then extracts the relational role words in combination with the text sentence and the person entity words; generates tuple information corresponding to the text sentence from the person entity words and relational role words; and constructs the person relationship graph according to the tuple information corresponding to each text sentence. The tuple information is thus extracted automatically and the graph is constructed automatically, which improves construction efficiency and reduces construction cost.
  • The present application also provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method for constructing a person relationship graph proposed in the foregoing embodiments of the present application.
  • The present application further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the method for constructing a person relationship graph proposed in the foregoing embodiments of the present application.
  • the present application also provides a computer program product, including a computer program, which, when executed by a processor, implements the method for constructing a person relationship graph as proposed in the foregoing embodiments of the present application.
  • FIG. 5 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
  • the electronic device 12 shown in FIG. 5 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • the electronic device 12 takes the form of a general-purpose computing device.
  • Components of electronic device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including both volatile and non-volatile media, removable and non-removable media.
  • The memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive").
  • A magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (e.g., "floppy disks") and an optical disc drive for reading from and writing to removable non-volatile optical discs (e.g., compact disc read-only memory) may also be provided.
  • Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 22.
  • The electronic device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 20.
  • The network adapter 20 communicates with other modules of electronic device 12 via bus 18.
  • It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28 , for example, implements the methods mentioned in the foregoing embodiments.
  • first and second are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature delimited with “first”, “second” may expressly or implicitly include at least one of that feature.
  • plurality means at least two, such as two, three, etc., unless expressly and specifically defined otherwise.
  • a "computer-readable medium” can be any device that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or apparatus.
  • computer readable media include the following: electrical connections with one or more wiring (electronic devices), portable computer disk cartridges (magnetic devices), random access memory (RAM), Read Only Memory (ROM), Erasable Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer readable medium may even be paper or other suitable medium on which the program may be printed, as the paper or other medium may be optically scanned, for example, followed by editing, interpretation, or other suitable medium as necessary process to obtain the program electronically and then store it in computer memory.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or can be implemented in the form of software function modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

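A method and apparatus for constructing a personnel relationship graph, and an electronic device. The method comprises: crawling the text sentences to be used for constructing a personnel relationship graph (101); for each text sentence, first extracting the person entity words in the text sentence (102); then extracting the relationship role words in the text sentence on the basis of the text sentence and the person entity words (103); generating the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence (104); and constructing the personnel relationship graph according to the tuple information corresponding to each text sentence (105).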

Description

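Method and Apparatus for Constructing a Personnel Relationship Graph, and Electronic Device
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110177821.2, filed on February 9, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method and apparatus for constructing a personnel relationship graph, and an electronic device.
Background
A personnel relationship graph is a knowledge graph built around "person" entities and the social relationships between people. According to the "six degrees of separation" theory, any two strangers can be connected through at most five friends. In a sense, everyone in the world can be linked to everyone else through personal networks.
At present there are two main ways to build a personnel relationship graph: the first is to construct the graph manually; the second is to collect structured personnel databases and convert them into graph format. The first approach is costly and inefficient. In the second approach, the structured personnel information in such databases is one-sided and lacks much of the information found in text, so efficiency is likewise poor.
Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
The present application proposes a method and apparatus for constructing a personnel relationship graph, and an electronic device, to solve the technical problem that methods for constructing personnel relationship graphs in the related art are costly and inefficient.
An embodiment of the first aspect of the present application provides a method for constructing a personnel relationship graph, including: crawling the text sentences to be used for constructing the personnel relationship graph; for each text sentence, extracting the person entity words in the text sentence; extracting the relationship role words in the text sentence on the basis of the text sentence and the person entity words; generating the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and constructing the personnel relationship graph according to the tuple information corresponding to each text sentence.
An embodiment of the second aspect of the present application provides an apparatus for constructing a personnel relationship graph, including: a crawling module configured to crawl the text sentences to be used for constructing the personnel relationship graph; a first extraction module configured to extract, for each text sentence, the person entity words in the text sentence; a second extraction module configured to extract the relationship role words in the text sentence on the basis of the text sentence and the person entity words; a generation module configured to generate the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and a construction module configured to construct the personnel relationship graph according to the tuple information corresponding to each text sentence.
An embodiment of the third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for constructing a personnel relationship graph provided in the embodiment of the first aspect of the present application.
An embodiment of the fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method for constructing a personnel relationship graph provided in the embodiment of the first aspect of the present application.
An embodiment of the fifth aspect of the present application provides a computer program product including a computer program which, when executed by a processor, implements the method for constructing a personnel relationship graph provided in the embodiment of the first aspect of the present application.
By crawling the text sentences to be used for constructing the personnel relationship graph; for each text sentence, first extracting the person entity words in the text sentence; then extracting the relationship role words in the text sentence on the basis of the text sentence and the person entity words; generating the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and constructing the personnel relationship graph according to the tuple information corresponding to each text sentence, the tuple information can be extracted automatically and the personnel relationship graph can be constructed automatically, which improves the efficiency and reduces the cost of constructing the personnel relationship graph.
Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the present application.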
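Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of the method for constructing a personnel relationship graph provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of the method for constructing a personnel relationship graph provided in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of the apparatus for constructing a personnel relationship graph provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of the apparatus for constructing a personnel relationship graph provided in Embodiment 4 of the present application;
FIG. 5 is a block diagram of an exemplary electronic device suitable for implementing the embodiments of the present application.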
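Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they should not be construed as limiting it.
In the related art, a personnel relationship graph is usually constructed manually, or a structured personnel database is collected and converted into graph format. Manual construction is costly and inefficient; and the structured personnel information in a structured personnel database is one-sided and lacks much textual information, so construction efficiency is poor.
To address the technical problem that methods for constructing personnel relationship graphs in the related art are costly and inefficient, the present application proposes a method for constructing a personnel relationship graph.
In the method for constructing a personnel relationship graph of the embodiments of the present application, the text sentences to be used for constructing the personnel relationship graph are crawled; for each text sentence, the person entity words in the text sentence are extracted first; the relationship role words in the text sentence are then extracted on the basis of the text sentence and the person entity words; the tuple information corresponding to the text sentence is generated on the basis of the person entity words and the relationship role words in the text sentence; and the personnel relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information can thus be extracted automatically and the personnel relationship graph constructed automatically, which improves the efficiency and reduces the cost of constructing the personnel relationship graph.
The method and apparatus for constructing a personnel relationship graph, and the electronic device, of the embodiments of the present application are described below with reference to the drawings.
FIG. 1 is a schematic flowchart of the method for constructing a personnel relationship graph provided in Embodiment 1 of the present application.
It should be noted that the embodiments of the present application are described by taking, as an example, the case in which the construction method is configured in an apparatus for constructing a personnel relationship graph. The apparatus may be applied to any electronic device so that the electronic device can perform the function of constructing a personnel relationship graph.
The electronic device may be a personal computer (PC), a cloud device, a mobile device, or the like. The mobile device may be, for example, a hardware device having an operating system, a touch screen, and/or a display, such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or an in-vehicle device.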
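As shown in FIG. 1, the method for constructing a personnel relationship graph may include the following steps 101 to 105.
Step 101: crawl the text sentences to be used for constructing the personnel relationship graph.
In this embodiment, the apparatus for constructing a personnel relationship graph may perform step 101 by, for example: determining the pieces of personnel information and the crawl sources to be used for constructing the personnel relationship graph; crawling, from the pages corresponding to the crawl sources, the text sentences related to each piece of personnel information; and taking the text sentences related to each piece of personnel information as the text sentences to be used for constructing the personnel relationship graph.
In this embodiment, a crawl source may be, for example, a paper repository, a résumé repository, or the web pages of a website. Personnel information may include parameters such as a person's name, address, and mobile phone number. A text sentence related to a piece of personnel information may be any text sentence that contains any one of the parameters in that personnel information.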
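The sentence selection in step 101 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: `PersonInfo`, `split_sentences`, and `related_sentences` are hypothetical names, the page text is assumed to have been fetched already, and sentences are split naively on sentence-final punctuation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonInfo:
    """Hypothetical container for the parameters named in the text."""
    name: str
    address: str = ""
    phone: str = ""

    def parameters(self) -> list[str]:
        # Only non-empty parameters can be matched against a sentence.
        return [p for p in (self.name, self.address, self.phone) if p]


def split_sentences(page_text: str) -> list[str]:
    # Naive split after Chinese or Western sentence-final punctuation.
    parts = re.split(r"(?<=[。!?!?])\s*", page_text)
    return [p.strip() for p in parts if p.strip()]


def related_sentences(page_text: str, people: list[PersonInfo]) -> list[str]:
    """Keep every sentence that contains any parameter of any person."""
    selected = []
    for sentence in split_sentences(page_text):
        if any(param in sentence
               for person in people
               for param in person.parameters()):
            selected.append(sentence)
    return selected
```

Substring matching is the simplest reading of "contains any one of the parameters"; a real crawler would likely also normalize whitespace and character width before matching.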
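Step 102: for each text sentence, extract the person entity words in the text sentence.
In this embodiment, the apparatus for constructing a personnel relationship graph may perform step 102 by, for example: determining the text vector corresponding to the text sentence; and inputting that text vector into a preset person entity word extraction model to extract the person entity words in the text sentence.
In this embodiment, to improve the accuracy of the text vector, the apparatus may determine the text vector corresponding to the text sentence by, for example: obtaining the words in the text sentence; determining the word vector corresponding to each word on the basis of a semantic representation model and the text sentence; and determining the text vector corresponding to the text sentence on the basis of the word vectors, the text content, a preset syntactic dependency tree, and a graph vector model.
The word vector corresponding to each word may be determined by, for example, inputting the text sentence and the words in the text sentence into the semantic representation model to obtain the word vector of each word in the text sentence. The semantic representation model may be one that has been pre-trained on large-scale data, so that the word vectors carry a large amount of linguistic knowledge.
Determining the text vector on the basis of the text content and the syntactic dependency tree allows the text vector to capture the dependency relationships between the words in the text sentence, which improves the accuracy of the text vector.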
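One way to picture how a dependency tree can inform the vectors is a single round of neighbor averaging over the tree, shown below. This is only an illustrative sketch: the text does not specify the graph vector model, so plain mean aggregation stands in for it, the word vectors and head indices are assumed to come from an upstream semantic representation model and parser, and the function names are hypothetical.

```python
def dependency_neighbours(heads):
    """heads[i] is the index of token i's syntactic head, or -1 for the root.

    Returns, for each token, the set of tokens it is linked to in the
    dependency tree (treated as undirected), plus the token itself.
    """
    neighbours = [{i} for i in range(len(heads))]  # self-loop keeps own vector
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].add(head)
            neighbours[head].add(child)
    return neighbours


def dependency_aware_vectors(word_vectors, heads):
    """Replace each word vector by the mean over its tree neighbourhood."""
    dim = len(word_vectors[0])
    result = []
    for nbrs in dependency_neighbours(heads):
        result.append([
            sum(word_vectors[j][d] for j in nbrs) / len(nbrs)
            for d in range(dim)
        ])
    return result
```

After this step each row mixes in information from the word's head and dependents, which is the sense in which the resulting representation reflects dependency relationships between words.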
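The preset person entity word extraction model may be obtained by, for example: obtaining training data, where the training data includes a large number of sample text sentences and their corresponding person entity words; and training an initial person entity word extraction model with the training data to obtain the preset person entity word extraction model. The person entity word extraction model may specifically be a sequence labeling model.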
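Since the extraction model may be a sequence labeling model, its output can be pictured as one tag per token. The sketch below assumes a BIO tagging scheme with `B-PER` and `I-PER` labels (the scheme is not specified in the text) and shows how tagged tokens could be decoded into person entity words with their positions; `decode_person_entities` is a hypothetical helper.

```python
def decode_person_entities(tokens, tags):
    """Turn BIO tags into (entity_word, start, end) tuples; end is exclusive.

    Assumes tags use "B-PER" to open a person entity, "I-PER" to continue
    it, and anything else (for example "O") for non-entity tokens.
    """
    entities = []
    start = None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != "I-PER" and start is not None:
            entities.append(("".join(tokens[start:i]), start, i))
            start = None
        if tag == "B-PER":
            start = i
        # An "I-PER" with no open entity is ignored as a malformed tag.
    return entities
```

Keeping the token positions alongside each entity word matters later, because step 104 relies on where the entity words and role words sit in the sentence.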
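Step 103: extract the relationship role words in the text sentence on the basis of the text sentence and the person entity words.
In this embodiment, the apparatus for constructing a personnel relationship graph may perform step 103 by, for example: determining the text vector corresponding to the text sentence; and inputting the text vector and the person entity words into a preset relationship role word extraction model to obtain the relationship role words in the text sentence.
The relationship role word extraction model may obtain the relationship role words in the text sentence by, for example: encoding the text vector and the person entity words with a fully connected network layer to obtain a semantic encoding matrix with respect to the person entity words; processing the semantic encoding matrix with a convolutional neural network layer to obtain the start position and end position of each relationship role word in the text sentence; and thereby obtaining each relationship role word in the text sentence.
The preset relationship role word extraction model may be obtained by, for example: obtaining training data, where the training data includes a large number of sample text sentences and their corresponding person entity words and relationship role words; and training an initial relationship role word extraction model with the training data to obtain the preset relationship role word extraction model. The relationship role word extraction model may specifically be a sequence labeling model.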
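The start and end position outputs described above can be decoded into relationship role words roughly as follows. This sketch assumes the model emits one start score and one end score per token, and that each start is paired with the nearest end at or after it; the threshold of 0.5 and the name `decode_role_spans` are illustrative choices, not details from the text.

```python
def decode_role_spans(tokens, start_scores, end_scores, threshold=0.5):
    """Pair predicted start and end positions into (role_word, start, end).

    start_scores[i] / end_scores[i] are assumed to be the model's confidence
    that token i begins / ends a relationship role word. The returned end
    index is exclusive.
    """
    starts = [i for i, s in enumerate(start_scores) if s >= threshold]
    ends = [i for i, s in enumerate(end_scores) if s >= threshold]
    spans = []
    for s in starts:
        # Nearest end position that does not precede the start.
        e = next((e for e in ends if e >= s), None)
        if e is not None:
            spans.append(("".join(tokens[s:e + 1]), s, e + 1))
    return spans
```

For the sentence "张三是李四的父亲", a high start score on "父" and a high end score on "亲" would yield the role word "父亲" at positions 6 to 8.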
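Step 104: generate the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence.
In this embodiment, the apparatus for constructing a personnel relationship graph can determine the relationships between the person entity words from the positions of the person entity words and the positions of the relationship role words in the text sentence, and then generate the tuple information. The tuple information may be a triple, a quadruple, or a higher-order tuple. Taking a triple as an example, the triple information may include: person entity word A, person entity word B, and the relationship between A and B. Taking a higher-order tuple as an example, the tuple information may include: person entity word A, person entity word B, the relationship between A and B, person entity word A, person entity word C, and the relationship between A and C.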
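As a rough illustration of generating triples from positions, the sketch below applies one deliberately simple rule: each relationship role word is attached to the two person entity words that most closely precede it, which fits sentences of the form "A 是 B 的 R". The text does not state the actual rule, so this heuristic and the name `build_triples` are assumptions made purely for illustration.

```python
def build_triples(entities, roles):
    """Combine positioned entity words and role words into (A, B, relation).

    entities: list of (word, start, end), sorted by start position.
    roles:    list of (word, start, end) for relationship role words.
    Hypothetical rule: a role word relates the two nearest person entity
    words that end at or before the role word's start.
    """
    triples = []
    for role_word, role_start, _ in roles:
        preceding = [e for e in entities if e[2] <= role_start]
        if len(preceding) >= 2:
            person_a = preceding[-2][0]
            person_b = preceding[-1][0]
            triples.append((person_a, person_b, role_word))
    return triples
```

With "张三" at positions 0 to 2, "李四" at 3 to 5, and the role word "父亲" at 6 to 8, the rule produces the triple ("张三", "李四", "父亲"). Sentences with other word orders would need additional rules or a learned model.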
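It should be noted that, in this embodiment, some people have multiple names, such as a nickname, a childhood name, or a software account name. The tuple information corresponding to the text sentences therefore needs to be aligned. That is, the multiple person entity words that refer to the same person are unified into a single entity word, and the relationships of those multiple person entity words are taken as the relationships of the unified person entity word.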
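The alignment described here amounts to rewriting every alias to one canonical name and merging the resulting duplicates. The sketch below assumes the alias table is already available as a dictionary; how aliases are discovered is not covered by the text, and `align_triples` is a hypothetical name.

```python
def align_triples(triples, aliases):
    """Rewrite aliases to canonical names and drop duplicate triples.

    triples: iterable of (person_a, person_b, relation).
    aliases: dict mapping an alias (nickname, account name, ...) to the
             canonical person entity word. Assumed to be given.
    """
    def canonical(name):
        return aliases.get(name, name)

    seen = set()
    aligned = []
    for person_a, person_b, relation in triples:
        triple = (canonical(person_a), canonical(person_b), relation)
        if triple not in seen:  # relations of all aliases end up on one name
            seen.add(triple)
            aligned.append(triple)
    return aligned
```

After alignment, relationships that were recorded under a nickname appear under the canonical entity word, so the graph holds one node per person.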
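Step 105: construct the personnel relationship graph according to the tuple information corresponding to each text sentence.
In this embodiment, the apparatus for constructing a personnel relationship graph can construct the personnel relationship graph from the person entity words in the tuple information corresponding to each text sentence and the relationships between them.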
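A personnel relationship graph built from triples can be represented as a nested mapping from each person to the people they are related to. The sketch below is one possible in-memory layout, assumed for illustration only; the text does not prescribe a storage format, and `build_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_graph(triples):
    """Build {person_a: {person_b: {relation, ...}}} from (A, B, relation).

    Edges are stored in the A -> B direction because many relationships
    (for example "父亲") are not symmetric. Every person gets a node, even
    if they only ever appear as the second element of a triple.
    """
    graph = defaultdict(dict)
    for person_a, person_b, relation in triples:
        graph[person_a].setdefault(person_b, set()).add(relation)
        graph[person_b]  # touching the key registers person_b as a node
    return dict(graph)
```

Using a set per edge lets the same pair of people carry several relationships, such as being both colleagues and relatives.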
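In the method for constructing a personnel relationship graph of the embodiments of the present application, the text sentences to be used for constructing the personnel relationship graph are crawled; for each text sentence, the person entity words in the text sentence are extracted first; the relationship role words in the text sentence are then extracted on the basis of the text sentence and the person entity words; the tuple information corresponding to the text sentence is generated on the basis of the person entity words and the relationship role words in the text sentence; and the personnel relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information can thus be extracted automatically and the personnel relationship graph constructed automatically, which improves the efficiency and reduces the cost of constructing the personnel relationship graph.
FIG. 2 is a schematic flowchart of the method for constructing a personnel relationship graph provided in Embodiment 2 of the present application. As shown in FIG. 2, on the basis of the embodiment shown in FIG. 1, the method may further include the following steps 201 to 203 after step 105.
Step 201: receive a query request, where the query request includes the personnel information to be queried.
In this embodiment, the personnel information to be queried may include, for example, the name of the person to be queried. The personnel information to be queried may be entered by the user in a query box, or obtained by performing speech recognition on the user's voice input.
Step 202: query the personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word.
In this embodiment, the first person entity word that matches the personnel information to be queried may be contained in the personnel information to be queried, or may have a similarity to the personnel information to be queried that exceeds a certain threshold. The relationship between the first person entity word and the second person entity word may be, for example, a father-son relationship, a colleague relationship, a relative relationship, or a customer relationship. There may be one or more second person entity words.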
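The two matching conditions in step 202 (containment, or similarity above a threshold) can be sketched with the Python standard library. The similarity measure is not specified in the text; `difflib.SequenceMatcher` and the threshold of 0.8 are stand-ins chosen for illustration, and the graph layout (`{person: {other: {relation, ...}}}`) and function names are assumptions.

```python
from difflib import SequenceMatcher


def match_entity(query, nodes, threshold=0.8):
    """Return the node that best matches the query, or None.

    A node matches if it is contained in the query text, or if its
    string similarity to the query reaches the threshold.
    """
    best, best_score = None, 0.0
    for node in nodes:
        if node in query:
            score = 1.0
        else:
            score = SequenceMatcher(None, node, query).ratio()
        if score >= threshold and score > best_score:
            best, best_score = node, score
    return best


def query_relations(graph, query, threshold=0.8):
    """Return (first_entity, [(second_entity, relation), ...])."""
    first = match_entity(query, graph, threshold)
    if first is None:
        return None, []
    related = [(second, relation)
               for second, relations in graph[first].items()
               for relation in sorted(relations)]
    return first, related
```

The returned list corresponds to what step 203 displays: each second person entity word together with its relationship to the first.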
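In this embodiment, the method further includes: if the first person entity word is not found, or no second person entity word having a relationship with the first person entity word is found, crawling text sentences related to the personnel information to be queried; extracting tuple information from the text sentences related to the personnel information to be queried, and updating the personnel relationship graph with the extracted tuple information; and querying the updated personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word.
The conditions that trigger an update of the personnel relationship graph may include: for a given piece of personnel information to be queried, the first person entity word is not found, or no second person entity word having a relationship with the first person entity word is found. In addition, to further improve query efficiency, the trigger conditions may also include periodic triggering, for example, triggering an update of the personnel relationship graph at preset intervals.
The process of updating the personnel relationship graph may be similar to the construction process, except that the last step updates the existing personnel relationship graph rather than rebuilding it. For the other steps, refer to the description of the embodiment shown in FIG. 1; they are not described in detail here.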
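The query-miss trigger can be sketched as a lookup that falls back to crawling and extraction, merges the new triples into the existing graph, and then retries. The crawl-and-extract pipeline is passed in as a callable because its internals are covered by steps 101 to 104; `lookup`, `query_with_update`, and the exact-name lookup are simplifying assumptions for illustration.

```python
def lookup(graph, name):
    """Return sorted (second_entity, relation) pairs for an exact name."""
    return sorted((other, relation)
                  for other, relations in graph.get(name, {}).items()
                  for relation in relations)


def query_with_update(graph, name, crawl_and_extract):
    """Query the graph; on a miss, update the graph in place and retry.

    crawl_and_extract: hypothetical callable taking the queried name and
    returning (person_a, person_b, relation) triples from newly crawled text.
    """
    result = lookup(graph, name)
    if not result:  # no first entity, or no related second entity
        for person_a, person_b, relation in crawl_and_extract(name):
            # Merge into the existing graph instead of rebuilding it.
            graph.setdefault(person_a, {}).setdefault(person_b, set()).add(relation)
            graph.setdefault(person_b, {})
        result = lookup(graph, name)
    return result
```

A periodic trigger would call the same merge logic from a scheduler rather than from the query path, so that later queries are more likely to hit without waiting for a crawl.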
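Step 203: display the second person entity word and the relationship between the first person entity word and the second person entity word.
In this embodiment, the apparatus for constructing a personnel relationship graph may perform the display by sending the second person entity word, and the relationship between the first person entity word and the second person entity word, to the terminal device used by the user, and the terminal device shows them on its display screen.
In the method for constructing a personnel relationship graph of the embodiments of the present application, the text sentences to be used for constructing the personnel relationship graph are crawled; for each text sentence, the person entity words in the text sentence are extracted first; the relationship role words in the text sentence are then extracted on the basis of the text sentence and the person entity words; the tuple information corresponding to the text sentence is generated on the basis of the person entity words and the relationship role words in the text sentence; and the personnel relationship graph is constructed according to the tuple information corresponding to each text sentence. A query request that includes the personnel information to be queried is then received; the personnel relationship graph is queried according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word; and the second person entity word and the relationship between the first person entity word and the second person entity word are displayed. Personnel information can thus be queried against the automatically constructed personnel relationship graph, which improves the efficiency of personnel information queries.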
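FIG. 3 is a schematic structural diagram of the apparatus for constructing a personnel relationship graph provided in Embodiment 3 of the present application.
As shown in FIG. 3, the apparatus 300 for constructing a personnel relationship graph may include: a crawling module 310, a first extraction module 320, a second extraction module 330, a generation module 340, and a construction module 350.
The crawling module 310 is configured to crawl the text sentences to be used for constructing the personnel relationship graph;
the first extraction module 320 is configured to extract, for each text sentence, the person entity words in the text sentence;
the second extraction module 330 is configured to extract the relationship role words in the text sentence on the basis of the text sentence and the person entity words;
the generation module 340 is configured to generate the tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence;
the construction module 350 is configured to construct the personnel relationship graph according to the tuple information corresponding to each text sentence.
Further, in some implementations, the crawling module 310 is specifically configured to: determine the pieces of personnel information and the crawl sources to be used for constructing the personnel relationship graph; crawl, from the pages corresponding to the crawl sources, the text sentences related to each piece of personnel information; and take the text sentences related to each piece of personnel information as the text sentences to be used for constructing the personnel relationship graph.
Further, in some implementations, the first extraction module 320 is specifically configured to: determine the text vector corresponding to the text sentence; and input the text vector corresponding to the text sentence into a preset person entity word extraction model to extract the person entity words in the text sentence.
Further, in some implementations, the first extraction module 320 is specifically configured to: obtain the words in the text sentence; determine the word vector corresponding to each word on the basis of a semantic representation model and the text sentence; and determine the text vector corresponding to the text sentence on the basis of the word vectors, the text content, a preset syntactic dependency tree, and a graph vector model.
Further, in some implementations, the second extraction module 330 is specifically configured to: determine the text vector corresponding to the text sentence; and input the text vector and the person entity words into a preset relationship role word extraction model to obtain the relationship role words in the text sentence.
Further, in some implementations, with reference to FIG. 4, the apparatus further includes a receiving module 360, a query module 370, and a display module 380. The receiving module 360 is configured to receive a query request, where the query request includes the personnel information to be queried. The query module 370 is configured to query the personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word. The display module 380 is configured to display the second person entity word and the relationship between the first person entity word and the second person entity word.
Further, in some implementations, the apparatus further includes an update module. The crawling module 310 is further configured to crawl text sentences related to the personnel information to be queried when the first person entity word is not found, or when no second person entity word having a relationship with the first person entity word is found. The update module is configured to extract tuple information from the text sentences related to the personnel information to be queried, and to update the personnel relationship graph with the extracted tuple information. The query module 370 is further configured to query the updated personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word.
It should be noted that the explanations given in the foregoing method embodiments also apply to the apparatus for constructing a personnel relationship graph of this embodiment, and are not repeated here.
In the apparatus for constructing a personnel relationship graph of the embodiments of the present application, the text sentences to be used for constructing the personnel relationship graph are crawled; for each text sentence, the person entity words in the text sentence are extracted first; the relationship role words in the text sentence are then extracted on the basis of the text sentence and the person entity words; the tuple information corresponding to the text sentence is generated on the basis of the person entity words and the relationship role words in the text sentence; and the personnel relationship graph is constructed according to the tuple information corresponding to each text sentence. The tuple information can thus be extracted automatically and the personnel relationship graph constructed automatically, which improves the efficiency and reduces the cost of constructing the personnel relationship graph.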
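In some embodiments, the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for constructing a personnel relationship graph provided in the foregoing embodiments of the present application.
In some embodiments, the present application further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method for constructing a personnel relationship graph provided in the foregoing embodiments of the present application.
In some embodiments, the present application further provides a computer program product including a computer program which, when executed by a processor, implements the method for constructing a personnel relationship graph provided in the foregoing embodiments of the present application.
FIG. 5 is a block diagram of an exemplary electronic device suitable for implementing the embodiments of the present application. The electronic device 12 shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 5, the electronic device 12 takes the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (hereinafter: ISA) bus, the Micro Channel Architecture (hereinafter: MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter: VESA) local bus, and the Peripheral Component Interconnect (hereinafter: PCI) bus.
The electronic device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as may an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a compact disc read-only memory (hereinafter: CD-ROM), a digital versatile disc read-only memory (hereinafter: DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (for example, a network card or a modem) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The electronic device 12 may also communicate with one or more networks (for example, a local area network (hereinafter: LAN), a wide area network (hereinafter: WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. It should be understood that, although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the methods described in the foregoing embodiments.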
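In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only, and should not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless expressly and specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred implementations of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application pertain.
The logic and/or steps represented in a flowchart, or otherwise described herein, may for example be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above implementations, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another implementation, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will understand that all or some of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and should not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.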

Claims (17)

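  1. A method for constructing a personnel relationship graph, comprising:
    crawling text sentences to be used for constructing a personnel relationship graph;
    for each text sentence, extracting person entity words in the text sentence;
    extracting relationship role words in the text sentence on the basis of the text sentence and the person entity words;
    generating tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and
    constructing the personnel relationship graph according to the tuple information corresponding to each of the text sentences.
  2. The method for constructing a personnel relationship graph according to claim 1, wherein the crawling text sentences to be used for constructing a personnel relationship graph comprises:
    determining pieces of personnel information and crawl sources to be used for constructing the personnel relationship graph;
    crawling, from pages corresponding to the crawl sources, text sentences related to each piece of personnel information; and
    determining the text sentences related to each piece of personnel information as the text sentences to be used for constructing the personnel relationship graph.
  3. The method for constructing a personnel relationship graph according to claim 1, wherein the extracting person entity words in the text sentence comprises:
    determining a text vector corresponding to the text sentence; and
    inputting the text vector corresponding to the text sentence into a preset person entity word extraction model to extract the person entity words in the text sentence.
  4. The method for constructing a personnel relationship graph according to claim 3, wherein the determining a text vector corresponding to the text sentence comprises:
    obtaining words in the text sentence;
    determining a word vector corresponding to each of the words on the basis of a semantic representation model and the text sentence; and
    determining the text vector corresponding to the text sentence on the basis of the word vectors corresponding to the words, the text content, a preset syntactic dependency tree, and a graph vector model.
  5. The method for constructing a personnel relationship graph according to claim 1, wherein the extracting relationship role words in the text sentence on the basis of the text sentence and the person entity words comprises:
    determining a text vector corresponding to the text sentence; and
    inputting the text vector and the person entity words into a preset relationship role word extraction model to obtain the relationship role words in the text sentence.
  6. The method for constructing a personnel relationship graph according to claim 1, further comprising:
    receiving a query request, wherein the query request comprises personnel information to be queried;
    querying the personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word; and
    displaying the second person entity word and the relationship between the first person entity word and the second person entity word.
  7. The method for constructing a personnel relationship graph according to claim 6, further comprising:
    if the first person entity word is not found, or the second person entity word that has a relationship with the first person entity word is not found, crawling text sentences related to the personnel information to be queried;
    extracting tuple information from the text sentences related to the personnel information to be queried, and updating the personnel relationship graph with the extracted tuple information; and
    querying the updated personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word.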
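  8. An apparatus for constructing a personnel relationship graph, comprising:
    a crawling module configured to crawl text sentences to be used for constructing a personnel relationship graph;
    a first extraction module configured to extract, for each text sentence, person entity words in the text sentence;
    a second extraction module configured to extract relationship role words in the text sentence on the basis of the text sentence and the person entity words;
    a generation module configured to generate tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and
    a construction module configured to construct the personnel relationship graph according to the tuple information corresponding to each of the text sentences.
  9. The apparatus for constructing a personnel relationship graph according to claim 8, wherein the crawling module is further configured to:
    determine pieces of personnel information and crawl sources to be used for constructing the personnel relationship graph;
    crawl, from pages corresponding to the crawl sources, text sentences related to each piece of personnel information; and
    determine the text sentences related to each piece of personnel information as the text sentences to be used for constructing the personnel relationship graph.
  10. The apparatus for constructing a personnel relationship graph according to claim 8, wherein the first extraction module is further configured to:
    determine a text vector corresponding to the text sentence; and
    input the text vector corresponding to the text sentence into a preset person entity word extraction model to extract the person entity words in the text sentence.
  11. The apparatus for constructing a personnel relationship graph according to claim 10, wherein the first extraction module is further configured to:
    obtain words in the text sentence;
    determine a word vector corresponding to each of the words on the basis of a semantic representation model and the text sentence; and
    determine the text vector corresponding to the text sentence on the basis of the word vectors corresponding to the words, the text content, a preset syntactic dependency tree, and a graph vector model.
  12. The apparatus for constructing a personnel relationship graph according to claim 8, wherein the second extraction module is further configured to:
    determine a text vector corresponding to the text sentence; and
    input the text vector and the person entity words into a preset relationship role word extraction model to obtain the relationship role words in the text sentence.
  13. The apparatus for constructing a personnel relationship graph according to claim 8, further comprising: a receiving module, a query module, and a display module;
    the receiving module being configured to receive a query request, wherein the query request comprises personnel information to be queried;
    the query module being configured to query the personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word; and
    the display module being configured to display the second person entity word and the relationship between the first person entity word and the second person entity word.
  14. The apparatus for constructing a personnel relationship graph according to claim 13, further comprising: an update module;
    the crawling module being further configured to crawl text sentences related to the personnel information to be queried when the first person entity word is not found, or when the second person entity word that has a relationship with the first person entity word is not found;
    the update module being configured to extract tuple information from the text sentences related to the personnel information to be queried, and to update the personnel relationship graph with the extracted tuple information; and
    the query module being further configured to query the updated personnel relationship graph according to the personnel information to be queried, to obtain a first person entity word that matches the personnel information to be queried and a second person entity word that has a relationship with the first person entity word.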
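  15. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    crawling text sentences to be used for constructing a personnel relationship graph;
    for each text sentence, extracting person entity words in the text sentence;
    extracting relationship role words in the text sentence on the basis of the text sentence and the person entity words;
    generating tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and
    constructing the personnel relationship graph according to the tuple information corresponding to each of the text sentences.
  16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the following steps:
    crawling text sentences to be used for constructing a personnel relationship graph;
    for each text sentence, extracting person entity words in the text sentence;
    extracting relationship role words in the text sentence on the basis of the text sentence and the person entity words;
    generating tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and
    constructing the personnel relationship graph according to the tuple information corresponding to each of the text sentences.
  17. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    crawling text sentences to be used for constructing a personnel relationship graph;
    for each text sentence, extracting person entity words in the text sentence;
    extracting relationship role words in the text sentence on the basis of the text sentence and the person entity words;
    generating tuple information corresponding to the text sentence on the basis of the person entity words and the relationship role words in the text sentence; and
    constructing the personnel relationship graph according to the tuple information corresponding to each of the text sentences.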
PCT/CN2022/075564 2021-02-09 2022-02-08 Method and apparatus for constructing a personnel relationship graph, and electronic device WO2022171093A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110177821.2 2021-02-09
CN202110177821.2A CN113806549A (zh) 2021-02-09 2021-02-09 Method and apparatus for constructing a personnel relationship graph, and electronic device

Publications (1)

Publication Number Publication Date
WO2022171093A1 true WO2022171093A1 (zh) 2022-08-18

Family

ID=78892818

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075564 WO2022171093A1 (zh) 2021-02-09 2022-02-08 Method and apparatus for constructing a personnel relationship graph, and electronic device

Country Status (2)

Country Link
CN (1) CN113806549A (zh)
WO (1) WO2022171093A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
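CN113806549A (zh) * 2021-02-09 2021-12-17 京东科技控股股份有限公司 Method and apparatus for constructing a personnel relationship graph, and electronic device
CN116562275B (zh) * 2023-06-09 2023-09-15 创意信息技术股份有限公司 Automatic text summarization method incorporating an entity attribute graph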

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160117311A1 (en) * 2014-10-22 2016-04-28 Thomson Licensing Method and Device for Performing Story Analysis
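CN106776544A (zh) * 2016-11-24 2017-05-31 四川无声信息技术有限公司 Person relationship recognition method and apparatus, and word segmentation method
CN107526722A (zh) * 2017-07-31 2017-12-29 努比亚技术有限公司 Person relationship analysis method and terminal
CN110516012A (zh) * 2019-08-30 2019-11-29 广东工业大学 Method for constructing a person relationship graph
CN111858898A (zh) * 2020-07-30 2020-10-30 中国科学院自动化研究所 Artificial-intelligence-based text processing method and apparatus, and electronic device
CN113806549A (zh) * 2021-02-09 2021-12-17 京东科技控股股份有限公司 Method and apparatus for constructing a personnel relationship graph, and electronic device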

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
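CN106776711B (zh) * 2016-11-14 2020-04-07 浙江大学 Deep-learning-based method for constructing a Chinese medical knowledge graph
CN110851610B (zh) * 2018-07-25 2022-09-27 百度在线网络技术(北京)有限公司 Knowledge graph generation method and apparatus, computer device, and storage medium
CN109446343B (zh) * 2018-11-05 2020-10-27 上海德拓信息技术股份有限公司 Method for constructing a public-security knowledge graph
CN110222199A (zh) * 2019-06-20 2019-09-10 青岛大学 Method for constructing a person relationship graph based on an ontology and an ensemble of multiple neural networks
CN110489520B (zh) * 2019-07-08 2023-05-16 平安科技(深圳)有限公司 Knowledge-graph-based event processing method, apparatus, and device, and storage medium
CN111177315B (zh) * 2019-12-19 2023-04-28 北京明略软件系统有限公司 Knowledge graph update method and apparatus, and computer-readable storage medium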

Also Published As

Publication number Publication date
CN113806549A (zh) 2021-12-17

Legal Events

Date Code Title Description
(none) 121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 22752253; Country of ref document: EP; Kind code of ref document: A1
(none) NENP Non-entry into the national phase Ref country code: DE
(none) 32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.11.2023)