WO2021212682A1 - Knowledge extraction method, apparatus, electronic device, and storage medium - Google Patents

Knowledge extraction method, apparatus, electronic device, and storage medium

Info

Publication number
WO2021212682A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
text
initial
list
entity list
Prior art date
Application number
PCT/CN2020/104964
Other languages
English (en)
Chinese (zh)
Inventor
张聪
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021212682A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Definitions

  • This application relates to the field of data analysis technology, and in particular to a knowledge extraction method and apparatus, an electronic device, and a storage medium.
  • Current knowledge extraction usually relies on templates, trigger words, or supervised learning, which requires manually summarizing rules and labeling data to form a rule base and then performing matching on the basis of that rule base.
  • a knowledge extraction method includes:
  • Disambiguation processing is performed on the candidate entity list by using a semantic matching model trained based on the Attention-DSSM algorithm to obtain the target entity;
  • a knowledge extraction device includes:
  • an acquiring unit, configured to acquire source data when a knowledge extraction instruction is received;
  • a preprocessing unit, configured to preprocess the source data to obtain text data;
  • a recognition unit, configured to recognize entities in the text data through a sequence labeling model based on Bi-LSTM+CRF to obtain an initial entity list;
  • an expansion unit, configured to expand the initial entity list based on a pre-configured knowledge graph to obtain a candidate entity list;
  • a disambiguation unit, configured to use a semantic matching model trained based on the Attention-DSSM algorithm to disambiguate the candidate entity list to obtain a target entity;
  • an extraction unit, configured to perform knowledge extraction based on the information on the node.
  • An electronic device, which includes:
  • a memory storing at least one computer-readable instruction; and
  • a processor executing the at least one computer-readable instruction stored in the memory to implement the following steps:
  • Disambiguation processing is performed on the candidate entity list by using a semantic matching model trained based on the Attention-DSSM algorithm to obtain the target entity;
  • a computer-readable storage medium in which at least one computer-readable instruction is stored, and the at least one computer-readable instruction is executed by a processor in an electronic device to implement the following steps:
  • Disambiguation processing is performed on the candidate entity list by using a semantic matching model trained based on the Attention-DSSM algorithm to obtain the target entity;
  • The Attention mechanism of this application strengthens the association between each word and the other words and increases the weight of keywords.
  • The newly added Interaction layer also enhances the association between the texts to be matched, so the obtained target entity is more accurate, which improves the efficiency and accuracy of knowledge extraction.
  • FIG. 1 is a flowchart of a preferred embodiment of the knowledge extraction method of the present application.
  • FIG. 2 is a schematic diagram of a relationship network extracted from a data source in an example of this application.
  • FIG. 3 is a functional block diagram of a preferred embodiment of the knowledge extraction device of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the knowledge extraction method according to the present application.
  • Referring to FIG. 1, which is a flowchart of a preferred embodiment of the knowledge extraction method of the present application. Depending on different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the knowledge extraction method is applied to one or more electronic devices.
  • The electronic device is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, etc.
  • The electronic device may be any electronic product that can perform human-computer interaction with a user, such as a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), a smart wearable device, etc.
  • the electronic device may also include a network device and/or user equipment.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
  • the network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), etc.
  • the knowledge extraction instruction may be triggered by a designated user, and the designated user includes, but is not limited to, relevant staff such as project managers.
  • the source data can be obtained from a configuration database.
  • the source data can be obtained from a database accessible by the court, such as an internal database of the court, an online open source database, etc.
  • the source data can be a picture type or a text type, which is not limited in this application.
  • In order to enable the machine to recognize the source data, the electronic device first needs to preprocess the source data.
  • the preprocessing of the source data to obtain text data includes:
  • The source data is converted into initial text, the initial text is filtered and cleaned to obtain filtered text, and the filtered text is encoded based on the UTF-8 (8-bit Unicode Transformation Format) encoding algorithm to obtain the text data.
  • When the source data is of text type, the source data is directly filtered and cleaned to obtain the filtered text, and the filtered text is encoded based on the UTF-8 encoding algorithm to obtain the text data.
  • When the source data is of picture type, the electronic device may use an Optical Character Recognition (OCR) algorithm to convert the source data into the initial text.
  • the text data may be in TXT text format or other text formats, which is not limited by this application.
  • Through the above implementation, the source data can be filtered and cleaned to eliminate interference information, and further converted into a unified text format, which not only unifies the data format but also ensures that the preprocessed text data can be recognized and processed by the machine.
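  • A minimal sketch of this preprocessing flow is shown below; pytesseract is assumed here as the OCR engine, and the regular-expression cleaning rules are placeholders for whatever filtering a deployment actually applies.

```python
import re

import pytesseract
from PIL import Image

def preprocess(source: str, is_image: bool) -> bytes:
    """Convert source data into unified, UTF-8 encoded text data."""
    if is_image:
        # Picture-type source data: OCR converts it into the initial text.
        initial_text = pytesseract.image_to_string(Image.open(source))
    else:
        # Text-type source data is used directly as the initial text.
        with open(source, encoding="utf-8", errors="ignore") as f:
            initial_text = f.read()
    # Filter and clean to eliminate interference information
    # (these concrete cleaning rules are an assumption, not from the patent).
    filtered = re.sub(r"<[^>]+>", "", initial_text)   # drop markup remnants
    filtered = re.sub(r"\s+", " ", filtered).strip()  # collapse whitespace
    # Encode with the UTF-8 encoding algorithm to obtain the text data.
    return filtered.encode("utf-8")
```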
  • S12 Recognizing entities in the text data through a sequence labeling model based on Bi-LSTM+CRF to obtain an initial entity list.
  • The text data obtained is unstructured text data. Therefore, it is necessary to identify the key entity information in the text data, which is equivalent to performing sequence labeling on the text data. Accordingly, the electronic device needs to first construct a sequence labeling model associated with the knowledge extraction instruction.
  • the knowledge extraction method further includes:
  • The electronic device configures the sequence labeling mode according to predefined requirement data, and adds the sequence labeling mode to the Bi-LSTM+CRF model to obtain the sequence labeling model; a schematic sketch of such a model is given below.
  • sequence labeling mode can be configured according to specific task requirements.
  • Bi-LSTM refers to Bidirectional Long Short-Term Memory, and CRF refers to Conditional Random Field.
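  • The sketch below is a schematic rendering of such a model: a bidirectional LSTM produces Softmax output probabilities per position, and a learned transition matrix carries the CRF constraints (it is consumed by the decoding example further below). Sizes and layer choices are illustrative assumptions, not the patented configuration.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size: int, n_tags: int, dim: int = 100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(dim, n_tags)            # per-position label scores
        # CRF transition scores between labels, learned during training
        self.transitions = nn.Parameter(torch.randn(n_tags, n_tags))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.embed(token_ids))   # (batch, seq, dim)
        return torch.softmax(self.fc(out), dim=-1)  # Softmax-layer output probabilities
```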
  • the recognition of entities in the text data through a sequence labeling model based on Bi-LSTM+CRF to obtain an initial entity list includes:
  • The electronic device inputs the text data into the sequence labeling model based on Bi-LSTM+CRF and obtains, at the Softmax layer, the output probability and the transition probability of each label corresponding to each sequence position. For each sequence position, the electronic device calculates the sum of the output probability and the transition probability of each label as the score of that label, determines the label with the highest score as the output label of that sequence position, and combines the output labels of all sequence positions to obtain the initial entity list.
  • In addition to the output probability, the transition probability is also considered, that is, the label sequence must conform to the sequence labeling mode (for example, B cannot be followed by B).
  • For example, if the sequence with the highest output probabilities alone is BBIBIOOO, then because, according to the sequence labeling mode, the probability of B->B in the transition probability matrix is very small, or even negative, the above sequence does not get the highest score, that is, it does not yield the initial entity list.
  • The initial entity list obtained can include the following sequences: the sequence (B, E) represents a person name; the sequence (B, I, E) represents an organization name; and the sequence (O) represents an independent character.
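  • As a toy illustration of the scoring just described (output probability plus transition probability, highest score wins), the sketch below decodes greedily with a transition matrix like the one in the model sketch above. Production CRF layers use Viterbi decoding over the whole sequence, so this is illustrative only.

```python
import numpy as np

TAGS = ["B", "I", "E", "O"]

def greedy_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[str]:
    """emissions: (seq_len, n_tags) Softmax outputs from the Bi-LSTM.
    transitions: (n_tags, n_tags) transition scores, e.g. B->B very low."""
    best_prev = int(np.argmax(emissions[0]))  # first position: emission only
    tags = [TAGS[best_prev]]
    for t in range(1, len(emissions)):
        scores = emissions[t] + transitions[best_prev]  # output + transition
        best_prev = int(np.argmax(scores))
        tags.append(TAGS[best_prev])
    return tags

# Example: a transition matrix that penalizes B->B keeps "BB..." from winning.
trans = np.full((4, 4), 0.1)
trans[0, 0] = -1.0                            # B cannot be followed by B
emis = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.6, 0.2, 0.1, 0.1],
                 [0.1, 0.2, 0.6, 0.1]])
print(greedy_decode(emis, trans))             # ['B', 'I', 'E']
```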
  • the knowledge graph can be pre-configured according to each technical field, such as a legal knowledge graph in the legal field.
  • Each entity in the initial entity list may be a partial or alternative representation of an entity. Therefore, it is necessary to perform surface-name expansion on each entity to obtain the candidate entity list.
  • In at least one embodiment, expanding the initial entity list based on the pre-configured knowledge graph to obtain the candidate entity list includes:
  • The electronic device calculates the cosine similarity between each entity in the initial entity list and the entity on each node in the knowledge graph, obtains from each node at least one entity whose cosine similarity is greater than or equal to a preset similarity as a candidate entity, and constructs the candidate entity list from the initial entity list and the candidate entities.
  • The preset similarity can be a customized configuration, such as 99.7%.
  • Cosine similarity refers to a way of measuring the similarity between two texts by the cosine of the angle between their two vectors in the vector space. Compared with distance measures, cosine similarity pays more attention to the difference in direction between the two vectors. In general, after an embedding is used to obtain the vector representations of two texts, the cosine similarity can be used to calculate the similarity between them.
  • Cosine similarity can be used to calculate the similarity between each entity and the entity on each knowledge graph node to determine whether a co-referential relationship exists, thereby expanding the initial entity list and obtaining a candidate entity list with more comprehensive coverage.
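  • A minimal sketch of this expansion step follows; the embedding source is left abstract (any word or sentence embedding model could stand in), the 0.997 threshold mirrors the 99.7% example above, and the helper names are illustrative rather than from the application.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(initial_entities: dict[str, np.ndarray],
           graph_entities: dict[str, np.ndarray],
           threshold: float = 0.997) -> list[str]:
    """Add knowledge graph entities that look co-referential with a
    recognized entity to the candidate list (duplicates not deduplicated
    in this sketch)."""
    candidates = list(initial_entities)
    for name, vec in initial_entities.items():
        for node_name, node_vec in graph_entities.items():
            if cosine_similarity(vec, node_vec) >= threshold:
                candidates.append(node_name)  # co-referential surface name
    return candidates
```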
  • The candidate entity list obtained may contain multiple candidates. Therefore, it is necessary to further disambiguate the candidate entity list, so that, from the multiple similar representations, the unique target entity that most closely matches the entity on the knowledge graph node can be determined more accurately.
  • In at least one embodiment, using a semantic matching model trained based on the Attention-DSSM algorithm to perform disambiguation on the candidate entity list to obtain the target entity includes:
  • The electronic device encodes each entity in the candidate entity list based on the one-hot encoding algorithm to obtain the word ID of each entity, inputs the word ID of each entity into a pre-configured dictionary, and outputs the word vector of each entity. The electronic device then processes the word vector of each entity based on the Attention mechanism to obtain the semantic representation of each entity, interacts the semantic representations of the entities in the Interaction layer, and outputs the post-interaction semantic vector of each entity. At the matching layer, the electronic device matches the post-interaction semantic vector of each entity against the entities on the knowledge graph nodes and outputs the entity with the highest matching degree as the target entity.
  • The traditional DSSM (Deep Structured Semantic Model) expresses the context information of the extracted entity and of the candidate entities as low-dimensional semantic vectors, and calculates the distance between two semantic vectors by the cosine distance.
  • However, because DSSM adopts the bag-of-words model, word-order information and context information are lost.
  • Moreover, the DSSM model is a weakly supervised, end-to-end model: its prediction results are hard to control, long-range information cannot be captured, and there are problems such as vanishing gradients.
  • the semantic matching model may include: an input layer, a semantic representation layer, an interaction layer, and a matching layer from bottom to top.
  • The word vector of each entity is processed based on the Attention mechanism, which enhances the semantic representation ability, strengthens the association between each word in the text and the other words, and increases the weight of the keywords in the text; therefore, the accuracy is high.
  • The new Interaction layer enables the two texts to be matched to interact, enhances the association between them through mutual representation, and improves the generalization ability of the model.
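  • The sketch below is a schematic PyTorch rendering of the four layers just listed (input, semantic representation, interaction, matching). The head counts, dimensions, and the use of multi-head attention for both the Attention mechanism and the Interaction layer are assumptions for illustration, not the patented architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDSSM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # input layer: word ID -> vector
        self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def represent(self, ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(ids)
        out, _ = self.self_attn(x, x, x)            # semantic representation layer
        return out

    def forward(self, left_ids: torch.Tensor, right_ids: torch.Tensor) -> torch.Tensor:
        left, right = self.represent(left_ids), self.represent(right_ids)
        # Interaction layer: each text attends to the other (mutual representation)
        left_i, _ = self.cross_attn(left, right, right)
        right_i, _ = self.cross_attn(right, left, left)
        # Matching layer: cosine similarity of pooled semantic vectors
        return F.cosine_similarity(left_i.mean(dim=1), right_i.mean(dim=1))

model = AttentionDSSM(vocab_size=10000)
score = model(torch.randint(0, 10000, (1, 6)), torch.randint(0, 10000, (1, 8)))
```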
  • the electronic device obtains the node corresponding to the target entity on the knowledge graph, and further links the target entity to the node of the knowledge graph.
  • In at least one embodiment, performing knowledge extraction based on the information on the node includes:
  • The electronic device obtains, from the information on the nodes, at least one path between nodes and the associated information on each path, and extracts at least one relationship network based on each path and its associated information.
  • In this way, the corresponding relationship network can be extracted, such as the relationship network shown in FIG. 2.
  • The knowledge graph is composed of a large amount of knowledge and the relationships between pieces of knowledge.
  • the nodes in the network represent entities that exist in the real world, and the edges between nodes represent the relationship between two entities.
  • The combination of nodes and edges abstracts real-world knowledge into a knowledge network for machine processing and application.
  • Knowledge extraction is performed based on the information on the nodes. After the target entity is linked to the nodes of the knowledge graph, the hidden information of the linked nodes can be obtained, which provides a path for the extraction of relationship and event information.
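  • A small sketch of this path-based extraction follows, with networkx assumed as the graph store and a toy legal-domain graph; the node and relation names are hypothetical examples, not data from the application.

```python
import networkx as nx

kg = nx.Graph()
kg.add_edge("Zhang San", "Company A", relation="legal representative")
kg.add_edge("Company A", "Case 123", relation="defendant")

def extract_relations(graph: nx.Graph, source: str, target: str) -> list[list[tuple]]:
    """Walk every simple path between two linked nodes and collect the
    relation stored on each edge along the path."""
    networks = []
    for path in nx.all_simple_paths(graph, source, target):
        # pair consecutive nodes with the relation stored on their edge
        networks.append([(u, graph[u][v]["relation"], v)
                         for u, v in zip(path, path[1:])])
    return networks

print(extract_relations(kg, "Zhang San", "Case 123"))
# [[('Zhang San', 'legal representative', 'Company A'),
#   ('Company A', 'defendant', 'Case 123')]]
```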
  • It can be seen from the above technical solutions that this application can obtain source data when receiving a knowledge extraction instruction and preprocess the source data to obtain text data, realizing a unified format; recognize the entities in the text data through the Bi-LSTM+CRF sequence labeling model to obtain an initial entity list, realizing accurate conversion of unstructured data; further expand the initial entity list based on the pre-configured knowledge graph to obtain a candidate entity list, achieving comprehensive coverage of similar representations; and use a semantic matching model trained based on the Attention-DSSM algorithm to disambiguate the candidate entity list to obtain the target entity. Because the Attention mechanism strengthens the association between each word and the other words and increases the weight of keywords, and the new Interaction layer also enhances the association between the texts to be matched, the obtained target entity is more accurate. The target entity is further linked to a node of the knowledge graph, and automatic knowledge extraction is performed based on the information on the node, which improves the efficiency and accuracy of knowledge extraction.
  • The knowledge extraction device 11 includes an acquisition unit 110, a preprocessing unit 111, a recognition unit 112, an expansion unit 113, a disambiguation unit 114, a linking unit 115, an extraction unit 116, a configuration unit 117, and an adding unit 118.
  • The module/unit referred to in this application is a series of computer program segments that can be executed by the processor 13 and can complete fixed functions, and that are stored in the memory 12. In this embodiment, the functions of each module/unit will be described in detail in subsequent embodiments.
  • When the knowledge extraction instruction is received, the obtaining unit 110 obtains the source data.
  • the knowledge extraction instruction may be triggered by a designated user, and the designated user includes, but is not limited to, relevant staff such as project managers.
  • the source data can be obtained from a configuration database.
  • the source data can be obtained from a database accessible by the court, such as an internal database of the court, an online open source database, etc.
  • the source data can be a picture type or a text type, which is not limited in this application.
  • the preprocessing unit 111 preprocesses the source data to obtain text data.
  • In order to enable the machine to recognize the source data, the preprocessing unit 111 first needs to preprocess the source data.
  • the preprocessing unit 111 preprocessing the source data to obtain text data includes:
  • The preprocessing unit 111 converts the source data into initial text, filters and cleans the initial text to obtain filtered text, and encodes the filtered text based on the UTF-8 (8-bit Unicode Transformation Format) encoding algorithm to obtain the text data.
  • When the source data is of text type, the preprocessing unit 111 filters and cleans the source data to obtain the filtered text, and encodes the filtered text based on the UTF-8 encoding algorithm to obtain the text data.
  • When the source data is of picture type, the preprocessing unit 111 may use an Optical Character Recognition (OCR) algorithm to convert the source data into the initial text.
  • the text data may be in TXT text format or other text formats, which is not limited by this application.
  • Through the above implementation, the source data can be filtered and cleaned to eliminate interference information, and further converted into a unified text format, which not only unifies the data format but also ensures that the preprocessed text data can be recognized and processed by the machine.
  • the recognition unit 112 recognizes entities in the text data through a sequence labeling model based on Bi-LSTM+CRF to obtain an initial entity list.
  • The text data obtained is unstructured text data. Therefore, it is necessary to identify the key entity information in the text data, which is equivalent to performing sequence labeling on the text data. Accordingly, it is necessary to first construct a sequence labeling model associated with the knowledge extraction instruction.
  • The configuration unit 117 configures the sequence labeling mode according to predefined requirement data.
  • the adding unit 118 adds the sequence labeling mode to the Bi-LSTM+CRF model to obtain the sequence labeling model.
  • sequence labeling mode can be configured according to specific task requirements.
  • Bi-LSTM refers to Bidirectional Long Short-Term Memory, and CRF refers to Conditional Random Field.
  • In at least one embodiment, the recognition unit 112 recognizing entities in the text data through the sequence labeling model based on Bi-LSTM+CRF to obtain the initial entity list includes:
  • The recognition unit 112 inputs the text data into the sequence labeling model based on Bi-LSTM+CRF and obtains, at the Softmax layer, the output probability and the transition probability of each label corresponding to each sequence position. For each sequence position, the recognition unit 112 calculates the sum of the output probability and the transition probability of each label as the score of that label, determines the label with the highest score as the output label of that sequence position, and combines the output labels of all sequence positions to obtain the initial entity list.
  • In addition to the output probability, the transition probability is also considered, that is, the label sequence must conform to the sequence labeling mode (for example, B cannot be followed by B).
  • For example, if the sequence with the highest output probabilities alone is BBIBIOOO, then because, according to the sequence labeling mode, the probability of B->B in the transition probability matrix is very small, or even negative, the above sequence does not get the highest score, that is, it does not yield the initial entity list.
  • The initial entity list obtained can include the following sequences: the sequence (B, E) represents a person name; the sequence (B, I, E) represents an organization name; and the sequence (O) represents an independent character.
  • the expansion unit 113 expands the initial entity list based on the pre-configured knowledge graph to obtain a candidate entity list.
  • the knowledge graph can be pre-configured according to each technical field, such as a legal knowledge graph in the legal field.
  • Each entity in the initial entity list may be a partial or alternative representation of an entity. Therefore, it is necessary to perform surface-name expansion on each entity to obtain the candidate entity list.
  • In at least one embodiment, the expansion unit 113 expanding the initial entity list based on the pre-configured knowledge graph to obtain the candidate entity list includes:
  • The expansion unit 113 calculates the cosine similarity between each entity in the initial entity list and the entity on each node in the knowledge graph, obtains from each node at least one entity whose cosine similarity is greater than or equal to a preset similarity as a candidate entity, and constructs the candidate entity list from the initial entity list and the candidate entities.
  • The preset similarity can be a customized configuration, such as 99.7%.
  • Cosine similarity refers to a way of measuring the similarity between two texts by the cosine of the angle between their two vectors in the vector space. Compared with distance measures, cosine similarity pays more attention to the difference in direction between the two vectors. In general, after an embedding is used to obtain the vector representations of two texts, the cosine similarity can be used to calculate the similarity between them.
  • Cosine similarity can be used to calculate the similarity between each entity and the entity on each knowledge graph node to determine whether a co-referential relationship exists, thereby expanding the initial entity list and obtaining a candidate entity list with more comprehensive coverage.
  • the disambiguation unit 114 uses a semantic matching model trained based on an Attention-DSSM (Attention-Deep Structured Semantic Model) algorithm to perform disambiguation processing on the candidate entity list to obtain a target entity.
  • The candidate entity list obtained may contain multiple candidates. Therefore, it is necessary to further disambiguate the candidate entity list, so that, from the multiple similar representations, the unique target entity that most closely matches the entity on the knowledge graph node can be determined more accurately.
  • In at least one embodiment, the disambiguation unit 114 using the semantic matching model trained based on the Attention-DSSM algorithm to disambiguate the candidate entity list to obtain the target entity includes:
  • The disambiguation unit 114 encodes each entity in the candidate entity list based on the one-hot encoding algorithm to obtain the word ID of each entity, inputs the word ID of each entity into a pre-configured dictionary, and outputs the word vector of each entity. The disambiguation unit 114 then processes the word vector of each entity based on the Attention mechanism to obtain the semantic representation of each entity, interacts the semantic representations of the entities in the Interaction layer, and outputs the post-interaction semantic vector of each entity. At the matching layer, the disambiguation unit 114 matches the post-interaction semantic vector of each entity against the entities on the knowledge graph nodes and outputs the entity with the highest matching degree as the target entity.
  • The traditional DSSM (Deep Structured Semantic Model) expresses the context information of the extracted entity and of the candidate entities as low-dimensional semantic vectors, and calculates the distance between two semantic vectors by the cosine distance.
  • However, because DSSM adopts the bag-of-words model, word-order information and context information are lost.
  • Moreover, the DSSM model is a weakly supervised, end-to-end model: its prediction results are hard to control, long-range information cannot be captured, and there are problems such as vanishing gradients.
  • the semantic matching model may include: an input layer, a semantic representation layer, an interaction layer, and a matching layer from bottom to top.
  • The word vector of each entity is processed based on the Attention mechanism, which enhances the semantic representation ability, strengthens the association between each word in the text and the other words, and increases the weight of the keywords in the text; therefore, the accuracy is high.
  • The new Interaction layer enables the two texts to be matched to interact, enhances the association between them through mutual representation, and improves the generalization ability of the model.
  • the linking unit 115 links the target entity to the node of the knowledge graph.
  • the linking unit 115 obtains the node corresponding to the target entity on the knowledge graph, and further links the target entity to the node of the knowledge graph.
  • the extraction unit 116 performs knowledge extraction based on the information on the node.
  • the extraction unit 116 performing knowledge extraction based on the information on the node includes:
  • The extraction unit 116 obtains, from the information on the nodes, at least one path between nodes and the associated information on each path, and extracts at least one relationship network based on each path and its associated information.
  • In this way, the corresponding relationship network can be extracted, such as the relationship network shown in FIG. 2.
  • The knowledge graph is composed of a large amount of knowledge and the relationships between pieces of knowledge.
  • the nodes in the network represent entities that exist in the real world, and the edges between nodes represent the relationship between two entities.
  • The combination of nodes and edges abstracts real-world knowledge into a knowledge network for machine processing and application.
  • Knowledge extraction is performed based on the information on the nodes. After the target entity is linked to the nodes of the knowledge graph, the hidden information of the linked nodes can be obtained, which provides a path for the extraction of relationship and event information.
  • It can be seen from the above technical solutions that this application can obtain source data when receiving a knowledge extraction instruction and preprocess the source data to obtain text data, realizing a unified format; recognize the entities in the text data through the Bi-LSTM+CRF sequence labeling model to obtain an initial entity list, realizing accurate conversion of unstructured data; further expand the initial entity list based on the pre-configured knowledge graph to obtain a candidate entity list, achieving comprehensive coverage of similar representations; and use a semantic matching model trained based on the Attention-DSSM algorithm to disambiguate the candidate entity list to obtain the target entity. Because the Attention mechanism strengthens the association between each word and the other words and increases the weight of keywords, and the new Interaction layer also enhances the association between the texts to be matched, the obtained target entity is more accurate. The target entity is further linked to a node of the knowledge graph, and automatic knowledge extraction is performed based on the information on the node, which improves the efficiency and accuracy of knowledge extraction.
  • Referring to FIG. 4, which is a schematic structural diagram of an electronic device implementing a preferred embodiment of the knowledge extraction method of the present application.
  • the electronic device 1 may include a memory 12, a processor 13, and a bus, and may also include a computer program stored in the memory 12 and running on the processor 13, such as a knowledge extraction program.
  • the electronic device 1 may have a bus structure or a star structure.
  • The device 1 may also include more or fewer hardware or software components than shown in the figure, or a different component arrangement.
  • the electronic device 1 may also include an input/output device, a network access device, and the like.
  • The electronic device 1 is only an example. If other existing or future electronic products can be adapted to this application, they should also be included in the scope of protection of this application and are incorporated herein by reference.
  • The memory 12 includes at least one type of readable storage medium, which includes flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc.
  • the memory 12 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1.
  • The memory 12 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 12 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 12 can be used not only to store application software and various data installed in the electronic device 1, such as the code of a knowledge extraction program, etc., but also to temporarily store data that has been output or will be output.
  • In some embodiments, the processor 13 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips.
  • The processor 13 is the control unit of the electronic device 1. It uses various interfaces and lines to connect the components of the entire electronic device 1, runs or executes programs or modules stored in the memory 12 (such as the knowledge extraction program), and calls data stored in the memory 12 to execute the various functions of the electronic device 1 and process data.
  • the processor 13 executes the operating system of the electronic device 1 and various installed applications.
  • the processor 13 executes the application program to implement the steps in the above-mentioned knowledge extraction method embodiments, such as steps S10, S11, S12, S13, S14, S15, and S16 shown in FIG. 1.
  • the processor 13 implements the functions of the modules/units in the foregoing device embodiments when executing the computer program, for example:
  • Disambiguation processing is performed on the candidate entity list by using a semantic matching model trained based on the Attention-DSSM algorithm to obtain the target entity;
  • The computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program in the electronic device 1.
  • For example, the computer program may be divided into an acquisition unit 110, a preprocessing unit 111, a recognition unit 112, an expansion unit 113, a disambiguation unit 114, a linking unit 115, an extraction unit 116, a configuration unit 117, and an adding unit 118.
  • the above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer readable storage medium.
  • The above-mentioned software function module is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a computer device, or a network device, etc.) or a processor execute part of the knowledge extraction method described in the various embodiments of this application.
  • If the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • This application may implement all or part of the processes in the above-mentioned method embodiments by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer program includes computer program code
  • the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one arrow is shown in FIG. 4, but this does not mean that there is only one bus or one type of bus.
  • the bus is configured to implement connection and communication between the memory 12 and at least one processor 13 and the like.
  • the electronic device 1 may also include a power source (such as a battery) for supplying power to various components.
  • The power source may be logically connected to the at least one processor 13 through a power management device, so that functions such as charge management, discharge management, and power consumption management are realized by the power management device.
  • the power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • the electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface.
  • The network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may be a display (Display) and an input unit (such as a keyboard (Keyboard)).
  • the user interface may also be a standard wired interface or a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • FIG. 4 only shows the electronic device 1 with components 12-13. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, and the device may include fewer or more components, combine some components, or have a different component arrangement.
  • the memory 12 in the electronic device 1 stores multiple instructions to implement a knowledge extraction method, and the processor 13 can execute the multiple instructions to achieve:
  • Disambiguation processing is performed on the candidate entity list by using a semantic matching model trained based on the Attention-DSSM algorithm to obtain the target entity;
  • The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are a knowledge extraction method and apparatus, an electronic device, and a storage medium, relating to artificial intelligence. Source data can be preprocessed to obtain text data; entities in the text data are recognized by means of a Bi-LSTM+CRF sequence labeling model to obtain an initial entity list, which achieves accurate conversion of unstructured data on the basis of the Bi-LSTM+CRF sequence labeling model; the initial entity list is then expanded on the basis of a knowledge graph to obtain a candidate entity list, which achieves comprehensive coverage of similar representations; and a semantic matching model trained on the basis of an Attention-DSSM algorithm is used to disambiguate the candidate entity list and obtain target entities. Since the Attention mechanism strengthens the associations between each word and the other words and increases the weights of keywords, the target entities obtained after data analysis are more accurate. The target entities are then linked to nodes of the knowledge graph, and automatic knowledge extraction is performed on the basis of the information on the nodes, which improves the efficiency and accuracy of knowledge extraction.
PCT/CN2020/104964 2020-04-21 2020-07-27 Knowledge extraction method, apparatus, electronic device and storage medium WO2021212682A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010318382.8 2020-04-21
CN202010318382.8A CN111639498A (zh) 2020-04-21 2020-04-21 Knowledge extraction method, apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2021212682A1 (fr) 2021-10-28

Family

ID=72328869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104964 WO2021212682A1 (fr) 2020-04-21 2020-07-27 Knowledge extraction method, apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN111639498A (fr)
WO (1) WO2021212682A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019171490A1 (fr) * 2018-03-07 2019-09-12 日本電気株式会社 Knowledge expansion system, method, and program
CN108733792A (zh) * 2018-05-14 2018-11-02 北京大学深圳研究生院 Entity relation extraction method
CN110609902A (zh) * 2018-05-28 2019-12-24 华为技术有限公司 Text processing method and apparatus based on a fused knowledge graph
CN110362660A (zh) * 2019-07-23 2019-10-22 重庆邮电大学 Knowledge-graph-based automatic quality detection method for electronic products

Also Published As

Publication number Publication date
CN111639498A (zh) 2020-09-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932630

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20932630

Country of ref document: EP

Kind code of ref document: A1