CN112749561A - Entity identification method and device

Info

Publication number: CN112749561A
Application number: CN202010303388.8A
Authority: CN
Inventors: 黄婷
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2021-05-04
Anticipated expiration: 2040-04-17
Also published as: CN112749561B

Abstract

The embodiment of the invention provides an entity identification method and device. The method comprises the following steps: acquiring information to be processed and text information corresponding to the information to be processed to obtain text information to be recognized; performing entity recognition on the text information to be recognized with at least one basic entity recognition model to obtain at least one corresponding recognition result, where the at least one basic entity recognition model refers to at least one means for performing entity recognition; extracting text features of the text information to be recognized with a fused entity recognition model to obtain target text features, and performing feature construction on the at least one recognition result to obtain basic model features; and performing entity recognition by combining the basic model features and the target text features to obtain a target recognition result. The fused entity recognition model is used for performing entity recognition on the text information to be recognized by utilizing the at least one recognition result. The embodiment of the invention can improve the accuracy of entity identification.

Description

Entity identification method and device

Technical Field

The invention relates to a natural language processing technology in the field of artificial intelligence, in particular to an entity identification method and equipment.

Background

Named Entity Recognition (NER) refers to the process of recognizing entities in text and the entity types to which those entities belong, and is a basic task in natural language processing. Named entity recognition can improve the execution efficiency of information processing and assist it; thus, named entity recognition plays an important role in information processing.

Generally, named entity recognition is performed on a text with a plurality of named entity recognition methods to obtain a plurality of corresponding recognition results, and then either one recognition result is selected from them or they are combined to obtain the final recognition result. However, because the final recognition result is obtained by selecting one of the recognition results or by combining them, it is strongly correlated with those recognition results; when one of them is erroneous, the accuracy of the final recognition result is low, and therefore the accuracy of entity recognition is low.

Disclosure of Invention

The embodiment of the invention provides an entity identification method and equipment, which can improve the accuracy of entity identification.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides an entity identification method, which comprises the following steps:

acquiring information to be processed and text information corresponding to the information to be processed to obtain text information to be identified;

adopting at least one basic entity recognition model to perform entity recognition on the text information to be recognized to obtain at least one corresponding recognition result; the at least one basic entity recognition model refers to at least one mode for entity recognition;

extracting text features of the text information to be recognized by adopting a fusion entity recognition model to obtain target text features, and performing feature construction on at least one recognition result to obtain basic model features; carrying out entity recognition by combining the basic model features and the target text features to obtain a target recognition result;

the fused entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, the target text features refer to features of characters and character strings in the text information to be recognized, and the target recognition result is an entity recognition result of the information to be processed.

An embodiment of the present invention provides an entity identification apparatus, including:

the information acquisition module is used for acquiring information to be processed and text information corresponding to the information to be processed to obtain text information to be identified;

the basic identification module is used for adopting at least one basic entity identification model to carry out entity identification on the text information to be identified so as to obtain at least one identification result corresponding to each text information; the at least one basic entity recognition model refers to at least one mode for entity recognition;

the fusion recognition module is used for extracting the text features of the text information to be recognized by adopting a fusion entity recognition model to obtain target text features, and performing feature construction on at least one recognition result to obtain basic model features; carrying out entity recognition by combining the basic model features and the target text features to obtain a target recognition result; the fused entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, the target text features refer to features of characters and character strings in the text information to be recognized, and the target recognition result is an entity recognition result of the information to be processed.

An embodiment of the present invention provides an entity identification device, including:

a memory for storing executable instructions;

and the processor is used for realizing the entity identification method provided by the embodiment of the invention when executing the executable instructions stored in the memory.

The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the method for identifying the entity provided by the embodiment of the invention.

The embodiment of the invention has the following beneficial effects: the obtained target recognition result is obtained by combining at least one recognition result and the text characteristics corresponding to the text information to be recognized for feature construction; therefore, on one hand, even if at least one recognition result has a wrong recognition result, the text features corresponding to the text information to be recognized can correct the wrong recognition result; on the other hand, the text features corresponding to the text information to be recognized and at least one recognition result are combined to perform feature construction to obtain the features to be recognized for entity recognition, so that the features to be recognized are rich, and the accuracy of the obtained target recognition result is high when the features to be recognized are subjected to entity recognition; in summary, the entity identification method provided by the embodiment of the invention improves the accuracy of entity identification.

Drawings

FIG. 1 is a schematic diagram of an exemplary process for performing entity identification;

FIG. 2 is an alternative architectural diagram of an entity identification system provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a server in FIG. 2 according to an embodiment of the present invention;

FIG. 4 is an alternative flow chart of the entity identification method according to the embodiment of the present invention;

FIG. 5 is a relational diagram of an entity recognition model provided by an embodiment of the invention;

FIG. 6 is a schematic diagram of an alternative process for extracting target text features according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an alternative process for obtaining base model features according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating an exemplary structure of input features of a fused entity recognition model according to an embodiment of the present invention;

FIG. 9 is a diagram illustrating an exemplary format of a feature to be identified according to an embodiment of the present invention;

FIG. 10 is a schematic flow chart of another alternative entity identification method provided by the embodiment of the invention;

FIG. 11 is a schematic flow chart of another alternative entity identification method according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of an alternative architecture of an entity recognition system provided by an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of a blockchain in a blockchain network according to an embodiment of the present invention;

FIG. 14 is a functional architecture diagram of a blockchain network according to an embodiment of the present invention;

FIG. 15 is a diagram illustrating an exemplary application of entity identification provided by an embodiment of the present invention;

FIG. 16 is a flowchart illustrating an exemplary entity identification according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the embodiments of the present invention is for the purpose of describing the embodiments of the present invention only and is not intended to be limiting of the present invention.

Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.

1) The entity type refers to a type to which an entity belongs, such as a place name, a person name, an organization name, a movie, a novel, an animation, a game, an event, a song, and an App (Application) name. Entities, also called named entities, refer to instances of concepts; for example, "person name" is a concept (or entity type), and "wangxing" is a "person name" entity; "time" is an entity type, and "mid-autumn festival" is a "time" entity. Because entities are also called named entities, entity identification is also called named entity identification.

2) Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence; it studies theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; research in this field involves natural language, that is, the language people use every day, and is therefore closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering robots, knowledge graphs, and the like.

3) Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results.

4) Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. It studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach for giving computers intelligence, and is applied in all fields of artificial intelligence. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.

5) An artificial Neural Network is a mathematical model that mimics the structure and function of a biological Neural Network, and exemplary structures of the artificial Neural Network herein include Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and the like.

6) The loss function, also called cost function, is a function that maps the values of a random event or its related random variables to non-negative real numbers to represent the "risk" or "loss" of the random event.

7) A Block chain (Blockchain) is a storage structure for encrypted, chained transactions formed from blocks (blocks).

8) A Blockchain Network (Blockchain Network) is a set of nodes that incorporate new blocks into a blockchain by consensus.

9) Ledger (Ledger) is a general term for the blockchain (also called ledger data) and the state database synchronized with the blockchain. The blockchain records transactions in the form of files in a file system; the state database records the transactions in the blockchain in the form of different types of key (Key) value (Value) pairs, and is used for supporting quick query of transaction data in the blockchain.

10) Intelligent Contracts (Smart Contracts), also known as chain codes (chaincodes) or application codes, are programs deployed in nodes of a blockchain network, and the nodes execute the intelligent Contracts called in received transactions to perform operations of updating or querying key-value data of a state database.

11) Consensus (Consensus), a process in a blockchain network, is used to agree on a transaction in a block between the nodes involved, the agreed block to be appended to the end of the blockchain and used to update the state database.

It should be noted that artificial intelligence is a comprehensive technique in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

In addition, the artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

With the research and progress of artificial intelligence technology, the technology has been studied and applied in many fields, for example, smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart medical care, and smart customer service. As the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value; for example, artificial intelligence can also be applied in the field of text processing. The application of artificial intelligence in the text processing field related to the embodiments of the present invention is described below.

Generally, entity recognition is performed on a text with a plurality of entity recognition methods to obtain a plurality of corresponding recognition results; then either one recognition result is selected from them, for example with a "learning to rank" model, or the recognition results are combined through predefined rules to obtain the final recognition result. However, because the final recognition result is obtained by selecting one of the recognition results or by combining them, it is strongly correlated with those recognition results; when one of them is erroneous, the accuracy of the final recognition result is low, and therefore the accuracy of entity recognition is low. In addition, the predefined rules are set for many different conditions, so the implementation contains a large amount of hard coding and is not easy to maintain.
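The weakness of rule-based combination can be illustrated with a minimal sketch. The rule used here (prefer longer spans, drop overlapping candidates), the `(start, end, type)` tuple layout and the sample results are all assumptions made for illustration; they are not taken from the embodiment.

```python
from typing import List, Tuple

# An entity is (start, end_exclusive, type); this layout is illustrative only.
Entity = Tuple[int, int, str]


def merge_by_rule(results: List[List[Entity]]) -> List[Entity]:
    """Combine several recognition results with a hard-coded rule.

    Hypothetical rule: pool all candidates, prefer longer spans, and drop any
    candidate that overlaps a span that was already kept.
    """
    candidates = sorted(
        {e for result in results for e in result},
        key=lambda e: (-(e[1] - e[0]), e[0], e[2]),
    )
    kept: List[Entity] = []
    for start, end, etype in candidates:
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, etype))
    return sorted(kept)


text = "Jordan was born in New York"
dictionary_result = [(0, 6, "LOC"), (19, 27, "LOC")]  # "Jordan" wrongly typed as a place
network_result = [(0, 6, "PER"), (19, 27, "LOC")]
merged = merge_by_rule([dictionary_result, network_result])
print(merged)
```

In this sketch the wrong "LOC" label for "Jordan" survives the merge, because the rule only ever sees the recognition results and never the text itself.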

It should be noted that the plurality of entity recognition methods employed for entity recognition of a text include dictionary-based entity recognition methods and network-model-based entity recognition methods. A dictionary-based entity recognition method generally collects entities of various entity types to form a prefix tree, and then matches the text against the prefix tree to obtain the entity recognition result of the text; however, when the text is matched against the prefix tree, the context of the words in the text is not considered, which causes recall problems.
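A dictionary-based recognizer of this kind can be sketched as follows. The class and function names, the greedy longest-match strategy and the two dictionary entries are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entity_type: Optional[str] = None  # set when a dictionary entry ends here


def build_trie(dictionary: Dict[str, str]) -> TrieNode:
    """Build a prefix tree from a mapping of entity string to entity type."""
    root = TrieNode()
    for entity, entity_type in dictionary.items():
        node = root
        for ch in entity:
            node = node.children.setdefault(ch, TrieNode())
        node.entity_type = entity_type
    return root


def match(text: str, root: TrieNode) -> List[Tuple[int, int, str]]:
    """Greedy longest match from left to right; returns (start, end, type)."""
    found = []
    i = 0
    while i < len(text):
        node, best = root, None
        for j in range(i, len(text)):
            node = node.children.get(text[j])
            if node is None:
                break
            if node.entity_type is not None:
                best = (i, j + 1, node.entity_type)
        if best is None:
            i += 1
        else:
            found.append(best)
            i = best[1]
    return found


trie = build_trie({"Jordan": "LOC", "New York": "LOC"})
hits = match("Jordan was born in New York", trie)
print(hits)
```

Because the lookup only compares characters, "Jordan" is labelled as a place here even though the surrounding words suggest a person, which is the kind of context-blind behavior described above.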

For the entity recognition method based on the network model, the entity recognition result of the text is generally determined through the network model. Generally, a network model for entity recognition includes an input layer, a context coding layer, and a tag decoding layer. The input layer produces a vector representation of the input text, and is usually implemented with "pre-trained word embedding", "character-level embedding", "POS tag", or "gazetteer" features, etc.; the context coding layer performs semantic coding on the vector representation output by the input layer, and is generally implemented with "CNN", "RNN", "language model" or "Transformer", etc.; the tag decoding layer decodes the semantic coding result to obtain the entity recognition result, and is usually implemented with "Softmax", "CRF", "RNN", or "pointer network", etc.
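The three-layer structure can be sketched in plain Python. This is only a structural illustration under stated assumptions: the embedding table and weights are made-up numbers, the neighbour-averaging encoder merely stands in for a CNN/RNN/Transformer, and the decoder is a per-token softmax. With untrained weights the predicted tags carry no meaning.

```python
import math
from typing import Dict, List

Vector = List[float]
TAGS = ["O", "PER", "LOC"]  # toy tag set, assumed for illustration


def input_layer(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Vector representation: look each token up in a toy embedding table."""
    return [table.get(tok.lower(), [0.0] * dim) for tok in tokens]


def context_layer(vectors: List[Vector], window: int = 1) -> List[Vector]:
    """Context encoding: average each vector with its neighbours.

    This stands in for a CNN/RNN/Transformer encoder and is NOT one of them.
    """
    encoded = []
    for i in range(len(vectors)):
        lo, hi = max(0, i - window), min(len(vectors), i + window + 1)
        span = vectors[lo:hi]
        encoded.append([sum(col) / len(span) for col in zip(*span)])
    return encoded


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decoding_layer(encoded: List[Vector], weights: List[Vector]) -> List[str]:
    """Tag decoding: per-token linear scores followed by softmax and argmax."""
    tags = []
    for vec in encoded:
        scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        probs = softmax(scores)
        tags.append(TAGS[probs.index(max(probs))])
    return tags


tokens = ["Jordan", "was", "born"]
table = {"jordan": [1.0, 0.0], "was": [0.0, 0.0], "born": [0.0, 0.0]}  # made-up values
weights = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one made-up row per tag in TAGS
predicted = decoding_layer(context_layer(input_layer(tokens, table, dim=2)), weights)
print(predicted)
```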

Referring to FIG. 1, FIG. 1 is a schematic diagram of an exemplary process for performing entity identification; as shown in FIG. 1, an input text 1-1 "Michael Jeffrey Jordan was born in Brooklyn, New York." is input into a network model 1-2 for entity recognition, and an entity recognition result 1-3 "Michael { B-PER } Jeffrey { I-PER } Jordan { E-PER } was { O } born { O } in { O } Brooklyn { S-LOC }, { O } New { B-LOC } York { E-LOC }" is obtained; the network model 1-2 comprises an input layer 1-21, a context coding layer 1-22 and a tag decoding layer 1-23, so the entity recognition of the input text 1-1 is performed by the input layer 1-21, the context coding layer 1-22 and the tag decoding layer 1-23 in turn.
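The tags in the recognition result follow a begin/inside/end/single/outside pattern (B, I, E, S, O). A minimal sketch of turning such a tag sequence back into entities is given below; the function name is hypothetical, and for brevity the sketch does not check that the type on an E tag agrees with the type on the preceding B tag.

```python
from typing import List, Tuple


def decode_bioes(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from B/I/E/S/O tags."""
    entities = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = []
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "S":  # single-token entity
            entities.append((token, etype))
            current = []
        elif prefix == "B":  # start of a multi-token entity
            current = [token]
        elif prefix == "I" and current:
            current.append(token)
        elif prefix == "E" and current:  # end of a multi-token entity
            current.append(token)
            entities.append((" ".join(current), etype))
            current = []
    return entities


tokens = ["Michael", "Jeffrey", "Jordan", "was", "born", "in", "Brooklyn", ",", "New", "York"]
tags = ["B-PER", "I-PER", "E-PER", "O", "O", "O", "S-LOC", "O", "B-LOC", "E-LOC"]
entities = decode_bioes(tokens, tags)
print(entities)
```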

In addition, errors in a recognition result have two sources: one is the recognition method itself, for example, the accuracy of the network model used for entity recognition is low; the other is a change in the definition of the entity types. For a change in the definition of the entity types, two situations exist. One situation is a change of application scenario; for example, in some application scenarios the person name type does not include drama characters, game characters and the like, while in other application scenarios the person name type does include them, so that applying the same entity recognition method in different application scenarios yields wrong entity recognition results. The other situation is a change between application stages within one application scenario; for example, in the first stage a first set of entity types is defined, and the two corresponding entity recognition methods comprise a dictionary-based entity recognition method and an entity recognition method based on a multilayer convolutional neural network model; in the second stage, the entity types are redefined, and the new entity types include a person name type (including the person name type, the game commentator, the uploader ugc and the role name in the first stage, and a game organization in the first stage, such as "NBA", euro crown and "WWE"), an IP (Intellectual Property) type, a place name type and an organization type; at this time, when entity recognition is performed with the two entity recognition methods of the first stage, an erroneous recognition result is obtained.

Based on this, the embodiments of the present invention provide an entity identification method and device, which can correct an erroneous identification result in the entity identification process and can perform entity identification with richer features to be identified, thereby improving the accuracy of the obtained entity identification result. An exemplary application of the entity identification device provided in the embodiment of the present invention is described below. The entity identification device may be implemented as various types of user terminals such as a smart phone, a tablet computer, and a notebook computer, and may also be implemented as a server. In the following, an exemplary application is explained in which the entity identification device is implemented as a server.

It should be noted that the embodiments of the present invention may also be implemented by combining a blockchain technology, where a blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, and an encryption algorithm. The blockchain is essentially a decentralized database, which is a string of data blocks associated by using cryptography, each data block contains information of a batch of network transactions, and the information is used for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer. For the entity identification method combined with the blockchain technique provided by the embodiment of the present invention, specific reference is made to the following description.
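The idea of data blocks linked by cryptography can be sketched with a simple hash chain. The block layout, the use of SHA-256 and the sample transactions are assumptions made for illustration; a real blockchain network additionally involves consensus, signatures and a state database.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(block: Dict) -> str:
    """Hash the block content together with the hash of the previous block."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "transactions", "prev_hash")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> None:
    """Create the next block from a batch of transactions and link it to the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: List[Dict]) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain: List[Dict] = []
append_block(chain, ["recognition result for text A"])
append_block(chain, ["recognition result for text B"])
ok_before = is_valid(chain)
chain[0]["transactions"] = ["tampered"]  # modify stored data after the fact
ok_after = is_valid(chain)
print(ok_before, ok_after)
```

Changing a stored transaction invalidates the hash of its block, which is how the chained structure makes tampering detectable.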

Referring to fig. 2, fig. 2 is an alternative architecture diagram of an entity identification system according to an embodiment of the present invention; as shown in fig. 2, in order to support an entity identification application, the terminal 200 is connected to the server 400 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.

The terminal 200 is configured to send information to be processed to the server 400 through the network 300.

The server 400 is configured to obtain information to be processed and text information corresponding to the information to be processed from the terminal 200 through the network 300, so as to obtain text information to be identified; perform entity recognition on the text information to be recognized with at least one basic entity recognition model to obtain at least one corresponding recognition result; extract text features of the text information to be recognized with a fused entity recognition model to obtain target text features, and perform feature construction on the at least one recognition result to obtain basic model features; and perform entity recognition by combining the basic model features and the target text features to obtain a target recognition result.

Referring to fig. 3, fig. 3 is a schematic structural diagram of a server in fig. 2 according to an embodiment of the present invention; the server 400 shown in fig. 3 includes: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430. The various components in server 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in FIG. 3.

The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments of the invention is intended to comprise any suitable type of memory. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.

In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.

An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;

a network communication module 452 for communicating with other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB), etc.;

a display module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;

an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.

In some embodiments, the entity identification apparatus provided by the embodiments of the present invention may be implemented in software, and fig. 3 illustrates the entity identification apparatus 455 stored in the memory 450, which may be software in the form of programs and plug-ins, and includes the following software modules: an information acquisition module 4551, a basic identification module 4552, a fusion identification module 4553, an application module 4554, and a blockchain module 4555, the functions of which will be described below.

In other embodiments, the entity identifying apparatus provided in the embodiments of the present invention may be implemented in hardware, and as an example, the entity identifying apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the entity identifying method provided in the embodiments of the present invention, for example, the processor in the form of the hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

In the following, the entity identification method provided by the embodiment of the present invention will be described in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present invention.

Referring to fig. 4, fig. 4 is an alternative flowchart of the entity identification method according to the embodiment of the present invention, which will be described with reference to the steps shown in fig. 4.

S101, obtaining information to be processed and text information corresponding to the information to be processed to obtain text information to be identified.

In the embodiment of the invention, when the entity identification device performs entity identification, it first obtains the object of the entity identification, that is, the information to be processed. The information to be processed can be information in any format, such as text, video, audio or a document; because entity identification can only be performed on information in text format, the entity identification device acquires the text information corresponding to the information to be processed, thereby obtaining the text information to be identified, on which entity identification is then performed.

It should be noted that, after the entity identification device obtains the information to be processed, for the information to be processed in the text format, the information to be processed is directly used as the text information to be identified; for the information to be processed in other non-text formats such as audio or video, the information to be processed in other formats such as audio or video is converted into text information, and text information to be identified is obtained.
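Step S101 can be sketched as a small dispatch on the format of the information to be processed. The function name, the format labels and the stand-in converter are hypothetical; an actual system would plug in speech recognition or similar components, which this passage does not specify.

```python
from typing import Callable, Dict, Union

Payload = Union[str, bytes]
Converter = Callable[[Payload], str]


def to_text(payload: Payload, fmt: str, converters: Dict[str, Converter]) -> str:
    """Return the text information to be identified for one piece of information."""
    if fmt == "text" and isinstance(payload, str):
        return payload  # text-format information is used directly
    if fmt not in converters:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return converters[fmt](payload)  # non-text formats are converted to text first


# Stand-in converter: a real system would call a speech recognition component here.
fake_converters: Dict[str, Converter] = {"audio": lambda payload: "transcript of the audio"}
text_a = to_text("Jordan was born in New York", "text", fake_converters)
text_b = to_text(b"\x00\x01", "audio", fake_converters)
print(text_a, "|", text_b)
```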

S102, performing entity recognition on the text information to be recognized by using at least one basic entity recognition model to obtain at least one corresponding recognition result; the at least one basic entity recognition model refers to at least one manner of performing entity recognition.

It should be noted that different entity recognition methods, that is, the at least one basic entity recognition model, are preset in the entity identification device, so that entity recognition can be performed in at least one manner; in other words, the at least one basic entity recognition model refers to at least one manner of performing entity recognition, for example, a dictionary-based entity recognition method or entity recognition methods based on various network models.

In the embodiment of the invention, after obtaining the text information to be recognized, the entity identification device first performs entity recognition on the text information to be recognized with each of the at least one basic entity recognition model, obtaining the recognition result corresponding to that model; thus, once every basic entity recognition model has processed the text information to be recognized, at least one recognition result corresponding to the at least one basic entity recognition model is obtained. It is easy to see that the at least one recognition result corresponds one-to-one to the at least one basic entity recognition model.
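
As a minimal sketch of S102, the following Python code runs every basic model over the same text and keeps one result per model. The span format `(start, end, entity_type)`, the helper names `make_dictionary_model` and `run_base_models`, and the stub standing in for a trained network model are assumptions made for illustration only; the embodiment does not prescribe them.

```python
from typing import Callable, Dict, List, Tuple

# Assumed result shape: (start offset, end offset (exclusive), entity type)
Entity = Tuple[int, int, str]
BaseModel = Callable[[str], List[Entity]]


def make_dictionary_model(entity_dict: Dict[str, str]) -> BaseModel:
    """Toy dictionary-based recognizer: exact substring lookup."""
    def recognize(text: str) -> List[Entity]:
        found = []
        for surface, entity_type in entity_dict.items():
            start = text.find(surface)
            while start != -1:
                found.append((start, start + len(surface), entity_type))
                start = text.find(surface, start + 1)
        return sorted(found)
    return recognize


def run_base_models(text: str, models: Dict[str, BaseModel]) -> Dict[str, List[Entity]]:
    """Run each basic model once; results map one-to-one onto the models."""
    return {name: model(text) for name, model in models.items()}


models = {
    "dictionary": make_dictionary_model({"Jackson": "name"}),
    "stub_network": lambda text: [],  # placeholder for a trained network model
}
results = run_base_models("Jackson dance teaching", models)
```

Keeping the results keyed by model name preserves the one-to-one correspondence between models and recognition results that the later feature construction relies on.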

S103, extracting text features of the text information to be recognized by using a fused entity recognition model to obtain target text features, and performing feature construction on the at least one recognition result to obtain basic model features; performing entity recognition by combining the basic model features and the target text features to obtain a target recognition result; the fused entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, the target text features refer to features related to the characters and character strings in the text information to be recognized, and the target recognition result is the entity recognition result of the information to be processed.

In the embodiment of the present invention, after obtaining the at least one recognition result, the entity identification device neither directly selects one of the recognition results as the final recognition result nor merges the recognition results into a final recognition result. Instead, it uses the at least one recognition result to construct features for a further round of entity recognition, and combines them with the text features extracted from the text information to be recognized, that is, the target text features; entity recognition is then performed again on the combined features, and the recognition result obtained, which is the final recognition result, is referred to as the target recognition result.

It should be noted that the fused entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, and may be, for example, the sequence labeling model "BI-LSTM-CRF"; the target text features refer to features of the characters and character strings in the text information to be recognized, such as vector features of the characters, features corresponding to the relationship between the characters and the character strings, and features corresponding to the character strings; the target recognition result is the entity recognition result of the information to be processed. In addition, the entity types targeted by the individual basic entity recognition models may be the same or different, and the entity types targeted by the at least one basic entity recognition model and those targeted by the fused entity recognition model may also be the same or different. When they differ, the entity types targeted by the fused entity recognition model include the entity types targeted by the at least one basic entity recognition model; for example, the name type targeted by the fused entity recognition model includes person names, role names and film and television drama names, which are separate entity types among the entity types targeted by the at least one basic entity recognition model.

It should be further noted that the training data used to train the fused entity recognition model may be labeled manually or with the aid of a tool, which is not specifically limited in the embodiment of the present invention; the labeled corpus may be, for example, video search strings, news titles, and the like.

It can be understood that the target recognition result is obtained from features constructed by combining the at least one recognition result with the text features of the text information to be recognized. Therefore, on one hand, even if the at least one recognition result contains a wrong recognition result, the text features of the text information to be recognized can correct it; on the other hand, because the features to be recognized are constructed from both the text features and the at least one recognition result, they are rich, so the target recognition result obtained by performing entity recognition on them is highly accurate. In summary, the entity identification method provided by the embodiment of the invention improves the accuracy of entity identification.

Further, in an embodiment of the present invention, the at least one basic entity recognition model comprises at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model. The dictionary recognition model refers to a manner of performing entity recognition based on historical entity types by using an entity dictionary, where the historical entity types are the entity types defined historically; that is, the entity types targeted by the dictionary recognition model are the historically defined entity types, and the entity recognition method adopted is a dictionary-based entity recognition method. The artificial intelligence recognition model refers to a manner of performing entity recognition based on historical entity types by using a network model, where the network model is an artificial neural network model; that is, the entity types targeted by the artificial intelligence recognition model are the historically defined entity types, and the entity recognition method adopted is a network-model-based entity recognition method, for example, a CNN model. The new entity type recognition model refers to a manner of performing entity recognition based on new entity types, where the new entity types are newly defined entity types; that is, the entity types targeted by the new entity type recognition model are the newly defined entity types, and the entity recognition method adopted is a dictionary-based entity recognition method or a network-model-based entity recognition method, for example, a CNN model based on an attention mechanism. Here, the new entity types include the historical entity types.

Exemplarily, referring to fig. 5, fig. 5 is a relationship diagram of an entity recognition model provided by an embodiment of the present invention; as shown in FIG. 5, at least one basic entity recognition model 5-1 includes three sub-models, namely a dictionary recognition model 5-11, an artificial intelligence recognition model 5-12 and a new entity type recognition model 5-13, and the output of the three sub-models is the input of the fused entity recognition model 5-2.

Further, in this embodiment of the present invention, before S102, that is, before the entity identification device performs entity recognition on the text information to be recognized by using the at least one basic entity recognition model to obtain the at least one corresponding recognition result, the entity identification method further includes a training process of the new entity type recognition model.

Here, the entity identification device labels training data (such as video search strings) with both the dictionary recognition model and the artificial intelligence recognition model, and selects the training data for which the two labeling results are the same to form the target training data used for training the new entity type recognition model.
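
The selection of target training data can be sketched as follows. The two toy labelers, which emit one tag per character, and the function name `select_agreeing_samples` are hypothetical stand-ins for the dictionary recognition model and the artificial intelligence recognition model; only the rule of keeping samples on which both labelings agree comes from the description above.

```python
from typing import Callable, List, Sequence, Tuple

Labels = Tuple[str, ...]
Labeler = Callable[[str], Labels]


def select_agreeing_samples(
    samples: Sequence[str], labeler_a: Labeler, labeler_b: Labeler
) -> List[Tuple[str, Labels]]:
    """Keep only the samples that both labelers annotate identically."""
    selected = []
    for sample in samples:
        labels_a = labeler_a(sample)
        labels_b = labeler_b(sample)
        if labels_a == labels_b:
            selected.append((sample, labels_a))
    return selected


# Toy labelers (assumptions): one tag per character.
def toy_dictionary_labeler(text: str) -> Labels:
    return tuple("name" if ch.isupper() else "O" for ch in text)


def toy_network_labeler(text: str) -> Labels:
    return tuple("name" if ch.isupper() or ch == "x" else "O" for ch in text)


samples = ["AB c", "AB x"]
target_training_data = select_agreeing_samples(
    samples, toy_dictionary_labeler, toy_network_labeler
)
```

In this toy run only the first sample is kept, because the two labelers disagree on the character "x" in the second one.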

Further, referring to fig. 6, fig. 6 is an optional flowchart of extracting the target text features according to the embodiment of the present invention; as shown in fig. 6, in S103 of the embodiment of the present invention, the entity identification device extracting the text features of the text information to be recognized to obtain the target text features includes S1031 and S1032, which are described below with reference to the steps shown in fig. 6.

S1031, performing word segmentation processing on the text information to be recognized to obtain a character string sequence.

It should be noted that the entity identification device performs feature extraction on the text information to be recognized in units of characters; therefore, word segmentation processing is first performed on the text information to be recognized, and feature extraction is then performed on the characters in each resulting character string. Here, the word segmentation result is a character string sequence, that is, a sequence formed by the character strings of the text information to be recognized in order; the sequence includes at least one character string, and each character string is a character string in the text information to be recognized. In addition, a character is the minimum unit composing the text, for example, a single character in Chinese or a single word in other languages; a character string may be a word and includes at least one character.

S1032, extracting the character text feature corresponding to the current character, so as to obtain the character string text feature corresponding to the current character string, and further obtain the target text features corresponding to the character string sequence; the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text feature comprises at least one character text feature, the target text features comprise at least one character string text feature, and the character text feature comprises at least one of a semantic feature of the current character, position information of the current character in the current character string, a semantic feature of the current character string and part-of-speech information of the current character string.

It should be noted that the entity identification device traverses each character string in the character string sequence, and the character string being traversed at a given moment is the current character string; it is easy to see that the current character string is any character string in the character string sequence. For the current character string, the entity identification device traverses each character in it, and the character being traversed is the current character; the current character is any character in the current character string.

In the embodiment of the invention, for the current character, the entity identification device extracts text features from at least one of four dimensions (the character, the position of the character in the character string, the part of speech of the character string, and the character string); the extracted features are the text features corresponding to the current character, that is, the character text feature. The entity identification device applies this extraction process to every character in the current character string, so that when the text features of every character in the current character string have been extracted, the extraction for the current character string is complete and a character string text feature comprising at least one character text feature is obtained; it is easy to see that the number of character text features contained in a character string text feature is the number of characters contained in the current character string. Likewise, the entity identification device applies the extraction process of the current character string to every character string in the character string sequence, so that when the text features of every character string have been extracted, the extraction for the character string sequence is complete and the target text features comprising at least one character string text feature are obtained; it is easy to see that the number of character string text features included in the target text features is the number of character strings included in the character string sequence.

It should be noted that the character text feature includes at least one of a semantic feature of the current character, position information of the current character in the current character string, a semantic feature of the current character string, and part-of-speech information of the current character string. When the character text feature includes a feature constructed from the dimension of the position of the character in the character string, the character text feature includes the position information of the current character in the current character string. Illustratively, this position information may be represented by IOBES (Inside, Other, Begin, End, Single) labels, where B represents the beginning, I represents the middle, E represents the end, S represents a single character, and O represents other, which is used for marking irrelevant characters; the feature corresponding to the position information is obtained by mapping the five IOBES labels to 5 identifiers respectively.
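
Illustratively, the position information can be sketched in Python as below: each character string of the segmented sequence is traversed, every character receives a B, I, E or S label according to its position in the string, and the label is mapped to an identifier. The concrete identifier values in `IOBES_TO_ID` and the function names are assumptions; the embodiment only states that the labels are mapped to 5 identifiers.

```python
from typing import List, Sequence

# Assumed identifier assignment; the description only requires 5 distinct ids.
IOBES_TO_ID = {"I": 0, "O": 1, "B": 2, "E": 3, "S": 4}


def position_tags(string: str) -> List[str]:
    """Label each character by its position inside one character string."""
    if not string:
        raise ValueError("a character string contains at least one character")
    if len(string) == 1:
        return ["S"]
    return ["B"] + ["I"] * (len(string) - 2) + ["E"]


def position_features(string_sequence: Sequence[str]) -> List[int]:
    """Traverse the sequence string by string, then character by character."""
    ids = []
    for string in string_sequence:
        ids.extend(IOBES_TO_ID[tag] for tag in position_tags(string))
    return ids


# "O" is never produced here; it is reserved for irrelevant characters.
sequence = ["dance", "a", "to"]
features = position_features(sequence)
```

The output contains one identifier per character, so its length equals the total number of characters in the character string sequence.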

Further, referring to fig. 7, fig. 7 is an optional flowchart of obtaining the basic model features according to an embodiment of the present invention; as shown in fig. 7, in S103 of the embodiment of the present invention, the entity identification device performing feature construction on the at least one recognition result to obtain the basic model features includes S1033-S1035, which are described below with reference to the steps shown in fig. 7.

S1033, performing character entity type feature construction and character position feature construction on the current recognition result to obtain a current character entity type feature and a current character position feature, so as to obtain at least one current character entity type feature and at least one current character position feature corresponding to the at least one recognition result; the current recognition result is any one of the at least one recognition result.

It should be noted that the entity identification device traverses each recognition result in the at least one recognition result, and the recognition result being traversed is the current recognition result; it is easy to see that the current recognition result is any one of the at least one recognition result.

In the embodiment of the invention, a recognition result comprises entities and the entity type corresponding to each entity, and each entity is composed of at least one character; therefore, the entity identification device can perform feature construction on the entity type corresponding to each character in the current recognition result, and when this is completed, the constructed feature is the current character entity type feature corresponding to the current recognition result. A recognition result also comprises the position information of each character in its corresponding character string; therefore, the entity identification device can perform feature construction on this position information, and when this is completed, the constructed feature is the current character position feature corresponding to the current recognition result. Thus, when feature construction has been completed for the at least one recognition result, at least one current character entity type feature and at least one current character position feature are obtained, and they correspond one-to-one to the at least one recognition result.

It should be noted that the current character entity type feature refers to the set of features corresponding to the entity type to which each character belongs in the current recognition result, and the current character position feature refers to the set of features corresponding to the position of each character in its corresponding character string in the current recognition result. In addition, besides the types to which the various entities belong, the entity types also include a non-entity type; the character string corresponding to a character may be an entity word or a non-entity word, and a non-entity word is, for example, a single character, another character, or an irrelevant character.

S1034, counting, from the at least one recognition result, the number of votes for each character in the text information to be recognized belonging to each entity type, to obtain voting features.

In the embodiment of the invention, the entity identification device counts, according to the entities in the at least one recognition result and the entity types to which they belong, the number of votes for each character in the text information to be recognized belonging to each entity type, and takes these vote counts as the voting features.

It should be noted that, when the errors in the at least one recognition result are caused by a change in the entity type definitions, S1034 includes: counting, from the at least one recognition result, the number of votes for each character in the text information to be recognized belonging to each of the new entity types, to obtain the voting features. Here, when the entity types corresponding to a recognition result are historical entity types, the votes are counted according to the mapping relationship between the historical entity types and the new entity types.
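
The vote counting can be sketched as follows, under stated assumptions: each recognition result has already been expanded to one entity type (or `None` for non-entity) per character, the new entity types are the IP, name, address and organization types used as an example in this description, and the mapping `HISTORICAL_TO_NEW` from historical to new entity types is purely hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

NEW_TYPES = ["IP", "name", "address", "organization"]
# Hypothetical mapping from historical entity types to new entity types.
HISTORICAL_TO_NEW: Dict[str, str] = {
    "person": "name",
    "role": "name",
    "place": "address",
}


def vote_features(
    char_types_per_model: Sequence[Sequence[Optional[str]]],
    new_types: Sequence[str] = NEW_TYPES,
    mapping: Dict[str, str] = HISTORICAL_TO_NEW,
) -> List[List[int]]:
    """For each character, count how many results assign it to each new type."""
    n_chars = len(char_types_per_model[0])
    votes = []
    for i in range(n_chars):
        counts = Counter()
        for char_types in char_types_per_model:
            entity_type = char_types[i]
            # Historical types are translated before the vote is counted.
            entity_type = mapping.get(entity_type, entity_type)
            if entity_type in new_types:
                counts[entity_type] += 1
        votes.append([counts[t] for t in new_types])
    return votes


votes = vote_features([
    ["person", "person", None],   # e.g. dictionary recognition result
    ["role", "role", "place"],    # e.g. network model recognition result
    ["name", None, "address"],    # e.g. new entity type recognition result
])
```

Each row of `votes` belongs to one character and holds one count per new entity type, in the order of `NEW_TYPES`.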

S1035, fusing the at least one current character entity type feature, the at least one current character position feature and the voting features to obtain the basic model features.

In the embodiment of the present invention, the at least one current character entity type feature, the at least one current character position feature and the voting features obtained by the entity identification device all have the character as their granularity; therefore, the entity identification device can fuse them in units of characters, and the fusion result is the basic model features.
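
A minimal sketch of character-level fusion is given below. It assumes that fusion means concatenating, for each character, the feature values contributed by every feature group; the function name `fuse_per_character` and the toy numeric encodings are illustrative assumptions, not the embodiment's actual encoding.

```python
from typing import List, Sequence


def fuse_per_character(*feature_groups: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate the per-character rows of several feature groups."""
    n_chars = len(feature_groups[0])
    if any(len(group) != n_chars for group in feature_groups):
        raise ValueError("all feature groups must cover the same characters")
    fused = []
    for i in range(n_chars):
        row = []
        for group in feature_groups:
            row.extend(group[i])
        fused.append(row)
    return fused


# Toy inputs for a text of three characters (values are arbitrary).
type_features = [[1], [1], [0]]
position_features = [[2], [3], [1]]
voting_features = [[0, 2, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
base_model_features = fuse_per_character(
    type_features, position_features, voting_features
)
```

Because every group is indexed by character, the fused result again has exactly one row per character.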

Further, in S1033 in the embodiment of the present invention, the entity identification device performs feature construction of the character entity type and feature construction of the character position on the current identification result, respectively, to obtain a current character entity type feature and a current character position feature, which includes S10331 to S10334, and the following describes each step separately.

S10331, obtaining the entity type corresponding to each character in the current recognition result, to obtain the entity type information corresponding to the character.

In the embodiment of the invention, since the current recognition result comprises entities and the entity type corresponding to each entity, and each entity is composed of at least one character, the entity identification device can obtain the entity type corresponding to each character in the current recognition result, that is, the entity type information corresponding to the character; it is easy to see that each character has its own entity type information.

S10332, performing feature construction on the entity type information corresponding to the character to obtain the entity type feature corresponding to the character, so as to obtain the current character entity type feature corresponding to the current recognition result; the current character entity type feature comprises at least one entity type feature corresponding to a character.

In the embodiment of the invention, after obtaining the entity type information corresponding to the character, the entity identification device performs feature construction on it to obtain the entity type feature corresponding to the character. Moreover, because each character in the current recognition result has its own entity type feature, when feature construction has been completed for all characters in the current recognition result, feature construction of the current recognition result is also completed, and the current character entity type feature comprising at least one entity type feature corresponding to a character is obtained. It is easy to see that the current character entity type feature corresponds to the current recognition result, and that the number of entity type features it contains is the number of characters in the text information to be recognized.

S10333, obtaining the position of each character in its corresponding character string in the current recognition result, to obtain the character string position information corresponding to the character.

In the embodiment of the present invention, since the current recognition result includes the position information of each character in the character string to which it belongs, the entity identification device can obtain the position of each character in the current recognition result in its corresponding character string, that is, the character string position information corresponding to the character; it is easy to see that each character has its own character string position information.

S10334, performing feature construction on the character string position information corresponding to the character to obtain a character string position feature corresponding to the character, so as to obtain a current character position feature corresponding to a current recognition result; the current character position feature includes at least one character-corresponding string position feature.

In the embodiment of the invention, after obtaining the character string position information corresponding to the character, the entity identification device performs feature construction on it to obtain the character string position feature corresponding to the character. Moreover, because each character in the current recognition result has its own character string position feature, when feature construction has been completed for all characters in the current recognition result, feature construction of the current recognition result is completed, and the current character position feature comprising at least one character string position feature corresponding to a character is obtained. It is easy to see that the current character position feature corresponds to the current recognition result, and that the number of character string position features it contains is the number of characters in the text information to be recognized.
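
Steps S10331 to S10334 can be sketched together as follows. The sketch assumes that a recognition result is a list of `(start, end, entity_type)` spans over the text, that characters outside any entity receive the type `"non-entity"` and the position label `"O"`, and that positions inside an entity use the B/I/E/S labels; these encodings and the name `char_level_features` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed span format: (start offset, end offset (exclusive), entity type)
Entity = Tuple[int, int, str]


def char_level_features(
    text_len: int, entities: Sequence[Entity]
) -> Tuple[List[str], List[str]]:
    """Expand one recognition result into per-character type and position labels."""
    types = ["non-entity"] * text_len
    positions = ["O"] * text_len
    for start, end, entity_type in entities:
        for i in range(start, end):
            types[i] = entity_type
            if end - start == 1:
                positions[i] = "S"
            elif i == start:
                positions[i] = "B"
            elif i == end - 1:
                positions[i] = "E"
            else:
                positions[i] = "I"
    return types, positions


# A five-character text whose first three characters form a "name" entity.
types, positions = char_level_features(5, [(0, 3, "name")])
```

Both returned lists have one entry per character of the text, which matches the statement that the number of features equals the number of characters in the text information to be recognized.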

Further, in the embodiment of the present invention, when the dictionary recognition model is included in the at least one basic entity recognition model, S1035 is preceded by S1036 and S1037; that is to say, before the entity identification device fuses the at least one current character entity type feature, the at least one current character position feature and the voting features to obtain the basic model features, the entity identification method further includes S1036 and S1037, which are described separately below.

S1036, obtaining a dictionary recognition result from the at least one recognition result; and the dictionary identification result is a result of identifying the text information to be identified by the dictionary identification model.

In the embodiment of the invention, it is considered that when the text information to be recognized is, as a whole, one complete entity, the accuracy of the recognition result of the dictionary recognition model is high; therefore, in order to increase the confidence placed in the recognition result of the dictionary recognition model in this case, the entity identification device first obtains, from the at least one recognition result, the result of the dictionary recognition model recognizing the text information to be recognized, which is the dictionary recognition result.

S1037, obtaining the ratio of the length of the entity to which each character in the dictionary recognition result belongs to the text length, to obtain at least one length ratio corresponding to the dictionary recognition result.

In the embodiment of the invention, the entity identification device adds the ratio of entity length to text length as a feature, so as to strengthen the dictionary recognition result when the text information to be recognized is a complete entity. It is easy to see that the length ratio corresponding to a character that does not belong to any entity is 0, and that the number of length ratios in the at least one length ratio is the number of characters in the text information to be recognized.

It should be noted that all characters of the same entity have the same ratio of entity length to text length.
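
The length ratio feature can be sketched as below, assuming the dictionary recognition result is given as `(start, end, entity_type)` spans and lengths are measured in characters; the function name `length_ratios` is hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed span format: (start offset, end offset (exclusive), entity type)
Entity = Tuple[int, int, str]


def length_ratios(text_len: int, entities: Sequence[Entity]) -> List[float]:
    """Per-character ratio of entity length to text length (0 outside entities)."""
    ratios = [0.0] * text_len
    for start, end, _entity_type in entities:
        ratio = (end - start) / text_len
        for i in range(start, end):
            ratios[i] = ratio  # identical for every character of the entity
    return ratios


whole_text_entity = length_ratios(4, [(0, 4, "name")])
partial_entity = length_ratios(4, [(1, 3, "name")])
```

When the whole text is one entity, every character receives the ratio 1.0, which is the case in which the dictionary recognition result is meant to be strengthened.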

Accordingly, in S1035 of the embodiment of the present invention, the entity identification device fusing the at least one current character entity type feature, the at least one current character position feature and the voting features to obtain the basic model features includes: the entity identification device fuses the at least one length ratio, the at least one current character entity type feature, the at least one current character position feature and the voting features to obtain the basic model features.

Exemplarily, referring to fig. 8, fig. 8 is a schematic structural diagram of the input features of the fused entity recognition model according to an embodiment of the present invention. As shown in fig. 8, in the input features of the fused entity recognition model, the feature 8-1 corresponding to each character includes two parts: the model features 8-21 corresponding to each character (the features corresponding to each character among the basic model features) and the text features 8-22 corresponding to each character (the features corresponding to each character among the target text features). The model features 8-21 corresponding to each character include the dictionary recognition features 8-31 corresponding to each character (the features constructed for each character from the recognition result of the dictionary recognition model), the network model recognition features 8-32 corresponding to each character (the features constructed for each character from the recognition result of the artificial intelligence recognition model), the new entity type network model recognition features 8-33 corresponding to each character (the features constructed for each character from the recognition result of the new entity type recognition model) and the recognition result voting features 8-34 corresponding to each character (the features corresponding to each character among the voting features).
Here, the dictionary recognition features 8-31 corresponding to each character further include the entity type feature 8-311 corresponding to each character (current character entity type feature), the position feature 8-312 of the character in the entity word (current character position feature), and the ratio 8-313 of entity length to text length corresponding to each character (length ratio); the network model recognition features 8-32 corresponding to each character further include the entity type feature 8-321 corresponding to each character (current character entity type feature) and the position feature 8-322 of the character in the entity word (current character position feature); the new entity type network model recognition features 8-33 corresponding to each character further include the entity type feature 8-331 corresponding to each character (current character entity type feature) and the position feature 8-332 of the character in the entity word (current character position feature). In addition, when the new entity types include an IP type, a name type, an address type and an organization type, the recognition result voting features 8-34 corresponding to each character further include the number of votes 8-341 for the IP type, the number of votes 8-342 for the name type, the number of votes 8-343 for the address type, and the number of votes 8-344 for the organization type. The text features 8-22 corresponding to each character include a character vector 8-221 (semantic feature of the current character), a word vector 8-222 (semantic feature of the current character string), a part-of-speech vector 8-223 (part-of-speech information of the current character string), and the position feature 8-224 of the character (position information of the current character in the current character string).

It should be noted that the richer the features are, the more accurate the obtained entity recognition result is. When the fused entity recognition model in the embodiment of the present invention, which uses both the target text features and the basic model features, and a comparison model, which uses only the target text features, are each used to perform entity recognition of the IP type, the name type, the address type and the organization type, and are evaluated with three indexes, namely precision (Precision, P for short), recall (Recall, R for short) and F1 score (F1 Score, F1 for short, the harmonic mean of P and R), the evaluation results are as shown in table 1:

TABLE 1

When the dictionary recognition model, the artificial intelligence recognition model, the new entity type recognition model, and the fused entity recognition model are used for entity recognition of the IP type, the name type, the address type, and the organization type, and evaluation is performed using F1, the evaluation results are as shown in Table 2:

TABLE 2
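For reference, the three evaluation indexes named above can be computed as in the following minimal sketch. It assumes, purely for illustration, that both the predicted entities and the annotated (gold) entities are given as sets of `(start, end, entity_type)` tuples and that an entity counts as correct only on an exact match; the function name and the data layout are hypothetical and are not taken from the embodiment.

```python
def precision_recall_f1(predicted, gold):
    """Entity-level precision, recall and F1 over (start, end, type) tuples.

    Assumption: an entity is correct only if span and type match exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positive = len(predicted & gold)
    precision = true_positive / len(predicted) if predicted else 0.0
    recall = true_positive / len(gold) if gold else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = {(0, 2, "name"), (3, 5, "IP")}
    gold = {(0, 2, "name"), (6, 8, "address")}
    print(precision_recall_f1(predicted, gold))
```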

Further, in S103 of the embodiment of the present invention, the entity identification device performs entity identification by combining the basic model features and the target text features to obtain a target identification result; this includes S1038 to S10310, which are described below.

S1038, splicing the basic model features and the target text features to obtain features to be identified.

In the embodiment of the invention, because the basic model features and the target text features are both features taking characters as granularity, the entity recognition equipment can perform fusion of the basic model features and the target text features by taking the characters as units; here, fusion is realized by splicing, and the spliced features are the features to be identified.
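The character-level splicing described above can be sketched as follows. This is a minimal illustration that assumes each feature group is a list holding one feature vector (a list of numbers) per character; the function name `splice_features` and the example values are hypothetical.

```python
def splice_features(base_model_features, target_text_features):
    """Concatenate two per-character feature lists, character by character.

    Both inputs must hold exactly one feature vector per character.
    """
    if len(base_model_features) != len(target_text_features):
        raise ValueError("both feature lists must cover the same characters")
    return [
        list(base) + list(text)
        for base, text in zip(base_model_features, target_text_features)
    ]


if __name__ == "__main__":
    base = [[1, 0, 0.5], [2, 1, 0.5]]   # e.g. entity type, position, length ratio
    text = [[0.1, 0.2], [0.3, 0.4]]     # e.g. small character embeddings
    print(splice_features(base, text))
```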

Illustratively, based on the feature structure shown in fig. 8, the features to be recognized are represented in the "libsvm" data format, that is, "character; 1st feature: feature value; 2nd feature: feature value; ...; 15th feature: feature value". For the text to be recognized "the bean duck. Jackson dance teaching!", the corresponding input format of the features to be recognized is as shown in fig. 9, in which serial number 1 is followed by each character of the text to be recognized, serial numbers 2 to 16 correspond to the 15 features in fig. 8, and "\n\n" is used as the separator.
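Since fig. 9 is not reproduced here, the following is only a rough sketch of how one character and its features might be serialized in a libsvm-like "index:value" form, with index 1 holding the character and the following indexes holding the features. The space separator between fields, the function names, and the use of one line per character are assumptions made for illustration.

```python
def to_libsvm_like_line(char, features):
    """Serialize one character and its features as 'index:value' fields.

    Assumptions: index 1 carries the character itself, the features follow
    from index 2 onward, and fields are separated by a single space.
    """
    fields = [f"1:{char}"]
    fields += [f"{index}:{value}" for index, value in enumerate(features, start=2)]
    return " ".join(fields)


def to_libsvm_like_block(chars, feature_rows):
    """One line per character; blocks for different texts can then be joined with '\\n\\n'."""
    return "\n".join(
        to_libsvm_like_line(char, row) for char, row in zip(chars, feature_rows)
    )


if __name__ == "__main__":
    print(to_libsvm_like_block(["A", "B"], [[3, 0], [3, 1]]))
```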

S1039, carrying out semantic coding on the features to be recognized to obtain a semantic coding result.

In the embodiment of the invention, the entity identification equipment carries out semantic coding on the features to be identified, so that a semantic coding result is obtained, and the semantic coding result is used for determining the entity identification result of the information to be processed.

S10310, decoding the semantic coding result to realize entity identification, and obtaining a target identification result.

In the embodiment of the invention, after the entity identification equipment obtains the semantic coding result, the semantic coding result is decoded, so that the entity identification of the text information to be identified can be realized, and the decoding result is the target identification result.
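The application example of fig. 16 names a "Bi LSTM" model for the encoding step and a "CRF" layer for the decoding step. As a hedged illustration of the decoding step only, the sketch below runs Viterbi decoding over per-character tag scores, which is the standard way to decode a linear-chain CRF. The score layout (`emissions[t][j]` for character `t` and tag `j`, `transitions[i][j]` for moving from tag `i` to tag `j`) and the function name are assumptions; the embodiment does not specify them.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions:   list over characters, each a list of scores per tag
                 (assumed to come from the semantic encoder).
    transitions: square matrix, transitions[i][j] = score of tag i -> tag j.
    """
    num_tags = len(emissions[0])
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_i)
            step_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
        scores = step_scores
        backpointers.append(step_back)
    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    emissions = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    transitions = [[0.0, 0.0], [0.0, 0.0]]
    print(viterbi_decode(emissions, transitions))
```

The returned indexes would then be mapped back to tag labels to form the target identification result.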

Further, referring to fig. 10, fig. 10 is a schematic diagram illustrating another alternative flow of the entity identification method according to the embodiment of the present invention; as shown in fig. 10, in the embodiment of the present invention, after S103, the entity identification method further includes S104 and S105, which are described below with reference to the steps shown in fig. 10.

S104, extracting the entity and the entity type corresponding to the entity from the target identification result to obtain the entity information to be processed.

In the embodiment of the invention, the target identification result obtained by the entity identification device includes each entity and the entity type corresponding to the entity; therefore, the entity identification device can extract the entity and its entity type from the target identification result, and the extracted entity and entity type together serve as the entity information to be processed.
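The extraction in S104 can be illustrated with the sketch below. It assumes that the target identification result is a sequence of per-character tags in a "B-type / I-type / O" scheme; this tagging scheme, the function name, and the example labels are assumptions for illustration, since the embodiment does not fix the output encoding.

```python
def extract_entities(chars, tags):
    """Collect (entity_text, entity_type) pairs from per-character tags.

    Assumed tag scheme: 'B-<type>' starts an entity, 'I-<type>' continues
    an entity of the same type, anything else ends the current entity.
    """
    entities = []
    current_chars, current_type = [], None
    for char, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [char], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(char)
        else:
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


if __name__ == "__main__":
    chars = list("AB CD")
    tags = ["B-name", "I-name", "O", "B-IP", "I-IP"]
    print(extract_entities(chars, tags))
```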

S105, performing information retrieval according to the entity information to be processed in a preset resource library to obtain an information retrieval result; and the information retrieval result is information associated with the information to be processed in the preset resource library.

In the embodiment of the invention, entity identification is performed on the information to be processed so that information associated with it can be obtained from the preset resource library according to the target identification result; therefore, after the entity identification device obtains the entity information to be processed from the target identification result, it uses the entity information to be processed as the retrieval keyword to perform information retrieval in the preset resource library and obtains the information retrieval result. It follows that the information retrieval result is the information in the preset resource library that is associated with the information to be processed.

Further, referring to fig. 11, fig. 11 is a schematic diagram illustrating a further alternative flow of the entity identification method according to the embodiment of the present invention; as shown in fig. 11, in the embodiment of the present invention, after S103, the entity identification method further includes S106, which is described below with reference to the steps shown in fig. 11.

S106, sending the information to be processed and the target identification result to the blockchain network, so that a node of the blockchain network fills the information to be processed and the target identification result into a new block and, when consensus is reached on the new block, appends the new block to the tail of the blockchain to complete the on-chain storage.

It should be noted that the entity identification device stores the information to be processed and the target identification result on the chain to ensure that they cannot be tampered with.

Based on the entity identification method shown in fig. 4, referring to fig. 12, fig. 12 is another alternative architecture diagram of the entity identification system provided in the embodiment of the present invention, which includes a blockchain network 600 (consensus nodes 610-1 to 610-3 are shown as examples), a certificate authority 700, a business entity 800, and a business entity 900, which are respectively described below.

The type of the blockchain network 600 is flexible and may be, for example, any of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, electronic devices such as a user terminal and a server of any business entity can access the blockchain network 600 without authorization; taking a consortium chain as an example, an electronic device (e.g., a terminal/server) under the jurisdiction of a business entity may access the blockchain network 600 after the business entity obtains authorization, and then becomes a client node in the blockchain network 600.

In some embodiments, the client node may act merely as an observer of the blockchain network 600, that is, it provides the functionality that supports the business entity in initiating transactions (e.g., storing data on the chain or querying on-chain data); the functions of the consensus nodes of the blockchain network 600, such as the ordering function, the consensus service, and the ledger function, may be implemented by the client node by default or selectively (e.g., depending on the specific business requirements of the business entity). In this way, the data and the business processing logic of the business entity can be migrated to the blockchain network 600 to the greatest extent, and the credibility and traceability of the data and of the business processing are achieved through the blockchain network 600.

Nodes in the blockchain network 600 receive transactions submitted by client nodes of different business entities (e.g., the client node 810, that is, the entity identification device, belonging to the business entity 800 shown in fig. 12), execute the transactions to update or query the ledger, and may return various intermediate or final results of executing the transactions to the client nodes of the business entity for display.

An exemplary application of the blockchain network is described below, taking the example that a plurality of service entities access the blockchain network to realize entity identification.

With continued reference to fig. 12, the business entity 800 involved may be an entity identification system, and the business entity 900 may be a retrieval system based on entity identification. Each business entity registers with the certificate authority 700 to obtain its own digital certificate, which includes the public key of the business entity and a digital signature issued by the certificate authority 700 over the public key and identity information of the business entity. The digital certificate is attached to a transaction together with the digital signature of the business entity for that transaction and is sent to the blockchain network, so that the blockchain network can take the digital certificate and the signature from the transaction, verify the authenticity of the message (i.e., that the message has not been tampered with) and the identity information of the business entity sending the message, and verify according to the identity, for example, whether the business entity has the right to initiate the transaction. Clients running on electronic devices (e.g., terminals or servers) under the jurisdiction of the business entity may request access to the blockchain network 600 to become client nodes.

The client node 810 of the business entity 800 is configured to obtain information to be processed, obtain a target identification result corresponding to the information to be processed, and send the information to be processed and the target identification result to the blockchain network 600.

The information to be processed and the target identification result may be sent to the blockchain network 600 in either of two ways: business logic may be set in the client node 810 in advance, so that when the target identification result corresponding to the information to be processed has been obtained, the client node 810 automatically sends the information to be processed and the target identification result to the blockchain network 600; or a business person of the business entity 800 may log in to the client node 810, manually package the information to be processed and the target identification result, and send them to the blockchain network 600. When sending, the client node 810 generates a transaction corresponding to an update operation according to the information to be processed and the target identification result, specifies in the transaction the smart contract that needs to be invoked to implement the update operation and the parameters passed to the smart contract, and broadcasts the transaction to the consensus nodes in the blockchain network 600; the transaction also carries the digital certificate of the client node 810 and a digital signature (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the client node 810).

When a consensus node in the blockchain network 600 receives the transaction, it verifies the digital certificate and the digital signature carried by the transaction; after this verification succeeds, it determines, according to the identity of the business entity 800 carried in the transaction, whether the business entity 800 has the transaction permission. Failure of either the digital signature verification or the permission verification causes the transaction to fail. After successful verification, the consensus node appends its own digital signature (e.g., obtained by encrypting a digest of the transaction using the private key of the consensus node 610-1) and continues to broadcast the transaction in the blockchain network 600.

After a consensus node in the blockchain network 600 receives a successfully verified transaction, it fills the transaction into a new block and broadcasts the new block. When a consensus node in the blockchain network 600 broadcasts a new block, a consensus process is performed on the new block; if consensus succeeds, each node appends the new block to the tail of the blockchain that it stores, executes the transactions in the new block, and updates the state database according to the transaction results: for a transaction that submits the information to be processed and the target identification result, a key-value pair comprising the information to be processed and the target identification result is added to the state database.

A business person of the business entity 900 logs in to the client node 910 (terminal 400) and inputs a query request for the entity identification result of the information to be processed; the query request is used to query the information to be processed and the target identification result. The client node 910 generates, according to the query request, a transaction corresponding to an update operation/query operation, specifies in the transaction the smart contract that needs to be invoked to implement the update operation/query operation and the parameters passed to the smart contract, and broadcasts the transaction to the consensus nodes in the blockchain network 600; the transaction also carries the digital certificate of the client node 910 and a digital signature (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the client node 910).

After a consensus node in the blockchain network 600 receives the transaction, verifies it, fills it into a block, and consensus is reached, the node appends the filled new block to the tail of the blockchain that it stores, executes the transaction in the new block, and updates the state database according to the transaction result; for example, for a submitted transaction that queries the identification result of the information to be processed, the key-value pair corresponding to the information to be processed is queried from the state database, and the transaction result is returned.

It should be noted that fig. 12 illustrates, as an example, the process of storing the information to be processed and the target identification result directly on the chain; in other embodiments, when the data size of the information to be processed and the target identification result is large, the client node 810 may store only the hash of the information to be processed and the hash of the target identification result on the chain, and store the original information to be processed and target identification result in a distributed file system or a database. After the client node 910 obtains the information to be processed and the target identification result from the distributed file system or the database, it may verify them against the corresponding hashes in the blockchain network 600, thereby reducing the workload of on-chain storage.
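The hash-based variant can be sketched as follows: only a digest is kept on the chain, and data fetched from off-chain storage is checked against it. This is a minimal illustration that assumes SHA-256 and a canonical JSON serialization; the embodiment does not name a hash function or a serialization, and the function names are hypothetical.

```python
import hashlib
import json


def content_digest(content):
    """SHA-256 digest of a JSON-serializable object (assumed serialization)."""
    payload = json.dumps(content, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify_against_chain(content, on_chain_digest):
    """Check that content fetched from off-chain storage matches the on-chain digest."""
    return content_digest(content) == on_chain_digest


if __name__ == "__main__":
    info = "example document title"
    result = [["example", "name"]]
    digests = (content_digest(info), content_digest(result))  # stored on the chain
    print(verify_against_chain(info, digests[0]), verify_against_chain(result, digests[1]))
```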

As an example of a blockchain, referring to fig. 13, fig. 13 is a schematic structural diagram of a blockchain in a blockchain network provided in an embodiment of the present invention. The header of each block may include the hash value of all transactions in that block as well as the hash value of all transactions in the previous block. Records of newly generated transactions are filled into a block, and the block is appended to the tail of the blockchain after nodes in the blockchain network reach consensus on it, so that the chain grows; the hash-based chain structure between blocks ensures that transactions in the blocks are tamper-resistant and forgery-resistant.

An exemplary functional architecture of the blockchain network provided by the embodiment of the present invention is described below. Referring to fig. 14, fig. 14 is a schematic functional architecture diagram of the blockchain network provided by the embodiment of the present invention, which includes an application layer 601, a consensus layer 602, a network layer 603, a data layer 604, and a resource layer 605, each of which is described below.
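The hash-linked structure of fig. 13 can be illustrated with the toy sketch below, in which each block header stores a hash over its own transactions and the corresponding hash of the previous block. SHA-256, the dictionary layout, and the function names are assumptions for illustration; consensus, signatures, and Merkle trees are deliberately left out.

```python
import hashlib
import json


def transactions_hash(transactions):
    """Hash over all transactions of a block (assumed: SHA-256 over JSON)."""
    return hashlib.sha256(json.dumps(transactions, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(transactions, previous_block=None):
    """Build a block whose header links to the previous block's transaction hash."""
    previous_hash = previous_block["header"]["tx_hash"] if previous_block else None
    return {
        "header": {"tx_hash": transactions_hash(transactions), "prev_tx_hash": previous_hash},
        "transactions": list(transactions),
    }


def verify_chain(chain):
    """Recompute every hash and check each link to the preceding block."""
    previous_hash = None
    for block in chain:
        header = block["header"]
        if header["tx_hash"] != transactions_hash(block["transactions"]):
            return False
        if header["prev_tx_hash"] != previous_hash:
            return False
        previous_hash = header["tx_hash"]
    return True


if __name__ == "__main__":
    chain = [make_block(["tx-1"])]
    chain.append(make_block(["tx-2", "tx-3"], chain[-1]))
    print(verify_chain(chain))
    chain[0]["transactions"][0] = "forged"
    print(verify_chain(chain))
```

Changing any transaction in an earlier block changes its hash, so the check fails for that block without touching later ones.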

The resource layer 605 encapsulates the computing, storage, and communication resources that implement each of the consensus nodes 610 in the blockchain network 600.

The data layer 604 encapsulates various data structures that implement the ledger, including a blockchain implemented as files in a file system, a state database of the key-value type, and proofs of existence (e.g., hash trees of transactions in blocks).

The network layer 603 encapsulates the functions of a peer-to-peer (P2P) network protocol, a data propagation mechanism and a data verification mechanism, an access authentication mechanism, and business entity identity management.

The P2P network protocol implements communication between nodes in the blockchain network 600, the data propagation mechanism ensures propagation of transactions in the blockchain network 600, and the data verification mechanism implements reliability of data transmission between nodes based on cryptographic methods (e.g., digital certificates, digital signatures, public/private key pairs); the access authentication mechanism is used for authenticating the identity of a business entity joining the blockchain network 600 according to the actual business scenario, and granting the business entity the permission to access the blockchain network 600 when the authentication is passed; the business entity identity management is used to store the identities of the business entities that are allowed to access the blockchain network 600, as well as their permissions (e.g., the types of transactions that can be initiated).

The consensus layer 602 encapsulates the mechanism by which consensus nodes in the blockchain network 600 reach agreement on a block (i.e., the consensus mechanism), transaction management, and ledger management. The consensus mechanism includes consensus algorithms such as PoS, PoW, and DPoS, and pluggable consensus algorithms are supported.

The transaction management is used for verifying the digital signature carried in a transaction received by a node, verifying the identity information of the business entity, and determining according to the identity information whether the business entity has the permission to carry out the transaction (by reading the relevant information from the business entity identity management); each business entity authorized to access the blockchain network 600 has a digital certificate issued by the certificate authority, and signs the transactions it submits using the private key in its digital certificate, thereby declaring its own legal identity.

The ledger management is used to maintain the blockchain and the state database. For a block on which consensus has been reached, the block is appended to the tail of the blockchain, and the transactions in the block are executed: when a transaction includes an update operation, the key-value pairs in the state database are updated; when a transaction includes a query operation, the key-value pairs in the state database are queried and the query result is returned to the client node of the business entity. Query operations over multiple dimensions of the state database are supported, including: querying a block by block sequence number (e.g., a hash value of the transaction); querying a block by block hash value; querying a block by transaction sequence number; querying a transaction by transaction sequence number; querying account data of a business entity by the account (sequence number) of the business entity; and querying the blockchain in a channel by channel name.

The application layer 601 encapsulates various services that the blockchain network can implement, including tracing, crediting, and verifying transactions.

In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.

For example, referring to fig. 15, fig. 15 is a schematic diagram of an exemplary application of entity identification provided by an embodiment of the present invention. As shown in fig. 15, in the index building stage of a search scenario, the document title 15-21 (text information to be identified) of the document 15-11 (information to be processed) is obtained, entity identification is performed on the document title 15-21 using the entity identification method 15-3 provided by the embodiment of the present invention to obtain the target identification result 15-41, and the document 15-11 is then added to an inverted index according to the target identification result 15-41 and stored in the index library 15-5 (preset resource library).

In the retrieval stage of the search scenario, the corresponding text information is extracted from the input information 15-12 (information to be processed) to obtain the input text 15-22 (text information to be identified), and entity identification is performed on the input text 15-22 using the entity identification method 15-3 provided by the embodiment of the invention to obtain the target identification result 15-42. Documents are then recalled from the index library 15-5 according to the target identification result 15-42 to obtain the recalled document 15-7 (information retrieval result). The recalled document 15-7 serves as a supplementary recall: it is merged with documents recalled in other ways, the merged documents are ranked, and the retrieval result 15-8 is output for presentation.

Referring to fig. 16, to perform entity identification on the document title 15-21 using the entity identification method 15-3 provided by the embodiment of the present invention, first, at least one recognition result 16-11 of the document title 16-1 (the document title 15-21 in fig. 15) is obtained, and the feature construction module 16-2 constructs the following features from the at least one recognition result 16-11: at least one length ratio 16-21, at least one current character entity type feature 16-22, at least one current character position feature 16-23, and a voting feature 16-24; the feature construction module 16-2 also extracts the target text features 16-25 of the document title 16-1. Second, the at least one length ratio 16-21, the at least one current character entity type feature 16-22, the at least one current character position feature 16-23, the voting feature 16-24, and the target text features 16-25 are spliced at character granularity to obtain the features to be recognized 16-26. Then, the features to be recognized 16-26 are input in sequence to the "BiLSTM" model 16-3 (including a plurality of neurons 16-31) and the "CRF" layer 16-4 to obtain the target recognition result 16-5 (the target recognition result 15-41 in fig. 15). Here, the feature construction module 16-2, the "BiLSTM" model 16-3, and the "CRF" layer 16-4 together constitute the fused entity recognition model. In addition, the process of performing entity identification on the input text 15-22 using the entity identification method 15-3 is similar to that for the document title 15-21 and is not described again here.
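The index building and retrieval stages can be illustrated with a small in-memory inverted index keyed by `(entity, entity_type)`. This is a sketch under stated assumptions: the document identifiers, the function names, and the use of a Python dictionary in place of the index library 15-5 are all hypothetical, and ranking and merging with other recall channels are omitted.

```python
from collections import defaultdict


def build_inverted_index(document_entities):
    """Map each (entity, entity_type) pair to the set of documents containing it.

    document_entities: dict of doc_id -> list of (entity, entity_type) pairs,
    i.e. the target identification result of each document title.
    """
    index = defaultdict(set)
    for doc_id, entities in document_entities.items():
        for entity, entity_type in entities:
            index[(entity, entity_type)].add(doc_id)
    return index


def recall_documents(index, query_entities):
    """Return the sorted ids of documents sharing at least one entity with the query."""
    recalled = set()
    for entity, entity_type in query_entities:
        recalled |= index.get((entity, entity_type), set())
    return sorted(recalled)


if __name__ == "__main__":
    index = build_inverted_index({
        "doc-1": [("Jackson", "name")],
        "doc-2": [("Jackson", "name"), ("Shenzhen", "address")],
    })
    print(recall_documents(index, [("Jackson", "name")]))
```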

Continuing with the exemplary structure of the entity identification device 455 provided by the embodiments of the present invention as implemented as software modules, in some embodiments, as shown in fig. 3, the software modules stored in the entity identification device 455 of the memory 450 may include:

the information obtaining module 4551 is configured to obtain information to be processed and text information corresponding to the information to be processed, so as to obtain text information to be identified;

a basic identification module 4552, configured to perform entity identification on the text information to be identified by using at least one basic entity identification model, so as to obtain at least one corresponding identification result; the at least one basic entity recognition model refers to at least one mode for entity recognition;

the fused recognition module 4553 is configured to extract a text feature of the text information to be recognized by using a fused entity recognition model to obtain a target text feature, and perform feature construction on the at least one recognition result to obtain a basic model feature; carrying out entity recognition by combining the basic model features and the target text features to obtain a target recognition result; the fused entity recognition model is used for performing entity recognition on the text information to be recognized by using the at least one recognition result, the target text features refer to features of characters and character strings in the text information to be recognized, and the target recognition result is an entity recognition result of the information to be processed.

Further, the at least one base entity recognition model includes at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model; the dictionary identification model refers to a mode of carrying out entity identification by utilizing an entity dictionary based on historical entity types, the artificial intelligence identification model refers to a mode of carrying out entity identification by utilizing a network model based on historical entity types, the new entity type identification model refers to a mode of carrying out entity identification based on new entity types, and the new entity types comprise the historical entity types.

Further, the fusion recognition module 4553 is further configured to perform word segmentation processing on the text information to be recognized to obtain a character string sequence; extracting character text features corresponding to the current characters, so as to obtain character string text features corresponding to the current character string, and further obtain the target text features corresponding to the character string sequence; the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text features comprise at least one character text feature, the target text features comprise at least one character string text feature, and the character text features comprise at least one of semantic features of the current character, position information of the current character in the current character string, semantic features of the current character string and part-of-speech information of the current character string.

Further, the fused recognition module 4553 is further configured to perform feature construction of the character entity type and feature construction of the character position on a current recognition result, respectively, to obtain a current character entity type feature and a current character position feature, thereby obtaining at least one current character entity type feature and at least one current character position feature corresponding to the at least one recognition result, where the current recognition result is any one of the at least one recognition result; count, from the at least one recognition result, the number of votes for each character in the text information to be recognized belonging to each entity type, to obtain voting features; and fuse the at least one current character entity type feature, the at least one current character position feature, and the voting features to obtain the basic model features.
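The vote counting can be sketched as follows. The sketch assumes that each basic model's recognition result is given as one entity type label per character, with `None` for characters that belong to no entity; this representation, the function name, and the example type labels are illustrative assumptions.

```python
ENTITY_TYPES = ("IP", "name", "address", "organization")  # illustrative labels


def voting_features(recognition_results, entity_types=ENTITY_TYPES):
    """Count, per character, how many basic models voted for each entity type.

    recognition_results: one list per basic model, each holding one entity
    type label (or None) per character of the text.
    """
    num_chars = len(recognition_results[0])
    votes = [[0] * len(entity_types) for _ in range(num_chars)]
    for result in recognition_results:
        for position, entity_type in enumerate(result):
            if entity_type in entity_types:
                votes[position][entity_types.index(entity_type)] += 1
    return votes


if __name__ == "__main__":
    results = [
        ["name", "name", None],   # e.g. dictionary recognition model
        ["name", None, None],     # e.g. network model
        ["IP", "name", None],     # e.g. new entity type model
    ]
    print(voting_features(results))
```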

Further, the fused recognition module 4553 is further configured to obtain an entity type corresponding to each character in the current recognition result, and obtain entity type information corresponding to the character; performing feature construction on the entity type information corresponding to the characters to obtain entity type features corresponding to the characters, so as to obtain the entity type features of the current characters corresponding to the current recognition result; the current character entity type characteristics comprise entity type characteristics corresponding to at least one character; acquiring the position of each character in the current recognition result in the corresponding character string to obtain the position information of the character corresponding to the character string; performing feature construction on the character string position information corresponding to the characters to obtain character string position features corresponding to the characters, so as to obtain the current character position features corresponding to the current recognition result; the current character position feature comprises at least one character corresponding character string position feature.

Further, when the at least one basic entity recognition model comprises a dictionary recognition model, the fused recognition module 4553 is further configured to obtain a dictionary recognition result from the at least one recognition result; the dictionary recognition result is a result of recognizing the text information to be recognized by the dictionary recognition model; and obtaining the length ratio of the entity length corresponding to each character in the dictionary recognition result to the text length to obtain at least one length ratio corresponding to the dictionary recognition result.

Correspondingly, the fusion identification module 4553 is further configured to fuse the at least one length ratio, the at least one current character entity type feature, the at least one current character position feature, and the voting feature to obtain the base model feature.
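The length ratio feature can be illustrated with the sketch below. It assumes that the dictionary recognition result is available as a list of `(start, end)` character spans with an exclusive end, and that characters outside any entity receive a ratio of 0.0; both choices, like the function name, are assumptions for illustration.

```python
def length_ratio_features(text, entity_spans):
    """Per-character ratio of the enclosing entity's length to the text length.

    entity_spans: list of (start, end) character offsets, end exclusive.
    Characters outside every span get 0.0 (an assumption of this sketch).
    """
    ratios = [0.0] * len(text)
    if not text:
        return ratios
    for start, end in entity_spans:
        ratio = (end - start) / len(text)
        for position in range(start, end):
            ratios[position] = ratio
    return ratios


if __name__ == "__main__":
    print(length_ratio_features("abcdefghij", [(0, 4)]))
```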

Further, the fusion recognition module 4553 is further configured to splice the basic model features and the target text features to obtain features to be recognized; carrying out semantic coding on the features to be identified to obtain a semantic coding result; and decoding the semantic coding result to realize entity recognition to obtain the target recognition result.

Further, the entity identification device 455 further includes an application module 4554, configured to extract an entity and an entity type corresponding to the entity from the target identification result, so as to obtain the entity information to be processed; and to perform, in a preset resource library, information retrieval according to the entity information to be processed to obtain an information retrieval result, where the information retrieval result is information in the preset resource library that is associated with the information to be processed.

Further, the entity identification device 455 further includes a blockchain module 4555, configured to send the information to be processed and the target identification result to a blockchain network, so that a node of the blockchain network fills the information to be processed and the target identification result into a new block and, when consensus is reached on the new block, appends the new block to the tail of the blockchain to complete the on-chain storage.

Embodiments of the present invention provide a computer-readable storage medium storing executable instructions, which when executed by a processor, cause the processor to perform an entity identification method provided by embodiments of the present invention, for example, the entity identification method shown in fig. 4.

In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

In summary, according to the embodiments of the present invention, the target recognition result is obtained by constructing features from both the at least one recognition result and the text features corresponding to the text information to be recognized. On the one hand, even if the at least one recognition result contains an erroneous recognition result, the text features corresponding to the text information to be recognized can correct it. On the other hand, because the features to be recognized are constructed from both the text features and the at least one recognition result, they are rich, so the target recognition result obtained by performing entity recognition on them is highly accurate. The entity identification method provided by the embodiments of the present invention therefore improves the accuracy of entity identification.

The above description is only an example of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims

1. An entity identification method, comprising:

2. The method of claim 1, wherein the at least one base entity recognition model comprises at least one of a dictionary recognition model, an artificial intelligence recognition model, and a new entity type recognition model;

the dictionary recognition model refers to a manner of performing entity recognition by using an entity dictionary based on historical entity types; the artificial intelligence recognition model refers to a manner of performing entity recognition by using a network model based on historical entity types; the new entity type recognition model refers to a manner of performing entity recognition based on new entity types; and the new entity types comprise the historical entity types.

3. The method according to claim 1 or 2, wherein the extracting the text feature of the text information to be recognized to obtain the target text feature comprises:

performing word segmentation processing on the text information to be recognized to obtain a character string sequence;

extracting character text features corresponding to the current characters, so as to obtain character string text features corresponding to the current character string, and further obtain the target text features corresponding to the character string sequence;

the current character string is any character string in the character string sequence, the current character is any character in the current character string, the character string text features comprise at least one character text feature, the target text features comprise at least one character string text feature, and the character text features comprise at least one of semantic features of the current character, position information of the current character in the current character string, semantic features of the current character string and part-of-speech information of the current character string.

4. The method according to claim 1 or 2, wherein the performing feature construction on the at least one recognition result to obtain a base model feature comprises:

performing character entity type feature construction and character position feature construction, respectively, on the current recognition result to obtain a current character entity type feature and a current character position feature, thereby obtaining at least one current character entity type feature and at least one current character position feature corresponding to the at least one recognition result; the current recognition result is any one of the at least one recognition result;

counting the voting times of each character in the text information to be recognized belonging to each entity type from the at least one recognition result to obtain voting characteristics;

and fusing the at least one current character entity type characteristic, the at least one current character position characteristic and the voting characteristic to obtain the basic model characteristic.

5. The method according to claim 4, wherein the performing feature construction of the character entity type and feature construction of the character position on the current recognition result respectively to obtain a current character entity type feature and a current character position feature comprises:

acquiring an entity type corresponding to each character in the current recognition result to obtain entity type information corresponding to the character;

performing feature construction on the entity type information corresponding to the characters to obtain entity type features corresponding to the characters, so as to obtain the current character entity type feature corresponding to the current recognition result; the current character entity type feature comprises the entity type feature corresponding to at least one character;

acquiring the position of each character in the current recognition result within its corresponding character string to obtain character string position information corresponding to the character;

performing feature construction on the character string position information corresponding to the characters to obtain character string position features corresponding to the characters, so as to obtain the current character position feature corresponding to the current recognition result; the current character position feature comprises the character string position feature corresponding to at least one character.

6. The method of claim 4, wherein when said at least one base entity recognition model comprises a dictionary recognition model, before said fusing said at least one current character entity type feature, said at least one current character position feature, and said voting feature to obtain said base model feature, said method further comprises:

obtaining a dictionary recognition result from the at least one recognition result; the dictionary recognition result is a result of recognizing the text information to be recognized by the dictionary recognition model;

acquiring the length ratio of the entity length corresponding to each character in the dictionary recognition result to the text length to obtain at least one length ratio corresponding to the dictionary recognition result;

correspondingly, the fusing the at least one current character entity type feature, the at least one current character position feature, and the voting feature to obtain the basic model feature includes:

and fusing the at least one length ratio, the at least one current character entity type characteristic, the at least one current character position characteristic and the voting characteristic to obtain the basic model characteristic.

7. The method according to claim 1 or 2, wherein the performing entity recognition by combining the base model features and the target text features to obtain a target recognition result comprises:

splicing the basic model features and the target text features to obtain features to be identified;

carrying out semantic coding on the features to be identified to obtain a semantic coding result;

and decoding the semantic coding result to realize entity recognition to obtain the target recognition result.

8. The method according to claim 1 or 2, wherein after the entity recognition is performed by combining the base model features and the target text features, and obtaining a target recognition result, the method further comprises:

extracting entities and entity types corresponding to the entities from the target identification result to obtain entity information to be processed;

in a preset resource library, performing information retrieval according to the entity information to be processed to obtain an information retrieval result; and the information retrieval result is information related to the information to be processed in the preset resource library.

9. The method according to claim 1 or 2, wherein after the entity recognition is performed by combining the base model features and the target text features, and obtaining a target recognition result, the method further comprises:

sending the information to be processed and the target recognition result to a blockchain network, so that a node of the blockchain network fills the information to be processed and the target recognition result into a new block and, when consensus is reached on the new block, appends the new block to the tail of the blockchain to complete the uplink.

10. An entity identification device, comprising:

a memory for storing executable instructions;

a processor for implementing the method of any one of claims 1 to 9 when executing executable instructions stored in the memory.