CN115146627B - Entity identification method, entity identification device, electronic equipment and storage medium - Google Patents

Entity identification method, entity identification device, electronic equipment and storage medium

Info

Publication number
CN115146627B
Authority
CN
China
Prior art keywords
entity
text
recognition
identification
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210885929.1A
Other languages
Chinese (zh)
Other versions
CN115146627A (en
Inventor
钟一心
王燕蒙
李剑锋
王少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210885929.1A priority Critical patent/CN115146627B/en
Publication of CN115146627A publication Critical patent/CN115146627A/en
Application granted granted Critical
Publication of CN115146627B publication Critical patent/CN115146627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The invention relates to speech semantic technology and discloses an entity identification method, which comprises the following steps: extracting the entity type to be identified from an identification request; identifying, by means of a target identification algorithm, a first entity of the entity type to be identified in the text to be identified; masking the first entity based on each entity symbol respectively to obtain identification texts; acquiring corpus templates obtained by masking a second entity in a preset text based on the entity symbols; calculating the similarity between each identification text and each corpus template, and screening all the identification texts by using the similarity to obtain a target identification text; and carrying out subdivision entity recognition on the first entity by using the target identification text to obtain an entity recognition result. The invention also relates to blockchain technology: the first entity may be stored in a blockchain node. The invention also provides an entity identification device, equipment and medium. The invention can improve the accuracy of entity identification.

Description

Entity identification method, entity identification device, electronic equipment and storage medium
Technical Field
The present invention relates to speech semantic technology, and in particular, to a method and apparatus for entity identification, an electronic device, and a storage medium.
Background
Entity recognition technology is an important part of the field of natural language processing, and specific entities in text, such as names of people, cities, institutions, and the like, are recognized through the entity recognition technology.
However, as the application scenarios of entity identification become finer-grained (for example, a ticket booking scenario must identify not only a city entity but also whether that city entity is a departure city or a destination city), existing entity identification gradually fails to meet the requirements: it can only identify general entities and cannot further subdivide the identified entities, so the accuracy of entity identification is low.
Disclosure of Invention
The invention provides an entity identification method, an entity identification device, electronic equipment and a storage medium, and mainly aims to improve the accuracy of entity identification.
The entity identification method provided by the invention comprises the following steps:
When an entity identification request is received, extracting the entity type to be identified in the entity identification request, and acquiring a text to be identified and an entity subdivision scene;
determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm;
identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm;
acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type;
based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts;
and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
Optionally, the screening all the recognition texts by using the similarity to obtain a target recognition text includes:
for each recognition text, determining the maximum of its recognition similarities with all the corpus templates, and taking the maximum recognition similarity as an initial target similarity;
taking the maximum value among the initial target similarities as the target similarity;
and taking the recognition text corresponding to the target similarity as the target recognition text.
Optionally, the calculating the similarity between each of the identified texts and each of the corpus templates includes:
word segmentation is carried out on the identification text to obtain a plurality of text words;
word segmentation is carried out on the corpus templates to obtain a plurality of template words;
starting from the first text word, judging whether each text word in the recognition text is the same as the template word at the corresponding position in the corpus template, and determining the word order correlation coefficient corresponding to each text word in the recognition text according to the judging result;
calculating word relativity of the recognition text and the corpus template according to the word order correlation coefficient;
acquiring the standard word correlation degree of the identification text;
and calculating the ratio of the word relativity to the standard word relativity to obtain the similarity.
Optionally, the calculating the similarity between each of the identified texts and each of the corpus templates includes:
converting the identification text into a vector to obtain an identification text vector;
converting the corpus template into a vector to obtain a corpus template vector;
and calculating the vector distance between the recognition text vector and the corpus template vector to obtain the recognition similarity.
Optionally, the converting the recognized text into a vector to obtain a recognized text vector includes:
converting each character in the identification text into a vector to obtain a character vector;
and combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector.
In order to solve the above problems, the present invention also provides an entity recognition apparatus, including:
the identification text construction module is used for extracting the entity type to be identified in an entity identification request when the entity identification request is received, and acquiring the text to be identified and an entity subdivision scene; determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm; identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm; acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type; based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
the corpus template acquisition module is used for acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
the entity subdivision recognition module is used for calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts; and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
Optionally, the calculating the similarity between the recognition text and each corpus template includes:
converting the identification text into a vector to obtain an identification text vector;
converting the corpus template into a vector to obtain a corpus template vector;
and calculating the vector distance between the recognition text vector and the corpus template vector to obtain the recognition similarity.
Optionally, the converting the recognized text into a vector to obtain a recognized text vector includes:
converting each character in the identification text into a vector to obtain a character vector;
and combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor executing the computer program stored in the memory to implement the entity recognition method.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned entity identification method.
According to the embodiment of the invention, the first entity corresponding to the entity type to be identified in the text to be identified is identified through the target identification algorithm; based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text; calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts; and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text. On the basis of identifying the entity of the text to be identified, the subdivided entity type corresponding to the entity is further identified, and the accuracy of entity identification is improved. Therefore, the entity identification method, the entity identification device, the electronic equipment and the readable storage medium provided by the embodiment of the invention improve the accuracy of entity identification.
Drawings
FIG. 1 is a flow chart of an entity identification method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an entity identification device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an internal structure of an electronic device for implementing an entity identification method according to an embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides an entity identification method. The execution subject of the entity identification method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the application. In other words, the entity identification method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of an entity identification method according to an embodiment of the present invention is shown, where in the embodiment of the present invention, the entity identification method includes:
s1, when an entity identification request is received, extracting the entity type to be identified in the entity identification request, and acquiring a text to be identified and an entity subdivision scene;
in the embodiment of the present invention, the entity identification request is a request for identifying a first entity in the text to be identified and further subdividing the identified first entity according to an entity subdivision scene, where the first entity is an entity corresponding to the entity type to be identified, such as "city". The entity subdivision scene is a scene in which the first entity of the entity type to be identified, once recognized in the text, requires further subdivision recognition. For example: the entity type to be identified is city, and the entity subdivision scene is a scene for judging whether the subdivided entity type of the identified city entity is a departure city or a destination city.
S2, determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm;
in the embodiment of the invention, the preset entity recognition algorithm library contains the entity recognition algorithm corresponding to each entity type, and the entity recognition algorithm is a regular-expression recognition algorithm or a model-based recognition algorithm.
Specifically, in the embodiment of the present invention, determining, as a target recognition algorithm, an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, including:
acquiring an entity identification algorithm corresponding to each entity type in the entity identification algorithm library;
and determining an entity recognition algorithm corresponding to the entity type to be recognized as the target recognition algorithm.
S3, recognizing a first entity corresponding to the entity type to be recognized in the text to be recognized through the target recognition algorithm;
in the embodiment of the invention, the entity coarse recognition is carried out on the text to be recognized through the target recognition algorithm, so as to obtain a first entity corresponding to the entity type to be recognized in the text to be recognized.
For example: the entity type to be identified is "city" and the text to be identified is "tomorrow goes to Shanghai"; the first entity of the entity type to be identified in the text to be identified is then "Shanghai".
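Steps S2 and S3 can be sketched as a lookup in an algorithm library followed by a coarse recognition pass. The sketch below is only an illustration under stated assumptions: the names `KNOWN_CITIES`, `recognize_cities`, `ALGORITHM_LIBRARY` and `recognize_first_entities` are hypothetical, and a small regular-expression recognizer stands in for whatever regular-expression or model-based algorithm the library would actually hold.

```python
import re
from typing import Callable, Dict, List

# Hypothetical city list, used only for this illustration.
KNOWN_CITIES = ["Beijing", "Shanghai", "Shenzhen"]


def recognize_cities(text: str) -> List[str]:
    """Regular-expression recognizer: known city names in order of appearance."""
    pattern = re.compile("|".join(re.escape(city) for city in KNOWN_CITIES))
    return pattern.findall(text)


# Stand-in for the preset entity recognition algorithm library:
# one recognizer per entity type.
ALGORITHM_LIBRARY: Dict[str, Callable[[str], List[str]]] = {
    "city": recognize_cities,
}


def recognize_first_entities(entity_type: str, text: str) -> List[str]:
    target_algorithm = ALGORITHM_LIBRARY[entity_type]  # S2: pick the target algorithm
    return target_algorithm(text)                      # S3: coarse entity recognition


first_entities = recognize_first_entities("city", "tomorrow goes to Shanghai")
print(first_entities)
```

Keeping the library as a mapping from entity type to callable means a model-based recognizer could be registered under the same interface without changing the lookup.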
In another embodiment of the present invention, the first entity may be stored in a blockchain node, so as to improve the data access efficiency by using the high throughput characteristic of the blockchain node.
S4, acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type;
in the embodiment of the invention, a subdivided entity type is an entity type into which the entity subdivision scene subdivides the entity; for example, when the entity subdivision scene is a scene for subdividing the first entity of type city, the subdivided entity types are departure city and destination city. The entity symbols are marker symbols corresponding to the subdivided entity types, and the entity symbols are in one-to-one correspondence with the subdivided entity types.
S5, based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
in the embodiment of the invention, in order to remove the influence of the first entity on the text semantics to be identified, different mask processing is performed on the first entity based on the entity symbol, so as to obtain at least one identification text.
In detail, in the embodiment of the present invention, based on the entity symbol, different mask processing is performed on the first entity to obtain at least one recognition text, including:
and respectively replacing the first entity with each entity symbol to obtain at least one identification text.
For example: the text to be identified is "tomorrow goes to Shanghai", in which the first entity is "Shanghai"; the subdivided entity types are "departure city" and "destination city"; the entity symbol corresponding to "departure city" is "target" and the entity symbol corresponding to "destination city" is "mask", so there are two entity symbols in total. Replacing the first entity with each entity symbol respectively yields two identification texts: "tomorrow goes to target" and "tomorrow goes to mask".
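The masking in step S5 amounts to one string replacement per entity symbol. A minimal sketch follows, assuming the two entity symbols from the example; the names `ENTITY_SYMBOLS` and `build_recognition_texts` are hypothetical.

```python
from typing import Dict, List

# Subdivided entity type -> entity symbol, as in the example above.
ENTITY_SYMBOLS: Dict[str, str] = {
    "departure city": "target",
    "destination city": "mask",
}


def build_recognition_texts(text: str, first_entity: str,
                            symbols: Dict[str, str]) -> List[str]:
    """Replace the first entity with each entity symbol in turn (one text per symbol)."""
    return [text.replace(first_entity, symbol, 1) for symbol in symbols.values()]


recognition_texts = build_recognition_texts(
    "tomorrow goes to Shanghai", "Shanghai", ENTITY_SYMBOLS
)
print(recognition_texts)
```

Each recognition text is a hypothesis about the subdivided type of the first entity; the later similarity step decides which hypothesis is kept.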
S6, obtaining at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
the corpus template in the embodiment of the invention is an example text with entity differences of the entity types to be identified eliminated.
Specifically, in the embodiment of the present invention, the corpus template is the text obtained after performing different mask processing on the second entity in the preset text based on the entity symbols.
In detail, the preset text is a preset example text of the same type as the text to be identified, in which each second entity is annotated with its subdivided entity type; a second entity is an entity in the preset text corresponding to the entity type to be identified. Each second entity in the preset text is then replaced with the entity symbol of its subdivided entity type to obtain the corpus template.
For example: the preset text is "tomorrow I fly from Beijing to Shanghai", and the second entities in the preset text are "Beijing" and "Shanghai"; the subdivided entity type corresponding to "Beijing" is "departure city" and that corresponding to "Shanghai" is "destination city"; the entity symbol "target" represents the departure city and the entity symbol "mask" represents the destination city. Replacing each second entity in the preset text with the entity symbol of its subdivided entity type gives the corpus template "tomorrow I fly from target to mask".
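Building a corpus template in step S6 can be sketched the same way, this time driven by the annotations on the preset text. The function name `build_corpus_template` and the dictionary layout of the annotations are assumptions made for illustration, and the preset text is an English rendering of the example.

```python
from typing import Dict

ENTITY_SYMBOLS: Dict[str, str] = {
    "departure city": "target",
    "destination city": "mask",
}


def build_corpus_template(preset_text: str,
                          annotations: Dict[str, str],
                          symbols: Dict[str, str]) -> str:
    """Replace every annotated second entity with the symbol of its subdivided type."""
    template = preset_text
    for second_entity, subdivided_type in annotations.items():
        template = template.replace(second_entity, symbols[subdivided_type], 1)
    return template


corpus_template = build_corpus_template(
    "tomorrow I fly from Beijing to Shanghai",
    {"Beijing": "departure city", "Shanghai": "destination city"},
    ENTITY_SYMBOLS,
)
print(corpus_template)
```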
S7, calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by using the recognition similarity to obtain target recognition texts;
in the embodiment of the invention, the calculating the similarity between the recognition text and each corpus template comprises the following steps:
converting the identification text into a vector to obtain an identification text vector;
converting the corpus template into a vector to obtain a corpus template vector;
and calculating the vector distance between the recognition text vector and the corpus template vector to obtain the recognition similarity.
Further, in an embodiment of the present invention, converting the recognition text into a vector to obtain the recognition text vector includes:
converting each character in the identification text into a vector to obtain a character vector;
and combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector.
Specifically, in the embodiment of the present invention, combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector includes:
and concatenating all the character vectors end to end in the order of the corresponding characters in the identification text to obtain the identification text vector.
In the embodiment of the invention, the characters are Chinese characters or foreign words.
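The vector-based similarity can be sketched as follows. Several details are assumptions, because the description does not fix them: the toy two-dimensional `char_vector` stands in for a trained character embedding, cosine similarity stands in for the unspecified vector distance, and shorter vectors are zero-padded so that texts of different lengths can be compared.

```python
import math
from typing import List


def char_vector(ch: str) -> List[float]:
    """Toy 2-dimensional character embedding (hypothetical stand-in for a
    lookup in a trained embedding table)."""
    code = ord(ch)
    return [float(code % 16), float(code // 16 % 16)]


def text_vector(text: str) -> List[float]:
    """Concatenate the character vectors end to end, in character order."""
    vector: List[float] = []
    for ch in text:
        vector.extend(char_vector(ch))
    return vector


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity after zero-padding the shorter vector (an assumption)."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


recognition_vector = text_vector("tomorrow goes to mask")
template_vector = text_vector("tomorrow I fly from target to mask")
print(cosine_similarity(recognition_vector, template_vector))
```

With plain concatenation the vector length grows with the text length, so some alignment step such as the padding used here is needed before any distance can be computed.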
In another embodiment of the present invention, combining all the character vectors according to the order of the corresponding characters in the recognized text to obtain the recognized text vector includes:
and combining the maximum values of vector elements in all the character vectors according to the sequence of the characters corresponding to the character vector in the identification text to obtain the identification text vector.
For example: the identification text has two characters in total. The character vector corresponding to the first character (formula image BDA0003765851970000071) has a maximum vector element of 2; the character vector corresponding to the second character (formula image BDA0003765851970000072) has a maximum vector element of 4. Combining the vector elements 2 and 4 in the order of the corresponding characters in the identification text gives the identification text vector (formula image BDA0003765851970000081), whose elements are 2 and 4 in that order.
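The max-element combination can be sketched in a few lines. The two character vectors below are made up for illustration, since the original formula images are not reproduced; only their maxima, 2 and 4, come from the example. The function name `max_element_text_vector` is hypothetical.

```python
from typing import List


def max_element_text_vector(char_vectors: List[List[float]]) -> List[float]:
    """Keep the largest element of each character vector, in character order."""
    return [max(vector) for vector in char_vectors]


# Hypothetical character vectors whose maxima are 2 and 4.
first_char_vector = [1.0, 2.0]
second_char_vector = [4.0, 3.0]

text_vector = max_element_text_vector([first_char_vector, second_char_vector])
print(text_vector)
```

Unlike concatenation, this variant yields one element per character, so the text vector length equals the number of characters.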
In another embodiment of the present invention, the calculating the similarity between the recognized text and each corpus template includes:
word segmentation is carried out on the identification text to obtain a plurality of text words;
word segmentation is carried out on the corpus templates to obtain a plurality of template words;
starting from the first text word, judging whether each text word in the recognition text is the same as the template word at the corresponding position in the corpus template, and determining the word order correlation coefficient corresponding to each text word in the recognition text according to the judging result;
in detail, in the embodiment of the present invention, when the judging result is that the two words are the same, the word order correlation coefficient of the corresponding text word is 1, and when the judging result is that they are different, the word order correlation coefficient of the corresponding text word is 0.
Calculating word relativity of the recognition text and the corpus template according to the word order correlation coefficient;
acquiring the standard word correlation degree of the identification text;
in the embodiment of the invention, whether the words in the same sequence in the two texts are the same or not is measured through the word relativity so as to judge the similarity of the texts.
And calculating the ratio of the word relativity to the standard word relativity to obtain the identification similarity.
Specifically, in the embodiment of the invention, the word relativity is calculated by using the following formula:
[Formula image BDA0003765851970000082]
wherein i is the order of a text word in the recognition text, c_i is the word order correlation coefficient of the text word with order i in the recognition text, p is the number of text words in the recognition text, and y_p is the word relativity.
In detail, the embodiment of the invention calculates the standard word relativity by using the following formula:
[Formula image BDA0003765851970000083]
wherein x is a preset standard word order coefficient, and y_b is the standard word relativity.
Specifically, in the embodiment of the present invention, the standard word relativity may be understood as the word relativity between the recognition text and itself, and the standard word order coefficient may be 1.
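The word-order similarity can be sketched as below. Because the two formula images are not reproduced in this text, the sketch assumes the simplest form consistent with the variable definitions: the word relativity is taken as the plain sum of the coefficients c_i, and the standard word relativity as p times x. Both forms are assumptions, as are whitespace splitting as the word segmentation step and the two example sentences.

```python
def word_order_similarity(recognition_text: str, corpus_template: str,
                          x: float = 1.0) -> float:
    """Ratio of word relativity to standard word relativity.

    Assumed forms (the original formulas are not available here):
      y_p = sum of c_i for i = 1..p
      y_b = p * x
    """
    text_words = recognition_text.split()      # stand-in for word segmentation
    template_words = corpus_template.split()
    if not text_words:
        return 0.0
    # c_i is 1 when the words at the same position match, otherwise 0;
    # positions beyond the end of the template count as a mismatch.
    coefficients = [
        1 if i < len(template_words) and word == template_words[i] else 0
        for i, word in enumerate(text_words)
    ]
    y_p = sum(coefficients)
    y_b = x * len(text_words)
    return y_p / y_b


print(word_order_similarity("today fly to mask", "tomorrow fly to mask"))
print(word_order_similarity("today fly to target", "tomorrow fly to mask"))
```

With x equal to 1 the ratio is simply the fraction of positions at which the recognition text and the template carry the same word.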
Further, in the embodiment of the present invention, screening all the recognition texts by using the similarity to obtain a target recognition text includes:
for each recognition text, determining the maximum of its recognition similarities with all the corpus templates, and taking the maximum recognition similarity as an initial target similarity;
taking the maximum value among the initial target similarities as the target similarity;
and taking the recognition text corresponding to the target similarity as the target recognition text.
Further, in the embodiment of the present invention, taking the recognition text corresponding to the target similarity as the target recognition text includes:
judging whether the number of recognition texts corresponding to the target similarity is 1;
when the number is 1, taking the recognition text corresponding to the target similarity as the target recognition text;
when the number is not 1, taking each recognition text corresponding to the target similarity as an initial target recognition text;
and sorting all the initial target recognition texts according to a preset sorting rule, and taking the initial target recognition text ranked first as the target recognition text.
The sorting rule in the embodiment of the invention may sort by the pinyin initial or by the stroke count of the first character of each initial target recognition text; the type of the sorting rule is not limited, provided that the sorting result it produces is unique.
For example, there are 3 corpus templates and n recognition texts:
the similarities between recognition text 1 and the 3 corpus templates are 0.1, 0.3 and 0.9 respectively;
the similarities between recognition text 2 and the 3 corpus templates are 0.2, 0.3 and 0.8 respectively;
the similarities between recognition text n and the 3 corpus templates are 0.15, 0.9 and 0.7 respectively.
The initial target similarities of these recognition texts are therefore 0.9, 0.8 and 0.9, and the target similarity is 0.9. The two recognition texts corresponding to the target similarity of 0.9 (recognition text 1 and recognition text n) are then sorted according to the preset sorting rule, and the recognition text ranked first is taken as the target recognition text.
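The screening step, including the tie-break, can be sketched with the numbers from this example. The function name `select_target_text` is hypothetical, and plain lexicographic ordering of the text labels is used here as a stand-in for the pinyin or stroke-count sorting rule.

```python
from typing import Dict, List


def select_target_text(similarities: Dict[str, List[float]]) -> str:
    """Pick the recognition text whose best template similarity is highest.

    Ties are broken by lexicographic order, a stand-in for the preset
    sorting rule (any rule that yields a unique order would do).
    """
    # Initial target similarity: best similarity of each recognition text.
    initial_targets = {text: max(values) for text, values in similarities.items()}
    # Target similarity: the largest of the initial target similarities.
    target_similarity = max(initial_targets.values())
    candidates = [text for text, value in initial_targets.items()
                  if value == target_similarity]
    return sorted(candidates)[0]


similarities = {
    "recognition text 1": [0.1, 0.3, 0.9],
    "recognition text 2": [0.2, 0.3, 0.8],
    "recognition text n": [0.15, 0.9, 0.7],
}
target_text = select_target_text(similarities)
print(target_text)
```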
Further, in the embodiment of the invention, the target recognition text screened from all the recognition texts is the one with the greatest similarity to the corpus templates, which indicates that, among all the recognition texts, its semantic expression is the most likely to be correct. The mask processing corresponding to the target recognition text is therefore considered to be the correct mask processing, and the final entity recognition result of the text to be recognized is determined according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
For example: a corpus template is a text expression with correct semantics, so the higher the similarity between a recognition text and the corpus template, the more likely the semantic expression of the recognition text is correct, that is, the more likely the entity symbol replacement in the recognition text is correct. Suppose the corpus template is "tomorrow mask" and the recognition texts are "today fly to target" and "today fly to mask". "today fly to mask" expresses the semantics of flying to the destination city, while "today fly to target" expresses the semantics of flying to the departure city; "today fly to mask" has the highest similarity with the corpus template "tomorrow mask", so the entity symbol replacement in "today fly to mask" is correct.
The method for converting the corpus template into a vector in the embodiment of the present invention is similar to the method for converting the recognition text into a vector, and will not be described in detail here.
S8, determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
For example: the target identification text is "tomorrow goes to mask", in which the subdivided entity type corresponding to the entity symbol "mask" is "destination city" and the first entity replaced by the entity symbol "mask" is "Shanghai"; the final entity identification result of the text to be identified is then "tomorrow goes to Shanghai (destination city)".
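Step S8 can be sketched as a reverse lookup from the entity symbol in the target recognition text to its subdivided entity type. The function name `resolve_subdivided_type` and the returned tuple are assumptions for illustration; the sketch identifies the symbol by re-applying each mask to the original text.

```python
from typing import Dict, Tuple

ENTITY_SYMBOLS: Dict[str, str] = {
    "departure city": "target",
    "destination city": "mask",
}


def resolve_subdivided_type(text_to_identify: str, first_entity: str,
                            target_text: str,
                            symbols: Dict[str, str]) -> Tuple[str, str]:
    """Return (first entity, subdivided type) for the mask used in target_text."""
    for subdivided_type, symbol in symbols.items():
        # The symbol that reproduces the target text is the one that was kept.
        if text_to_identify.replace(first_entity, symbol, 1) == target_text:
            return first_entity, subdivided_type
    raise ValueError("target text does not match any masked variant")


entity, subdivided_type = resolve_subdivided_type(
    "tomorrow goes to Shanghai", "Shanghai", "tomorrow goes to mask", ENTITY_SYMBOLS
)
result = f"tomorrow goes to {entity} ({subdivided_type})"
print(result)
```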
Further, in the embodiment of the present invention, the target recognition text is used to perform subdivision entity recognition on the first entity, so as to recognize the subdivided entity type corresponding to the first entity. After the final entity recognition result is obtained, it may also be sent to the terminal device that issued the entity recognition request and/or to a preset terminal device, where the terminal device includes intelligent terminals such as mobile phones, computers and tablets.
As shown in fig. 2, a functional block diagram of the entity recognition device of the present invention is shown.
The entity recognition apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functions, the entity recognition apparatus may comprise a recognition text construction module 101, a corpus template acquisition module 102 and an entity subdivision recognition module 103. A module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the recognition text construction module 101 is configured to extract a type of an entity to be recognized in an entity recognition request when the entity recognition request is received, and obtain a text to be recognized and an entity subdivision scene; determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm; identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm; acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type; based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
the corpus template acquisition module 102 is configured to obtain at least one corpus template, where the corpus template is a text obtained by performing different mask processing on a second entity in a preset text based on the entity symbol;
the entity subdivision recognition module 103 is configured to calculate recognition similarity between each recognition text and each corpus template, and screen all the recognition texts by using the recognition similarity to obtain target recognition texts; and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
In detail, each module in the entity identification device 100 in the embodiment of the present invention adopts the same technical means as the entity identification method described in fig. 1 and can produce the same technical effects when in use, and will not be described herein.
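The mask processing performed by modules 101 and 102 can be sketched as follows. This is only an illustration under stated assumptions: the entity symbols `mask_dep` and `mask_dest`, the preset texts, and the rule that a corpus template masks its second entity with the symbol of that entity's known subdivided type are all hypothetical and are not specified in this section.

```python
from typing import Dict, List, Tuple

# Hypothetical entity symbols for a "travel" subdivision scene.
SYMBOLS: Dict[str, str] = {
    "departure city": "mask_dep",
    "destination city": "mask_dest",
}


def mask_entity(text: str, entity: str,
                symbols: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return one (subdivided type, masked text) pair per entity symbol."""
    if entity not in text:
        raise ValueError(f"entity {entity!r} not found in text")
    return [(subdivided_type, text.replace(entity, symbol, 1))
            for subdivided_type, symbol in symbols.items()]


# Module 101: one recognition text per candidate subdivided entity type.
recognition_texts = mask_entity("tomorrow going to Shanghai", "Shanghai", SYMBOLS)

# Module 102: corpus templates from preset texts (assumed to be labelled
# with the subdivided type of their second entity).
preset_texts = [
    ("I am flying to Beijing", "Beijing", "destination city"),
    ("leaving from Guangzhou tonight", "Guangzhou", "departure city"),
]
corpus_templates = [text.replace(entity, SYMBOLS[subdivided_type], 1)
                    for text, entity, subdivided_type in preset_texts]
```

Each recognition text keeps the context of the original sentence, so comparing it with the templates tests which subdivided type best fits that context.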
Fig. 3 is a schematic structural diagram of an electronic device for implementing the entity identification method according to the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as an entity recognition program.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in an electronic device and various types of data, such as codes of entity recognition programs, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit (Control Unit) of the electronic device; it connects the components of the entire electronic device using various interfaces and lines, and executes the functions of the electronic device and processes data by running or executing programs or modules (e.g., the entity recognition program) stored in the memory 11 and calling data stored in the memory 11.
The communication bus 12 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection and communication between the memory 11, the at least one processor 10 and other components. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power source may also include one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described herein.
Optionally, the communication interface 13 may comprise a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and may also be a standard wired interface or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
The entity identification program stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, which, when run in the processor 10, can implement:
when an entity identification request is received, extracting the entity type to be identified in the entity identification request, and acquiring a text to be identified and an entity subdivision scene;
determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm;
identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm;
acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type;
based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts;
and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
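The first three program steps (extracting the entity type from the request, looking up the target recognition algorithm in a preset library, and identifying the first entity) can be sketched as a simple registry lookup. The request layout, the library contents and the two toy recognizers below are assumptions made for illustration; the actual recognition algorithms are not specified in this section.

```python
import re
from typing import Callable, Dict, List


def recognize_city(text: str) -> List[str]:
    """Toy gazetteer lookup (the city list is hypothetical)."""
    cities = ("Shanghai", "Beijing", "Shenzhen")
    return [city for city in cities if city in text]


def recognize_date(text: str) -> List[str]:
    """Toy recognizer for ISO-style dates such as 2022-07-26."""
    return re.findall(r"\d{4}-\d{2}-\d{2}", text)


# Preset entity recognition algorithm library, keyed by entity type.
ALGORITHM_LIBRARY: Dict[str, Callable[[str], List[str]]] = {
    "city": recognize_city,
    "date": recognize_date,
}


def identify_first_entities(request: dict) -> List[str]:
    """Pick the target recognition algorithm and run it on the request text."""
    entity_type = request["entity_type"]
    try:
        target_algorithm = ALGORITHM_LIBRARY[entity_type]
    except KeyError:
        raise ValueError(f"no algorithm for entity type {entity_type!r}") from None
    return target_algorithm(request["text"])


request = {"entity_type": "city", "text": "tomorrow going to Shanghai",
           "scene": "travel"}
entities = identify_first_entities(request)
print(entities)
```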
Further, the electronic device integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
Embodiments of the present invention may also provide a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, may implement:
when an entity identification request is received, extracting the entity type to be identified in the entity identification request, and acquiring a text to be identified and an entity subdivision scene;
determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm;
identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm;
acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type;
based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts;
and determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text.
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by a single unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A method of entity identification, the method comprising:
when an entity identification request is received, extracting the entity type to be identified in the entity identification request, and acquiring a text to be identified and an entity subdivision scene;
determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm;
identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm;
acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type;
based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts;
determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text;
screening all the identification texts by using the identification similarity to obtain target identification texts, wherein the method comprises the following steps:
determining the maximum recognition similarity of each recognition text and each corpus template, and taking the maximum recognition similarity as an initial target similarity;
taking the maximum value in the initial target similarity as the target similarity;
and taking the recognition text corresponding to the target similarity as the target recognition text.
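The screening in claim 1 is a maximum taken over per-text maxima. A minimal sketch, assuming the recognition similarities have already been computed and stored per recognition text (the scores and the symbols `mask_dep` / `mask_dest` are made-up illustrative values):

```python
from typing import Dict, List


def screen(similarity: Dict[str, List[float]]) -> str:
    """Return the target recognition text.

    similarity maps each recognition text to its recognition similarity
    with every corpus template.
    """
    # Initial target similarity: the best template score of each text.
    initial = {text: max(scores) for text, scores in similarity.items()}
    # Target similarity: the largest initial target similarity; the text
    # that achieves it is the target recognition text.
    return max(initial, key=initial.get)


similarity = {
    "tomorrow going to mask_dep": [0.40, 0.55],
    "tomorrow going to mask_dest": [0.90, 0.35],
}
target = screen(similarity)
print(target)
```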
2. The method of entity recognition according to claim 1, wherein said calculating a recognition similarity of each of the recognition texts to each of the corpus templates includes:
word segmentation is carried out on the identification text to obtain a plurality of text words;
word segmentation is carried out on the corpus templates to obtain a plurality of template words;
starting from the first text word, judging whether each text word in the recognition text is the same as the template word at the corresponding position in the corpus template, and determining the word order correlation coefficient corresponding to each text word in the recognition text according to the judging result;
calculating word relativity of the recognition text and the corpus template according to the word order correlation coefficient;
acquiring the standard word correlation degree of the identification text;
and calculating the ratio of the word relativity to the standard word relativity to obtain the identification similarity.
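One possible reading of claim 2 is sketched below. Several details are assumptions, since the claim does not fix them: segmentation is done on whitespace, the word order correlation coefficient is 1 for a match at the same position and 0 otherwise, the word correlation is the sum of the coefficients, and the standard word correlation is the value the recognition text would obtain against itself (its word count).

```python
def word_order_similarity(recognition_text: str, corpus_template: str) -> float:
    """Position-wise word match ratio (assumed coefficient scheme)."""
    text_words = recognition_text.split()      # assumed whitespace segmentation
    template_words = corpus_template.split()
    if not text_words:
        return 0.0
    # Coefficient 1 if the template has the same word at the same position.
    coefficients = [
        1 if i < len(template_words) and word == template_words[i] else 0
        for i, word in enumerate(text_words)
    ]
    word_correlation = sum(coefficients)
    standard_word_correlation = len(text_words)  # score of a perfect match
    return word_correlation / standard_word_correlation


score = word_order_similarity("tomorrow going to mask_dest",
                              "today going to mask_dest")
print(score)
```

Under these assumptions three of the four words match in position, so the similarity is 0.75.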
3. The method of entity recognition according to claim 1, wherein said calculating a recognition similarity of each of the recognition texts to each of the corpus templates includes:
converting the identification text into a vector to obtain an identification text vector;
converting the corpus template into a vector to obtain a corpus template vector;
and calculating the vector distance between the recognition text vector and the corpus template vector to obtain the recognition similarity.
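Claim 3 derives the recognition similarity from a vector distance without naming the distance or the conversion in this section. The sketch below assumes Euclidean distance and the mapping 1 / (1 + d), so that identical vectors score 1 and the score falls as the distance grows; both choices are illustrative.

```python
import math
from typing import Sequence


def distance_similarity(text_vector: Sequence[float],
                        template_vector: Sequence[float]) -> float:
    """Map the Euclidean distance d between two vectors to 1 / (1 + d)."""
    if len(text_vector) != len(template_vector):
        raise ValueError("vectors must have the same dimension")
    distance = math.dist(text_vector, template_vector)
    return 1.0 / (1.0 + distance)


same = distance_similarity([1.0, 0.0], [1.0, 0.0])
far = distance_similarity([0.0, 0.0], [3.0, 4.0])
print(same, far)
```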
4. The entity recognition method of claim 3, wherein said converting the recognized text into a vector results in a recognized text vector, comprising:
converting each character in the identification text into a vector to obtain a character vector;
and combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector.
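Claim 4 builds the recognition text vector from per-character vectors combined in text order. A minimal sketch, assuming one-hot character vectors and concatenation as the combining step (a trained embedding table could be used instead; none is specified here):

```python
from typing import Dict, Iterable, List


def build_char_table(texts: Iterable[str]) -> Dict[str, List[float]]:
    """Assign each distinct character a one-hot vector (assumed encoding)."""
    vocabulary = sorted({ch for text in texts for ch in text})
    return {ch: [1.0 if i == j else 0.0 for j in range(len(vocabulary))]
            for i, ch in enumerate(vocabulary)}


def text_to_vector(text: str, table: Dict[str, List[float]]) -> List[float]:
    """Concatenate the character vectors in the order the characters appear."""
    vector: List[float] = []
    for ch in text:
        vector.extend(table[ch])
    return vector


table = build_char_table(["ab", "ba"])
v_ab = text_to_vector("ab", table)
v_ba = text_to_vector("ba", table)
print(v_ab, v_ba)
```

Because the vectors are combined in order, "ab" and "ba" produce different text vectors even though they contain the same characters. With plain concatenation, texts of different lengths give vectors of different dimensions, so some padding or truncation would be needed before computing a vector distance.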
5. An entity identification device, comprising:
the recognition text construction module is used for extracting the entity type to be recognized in an entity recognition request when the entity recognition request is received, and acquiring the text to be recognized and the entity subdivision scene; determining an entity recognition algorithm corresponding to the entity type to be recognized from a preset entity recognition algorithm library, and taking the entity recognition algorithm as a target recognition algorithm; identifying a first entity corresponding to the entity type to be identified in the text to be identified through the target identification algorithm; acquiring all subdivision entity types corresponding to the entity subdivision scene and entity symbols corresponding to each subdivision entity type; based on the entity symbols, respectively carrying out different mask processing on the first entity to obtain at least one identification text;
the corpus template acquisition module is used for acquiring at least one corpus template, wherein the corpus template is a text obtained by respectively carrying out different mask processing on a second entity in a preset text based on the entity symbol;
the entity subdivision recognition module is used for calculating the recognition similarity between each recognition text and each corpus template, and screening all the recognition texts by utilizing the recognition similarity to obtain target recognition texts; determining a final entity recognition result of the text to be recognized according to the subdivided entity type corresponding to the entity symbol in the target recognition text;
screening all the identification texts by using the identification similarity to obtain target identification texts, wherein the method comprises the following steps:
determining the maximum recognition similarity of each recognition text and each corpus template, and taking the maximum recognition similarity as an initial target similarity;
taking the maximum value in the initial target similarity as the target similarity;
and taking the recognition text corresponding to the target similarity as the target recognition text.
6. The entity recognition apparatus of claim 5, wherein said calculating a recognition similarity of each of the recognition texts to each of the corpus templates comprises:
converting the identification text into a vector to obtain an identification text vector;
converting the corpus template into a vector to obtain a corpus template vector;
and calculating the vector distance between the recognition text vector and the corpus template vector to obtain the recognition similarity.
7. The entity recognition apparatus of claim 6, wherein said converting the recognized text into a vector results in a recognized text vector, comprising:
converting each character in the identification text into a vector to obtain a character vector;
and combining all the character vectors according to the sequence of the corresponding characters in the recognition text to obtain the recognition text vector.
8. An electronic device, the electronic device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the entity identification method of any one of claims 1 to 4.
9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the entity identification method of any one of claims 1 to 4.
CN202210885929.1A 2022-07-26 2022-07-26 Entity identification method, entity identification device, electronic equipment and storage medium Active CN115146627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210885929.1A CN115146627B (en) 2022-07-26 2022-07-26 Entity identification method, entity identification device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115146627A CN115146627A (en) 2022-10-04
CN115146627B CN115146627B (en) 2023-05-02

Family

ID=83413436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210885929.1A Active CN115146627B (en) 2022-07-26 2022-07-26 Entity identification method, entity identification device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115146627B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9256795B1 (en) * 2013-03-15 2016-02-09 A9.Com, Inc. Text entity recognition
GB2569953A (en) * 2017-12-30 2019-07-10 Innoplexus Ag Method and system for extracting entity information from target data
CN111241839A (en) * 2020-01-16 2020-06-05 腾讯科技(深圳)有限公司 Entity identification method, entity identification device, computer readable storage medium and computer equipment
CN112668333A (en) * 2019-10-15 2021-04-16 华为技术有限公司 Named entity recognition method and device, and computer-readable storage medium
CN112800769A (en) * 2021-02-20 2021-05-14 深圳追一科技有限公司 Named entity recognition method and device, computer equipment and storage medium
CN113011186A (en) * 2021-01-25 2021-06-22 腾讯科技(深圳)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
JP2021197179A (en) * 2020-06-11 2021-12-27 株式会社リコー Entity identification method, device, and computer readable storage medium
CN114330353A (en) * 2022-01-06 2022-04-12 腾讯科技(深圳)有限公司 Entity identification method, device, equipment, medium and program product of virtual scene

Also Published As

Publication number Publication date
CN115146627A (en) 2022-10-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant