CN117521653A - Entity identification method, device and storage medium - Google Patents

Info

Publication number: CN117521653A
Application number: CN202210911024.7A
Authority: CN (China)
Prior art keywords: boundary, entity, candidate entity, character, word
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Inventor: 王军伟
Current Assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202210911024.7A
Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295: Named entity recognition
    • G06F 40/205: Parsing
    • G06F 40/216: Parsing using statistical methods

Abstract

The present disclosure relates to an entity identification method, apparatus, and storage medium. The method comprises the following steps: inputting text to be identified into an entity recognition model to obtain at least one candidate entity; performing word segmentation on the sentence in which the candidate entity is located in the text to be identified to obtain a word segmentation result; determining, according to the word segmentation result, whether the boundary character of the candidate entity meets a preset boundary condition; and taking the candidate entity as a target entity in a case where its boundary character meets the preset boundary condition. After at least one candidate entity is obtained through the entity recognition model, each candidate can be subjected to a secondary judgment, which effectively avoids the low entity recognition accuracy caused by insufficient corpus quantity or quality during model training, thereby improving the accuracy of the determined target entity.

Description

Entity identification method, device and storage medium
Technical Field
The present disclosure relates to the field of natural language processing, and in particular, to a method, an apparatus, and a storage medium for entity identification.
Background
Entity identification, as an important step in natural language processing, is widely applied in tasks such as information extraction, information retrieval, and information recommendation. Disambiguation is an important step in the entity identification process.
Existing entity disambiguation is mainly performed with models based on sequence labeling, classification, clustering, and the like. Such methods offer a high degree of automation and strong model generalization, but they require a large amount of manually annotated corpus data for training, which consumes considerable time, and they place high demands on corpus quality: if there are conflicts between corpora, the training effect degrades. In addition, current models are mostly based on deep learning; the whole process is a black box and the results are difficult to control.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an entity recognition method, apparatus, and storage medium.
According to a first aspect of an embodiment of the present disclosure, there is provided an entity identification method, including:
inputting the text to be identified into an entity identification model to obtain at least one candidate entity;
performing word segmentation on the sentence in which the candidate entity is located in the text to be identified, to obtain a word segmentation result;
determining whether the boundary characters of the candidate entity meet preset boundary conditions according to the word segmentation result;
and taking the candidate entity as a target entity in a case where the boundary character of the candidate entity meets the preset boundary condition.
Optionally, the determining, according to the word segmentation result, whether the boundary character of the candidate entity meets a preset boundary condition includes:
determining, according to the word segmentation result, whether the boundary character of the candidate entity is in another word segment, wherein the other word segment is not part of the candidate entity;
and in a case where the boundary character of the candidate entity is not in another word segment, determining that the boundary character of the candidate entity meets the preset boundary condition.
Optionally, the method further comprises:
and in a case where the boundary character of the candidate entity is in another word segment, determining whether the boundary character of the candidate entity meets a preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment.
Optionally, the word segmentation result includes the word segments in the sentence and their parts of speech, and determining whether the boundary character of the candidate entity meets a preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment includes:
if the part of speech of the boundary character is the same as that of the adjacent character and the character string formed by the two can independently form a word, determining that the boundary character of the candidate entity does not meet the preset boundary condition.
Optionally, the determining, according to the word segmentation result, whether the boundary character of the candidate entity meets a preset boundary condition includes:
performing word segmentation processing on the candidate entity, and determining, according to the parts of speech of the word segments in the candidate entity, a first possibility of the part-of-speech collocation of those word segments;
determining a second possibility of collocation between the part of speech of the word segment to which the boundary character belongs and the part of speech of a target word segment adjacent to that word segment, wherein the target word segment does not belong to the candidate entity;
and determining whether the boundary characters of the candidate entity meet a preset boundary condition according to the comparison result of the first possibility and the second possibility.
Optionally, the determining whether the boundary character of the candidate entity meets a preset boundary condition according to the comparison result of the first possibility and the second possibility includes:
and if the first possibility is larger than the second possibility, determining that the boundary characters of the candidate entity meet the preset boundary conditions.
Optionally, the determining whether the boundary character of the candidate entity meets a preset boundary condition according to the comparison result of the first possibility and the second possibility includes:
and if the first possibility is smaller than the second possibility, determining that the boundary characters of the candidate entity do not meet the preset boundary conditions.
According to a second aspect of embodiments of the present disclosure, there is provided an entity identification apparatus, comprising:
the acquisition module is used for inputting the text to be identified into the entity identification model to obtain at least one candidate entity;
the word segmentation module is used for segmenting the sentence in which the candidate entity is located in the text to be identified, to obtain a word segmentation result;
the first determining module is used for determining whether the boundary characters of the candidate entity meet the preset boundary conditions according to the word segmentation result;
and the second determining module is used for taking the candidate entity as a target entity in a case where the boundary character of the candidate entity meets the preset boundary condition.
Optionally, the first determining module includes:
the first determining submodule is used for determining, according to the word segmentation result, whether the boundary character of the candidate entity is in another word segment, wherein the other word segment is not part of the candidate entity;
and the second determining submodule is used for determining that the boundary character of the candidate entity meets the preset boundary condition in a case where the boundary character is not in any other word segment.
Optionally, the first determining module includes:
and a third determining sub-module, configured to determine, in a case where the boundary character of the candidate entity is in another word segment, whether the boundary character meets a preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment.
Optionally, the third determining sub-module includes:
and a fourth determining sub-module, configured to determine that the boundary character of the candidate entity does not meet the preset boundary condition if the part of speech of the boundary character is the same as that of the immediately adjacent character and the character string formed by the two can independently form a word.
Optionally, the first determining module includes:
a fifth determining submodule, configured to perform word segmentation processing on the candidate entity and determine, according to the parts of speech of the word segments in the candidate entity, a first possibility of the part-of-speech collocation of those word segments;
a sixth determining submodule, configured to determine a second possibility of collocation between the part of speech of the word segment to which the boundary character belongs and the part of speech of a target word segment adjacent to that word segment, where the target word segment does not belong to the candidate entity;
a seventh determining submodule, configured to determine, according to a comparison result of the first likelihood and the second likelihood, whether a boundary character of the candidate entity meets a preset boundary condition.
Optionally, the seventh determining submodule includes:
an eighth determining submodule, configured to determine that the boundary character of the candidate entity meets the preset boundary condition if the first likelihood is greater than the second likelihood.
Optionally, the seventh determining submodule includes:
and a ninth determining submodule, configured to determine that the boundary character of the candidate entity does not meet the preset boundary condition if the first likelihood is less than the second likelihood.
According to a third aspect of embodiments of the present disclosure, there is provided an entity identification apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the entity identification method provided by the first aspect of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the entity identification method provided by the first aspect of the present disclosure.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
after at least one candidate entity is obtained through the entity recognition model, the sentence in which the candidate entity is located in the text to be recognized can be segmented, and whether the boundary character of the candidate entity meets the preset boundary condition is determined according to the segmentation result; the candidate entity is taken as a target entity in a case where its boundary character meets the preset boundary condition. The candidate entities obtained through the entity recognition model can thus be subjected to a secondary judgment, which effectively avoids the low entity recognition accuracy caused by insufficient corpus quantity or quality during entity recognition model training, thereby improving the accuracy of the determined target entity.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart illustrating a method of entity identification according to an exemplary embodiment.
Fig. 2 is a schematic diagram of whole sentence segmentation, shown according to an exemplary embodiment.
Fig. 3 is a block diagram illustrating an entity identification device according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an entity recognition apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an entity recognition apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that all actions of acquiring signals, information, or data in the present application are performed in compliance with the applicable data protection laws and policies of the relevant country and with authorization from the owner of the corresponding device.
Fig. 1 is a flow chart illustrating a method of entity identification according to an exemplary embodiment. As shown in fig. 1, the method may include S101 to S104.
S101, inputting the text to be recognized into an entity recognition model to obtain at least one candidate entity.
For example, the entity recognition model may be a BERT+LSTM+CRF model or an LSTM+CRF model built on an entity dictionary that includes a plurality of entity names. Take as the text to be identified the sentence "how did a certain mobile phone sell in year 22" (in the original Chinese, the entity name is immediately followed by the digits "22"); inputting this text into the entity recognition model yields the candidate entities "a certain mobile phone" and "a certain mobile phone 2". Steps S102 to S104 are then performed on each candidate entity to judge its plausibility, and the target entity is determined from the at least one candidate entity.
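As a minimal sketch of this step, a hypothetical dictionary-based lookup can stand in for the BERT+LSTM+CRF model; the entity names and the test sentence below are invented placeholders, not taken from the disclosure.

```python
# Hypothetical stand-in for the entity recognition model (S101): a simple
# dictionary scan that returns every known entity name found in the text,
# together with its character span. A real BERT+LSTM+CRF tagger would
# produce such spans from learned labels instead of string matching.
ENTITY_DICT = ["PhoneX", "PhoneX 2"]  # assumed entity-dictionary entries

def candidate_entities(text, entity_dict=ENTITY_DICT):
    candidates = []
    for name in entity_dict:
        start = text.find(name)
        if start != -1:
            candidates.append((name, start, start + len(name)))
    return candidates

# Both overlapping candidates are returned, mirroring the running example.
print(candidate_entities("How did PhoneX 2 sell in year 22?"))
# → [('PhoneX', 8, 14), ('PhoneX 2', 8, 16)]
```

The two overlapping spans are exactly the situation the secondary judgment of S102 to S104 is designed to resolve.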
S102, word segmentation is carried out on the sentence in which the candidate entity is located in the text to be recognized, and a word segmentation result is obtained.
Illustratively, as described above, the sentence in which the candidate entities "a certain mobile phone" and "a certain mobile phone 2" are located is "how did a certain mobile phone sell in year 22". This sentence is segmented and, as shown in fig. 2, the word segmentation result includes "certain", "mobile phone", "22", "year", "sales", "situation", and "how".
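The disclosure does not name a particular segmentation algorithm. A toy forward-maximum-match segmenter over a small assumed lexicon reproduces the segmentation of fig. 2 on the example sentence:

```python
# Toy forward-maximum-match word segmenter (S102). The lexicon and the
# maximum word length are assumptions for illustration only.
LEXICON = {"某", "手机", "22", "年", "销售", "情况", "怎么样"}

def tokenize(sentence, lexicon=LEXICON, max_len=3):
    tokens, i = [], 0
    while i < len(sentence):
        # try the longest dictionary word starting at position i
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            if piece in lexicon or n == 1:  # fall back to a single character
                tokens.append(piece)
                i += n
                break
    return tokens

# "某手机22年销售情况怎么样" ≈ "how did a certain mobile phone sell in year 22"
print(tokenize("某手机22年销售情况怎么样"))
# → ['某', '手机', '22', '年', '销售', '情况', '怎么样']
```

Note that the two digits are merged into the single segment "22": the sentence-level segmentation knows nothing about the candidate entity "某手机2", which is exactly the boundary conflict examined in S103.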
S103, determining whether the boundary characters of the candidate entity meet the preset boundary conditions according to the word segmentation result.
Taking the candidate entity "a certain mobile phone 2" as an example, its boundary character is "2"; by comparing the word segmentation result with this boundary character, it can be determined whether the boundary character meets the preset boundary condition. Taking the candidate entity "a certain mobile phone" as an example, its boundary character is the final character of "mobile phone"; the same comparison applies. For example, whether the boundary character of a candidate entity meets the preset boundary condition may be determined from the part of speech of the boundary character and the parts of speech of the word segments.
S104, taking the candidate entity as a target entity in a case where the boundary character of the candidate entity meets the preset boundary condition.
For example, in a case where the boundary character of the candidate entity meets the preset boundary condition, the candidate entity may be output as the target entity; in a case where it does not, the candidate entity is deleted as an ambiguous entity.
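Putting S102 to S104 together, the secondary screening can be sketched as follows. This sketch applies only the simplest boundary test (the boundary character must not fall inside an outside word segment); the refinements involving parts of speech and collocation likelihoods are separate checks in the method.

```python
# Simplified secondary screening (S102-S104): a candidate entity survives
# only if its boundary character does not fall inside a word segment that
# lies outside the candidate. Rejected candidates are dropped as ambiguous.
def screen_candidates(candidates, tokens):
    targets = []
    for cand in candidates:
        boundary = cand[-1]
        others = [t for t in tokens if t not in cand]
        if not any(boundary in t for t in others):
            targets.append(cand)  # boundary condition met -> target entity
    return targets

tokens = ["某", "手机", "22", "年", "销售", "情况", "怎么样"]
print(screen_candidates(["某手机", "某手机2"], tokens))  # → ['某手机']
```

Here "某手机2" is rejected because its boundary character "2" lies inside the outside segment "22", while "某手机" passes and becomes the target entity.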
According to the technical scheme above, after at least one candidate entity is obtained through the entity recognition model, the sentence in which the candidate entity is located in the text to be recognized can be segmented, and whether the boundary character of the candidate entity meets the preset boundary condition is determined according to the segmentation result; the candidate entity is taken as a target entity in a case where its boundary character meets the preset boundary condition. The candidate entities obtained through the entity recognition model are thus subjected to a secondary judgment, which effectively avoids the low entity recognition accuracy caused by insufficient corpus quantity or quality during model training and improves the accuracy of the determined target entity.
Optionally, in S103, determining whether the boundary character of the candidate entity meets the preset boundary condition according to the word segmentation result may include:
determining, according to the word segmentation result, whether the boundary character of the candidate entity is in another word segment;
and in a case where the boundary character of the candidate entity is not in another word segment, determining that the boundary character of the candidate entity meets the preset boundary condition.
Here, the other word segments are word segments that are not part of the candidate entity.
Illustratively, as described above, the candidate entities are "a certain mobile phone" and "a certain mobile phone 2", and the other word segments include "22", "year", "sales", "situation", and "how". Taking the candidate entity "a certain mobile phone" as an example, its boundary character is the final character of "mobile phone", which does not appear in any other word segment. It can therefore be determined that this candidate does not conflict with the word segmentation result and that no word segment was erroneously split to generate the candidate. Accordingly, in a case where the boundary character of a candidate entity is not in any other word segment, it may be determined that the boundary character meets the preset boundary condition.
Optionally, the entity identification method provided by the present disclosure may further include:
and in a case where the boundary character of the candidate entity is in another word segment, determining whether the boundary character meets the preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment.
Illustratively, taking the candidate entity "a certain mobile phone 2" as an example, the other word segments include "22", "year", "sales", "situation", and "how", and the boundary character is "2". Since this boundary character lies inside the other word segment "22", the candidate conflicts with the word segmentation result: the segmented word "22" overlaps the candidate. Whether the boundary character meets the preset boundary condition can then be determined from the boundary character (the first "2" in "22") and the character immediately adjacent to it in the other word segment (the second "2" in "22").
The word segmentation result includes the word segments of the sentence and their parts of speech. Determining whether the boundary character of the candidate entity meets the preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment may include:
if the part of speech of the boundary character is the same as that of the adjacent character, and the character string formed by the two can independently form a word, determining that the boundary character of the candidate entity does not meet the preset boundary condition.
Illustratively, as shown in fig. 2, "certain" and "mobile phone" are nouns (n), "22" is a numeral (m), "year" is a classifier (q), "sales" is a verb (v), "situation" is a noun (n), and "how" is a pronoun (r); "22" consists of two successive "2" characters, both of which are numerals. As described above, the boundary character (the first "2" in "22") and the character immediately adjacent to it in the other word segment (the second "2" in "22") are both numerals, and the character string formed by the two can independently stand as the word "22". The word segment "22" therefore has a high probability of having been erroneously split; to avoid this, it may be determined that the boundary character of the candidate entity does not meet the preset boundary condition.
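As a sketch, this rule can be expressed with a toy single-character part-of-speech table and a toy vocabulary; both tables are assumptions for illustration, not structures defined in the disclosure.

```python
# Toy check for the rule above: the boundary condition fails when the
# boundary character and the character immediately after it share a part
# of speech and their concatenation is itself a standalone word.
POS = {"2": "m"}   # m = numeral; assumed single-character POS table
VOCAB = {"22"}     # assumed set of strings that can stand alone as words

def boundary_merges_with_neighbor(boundary_char, adjacent_char):
    same_pos = POS.get(boundary_char) == POS.get(adjacent_char)
    forms_word = (boundary_char + adjacent_char) in VOCAB
    return same_pos and forms_word  # True -> boundary condition NOT met

# The two "2"s are both numerals and together form the word "22",
# so the candidate "某手机2" is judged to have an unreliable boundary.
print(boundary_merges_with_neighbor("2", "2"))  # → True
```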
Optionally, in S103, determining whether the boundary character of the candidate entity meets the preset boundary condition according to the word segmentation result may include:
performing word segmentation processing on the candidate entity, and determining, according to the parts of speech of the word segments in the candidate entity, a first possibility of the part-of-speech collocation of those word segments;
determining a second possibility of collocation between the part of speech of the word segment to which the boundary character belongs and the part of speech of the target word segment adjacent to it;
and determining whether the boundary characters of the candidate entity meet the preset boundary conditions according to the comparison result of the first possibility and the second possibility.
Here, the target word segment does not belong to the candidate entity.
Taking the candidate entity "a certain mobile phone 2" as an example, word segmentation of the candidate yields "certain", "mobile phone", and "2", where "certain" and "mobile phone" are nouns and "2" is a numeral. The value of the first possibility is determined through a preset word frequency library, which stores the probability with which each part-of-speech collocation occurs in text; for example, if the probability that the collocation noun + noun + numeral occurs in text is p1, the first possibility is p1. The word segment to which the boundary character belongs is "22", whose part of speech is numeral, and the target word segment immediately adjacent to it is "year", whose part of speech is classifier; the value of the second possibility is likewise determined through the word frequency library. For example, if the probability that the collocation numeral + classifier occurs in text is p2, the second possibility is p2. Whether the boundary character of the candidate entity meets the preset boundary condition can then be determined from the comparison of the first possibility and the second possibility.
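The comparison can be sketched with an assumed word frequency library mapping part-of-speech sequences to collocation probabilities; the values of p1 and p2 below are invented for illustration, since the disclosure gives no numbers.

```python
# Assumed collocation-probability table (the "word frequency library").
# Keys are part-of-speech sequences: n = noun, m = numeral, q = classifier.
COLLOCATION_PROB = {
    ("n", "n", "m"): 0.04,  # "certain" + "mobile phone" + "2" inside the candidate
    ("m", "q"): 0.30,       # "22" + "year" outside the candidate
}

def boundary_condition_met(inside_pos, outside_pos, probs=COLLOCATION_PROB):
    p1 = probs.get(tuple(inside_pos), 0.0)   # first possibility
    p2 = probs.get(tuple(outside_pos), 0.0)  # second possibility
    return p1 > p2

# p1 = 0.04 < p2 = 0.30, so the boundary condition is not met here and
# the candidate "某手机2" would be discarded as ambiguous.
print(boundary_condition_met(["n", "n", "m"], ["m", "q"]))  # → False
```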
Optionally, determining whether the boundary character of the candidate entity meets the preset boundary condition according to the comparison result of the first likelihood and the second likelihood may include:
if the first probability is greater than the second probability, determining that the boundary character of the candidate entity meets the preset boundary condition.
For example, as described above, if the probability that the collocation noun + noun + numeral occurs in text is greater than the probability that the collocation numeral + classifier occurs, i.e., p1 > p2, the collocation inside the candidate is judged the more plausible reading, and it may be determined that the boundary character of the candidate entity meets the preset boundary condition.
Optionally, determining whether the boundary character of the candidate entity meets the preset boundary condition according to the comparison result of the first likelihood and the second likelihood may include:
if the first probability is smaller than the second probability, determining that the boundary characters of the candidate entity do not meet the preset boundary conditions.
For example, as described above, if the probability that the collocation noun + noun + numeral occurs in text is smaller than the probability that the collocation numeral + classifier occurs, i.e., p1 < p2, the collocation numeral + classifier is judged the more plausible reading, and it may be determined that the boundary character of the candidate entity does not meet the preset boundary condition.
Based on the same inventive concept, the present disclosure also provides an entity recognition apparatus. Fig. 3 is a block diagram of an entity identification device 300 provided in an exemplary embodiment of the present disclosure. Referring to fig. 3, the entity recognition apparatus 300 may include:
the obtaining module 301 is configured to input a text to be identified into an entity identification model to obtain at least one candidate entity;
the word segmentation module 302 is configured to segment a sentence in which the candidate entity is located in the text to be identified, so as to obtain a word segmentation result;
a first determining module 303, configured to determine, according to the word segmentation result, whether a boundary character of the candidate entity meets a preset boundary condition;
a second determining module 304, configured to take the candidate entity as a target entity in a case where the boundary character of the candidate entity meets the preset boundary condition.
After at least one candidate entity is obtained through the entity recognition model, the sentence in which the candidate entity is located in the text to be recognized can be segmented, and whether the boundary character of the candidate entity meets the preset boundary condition is determined according to the segmentation result; the candidate entity is taken as a target entity in a case where its boundary character meets the preset boundary condition. The candidate entities obtained through the entity recognition model can thus be subjected to a secondary judgment, which effectively avoids the low entity recognition accuracy caused by insufficient corpus quantity or quality during entity recognition model training, thereby improving the accuracy of the determined target entity.
Optionally, the first determining module 303 includes:
the first determining submodule is used for determining, according to the word segmentation result, whether the boundary character of the candidate entity is in another word segment, wherein the other word segment is not part of the candidate entity;
and the second determining submodule is used for determining that the boundary character of the candidate entity meets the preset boundary condition in a case where the boundary character is not in any other word segment.
Optionally, the first determining module 303 includes:
and a third determining sub-module, configured to determine, in a case where the boundary character of the candidate entity is in another word segment, whether the boundary character meets a preset boundary condition according to the boundary character and the character immediately adjacent to it in the other word segment.
Optionally, the third determining sub-module includes:
and a fourth determining sub-module, configured to determine that the boundary character of the candidate entity does not meet the preset boundary condition if the part of speech of the boundary character is the same as that of the immediately adjacent character and the character string formed by the two can independently form a word.
Optionally, the first determining module 303 includes:
a fifth determining submodule, configured to perform word segmentation processing on the candidate entity and determine, according to the parts of speech of the word segments in the candidate entity, a first possibility of the part-of-speech collocation of those word segments;
a sixth determining submodule, configured to determine a second possibility of collocation between the part of speech of the word segment to which the boundary character belongs and the part of speech of a target word segment adjacent to that word segment, where the target word segment does not belong to the candidate entity;
a seventh determining submodule, configured to determine, according to a comparison result of the first likelihood and the second likelihood, whether a boundary character of the candidate entity meets a preset boundary condition.
Optionally, the seventh determining submodule includes:
an eighth determining submodule, configured to determine that the boundary character of the candidate entity meets the preset boundary condition if the first likelihood is greater than the second likelihood.
Optionally, the seventh determining submodule includes:
and a ninth determining submodule, configured to determine that the boundary character of the candidate entity does not meet the preset boundary condition if the first likelihood is less than the second likelihood.
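The fifth to ninth determining submodules reduce to comparing two part-of-speech collocation likelihoods. A sketch under stated assumptions — collocation probabilities are read from a hand-built POS-bigram table standing in for corpus statistics, and all names are hypothetical:

```python
def passes_collocation_check(entity_pos_seq, boundary_pos, target_pos, colloc_prob):
    """Compare the two part-of-speech collocation likelihoods.

    entity_pos_seq -- POS tags of the word segments inside the candidate entity
    boundary_pos   -- POS of the segment containing the boundary character
    target_pos     -- POS of the adjacent segment outside the candidate entity
    colloc_prob    -- dict mapping a (pos, pos) bigram to its collocation
                     probability (illustrative stand-in for corpus statistics)
    Returns True when the internal collocation (first likelihood) is more
    probable than the boundary-crossing one (second likelihood).
    """
    # first likelihood: product of POS-bigram probabilities inside the entity
    first = 1.0
    for a, b in zip(entity_pos_seq, entity_pos_seq[1:]):
        first *= colloc_prob.get((a, b), 1e-6)  # small floor for unseen pairs
    # second likelihood: boundary segment's POS paired with the outside segment
    second = colloc_prob.get((boundary_pos, target_pos), 1e-6)
    return first > second
```

Per the eighth and ninth determining submodules, a greater first likelihood means the boundary condition is met and the candidate is kept; a greater second likelihood means it is not.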
The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in the embodiments of the method, and is not repeated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the entity identification method provided by the present disclosure.
Fig. 4 is a block diagram illustrating an entity identification device 800 according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 4, apparatus 800 may include one or more of the following components: a first processing component 802, a first memory 804, a first power component 806, a multimedia component 808, an audio component 810, a first input/output interface 812, a sensor component 814, and a communication component 816.
The first processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The first processing component 802 may include one or more first processors 820 to execute instructions to perform all or part of the steps of the entity identification method described above. Further, the first processing component 802 may include one or more modules that facilitate interactions between the first processing component 802 and other components. For example, the first processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the first processing component 802.
The first memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The first memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The first power component 806 provides power to the various components of the device 800. The first power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the first memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The first input/output interface 812 provides an interface between the first processing component 802 and a peripheral interface module, which may be a keyboard, click wheel, button, or the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described entity identification method.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the first memory 804, comprising instructions executable by the first processor 820 of the apparatus 800 to perform the above-described entity identification method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
The apparatus may be a stand-alone electronic device or part of one. For example, in one embodiment, the apparatus may be an integrated circuit (Integrated Circuit, IC) or a chip, where the integrated circuit may be a single IC or a collection of ICs; the chip may include, but is not limited to: a GPU (Graphics Processing Unit), CPU (Central Processing Unit), FPGA (Field Programmable Gate Array), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), SOC (System on Chip), etc. The integrated circuit or chip described above may execute executable instructions (or code) to implement the entity identification method described above. The executable instructions may be stored on the integrated circuit or chip, or may be retrieved from another device or apparatus; for example, the integrated circuit or chip may include a processor, a memory, and an interface for communicating with other devices. The executable instructions may be stored in the memory and, when executed by the processor, implement the entity identification method described above; alternatively, the integrated circuit or chip may receive the executable instructions through the interface and transmit them to the processor for execution to implement the entity identification method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described entity identification method when executed by the programmable apparatus.
Fig. 5 is a block diagram illustrating an entity identification device 1900 according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to fig. 5, the apparatus 1900 includes a second processing component 1922, which further includes one or more processors, and memory resources represented by a second memory 1932 for storing instructions, such as application programs, executable by the second processing component 1922. The application program stored in the second memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the second processing component 1922 is configured to execute the instructions to perform the entity identification method described above.
The apparatus 1900 may further comprise a second power supply component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and a second input/output interface 1958. The apparatus 1900 may operate based on an operating system stored in the second memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of entity identification, comprising:
inputting the text to be identified into an entity identification model to obtain at least one candidate entity;
performing word segmentation on the sentence containing the candidate entity in the text to be identified to obtain a word segmentation result;
determining whether the boundary characters of the candidate entity meet preset boundary conditions according to the word segmentation result;
and taking the candidate entity as a target entity when the boundary character of the candidate entity meets the preset boundary condition.
2. The method according to claim 1, wherein determining whether the boundary character of the candidate entity satisfies a preset boundary condition according to the word segmentation result comprises:
determining, according to the word segmentation result, whether a boundary character of the candidate entity is within another word segment, wherein the other word segment is a segment that does not belong to the candidate entity;
and determining that the boundary character of the candidate entity meets the preset boundary condition when the boundary character of the candidate entity is not within another word segment.
3. The method according to claim 2, wherein the method further comprises:
and when the boundary character of the candidate entity is within another word segment, determining whether the boundary character of the candidate entity meets the preset boundary condition according to the boundary character and the character immediately adjacent to it within that other word segment.
4. The method according to claim 3, wherein the word segmentation result comprises the word segments in the sentence and the part of speech of each word segment, and wherein determining whether the boundary character of the candidate entity meets the preset boundary condition according to the boundary character and the immediately adjacent character in the other word segment comprises:
determining that the boundary character of the candidate entity does not meet the preset boundary condition if the part of speech of the boundary character is the same as that of the immediately adjacent character and the character string formed by the two can independently form a word.
5. The method according to claim 1, wherein determining whether the boundary character of the candidate entity satisfies a preset boundary condition according to the word segmentation result comprises:
performing word segmentation on the candidate entity, and determining, according to the parts of speech of the word segments within the candidate entity, a first likelihood of the part-of-speech collocation of those segments;
determining a second likelihood of the collocation between the part of speech of the word segment to which the boundary character belongs and the part of speech of an adjacent target word segment, wherein the target word segment does not belong to the candidate entity;
and determining whether the boundary character of the candidate entity meets the preset boundary condition according to a comparison of the first likelihood and the second likelihood.
6. The method of claim 5, wherein determining whether the boundary character of the candidate entity satisfies a preset boundary condition based on the comparison of the first likelihood and the second likelihood comprises:
and if the first possibility is larger than the second possibility, determining that the boundary characters of the candidate entity meet the preset boundary conditions.
7. The method of claim 5, wherein determining whether the boundary character of the candidate entity satisfies a preset boundary condition based on the comparison of the first likelihood and the second likelihood comprises:
and if the first possibility is smaller than the second possibility, determining that the boundary characters of the candidate entity do not meet the preset boundary conditions.
8. An entity identification device, comprising:
the acquisition module is used for inputting the text to be identified into the entity identification model to obtain at least one candidate entity;
the word segmentation module is used for performing word segmentation on the sentence containing the candidate entity in the text to be identified to obtain a word segmentation result;
the first determining module is used for determining whether the boundary characters of the candidate entity meet the preset boundary conditions according to the word segmentation result;
and the second determining module is used for taking the candidate entity as a target entity when the boundary character of the candidate entity meets the preset boundary condition.
9. An entity identification device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 7.
CN202210911024.7A 2022-07-29 2022-07-29 Entity identification method, device and storage medium Pending CN117521653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210911024.7A CN117521653A (en) 2022-07-29 2022-07-29 Entity identification method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117521653A true CN117521653A (en) 2024-02-06

Family

ID=89765055


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination