WO2021169351A1 - Referential resolution method and apparatus, and electronic device - Google Patents

Referential resolution method and apparatus, and electronic device

Info

Publication number
WO2021169351A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
target text
word segmentation
word
vector
Prior art date
Application number
PCT/CN2020/124482
Other languages
English (en)
French (fr)
Inventor
刘通
祝官文
孟函可
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021169351A1 publication Critical patent/WO2021169351A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Definitions

  • This application belongs to the field of natural language processing technology, and in particular relates to a referential resolution method and apparatus, and an electronic device.
  • Natural language is a crystallization of human wisdom. Although natural language processing is one of the most difficult problems in artificial intelligence, research on it has always been a hot topic.
  • Referential resolution is the task of clarifying the referential relationship between a pronoun and its antecedent. It plays an extremely important supporting role in natural language processing applications such as information extraction, dialogue systems, machine translation, and machine reading comprehension. For example, when referential resolution is used in a dialogue system, pronouns can be replaced with their corresponding antecedents, improving the accuracy of dialogue intent recognition and element extraction.
  • Referential resolution generally includes two types: explicit pronoun resolution and zero-pronoun resolution.
  • Explicit pronoun resolution determines which noun phrase in an expression an explicit pronoun points to.
  • Zero-pronoun resolution is a special kind of resolution for the zero-anaphora phenomenon.
  • Zero-pronoun resolution infers the omitted part from the context, that is, which linguistic unit in the preceding text the zero pronoun refers to.
  • the referential resolution discussed in this application refers to explicit pronoun resolution.
  • traditional pronoun resolution techniques are based on syntactic analysis, part-of-speech tagging, and entity extraction, combined with manually built rule sets, to resolve pronouns. This approach is time-consuming and labor-intensive, and does not generalize.
  • deep learning methods use a neural network architecture trained on a large corpus to learn the degree of semantic relevance between words, and resolve pronouns according to that relevance.
  • the embodiments of the present application provide a referential resolution method and apparatus, which can address the insufficient accuracy of referential resolution in related technologies.
  • an embodiment of the present application provides a referential resolution method. The method includes: after obtaining the target text to be resolved, acquiring the word-sense features of the target text together with at least one of its part-of-speech features, position features, and knowledge features; the different types of features are then combined into an input matrix and fed into a neural network model to obtain the referential resolution result.
  • At least one of the part-of-speech, position, and knowledge features is added to the information input to the neural network model, which enriches the types of input data and thereby improves the accuracy of the referential resolution results. A minimal sketch of assembling such an input matrix follows.
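  • The following Python sketch shows one way the per-segment feature vectors of different types could be concatenated into a single input matrix; the dimensions, feature names, and random values are illustrative assumptions, not values fixed by this application:

    import numpy as np

    num_tokens = 6                                   # word segments in the target text (assumed)
    d_word, d_pos, d_loc, d_kno = 300, 16, 16, 16    # hypothetical feature dimensions

    word_vecs = np.random.randn(num_tokens, d_word)  # word-sense features (e.g. pretrained word vectors)
    pos_feats = np.random.randn(num_tokens, d_pos)   # part-of-speech features
    loc_feats = np.random.randn(num_tokens, d_loc)   # position features
    kno_feats = np.random.randn(num_tokens, d_kno)   # knowledge features

    # Concatenate the per-segment feature vectors to form the input matrix.
    input_matrix = np.concatenate([word_vecs, pos_feats, loc_feats, kno_feats], axis=1)
    print(input_matrix.shape)                        # (6, 348)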
  • the word-sense feature includes a word vector matrix corresponding to the target text.
  • acquiring the word-sense feature of the target text includes:
  • splicing the word vectors corresponding to each word segment into a word vector matrix.
  • acquiring the part-of-speech feature of the target text includes:
  • mapping the part-of-speech information corresponding to each word segment into a part-of-speech feature.
  • acquiring the position feature of the target text includes:
  • mapping the position information corresponding to each word segment into a position feature.
  • acquiring the knowledge feature of the target text includes:
  • mapping the knowledge information corresponding to each word segment into a knowledge feature. A minimal sketch of such a mapping follows.
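  • One plausible realization of this mapping is an embedding lookup; the tag inventory, dimensions, and example tags below are illustrative assumptions, and the application's actual mapping network (see FIG. 7) may differ:

    import torch
    import torch.nn as nn

    # Hypothetical part-of-speech tag inventory; position or knowledge labels
    # could be mapped into dense features in the same way.
    pos_tagset = {"noun": 0, "verb": 1, "pronoun": 2, "adjective": 3}
    pos_embedding = nn.Embedding(num_embeddings=len(pos_tagset), embedding_dim=16)

    # Tags for the word segments of a target text, one tag per segment.
    tags = torch.tensor([pos_tagset[t] for t in ["noun", "verb", "noun", "pronoun", "noun"]])
    pos_features = pos_embedding(tags)  # shape (5, 16): one feature vector per word segment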
  • the neural network model includes a feature extractor and a classification sub-network.
  • the feature extractor is used to extract features from the input matrix to obtain a feature matrix, the feature matrix including a feature vector corresponding to each word segment.
  • the classification sub-network is used to obtain a referential resolution result based on the feature matrix.
  • the classification sub-network includes a splicing layer, a fully connected neural network and an output layer.
  • the splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments of the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result. A minimal sketch of this splice-and-score flow follows.
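  • A minimal sketch of the splicing layer, fully connected scorer, and output step, under assumed dimensions and an assumed pronoun position (all names and sizes are illustrative, not the application's exact network):

    import torch
    import torch.nn as nn

    num_tokens, d = 5, 64
    feature_matrix = torch.randn(num_tokens, d)    # assumed output of the feature extractor
    pronoun_idx = 3                                # assumed index of the pronoun's word segment

    scorer = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(), nn.Linear(128, 1))

    pronoun_vec = feature_matrix[pronoun_idx]
    rest_idx = [i for i in range(num_tokens) if i != pronoun_idx]
    # Splice each remaining segment's vector with the pronoun's vector into a matching vector.
    matching = torch.stack([torch.cat([feature_matrix[i], pronoun_vec]) for i in rest_idx])
    scores = scorer(matching).squeeze(-1)          # one score per matching vector
    antecedent = rest_idx[scores.argmax().item()]  # highest score gives the referential relationship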
  • the classification sub-network includes a residual connection layer, a splicing layer, a fully connected neural network and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes a coding vector corresponding to each word segment.
  • the splicing layer is used to splice the coding vector corresponding to each remaining word segment with the coding vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments of the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network adds a residual connection layer, which makes the neural network model converge faster and improves its training efficiency. A minimal sketch of the residual connection follows.
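  • A minimal sketch of the residual connection between the input matrix and the feature matrix; the linear projection is an assumption for the case where the two widths differ (if they already match, the matrices can be added directly):

    import torch
    import torch.nn as nn

    input_matrix = torch.randn(5, 348)             # assumed input matrix
    feature_matrix = torch.randn(5, 64)            # assumed feature extractor output
    project = nn.Linear(348, 64)                   # hypothetical projection to matching width

    # Residually connect the (projected) input matrix with the feature matrix.
    coding_matrix = project(input_matrix) + feature_matrix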
  • the classification sub-network includes a selection layer, a splicing layer, a fully connected neural network, and an output layer.
  • the selection layer is used to select, from the feature matrix, the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun.
  • the splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network adds a selection layer.
  • the selection layer discards the feature vectors that are clearly unrelated to the referential result and retains the highly relevant ones. On the one hand, this reduces the amount of computation and improves the overall efficiency of the scheme; on the other hand, it improves the accuracy of the referential resolution results. A minimal sketch of the selection step follows.
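  • A minimal sketch of the selection step; which segments count as candidate antecedents (e.g. the noun segments) and the pronoun position are illustrative assumptions:

    import torch

    feature_matrix = torch.randn(5, 64)            # assumed feature extractor output
    candidate_idx = torch.tensor([0, 2, 4])        # assumed candidate antecedent positions
    pronoun_idx = 3                                # assumed pronoun position

    candidate_vecs = feature_matrix[candidate_idx] # only these are spliced and scored
    pronoun_vec = feature_matrix[pronoun_idx]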
  • the classification sub-network includes a residual connection layer, a selection layer, a splicing layer, a fully connected neural network and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes a coding vector corresponding to each word segment.
  • the selection layer is used to select, from the coding matrix, the coding vector corresponding to each candidate antecedent and the coding vector corresponding to the pronoun.
  • the splicing layer is used to splice the coding vector corresponding to each candidate antecedent with the coding vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • an embodiment of the present application provides a referential resolution apparatus, which includes: a first acquisition module, a second acquisition module, a composition module, and a resolution module.
  • the first acquisition module is used to obtain the target text to be resolved;
  • the second acquisition module is configured to acquire the word-sense feature of the target text, and to acquire at least one target feature among the part-of-speech feature, position feature, and knowledge feature of the target text;
  • the composition module is used to compose the word-sense feature and the target feature into an input matrix;
  • the resolution module is used to input the input matrix into a neural network model to obtain a referential resolution result.
  • the word-sense feature includes a word vector matrix corresponding to the target text.
  • the second acquisition module includes a word-sense feature acquisition module, a part-of-speech feature acquisition module, a position feature acquisition module, and a knowledge feature acquisition module.
  • the word-sense feature acquisition module is used to acquire the word-sense feature of the target text.
  • the word-sense feature acquisition module is specifically used for:
  • splicing the word vectors corresponding to each word segment into a word vector matrix.
  • the part-of-speech feature acquisition module is used to acquire the part-of-speech feature of the target text.
  • the part-of-speech feature acquisition module is specifically configured to:
  • map the part-of-speech information corresponding to each word segment into a part-of-speech feature.
  • the position feature acquisition module is used to acquire the position feature of the target text.
  • the position feature acquisition module is specifically configured to:
  • map the position information corresponding to each word segment into a position feature.
  • the knowledge feature acquisition module is used to acquire the knowledge feature of the target text.
  • the knowledge feature acquisition module is specifically configured to:
  • map the knowledge information corresponding to each word segment into a knowledge feature.
  • the neural network model includes a feature extractor and a classification sub-network.
  • the feature extractor is used to extract features from the input matrix to obtain a feature matrix, the feature matrix including a feature vector corresponding to each word segment.
  • the classification sub-network is used to obtain a referential resolution result based on the feature matrix.
  • the classification sub-network includes a splicing layer, a fully connected neural network and an output layer.
  • the splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments of the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network includes a residual connection layer, a splicing layer, a fully connected neural network and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes a coding vector corresponding to each word segment.
  • the splicing layer is used to splice the coding vector corresponding to each remaining word segment with the coding vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments of the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network adds a residual connection layer, which makes the neural network model converge faster through the residual connection layer, and improves the training efficiency of the neural network model.
  • the classification sub-network includes a selection layer, a splicing layer, a fully connected neural network and an output layer.
  • the selection layer is used to select, from the feature matrix, the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun.
  • the splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network adds a selection layer.
  • the selection layer discards the feature vectors that are clearly unrelated to the referential result and retains the highly relevant ones. On the one hand, this reduces the amount of computation and improves the overall efficiency of the scheme; on the other hand, it improves the accuracy of the referential resolution results.
  • the classification sub-network includes a residual connection layer, a selection layer, a splicing layer, a fully connected neural network, and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes a coding vector corresponding to each word segment.
  • the selection layer is used to select, from the coding matrix, the coding vector corresponding to each candidate antecedent and the coding vector corresponding to the pronoun.
  • the splicing layer is used to splice the coding vector corresponding to each candidate antecedent with the coding vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor.
  • when the processor executes the computer program, the electronic device implements the method described in any one of the first aspect and the possible implementations of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in any one of the first aspect and the possible implementations of the first aspect.
  • an embodiment of the present application provides a computer program product which, when run on an electronic device, causes the electronic device to execute the method described in any one of the first aspect and the possible implementations of the first aspect.
  • Fig. 1 is a schematic flowchart of an end-to-end neural-network-based referential resolution method provided by the prior art;
  • Fig. 2 is a schematic flowchart of a method for establishing a Chinese pronoun resolution model provided by the prior art;
  • FIG. 3 is an application scenario of the referential resolution method provided by an embodiment of the present application;
  • FIG. 4 is another application scenario of the referential resolution method provided by an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device to which the referential resolution method provided by an embodiment of the present application is applicable;
  • FIG. 6A is a schematic flowchart of a referential resolution method provided by an embodiment of the present application;
  • FIG. 6B is another schematic flowchart of a referential resolution method provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a mapping network used in a referential resolution method provided by an embodiment of the present application;
  • FIG. 8 is a schematic flowchart of a referential resolution method according to another embodiment of the present application;
  • FIG. 9 is a schematic flowchart of a referential resolution method according to another embodiment of the present application;
  • FIG. 10 is a schematic flowchart of a referential resolution method according to another embodiment of the present application;
  • FIG. 11 is a schematic structural diagram of a referential resolution apparatus according to an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of a second acquisition module in a referential resolution apparatus provided by an embodiment of the present application.
  • the term "if" may be construed as "when", "once", "in response to determining", or "in response to detecting".
  • the first example is a referential resolution method based on an end-to-end neural network.
  • the technical solution of the first example includes the following steps:
  • the word vectors (embeddings) are obtained by training on a massive data set, i.e., the feature embedding in Figure 1.
  • the disadvantages of the first example are: on the one hand, the input feature is single, i.e., only the word vector, which does not use further information from the sentence; on the other hand, when all words in the sentence are matched and scored, there are many useless calculations and the computing cost is high.
  • the second example is a method for establishing a Chinese pronoun resolution model.
  • the technical solution of the second example mainly includes the following steps:
  • each word vector is mapped to a one-dimensional number, and these numbers are combined to form a resolution vector (also called a sentence vector);
  • the disadvantages of the second example are: first, the only feature input to the LSTM is the word vector obtained with word2vec, so no further information from the sentence is used; second, encoding a word vector with the LSTM and mapping it to a one-dimensional number loses a lot of information; third, the relationships between word vectors are not considered.
  • this application proposes a neural network referential resolution method that combines multiple kinds of information. Specifically, based on the word vector information of words plus at least one of three kinds of information (part-of-speech information, position information, and knowledge information), a neural network model performs referential resolution, which enriches the types of input data and improves the accuracy of the referential resolution results.
  • FIG. 3 shows a schematic diagram of the first application scenario of the referential resolution method provided in an embodiment of this application.
  • the first application scenario is the application scenario of man-machine dialogue.
  • the application scenario includes the user 31 and the user equipment 32.
  • the user equipment 32 deploys a man-machine dialogue system, such as a voice assistant.
  • the user 31 wakes up the voice assistant of the user device 32 by voice inputting a wake-up word or performing a preset user operation, etc., and then voice inputs text.
  • the text includes but is not limited to keywords, sentences, etc.
  • the user equipment 32 outputs the result corresponding to the input text through the voice assistant.
  • For example, the user voice-inputs "introduce Jay Chou".
  • the user interface of the voice assistant of the user device 32 displays the search results for "introduction to Jay Chou" under a certain search engine, or displays the search results for "introduction to Jay Chou" under a certain application.
  • For example, the user interface of the voice assistant of the user device 32 displays the search result of a certain search engine: "Jay Chou is a famous Chinese singer.".
  • If the user then voice-inputs "who is his wife?", the voice assistant can recognize the text corresponding to the user's voice as "who is his wife?". However, if the voice assistant lacks human-like comprehension, it cannot determine whom "he" refers to, and therefore cannot output a correct result.
  • the embodiment of the present application provides a referential resolution method, which can be applied to user equipment.
  • the embodiment of the present application may enable the user equipment to perform natural language processing on the text to realize referential resolution.
  • the user equipment 32 combines the historical texts of the man-machine dialogue into the target text. For example, the two historical texts "Jay Chou is a famous Chinese singer." and "Who is his wife?" are combined into the target text "Jay Chou is a famous Chinese singer. Who is his wife?"; as another example, the three historical texts "Introduce Jay Chou", "Jay Chou is a famous Chinese singer." and "Who is his wife?" are combined into the target text "Introduce Jay Chou. Jay Chou is a famous Chinese singer. Who is his wife?". Then the user device 32 performs referential resolution on the target text. After the resolution is completed, the user device 32 knows that "his" in the text "Who is his wife?" input by the user 31 refers to "Jay Chou", and can output an accurate result for "Who is Jay Chou's wife?". A minimal sketch of this combination step follows.
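  • A minimal sketch of combining dialogue history into the target text; the punctuation rule is an illustrative assumption matching the example above:

    history = ["Introduce Jay Chou", "Jay Chou is a famous Chinese singer.", "Who is his wife?"]
    # Append a period to any utterance that lacks ending punctuation, then join.
    target_text = " ".join(h if h.endswith((".", "?", "!")) else h + "." for h in history)
    # -> "Introduce Jay Chou. Jay Chou is a famous Chinese singer. Who is his wife?"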
  • User equipment in the first application scenario includes, but is not limited to, mobile phones, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, laptops, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), smart speakers, TV set-top boxes (STB), TVs, etc.
  • the user can input user instructions by voice after waking up the voice assistant. After the user equipment understands the user's voice instruction through the voice assistant, it executes the user's voice instruction.
  • the user voice inputs "find Li Lei's phone number and call him for me”.
  • the voice assistant can recognize the text corresponding to the user's voice as "Find Li Lei's phone number and call him for me”.
  • Using the referential resolution method of the present application enables the user equipment to perform natural language processing on the text and realize referential resolution.
  • the user equipment can then know that in the text entered by the user, "find Li Lei's phone number and call him for me", "him" refers to "Li Lei", and can respond accurately to the user's instruction as "find Li Lei's phone number and call Li Lei for me".
  • the voice assistant of the user device looks up Li Lei's phone number in the contact list and dials.
  • FIG. 4 is a schematic diagram of the second application scenario of the referential resolution method provided in an embodiment of this application.
  • the second application scenario is the application scenario of the information extraction system.
  • This application scenario includes two user equipments and one server 43.
  • the two user equipments are the first user equipment 41 and the second user equipment 42 respectively.
  • the first user equipment 41 and the second user equipment 42 respectively communicate with the server 43 through a wireless communication network.
  • the server 43 is deployed with an information extraction system.
  • the server 43 obtains the text sent by the first user equipment 41 and/or the second user equipment 42. Information extraction is performed on the text obtained from the first user equipment 41 or the second user equipment 42 to obtain the knowledge expression corresponding to the text.
  • a large-scale knowledge base is constructed from massive amounts of data.
  • For example, the server 43 obtains the text "Trump was born in New York, and he is the 45th president of the United States." from the first user device 41.
  • If the server 43 lacks human-like comprehension, it cannot determine whom "he" in the text refers to, and therefore cannot obtain a correct knowledge expression result.
  • the embodiment of the present application provides a referential resolution method, and the method can be applied to a server.
  • the embodiment of the present application may enable the server to perform natural language processing on the text to realize referential resolution. After the server 43 resolves the references in the text, it knows that "he" in "Trump was born in New York, and he is the 45th president of the United States." refers to "Trump", and can therefore output an accurate knowledge expression result.
  • the server 43 processes the text through referential resolution and extracts the knowledge expression as the triple (Trump, is, the 45th President of the United States). It should be understood that the example here uses a triple knowledge expression structure. A minimal sketch of this step follows.
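  • A minimal sketch of how a triple could be read off the resolved text; the string pattern and helper logic are illustrative assumptions, not the application's actual extraction method:

    # Text after referential resolution has replaced "he" with "Trump".
    resolved = "Trump was born in New York, and Trump is the 45th president of the United States."

    subject, _, predicate_object = resolved.partition(" is ")
    entity = subject.split(",")[0].split()[0]      # "Trump"
    triple = (entity, "is", predicate_object.rstrip("."))
    print(triple)  # ('Trump', 'is', 'the 45th president of the United States')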
  • the server includes but is not limited to an independent server, a distributed server, a server cluster, or a cloud server, etc.
  • the embodiment of the present application does not impose any restriction on the specific type of the server.
  • Wireless communication networks include, but are not limited to, wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), ZigBee, Bluetooth (BT), Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), 5th generation mobile networks (5G), and future communication networks.
  • electronic devices such as servers or user equipment can also perform referential resolution on texts stored in local storage.
  • the electronic device does not need to interact with other devices to obtain the text to be resolved.
  • the referential resolution method provided in the embodiments of the present application can be applied to electronic devices such as user terminals or servers, and the embodiments of the present application do not impose any restriction on the specific type of the electronic device.
  • FIG. 5 shows a schematic diagram of the structure of the electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light Sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
  • the processor 110 may include one or more interfaces.
  • the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc., respectively through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110.
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive electromagnetic waves by the antenna 1, and perform processing such as filtering, amplifying and transmitting the received electromagnetic waves to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including WLAN (such as Wi-Fi), BT, global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication technologies.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify, and convert it into electromagnetic waves to radiate through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include GSM, GPRS, CDMA, WCDMA, time-division code division multiple access (TD-SCDMA), LTE, BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include one or N display screens 194, and N is an integer greater than one.
  • the electronic device 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing and is converted into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include one or M cameras 193, and M is an integer greater than one.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects the frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • with the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, for example image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also called the "earpiece", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 answers a call or voice message, it can receive the voice by bringing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • when making a sound, the user can bring the mouth close to the microphone 170C and input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • a capacitive pressure sensor, for example, may include at least two parallel plates with conductive material.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but with different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • in some embodiments, the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D, and accordingly set features such as automatic unlocking of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and apply to applications such as horizontal and vertical screen switching, pedometers, and so on.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • in some other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by low temperature.
  • in some other embodiments, when the temperature is lower than still another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also called a "touch device".
  • the touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touch screen, which is also called a "touchscreen".
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 180M, and realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191.
  • different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
  • the electronic device 100 may support one or N SIM card interfaces, and N is an integer greater than one.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 195 can insert multiple cards at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 may also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • FIG. 6A and FIG. 6B show a flowchart of an implementation of a method for reference resolution provided by an embodiment of the present application.
  • the reference resolution method is applicable to situations where reference resolution needs to be performed on text.
  • the reference resolution method is applied to an electronic device and can be executed by a reference resolution apparatus configured in the electronic device.
  • the reference resolution apparatus may be implemented by software, hardware, or a combination of software and hardware of the electronic device.
  • the reference resolution method can be applied to the user terminal shown in FIG. 3, to the server shown in FIG. 4, or to an electronic device with the hardware structure shown in FIG. 5.
  • the reference resolution method includes step S610 to step S640; the specific implementation principles of each step are as follows.
  • S610: Obtain the target text to be reference-resolved. The target text is the object on which reference resolution is to be performed.
  • for example, the target text may be sentence text.
  • the target text may be a text obtained by the electronic device instantly, may also be a text stored in a memory that is communicatively coupled with the electronic device, or may be a text obtained from other electronic devices.
  • the memory to which the electronic device is communicatively coupled includes an internal memory or an external memory of the electronic device.
  • the target text may be text input by the user through an input unit of the electronic device, such as a button or a touch screen; it may also be voice data collected by the user through an audio collection unit of the electronic device, such as a microphone; it may also be a picture including text that is instantly captured by the user through the camera of the electronic device, or instantly scanned by the user through a scanning device of the electronic device, or stored in the electronic device.
  • the text in the picture needs to be extracted as the target text by enabling the picture recognition function of the electronic device.
  • the text in the voice data needs to be recognized as the target text by starting the audio-to-text function of the electronic device.
  • S620: Acquire the semantic feature, part-of-speech feature, location feature, and knowledge feature of the target text.
  • the input target text is first subjected to word segmentation and part-of-speech tagging; then the semantic feature, part-of-speech feature, location feature, and knowledge feature corresponding to each word segment of the target text are obtained.
  • the target text is subjected to word segmentation processing to obtain the several word segments included in the target text. Each word segment is then mapped to four fixed-length vectors: an embedding, a part-of-speech vector, a position vector, and a knowledge vector. The embedding represents the semantic feature of the word segment, the part-of-speech vector represents its part-of-speech feature, the position vector represents its location feature, and the knowledge vector represents its knowledge feature.
  • stop words and/or non-characteristic words can be removed first, and then the several word segments included in the target text are obtained.
  • the word segments of the target text can be expressed as embeddings through a word vector model, also called a word embedding model.
  • methods of creating word embedding models include but are not limited to Word2Vec, LSA (Latent Semantic Analysis), GloVe (Global Vectors for Word Representation), fastText, ELMo (Embeddings from Language Models), GPT (Generative Pre-Training), BERT (Bidirectional Encoder Representations from Transformers), and so on.
  • the embodiment of the present application uses the word vector model to convert text, which exists abstractly in the real world, into vectors that can be manipulated by mathematical formulas, turning the data into machine-processable data and thereby enabling the implementation of this application.
  • taking the Word2Vec method as an example, pre-training is performed on a large-scale corpus to obtain the word vector corresponding to each word.
  • a word vector database is established, and the correspondence between each word and its word vector is stored in the word vector database.
  • by searching the correspondence, the word vector corresponding to each word segment included in the target text can be obtained.
  • the target text includes T word segments, and each word segment corresponds to a word vector of length K, where T and K are integers greater than 1.
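The following minimal sketch (an editor's illustration under assumptions, not part of the original disclosure) shows one way to turn T word segments into a T×K embedding matrix E by table lookup; the table `word_vectors` stands in for a Word2Vec-style pretrained word vector database, and the random vectors here are placeholders for real pretrained ones.

```python
import numpy as np

K = 8  # illustrative word-vector length

# Hypothetical pretrained word-vector table (word -> length-K vector),
# standing in for a database built by Word2Vec pre-training on a corpus.
word_vectors = {w: np.random.rand(K).astype(np.float32)
                for w in ["Jay Chou", "is", "China", "famous", "singer", "he"]}

def embed(segments, table, k):
    """Look up the word vector of each of the T word segments; unknown
    words fall back to a zero vector. Returns a T x K matrix E."""
    return np.stack([table.get(w, np.zeros(k, dtype=np.float32))
                     for w in segments])

E = embed(["Jay Chou", "is", "China", "famous", "singer", "he"],
          word_vectors, K)
print(E.shape)  # (6, 8), i.e. (T, K)
```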
  • acquiring the part-of-speech features of the target text includes:
  • the part-of-speech information corresponding to each word segmentation is mapped into part-of-speech features.
  • the part-of-speech tagging can be performed while the target text is segmented, and the part-of-speech information of the word segmentation included in the target text can be obtained. Map the part-of-speech information of each word segmentation into part-of-speech features. Part-of-speech features can be represented by part-of-speech vectors.
  • the part-of-speech identification result is mapped into a fixed-length part-of-speech vector through a mapping network, that is, mapped into a part-of-speech feature. That is to say, referring to Figure 7, the one-dimensional part-of-speech information is mapped into a multi-dimensional part-of-speech vector through the mapping network.
  • the mapping network may be a single-layer fully connected layer.
  • generally, the object referred to by a pronoun is a noun. Therefore, in this example, nouns can be marked as 1 and treated as candidate antecedents, and parts of speech other than pronouns and nouns can be marked as 0.
  • the complexity of calculation is reduced, and the cost of computing power is saved.
  • since pronouns usually refer to word segments that appear before them, this example marks the part-of-speech information of the word segments after the pronoun as 0.
  • the complexity of calculation is reduced, and the cost of computing power is saved.
  • the numbers corresponding to the different parts of speech here can also be other numbers; the part-of-speech information can be one digit, two digits, or even more digits; the length of the part-of-speech feature or part-of-speech vector is not specifically limited. It should be understood that the three examples here cannot be construed as specific limitations on the application.
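As a minimal sketch of the mapping network described above (an assumption for exposition: a single fully connected layer, as one embodiment suggests), one-dimensional part-of-speech information can be mapped to a K-dimensional part-of-speech vector as follows; all names are illustrative.

```python
import torch
import torch.nn as nn

K = 8  # illustrative part-of-speech vector length

# Single fully connected layer mapping 1-dimensional part-of-speech
# information (e.g. noun = 1, other = 0) to a K-dimensional vector.
pos_mapping = nn.Linear(1, K, bias=False)

# Part-of-speech information for 6 word segments: nouns marked 1.
pos_info = torch.tensor([[1.], [0.], [1.], [0.], [1.], [0.]])  # (T, 1)
S = pos_mapping(pos_info)                                      # (T, K)
print(S.shape)  # torch.Size([6, 8])
```

The same construction can be reused for the position and knowledge mapping networks described below.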
  • acquiring the location feature of the target text includes:
  • the location information corresponding to each word segmentation is mapped into a location feature.
  • the position information of each word segment included in the target text is obtained.
  • the location information can be expressed by the distance between the word segment and the pronoun, where the distance can be the word distance or a quantity positively related to the word distance. For example, the word segments are sorted with the pronoun as the center, and the number of word segments between each word segment and the pronoun is recorded as the distance; as another example, the product of that number and the length of a single word segment is recorded as the distance. Then, the location information of each word segment is mapped into a location feature, which can be represented by a location vector.
  • taking the pronoun as the center, the distance between each word segment and the pronoun is obtained.
  • the length of each participle can be recorded as 1, or as 2 or other numbers.
  • the length of a punctuation mark can be recorded as 0, or as 1 or other numbers.
  • the distance between each participle and the pronoun is mapped into a fixed-length position vector through the mapping network, that is, the position feature. That is to say, one-dimensional position information is mapped into a multi-dimensional position vector through the mapping network.
  • the mapping network may be a single-layer fully connected layer.
  • part-of-speech tagging may be performed while the target text is segmented to obtain position information of nouns included in the target text.
  • the location information can be expressed by the distance between the noun and the pronoun. Then, the location information of each noun is mapped into a location feature, that is, a location vector. It should be understood that for part of speech other than nouns, the position vector is an all-zero vector.
  • the distance between each noun and the pronoun is obtained with the pronoun as the center.
  • the length of each participle can be recorded as 1, or as 2 or other numbers.
  • the length of a punctuation mark can be recorded as 0, or as 1 or other numbers.
  • the distance between each noun and the pronoun is mapped into a fixed-length position vector through the mapping network, that is, the position feature. That is to say, one-dimensional position information is mapped into a multi-dimensional position vector through the mapping network.
  • the mapping network may be a single-layer fully connected layer.
  • pronouns are usually used to refer to the participle that appears before the pronoun, that is, the pronoun refers to the antecedent. Therefore, in other embodiments of the present application, on the basis of the foregoing two embodiments, the position information of the participle or noun after the pronoun is marked as 0.
  • the length recorded for each word segment or punctuation mark is not specifically limited; the position information can also be one digit, two digits, or even more digits; the length of the position feature or position vector is not specifically limited. It should be understood that the two examples here cannot be construed as specific limitations on the application.
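The sketch below (editor's illustration; the distance convention is one of several variants described above and is an assumption) computes one-dimensional position information as the word distance to the pronoun, marking the pronoun itself and everything after it as 0; this information would then be fed through a mapping network like the one sketched for part-of-speech features.

```python
def position_info(segments, pronoun_index, seg_len=1):
    """Word distance between each segment and the pronoun; segments at
    or after the pronoun are marked 0 (one variant described above).
    seg_len lets a single segment count as 1, 2, etc."""
    return [0 if i >= pronoun_index else (pronoun_index - i) * seg_len
            for i in range(len(segments))]

segs = ["Jay Chou", "is", "China", "famous", "singer", "he"]
print(position_info(segs, pronoun_index=5))  # [5, 4, 3, 2, 1, 0]
```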
  • acquiring the knowledge feature of the target text includes:
  • the knowledge information corresponding to each word segmentation is mapped into knowledge features.
  • the mapping network may be a single-layer fully connected layer.
  • the entity results of nouns include but are not limited to: person entity, place entity, object entity, animal entity, organization, or non-existence, etc.
  • according to the matching result between the entity result of the noun and the pronoun, a corresponding identifier is assigned to obtain the knowledge information.
  • the matching results include but are not limited to: successful matching, non-matching, or other matching results, etc.
  • Other matching results include the inability to determine whether the matching is successful or not, or the matching result is not returned.
  • for example, knowledge information of 1 means the match is successful, and 0 means no match or another matching result.
  • the pronoun is determined to be a personal pronoun based on the part-of-speech tagging result, for example, "he" or "she".
  • the entity result obtained by looking up nouns in the knowledge base is a person entity. At this time, it is determined that the entity result of the noun matches the pronoun successfully. Mark the knowledge information of the noun as 1.
  • the pronoun is determined to be a demonstrative pronoun based on the part of speech tagging results, for example: here or there.
  • the entity result obtained by searching for nouns in the knowledge base is a location entity or an organization.
  • the knowledge information of the noun is marked as 1.
  • the pronoun is determined to be "it” based on the part of speech tagging results.
  • the entity result obtained by searching for nouns in the knowledge base is an object entity or an animal entity.
  • the knowledge information of the noun is marked as 1.
  • the knowledge information is marked as 0.
  • pronouns are usually used to refer to word segments that appear before the pronoun; that is, the pronoun refers to an antecedent. Therefore, in some other embodiments of the present application, on the basis of the foregoing embodiments, the knowledge information of the word segments or nouns after the pronoun is marked as 0.
  • the knowledge information can be one digit, two digits, or even more digits; the length of the knowledge feature or knowledge vector is not specifically limited. It should be understood that the examples here cannot be construed as specific limitations on the application.
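The following sketch (an editor's illustration; the pronoun-to-entity compatibility table is an assumption, not from the original text) marks knowledge information as 1 when a segment's knowledge-base entity result matches the pronoun, and 0 for no match or an undetermined result, following the matching rules above.

```python
# Assumed pronoun -> compatible entity-type table.
COMPATIBLE = {
    "he": {"person"}, "she": {"person"},
    "here": {"place", "organization"}, "there": {"place", "organization"},
    "it": {"object", "animal"},
}

def knowledge_info(segments, entity_results, pronoun):
    """1 if the segment's entity result matches the pronoun, else 0
    (covering no-match and other/undetermined matching results)."""
    ok = COMPATIBLE.get(pronoun, set())
    return [1 if entity_results.get(w) in ok else 0 for w in segments]

entities = {"Jay Chou": "person", "China": "place", "singer": "person"}
segs = ["Jay Chou", "is", "China", "famous", "singer", "he"]
print(knowledge_info(segs, entities, "he"))  # [1, 0, 0, 0, 1, 0]
```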
  • the mapping networks described above are, respectively: a mapping network that maps part-of-speech information into part-of-speech features (part-of-speech vectors); a mapping network that maps location information into location features (location vectors); and a mapping network that maps knowledge information into knowledge features (knowledge vectors).
  • S630: Combine the word sense feature, part-of-speech feature, location feature, and knowledge feature into an input matrix.
  • the semantic features, part-of-speech features, location features, and knowledge features corresponding to the word segments included in the target text are spliced to obtain the input matrix.
  • if the number of word segments included in the target text is less than the preset maximum length, zero vectors are used for padding.
  • if the number of word segments included in the target text is greater than the preset maximum length, the excess word segments are removed by truncation.
  • the maximum number of word segments can be preset before the truncation, or the number of word segments can be preset after the truncation; this is not limited in this application.
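A minimal sketch of step S630 (editor's illustration; sizes are the ones used in the worked example later): the four per-segment feature matrices are concatenated column-wise and then zero-padded or truncated along the segment dimension to the preset maximum length.

```python
import torch

def build_input_matrix(E, S, P, Q, max_len):
    """Concatenate four T x K feature matrices into a T x 4K input
    matrix, then zero-pad or truncate T to the preset maximum length."""
    F1 = torch.cat([E, S, P, Q], dim=1)            # (T, 4K)
    T, width = F1.shape
    if T < max_len:                                # pad with zero vectors
        F1 = torch.cat([F1, torch.zeros(max_len - T, width)], dim=0)
    else:                                          # drop excess segments
        F1 = F1[:max_len]
    return F1                                      # (max_len, 4K)

T, K = 6, 8
E, S, P, Q = (torch.rand(T, K) for _ in range(4))
print(build_input_matrix(E, S, P, Q, max_len=16).shape)  # (16, 32)
```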
  • S640: Input the input matrix into a neural network model to obtain the reference resolution result.
  • the neural network model is a trained neural network model.
  • the neural network model is used to resolve the text and obtain the result of the reference resolution.
  • Both the feature extractor and the classification sub-network are neural network models based on the machine learning technology in artificial intelligence.
  • the feature extractor and the classification sub-network can be trained together to obtain a neural network model for reference resolution.
  • the embodiment of the present application does not specifically limit the structure of the neural network model.
  • the feature extractor may be, for example, a convolutional neural network (CNN), a recurrent neural network (RNN), a BiLSTM, or a Transformer network.
  • the training process of the neural network model can be implemented in other electronic devices such as cloud servers.
  • the training process of the neural network model can be implemented locally on the server, and can also be implemented on other electronic devices that communicate with the server.
  • the electronic device trains the neural network model locally, or obtains the trained neural network model from other electronic devices, and deploys the trained neural network model, the reference resolution of the target text can be realized on the electronic device.
  • a neural network model is used to perform feature extraction on the input matrix to obtain a feature vector corresponding to each word segment. Then, the feature vector corresponding to the pronoun and the feature vectors corresponding to the remaining word segments other than the pronoun are spliced into matching vectors, each matching vector representing a group of referential relations. Finally, each group of referential relations is scored to obtain the scoring result of each group. Each scoring result is positively correlated with the degree of matching of that group of referential relations; that is, the scoring result reflects the degree of matching between the pronoun and the word segment in the same group.
  • the neural network model can output at least one scoring result. The number of scoring results output by the neural network model depends on the specific structure of the neural network model, and this application does not limit the number.
  • the neural network model outputs N scoring results, and N is an integer equal to or greater than 1.
  • in some embodiments, the neural network model determines whether the pronoun and the word segment of each group constitute a referential relationship according to whether the scoring result exceeds a preset threshold, and finally outputs the referential relations whose scoring results exceed the preset threshold.
  • the number of referential relations exceeding the threshold is an integer equal to or greater than 1.
  • the neural network model selects a set of referential relations with the highest score as the referential resolution result of the target text.
  • in some embodiments of the present application, the sum of the scores of the groups of referential relations output by the neural network model is 1; in some other embodiments, the sum is not 1. Whether the sum is 1 depends on whether the output layer of the neural network model is normalized, which is not limited in this application.
  • the neural network model includes a feature extractor and a classification sub-network.
  • the feature extractor is used to extract features of the input matrix to obtain a feature matrix, and the feature matrix includes feature vectors corresponding to each of the word segmentation.
  • the classification sub-network is used to obtain a referential resolution result based on the feature matrix.
  • the classification sub-network includes: a splicing layer, a fully connected neural network, and an output layer.
  • the splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • each scoring result is positively correlated with the degree of matching of the referential relation corresponding to its matching vector.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
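A minimal sketch of this classification sub-network (editor's illustration; the scorer's hidden size and layer count are assumptions): the splicing layer pairs each remaining segment's feature vector with the pronoun's, the fully connected network scores each pair, and the output layer returns the highest-scoring referential relation.

```python
import torch
import torch.nn as nn

class ClassificationSubnetwork(nn.Module):
    """Splicing layer + fully connected scoring network + output layer."""
    def __init__(self, k):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * k, k), nn.ReLU(),
                                    nn.Linear(k, 1))

    def forward(self, features, pronoun_idx):
        T, _ = features.shape
        rest = [i for i in range(T) if i != pronoun_idx]
        # Splicing layer: pair each remaining segment with the pronoun.
        pairs = torch.stack([torch.cat([features[i], features[pronoun_idx]])
                             for i in rest])        # (T-1, 2K)
        scores = self.scorer(pairs).squeeze(-1)     # one score per pair
        # Output layer: the highest-scoring pair is the resolution result.
        return rest[int(scores.argmax())], scores

feats = torch.rand(6, 8)            # feature matrix from the extractor
best, scores = ClassificationSubnetwork(8)(feats, pronoun_idx=5)
print(best, scores.shape)           # antecedent index, torch.Size([5])
```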
  • the input matrix may also be composed of word sense features and part-of-speech features; in some other embodiments, the input matrix may also be composed of word sense features and The location features form the input matrix; in some other embodiments, the input matrix can also be composed of word sense features and knowledge features; in some other embodiments, the input matrix can also be composed of word sense features, part-of-speech features, and location features.
  • in other words, the word sense feature of the target text is obtained, together with at least one target feature among the part-of-speech feature, location feature, and knowledge feature of the target text; the word sense feature and the target feature are then combined into the input matrix; finally, the reference resolution result is obtained based on the input matrix.
  • the classification sub-network may further include a selection layer. That is, in these embodiments, the classification sub-network includes: a selection layer, a splicing layer, a fully connected neural network, and an output layer.
  • the selection layer is used to filter the feature vector corresponding to the candidate antecedent and the feature vector corresponding to the pronoun from the feature matrix output by the feature extractor.
  • the candidate antecedents may be screened based on at least one of the part-of-speech information, location information, and knowledge information, and then the feature vector corresponding to the candidate antecedents may be screened from the feature matrix.
  • for example, the position information can be used to filter out the word segments before the pronoun as candidate antecedents.
  • the part-of-speech information can be used to filter out the nouns among the word segments as candidate antecedents.
  • the knowledge information can be used to filter out the word segments that belong to an entity matching the pronoun as candidate antecedents; for example, the word segments whose knowledge information is 1 are selected as candidate antecedents.
  • the part-of-speech information and location information can be used together to filter out the nouns before the pronoun as candidate antecedents.
  • the part-of-speech information and knowledge information can be used together to filter out the nouns that belong to an entity matching the pronoun as candidate antecedents; for example, the word segments whose knowledge information is 1 and whose part-of-speech information is 1 are selected as candidate antecedents.
  • the location information and knowledge information can be used together to filter out the word segments before the pronoun that belong to an entity matching the pronoun as candidate antecedents.
  • the part-of-speech information, location information, and knowledge information can be used together to filter out the nouns before the pronoun that belong to an entity matching the pronoun as candidate antecedents.
  • the number of candidate antecedent words filtered by the selection layer can be preset, that is, the number of matching vectors of subsequent stitching layers can be limited.
  • the quantity is an empirical value, which needs to be weighed against the actual situation such as the amount of calculation and the accuracy of the result. There is no specific limit on this quantity.
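A minimal sketch of the selection layer (editor's illustration; here candidates are screened by part-of-speech information alone, and truncation simply keeps the first m rows rather than dropping the candidates farthest from the pronoun):

```python
import torch

def selection_layer(coding, pos_info, pronoun_idx, m):
    """Keep the coding vectors of candidate antecedents (segments whose
    part-of-speech information is 1) plus the pronoun, zero-padding or
    truncating to a preset total of m rows."""
    keep = [i for i, p in enumerate(pos_info) if p == 1 or i == pronoun_idx]
    keep = keep[:m]                     # simplified truncation
    F4 = coding[keep]
    if F4.shape[0] < m:                 # zero-pad up to m rows
        F4 = torch.cat([F4, torch.zeros(m - F4.shape[0],
                                        coding.shape[1])], dim=0)
    return F4, keep

coding = torch.rand(16, 8)
pos = [1, 0, 1, 0, 1, 0] + [0] * 10     # nouns at 0, 2, 4; pronoun at 5
F4, kept = selection_layer(coding, pos, pronoun_idx=5, m=4)
print(F4.shape, kept)                   # torch.Size([4, 8]) [0, 2, 4, 5]
```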
  • the splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector.
  • the other layers are the same as the example shown in FIG. 6B, please refer to the foregoing.
  • in the example shown in FIG. 6B, the selection layer is not provided, and the feature vector corresponding to each remaining word segment must be spliced with the feature vector corresponding to the pronoun to obtain the matching vectors.
  • in this embodiment, candidate antecedents and their corresponding feature vectors are screened out first.
  • in the subsequent splicing layer, only the feature vectors corresponding to the candidate antecedents and the pronoun need to be spliced to obtain the matching vectors.
  • reducing the number of matching vectors also reduces the computational complexity of scoring, saves computing power costs, and improves efficiency.
  • the candidate antecedents with a high degree of relevance to the referential relationship are screened out, and irrelevant words are filtered out, and the accuracy of the referential resolution results is improved.
  • the classification sub-network may further include a residual connection layer.
  • that is, a residual connection layer is added on the basis of the embodiment shown in FIG. 8.
  • in these embodiments, the classification sub-network includes: a residual connection layer, a selection layer, a splicing layer, a fully connected neural network, and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes coding vectors corresponding to each word segmentation.
  • the selection layer is used to filter out the coding vector corresponding to the candidate antecedent and the coding vector corresponding to the pronoun from the coding matrix.
  • the splicing layer is used to splice the encoding vector corresponding to each candidate antecedent with the encoding vector corresponding to the pronoun to obtain a matching vector.
  • the neural network model can converge faster and the training efficiency of the neural network model can be improved.
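A minimal sketch of the residual connection layer (editor's illustration; the exact combining formula is an assumption consistent with the sizes given in the worked example below, e.g. F3 = g(F1·M4 + F2) with g = ReLU):

```python
import torch
import torch.nn as nn

class ResidualConnection(nn.Module):
    """Project the 16 x 4K input matrix F1 to 16 x K with an additional
    matrix M4 and residually combine it with the 16 x K feature matrix
    F2, giving the coding matrix F3 (formula assumed, see lead-in)."""
    def __init__(self, k):
        super().__init__()
        self.M4 = nn.Parameter(torch.randn(4 * k, k) * 0.01)
        self.g = nn.ReLU()

    def forward(self, F1, F2):
        return self.g(F1 @ self.M4 + F2)   # F3, size 16 x K

K = 8
F3 = ResidualConnection(K)(torch.rand(16, 4 * K), torch.rand(16, K))
print(F3.shape)  # torch.Size([16, 8])
```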
  • The first step: the input layer.
  • the word vector (embedding) corresponding to each of the 16 word segments is obtained.
  • the word vector corresponding to the i-th word segment is e_i, where i is an integer in the range [1, 16].
  • the length of the word vector is K.
  • each part-of-speech information is mapped into a first preset length of part-of-speech feature through the first mapping network.
  • the mapping matrix of the first mapping network is M1.
  • the size of M1 is 1×K.
  • the first preset length is K.
  • the matrix formed by the part-of-speech information of the 16 word segments is I1.
  • the size of I1 is 16×1.
  • the part-of-speech feature matrix S is obtained by passing I1 through the first mapping network (i.e., multiplying by M1), and the size of S is 16×K.
  • each position information is mapped into a position feature of a second preset length through the second mapping network.
  • the mapping matrix of the second mapping network is M2
  • the size of M2 is also 1×K.
  • the second preset length is K.
  • the matrix formed by the location information of the 16 word segments is I2, and the size of I2 is 16×1; the position feature matrix P obtained through the second mapping network has the size 16×K.
  • each knowledge information is mapped into a knowledge feature of a third preset length through a third mapping network.
  • the mapping matrix of the third mapping network is M3, the size of M3 is also 1×K, and the third preset length is K.
  • the matrix formed by the knowledge information of the 16 word segments is I3, and the size of I3 is 16×1; the knowledge feature matrix Q obtained through the third mapping network has the size 16×K.
  • here, the target text happens to have exactly the neural network model's maximum input length of T = 16 word segments, so there is no need to truncate or zero-pad the word segments of the target text.
  • the feature extractor can be a CNN, RNN, BiLSTM, or Transformer, which can be flexibly selected according to needs; this improves the applicability of this application.
  • the input matrix is obtained by splicing the four matrices E, S, P, and Q into a matrix F1, and the size of F1 is 16×4K.
  • denote the feature extractor as f(·); with the input matrix F1, the output feature matrix is F2 = f(F1), and the size of F2 is 16×K.
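As one possible feature extractor f(·) (editor's illustration; the text allows a CNN, RNN, BiLSTM, or Transformer here, and the hidden size below assumes K is even):

```python
import torch
import torch.nn as nn

class BiLSTMExtractor(nn.Module):
    """BiLSTM mapping the 16 x 4K input matrix F1 to a 16 x K feature
    matrix F2; K/2 hidden units per direction give output size K."""
    def __init__(self, k):
        super().__init__()
        self.lstm = nn.LSTM(4 * k, k // 2, bidirectional=True,
                            batch_first=True)

    def forward(self, F1):
        F2, _ = self.lstm(F1.unsqueeze(0))  # add a batch dimension
        return F2.squeeze(0)                # (16, K)

K = 8
f = BiLSTMExtractor(K)
print(f(torch.rand(16, 4 * K)).shape)  # torch.Size([16, 8])
```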
  • the residual connection layer residually connects the feature matrix with the input matrix, for example as F3 = g(F1·M4 + F2).
  • g(·) is the activation function.
  • the activation function is, for example, the ReLU function.
  • the size of the input matrix F1 is 16×4K.
  • the size of the output feature matrix F2 is 16×K.
  • the size of the additional matrix M4 is 4K×K.
  • the size of the coding matrix F3 is 16×K.
  • the 16 rows of the coding matrix are the coding vectors corresponding to the 16 word segments, respectively.
  • the neural network model includes a residual connection layer. It should be understood that in other examples, the residual connection layer may not be provided.
  • The fourth step: the selection layer.
  • the coding vectors corresponding to the word segments that may become candidate antecedents, and the coding vector corresponding to the pronoun, are retained.
  • the word segments that may become candidate antecedents can include nouns, and the nouns include person nouns, place nouns, and so on.
  • suppose the maximum number of retained word segments is m.
  • the selected coding vectors output by the selection layer form F4, and the size of F4 is m×K. If the total number of candidate antecedents and pronouns selected according to the part-of-speech information is less than m, all-zero vectors are used to supplement, that is, zeros are added. If the total number of candidate antecedents and pronouns selected according to the part-of-speech information exceeds m, the excess candidate antecedents are filtered out by truncation, for example, the excess candidate antecedents farthest from the pronoun are filtered out.
  • the selection layer filters out 4 word segments and their corresponding coding vectors.
  • according to the part-of-speech information, exactly three candidate antecedents, "Jay Chou", "China", and "singer", were screened out, as well as one pronoun, "he".
  • the coding vectors corresponding to these 4 word segments are filtered out of the coding matrix in order.
  • the neural network model includes a selection layer. It should be understood that in other examples, the selection layer may not be provided.
  • the selection layer screens candidate antecedents based on part-of-speech information. It should be understood that in other examples, candidate antecedents may also be screened based on at least one of the part-of-speech information, location information, and knowledge information.
  • the splicing layer splices the coding vector corresponding to the pronoun and the coding vector corresponding to each candidate antecedent to obtain a matching vector.
  • the size of each matching vector is 1×2K.
  • the matching vectors are spliced into a matrix F5, and the size of F5 is (m-1)×2K.
  • the selection layer has filtered out 4 word segments and their corresponding coding vectors.
  • the 3 candidate antecedents selected were "Jay Chou", "China", and "singer", and 1 pronoun, "he".
  • the coding vector corresponding to the pronoun "he” is spliced with the coding vector corresponding to the candidate antecedents "Jay Chou”, “China” and “singer” respectively.
  • the three matching vectors respectively represent three groups of referential relations: [he]<->[Jay Chou], [he]<->[China], and [he]<->[singer].
  • the number of pronouns is 1 in the m participles selected. It should be understood that in other examples, the number of pronouns may also be multiple.
  • The sixth step: the fully connected neural network.
  • the fully connected neural network scores each matching vector and obtains the scoring result of each matching vector.
  • the scoring result is positively correlated with the degree of matching between the group of referential relations. In other words, the higher the score of a group of referential relations, the greater the probability that the group of referential relations will be established.
  • a fully connected neural network can include a fully connected layer and an activation function.
  • the fully connected neural network scores 3 matching vectors, and obtains the scoring results of each group of referential relations.
  • the referential relation [he]<->[Jay Chou] is scored 0.9, the referential relation [he]<->[China] is scored 0.4, and the referential relation [he]<->[singer] is scored 0.3.
  • FIG. 11 shows a structural block diagram of the reference resolution apparatus provided in an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
  • the reference resolution apparatus includes: a first obtaining module M1101, a second obtaining module M1102, a composition module M1103, and a resolution module M1104.
  • the first obtaining module M1101 is used to obtain the target text to be referred to and resolved;
  • the second acquisition module M1102 is configured to acquire the semantic feature of the target text, and acquire at least one target feature of the part-of-speech feature, location feature, and knowledge feature of the target text;
  • the composition module M1103 is used to compose the word sense feature and the target feature into an input matrix
  • the resolution module M1104 is used to input the input matrix into the neural network model to obtain the referential resolution result.
  • the word sense feature includes a word vector matrix corresponding to the target text.
  • the second acquiring module M1102 includes a word sense feature acquiring module M11021, a part-of-speech feature acquiring module M11022, a location feature acquiring module M11023, and a knowledge feature acquiring module M11024.
  • the word sense feature obtaining module M11021 is used to obtain the word sense feature of the target text.
  • the word sense feature acquisition module M11021 is specifically used for:
  • converting each word segment into a word vector, the target text including several word segments; and splicing the word vectors corresponding to the word segments into a word vector matrix.
  • the part-of-speech feature obtaining module M11022 is used to obtain the part-of-speech feature of the target text.
  • the part-of-speech feature acquisition module M11022 is specifically used for:
  • obtaining the part-of-speech information corresponding to each word segment, the target text including several word segments; and mapping the part-of-speech information corresponding to each word segment into a part-of-speech feature.
  • the location feature acquisition module M11023 is used to acquire the location feature of the target text.
  • the position feature acquisition module M11023 is specifically used for:
  • obtaining the location information corresponding to each word segment, the target text including several word segments; and mapping the location information corresponding to each word segment into a location feature.
  • the knowledge feature obtaining module M11024 is used to obtain the knowledge feature of the target text.
  • the knowledge feature acquisition module M11024 is specifically used for:
  • obtaining the knowledge information corresponding to each word segment, the target text including several word segments; and mapping the knowledge information corresponding to each word segment into a knowledge feature.
  • the neural network model includes a feature extractor and a classification sub-network.
  • the feature extractor is used to extract features of the input matrix to obtain a feature matrix, the feature matrix including feature vectors corresponding to each of the word segmentation.
  • the classification sub-network is used to obtain a referential resolution result based on the feature matrix.
  • the classification sub-network includes a splicing layer, a fully connected neural network and an output layer.
  • the splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network includes a residual connection layer, a splicing layer, a fully connected neural network and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes coding vectors corresponding to each of the word segmentation.
  • the splicing layer is used to splice the coding vector corresponding to each remaining word segment with the coding vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network includes a selection layer, a splicing layer, a fully connected neural network and an output layer.
  • the selection layer is used to filter out the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun from the feature matrix.
  • the splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the classification sub-network includes a residual connection layer, a selection layer, a splicing layer, a fully connected neural network, and an output layer.
  • the residual connection layer is used to perform residual connection between the input matrix and the feature matrix to obtain a coding matrix.
  • the coding matrix includes coding vectors corresponding to each of the word segmentation.
  • the selection layer is used to filter out the coding vector corresponding to each candidate antecedent and the coding vector corresponding to the pronoun from the coding matrix.
  • the splicing layer is used to splice the coding vector corresponding to each candidate antecedent with the coding vector corresponding to the pronoun to obtain a matching vector.
  • the fully connected neural network is used to score each of the matching vectors.
  • the output layer outputs the referential relationship corresponding to the matching vector with the highest score as the referential resolution result.
  • the embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be realized.
  • the embodiments of the present application also provide a computer program product; when the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be realized when the mobile terminal executes it.
  • the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the computer program can be stored in a computer-readable storage medium. When executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash disk, a removable hard disk, a floppy disk, or a CD-ROM.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electric carrier signals and telecommunications signals.
  • the disclosed electronic device and method may be implemented in other ways.
  • the electronic device embodiments described above are only illustrative.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.


Abstract

A method and apparatus for reference resolution, applicable to the field of artificial intelligence technology. The method includes: obtaining a target text to be reference-resolved; obtaining a word sense feature of the target text, and obtaining at least one target feature among a part-of-speech feature, a location feature, and a knowledge feature of the target text; combining the word sense feature and the target feature into an input matrix; and inputting the input matrix into a neural network model to obtain a reference resolution result. The method increases the variety of information in the input data of the neural network model, thereby improving the accuracy of the reference resolution result.

Description

Method, Apparatus, and Electronic Device for Reference Resolution
This application claims priority to Chinese patent application No. 202010113756.2, entitled "Method, apparatus, and electronic device for reference resolution", filed with the China National Intellectual Property Administration on February 24, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of natural language processing technology, and in particular relates to a method, apparatus, and electronic device for reference resolution.
Background
Natural language is the crystallization of human wisdom. Although natural language processing is one of the most difficult problems in artificial intelligence, research on natural language processing has always been a hot topic.
Reference, as a common linguistic phenomenon, exists widely in natural language expression. However, reference increases the difficulty of natural language processing. Reference resolution is the task of clarifying the referential relationship between a pronoun and its antecedent. It plays an extremely important supporting role in natural language processing application scenarios such as information extraction, dialogue systems, machine translation, and machine reading comprehension. For example, when reference resolution is used in a dialogue system, pronouns can be replaced with the corresponding antecedents, thereby improving the accuracy of dialogue intention recognition and element extraction.
Generally speaking, reference resolution comes in two kinds: explicit pronoun resolution and zero-pronoun resolution. Explicit pronoun resolution determines which noun phrase an explicit pronoun in an expression points to. Zero-pronoun resolution is a special kind of resolution for the zero-anaphora phenomenon: based on the context, it infers which preceding linguistic unit the omitted part, i.e., the zero pronoun, refers to. The reference resolution described in this application document refers to explicit pronoun resolution.
The traditional reference resolution technology resolves pronouns based on syntactic analysis, part-of-speech tagging, and entity extraction, combined with manually crafted rule sets; this approach is time-consuming and labor-intensive, and has no generalization ability.
In recent years, with continuous breakthroughs in artificial intelligence and deep learning technology, many natural language processing tasks have gradually been handled with deep learning architectures. Different from traditional technology, the deep learning approach uses a neural network architecture trained on a large corpus to learn the degree of semantic correlation between words, and resolves pronouns according to that degree of correlation.
Summary
The embodiments of this application provide a method and apparatus for reference resolution, which can solve the problem of insufficient accuracy of reference resolution in the related art.
In a first aspect, an embodiment of this application provides a method for reference resolution, including: after obtaining the target text on which reference resolution is to be performed, obtaining a word sense feature of the target text, and obtaining at least one of a part-of-speech feature, a location feature, and a knowledge feature of the target text; then combining the obtained different kinds of features into an input matrix and inputting it into a neural network model to obtain a reference resolution result.
In the embodiments of the first aspect, in addition to the word sense feature, at least one of the part-of-speech feature, location feature, and knowledge feature is added to the information input into the neural network model. This increases the variety of information in the input data of the neural network model and thereby improves the accuracy of the reference resolution result.
In a possible implementation of the first aspect, the word sense feature includes a word vector matrix corresponding to the target text.
In a possible implementation of the first aspect, obtaining the word sense feature of the target text includes:
converting each word segment into a word vector, the target text including several word segments;
splicing the word vectors corresponding to the word segments into a word vector matrix.
In a possible implementation of the first aspect, obtaining the part-of-speech feature of the target text includes:
obtaining the part-of-speech information corresponding to each word segment, the target text including several word segments;
mapping the part-of-speech information corresponding to each word segment into a part-of-speech feature.
In a possible implementation of the first aspect, obtaining the location feature of the target text includes:
obtaining the location information corresponding to each word segment, the target text including several word segments;
mapping the location information corresponding to each word segment into a location feature.
In a possible implementation of the first aspect, obtaining the knowledge feature of the target text includes:
obtaining the knowledge information corresponding to each word segment, the target text including several word segments;
mapping the knowledge information corresponding to each word segment into a knowledge feature.
In a possible implementation of the first aspect, the neural network model includes a feature extractor and a classification sub-network.
The feature extractor is used to extract features of the input matrix to obtain a feature matrix, and the feature matrix includes the feature vector corresponding to each word segment.
The classification sub-network is used to obtain the reference resolution result based on the feature matrix.
As a first example of the first aspect, the classification sub-network includes a splicing layer, a fully connected neural network, and an output layer.
The splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun.
The fully connected neural network is used to score each matching vector.
The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
As a second example of the first aspect, the classification sub-network includes a residual connection layer, a splicing layer, a fully connected neural network, and an output layer.
The residual connection layer is used to residually connect the input matrix and the feature matrix to obtain a coding matrix. The coding matrix includes the coding vector corresponding to each word segment.
The splicing layer is used to splice the coding vector corresponding to each remaining word segment with the coding vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun.
The fully connected neural network is used to score each matching vector.
The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
Compared with the first example, in the second example the classification sub-network adds a residual connection layer, which makes the neural network model converge faster and improves the training efficiency of the neural network model.
As a third example of the first aspect, the classification sub-network includes a selection layer, a splicing layer, a fully connected neural network, and an output layer.
The selection layer is used to screen out, from the feature matrix, the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun.
The splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector.
The fully connected neural network is used to score each matching vector.
The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
Compared with the first example, in the third example the classification sub-network adds a selection layer, which filters out feature vectors obviously irrelevant to the reference result and retains highly relevant feature vectors. On the one hand, this reduces the amount of calculation and improves the overall efficiency of the solution; on the other hand, it improves the accuracy of the reference resolution result.
As a fourth example of the first aspect, the classification sub-network includes a residual connection layer, a selection layer, a splicing layer, a fully connected neural network, and an output layer.
The residual connection layer is used to residually connect the input matrix and the feature matrix to obtain a coding matrix. The coding matrix includes the coding vector corresponding to each word segment.
The selection layer is used to screen out, from the coding matrix, the coding vector corresponding to each candidate antecedent and the coding vector corresponding to the pronoun.
The splicing layer is used to splice the coding vector corresponding to each candidate antecedent with the coding vector corresponding to the pronoun to obtain a matching vector.
The fully connected neural network is used to score each matching vector.
The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
In a second aspect, an embodiment of this application provides an apparatus for reference resolution, including a first obtaining module, a second obtaining module, a composition module, and a resolution module.
The first obtaining module is used to obtain the target text to be reference-resolved;
the second obtaining module is used to obtain the word sense feature of the target text, and to obtain at least one target feature among the part-of-speech feature, location feature, and knowledge feature of the target text;
the composition module is used to combine the word sense feature and the target feature into an input matrix;
the resolution module is used to input the input matrix into a neural network model to obtain the reference resolution result.
In a possible implementation of the second aspect, the word sense feature includes the word vector matrix corresponding to the target text.
In a possible implementation of the second aspect, the second obtaining module includes a word sense feature acquisition module, a part-of-speech feature acquisition module, a position feature acquisition module, and a knowledge feature acquisition module.
The word sense feature acquisition module is used to obtain the word sense feature of the target text.
In a possible implementation of the second aspect, the word sense feature acquisition module is specifically used for: converting each word segment into a word vector, the target text including several word segments; and splicing the word vectors corresponding to the word segments into a word vector matrix.
The part-of-speech feature acquisition module is used to obtain the part-of-speech feature of the target text.
In a possible implementation of the second aspect, the part-of-speech feature acquisition module is specifically used for: obtaining the part-of-speech information corresponding to each word segment, the target text including several word segments; and mapping the part-of-speech information corresponding to each word segment into a part-of-speech feature.
The position feature acquisition module is used to obtain the location feature of the target text.
In a possible implementation of the second aspect, the position feature acquisition module is specifically used for: obtaining the location information corresponding to each word segment, the target text including several word segments; and mapping the location information corresponding to each word segment into a location feature.
The knowledge feature acquisition module is used to obtain the knowledge feature of the target text.
In a possible implementation of the second aspect, the knowledge feature acquisition module is specifically used for: obtaining the knowledge information corresponding to each word segment, the target text including several word segments; and mapping the knowledge information corresponding to each word segment into a knowledge feature.
In a possible implementation of the second aspect, the neural network model includes a feature extractor and a classification sub-network.
The feature extractor is used to extract features of the input matrix to obtain a feature matrix, and the feature matrix includes the feature vector corresponding to each word segment.
The classification sub-network is used to obtain the reference resolution result based on the feature matrix.
As a first example of the second aspect, the classification sub-network includes a splicing layer, a fully connected neural network, and an output layer. The splicing layer is used to splice the feature vector corresponding to each remaining word segment with the feature vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun. The fully connected neural network is used to score each matching vector. The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
As a second example of the second aspect, the classification sub-network includes a residual connection layer, a splicing layer, a fully connected neural network, and an output layer. The residual connection layer is used to residually connect the input matrix and the feature matrix to obtain a coding matrix; the coding matrix includes the coding vector corresponding to each word segment. The splicing layer is used to splice the coding vector corresponding to each remaining word segment with the coding vector corresponding to the pronoun to obtain a matching vector; the remaining word segments are the word segments included in the target text other than the pronoun. The fully connected neural network is used to score each matching vector. The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result. Compared with the first example, in the second example the classification sub-network adds a residual connection layer, which makes the neural network model converge faster and improves its training efficiency.
As a third example of the second aspect, the classification sub-network includes a selection layer, a splicing layer, a fully connected neural network, and an output layer. The selection layer is used to screen out, from the feature matrix, the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun. The splicing layer is used to splice the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain a matching vector. The fully connected neural network is used to score each matching vector. The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result. Compared with the first example, in the third example the classification sub-network adds a selection layer, which filters out feature vectors obviously irrelevant to the reference result and retains highly relevant feature vectors; this reduces the amount of calculation, improves the overall efficiency of the solution, and improves the accuracy of the reference resolution result.
As a fourth example of the second aspect, the classification sub-network includes a residual connection layer, a selection layer, a splicing layer, a fully connected neural network, and an output layer. The residual connection layer is used to residually connect the input matrix and the feature matrix to obtain a coding matrix; the coding matrix includes the coding vector corresponding to each word segment. The selection layer is used to screen out, from the coding matrix, the coding vector corresponding to each candidate antecedent and the coding vector corresponding to the pronoun. The splicing layer is used to splice the coding vector corresponding to each candidate antecedent with the coding vector corresponding to the pronoun to obtain a matching vector. The fully connected neural network is used to score each matching vector. The output layer outputs the referential relation corresponding to the matching vector with the highest score as the reference resolution result.
In a third aspect, an embodiment of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the electronic device implements the method according to any one of the first aspect and its possible implementations.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method according to any one of the first aspect and its possible implementations is implemented.
In a fifth aspect, an embodiment of this application provides a computer program product; when the computer program product runs on an electronic device, the electronic device is caused to execute the method according to any one of the first aspect and its possible implementations.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant description in the first aspect.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an end-to-end neural network-based reference resolution method provided by the prior art;
FIG. 2 is a schematic flowchart of a method for building a Chinese pronoun resolution model provided by the prior art;
FIG. 3 is an application scenario of the reference resolution method provided by an embodiment of this application;
FIG. 4 is another application scenario of the reference resolution method provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of an electronic device to which the reference resolution method provided by an embodiment of this application is applicable;
FIG. 6A is a schematic flowchart of the reference resolution method provided by an embodiment of this application;
FIG. 6B is another schematic flowchart of the reference resolution method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a mapping network used in the reference resolution method provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of the reference resolution method provided by another embodiment of this application;
FIG. 9 is a schematic flowchart of the reference resolution method provided by another embodiment of this application;
FIG. 10 is a schematic flowchart of the reference resolution method provided by another embodiment of this application;
FIG. 11 is a schematic structural diagram of a reference resolution apparatus provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of the second obtaining module in a reference resolution apparatus provided by an embodiment of this application.
Detailed Description
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth in order to provide a thorough understanding of the embodiments of this application. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of this application.
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular expressions "a", "an", "the", "the above", "said", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise.
It should also be understood that in the embodiments of this application, "several" and "one or more" mean one, two, or more than two; "and/or" describes an association relationship of associated objects and indicates that three kinds of relationships may exist; for example, A and/or B may indicate the cases of A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
When used in the specification and the appended claims of this application, the term "include" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
As used in the specification and the appended claims of this application, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting".
In addition, in the description of the specification and the appended claims of this application, the terms "first", "second", "third", etc. are only used to distinguish descriptions and cannot be understood as indicating or implying relative importance.
References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
In recent years, with continuous breakthroughs in artificial intelligence and deep learning technology, many natural language processing tasks have gradually been handled with deep learning architectures. Two prior art examples are first used to illustrate reference resolution methods based on deep learning architectures.
The first example is an end-to-end neural network-based reference resolution method.
As shown in FIG. 1, the technical solution of the first example includes the following steps:
1. Word vectors (embeddings) are obtained by training on a massive data set, i.e., the feature embeddings in FIG. 1.
2. The coding vector of each word is obtained through a bidirectional long short-term memory network (Bi-directional Long Short-Term Memory, BiLSTM).
3. Through an attention mechanism, spans are concatenated to obtain distributed representations of the spans.
4. Through a fully connected neural network, all words are paired and the scores of the different pairs of distributed representations are calculated; if a score exceeds a certain threshold, a referential relationship is considered to exist.
The disadvantages of the first example are: on the one hand, the input feature is single, i.e., only the input word vectors, and more information in the sentence is not utilized; on the other hand, pairing and scoring all words in the sentence involves much useless computation and a high computing power cost.
The second example is a method for building a Chinese pronoun resolution model.
As shown in FIG. 2, the technical solution of the second example mainly includes the following steps:
1. Word vectors (embeddings) are obtained by training on a large-scale data set using the word2vec (word to vector) method;
2. The word vectors are used as input features; through a long short-term memory network (Long Short-Term Memory, LSTM), each word vector is mapped to a one-dimensional number, and these are combined to form a resolution vector (also called a sentence vector);
3. According to a set threshold, the resolution vector is sorted, and the largest and second-largest elements in the vector are extracted; the words they correspond to are taken as the word pair having the referential relationship.
The disadvantages of the second example are: first, the features input into the LSTM are merely the word vectors obtained with word2vec, and more information in the sentence is not utilized; second, after the word vectors are encoded by the LSTM, they are mapped into one-dimensional numbers, losing a large amount of information; third, the mutual relations between the word vectors are not considered.
Considering the shortcomings of the prior art solutions, this application proposes a neural network reference resolution method that combines multiple kinds of information. Specifically, based on the word vector information of the words plus at least one of the three kinds of information of part-of-speech information, position information, and knowledge information, reference resolution is performed through a neural network model; this increases the variety of information in the input data and improves the accuracy of the reference resolution result.
In order to explain the technical solution of this application, specific embodiments are described below.
First, the application scenarios of the embodiments of this application are illustrated through two non-limiting examples.
FIG. 3 is a schematic diagram of the first application scenario of the reference resolution method provided by an embodiment of this application.
As shown in FIG. 3, the first application scenario is a human-machine dialogue scenario. The scenario includes a user 31 and a user device 32. The user device 32 is deployed with a human-machine dialogue system, such as a voice assistant.
The user 31 wakes up the voice assistant of the user device 32, for example, by voicing a wake-up word or performing a preset user operation, and then inputs text by voice. The text includes but is not limited to keywords, sentences, etc. The user device 32 outputs, through the voice assistant, the result corresponding to the input text.
As a non-limiting example, after the user 31 wakes up the voice assistant of the user device 32 with the wake-up word, the user voices "Tell me about Jay Chou". The user interface of the voice assistant of the user device 32 displays the search results of a certain search engine, or of a certain application, for "Tell me about Jay Chou". For example, the user interface of the voice assistant of the user device 32 displays a search engine result: "Jay Chou is a famous Chinese singer."
If the user then voices "Who is his wife", the voice assistant can recognize that the text corresponding to the user's voice is "Who is his wife?". However, without human understanding ability, the voice assistant cannot determine what "he" refers to and therefore cannot output the correct result.
In order for the voice assistant of the user device 32 to understand the text and accurately output the result corresponding to the text, an embodiment of this application provides a reference resolution method that can be applied to a user device. The embodiments of this application can give the user device the ability to perform natural language processing on text to realize reference resolution.
The user device 32 obtains the historical text of the human-machine dialogue to form the target text. For example, the two historical texts "Jay Chou is a famous Chinese singer." and "Who is his wife?" are combined into the target text: "Jay Chou is a famous Chinese singer. Who is his wife?"; as another example, the three historical texts "Tell me about Jay Chou", "Jay Chou is a famous Chinese singer.", and "Who is his wife?" are combined into the target text: "Tell me about Jay Chou. Jay Chou is a famous Chinese singer. Who is his wife?". Then, the user device 32 performs reference resolution on the target text. After the reference resolution is completed, the user device 32 knows that the "he" in the user-input text "Who is his wife?" refers to "Jay Chou", and can thus output an accurate result for "Who is Jay Chou's wife?".
The user device in the first application scenario includes but is not limited to a mobile phone, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart speaker, a television set-top box (STB), or a television, etc.; the embodiments of this application do not impose any restriction on the specific type of the user device.
In addition, in other human-machine dialogue scenarios, after waking up the voice assistant, the user can input user instructions by voice. After understanding the user's voice instruction through the voice assistant, the user device executes the user's voice instruction.
For example, after the user wakes up the voice assistant of the user device, the user voices "Find Li Lei's phone number and call him for me". The voice assistant can recognize that the text corresponding to the user's voice is "Find Li Lei's phone number and call him for me". Using the reference resolution method of this application, the user device can have the ability to perform natural language processing on the text to realize reference resolution. The user device can know that the "him" in the user-input text "Find Li Lei's phone number and call him for me" refers to "Li Lei", and thus respond accurately to the user instruction "Find Li Lei's phone number and call Li Lei for me". The voice assistant of the user device looks up Li Lei's phone number in the contact list and dials it.
It should be understood that the above examples are merely illustrative and cannot be interpreted as limitations on this application. More generally, the user device can understand and respond to the user's voice instructions; this application does not specifically limit the user's voice instructions.
Next, the second application scenario of the reference resolution method provided by an embodiment of this application is introduced.
FIG. 4 is a schematic diagram of the second application scenario of the reference resolution method provided by an embodiment of this application. The second application scenario is that of an information extraction system. The scenario includes two user devices and a server 43. The two user devices are a first user device 41 and a second user device 42; the first user device 41 and the second user device 42 each communicate with the server 43 through a wireless communication network. The server 43 is deployed with an information extraction system.
The server 43 obtains the text sent by the first user device 41 and/or the second user device 42, performs information extraction on the text obtained from the first user device 41 or the second user device 42, and obtains the knowledge expression corresponding to the text.
As a non-limiting example, since massive referential expressions exist in text, a large-scale knowledge base is constructed from massive data. For example, the server 43 obtains from the first user device 41 the text "Trump was born in New York; he is the 45th President of the United States."
However, without human understanding ability, the server 43 cannot determine what "he" in the text refers to and therefore cannot obtain the correct knowledge expression result.
In order for the server 43 to understand the text and accurately output the knowledge expression result corresponding to the text, an embodiment of this application provides a reference resolution method that can be applied to a server. The embodiments of this application can give the server the ability to perform natural language processing on text to realize reference resolution. After performing reference resolution on the text, the server 43 can know that the "he" in the text "Trump was born in New York; he is the 45th President of the United States." refers to "Trump", and can thus output an accurate knowledge expression result for "Trump was born in New York; Trump is the 45th President of the United States."
For example, after processing the text through reference resolution, the server 43 extracts the knowledge expression: Trump, is, the 45th President of the United States. It should be understood that the example here is a triple knowledge expression structure.
In the second application scenario, the server includes but is not limited to an independent server, a distributed server, a server cluster, or a cloud server, etc.; the embodiments of this application do not impose any restriction on the specific type of the server.
The wireless communication network includes but is not limited to wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), ZigBee, Bluetooth (BT), the Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), the fifth generation mobile networks (5G), and communication networks adopted in the future, etc.
It should be noted that in other application scenarios, an electronic device such as a server or a user device can also perform reference resolution on text stored in a local memory. In these application scenarios, the electronic device does not need to interact with other devices to obtain the text on which reference resolution is to be performed.
The reference resolution method provided by the embodiments of this application can be applied to electronic devices such as user terminals or servers; the embodiments of this application do not impose any restriction on the specific type of the electronic device.
图5示出了电子设备100的结构示意图。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器 (application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(derail clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在 一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
The charging management module 140 is configured to receive charging input from a charger, which may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may be disposed in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same component.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of the wireless local area network. In some other embodiments, the antennas may be used in combination with tuning switches.
The mobile communication module 150 can provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves via the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit the result to the modem processor for demodulation. The mobile communication module 150 may also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation via the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be disposed in the same component.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is configured to demodulate the received electromagnetic wave signal into a low-frequency baseband signal and then transmit the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs sound signals through audio devices (not limited to the speaker 170A and the receiver 170B) or displays images or videos through the display 194. In some embodiments, the modem processor may be an independent component. In other embodiments, the modem processor may be independent of the processor 110 and disposed in the same component as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including WLAN (such as Wi-Fi), BT, the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more components integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include GSM, GPRS, CDMA, WCDMA, time-division code division multiple access (TD-SCDMA), LTE, BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display 194 and the application processor. The GPU is configured to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is an integer greater than 1.
The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is configured to process the data fed back by the camera 193. For example, when taking a photo, the shutter opens, light is transmitted through the lens to the camera photosensitive element, the optical signal is converted into an electrical signal, and the camera photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP may also perform algorithmic optimization on image noise, brightness, and skin tone, and may optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is configured to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or M cameras 193, where M is an integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 performs frequency selection, the digital signal processor is used to perform a Fourier transform and the like on the frequency point energy.
The video codec is configured to compress or decompress digital video. The electronic device 100 may support one or more video codecs so that it can play or record videos in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor that processes input information quickly by drawing on the structure of biological neural networks, for example, the transfer pattern between neurons in the human brain, and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100, for example, image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, for example, a Micro SD card, to expand the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 121 may be configured to store computer-executable program code, where the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, applications required by at least one function (for example, a sound playing function or an image playing function), and the like. The data storage area may store data created during the use of the electronic device 100 (for example, audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121 and/or the instructions stored in the memory disposed in the processor.
The electronic device 100 can implement audio functions, for example, music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal output and to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "loudspeaker", is configured to convert an audio electrical signal into a sound signal. The electronic device 100 can be used to listen to music or answer hands-free calls through the speaker 170A.
The receiver 170B, also referred to as an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or voice message, the voice can be heard by bringing the receiver 170B close to the ear.
The microphone 170C, also referred to as a "mic" or "mouthpiece", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headset jack 170D is configured to connect a wired headset. The headset jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is configured to sense pressure signals and can convert pressure signals into electrical signals. In some embodiments, the pressure sensor 180A may be disposed on the display 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure based on the change in capacitance. When a touch operation acts on the display 194, the electronic device 100 detects the strength of the touch operation based on the pressure sensor 180A, and may also calculate the touch position based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but with different touch operation strengths may correspond to different operation commands; a sketch of this dispatch logic follows. For example, when a touch operation whose strength is less than a first pressure threshold acts on the SMS application icon, a command to view the SMS message is executed; when a touch operation whose strength is greater than or equal to the first pressure threshold acts on the SMS application icon, a command to create a new SMS message is executed.
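A minimal Python sketch of the force-threshold dispatch just described. The threshold value and the command names are hypothetical placeholders; the real mapping from capacitance change to force lives in device firmware and is not part of this description.

FIRST_PRESSURE_THRESHOLD = 0.5  # normalized force; illustrative value only

def on_sms_icon_touch(force: float) -> str:
    """Return the command triggered by a touch on the SMS application icon."""
    if force < FIRST_PRESSURE_THRESHOLD:
        return "view_sms"    # light press: open the message
    return "create_sms"      # firm press: create a new message

assert on_sms_icon_touch(0.2) == "view_sms"
assert on_sms_icon_touch(0.8) == "create_sms"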
The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (that is, the x, y, and z axes) may be determined through the gyroscope sensor 180B. The gyroscope sensor 180B may be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the shaking angle of the electronic device 100, calculates the distance the lens module needs to compensate based on the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse motion to achieve stabilization. The gyroscope sensor 180B may also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the barometric pressure sensor 180C to assist in positioning and navigation.
The magnetic sensor 180D includes a Hall effect sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover based on the magnetic sensor 180D, and then set features such as automatic unlocking upon flip-open based on the detected opening or closing state of the leather case or the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It may also be used to identify the posture of the electronic device and applied to landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is configured to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
The optical proximity sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, for example, a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object near it. The electronic device 100 may use the optical proximity sensor 180G to detect that the user is holding the electronic device 100 close to the ear for a call, so as to automatically turn off the screen to save power. The optical proximity sensor 180G may also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L is configured to sense the ambient light brightness. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the sensed ambient light brightness. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking photos, and may cooperate with the optical proximity sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device 100 may use the collected fingerprint characteristics to implement fingerprint unlocking, accessing application locks, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 180J is configured to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by low temperature. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display 194; the touch sensor 180K and the display 194 form a touchscreen, also referred to as a "touch screen". The touch sensor 180K is configured to detect touch operations acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone of the human vocal part. The bone conduction sensor 180M can also contact the human pulse and receive blood pressure beat signals. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, combined into a bone conduction headset. The audio module 170 can parse out a voice signal based on the vibration signal of the vibrating bone of the vocal part acquired by the bone conduction sensor 180M to implement a voice function. The application processor can parse heart rate information based on the blood pressure beat signal acquired by the bone conduction sensor 180M to implement a heart rate detection function.
The button 190 includes a power button, volume buttons, and the like. The button 190 may be a mechanical button or a touch button. The electronic device 100 may receive button input and generate key signal input related to the user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 may be used for incoming call vibration prompts and touch vibration feedback. For example, touch operations acting on different applications (for example, photographing and audio playback) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. Touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light and may be used to indicate the charging state and battery changes, as well as messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a SIM card. A SIM card can be inserted into or pulled out of the SIM card interface 195 to make contact with or be separated from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, where N is an integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time; the multiple cards may be of the same type or of different types. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calling and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from it.
The following describes an implementation procedure of a coreference resolution method provided in an embodiment of this application.
FIG. 6A and FIG. 6B show an implementation flowchart of a coreference resolution method provided in an embodiment of this application. The method is applicable to scenarios in which coreference resolution needs to be performed on text. The method is applied to an electronic device and may be performed by a coreference resolution apparatus configured on the electronic device. The apparatus may be implemented by software, hardware, or a combination of software and hardware of the electronic device. As an example rather than a limitation, the method may be applied to the user terminal shown in FIG. 3, to the server shown in FIG. 4, or to an electronic device having the hardware structure shown in FIG. 5. As shown in FIG. 6A, the coreference resolution method includes steps S610 to S640, whose specific implementation principles are as follows.
S610: Obtain target text on which coreference resolution is to be performed.
In this embodiment of this application, the target text is the object on which coreference resolution is to be performed, for example, a sentence.
The target text may be text obtained by the electronic device in real time, text stored in a memory communicatively coupled to the electronic device, or text obtained from another electronic device. The memory communicatively coupled to the electronic device includes an internal memory or an external memory of the electronic device.
In a non-limiting example of this embodiment, the target text may be text entered by the user in real time through an input unit of the electronic device, for example, a button or a touch display; voice data collected in real time by an audio collection unit of the electronic device, for example, a microphone; a picture containing text captured in real time by a camera of the electronic device; a picture containing text scanned in real time by a scanning apparatus of the electronic device; text stored in a memory communicatively coupled to the electronic device; or text obtained by the electronic device from another electronic device over a wired or wireless communication network.
It should be noted that for a picture containing text, the picture recognition function of the electronic device needs to be enabled to extract the text in the picture as the target text. For voice data, the speech-to-text function of the electronic device needs to be started to recognize the text in the voice data as the target text.
S620: Obtain the semantic feature, part-of-speech (POS) feature, position feature, and knowledge feature of the target text.
In this embodiment of this application, word segmentation and POS tagging are first performed on the input target text. The semantic feature, POS feature, position feature, and knowledge feature corresponding to each segmented word of the target text are then obtained.
For example, as shown in FIG. 6B, word segmentation is performed on the target text to obtain the segmented words included in the target text. Each segmented word is then mapped to four fixed-length vectors: an embedding, a POS vector, a position vector, and a knowledge vector. The embedding represents the semantic feature of the segmented word, the POS vector represents its POS feature, the position vector its position feature, and the knowledge vector its knowledge feature.
As another example, after word segmentation, stop words and/or non-feature words may first be removed before obtaining the segmented words included in the target text.
In this embodiment of this application, the segmented words of the target text may be expressed as embeddings through a word vector model, also called a word embedding model. Methods for creating a word embedding model include but are not limited to Word2Vec, LSA (Latent Semantic Analysis), GloVe (Global Vectors for Word Representation), fastText, ELMo (Embeddings from Language Models), GPT (Generative Pre-Training), and BERT (Bidirectional Encoder Representations from Transformers). Through the word vector model, this embodiment converts text, which exists abstractly in the real world, into vectors on which mathematical operations can be performed, turning the data into a form a machine can process so that this application can be implemented.
Taking the Word2Vec method as an example, pre-training is performed on a large-scale corpus to obtain the word vector corresponding to each word, thereby establishing a word vector database that stores the correspondence between words and word vectors. By looking up this correspondence, the word vector corresponding to each segmented word in the target text can be obtained. For example, the target text includes T segmented words, and each segmented word corresponds to a word vector of length K, where T and K are integers greater than 1.
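As a minimal illustration of this lookup step, the following Python sketch stands in for a pre-trained Word2Vec table with a tiny random dictionary; the vocabulary, the length K = 8, and the random vectors are illustrative assumptions, not trained values.

import numpy as np

K = 8  # word-vector length; the real K comes from pre-training
rng = np.random.default_rng(0)
# Hypothetical stand-in for a word vector database built by pre-training.
word_vectors = {w: rng.normal(size=K)
                for w in ["介绍", "一下", "周杰伦", "。", "他"]}
UNK = np.zeros(K)  # fallback for out-of-vocabulary segmented words

def embed(tokens):
    """Look up each segmented word and stack the vectors into a T x K matrix E."""
    return np.stack([word_vectors.get(t, UNK) for t in tokens])

E = embed(["介绍", "一下", "周杰伦", "。"])
print(E.shape)  # (4, 8): T = 4 segmented words, each a length-K word vector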
Optionally, obtaining the POS feature of the target text includes:
obtaining the POS information corresponding to each segmented word, the target text including several segmented words; and
mapping the POS information corresponding to each segmented word into a POS feature.
POS tagging can be performed while the target text is segmented, yielding the POS information of the segmented words included in the target text. The POS information of each segmented word is mapped into a POS feature, which may be represented by a POS vector.
As a non-limiting example, based on the POS tagging result, nouns are labeled 1, pronouns are labeled 2, and other parts of speech are labeled 0; different numbers are used to distinguish different parts of speech. The POS labels are then mapped into fixed-length POS vectors, that is, POS features, through a mapping network. In other words, as shown in FIG. 7, a one-dimensional POS label is mapped into a multi-dimensional POS vector through the mapping network. Optionally, the mapping network may be a single fully connected layer.
Since the object a pronoun refers to is usually a noun, in this example nouns are labeled 1 as candidate antecedents, and parts of speech other than pronouns and nouns are labeled 0. This setting, on the one hand, reduces computational complexity and saves computing cost; on the other hand, it reduces the influence of segmented words of weakly related parts of speech on the coreference resolution result, further improving accuracy.
As another non-limiting example, based on the POS tagging result, each distinct part of speech is labeled with a different number. In this example, rather than labeling all parts of speech other than pronouns and nouns as 0, each distinct part of speech receives its own number; that is, different numbers identify different POS information. The rest of this example is similar to the previous one and is not repeated here.
As another non-limiting example, since a pronoun usually refers to a segmented word appearing before it, that is, a pronoun refers to an antecedent, this example builds on the previous two by labeling the POS information of segmented words after the pronoun as 0. This setting likewise reduces computational complexity, saves computing cost, and reduces the influence of weakly related segmented words on the coreference resolution result, further improving accuracy.
It should be noted that other numbers may be used for the different parts of speech here; the POS information may be one digit, two digits, or even more digits; and the length of the POS feature or POS vector is not specifically limited. It should be understood that the three examples here cannot be interpreted as specific limitations on this application.
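A minimal Python sketch of the first example above, assuming a single fully connected mapping layer with random (untrained) weights. The tag strings "n" (noun) and "r" (pronoun) are an assumption borrowed from common Chinese POS tag sets, not something mandated by this application.

import numpy as np

K = 8
rng = np.random.default_rng(1)
M1 = rng.normal(size=(1, K))  # mapping network: one fully connected layer

def pos_ids(tags):
    """First-example labeling: noun -> 1, pronoun -> 2, everything else -> 0."""
    code = {"n": 1, "r": 2}
    return np.array([[code.get(t, 0)] for t in tags])

I1 = pos_ids(["v", "d", "n", "w", "r"])  # (T, 1) one-dimensional POS labels
S = I1 @ M1                              # (T, K) multi-dimensional POS features
print(S.shape)  # (5, 8)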
Optionally, obtaining the position feature of the target text includes:
obtaining the position information corresponding to each segmented word, the target text including several segmented words; and
mapping the position information corresponding to each segmented word into a position feature.
In some embodiments of this application, the position information of each segmented word included in the target text is obtained. The position information may be represented by the distance between the segmented word and the pronoun, where the distance may be the word distance or a quantity positively correlated with the word distance. For example, the segmented words are ordered and, taking the pronoun as the center, the number of segmented words between each segmented word and the pronoun is recorded as the distance; alternatively, the number of intervening segmented words multiplied by the length occupied by a single segmented word is recorded as the distance. The position information of each segmented word is then mapped into a position feature, which may be represented by a position vector.
As a non-limiting example, taking the pronoun as the center, the distance between each segmented word and the pronoun is obtained. For example, the length occupied by each segmented word may be recorded as 1, or as 2 or another number; the length occupied by a punctuation mark may be recorded as 0, or as 1 or another number. The distance between each segmented word and the pronoun is then mapped into a fixed-length position vector, that is, a position feature, through a mapping network; in other words, one-dimensional position information is mapped into a multi-dimensional position vector. Optionally, the mapping network may be a single fully connected layer.
In other embodiments of this application, POS tagging may be performed while the target text is segmented, and the position information of the nouns included in the target text is obtained. The position information may be represented by the distance between the noun and the pronoun. The position information of each noun is then mapped into a position feature, that is, a position vector. It should be understood that for segmented words of parts of speech other than nouns, the position vector is an all-zero vector.
As a non-limiting example, based on the POS tagging result and taking the pronoun as the center, the distance between each noun and the pronoun is obtained. For example, the length occupied by each segmented word may be recorded as 1, or as 2 or another number; the length occupied by a punctuation mark may be recorded as 0, or as 1 or another number. The distance between each noun and the pronoun is then mapped into a fixed-length position vector, that is, a position feature, through a mapping network; that is, one-dimensional position information is mapped into a multi-dimensional position vector. Optionally, the mapping network may be a single fully connected layer.
As noted above, since a pronoun usually refers to a segmented word appearing before it, that is, to an antecedent, in other embodiments of this application, building on the two foregoing embodiments, the position information of segmented words or nouns after the pronoun is set to 0.
It should be noted that the length occupied by each segmented word or punctuation mark is not specifically limited; the position information may be one digit, two digits, or even more digits; and the length of the position feature or position vector is not specifically limited. It should be understood that the two examples here cannot be interpreted as specific limitations on this application.
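A minimal Python sketch of the distance-based variant above, again with a random single-layer mapping network; the choice of one unit of length per segmented word and the zeroing of positions after the pronoun follow the examples just given.

import numpy as np

K = 8
rng = np.random.default_rng(2)
M2 = rng.normal(size=(1, K))  # mapping network, again a single FC layer

def position_ids(n_tokens, pronoun_idx):
    """Word distance to the pronoun; words after the pronoun (and the
    pronoun itself) are set to 0, following the variant described above."""
    return np.array([[pronoun_idx - i if i < pronoun_idx else 0]
                     for i in range(n_tokens)])

I2 = position_ids(n_tokens=5, pronoun_idx=4)  # (T, 1) distances
P = I2 @ M2                                   # (T, K) position features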
Optionally, obtaining the knowledge feature of the target text includes:
obtaining the knowledge information corresponding to each segmented word, the target text including several segmented words; and
mapping the knowledge information corresponding to each segmented word into a knowledge feature.
Specifically, the nouns are looked up in a preset knowledge base to determine the entity result of each noun included in the target text. The entity result of each noun is then matched against the pronoun that follows it, and the different matching results are labeled correspondingly to obtain knowledge information. The knowledge information is then mapped into a fixed-length knowledge vector, that is, a knowledge feature, through a mapping network. Optionally, the mapping network may be a single fully connected layer.
The entity result of a noun includes but is not limited to: a person entity, a place entity, an object entity, an animal entity, an organization, or none. Labeling according to the result of matching the noun's entity result against the pronoun yields the knowledge information. Matching results include but are not limited to: match, no match, or other results; other results include cases where it cannot be determined whether there is a match, or where no matching result is returned. As knowledge information, for example, 1 indicates a match and 0 indicates no match or another result.
As a non-limiting example of this application, if the POS tagging result determines that the pronoun is a personal pronoun, for example "he" or "she" (or their plurals), and looking up the noun in the knowledge base yields a person entity, the noun's entity result is determined to match the pronoun, and the noun's knowledge information is labeled 1. Similarly, if the pronoun is a demonstrative pronoun, for example "here" or "there", and the entity result is a place entity or an organization, the match succeeds and the knowledge information is labeled 1. Similarly, if the pronoun is "it" and the entity result is an object entity or an animal entity, the match succeeds and the knowledge information is labeled 1. For matching results other than a successful match, that is, no match or other results, the knowledge information is labeled 0.
As noted above, since a pronoun usually refers to a segmented word appearing before it, in other embodiments of this application, building on the two foregoing embodiments, the knowledge information of segmented words or nouns after the pronoun is set to 0.
It should be noted that the numbers used to label knowledge information are not specifically limited; the knowledge information may be one digit, two digits, or even more digits; and the length of the knowledge feature or knowledge vector is not specifically limited. It should be understood that the examples here cannot be interpreted as specific limitations on this application.
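A minimal Python sketch of the matching step, in which a toy dictionary stands in for the preset knowledge base and a small compatibility table stands in for the pronoun-matching rules; all entries are illustrative assumptions.

import numpy as np

K = 8
rng = np.random.default_rng(3)
M3 = rng.normal(size=(1, K))  # mapping network

# Toy knowledge base and pronoun-compatibility table (assumptions).
KB = {"周杰伦": "person", "中国": "place", "歌手": "person"}
COMPAT = {"他": {"person"}, "她": {"person"}, "这里": {"place", "org"}}

def knowledge_ids(tokens, pronoun):
    """1 when the word's entity result matches the pronoun, else 0."""
    ok = COMPAT.get(pronoun, set())
    return np.array([[1 if KB.get(t) in ok else 0] for t in tokens])

I3 = knowledge_ids(["周杰伦", "是", "歌手", "他"], pronoun="他")
Q = I3 @ M3  # (T, K) knowledge features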
It should be understood that the mapping network that maps POS information into POS features (POS vectors), the mapping network that maps position information into position features (position vectors), and the mapping network that maps knowledge information into knowledge features (knowledge vectors) may have the same or different structures and/or parameters; this application does not limit this.
S630: Compose the semantic feature, POS feature, position feature, and knowledge feature into an input matrix.
Specifically, the semantic feature, POS feature, position feature, and knowledge feature corresponding to each segmented word included in the target text are concatenated to obtain the input matrix.
It should be noted that when the target text contains fewer segmented words than the preset maximum length, zero vectors are used for padding; when it contains more, the excess segmented words are removed.
For example, the first preset-maximum-length segmented words may be kept, or the last ones; this application does not limit this.
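A minimal Python sketch of step S630 under the padding and truncation rules just described; the maximum length of 16 and the feature width K = 8 are illustrative, and truncation keeps the first max_len words here as one of the permitted options.

import numpy as np

def build_input(E, S, P, Q, max_len):
    """Concatenate the four T x K feature matrices into a T x 4K input
    matrix, then zero-pad or truncate the rows to the preset maximum length."""
    F1 = np.concatenate([E, S, P, Q], axis=1)  # (T, 4K)
    T, D = F1.shape
    if T < max_len:
        return np.vstack([F1, np.zeros((max_len - T, D))])  # pad with zeros
    return F1[:max_len]                                      # drop the excess

T, K = 10, 8
rng = np.random.default_rng(4)
E, S, P, Q = (rng.normal(size=(T, K)) for _ in range(4))
F1 = build_input(E, S, P, Q, max_len=16)
print(F1.shape)  # (16, 32)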
S640: Input the input matrix into a neural network model to obtain a coreference resolution result.
In this embodiment of this application, the neural network model is a trained neural network model used to perform coreference resolution on text and obtain a coreference resolution result.
The feature extractor and the classification subnetwork are both neural network models based on machine learning techniques in artificial intelligence. The feature extractor and the classification subnetwork can be trained together to obtain the neural network model used for coreference resolution.
The embodiments of this application do not specifically limit the structure of the neural network model; for example, it may be a convolutional neural network (CNN), a recurrent neural network (RNN), a BiLSTM, or a Transformer network.
It should be noted that when the electronic device is user equipment, the training process of the neural network model may be implemented on another electronic device, for example, a cloud server. When the electronic device is a server, the training process may be implemented locally on the server or on another electronic device communicating with the server. After the electronic device trains the neural network model locally, or obtains the trained neural network model from another electronic device, and deploys the trained model, coreference resolution of the target text can be implemented on the electronic device.
In step S640, the neural network model performs feature extraction on the input matrix to obtain the feature vector corresponding to each segmented word. The feature vector corresponding to the pronoun is then concatenated with the feature vector corresponding to each of the other segmented words to form match vectors, each of which represents one candidate coreference relationship. Finally, each coreference relationship is scored. Each score is positively correlated with the degree of match of that coreference relationship; that is, the score reflects how well the pronoun and the segmented word in the same group form a coreference relationship. The neural network model may output at least one score; the number of scores output depends on the specific structure of the model, and this application does not limit this number.
In some embodiments, the neural network model outputs N scores, where N is an integer equal to or greater than 1. In some embodiments, after obtaining the score of each match vector, the model judges whether the pronoun and the segmented word in each group form a coreference relationship according to whether the score exceeds a preset threshold, and finally outputs the coreference relationships whose scores exceed the threshold. In some embodiments, the model selects the highest-scoring coreference relationship as the coreference resolution result of the target text.
In some embodiments of this application, the scores of the coreference relationships output by the model sum to 1; in other embodiments, they do not. Whether the sum is 1 depends on whether the output layer of the model performs normalization, which this application does not limit.
In the example shown in FIG. 6B, the neural network model includes a feature extractor and a classification subnetwork.
The feature extractor is configured to extract features from the input matrix to obtain a feature matrix, which includes the feature vector corresponding to each segmented word. The classification subnetwork is configured to obtain the coreference resolution result based on the feature matrix.
In the example shown in FIG. 6B, the classification subnetwork includes a concatenation layer, a fully connected neural network, and an output layer.
The concatenation layer is configured to concatenate the feature vector corresponding to each remaining segmented word with the feature vector corresponding to the pronoun to obtain match vectors; the remaining segmented words are the segmented words included in the target text other than the pronoun.
The fully connected neural network is configured to score each match vector. Each score is positively correlated with the degree of match of the coreference relationship corresponding to that match vector.
The output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
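A minimal Python sketch of this classification subnetwork, with a single random linear layer standing in for the trained fully connected neural network; the argmax at the end plays the role of the output layer.

import numpy as np

def classify(H, pronoun_idx, W, b=0.0):
    """Concatenation layer + scoring head: splice the pronoun's feature
    vector with every remaining word's vector and score each match vector
    with a linear layer. Returns the best-scoring antecedent index."""
    scores = {}
    for i, h in enumerate(H):
        if i == pronoun_idx:
            continue
        match_vec = np.concatenate([h, H[pronoun_idx]])  # (2K,) match vector
        scores[i] = float(match_vec @ W + b)
    best = max(scores, key=scores.get)                   # output layer: argmax
    return best, scores

T, K = 5, 8
rng = np.random.default_rng(5)
H = rng.normal(size=(T, K))            # feature matrix from the extractor
best, scores = classify(H, pronoun_idx=4, W=rng.normal(size=2 * K))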
Optionally, on the basis of the embodiments shown in FIG. 6A and FIG. 6B, in some other embodiments the input matrix may be composed of the semantic feature and the POS feature; in other embodiments, of the semantic feature and the position feature; in others, of the semantic feature and the knowledge feature; and in still others, of the semantic feature, POS feature, and position feature, and so on.
More generally, in the embodiments of this application, the semantic feature of the target text is obtained, along with at least one target feature among the POS feature, position feature, and knowledge feature of the target text; the semantic feature and the target feature are then composed into an input matrix; and finally the coreference resolution result is obtained based on the input matrix.
In the embodiments of this application, in addition to the semantic feature, other features such as the POS feature, position feature, or knowledge feature are jointly used for coreference resolution. Because the variety of information used for coreference resolution is increased, the accuracy of the coreference resolution result is improved.
Optionally, on the basis of the embodiment shown in FIG. 6B, in some other embodiments, as shown in FIG. 8, the classification subnetwork may further include a selection layer. That is, in these embodiments the classification subnetwork includes a selection layer, a concatenation layer, a fully connected neural network, and an output layer.
The selection layer is configured to select, from the feature matrix output by the feature extractor, the feature vectors corresponding to the candidate antecedents and the feature vector corresponding to the pronoun.
In some embodiments, candidate antecedents may first be selected using at least one of the POS information, the position information, and the knowledge information, and the feature vectors corresponding to the candidate antecedents are then selected from the feature matrix.
As a non-limiting example, as noted above, since a pronoun usually refers to a segmented word appearing before it, the position information can be used to select the segmented words before the pronoun as candidate antecedents.
As another non-limiting example, since a pronoun usually refers to a noun, the POS information can be used to select the nouns among the segmented words as candidate antecedents.
As another non-limiting example, since a pronoun usually refers to a segmented word whose knowledge information marks it as an entity matching the pronoun, the knowledge information can be used to select segmented words that are entities matching the pronoun as candidate antecedents, for example, segmented words whose knowledge information is 1.
As a non-limiting example, since a pronoun usually refers to a noun appearing before it, the POS information and the position information can be used together to select the nouns before the pronoun as candidate antecedents.
As another non-limiting example, the POS information and the knowledge information can be used to select nouns that are entities matching the pronoun as candidate antecedents, for example, segmented words whose knowledge information is 1 and whose POS information is 1.
As another non-limiting example, the position information and the knowledge information can be used to select segmented words before the pronoun that are entities matching the pronoun as candidate antecedents.
As another non-limiting example, the POS information, position information, and knowledge information can be used together to select nouns before the pronoun that are entities matching the pronoun as candidate antecedents.
In some embodiments, the number of candidate antecedents selected by the selection layer may be preset; that is, the number of match vectors in the subsequent concatenation layer may be limited. The number is an empirical value that must balance the amount of computation against the accuracy of the result, among other practical considerations, and is not specifically limited.
The concatenation layer is configured to concatenate the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain match vectors. The other layers are the same as in the example shown in FIG. 6B; see the foregoing description.
In the example in FIG. 6B, no selection layer is provided, and the feature vector of every antecedent must be concatenated with that of the pronoun to obtain match vectors. In the example in FIG. 8, by providing a selection layer, the candidate antecedents and their corresponding feature vectors are screened, and the subsequent concatenation layer only needs to concatenate the feature vectors of the candidate antecedents with that of the pronoun to obtain match vectors. On the one hand, this reduces the number of match vectors and hence the computational complexity of scoring, saving computing cost and improving efficiency. On the other hand, because the antecedents are screened, the candidates highly relevant to the coreference relationship are retained and irrelevant words are filtered out, improving the accuracy of the coreference resolution result.
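A minimal Python sketch of such a selection layer, combining the POS, position, and knowledge screens described above with zero-padding and farthest-first truncation to a preset size m; the 0/1 label conventions follow the earlier examples and are assumptions rather than requirements.

import numpy as np

def select(H, pos, knowledge, pronoun_idx, m):
    """Keep the pronoun plus up to m-1 candidate antecedents: nouns
    (pos == 1) before the pronoun whose knowledge flag is 1. Pads with
    zero rows to m rows; when over-full, drops the candidates farthest
    from the pronoun."""
    cand = [i for i in range(pronoun_idx) if pos[i] == 1 and knowledge[i] == 1]
    cand = sorted(cand, key=lambda i: pronoun_idx - i)[: m - 1]  # keep nearest
    rows = [H[i] for i in sorted(cand)] + [H[pronoun_idx]]
    while len(rows) < m:
        rows.append(np.zeros(H.shape[1]))  # zero-pad to the preset size m
    return np.stack(rows)

T, K = 8, 4
rng = np.random.default_rng(6)
F4 = select(rng.normal(size=(T, K)),
            pos=[0, 1, 0, 1, 0, 1, 0, 0],
            knowledge=[0, 1, 0, 1, 0, 1, 0, 0],
            pronoun_idx=6, m=4)
print(F4.shape)  # (4, 4): three candidates plus the pronoun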
Optionally, on the basis of the embodiment shown in FIG. 6B or FIG. 8, in some other embodiments the classification subnetwork may further include a residual connection layer. Taking the addition of a residual connection layer to the embodiment shown in FIG. 8 as an example, as shown in FIG. 9, in these embodiments the classification subnetwork includes a residual connection layer, a selection layer, a concatenation layer, a fully connected neural network, and an output layer.
The residual connection layer is configured to perform a residual connection on the input matrix and the feature matrix to obtain an encoding matrix.
The encoding matrix includes the encoding vector corresponding to each segmented word.
In this case, the selection layer is configured to select, from the encoding matrix, the encoding vectors corresponding to the candidate antecedents and the encoding vector corresponding to the pronoun.
The concatenation layer is configured to concatenate the encoding vector corresponding to each candidate antecedent with the encoding vector corresponding to the pronoun to obtain match vectors.
In the example in FIG. 9, adding the residual connection layer enables the neural network model to converge faster, improving training efficiency.
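A minimal Python sketch of this residual connection, using the formula F3 = g(F2 + F1 · M4) made explicit in the worked example below, with ReLU as g() and random matrices standing in for learned ones.

import numpy as np

def residual_connect(F1, F2, M4):
    """F3 = g(F2 + F1 @ M4): because F1 (T x 4K) and F2 (T x K) differ in
    width, the extra matrix M4 (4K x K) first projects the input; g() is
    taken to be ReLU here."""
    return np.maximum(F2 + F1 @ M4, 0.0)

T, K = 16, 8
rng = np.random.default_rng(7)
F1 = rng.normal(size=(T, 4 * K))                  # input matrix
F2 = rng.normal(size=(T, K))                      # feature extractor output
F3 = residual_connect(F1, F2, rng.normal(size=(4 * K, K)))
print(F3.shape)  # (16, 8): one encoding vector per segmented word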
Next, taking the target text in the first application scenario, "介绍一下周杰伦。周杰伦是中国著名歌手。他的老婆是谁?" ("Tell me about Jay Chou. Jay Chou is a famous Chinese singer. Who is his wife?"), as an example, the implementation procedure of the coreference resolution method provided in this embodiment of this application is described in detail, as shown in FIG. 10. It should be understood that the specific content of the target text here is only an exemplary description and cannot be interpreted as a specific limitation on this application.
As shown in FIG. 10, the process of performing coreference resolution on the target text is as follows.
Step 1: input layer
Word segmentation and POS tagging are performed on the target text, yielding the 16 segmented words of the target text in order: "介绍", "一下", "周杰伦", "。", "周杰伦", "是", "中国", "著名", "歌手", "。", "他", "的", "老婆", "是", "谁", and "?". It should be understood that this example does not remove non-feature words such as punctuation; in other examples, stop words and the like may also be removed.
Through the word vector model, the word vectors (embeddings) corresponding to the 16 segmented words are obtained.
For example, the word vector corresponding to the i-th segmented word is e_i, where i is an integer in the interval [1, 16]. Assume the word vector length is K. Concatenating the word vectors of the 16 segmented words into a word vector matrix gives E = [e_1; e_2; e_3; ...; e_16], so E has size 16 × K.
Based on the POS information of each of the 16 segmented words, each piece of POS information is mapped into a POS feature of a first preset length through a first mapping network.
For example, assume the mapping matrix of the first mapping network is M1 of size 1 × K, and the first preset length is K. The matrix formed by concatenating the POS information of the 16 segmented words is I1 of size 16 × 1. After the first mapping network, the POS feature of the target text is S = I1 · M1, of size 16 × K.
Based on the position information of each of the 16 segmented words, each piece of position information is mapped into a position feature of a second preset length through a second mapping network.
For example, assume the mapping matrix of the second mapping network is M2, also of size 1 × K, and the second preset length is K. The matrix formed by concatenating the position information of the 16 segmented words is I2 of size 16 × 1. After the second mapping network, the position feature of the target text is P = I2 · M2, of size 16 × K.
Based on the knowledge information of each of the 16 segmented words, each piece of knowledge information is mapped into a knowledge feature of a third preset length through a third mapping network.
For example, assume the mapping matrix of the third mapping network is M3, also of size 1 × K, and the third preset length is K. The matrix formed by concatenating the knowledge information of the 16 segmented words is I3 of size 16 × 1. After the third mapping network, the knowledge feature of the target text is Q = I3 · M3, of size 16 × K.
In this example, the maximum input length T of the neural network model is exactly 16 segmented words, so the segmented words of the target text need not be truncated or zero-padded.
Step 2: general-purpose feature extractor
The input matrix is fed into the feature extractor for feature extraction to obtain feature vectors.
The feature extractor may be a CNN, an RNN, a BiLSTM, or a Transformer, and can be chosen flexibly as needed, improving the applicability of this application.
For example, the input matrix F1 is formed by concatenating the four matrices E, S, P, and Q, so F1 has size 16 × 4K. Denote the feature extractor by f(x); the input is F1 and the output feature matrix is F2, of size 16 × K.
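A minimal Python sketch of one possible f(x): a toy width-3 convolution that maps the 16 × 4K input to a 16 × K feature matrix. Any CNN, RNN, BiLSTM, or Transformer with the same input/output shapes could be substituted, and the weights here are random rather than trained.

import numpy as np

def cnn_extractor(F1, W):
    """Width-3 1-D convolution via an im2col-style window stack:
    maps (T, 4K) -> (T, K) with zero padding at both ends."""
    T, D = F1.shape
    padded = np.vstack([np.zeros((1, D)), F1, np.zeros((1, D))])
    windows = np.stack([padded[i:i + 3].ravel() for i in range(T)])  # (T, 3D)
    return np.tanh(windows @ W)                                      # (T, K)

T, K = 16, 8
rng = np.random.default_rng(8)
F1 = rng.normal(size=(T, 4 * K))
F2 = cnn_extractor(F1, W=rng.normal(size=(3 * 4 * K, K)) * 0.05)
print(F2.shape)  # (16, 8)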
Step 3: residual connection layer
The residual connection layer performs a residual connection between the extracted features and the input features.
Since the input features F1 and the output features F2 generally have different dimensions, an extra matrix M4 must be introduced. The output of the residual layer is the encoding matrix F3, where F3 = g(F2 + F1 · M4) and g() is an activation function, for example the ReLU function.
For example, the input matrix F1 has size 16 × 4K, the output feature matrix F2 has size 16 × K, the extra matrix M4 has size 4K × K, and the encoding matrix F3 has size 16 × K. The 16 components of the encoding matrix are the encoding vectors corresponding to the 16 segmented words.
In this example, the neural network model includes a residual connection layer. It should be understood that in other examples, the residual connection layer may be omitted.
Step 4: selection layer
Based on the POS information, the encoding vectors corresponding to segmented words that may become candidate antecedents, and the encoding vector corresponding to the pronoun, are retained. Segmented words that may become candidate antecedents may include nouns, such as person names and place names.
Assume that after selection the maximum word count is m. The selected encoding vectors output by the selection layer form F4, of size m × K. If the total number of candidate antecedents and pronouns selected by POS information is fewer than m, all-zero vectors are used as padding, that is, zero padding. If the total exceeds m, the excess candidate antecedents are filtered out by truncation, for example by discarding the candidates farthest from the pronoun.
For example, assume the selection layer selects 4 segmented words and their corresponding encoding vectors: exactly 3 candidate antecedents, "周杰伦" (Jay Chou), "中国" (China), and "歌手" (singer), and 1 pronoun, "他" (he). Based on the ordering of the segmented words, the encoding vectors corresponding to these 4 segmented words are selected from the encoding matrix in order.
In this example, the neural network model includes a selection layer. It should be understood that in other examples the selection layer may be omitted.
In this example, the selection layer screens candidate antecedents based on POS information. It should be understood that in other examples, candidate antecedents may also be screened based on at least one of the POS information, position information, and knowledge information.
Step 5: concatenation layer
The concatenation layer concatenates the encoding vector corresponding to the pronoun with the encoding vector corresponding to each candidate antecedent to obtain match vectors, each of size 1 × 2K.
Assume that among the m selected segmented words there is 1 pronoun. The encoding vector of that pronoun is concatenated with the encoding vectors of the m−1 candidate antecedents respectively, yielding m−1 match vectors, which are stacked into the matrix F5 of size (m−1) × 2K.
For example, assume the selection layer selects 4 segmented words and their encoding vectors: the 3 candidate antecedents "周杰伦", "中国", and "歌手", and the pronoun "他". Concatenating the encoding vector of "他" with the encoding vectors of "周杰伦", "中国", and "歌手" respectively yields 3 match vectors, representing three candidate coreference relationships: [他]<->[周杰伦], [他]<->[中国], and [他]<->[歌手].
In this example, it is assumed that among the m selected segmented words there is 1 pronoun. It should be understood that in other examples there may be multiple pronouns.
Step 6: fully connected neural network
The fully connected neural network scores each match vector to obtain its score. The score is positively correlated with the degree of match of the corresponding coreference relationship; that is, the higher the score of a coreference relationship, the more likely that relationship is to hold. The fully connected neural network may include fully connected layers and an activation function.
For example, the fully connected neural network scores the 3 match vectors, obtaining the score of each coreference relationship: [他]<->[周杰伦] scores 0.9, [他]<->[中国] scores 0.4, and [他]<->[歌手] scores 0.3.
Step 7: output
The top N highest-scoring coreference relationships are output as the correct coreference resolution results, that is, the coreference matching results. Usually, the highest-scoring coreference relationship is selected as the matching result.
For example, the highest-scoring relationship, [他]<->[周杰伦], is output as the coreference resolution result. That is, through the neural network model, the pronoun "他" (he) in the target text is identified as referring to "周杰伦" (Jay Chou).
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
Corresponding to the coreference resolution method described in the foregoing embodiments, FIG. 11 shows a structural block diagram of the coreference resolution apparatus provided in an embodiment of this application. For ease of description, only the parts related to this embodiment are shown.
Referring to FIG. 11, the coreference resolution apparatus includes a first obtaining module M1101, a second obtaining module M1102, a composing module M1103, and a resolution module M1104.
The first obtaining module M1101 is configured to obtain target text on which coreference resolution is to be performed.
The second obtaining module M1102 is configured to obtain the semantic feature of the target text, and to obtain at least one target feature among the POS feature, position feature, and knowledge feature of the target text.
The composing module M1103 is configured to compose the semantic feature and the target feature into an input matrix.
The resolution module M1104 is configured to input the input matrix into a neural network model to obtain a coreference resolution result.
Optionally, the semantic feature includes a word vector matrix corresponding to the target text.
Optionally, as shown in FIG. 12, the second obtaining module M1102 includes a semantic feature obtaining module M11021, a POS feature obtaining module M11022, a position feature obtaining module M11023, and a knowledge feature obtaining module M11024.
The semantic feature obtaining module M11021 is configured to obtain the semantic feature of the target text.
Specifically, the semantic feature obtaining module M11021 is configured to:
convert each segmented word into a word vector, the target text including several segmented words; and
concatenate the word vectors corresponding to the segmented words into a word vector matrix.
The POS feature obtaining module M11022 is configured to obtain the POS feature of the target text.
Specifically, the POS feature obtaining module M11022 is configured to:
obtain the POS information corresponding to each segmented word, the target text including several segmented words; and
map the POS information corresponding to each segmented word into a POS feature.
The position feature obtaining module M11023 is configured to obtain the position feature of the target text.
Specifically, the position feature obtaining module M11023 is configured to:
obtain the position information corresponding to each segmented word, the target text including several segmented words; and
map the position information corresponding to each segmented word into a position feature.
The knowledge feature obtaining module M11024 is configured to obtain the knowledge feature of the target text.
Specifically, the knowledge feature obtaining module M11024 is configured to:
obtain the knowledge information corresponding to each segmented word, the target text including several segmented words; and
map the knowledge information corresponding to each segmented word into a knowledge feature.
Optionally, the neural network model includes a feature extractor and a classification subnetwork.
The feature extractor is configured to extract features from the input matrix to obtain a feature matrix, which includes the feature vector corresponding to each segmented word.
The classification subnetwork is configured to obtain the coreference resolution result based on the feature matrix.
As a non-limiting example, the classification subnetwork includes a concatenation layer, a fully connected neural network, and an output layer.
The concatenation layer is configured to concatenate the feature vector corresponding to each remaining segmented word with the feature vector corresponding to the pronoun to obtain match vectors; the remaining segmented words are the segmented words included in the target text other than the pronoun.
The fully connected neural network is configured to score each match vector.
The output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
As another non-limiting example, the classification subnetwork includes a residual connection layer, a concatenation layer, a fully connected neural network, and an output layer.
The residual connection layer is configured to perform a residual connection on the input matrix and the feature matrix to obtain an encoding matrix, which includes the encoding vector corresponding to each segmented word.
The concatenation layer is configured to concatenate the encoding vector corresponding to each remaining segmented word with the encoding vector corresponding to the pronoun to obtain match vectors; the remaining segmented words are the segmented words included in the target text other than the pronoun.
The fully connected neural network is configured to score each match vector.
The output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
As another non-limiting example, the classification subnetwork includes a selection layer, a concatenation layer, a fully connected neural network, and an output layer.
The selection layer is configured to select, from the feature matrix, the feature vector corresponding to each candidate antecedent and the feature vector corresponding to the pronoun.
The concatenation layer is configured to concatenate the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain match vectors.
The fully connected neural network is configured to score each match vector.
The output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
As another non-limiting example, the classification subnetwork includes a residual connection layer, a selection layer, a concatenation layer, a fully connected neural network, and an output layer.
The residual connection layer is configured to perform a residual connection on the input matrix and the feature matrix to obtain an encoding matrix, which includes the encoding vector corresponding to each segmented word.
The selection layer is configured to select, from the encoding matrix, the encoding vector corresponding to each candidate antecedent and the encoding vector corresponding to the pronoun.
The concatenation layer is configured to concatenate the encoding vector corresponding to each candidate antecedent with the encoding vector corresponding to the pronoun to obtain match vectors.
The fully connected neural network is configured to score each match vector.
The output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
It should be noted that because the information exchange and execution processes between the foregoing modules/units are based on the same concept as the method embodiments of this application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional units and modules is used as an example. In practical applications, the foregoing functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of this application. For the specific working processes of the units and modules in the foregoing system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
An embodiment of this application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in each of the foregoing method embodiments.
An embodiment of this application provides a computer program product that, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the foregoing embodiments of this application may be completed by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate forms, or the like. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electric carrier signals or telecommunication signals.
In the foregoing embodiments, the descriptions of the embodiments each have their own emphasis. For parts not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiments described above are merely illustrative. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

  1. A coreference resolution method, wherein the method comprises:
    obtaining target text on which coreference resolution is to be performed;
    obtaining a semantic feature of the target text, and obtaining at least one target feature among a part-of-speech feature, a position feature, and a knowledge feature of the target text;
    composing the semantic feature and the target feature into an input matrix; and
    inputting the input matrix into a neural network model to obtain a coreference resolution result.
  2. The method according to claim 1, wherein the semantic feature comprises a word vector matrix corresponding to the target text.
  3. The method according to claim 1 or 2, wherein the obtaining a semantic feature of the target text comprises:
    converting each segmented word into a word vector, wherein the target text comprises several segmented words; and
    concatenating the word vectors corresponding to the segmented words into a word vector matrix.
  4. The method according to claim 1 or 2, wherein
    obtaining the part-of-speech feature of the target text comprises:
    obtaining part-of-speech information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    mapping the part-of-speech information corresponding to each segmented word into a part-of-speech feature;
    obtaining the position feature of the target text comprises:
    obtaining position information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    mapping the position information corresponding to each segmented word into a position feature; and
    obtaining the knowledge feature of the target text comprises:
    obtaining knowledge information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    mapping the knowledge information corresponding to each segmented word into a knowledge feature.
  5. The method according to claim 1 or 2, wherein the neural network model comprises a feature extractor and a classification subnetwork;
    the feature extractor is configured to extract features of the input matrix to obtain a feature matrix, wherein the feature matrix comprises a feature vector corresponding to each segmented word; and
    the classification subnetwork is configured to obtain the coreference resolution result based on the feature matrix.
  6. The method according to claim 5, wherein the classification subnetwork comprises a concatenation layer, a fully connected neural network, and an output layer;
    the concatenation layer is configured to concatenate the feature vector corresponding to each remaining segmented word with the feature vector corresponding to a pronoun to obtain match vectors, wherein the remaining segmented words are the segmented words, other than the pronoun, among the several segmented words comprised in the target text;
    the fully connected neural network is configured to score each match vector; and
    the output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
  7. The method according to claim 5, wherein the classification subnetwork comprises a selection layer, a concatenation layer, a fully connected neural network, and an output layer;
    the selection layer is configured to select, from the feature matrix, a feature vector corresponding to each candidate antecedent and a feature vector corresponding to a pronoun;
    the concatenation layer is configured to concatenate the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain match vectors;
    the fully connected neural network is configured to score each match vector; and
    the output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
  8. The method according to claim 6 or 7, wherein the classification subnetwork further comprises a residual connection layer; and
    the residual connection layer is configured to perform a residual connection on the input matrix and the feature matrix to obtain an encoding matrix.
  9. A coreference resolution apparatus, wherein the apparatus comprises:
    a first obtaining module, configured to obtain target text on which coreference resolution is to be performed;
    a second obtaining module, configured to obtain a semantic feature of the target text, and obtain at least one target feature among a part-of-speech feature, a position feature, and a knowledge feature of the target text;
    a composing module, configured to compose the semantic feature and the target feature into an input matrix; and
    a resolution module, configured to input the input matrix into a neural network model to obtain a coreference resolution result.
  10. The apparatus according to claim 9, wherein the semantic feature comprises a word vector matrix corresponding to the target text.
  11. The apparatus according to claim 9 or 10, wherein the second obtaining module comprises a semantic feature obtaining module, a part-of-speech feature obtaining module, a position feature obtaining module, and a knowledge feature obtaining module;
    the semantic feature obtaining module is configured to obtain the semantic feature of the target text;
    the part-of-speech feature obtaining module is configured to obtain the part-of-speech feature of the target text;
    the position feature obtaining module is configured to obtain the position feature of the target text; and
    the knowledge feature obtaining module is configured to obtain the knowledge feature of the target text.
  12. The apparatus according to claim 11, wherein
    the semantic feature obtaining module is configured to:
    convert each segmented word into a word vector, wherein the target text comprises several segmented words; and
    concatenate the word vectors corresponding to the segmented words into a word vector matrix;
    the part-of-speech feature obtaining module is configured to:
    obtain part-of-speech information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    map the part-of-speech information corresponding to each segmented word into a part-of-speech feature;
    the position feature obtaining module is configured to:
    obtain position information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    map the position information corresponding to each segmented word into a position feature; and
    the knowledge feature obtaining module is configured to:
    obtain knowledge information corresponding to each segmented word, wherein the target text comprises several segmented words; and
    map the knowledge information corresponding to each segmented word into a knowledge feature.
  13. The apparatus according to claim 9 or 10, wherein the neural network model comprises a feature extractor and a classification subnetwork;
    the feature extractor is configured to extract features of the input matrix to obtain a feature matrix, wherein the feature matrix comprises a feature vector corresponding to each segmented word; and
    the classification subnetwork is configured to obtain the coreference resolution result based on the feature matrix.
  14. The apparatus according to claim 13, wherein the classification subnetwork comprises a concatenation layer, a fully connected neural network, and an output layer;
    the concatenation layer is configured to concatenate the feature vector corresponding to each remaining segmented word with the feature vector corresponding to a pronoun to obtain match vectors, wherein the remaining segmented words are the segmented words, other than the pronoun, among the several segmented words comprised in the target text;
    the fully connected neural network is configured to score each match vector; and
    the output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
  15. The apparatus according to claim 13, wherein the classification subnetwork comprises a selection layer, a concatenation layer, a fully connected neural network, and an output layer;
    the selection layer is configured to select, from the feature matrix, a feature vector corresponding to each candidate antecedent and a feature vector corresponding to a pronoun;
    the concatenation layer is configured to concatenate the feature vector corresponding to each candidate antecedent with the feature vector corresponding to the pronoun to obtain match vectors;
    the fully connected neural network is configured to score each match vector; and
    the output layer outputs the coreference relationship corresponding to the highest-scoring match vector as the coreference resolution result.
  16. The apparatus according to claim 14 or 15, wherein the classification subnetwork further comprises a residual connection layer; and
    the residual connection layer is configured to perform a residual connection on the input matrix and the feature matrix to obtain an encoding matrix.
  17. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein when the processor executes the computer program, the electronic device is caused to implement the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
PCT/CN2020/124482 2020-02-24 2020-10-28 Coreference resolution method and apparatus, and electronic device WO2021169351A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010113756.2 2020-02-24
CN202010113756.2A CN113297843B (zh) 2020-02-24 2020-02-24 Coreference resolution method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2021169351A1 true WO2021169351A1 (zh) 2021-09-02

Family

ID=77318561

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124482 2020-02-24 2020-10-28 Coreference resolution method and apparatus, and electronic device WO2021169351A1 (zh)

Country Status (2)

Country Link
CN (1) CN113297843B (zh)
WO (1) WO2021169351A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168738A (zh) * 2021-12-16 2022-03-11 北京感易智能科技有限公司 Discourse-level event extraction method, system and device
CN114494872A (zh) * 2022-01-24 2022-05-13 北京航空航天大学 Embedded lightweight remote sensing object detection system
CN114168738B (zh) * 2021-12-16 2024-06-07 北京感易智能科技有限公司 Discourse-level event extraction method, system and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963358B (zh) * 2021-12-20 2022-03-04 北京易真学思教育科技有限公司 Text recognition model training method, text recognition method, apparatus, and electronic device
CN116562303B (zh) * 2023-07-04 2023-11-21 之江实验室 Coreference resolution method and apparatus referencing external knowledge

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766320A (zh) * 2016-08-23 2018-03-06 中兴通讯股份有限公司 Method and apparatus for establishing a Chinese pronoun resolution model
WO2018174815A1 (en) * 2017-03-24 2018-09-27 Agency For Science, Technology And Research Method and apparatus for semantic coherence analysis of texts
CN108595408A (zh) * 2018-03-15 2018-09-28 中山大学 Coreference resolution method based on an end-to-end neural network
CN109446517A (zh) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Coreference resolution method, electronic apparatus, and computer-readable storage medium
CN110134944A (zh) * 2019-04-08 2019-08-16 国家计算机网络与信息安全管理中心 Coreference resolution method based on reinforcement learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7813916B2 (en) * 2003-11-18 2010-10-12 University Of Utah Acquisition and application of contextual role knowledge for coreference resolution
RU2601166C2 (ru) * 2015-03-19 2016-10-27 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Anaphora resolution based on deep analysis technology
CN107402913B (zh) * 2016-05-20 2020-10-09 腾讯科技(深圳)有限公司 Antecedent determination method and apparatus
JP6727610B2 (ja) * 2016-09-05 2020-07-22 国立研究開発法人情報通信研究機構 Context analysis apparatus and computer program therefor
US10482885B1 (en) * 2016-11-15 2019-11-19 Amazon Technologies, Inc. Speaker based anaphora resolution
US10366161B2 (en) * 2017-08-02 2019-07-30 International Business Machines Corporation Anaphora resolution for medical text with machine learning and relevance feedback
CN107679041B (zh) * 2017-10-20 2020-12-01 苏州大学 English event coreference resolution method and system based on convolutional neural networks
CN109271529B (zh) * 2018-10-10 2020-09-01 内蒙古大学 Method for constructing a dual-script knowledge graph of Cyrillic Mongolian and traditional Mongolian
CN109885841B (zh) * 2019-03-20 2023-07-11 苏州大学 Coreference resolution method based on node representation
CN110705206B (zh) * 2019-09-23 2021-08-20 腾讯科技(深圳)有限公司 Text information processing method and related apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG QIMENG, YU LONGTIAN, SHENGWAI WUMAIER AISHAN: "Anaphora Resolution of Uyghur Personal Pronouns Based on Multi-attention Mechanism", ACTA AUTOMATICA SINICA, vol. 47, no. 6, 1 June 2021 (2021-06-01), pages 1412 - 1421, XP055841852, DOI: 10.16383/j.aas.c180678 *

Also Published As

Publication number Publication date
CN113297843B (zh) 2023-01-13
CN113297843A (zh) 2021-08-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 20922238; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase — Ref country code: DE
122 Ep: pct application non-entry in european phase — Ref document number: 20922238; Country of ref document: EP; Kind code of ref document: A1