CN112668327A - Information extraction method and device, computer equipment and storage medium


Info

Publication number
CN112668327A
Authority
CN
China
Prior art keywords
information
dialogue
vector
user portrait
user
Prior art date
Legal status
Pending
Application number
CN202011538124.7A
Other languages
Chinese (zh)
Inventor
邹义宏
陈林
吴伟佳
Current Assignee
Weimin Insurance Agency Co Ltd
Original Assignee
Weimin Insurance Agency Co Ltd
Priority date
Filing date
Publication date
Application filed by Weimin Insurance Agency Co Ltd filed Critical Weimin Insurance Agency Co Ltd
Priority to CN202011538124.7A priority Critical patent/CN112668327A/en
Publication of CN112668327A publication Critical patent/CN112668327A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an information extraction method, an information extraction device, computer equipment and a storage medium. The method comprises the following steps: acquiring dialogue information between a user and a service object; extracting user portrait information and an information type of the user portrait information from the dialogue information; determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position; and determining an attribution object of the user portrait information based on the dialogue information, the position information and the information type, the attribution object being the person or thing to which the user portrait information belongs. The method can improve the accuracy of information extraction.

Description

Information extraction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information extraction method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, enterprises can adopt technologies such as artificial intelligence and big data to collect user information and construct user portraits, so that the real requirements of users can be understood more fully and users can be served better. In the process of collecting user information, dependency parsing is usually performed on the chat sentences of a user to obtain information about the user.
However, the subject is often omitted in the chat sentences of a user, and conventional information extraction methods therefore cannot extract the information accurately.
Disclosure of Invention
In view of the above, it is necessary to provide an information extraction method, apparatus, computer device and storage medium capable of improving the accuracy of information extraction.
A method of information extraction, the method comprising:
acquiring dialogue information between a user and a service object;
extracting user portrait information and an information type of the user portrait information from the dialogue information;
determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position;
determining an attribution object of the user portrait information based on the dialogue information, the position information and the information type, the attribution object being the person or thing to which the user portrait information belongs.
An information extraction apparatus, the apparatus comprising:
the conversation information acquisition module is used for acquiring conversation information between the user and the service object;
the key information extraction module is used for extracting user portrait information and the information type of the user portrait information from the dialogue information; determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position;
a relationship attribution module for determining an attribution object of the user portrait information based on the dialogue information, the location information and the information type, the attribution object being a person or thing to which the user portrait information belongs.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the information extraction method, the information extraction device, the computer equipment and the storage medium, dialogue information between a user and a service object is acquired, and the user portrait information and the information type of the user portrait information are extracted from the dialogue information. The position of the user portrait information in the dialogue information can then be accurately determined based on the dialogue information and the user portrait information, and the position information of that position is obtained, so that the attribution object of the user portrait information can be accurately determined based on the dialogue information, the position information and the information type. This solves the problem that information cannot be extracted accurately when the user omits the subject in the dialogue information, and thus improves the accuracy of information extraction.
Drawings
FIG. 1 is a diagram of an exemplary environment in which the information extraction method may be implemented;
FIG. 2 is a schematic flow chart diagram illustrating a method for extracting information in one embodiment;
FIG. 3 is a flowchart illustrating the step of filtering out associated session information associated with current session information from historical session information based on the current session information in one embodiment;
FIG. 4 is a block diagram of a BiMPM network in one embodiment;
FIG. 5 is a block diagram of a BiLSTM+CRF model in one embodiment;
FIG. 6 is a flowchart illustrating the steps for determining an attribution object for user portrait information based on dialog information, location information, and information type in one embodiment;
FIG. 7 is a schematic flow chart of information extraction in another embodiment;
FIG. 8 is a schematic diagram of an interface of an information extraction method in one embodiment;
FIG. 9 is a block diagram showing the structure of an information extracting apparatus according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The scheme provided by the embodiments of the application relates to technologies such as artificial intelligence and Machine Learning (ML). Artificial intelligence is a theory, technology and application system which uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and obtain the best result, so that the machine has the functions of sensing, reasoning and decision making. Machine learning involves multiple disciplines such as probability theory, statistics, approximation theory, convex analysis and algorithm complexity theory, and studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize the existing knowledge structure so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
By adopting technologies such as artificial intelligence and machine learning, more accurate user portrait information, information types of the user portrait information and attribution objects of the user portrait information can be extracted from the dialogue information, so that more accurate information extraction from the dialogue information is realized.
The information extraction method provided by the application can be applied to the application environment shown in fig. 1. Wherein, the terminal 102 of the user communicates with the computer device 104 of the service object through the network. A user sends a conversation to computer equipment 104 where a service object is located through a terminal 102, and the computer equipment 104 acquires conversation information between the user and the service object; extracting user portrait information and an information type of the user portrait information from the dialogue information; determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring position information of the position; based on the session information, the location information, and the information type, an attribution object of the user portrait information is determined. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The computer device 104 may be a terminal or a server. When the computer device 104 is a terminal, it may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. When the computer device 104 is a server, it may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in fig. 2, an information extraction method is provided, which is described by taking the method as an example applied to the computer device in fig. 1, and includes the following steps:
step 202, obtaining dialogue information between the user and the service object.
A service object is an object that provides a service to a user. Alternatively, the service object may be a service person or an intelligent service robot, but is not limited thereto. The service provided by the service object may be a consulting service, a business handling service, an after-sales service, etc. The service object can provide services of a certain industry, for example, in the insurance industry, insurance application services, claim settlement services, insurance release services, consultation services and the like can be provided; in the communication industry, communication service handling services, communication service unsubscription services, consultation services and the like can be provided.
In the process in which the service object provides the service to the user, the user and the service object carry out a conversation, and the computer device can acquire the dialogue information between the user and the service object. The dialogue information may specifically be at least one of text information, image information, video information, audio information and the like.
In one embodiment, the computer device may pre-process the session information after obtaining the session information between the user and the service object, and extract the user portrait information and the information type of the user portrait information from the pre-processed session information.
The preprocessing may include special-symbol processing, upper/lower-case conversion of English text, and unification of traditional and simplified Chinese characters. The special-symbol processing may be deleting the special symbol, converting the special symbol into a normal symbol, marking the special symbol, and the like.
Preprocessing may also include using regular expressions to match e-mail addresses, web addresses, phone numbers, identity card numbers and the like, and replacing them with special markup text.
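A minimal preprocessing sketch in Python; the regular expressions and markup tokens below are illustrative assumptions and are not prescribed by the application:

    import re

    # Illustrative patterns; a real deployment would tune these expressions.
    PATTERNS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
        (re.compile(r"https?://\S+"), "<URL>"),
        (re.compile(r"\b1\d{10}\b"), "<PHONE>"),            # mainland-China style mobile number
        (re.compile(r"\b\d{17}[\dXx]\b"), "<ID_NUMBER>"),    # 18-digit identity card number
    ]

    def preprocess(text: str) -> str:
        """Normalize English case and replace matched spans with special markup text."""
        text = text.lower()
        for pattern, token in PATTERNS:
            text = pattern.sub(token, text)
        return text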
Step 204, the user portrait information and the information type of the user portrait information are extracted from the dialog information.
User portrait information is obtained by abstracting concrete pieces of information about a user into labels and using these labels to characterize the user, so that targeted services can be provided for the user. The user portrait information may include user behavior information, user attribute information, and the like. The user behavior information may specifically be browsing information of the user, purchase information of the user, preference information of the user, and the like. The user attribute information may specifically be the age, sex, height, weight, skin color, place of residence, and the like of the user.
The information type of the user portrait information refers to the type to which the user portrait information belongs. For example, the information type of the user portrait information "male" is "sex", the information type of the user portrait information "25" is "age", the information type of the user portrait information "180 cm" is "height", the information type of the user portrait information "viewed article A for 5 minutes" is "browsing information", and the information type of the user portrait information "bought product B" is "purchase information".
And step 206, determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position.
Since the user portrait information is extracted from the dialogue information, the computer device may determine, based on the place in the dialogue information from which the user portrait information was extracted, the position of the user portrait information in the dialogue information, and obtain the position information of that position.
In one embodiment, the computer device calculates the character-string length between the position and the beginning of the dialogue information; this character-string length indicates the position information of the user portrait information in the dialogue information. For example, for the dialogue information "May I ask how old your family members are, respectively? [user] 43, the two children are 3 and 8 years old", the user portrait information is "43", and the computer device determines the position information of the user portrait information in the dialogue information as offset: 20, i.e. the character-string length between the position of the user portrait information "43" and the beginning of the dialogue information is 20. As another example, for the dialogue information "May I ask how old you, your husband and the children are, respectively? [user] My husband is 38, I am 34, the older child is 5 and a half, and the younger one is less than a year old", the user portrait information is "34", and the computer device determines the position information of the user portrait information in the dialogue information as offset: 27, i.e. the character-string length between the position of the user portrait information "34" and the beginning of the dialogue information is 27.
In another embodiment, the computer device calculates the character-string length between the position and the end of the dialogue information; this character-string length indicates the position information of the user portrait information in the dialogue information. For example, for the dialogue information "May I ask how old your family members are, respectively? [user] 43, the two children are 3 and 8 years old", the user portrait information is "43", and the computer device determines the position information of the user portrait information in the dialogue information as offset: 10, i.e. the character-string length between the position of the user portrait information "43" and the end of the dialogue information is 10.
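A minimal sketch of such an offset calculation in Python; the helper name and the from_end convention are illustrative assumptions rather than details fixed by the application:

    def position_offset(dialogue: str, portrait_info: str, from_end: bool = False) -> int:
        """Character-string length between the extracted user portrait information and the
        beginning (default) or the end of the dialogue information."""
        start = dialogue.find(portrait_info)     # position at which the span was extracted
        if start < 0:
            raise ValueError("user portrait information not found in the dialogue information")
        if from_end:
            return len(dialogue) - (start + len(portrait_info))
        return start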
And step 208, determining the attribution object of the user portrait information based on the dialogue information, the position information and the information type, wherein the attribution object is a person or object to which the user portrait information belongs.
The attribution object refers to the person or thing to which the user portrait information belongs. For example, for the dialogue information "May I ask how old your family members are, respectively? [user] My wife is 43, and the two children are 3 and 8 years old", the attribution object of the user portrait information "43" is the wife, and the attribution objects of the user portrait information "3 years old" and "8 years old" are the two children. As another example, for the dialogue information "May I ask how heavy the refrigerator is? [user] 10 jin", the attribution object of the user portrait information "10 jin" is the refrigerator.
It is understood that, in the dialogue information between the user and the service object, the user often omits the subject. For example, for "May I ask how old your wife is? [user] 43", the user portrait information "43" is extracted, the information type is age, and the position information of the user portrait information is determined to be offset: 16; based on the dialogue information, the position information and the information type, the attribution object of the user portrait information can be determined to be the wife.
Specifically, the computer device inputs the dialogue information, the position information and the information type into a trained relation attribution model, and determines an attribution object of the user portrait information through the trained relation attribution model.
In this embodiment, the dialogue information between the user and the service object is acquired, and the user portrait information and the information type of the user portrait information are extracted from the dialogue information. The position of the user portrait information in the dialogue information can then be accurately determined based on the dialogue information and the user portrait information, and the position information of that position is obtained, so that the attribution object of the user portrait information can be accurately determined based on the dialogue information, the position information and the information type. This solves the problem that information cannot be extracted accurately when the user omits the subject in the dialogue information, and improves the accuracy of information extraction. In addition, according to characteristics such as interactivity and colloquial language in the user's chat, the information contained in the dialogue information can be mined more effectively at multiple levels.
In one embodiment, the dialog information includes current dialog information and associated dialog information; obtaining session information between a user and a service object, comprising: acquiring current conversation information between a user and a service object and historical conversation information corresponding to the current conversation information; and screening out associated dialogue information related to the current dialogue information from the historical dialogue information based on the current dialogue information.
The current dialog information refers to the current dialog information of the user and the service object. The historical dialog information refers to dialog information that occurred before the current dialog information. The associated dialog information refers to dialog information that is associated with the current dialog information in the history dialog information.
In one embodiment, the computer device obtains the current dialogue information in a current preset window, and obtains the historical dialogue information in each preset window before the current preset window. For example, the dialogue information in the current preset window is User_n, and the historical dialogue information in each preset window before the current preset window is User_n-1, User_n-2, ..., User_1.
In another embodiment, the computer device obtains the latest specified number of session information as current session information, and obtains information before the current session information as historical session information in chronological order. The specified number may be set as desired.
The computer device splices the current dialogue information and the associated dialogue information in chronological order to obtain the dialogue information, and extracts the user portrait information and the information type of the user portrait information from the dialogue information. Optionally, the chronological order may be from earliest to latest, or from latest to earliest.
It can be understood that the current dialogue information and the historical dialogue information may belong to the same round of dialogue between the user and the service object, so an association between the current dialogue information and part of the historical dialogue information necessarily exists; at the same time, the historical dialogue information may also contain noise information irrelevant to the current dialogue information. Therefore, based on the current dialogue information, the associated dialogue information related to the current dialogue information can be screened out from the historical dialogue information and the irrelevant noise information can be filtered away, so that the information of the user can be extracted more accurately and more quickly based on the current dialogue information and the associated dialogue information.
In one embodiment, as shown in fig. 3, screening the historical dialog information for associated dialog information related to the current dialog information based on the current dialog information includes:
step 302, performing vector conversion on the current dialogue information to obtain a current dialogue vector; and carrying out vector conversion on the historical dialogue information to obtain a historical dialogue vector.
The current dialog vector is a vector that characterizes current dialog information. A historical dialogue vector is a vector that characterizes historical dialogue information.
Specifically, the computer device uses a trained word vector model to perform word segmentation on the current dialogue information or the historical dialogue information to obtain words, performs vector conversion on the words to obtain word vectors, and then splices the word vectors to obtain the current dialogue vector or the historical dialogue vector. Both the current dialogue vector and the historical dialogue vector are sentence vectors. The word vector model may specifically be a Skip-Gram model.
The word vector model is trained as follows: each proper word is acquired and fused into a word segmentation dictionary, the training corpus is segmented by using the word segmentation dictionary, and the segmented training corpus is input into the word vector model for training to obtain the trained word vector model. The computer device can segment the training corpus with the word segmentation dictionary by using a forward maximum matching method.
The proper words can be proper nouns in various industries, such as insurance application, claim settlement, diseases, risk types, insurance names and the like in the insurance industry, and the proper nouns such as servers, switches and the like in the communication industry.
The corpus may include corpus of dialogs between other users and service objects, and may also include various documents or texts in a knowledge base, etc. For example, the corpus may include insurance-like documents for the insurance industry, text in an insurance-like repository, dialog corpuses between other users and insurance service objects, and the like.
The computer equipment trains the word vector model by adopting the special words, so that the trained word vector model can more quickly carry out vector conversion on the dialogue information comprising the special words, and the cost of manual labeling is reduced.
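A minimal training sketch in Python, assuming the gensim and jieba libraries and illustrative file names (none of these are prescribed by the application; the sentence vector is averaged here purely for brevity, whereas the description above splices the word vectors):

    import jieba
    from gensim.models import Word2Vec

    # Fuse domain-specific proper words into the word segmentation dictionary (illustrative file name).
    jieba.load_userdict("insurance_proper_words.txt")

    # Segment the training corpus (assumed here to contain one document or dialogue per line).
    with open("training_corpus.txt", encoding="utf-8") as f:
        sentences = [list(jieba.cut(line.strip())) for line in f if line.strip()]

    # Train a Skip-Gram word vector model (sg=1 selects Skip-Gram in gensim).
    w2v = Word2Vec(sentences, vector_size=128, window=5, min_count=2, sg=1, epochs=10)

    def sentence_vector(text: str):
        """Segment a piece of dialogue information and combine its word vectors into a sentence vector."""
        vectors = [w2v.wv[w] for w in jieba.cut(text) if w in w2v.wv]
        return sum(vectors) / len(vectors) if vectors else None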
And 304, splicing the current dialogue vector and the historical dialogue vector to obtain a spliced dialogue vector.
Optionally, the computer device may splice the current dialog vector after the historical dialog vector to obtain a spliced dialog vector, or may splice the historical dialog vector after the current dialog vector to obtain a spliced dialog vector. For example, the current dialog vector is a, the historical dialog vector is B, and the computer device splices the current dialog vector a and the historical dialog vector B to obtain a spliced dialog vector (a, B) or (B, a).
Step 306, determining semantic similarity between the current dialogue information and each sub-dialogue information in the historical dialogue information respectively based on the spliced dialogue vector, and screening out associated dialogue information related to the current dialogue information from the historical dialogue information based on the semantic similarity.
The computer device inputs the spliced dialogue vector into a trained context-related text pair matching network, determines, through the context-related text pair matching network, the semantic similarity between the current dialogue information and each piece of sub-dialogue information in the historical dialogue information, and screens out the associated dialogue information related to the current dialogue information from the historical dialogue information. The context-related text pair matching network is a network for screening out related text in a context, and may specifically be BiMPM (Bilateral Multi-Perspective Matching network).
Specifically, the context-related text pair matching network extracts the keyword feature information of the current dialogue information and the keyword feature information of each piece of sub-dialogue information in the historical dialogue information, and determines the semantic similarity between the current dialogue information and each piece of sub-dialogue information with the keyword feature information as the main feature.
The computer device may use the sub-dialog information corresponding to the semantic similarity higher than the preset similarity as the associated dialog information. The preset similarity can be set according to needs. For example, if the preset similarity is 50%, the semantic similarity between the current dialog information and the sub-dialog information a is 40%, the semantic similarity between the current dialog information and the sub-dialog information B is 56%, and the semantic similarity between the current dialog information and the sub-dialog information C is 80%, then the sub-dialog information B and the sub-dialog information C are both associated dialog information related to the current dialog information.
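A minimal screening sketch in Python, assuming the semantic similarities have already been produced by the matching network; the function signature and threshold are illustrative:

    def filter_associated(current: str, history: list[str], similarity, threshold: float = 0.5) -> list[str]:
        """Keep the sub-dialogue information whose semantic similarity to the current
        dialogue information exceeds the preset similarity; the rest is treated as noise."""
        return [sub for sub in history if similarity(current, sub) > threshold]

    # With the similarities from the example above (A: 0.40, B: 0.56, C: 0.80),
    # sub-dialogue information B and C are kept as the associated dialogue information.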
Fig. 4 is a block diagram of a BiMPM network in one embodiment. The BiMPM network comprises a word representation layer, a context representation layer, a matching layer, an aggregation layer and a prediction layer, the computer equipment inputs the spliced conversation vector into the BiMPM network, the spliced conversation vector is processed sequentially through the word representation layer, the context representation layer, the matching layer, the aggregation layer and the prediction layer, and relevant conversation information related to current conversation information is screened out from historical conversation information.
For the spliced dialogue vector obtained by splicing the current dialogue vector and the historical dialogue vector, word segmentation is performed respectively on the current dialogue information and on each piece of sub-dialogue information in the historical dialogue information. Specifically, in the word representation layer the current dialogue information is represented as words p1, p2, p3, ..., pM, and any one piece of sub-dialogue information is represented as words q1, q2, q3, ..., qN. The word representation layer segments the current dialogue information and the sub-dialogue information into words and inputs them into the context representation layer. The context representation layer extracts the word-order relation of any two adjacent words in the current dialogue information and the sub-dialogue information and represents it with word-order representation vectors; specifically, the word-order relation between any two adjacent words may be a word-order relation from the beginning of the sentence to the end of the sentence, or from the end of the sentence to the beginning of the sentence. The matching layer matches the word-order representation vector of the current dialogue information with the word-order representation vectors of any two adjacent words in the sub-dialogue information and outputs matching vectors, and likewise matches the word-order representation vectors of any two adjacent words in the current dialogue information with the word-order representation vector of the sub-dialogue information and outputs matching vectors. The matching layer inputs the matching vectors into the aggregation layer, and the aggregation layer aggregates the matching vectors according to the word-order relation to obtain aggregation vectors. The aggregation vectors are input into the prediction layer, and the prediction layer predicts the semantic similarity between the current dialogue information and the sub-dialogue information according to each aggregation vector.
In this embodiment, vector conversion is performed on the current dialogue information and the historical dialogue information respectively, and the resulting current dialogue vector and historical dialogue vector are spliced to obtain a spliced dialogue vector. Based on the spliced dialogue vector, the semantic similarity between the current dialogue information and each piece of sub-dialogue information in the historical dialogue information is determined, and based on these semantic similarities the associated dialogue information related to the current dialogue information can be accurately screened out from the historical dialogue information while the noise information irrelevant to the current dialogue information is filtered away. The user portrait information can then be extracted more accurately and more quickly from the current dialogue information and the associated dialogue information.
In one embodiment, the training of the context-dependent text to the matching network includes: acquiring a dialogue training text; the dialogue training texts comprise a current dialogue training text and a historical dialogue training text, wherein the historical dialogue training text comprises a positive training text related to the current dialogue training text and a negative training text unrelated to the current dialogue training text; inputting the current dialogue training text, the positive training text and the negative training text into the context-related text pair matching network, and training the context-related text pair matching network to obtain the trained context-related text pair matching network.
The dialog training text is a dialog text for training. The dialog training text may be historical dialog information of the user and the service object, which is the object of information extraction, or historical dialog information of other users and the service object, and is not limited to this.
The computer device can set a certain text in the dialogue training texts as the current dialogue training text, with the texts before it serving as the historical dialogue training texts; within the historical dialogue training texts, the texts related to the current dialogue training text are set as positive training texts and the texts unrelated to it as negative training texts. The current dialogue training text, the positive training texts and the negative training texts are input into the context-related text pair matching network, and machine learning can be adopted to train the network to obtain the trained context-related text pair matching network.
In this embodiment, the current dialog training text, the positive training text related to the current dialog training text, and the negative training text unrelated to the current dialog training text are input into the context-related text pair matching network, so that a more accurate context-related text pair matching network can be trained, and thus, more accurate associated dialog information related to the current dialog information can be screened in the subsequent information extraction process.
In one embodiment, extracting the user representation information from the dialog information, and the information type of the user representation information, includes: carrying out vector conversion on the dialogue information to obtain a dialogue vector; labeling the conversation vector to obtain labeling information; and decoding each piece of labeled information to extract the user portrait information and the information type of the user portrait information.
The annotation information may include one or more of B-LOC, I-LOC, B-Person, I-Person, B-Organization, I-Organization, O, and the like. B-LOC represents the beginning of the geographical location (location), I-LOC represents the middle of the geographical location, B-Person represents the beginning of the name of the Person, I-Person represents the middle of the name of the Person, B-Organization represents the beginning of the Organization, I-Organization represents the middle of the Organization, and O represents non-entity information.
In one embodiment, the computer device may perform vector conversion on the dialogue information by using the trained word vector model to obtain a dialogue vector, input the dialogue vector into a BiLSTM+CRF model, and label the dialogue vector through the BiLSTM+CRF model to obtain the labeling information. The BiLSTM+CRF model comprises a BiLSTM (Bidirectional Long Short-Term Memory) layer and a CRF (Conditional Random Field) layer. The BiLSTM layer can capture longer-distance dependency relationships in the dialogue vector well.
Specifically, the computer device inputs the dialogue vector into the BiLSTM+CRF model and labels the dialogue vector through the BiLSTM+CRF model to obtain the labeling information as follows: the computer device inputs the dialogue vector into the BiLSTM layer, obtains the dependency relationships in the dialogue vector through the BiLSTM layer, inputs the dialogue vector carrying the dependency relationships into the CRF layer, and labels the dialogue vector through the CRF layer to obtain the labeling information.
FIG. 5 is a block diagram of the BiLSTM+CRF model in one embodiment. The computer device inputs the dialogue vector corresponding to "big Chinese" into the BiLSTM+CRF model and labels the dialogue vector through the BiLSTM+CRF model. The dialogue vector can be represented by a one-hot vector.
The computer device then decodes the labeling information to extract the user portrait information and the information type of the user portrait information from the dialogue information. For example, if the labeling information is "O", "B-family member", "I-family member", "O", "B-family member", "I-family member", "B-age", "I-age", "O", decoding it allows the beginning part of a family member to be extracted from "B-family member" and the middle part from "I-family member", i.e. "B-family member" and "I-family member" together constitute the word content of a family member, whose information type is family member; similarly, the beginning part of an age is extracted from "B-age" and the middle part from "I-age", i.e. "B-age" and "I-age" together constitute the word content of an age, whose information type is age.
The computer device may input the dialogue vector into the BiLSTM layer; in other embodiments, the dialogue vector may instead be input into a Convolutional Neural Network (CNN) or into a Transformer model.
The computer device may input the dialogue vector carrying the dependency relationships into the CRF layer for labeling; in other embodiments, the dialogue vector carrying the dependency relationships may also be input into a BERT (Bidirectional Encoder Representations from Transformers) model for labeling.
In this embodiment, the dialogue information is subjected to vector conversion to obtain a dialogue vector, the dialogue vector is labeled to obtain labeling information, and each piece of labeling information is decoded, so that the user portrait information and the information type of the user portrait information can be accurately extracted.
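A minimal BiLSTM+CRF labeling sketch in PyTorch, assuming the third-party pytorch-crf package for the CRF layer; the class name, dimensions and tag set are illustrative assumptions, not the model fixed by the application:

    import torch
    import torch.nn as nn
    from torchcrf import CRF   # third-party package: pip install pytorch-crf

    class BiLstmCrfTagger(nn.Module):
        def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 128, hidden: int = 256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)                  # token ids -> dialogue vectors
            self.bilstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                                  batch_first=True)                         # captures long-distance dependencies
            self.to_tags = nn.Linear(hidden, num_tags)                      # emission score for each tag
            self.crf = CRF(num_tags, batch_first=True)                      # labels the whole sequence

        def loss(self, tokens, tags, mask):
            emissions = self.to_tags(self.bilstm(self.embed(tokens))[0])
            return -self.crf(emissions, tags, mask=mask)                    # negative log-likelihood

        def decode(self, tokens, mask):
            emissions = self.to_tags(self.bilstm(self.embed(tokens))[0])
            return self.crf.decode(emissions, mask=mask)                    # best tag sequence, e.g. B-age, I-age, O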
In one embodiment, determining a home object for user portrait information based on the dialog information, the location information, and the information type includes: respectively carrying out vector embedding processing on the dialogue information, the position information and the information type to obtain a dialogue vector, a position vector and a type vector; splicing the conversation vector, the position vector and the type vector to obtain an information extraction vector; and performing densification processing on the information extraction vector, performing logistic regression processing on the information extraction vector subjected to the densification processing to obtain the probability that the user portrait information belongs to each candidate object, and predicting the attribution object of the user portrait information based on the probability that the user portrait information belongs to each candidate object.
Vector embedding (Embed) refers to the process of converting text into vectors. The dialogue vector is a vector obtained by vector embedding of dialogue information. The type vector is a vector obtained by embedding the information type into the vector. The information extraction vector is obtained by splicing the conversation vector, the position vector and the type vector. The logistic regression process is a generalized linear regression analysis and is commonly used in the fields of data mining, automatic disease diagnosis, economic prediction and the like.
The position vector is a vector obtained by vector embedding the position information. The position vector is used for representing the relative position between the entity where the user portrait information is located and other entities in the dialogue information. An entity is text that has an actual meaning. For example, an entity can be "age," "Shenzhen," "we," "34," and so forth. Typically, a plurality of entities are included in the dialog information, and each dimension in the position vector may represent a relative position between the entity where the user portrait information is located and other entities. The relative position may be the number of character strings between the entity where the user portrait information is located and other entities.
The computer equipment compresses high-dimensional information extraction vectors into low-dimensional dense vectors through a word vector model, performs logistic regression processing on the low-dimensional dense vectors to obtain the probability that the user portrait information belongs to each candidate object, inputs the probability that the user portrait information belongs to each candidate object into a feed-forward network, and determines the attribution object of the user portrait information from each candidate object through the feed-forward network. The Word vector model may be a Word2Vec model. The value of the dense vector is a common Double array. The candidate may be any person or object in the dialog information. Further, the candidate object may also be any person or object in the sentence in which the user object information is located.
The dense vector is relative to a sparse vector obtained after one-hot processing, wherein the sparse vector means that the dimensionality of the whole vector is high, but most elements are 0; dense vectors are in contrast thereto, most elements are not 0. The word vector model projects the encoded words of the sparse vector into a low-dimensional space, obtaining a more dense vector than the sparse vector, i.e. a dense vector.
In one embodiment, the candidate object with the highest probability may be determined as the home object for the user portrait information through a feed forward network. In another embodiment, the candidate object corresponding to the second highest probability may be determined to be the home object for the user portrait information through a feed forward network. The specific manner in which the user representation information is attributed through the feed forward network is not limited.
In this embodiment, the dialogue information, the position information and the information type are respectively subjected to vector embedding processing to obtain a dialogue vector, a position vector and a type vector; the dialogue vector, the position vector and the type vector are spliced to obtain an information extraction vector; the information extraction vector is subjected to densification processing and then to logistic regression processing to obtain the probability that the user portrait information belongs to each candidate object, and based on these probabilities the attribution object of the user portrait information can be accurately predicted.
In one embodiment, splicing the dialogue vector, the location vector and the type vector to obtain an information extraction vector comprises: splicing the conversation vector and the position vector to obtain a conversation vector carrying position information; coding the dialogue vector carrying the position information to obtain a coding vector; and splicing the coding vector and the type vector to obtain an information extraction vector.
The encoding vector is a vector obtained by encoding a dialogue vector carrying position information.
Alternatively, the encoding vector may be spliced before the type vector or after the type vector, without being limited thereto. For example, if the encoding vector is C and the type vector is T, the computer device splices them to obtain an information extraction vector (C, T) or (T, C).
And the computer equipment splices the conversation vector and the position vector, and the spliced conversation vector comprises the position information corresponding to the position vector, so that the spliced conversation vector carries the position information.
The computer device encodes the dialogue vector carrying the position information by using a trained encoding model to obtain the encoding vector. The encoding model may be one of a Recurrent Neural Network (RNN) model, a Long Short-Term Memory (LSTM) model, a Gated Recurrent Unit (GRU) model, and the like.
In this embodiment, the dialogue vector and the position vector are spliced to obtain a dialogue vector carrying position information; coding the dialogue vector carrying the position information to obtain a coding vector; and splicing the coding vector and the type vector to obtain an information extraction vector, wherein the information extraction vector comprises position information, so that an attribution object of the user portrait information can be more accurately extracted.
FIG. 6 is a flow diagram that illustrates the steps for determining the attribution object of user portrait information based on the dialogue information, the position information and the information type in one embodiment. The computer device performs vector embedding processing on the dialogue information, the position information and the information type respectively to obtain a dialogue vector, a position vector and a type vector; splices (concatenates) the dialogue vector and the position vector to obtain a dialogue vector carrying position information; inputs the dialogue vector carrying position information into the encoding model and encodes it through the encoding model to obtain an encoding vector; splices (concatenates) the encoding vector and the type vector to obtain an information extraction vector; performs densification processing on the information extraction vector to obtain a dense vector; and performs logistic regression processing on the dense vector and predicts the attribution object of the user portrait information based on the result of the logistic regression processing.
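A minimal sketch of such a relation attribution flow in PyTorch; the layer sizes, the GRU encoder and the candidate-scoring head are illustrative assumptions rather than details fixed by the application:

    import torch
    import torch.nn as nn

    class RelationAttribution(nn.Module):
        def __init__(self, vocab_size: int, num_types: int, num_candidates: int,
                     emb_dim: int = 128, pos_dim: int = 16, hidden: int = 256, dense_dim: int = 64):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, emb_dim)       # dialogue vector (per token)
            self.pos_emb = nn.Embedding(512, pos_dim)              # position vector (relative offsets, capped at 512)
            self.type_emb = nn.Embedding(num_types, pos_dim)       # type vector
            self.encoder = nn.GRU(emb_dim + pos_dim, hidden,
                                  batch_first=True)                # encodes the dialogue vector carrying position information
            self.dense = nn.Linear(hidden + pos_dim, dense_dim)    # densification of the information extraction vector
            self.classify = nn.Linear(dense_dim, num_candidates)   # logistic-regression-style scoring of candidates

        def forward(self, tokens, positions, info_type):
            x = torch.cat([self.tok_emb(tokens), self.pos_emb(positions)], dim=-1)   # splice dialogue and position vectors
            _, h = self.encoder(x)                                                   # encoding vector
            feat = torch.cat([h[-1], self.type_emb(info_type)], dim=-1)              # splice with the type vector
            scores = self.classify(torch.relu(self.dense(feat)))
            return torch.softmax(scores, dim=-1)                                     # probability of each candidate attribution object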
In one embodiment, the method further comprises: and displaying the user portrait information, the information type and the attribution object in a user information display area of a terminal interface of the service object.
When the computer equipment is a terminal, the terminal interface of the service object comprises a user information display area, and user portrait information, information types and attribution objects can be displayed.
When the computer equipment is a server, the server sends the user portrait information, the information type and the attribution object to a terminal where the service object is located, and a user information display area is included in a terminal interface and can display the user portrait information, the information type and the attribution object.
The user information display area may be one of a right area, a left area, and an upper area of the terminal interface.
The computer equipment displays the user portrait information, the information type and the attribution object in a terminal interface of the service object, so that the service object can quickly acquire the user portrait information, the information type and the attribution object when serving a user, and service is better provided for the user.
The user portrait information, the information type and the attribution object are all structured information. Structured information means that the information is analyzed and then decomposed into a plurality of components which are associated with each other, and each component has a clear hierarchical structure.
In one embodiment, the method further comprises: and performing similarity matching on the processing object and each preset text, and determining a target object matched with the processing object from each preset text, wherein the processing object comprises any one of user portrait information, an information type and an attribution object, and the target object is any one of target user portrait information, a target information type and a target attribution object.
The preset text can be set as required.
In one embodiment, the computer device performs similarity matching between the processing object and each preset text by using a literal similarity function, and determines the target object matched with the processing object from the preset texts. Specifically, when the Jaccard distance or the edit distance between the processing object and a preset text is smaller than a preset distance, the processing object matches that preset text, and the preset text is taken as the target object. The Jaccard distance is an index for measuring the difference between two sets. The edit distance quantifies the difference between two character strings (e.g., English words) as the minimum number of editing operations required to change one string into the other.
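A minimal literal-similarity sketch in Python; the thresholds and the character-level treatment are illustrative assumptions:

    def jaccard_distance(a: str, b: str) -> float:
        """Jaccard distance between the character sets of two strings."""
        sa, sb = set(a), set(b)
        if not (sa | sb):
            return 0.0
        return 1.0 - len(sa & sb) / len(sa | sb)

    def edit_distance(a: str, b: str) -> int:
        """Classic dynamic-programming (Levenshtein) edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def match_target(processing_object: str, preset_texts: list[str],
                     max_jaccard: float = 0.5, max_edits: int = 2):
        """Return the first preset text whose Jaccard distance or edit distance to the
        processing object is smaller than the preset distance, or None if nothing matches."""
        for text in preset_texts:
            if (jaccard_distance(processing_object, text) < max_jaccard
                    or edit_distance(processing_object, text) < max_edits):
                return text
        return None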
In another embodiment, the computer device performs similarity matching between the processing object and each preset text by using a shallow semantic similarity function, and determines a target object matched with the processing object from each preset text.
For example, different pieces of user portrait information that all express "in good health" can be normalized by the computer device into the unified user portrait information "in good health".
In this embodiment, a literal similarity function is used to perform similarity matching between the processing object and each preset text, and the target object matched with the processing object is determined from the preset texts. In this way the user portrait information, the information types and the attribution objects can each be normalized, so that sentences or words with different expressions but the same meaning are unified into the same target object, and the target object can be used more quickly for subsequent data screening and use.
In one embodiment, the target user representation information, the target information type, and the target home object are presented in a user information presentation area of a terminal interface of the service object.
When the computer equipment is a terminal, a user information display area is included in a terminal interface of the service object, and target user portrait information, a target information type and a target attribution object can be displayed.
When the computer equipment is a server, the server sends the target user portrait information, the target information type and the target attribution object to a terminal where the service object is located, and a user information display area is included in a terminal interface and can display the target user portrait information, the target information type and the target attribution object.
The user information display area may be one of a right area, a left area, and an upper area of the terminal interface.
In another embodiment, when the computer device extracts the numerical class information from the user portrait information, the information type and the home object, the numerical class information is converted by using a regular rule to obtain target information, and the target information is displayed in a terminal interface of the service object. For example, if the user portrait information is born in 1994, the user portrait information is transformed using a regular rule to arrive at the age of 26.
Fig. 7 is a schematic flow chart of information extraction in another embodiment. The computer equipment provides an information extraction device, which comprises a conversation information acquisition module, a key information extraction module, a relation attribution module and a normalization module. The computer equipment acquires current dialogue information and historical dialogue information between a user and a service object, respectively carries out text preprocessing on the current dialogue information and the historical dialogue information, and respectively inputs the current dialogue information and the historical dialogue information after the text preprocessing into a word vector model after training to obtain a current dialogue vector and a historical dialogue vector. The computer equipment can adopt insurance documents and documents in a knowledge base to train the word vector model in advance to obtain the trained word vector model.
And the computer equipment splices the current dialogue vector and the historical dialogue vector to obtain a spliced dialogue vector, and screens out the associated dialogue information related to the current dialogue information from the historical dialogue information through the trained context-related text pair matching network.
The computer equipment inputs the current dialogue information and the related dialogue information into the encoder, and then inputs the data output by the encoder into the CRF layer, so as to extract the user portrait information and the information type of the user portrait information. The encoder can be a bidirectional long-short term memory network, dependency relationship between current conversation information and relevant conversation information can be extracted through the bidirectional long-short term memory network, the current conversation information carrying the dependency relationship and the relevant conversation information carrying the dependency relationship are input into a CRF layer, and user portrait information and information types of the user portrait information are extracted. The encoder may also be a Convolutional Neural Network (CNN) or transformer model.
The computer device determines the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, acquires the position information of the position, inputs the current dialogue information, the associated dialogue information, the position information and the information type into a relation extraction network, and outputs the attribution object of the user portrait information. The relation extraction network performs embedding, encoding, densification and prediction processing on the current dialogue information, the associated dialogue information, the position information and the information type in sequence, and determines the attribution object of the user portrait information.
The computer equipment carries out similarity matching on a processing object and each preset text, and determines a target object matched with the processing object from each preset text, wherein the processing object comprises any one of user portrait information, an information type and an attribution object, and the target object is any one of target user portrait information, a target information type and a target attribution object; and displaying the target user portrait information, the target information type and the target attribution object in a terminal interface of the service object.
In one embodiment, another information extraction method is provided, which is applied to a terminal of a service object, and includes the following steps: initiating an information extraction request to a server; acquiring user portrait information, an information type of the user portrait information and a home object of the user portrait information, which are sent by a server; wherein the attribution object is a person or thing to which the user portrait information belongs, the attribution object is determined by the server based on the dialogue information, the location information and the information type between the user and the service object, the location information is obtained based on the location of the user portrait information in the dialogue information, the location is determined based on the dialogue information and the user portrait information, the user portrait information and the information type are extracted from the dialogue information, and the dialogue information is obtained based on the information extraction request; and displaying the user portrait information, the information type and the attribution object in the user information display area.
The information extraction request may include session information, or may include attribute information of the session information, but is not limited thereto. Attribute information such as the number of pieces of dialog information, time range, etc.
In one embodiment, the terminal receives a selection operation on candidate information from the user, determines the selected candidate information as the dialogue information, generates an information extraction request including the dialogue information, and sends the information extraction request to the server. The server acquires the dialogue information from the information extraction request.
In another embodiment, the terminal sends an information extraction request including the attribute information to the server. The server acquires, from storage, the dialogue information corresponding to the attribute information based on the attribute information in the information extraction request.
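The two request variants and the server-side lookup could take roughly the following shape; all field names, the time-range filter and the storage layout are assumptions made for illustration only.

```python
def get_dialogue(request, stored_dialogue):
    """Server side: prefer dialogue carried in the request, else look it up by attributes."""
    if "dialogue_info" in request:
        return request["dialogue_info"]
    attrs = request["attribute_info"]
    # filter stored messages by the requested time range, then keep the requested count
    in_range = [m for m in stored_dialogue
                if attrs["time_range"][0] <= m["time"] <= attrs["time_range"][1]]
    return in_range[-attrs["message_count"]:]

stored = [{"time": "2020-12-22", "text": "My husband is 38, I am 34."},
          {"time": "2020-12-23", "text": "Everyone is in good health."}]
req = {"attribute_info": {"message_count": 1,
                          "time_range": ["2020-12-23", "2020-12-23"]}}
print(get_dialogue(req, stored))   # -> the last message within 2020-12-23
```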
Alternatively, the terminal may display the user portrait information, the information type and the attribution object arranged in a list in the user information display area, or may display them in a table form, but the display form is not limited thereto.
In this embodiment, the terminal initiates an information extraction request to the server; the server determines the dialogue information between the user and the service object based on the information extraction request, extracts the user portrait information and the information type of the user portrait information from the dialogue information, accurately determines the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information to obtain the position information of the position, and accurately determines the attribution object of the user portrait information based on the dialogue information, the position information and the information type. This solves the problem that information cannot be accurately extracted when the user omits the subject in the dialogue information, and improves the accuracy of information extraction. The server then sends the user portrait information, the information type and the attribution object to the terminal, and the terminal presents them in the user information presentation area.
In one embodiment, as shown in fig. 8, a visual interface of a terminal of a service object includes a conversation area 802 and a user information presentation area 804, where the visual interface may be an interface of a social product or an interface of a temporary work session. The dialogue information between the user and the service object is displayed in the conversation area of the visual interface, and the user portrait information, the information type and the attribution object extracted from the dialogue information are displayed in the user information presentation area, where the presentation form includes a dialogue form, a table form, a tag form and the like. In a specific implementation, the computer device obtains from the chat window the dialogue information "May I ask how old you and your husband are, and how old are the children?", "My husband is 38, I am 34, the elder child is five and a half, and the younger one is less than a year old", "Are you and your husband in good health, and how is the children's immunity?", "Everyone is fine", and extracts from it the user portrait information, the information type of each piece of user portrait information and its attribution object. For example, the user portrait information "61" has the information type age, and its attribution object is the elderly person; the user portrait information "children" has the information type family member, and its attribution object is the user; the user portrait information "34" has the information type age, and its attribution object is the user; the user portrait information "38" has the information type age, and its attribution object is the spouse; the user portrait information "five and a half years old" and "less than a year old" have the information type age, and their attribution object is the children; the user portrait information "everyone is fine" has the information type health status, and its attribution object is the children.
The computer equipment displays the extracted user portrait information, the information type of the user portrait information and the attribution object of the user portrait information in the interface, so that when serving one or more users the service object can quickly and directly learn each user's situation and requirements, provide services to the user more quickly, improve response efficiency, and give more targeted replies.
It should be understood that, although the steps in the flowcharts of fig. 2, 3, 6 and 7 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, these steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps in fig. 2, 3, 6 and 7 may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, an information extraction apparatus is provided, which may be implemented as all or part of a computer device by using a software module, a hardware module, or a combination of the two, and specifically includes: a session information acquisition module 902, a key information extraction module 904, and a relationship attribution module 906, wherein:
a session information obtaining module 902, configured to obtain session information between the user and the service object.
A key information extraction module 904 for extracting the user portrait information and the information type of the user portrait information from the dialog information; and determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position.
A relationship attribution module 906, configured to determine an attribution object of the user portrait information based on the session information, the location information, and the information type, where the attribution object is a person or thing to which the user portrait information belongs.
The information extraction device acquires the dialogue information between the user and the service object, and extracts the user portrait information and the information type of the user portrait information from the dialogue information, so that the position of the user portrait information in the dialogue information can be accurately determined based on the dialogue information and the user portrait information to obtain the position information of the position, and the attribution object of the user portrait information can be accurately determined based on the dialogue information, the position information and the information type. This solves the problem that information cannot be accurately extracted when the user omits the subject in the dialogue information, and improves the accuracy of information extraction.
In one embodiment, the dialog information includes current dialog information and associated dialog information; the session information obtaining module 902 is further configured to obtain current session information between the user and the service object, and historical session information corresponding to the current session information; and screening out associated dialogue information related to the current dialogue information from the historical dialogue information based on the current dialogue information.
In an embodiment, the session information obtaining module 902 is further configured to perform vector conversion on the current dialogue information to obtain a current dialogue vector; perform vector conversion on the historical dialogue information to obtain a historical dialogue vector; splice the current dialogue vector and the historical dialogue vector to obtain a spliced dialogue vector; determine, based on the spliced dialogue vector, semantic similarity between the current dialogue information and each piece of sub-dialogue information in the historical dialogue information; and screen out, based on the semantic similarity, associated dialogue information related to the current dialogue information from the historical dialogue information.
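As a rough sketch of the screening logic this module implements, assuming a bag-of-words vectorizer in place of the trained matching network's vector conversion, cosine similarity can rank the historical pieces and keep the most related ones:

```python
import numpy as np

def to_vector(text, vocab):
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def screen_associated(current, history, top_k=1):
    # build a shared vocabulary, vectorize, and rank history by cosine similarity
    words = sorted({w for t in [current] + history for w in t.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    cur = to_vector(current, vocab)
    def sim(text):
        vec = to_vector(text, vocab)
        return float(cur @ vec) / (np.linalg.norm(cur) * np.linalg.norm(vec) + 1e-8)
    return sorted(history, key=sim, reverse=True)[:top_k]

history = ["my husband is 38 and i am 34", "it will rain tomorrow"]
print(screen_associated("how old is your husband", history))
# -> ['my husband is 38 and i am 34']
```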
In one embodiment, the information extraction device further includes a training module configured to: acquire a dialogue training text, where the dialogue training text includes a current dialogue training text and a historical dialogue training text, and the historical dialogue training text includes a positive training text related to the current dialogue training text and a negative training text unrelated to the current dialogue training text; and input the current dialogue training text, the positive training text and the negative training text into the context-related text pair matching network, and train the context-related text pair matching network to obtain a trained context-related text pair matching network.
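A possible training loop for such a text pair matching network, assuming mean-pooled embeddings as the encoder and binary labels (1 for the positive text, 0 for the negative text); the architecture, sizes and synthetic data are illustrative assumptions, not the network described above.

```python
import torch
import torch.nn as nn

class PairMatcher(nn.Module):
    def __init__(self, vocab_size=10000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, current_ids, candidate_ids):
        cur = self.embed(current_ids).mean(dim=1)      # mean-pooled sentence vectors
        cand = self.embed(candidate_ids).mean(dim=1)
        return self.scorer(torch.cat([cur, cand], dim=-1)).squeeze(-1)

matcher = PairMatcher()
optimizer = torch.optim.Adam(matcher.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

current = torch.randint(0, 10000, (2, 16))    # current dialogue training text (repeated per pair)
positive = torch.randint(0, 10000, (1, 16))   # related historical text
negative = torch.randint(0, 10000, (1, 16))   # unrelated historical text
candidates = torch.cat([positive, negative])
labels = torch.tensor([1.0, 0.0])

logits = matcher(current, candidates)
loss = loss_fn(logits, labels)                # positive pair scored toward 1, negative toward 0
loss.backward()
optimizer.step()
print(float(loss))
```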
In an embodiment, the key information extracting module 904 is further configured to perform vector conversion on the dialogue information to obtain a dialogue vector; label the dialogue vector to obtain labeling information; and decode each piece of labeling information to extract the user portrait information and the information type of the user portrait information.
In one embodiment, the relationship attribution module 906 is further configured to perform vector embedding processing on the session information, the location information, and the information type, respectively, to obtain a session vector, a location vector, and a type vector; splicing the conversation vector, the position vector and the type vector to obtain an information extraction vector; and performing densification processing on the information extraction vector, performing logistic regression processing on the information extraction vector subjected to the densification processing to obtain the probability that the user portrait information belongs to each candidate object, and predicting the attribution object of the user portrait information based on the probability that the user portrait information belongs to each candidate object.
In one embodiment, the relationship attribution module 906 is further configured to splice the dialogue vector and the location vector to obtain a dialogue vector carrying location information; coding the dialogue vector carrying the position information to obtain a coding vector; and splicing the coding vector and the type vector to obtain an information extraction vector.
In one embodiment, the information extracting apparatus further includes a presentation module for presenting the user portrait information, the information type, and the home object in a user information presentation area of the terminal interface of the service object.
In an embodiment, the information extraction apparatus further includes a normalization module, configured to perform similarity matching between the processing object and each preset text, and determine a target object matching the processing object from each preset text, where the processing object includes any one of user portrait information, an information type, and a home object, and the target object is any one of target user portrait information, a target information type, and a target home object.
In this embodiment, the information extraction device includes a session information acquisition module, a key information extraction module, a relationship attribution module and a normalization module, so that a developer can extend the extracted information by modifying a single module without reworking the whole device, and the information extraction device can therefore be extended more efficiently.
In one embodiment, another information extraction apparatus is provided, which may be implemented as all or part of a computer device by using a software module, a hardware module, or a combination of the two, and specifically includes: a request initiating module, an information acquiring module and a display module, wherein:
and the request initiating module is used for initiating an information extraction request to the server.
The information acquisition module is used for acquiring the user portrait information sent by the server, the information type of the user portrait information and the attribution object of the user portrait information; the attribution object is a person or a thing to which user portrait information belongs, the attribution object is determined by the server based on conversation information between the user and the service object, position information and an information type, the position information is obtained based on a position where the user portrait information is located in the conversation information, the position is determined based on the conversation information and the user portrait information, the user portrait information and the information type are extracted from the conversation information, and the conversation information is obtained based on an information extraction request.
And the display module is used for displaying the user portrait information, the information type and the attribution object in the user information display area.
In the information extraction device, the terminal initiates an information extraction request to the server; the server determines the dialogue information between the user and the service object based on the information extraction request, extracts the user portrait information and the information type of the user portrait information from the dialogue information, accurately determines the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information to obtain the position information of the position, and accurately determines the attribution object of the user portrait information based on the dialogue information, the position information and the information type. This solves the problem that information cannot be accurately extracted when the user omits the subject in the dialogue information, and improves the accuracy of information extraction. The server then sends the user portrait information, the information type and the attribution object to the terminal, and the terminal presents them in the user information presentation area.
For specific limitations of the information extraction device, reference may be made to the above limitations of the information extraction method, which are not repeated here. Each module in the information extraction device may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an information extraction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope described in this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An information extraction method, characterized in that the method comprises:
acquiring dialogue information between a user and a service object;
extracting user portrait information and an information type of the user portrait information from the dialogue information;
determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position;
determining an attribution target of the user portrait information based on the dialogue information, the location information, and the information type, the attribution target being a person or thing to which the user portrait information belongs.
2. The method of claim 1, wherein the dialog information includes current dialog information and associated dialog information; the acquiring of the dialog information between the user and the service object includes:
acquiring current conversation information between a user and a service object and historical conversation information corresponding to the current conversation information;
and screening out associated dialogue information related to the current dialogue information from the historical dialogue information based on the current dialogue information.
3. The method of claim 2, wherein the screening the historical dialog information for associated dialog information that is related to the current dialog information based on the current dialog information comprises:
performing vector conversion on the current dialogue information to obtain a current dialogue vector; performing vector conversion on the historical dialogue information to obtain a historical dialogue vector;
splicing the current dialogue vector and the historical dialogue vector to obtain a spliced dialogue vector;
and determining semantic similarity between the current dialogue information and each sub-dialogue information in the historical dialogue information respectively based on the spliced dialogue vector, and screening out associated dialogue information related to the current dialogue information from the historical dialogue information based on the semantic similarity.
4. The method of claim 1, wherein said determining an attribution object of said user portrait information based on said dialogue information, said location information, and said information type comprises:
respectively carrying out vector embedding processing on the dialogue information, the position information and the information type to obtain a dialogue vector, a position vector and a type vector;
splicing the dialogue vector, the position vector and the type vector to obtain an information extraction vector;
and performing densification processing on the information extraction vector, performing logistic regression processing on the information extraction vector subjected to the densification processing to obtain the probability that the user portrait information belongs to each candidate object, and predicting the attribution object of the user portrait information based on the probability that the user portrait information belongs to each candidate object.
5. The method of claim 4, wherein said splicing the dialogue vector, the position vector and the type vector to obtain an information extraction vector comprises:
splicing the dialogue vector and the position vector to obtain a dialogue vector carrying position information;
coding the dialogue vector carrying the position information to obtain a coding vector;
and splicing the coding vector and the type vector to obtain an information extraction vector.
6. The method according to any one of claims 1 to 5, further comprising:
and performing similarity matching on a processing object and each preset text, and determining a target object matched with the processing object from each preset text, wherein the processing object comprises any one of the user portrait information, the information type and the attribution object, and the target object is any one of the target user portrait information, the target information type and the target attribution object.
7. An information extraction method, characterized in that the method comprises:
initiating an information extraction request to a server;
acquiring user portrait information sent by the server, the information type of the user portrait information and an attribution object of the user portrait information; wherein the attribution object is a person or thing to which the user portrait information belongs, the attribution object is determined by the server based on conversation information between a user and a service object, location information obtained based on a location in which the user portrait information is located in the conversation information, and the information type, the location being determined based on the conversation information and the user portrait information, the user portrait information and the information type being extracted from the conversation information, the conversation information being obtained based on the information extraction request;
and displaying the user portrait information, the information type and the attribution object in a user information display area.
8. An information extraction apparatus, characterized in that the apparatus comprises:
the conversation information acquisition module is used for acquiring conversation information between the user and the service object;
the key information extraction module is used for extracting user portrait information and the information type of the user portrait information from the dialogue information; determining the position of the user portrait information in the dialogue information based on the dialogue information and the user portrait information, and acquiring the position information of the position;
a relationship attribution module for determining an attribution object of the user portrait information based on the dialogue information, the location information and the information type, the attribution object being a person or thing to which the user portrait information belongs.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202011538124.7A 2020-12-23 2020-12-23 Information extraction method and device, computer equipment and storage medium Pending CN112668327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011538124.7A CN112668327A (en) 2020-12-23 2020-12-23 Information extraction method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011538124.7A CN112668327A (en) 2020-12-23 2020-12-23 Information extraction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112668327A true CN112668327A (en) 2021-04-16

Family

ID=75408082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011538124.7A Pending CN112668327A (en) 2020-12-23 2020-12-23 Information extraction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112668327A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077353A (en) * 2021-04-22 2021-07-06 北京十一贝科技有限公司 Method, apparatus, electronic device, and medium for generating underwriting conclusion
CN113077353B (en) * 2021-04-22 2024-02-02 北京十一贝科技有限公司 Method, device, electronic equipment and medium for generating nuclear insurance conclusion

Similar Documents

Publication Publication Date Title
CN106980683B (en) Blog text abstract generating method based on deep learning
CN111858944B (en) Entity aspect level emotion analysis method based on attention mechanism
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
WO2021042516A1 (en) Named-entity recognition method and device, and computer readable storage medium
CN112084334B (en) Label classification method and device for corpus, computer equipment and storage medium
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN112614559A (en) Medical record text processing method and device, computer equipment and storage medium
CN114139551A (en) Method and device for training intention recognition model and method and device for recognizing intention
WO2023029501A1 (en) Smart interrogation method and apparatus, electronic device, and storage medium
CN114416995A (en) Information recommendation method, device and equipment
CN112307168A (en) Artificial intelligence-based inquiry session processing method and device and computer equipment
CN116628186B (en) Text abstract generation method and system
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN112183030A (en) Event extraction method and device based on preset neural network, computer equipment and storage medium
CN114519356A (en) Target word detection method and device, electronic equipment and storage medium
CN111581972A (en) Method, device, equipment and medium for identifying corresponding relation between symptom and part in text
CN115394393A (en) Intelligent diagnosis and treatment data processing method and device, electronic equipment and storage medium
CN115525757A (en) Contract abstract generation method and device and contract key information extraction model training method
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN114613462A (en) Medical data processing method and device, electronic equipment and storage medium
Sathyendra et al. Helping users understand privacy notices with automated query answering functionality: An exploratory study
CN112668327A (en) Information extraction method and device, computer equipment and storage medium
CN116956925A (en) Electronic medical record named entity identification method and device, electronic equipment and storage medium
CN116956816A (en) Text processing method, model training method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination