WO2023065211A1 - Information acquisition method and device - Google Patents
Information acquisition method and device
- Publication number
- WO2023065211A1 (application PCT/CN2021/125260)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- node
- event
- knowledge graph
- personal knowledge
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
Definitions
- This application relates to the field of artificial intelligence, in particular to an information acquisition method and device.
- The embodiments of the present application provide an information acquisition method and device that combine a neural network with syntactic analysis to extract more accurate information from text input by a user, and that store the user's information in a personal knowledge graph to enable more efficient data retrieval.
- The present application provides an information acquisition method, including: acquiring input text of a target user, where the input text includes at least one word and the at least one word forms at least one event; obtaining an output sequence based on the input text, where the output sequence includes the type and elements of the at least one event; and obtaining a personal knowledge graph according to the output sequence. The personal knowledge graph includes multiple nodes, which include type nodes and element nodes. A type node represents the type of an event and an element node represents an element of an event. Within the same event, the type node corresponding to the type is associated with the element nodes corresponding to the elements. The personal knowledge graph is used to make recommendations for the target user.
- The types and elements of events generated by the target user are extracted event by event and used to construct a knowledge graph, so that the target user's events can be saved conveniently and accurately and knowledge related to the target user is recorded more accurately. When recommendations are later made for the target user, information can be queried in units of events, and complete events can be retrieved through the associations between nodes, which improves the accuracy of data queries and the effectiveness of recommendations.
- A personal knowledge graph is constructed for the user; it can be constructed or updated based on entities extracted from the input text.
- The personal knowledge graph constructed in the embodiments of the present application has a finer granularity, so it can record the user's information more accurately and describe the user more precisely.
- Retrieval can be performed efficiently through nodes, so that recommendations for the user can be made more efficiently.
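The node structure described above can be illustrated with a minimal sketch. It assumes a simple in-memory graph; the names `Node`, `PersonalKnowledgeGraph`, `add_event`, and `neighbors`, as well as the sample event, are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str   # "type" or "element" (hypothetical encoding)
    label: str


@dataclass
class PersonalKnowledgeGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # undirected edges stored as frozenset pairs

    def add_event(self, event_type, elements):
        """Add one event: a type node associated with each of its element nodes."""
        type_node = Node("type", event_type)
        self.nodes.add(type_node)
        for label in elements:
            element_node = Node("element", label)
            self.nodes.add(element_node)
            self.edges.add(frozenset((type_node, element_node)))
        return type_node

    def neighbors(self, node):
        """Return every node that shares an edge with the given node."""
        return {other for edge in self.edges if node in edge
                for other in edge if other != node}


pkg = PersonalKnowledgeGraph()
dining = pkg.add_event("dining", ["hot pot", "Friday"])
```

Looking up `pkg.neighbors(dining)` returns the element nodes of that event, which is one way a complete event could be recovered from the associations between nodes.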
- If the output sequence also includes an association relationship between elements of the at least one event, the element nodes in the personal knowledge graph that correspond to associated elements of the same event are associated with each other. For example, the type, elements, and association relationships of an event can be extracted from the input text, where the association relationships include relationships between types and/or elements. After the type nodes and element nodes are constructed, they can be connected according to the association relationships, so that complete events can be identified in the personal knowledge graph and recorded more completely.
- Alternatively, if the output sequence also includes an emotion category of the at least one event, the element nodes corresponding to the same event in the personal knowledge graph are associated through the emotion category. For example, the emotion category of an event can be extracted from the input text, and the nodes of the same event can be connected according to the emotion category, giving a complete record of the emotional event.
- Events can be recorded completely according to their type.
- For events of interest, element nodes can be connected according to the association relationships between elements.
- For emotional events, element nodes can be connected according to emotion categories.
- This gives strong generalization ability: more types of events can be recorded through the corresponding connection methods, so the approach adapts to more application scenarios.
- The output sequence may include the type and elements of a first event, where the first event is any one of the at least one event. Obtaining the personal knowledge graph according to the output sequence may include: if an initial knowledge graph includes information about the first event, updating the element nodes corresponding to the first event in the initial knowledge graph, or the association relationships between those element nodes, to obtain the personal knowledge graph; if the initial knowledge graph does not include information about the first event, adding the type node and element nodes of the first event to the initial knowledge graph and associating them to obtain the personal knowledge graph.
- Events in the personal knowledge graph can thus be updated or added, enriching the information that the personal knowledge graph contains.
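The update-or-add branch can be sketched as follows. This is only an illustration under a strong simplifying assumption: the graph is modeled as a dictionary keyed by event type, so an event counts as already present when its type is present. The function name `upsert_event` and the sample data are hypothetical.

```python
def upsert_event(graph, event_type, elements):
    """graph maps an event type to the set of its element labels (simplified PKG)."""
    if event_type in graph:
        # The event is already recorded: update its element nodes.
        graph[event_type] |= set(elements)
        return "updated"
    # The event is new: add the type node and associate its element nodes.
    graph[event_type] = set(elements)
    return "added"


initial_graph = {"dining": {"hot pot"}}
status_existing = upsert_event(initial_graph, "dining", ["hot pot", "Friday"])
status_new = upsert_event(initial_graph, "travel", ["Beijing"])
```

A real implementation would need its own rule for deciding whether two extracted events are the same event; the application does not specify one in this passage.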
- An initial sequence corresponding to the input text is obtained through a text processing model; the initial sequence includes a vector representation of the at least one word in the input text and a first category label corresponding to the at least one word. Syntactic analysis is performed on the input text to obtain a feature sequence, which includes a second category label corresponding to the at least one word. The initial sequence and the feature sequence are combined to obtain the output sequence, which includes the elements and type of the at least one event.
- Combining the neural network with syntactic analysis extracts more accurate information from the input text. That information is then used to generate or update the target user's personal knowledge graph, so that the graph reflects the user's characteristics more accurately and supports more accurate recommendations for the target user.
- Combining the initial sequence and the feature sequence to obtain the output sequence, and obtaining the personal knowledge graph, may include: correcting the initial sequence according to the feature sequence to obtain the output sequence; and obtaining the personal knowledge graph according to the output sequence.
- The feature sequence can be used to correct the initial sequence extracted by the neural network, so that information extracted from the input text in different ways is combined into more accurate information. The personal knowledge graph obtained from that information can then describe the target user more accurately.
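The correction step can be pictured with a small sketch. The application does not state the correction rule in this passage, so the rule used here is an assumption: where the two label sequences disagree and the syntactic label is not the "outside" label, the syntactic label replaces the model label. The label names `O`, `TYPE`, and `ELEMENT` and the function `correct_sequence` are hypothetical.

```python
def correct_sequence(initial_labels, feature_labels, outside="O"):
    """Correct model labels (first category) with syntactic labels (second category)."""
    if len(initial_labels) != len(feature_labels):
        raise ValueError("label sequences must align word by word")
    corrected = []
    for model_label, syntax_label in zip(initial_labels, feature_labels):
        # Assumed rule: trust the syntactic label when it disagrees and is informative.
        if model_label != syntax_label and syntax_label != outside:
            corrected.append(syntax_label)
        else:
            corrected.append(model_label)
    return corrected


words = ["I", "like", "hot", "pot"]
initial = ["O", "TYPE", "O", "ELEMENT"]        # labels from the text processing model
feature = ["O", "TYPE", "ELEMENT", "ELEMENT"]  # labels from syntactic analysis
output = correct_sequence(initial, feature)
```

Here the model missed "hot" as part of the element, and the syntactic label fills it in. Other merge policies, such as confidence-weighted voting, would fit the same description.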
- The method may further include: acquiring a first knowledge graph, where the first knowledge graph includes multiple nodes that contain information about at least one entity, and a node in the personal knowledge graph may represent an entity, or an element or type of an event; obtaining, from the first knowledge graph, associated information that is associated with nodes in the personal knowledge graph; and using the associated information to expand the personal knowledge graph, obtaining an expanded personal knowledge graph.
- The first knowledge graph can be used to expand the personal knowledge graph.
- The data in the first knowledge graph does not depend on the user's input data, so the expanded personal knowledge graph includes more information and more information can subsequently be found in it.
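A minimal sketch of the expansion step follows, assuming both graphs are adjacency dictionaries that map a node label to the set of labels associated with it. The function `expand_pkg` and the sample entries are hypothetical; the application does not prescribe this representation.

```python
def expand_pkg(pkg, first_graph):
    """Copy the PKG and attach information associated with its nodes in first_graph."""
    expanded = {node: set(neighbors) for node, neighbors in pkg.items()}
    for node in pkg:
        # Only nodes already present in the personal graph pull in associated information.
        for related in first_graph.get(node, set()):
            expanded[node].add(related)
            expanded.setdefault(related, set()).add(node)
    return expanded


pkg = {"hot pot": {"dining"}, "dining": {"hot pot"}}
first_graph = {"hot pot": {"Sichuan cuisine"}, "steel": {"iron"}}
expanded = expand_pkg(pkg, first_graph)
```

Entries of the first knowledge graph that are unrelated to any node in the personal graph (the "steel" entry here) are left out, so the expansion stays anchored to what the user has actually expressed.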
- Obtaining the initial sequence corresponding to the input text through the text processing model may include: taking the input text as the input of the text processing model and outputting the initial sequence, where the text processing model performs the following steps: performing natural language processing on the input text to obtain a feature vector sequence and an entity sequence, where the entity sequence includes a vector representation corresponding to each word of the at least one word and the feature vector sequence includes the feature vectors corresponding to the input text; obtaining the position information corresponding to the vectors in the entity sequence; fusing the position information and the feature vector sequence to obtain a fusion sequence; and classifying the entities corresponding to the fusion sequence to obtain a label sequence. The initial sequence includes the vector representation corresponding to each word and the label sequence.
- The neural network converts the text into vector representations and extracts the context of each word in the input text and the relationships between words, so that accurate information can be extracted from the input text.
- The method may further include: obtaining, from the personal knowledge graph, information of at least one node that matches the output sequence; and generating recommendation information for the target user according to the information of the at least one node, where the recommendation information is used to make recommendations for the target user.
- The embodiments of the present application can be applied to recommendation scenarios: with a finer-grained personal knowledge graph, information related to the user's input text can be retrieved efficiently and accurately, so that recommendations for the user are more efficient and accurate and the user experience improves.
- Obtaining the information of at least one node matching the output sequence from the personal knowledge graph may include: screening out, from the personal knowledge graph, information of at least one first node corresponding to the output sequence; and searching the personal knowledge graph for information of at least one second node associated with the at least one first node. The information of the at least one node includes the information of the at least one first node and the information of the at least one second node.
- The embodiments of the present application thus provide a specific way of querying data from the personal knowledge graph.
- The information of the first node and the information of the second node belong to different domains, so the embodiments of the present application can make cross-domain recommendations for users and improve the user experience.
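The two-stage query can be sketched as below, assuming the personal knowledge graph is an adjacency dictionary and that "matching" means an exact label match, which is a simplification. The function `retrieve` and the sample graph are hypothetical, and only directly associated (one-hop) nodes are treated as second nodes.

```python
def retrieve(pkg, output_labels):
    """Return (first nodes matching the output sequence, second nodes associated with them)."""
    first = [label for label in output_labels if label in pkg]
    second = []
    for label in first:
        # sorted() only makes the result order deterministic for this example.
        for neighbor in sorted(pkg[label]):
            if neighbor not in first and neighbor not in second:
                second.append(neighbor)
    return first, second


pkg = {
    "hot pot": {"dining", "spicy"},
    "dining": {"hot pot"},
    "spicy": {"hot pot", "Sichuan travel"},
    "Sichuan travel": {"spicy"},
}
first_nodes, second_nodes = retrieve(pkg, ["hot pot", "weather"])
```

Following associations further than one hop (for example from "spicy" to "Sichuan travel") is one way the cross-domain recommendations mentioned above could arise.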
- Each node in the personal knowledge graph has a corresponding weight, and the weight of each node is negatively correlated with its storage duration or update duration; this applies to any node in the personal knowledge graph. The storage duration is how long the information of the node has been saved, and the update duration is the time elapsed since the information in the node was last updated. The user's information can therefore be recorded with decaying weights, so as to model memory of the user's knowledge.
- Generating recommendation information for the target user based on the information of the at least one node includes: sorting the at least one node according to the weight corresponding to the at least one node; and generating the recommendation information according to the information of the at least one node and the ranking of the at least one node.
- The recommendations can be ordered by weight, so that more useful information is recommended to the user and the user experience improves.
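The weight decay and ranking can be illustrated with a short sketch. The text only requires the weight to be negatively correlated with the storage or update duration; the exponential half-life form used here is an assumption, and `decayed_weight`, `rank_nodes`, and the 30-day half-life are hypothetical.

```python
def decayed_weight(base_weight, elapsed_days, half_life_days=30.0):
    """Weight that halves every half_life_days since the node was saved or last updated."""
    return base_weight * 0.5 ** (elapsed_days / half_life_days)


def rank_nodes(nodes):
    """nodes: list of (label, base_weight, elapsed_days); highest decayed weight first."""
    scored = [(label, decayed_weight(weight, days)) for label, weight, days in nodes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_nodes([("hot pot", 1.0, 60), ("coffee", 1.0, 0), ("tea", 1.0, 30)])
```

With equal base weights, the most recently updated node ranks first, so recommendations favor what the user expressed most recently.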
- Acquiring the input text of the target user may include: acquiring input data of the user, where the input data includes at least one of image, text, or voice data; and extracting the input text from the input data.
- The method may further include: acquiring structured data of the target user, where the structured data is data in a preset format; extracting information about at least one event from the structured data according to preset rules; and updating the personal knowledge graph according to the information about the at least one event to obtain an updated personal knowledge graph.
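Rule-based extraction from structured data might look like the following sketch. The rule table, the field names, and the `calendar` source are all hypothetical; the application only says that the data has a preset format and that preset rules are applied.

```python
# Hypothetical preset rules: which field gives the event type and which give elements.
PRESET_RULES = {
    "calendar": {"type_field": "category", "element_fields": ["title", "location"]},
}


def extract_event(record, rules=PRESET_RULES):
    """Map one structured record to an event, or None if no rule applies."""
    rule = rules.get(record.get("source"))
    if rule is None:
        return None
    event_type = record.get(rule["type_field"])
    if event_type is None:
        return None
    # Skip element fields that are missing or empty.
    elements = [record[name] for name in rule["element_fields"] if record.get(name)]
    return {"type": event_type, "elements": elements}


record = {"source": "calendar", "category": "meeting",
          "title": "Design review", "location": ""}
event = extract_event(record)
```

Because the format is known in advance, no neural model or syntactic analysis is needed for this path; the extracted type and elements can be written to the graph directly.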
- The present application also provides a graphical user interface (GUI), where the graphical user interface is stored in an electronic device, the electronic device includes a display screen, a memory, and one or more processors, and the one or more processors are configured to execute one or more computer programs stored in the memory. The graphical user interface includes:
- The personal knowledge graph is used to make recommendations for the target user.
- The GUI may further include: displaying a permission request, where the permission request asks whether the target user's input text may be used to obtain the personal knowledge graph.
- The user's input information can be collected through applications (APPs) installed on the user's smart terminal, and the display interface can then show, for each APP, whether its input data may be collected as a knowledge source for the personal knowledge graph, thereby improving the privacy and security of user data.
- The GUI may further include: displaying an expanded personal knowledge graph in response to associated information, associated with nodes in the personal knowledge graph, being acquired from the first knowledge graph and used to expand the personal knowledge graph. The first knowledge graph includes multiple nodes that contain information about at least one entity, and a node in the personal knowledge graph can represent a type of entity, or an element or type of an event.
- The GUI may further include: displaying the first knowledge graph.
- The GUI may further include: displaying recommendation information in response to the recommendation information being generated for the target user according to the information of at least one node acquired from the personal knowledge graph, where the recommendation information is used to make recommendations for the target user.
- Each node in the personal knowledge graph has a corresponding weight, and the at least one node is sorted according to the corresponding weights. The GUI may further include: displaying the recommendation information in response to the recommendation information being generated according to the information of the at least one node and the ranking of the at least one node.
- The GUI may further include: displaying input text in response to the target user's input operation on a first input interface, where the input text is extracted from the input data of the target user, and the input data includes at least one of image, text, or voice data.
- The GUI may further include: in response to the user's input operation on a second input interface, updating the personal knowledge graph according to the acquired structured data and displaying the updated personal knowledge graph, where the structured data is data in a preset format.
- An information acquisition device, including:
- an input module, configured to obtain input text of a target user, where the input text includes at least one word and the at least one word forms at least one event;
- a text processing module, configured to obtain an output sequence based on the input text, where the output sequence includes the type and elements of the at least one event; and
- an acquisition module, configured to obtain a personal knowledge graph according to the output sequence. The personal knowledge graph includes multiple nodes, which include type nodes and element nodes; a type node represents the type of the at least one event, an element node represents an element of the at least one event, the type node and element nodes of the same event are associated, and the personal knowledge graph is used to make recommendations for the target user.
- If the output sequence also includes an association relationship between elements of the at least one event, the element nodes in the personal knowledge graph that correspond to associated elements of the same event are associated; if the output sequence also includes an emotion category, the element nodes corresponding to the same event in the personal knowledge graph are associated through the emotion category.
- The output sequence may include the type and elements of a first event, where the first event is any one of the at least one event, and the acquisition module is specifically configured to: if an initial knowledge graph includes information about the first event, update the element nodes corresponding to the first event in the initial knowledge graph and the relationships between those element nodes to obtain the personal knowledge graph; if the initial knowledge graph does not include information about the first event, add the type node and element nodes of the first event to the initial knowledge graph and associate them to obtain the personal knowledge graph.
- The text processing module is specifically configured to: obtain an initial sequence corresponding to the input text through a text processing model, where the initial sequence includes a vector representation of the at least one word in the input text and a first category label corresponding to the at least one word; perform syntactic analysis on the input text to obtain a feature sequence, where the feature sequence includes a second category label corresponding to the at least one word; and combine the initial sequence and the feature sequence to obtain the output sequence, where the output sequence includes the elements and type of the at least one event.
- The text processing module is specifically configured to: correct the portion of the initial sequence that does not match the feature sequence to obtain the output sequence.
- The text processing module is further configured to: if words in the feature sequence correspond to multiple second category labels, determine a unique second category label for each word to obtain an updated feature sequence.
- The text processing module is specifically configured to: obtain the initial sequence from the input text through a text processing model, where the text processing model performs the following steps: performing natural language processing on the input text to obtain a feature vector sequence and an entity sequence, where the entity sequence includes a vector representation corresponding to each word of the at least one word and the feature vector sequence includes the feature vectors corresponding to the input text; obtaining the position information corresponding to the vectors in the entity sequence; fusing the position information and the feature vector sequence to obtain a fusion sequence; and classifying the entities corresponding to the fusion sequence to obtain a label sequence. The initial sequence includes the vector representation corresponding to each word and the label sequence.
- The device further includes an expansion module, configured to: acquire a first knowledge graph, where the first knowledge graph includes multiple nodes that contain information about at least one type of entity, and a node in the personal knowledge graph can represent an entity, or an element or type of an event; obtain, from the first knowledge graph, associated information that is associated with nodes in the personal knowledge graph; and use the associated information to expand the personal knowledge graph, obtaining an expanded personal knowledge graph.
- The device further includes a recommendation module, configured to: acquire, from the personal knowledge graph, information of at least one node matching the output sequence; and generate recommendation information for the target user according to the information of the at least one node, where the recommendation information is used to make recommendations for the target user.
- The recommendation module is specifically configured to: screen out, from the personal knowledge graph, information of at least one first node corresponding to the output sequence; and search the personal knowledge graph for information of at least one second node associated with the at least one first node, where the information of the at least one node includes the information of the at least one first node and the information of the at least one second node.
- The information of the first node and the information of the second node belong to different domains.
- Each node in the personal knowledge graph has a corresponding weight, and the weight of each node is negatively correlated with its storage duration or update duration. The storage duration is how long the information of the node has been saved, and the update duration is the time elapsed since the information in the node was last updated.
- The recommendation module is specifically configured to: rank the at least one node according to the weight corresponding to the at least one node; and generate the recommendation information according to the information of the at least one node and the ranking of the at least one node.
- The input module is specifically configured to: acquire input data of the user, where the input data includes at least one of image, text, or voice data; and extract the input text from the input data.
- The input module is further configured to obtain structured data of the target user, where the structured data is data in a preset format.
- The acquisition module is further configured to extract information about at least one event from the structured data according to preset rules.
- The acquisition module is further configured to update the personal knowledge graph according to the information about the at least one event to obtain an updated personal knowledge graph.
- An embodiment of the present application provides an information acquisition device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to execute the processing-related functions of the information acquisition method according to any implementation of the first aspect.
- An embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to execute the processing-related functions of the information acquisition method according to any implementation of the first aspect.
- An embodiment of the present application provides an information acquisition device, which can also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface and executes them, and the processing unit is configured to perform the processing-related functions in the first aspect or any optional implementation of the first aspect.
- An embodiment of the present application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to execute the method in the first aspect or any optional implementation of the first aspect.
- An embodiment of the present application provides a computer program product including instructions that, when run on a computer, cause the computer to execute the method in the first aspect or any optional implementation of the first aspect.
- FIG. 1 is a schematic diagram of the main framework of artificial intelligence applied in the present application.
- FIG. 2 is a schematic diagram of a system architecture provided by the present application.
- FIG. 3 is a schematic diagram of a convolutional neural network structure provided by an embodiment of the present application.
- FIG. 4 is a schematic flow diagram of an information acquisition method provided by the present application.
- FIG. 5 is a schematic flow chart of another information acquisition method provided by the present application.
- FIG. 6 is a schematic flowchart of another information acquisition method provided by the present application.
- FIG. 7 is a schematic flow diagram of a neural network execution provided by the present application.
- FIG. 8 is a schematic flowchart of another information acquisition method provided by the present application.
- FIG. 9 is a schematic diagram of an event record provided by the present application.
- FIG. 10 is a schematic flow chart of updating PKG provided by the present application.
- FIG. 11 is a schematic flow chart of setting weights for nodes provided by the present application.
- FIG. 12 is a schematic flow diagram of a PKG expansion provided by the present application.
- FIG. 13 is a schematic diagram of an application scenario of the information acquisition method provided by the present application.
- FIG. 14 is a schematic flowchart of a recommendation rule of the information acquisition method provided by the present application.
- FIG. 15 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 16 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 17 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 18 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 19 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 20 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 21 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 22 is a schematic diagram of another application scenario of the information acquisition method provided by this application.
- FIG. 23 is a schematic diagram of the structure of the method for deploying information acquisition in the terminal provided by the present application.
- FIG. 24 is a schematic flowchart of another information acquisition method provided by the present application.
- FIG. 25 is a schematic structural diagram of a PKG provided by the present application.
- FIG. 26 is a schematic structural diagram of an information acquisition device provided by the present application.
- FIG. 27 is a schematic structural diagram of another information acquisition device provided by the present application.
- FIG. 28 is a schematic structural diagram of an electronic device provided by the present application.
- FIG. 29 is a schematic diagram of a chip structure provided by the present application.
- Artificial intelligence (AI): a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- In other words, artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that respond in ways similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
- Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
- Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
- The artificial intelligence main framework above is described below in two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
- The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data-information-knowledge-wisdom".
- The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provided and processed by technology) to the industrial ecology of the system.
- The infrastructure provides computing power support for the artificial intelligence system, communicates with the outside world, and is supported by the basic platform.
- The basic platform includes related platform guarantees and support such as a distributed computing framework and network, and can include cloud storage and computing, interconnection networks, etc.
- sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
- The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
- The data involves graphics, images, voice, text, and IoT data of traditional equipment, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
- Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to a reasoning control strategy. Its typical functions are search and matching.
- Decision-making refers to the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, sorting, and prediction.
- Based on the results of data processing, some general capabilities can be formed, such as an algorithm or a general system, for example translation, text analysis, computer vision processing, speech recognition, and image recognition.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They package the overall artificial intelligence solution, commercialize intelligent information decision-making, and put it into practical use. Application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, etc.
- the embodiment of the present application involves the application of neural network and natural language processing (NLP).
- Corpus: also known as free text, which can be a word, phrase, sentence, fragment, article, or any combination thereof. For example, "the weather is really nice today" is a corpus.
- Entity: an object that exists in the corpus. For example, the corpus "Xiao Ming went out for a walk with the dog" includes the entities "Xiao Ming" and "dog". Each entity has one or more corresponding categories; for example, the category label of "Xiao Ming" is "person", and the category label of "dog" is "animal".
- Self-attention model: a model that encodes a sequence of data (such as the natural-language corpus "your mobile phone is very good.") into several multi-dimensional vectors that are convenient for numerical operations. The vectors carry the mutual similarity information of the elements in the sequence, and this similarity is called self-attention.
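- As a rough illustration of "mutual similarity between elements", the following minimal sketch computes scaled dot-product self-attention over a toy sequence. It assumes identity query/key/value projections and made-up two-dimensional embeddings; it is not the model used in the present application.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (an assumption).

    Each output vector is a similarity-weighted average of all input vectors.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Similarity of this element to every element of the same sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, vectors))
                        for d in range(dim)])
    return outputs, weights

# Toy 2-dimensional embeddings for a three-token sequence (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded, attention = self_attention(tokens)
```

- Each row of `attention` sums to 1 and records how strongly one element attends to every other element of the sequence.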
- Loss function: also called the cost function, a measure of the difference between the predicted output of a machine learning model for a sample and the real value of the sample (also called the supervision value).
- Loss functions generally include mean squared error, cross entropy, logarithmic, and exponential loss functions.
- For example, the mean squared error can be used as the loss function, defined as $\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2$, where $y_n$ is the real value of the $n$-th sample, $\hat{y}_n$ is the predicted output, and $N$ is the number of samples. A specific loss function can be selected according to the actual application scenario.
- Stochastic gradient: the number of samples in machine learning is very large, so the loss function is calculated each time from randomly sampled data, and the corresponding gradient is called the stochastic gradient.
- Backpropagation (BP): an algorithm that calculates the gradient of the model parameters based on the loss function and updates the model parameters.
- During training, the neural network can use the error backpropagation algorithm to correct the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
- The backpropagation algorithm is a backward pass driven by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
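- The forward pass, error loss, and parameter update described above can be sketched on the smallest possible case. The example assumes a single linear layer y = w * x + b trained with a mean squared error loss, so the backward pass reduces to one application of the chain rule; the data, learning rate, and function name are illustrative only.

```python
def train_linear(samples, learning_rate=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # initial parameters
    n = len(samples)
    history = []
    for _ in range(epochs):
        # Forward pass: prediction errors and the resulting loss.
        errors = [(w * x + b) - y for x, y in samples]
        history.append(sum(e * e for e in errors) / n)
        # Backward pass: gradients of the loss with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = 2.0 * sum(errors) / n
        # Update step: move the parameters against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1
w, b, history = train_linear(data)
```

- The recorded `history` shrinks toward zero as the parameters approach w = 2 and b = 1, which is the convergence of the error loss mentioned above.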
- Neural machine translation: a typical task in natural language processing. Given a sentence in the source language, the task is to output the corresponding sentence in the target language.
- The words in the source-language and target-language sentences are encoded into vector representations, and the relationships between words and between sentences are calculated in the vector space to perform the translation task.
- Pre-trained language model (PLM): a natural language sequence encoder that encodes each word in a natural language sequence into a vector representation to perform prediction tasks.
- The training of a PLM includes two phases: the pre-training phase and the fine-tuning phase.
- In the pre-training phase, the model is trained on a language model task over large-scale unlabeled text, thereby learning word representations.
- In the fine-tuning phase, the model is initialized with the parameters learned in the pre-training phase and is trained for fewer steps on downstream tasks such as text classification or sequence labeling, so that the semantic information obtained by pre-training is transferred to the downstream tasks.
- Embedding: the feature representation of a sample.
- BiLSTM+CRF: a neural-network-based named entity recognition model that operates on character-level and word-level embeddings. BiLSTM and CRF are two different layers in the named entity recognition model.
- Sigmoid multi-label classification model: a model in which the label of a sample is not limited to one category but can span multiple categories, and the categories may be related. For example, a piece of clothing has attributes such as long sleeves and lace. These two attribute tags are not mutually exclusive, but related.
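- To make the difference from single-label classification concrete, the sketch below applies an independent sigmoid to each label score and keeps every label above a threshold. The label names, logits, and the 0.5 threshold are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability passes the threshold."""
    probs = {name: sigmoid(z) for name, z in zip(label_names, logits)}
    chosen = [name for name, p in probs.items() if p >= threshold]
    return chosen, probs

labels = ["long sleeves", "lace", "hooded"]
logits = [2.0, 0.7, -1.5]  # made-up scores for one garment
chosen, probs = predict_labels(logits, labels)
```

- Because each label is scored separately, the probabilities need not sum to 1, and "long sleeves" and "lace" can both be selected for the same sample.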
- Schema: a data format used to constrain the data to be added to the knowledge graph. It is equivalent to a data model for a given field, including the meaningful concept types in the field and the attributes of these types. Its main role is to standardize the expression of structured data: a piece of data must conform to the entity objects and types predefined in the Schema before it is allowed to be updated into the knowledge graph.
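- A minimal sketch of this gatekeeping role follows. The schema contents, the record layout, and the helper names `conforms` and `add_to_graph` are hypothetical; the source does not specify a concrete schema format.

```python
# Hypothetical schema: entity type -> required attributes and their Python types.
SCHEMA = {
    "Person": {"name": str, "age": int},
    "Fruit": {"name": str},
}

def conforms(record, schema=SCHEMA):
    """Check that a record's type is defined and its attributes match the schema."""
    attrs = schema.get(record.get("type"))
    if attrs is None:
        return False
    props = record.get("attributes", {})
    return set(props) == set(attrs) and all(
        isinstance(props[key], expected) for key, expected in attrs.items()
    )

def add_to_graph(graph, record):
    """Append the record only when it satisfies the schema; report the outcome."""
    if conforms(record):
        graph.append(record)
        return True
    return False

graph = []
accepted = add_to_graph(graph, {"type": "Person", "attributes": {"name": "Xiao Ming", "age": 30}})
rejected = add_to_graph(graph, {"type": "Vehicle", "attributes": {"name": "car"}})
```

- The second record is refused because "Vehicle" is not a type defined in the assumed schema, so only conforming data reaches the graph.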
- Elasticsearch: a distributed, highly scalable, near-real-time search and data analysis engine that makes large amounts of data convenient to search, analyze, and explore. Taking full advantage of its horizontal scalability can make data more valuable in a production environment.
- The implementation principle of Elasticsearch is mainly divided into the following steps. First, the user submits data to the Elasticsearch database. The word segmentation controller then segments the corresponding sentences, and the weights and word segmentation results are stored together with the data. When the user searches for data, the results are ranked and scored according to the weights, and the ranked results are returned to the user.
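- The segment, store, and rank steps can be illustrated with a toy inverted index. This is a conceptual sketch only, not Elasticsearch's implementation or API: segmentation is plain whitespace splitting and the weight is a raw term count, both of which are assumptions.

```python
from collections import defaultdict

class TinySearchIndex:
    """Toy inverted index: term -> {document id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def index(self, doc_id, text):
        # "Word segmentation" here is plain whitespace splitting (an assumption).
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        # Score each document by the summed weight of the query terms it contains.
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id, weight in self.postings.get(term, {}).items():
                scores[doc_id] += weight
        # Highest score first; ties broken by document id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = TinySearchIndex()
index.index("d1", "apple pie recipe with apple")
index.index("d2", "apple phone review")
index.index("d3", "weather report")
results = index.search("apple recipe")
```

- Documents that contain more of the query terms, or contain them more often, are ranked first; documents with no matching term are not returned.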
- Transformers library: provides models for natural language understanding (NLU) or natural language generation (NLG), such as BERT (bidirectional encoder representations from transformers), GPT-2, RoBERTa, XLM, DistilBERT, XLNet, and CTRL, with multiple pre-trained models and support for multiple languages.
- the natural language processing method provided in the embodiment of the present application can be executed on a server, and can also be executed on a terminal device.
- The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), a self-driving vehicle, etc., which is not limited in the embodiments of the present application.
- the embodiment of the present application provides a system architecture 200 .
- the system architecture includes a database 230 and a client device 240 .
- the data collection device 260 is used to collect data and store it in the database 230 , and the training module 202 generates the target model/rule 201 based on the data maintained in the database 230 .
- the following will describe in more detail how the training module 202 obtains the target model/rule 201 based on the data.
- the target model/rule 201 is the neural network mentioned in the following embodiments of this application. For details, refer to the relevant descriptions in the following Figures 4A-12 .
- the calculation module may include a training module 202, and the target model/rule obtained by the training module 202 may be applied to different systems or devices.
- The execution device 210 is configured with a transceiver 212, which can be a wireless transceiver, an optical transceiver, or a wired interface (such as an I/O interface), to exchange data with external devices. A user can input data to the transceiver 212 through the client device 240. For example, the client device 240 may send a target task to the execution device 210, request the execution device to train the neural network, and send the training database to the execution device 210.
- the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
- The calculation module 211 uses the target model/rule 201 to process the input data. Specifically, the calculation module 211 is used to: obtain the input text of the target user, where the input text includes at least one word and the at least one word forms at least one event; obtain an output sequence based on the input text, where the output sequence includes the type and elements of the at least one event; and acquire a personal knowledge graph according to the output sequence. The personal knowledge graph includes multiple nodes, including type nodes and element nodes: a type node represents the type of an event, and an element node represents an element of an event. Within the same event, the type node corresponding to the type is associated with the element nodes corresponding to the elements. The personal knowledge graph is used to make recommendations for the target user.
- the transceiver 212 returns the constructed neural network to the client device 240, so as to deploy the neural network in the client device 240 or other devices.
- the training module 202 can obtain corresponding target models/rules 201 based on different data for different tasks, so as to provide users with better results.
- the data input into the execution device 210 can be determined according to the user's input data, for example, the user can operate in the interface provided by the transceiver 212 .
- the client device 240 can automatically input data to the transceiver 212 and obtain a result. If the client device 240 needs to obtain authorization from the user for automatically inputting data, the user can set corresponding permissions in the client device 240 .
- the user can view the results output by the execution device 210 on the client device 240, and the specific presentation form may be specific ways such as display, sound, and action.
- the client device 240 may also serve as a data collection terminal and store the collected data associated with the target task into the database 230 .
- the training or updating process mentioned in this application can be performed by the training module 202 .
- the training process of the neural network is to learn the way to control the space transformation, and more specifically, to learn the weight matrix.
- The purpose of training the neural network is to make its output as close as possible to the expected value. Therefore, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the expected value (before the first update, the weight vectors are usually initialized, that is, parameters are pre-configured for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weights in the weight matrix are adjusted to reduce the predicted value, and the adjustment continues until the value output by the neural network is close or equal to the expected value.
- the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference.
- the training of the neural network can be understood as the process of reducing the loss as much as possible.
- The target model/rule 201 is obtained through training by the training module 202.
- The target model/rule 201 may be the self-attention model in the present application, and the self-attention model may include networks such as deep convolutional neural networks (DCNN) and recurrent neural networks (RNN).
- The neural network mentioned in this application can be of various types, such as a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a residual network, or other neural networks.
- the database 230 may be used to store a sample set for training.
- The execution device 210 generates a target model/rule 201 for processing samples, and iteratively trains it using the sample set in the database to obtain a mature target model/rule 201, which is specifically expressed as a neural network.
- the neural network obtained by the execution device 210 can be applied to different systems or devices.
- the execution device 210 may call data, codes, etc. in the data storage system 250 , and may also store data, instructions, etc. in the data storage system 250 .
- the data storage system 250 may be placed in the execution device 210 , or the data storage system 250 may be an external memory relative to the execution device 210 .
- the calculation module 211 can process the samples acquired by the execution device 210 through the neural network to obtain a prediction result, and the specific form of the prediction result is related to the function of the neural network.
- FIG. 2 is only an exemplary schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the data storage system 250 is an external memory relative to the execution device 210 . In other scenarios, the data storage system 250 may also be placed in the execution device 210 .
- The target model/rule 201 trained by the training module 202 can be applied to different systems or devices, such as mobile phones, tablet computers, notebook computers, augmented reality (AR)/virtual reality (VR) devices, and vehicle terminals, and may also be applied to a server or a cloud device.
- In the embodiments of the present application, the target model/rule 201 may be the self-attention model of the present application. Specifically, the self-attention model may include networks such as a CNN, deep convolutional neural networks (DCNN), and recurrent neural networks (RNN).
- The execution device 210 is implemented by one or more servers and optionally cooperates with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 can be arranged at one physical site or distributed across multiple physical sites. The execution device 210 can use the data in the data storage system 250, or call the program code in the data storage system 250, to implement the steps of the information acquisition method corresponding to FIGS. 4-25 of this application below.
- Each local device can represent any computing device, such as a personal computer, computer workstation, smartphone or other type of cellular phone, tablet, smart camera, smart car, media consumption device, wearable device, set-top box, or game console.
- Each user's local device can interact with the execution device 210 through a communication network of any communication mechanism or communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
- the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, and the like.
- The wireless network includes but is not limited to any one or a combination of: a fifth-generation mobile communication technology (5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM) network, a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long range (LoRa) wireless communication, and near field communication (NFC).
- the wired network may include an optical fiber communication network or a network composed of coaxial cables.
- one or more aspects of the execution device 210 may be implemented by each local device, for example, the local device 301 may provide the execution device 210 with local data or feedback calculation results.
- The execution device 210 may also be implemented by a local device. For example, the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for the users of the local device 302.
- the characteristics of users can be represented by user portraits.
- User portraits can be divided into basic portraits and preference portraits.
- Basic portraits can generate labels from actual basic facts, such as registration time, channel source, and user location, through simple information extraction. Labels can also be generated by a machine learning model that predicts user attributes such as gender, age, and car ownership: a labeled data set (user characteristics and labels) is used to train a reasonably accurate model, and the trained model can then make score predictions for other users whose gender and age are unknown.
- Preference portraits depend on item tags. Usually, the degree of user preference for items is calculated through user exposure, clicks, purchases, and other behaviors on platform items.
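- One simple way to turn such behaviors into a preference degree is a weighted count per item tag, sketched below. The behavior weights and the normalization are assumptions for illustration; the source does not specify how the preference degree is calculated.

```python
# Hypothetical behavior weights; a real system would tune or learn these.
BEHAVIOR_WEIGHTS = {"exposure": 0.1, "click": 1.0, "purchase": 5.0}

def preference_scores(events):
    """Aggregate (item_tag, behavior) events into normalized per-tag preference scores."""
    raw = {}
    for tag, behavior in events:
        raw[tag] = raw.get(tag, 0.0) + BEHAVIOR_WEIGHTS[behavior]
    total = sum(raw.values())
    # Normalize so that the scores of one user sum to 1.
    return {tag: score / total for tag, score in raw.items()} if total else {}

events = [("fruit", "exposure"), ("fruit", "click"), ("fruit", "purchase"),
          ("phone", "exposure"), ("phone", "click")]
scores = preference_scores(events)
```

- Under these assumed weights a purchase counts far more than an exposure, so the "fruit" tag dominates this user's preference portrait.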
- This application provides an information acquisition method that combines a neural network and symbolic analysis to extract user information and build a personal knowledge graph of the user, so that more accurate and detailed user information is stored in a finer-grained personal knowledge graph.
- The method provided by the present application may specifically include the following. The input text of the target user is obtained, where the input text includes at least one word and the at least one word forms at least one event. An output sequence is then obtained based on the input text, where the output sequence includes the type and elements of each of the at least one event. The output sequence can be obtained in several ways: for example, the type and elements of the events in the input text can be determined through syntactic analysis, or output by a neural network. A personal knowledge graph is then obtained according to the output sequence. The personal knowledge graph includes multiple nodes, including type nodes and element nodes: a type node represents the type of an event, and an element node represents an element of an event. Within the same event, the type node corresponding to the type is associated with the element nodes corresponding to the elements. The personal knowledge graph is used to make recommendations for the target user.
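- The node structure described above can be sketched as a small in-memory graph. The class name, the layout of the output sequence, and the role names ("buyer", "item", "time") are hypothetical; the sketch only shows type nodes associated with the element nodes of the same event.

```python
from collections import defaultdict

class PersonalKnowledgeGraph:
    """Minimal graph with type nodes, element nodes, and association edges."""

    def __init__(self):
        self.nodes = {}                # node id -> node attributes
        self.edges = defaultdict(set)  # node id -> associated node ids

    def add_event(self, event_id, event_type, elements):
        # One type node per event; the event id keeps separate events distinct.
        type_node = f"{event_id}:type"
        self.nodes[type_node] = {"kind": "type", "value": event_type}
        for role, value in elements.items():
            element_node = f"{event_id}:{role}"
            self.nodes[element_node] = {"kind": "element", "role": role, "value": value}
            # Associate the type node with each element node of the same event.
            self.edges[type_node].add(element_node)
            self.edges[element_node].add(type_node)

    def events_of_type(self, event_type):
        """Return the elements of every stored event with the given type."""
        found = []
        for node_id, node in self.nodes.items():
            if node["kind"] == "type" and node["value"] == event_type:
                found.append({self.nodes[n]["role"]: self.nodes[n]["value"]
                              for n in self.edges[node_id]})
        return found

# Hypothetical output sequence for the input text "Xiaohong is buying an apple".
output_sequence = [{"type": "purchase",
                    "elements": {"buyer": "Xiaohong", "item": "apple", "time": "now"}}]
pkg = PersonalKnowledgeGraph()
for i, event in enumerate(output_sequence):
    pkg.add_event(f"event{i}", event["type"], event["elements"])
```

- Querying by event type then follows the associations to recover the complete event, for example `pkg.events_of_type("purchase")`.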
- In the embodiments of the present application, the types and elements of the events generated by the target user are extracted in units of events, and a knowledge graph is constructed from them, so that each event of the target user can be saved more conveniently and accurately. When recommendations are later made for the target user, information can be queried in units of events, and complete events can be retrieved through the associations between nodes, which improves the accuracy of data queries and the effectiveness of recommendations.
- Referring to FIG. 4, a schematic flowchart of an information acquisition method provided by the present application is described below.
- the input text may be obtained according to data input by the target user.
- the input data of the target user may be acquired, and then the input text is extracted from the input data.
- There are many ways to obtain the input data of the target user. Specifically, the data input by the user through the terminal interface may be obtained, the data input by the user may be received from other devices, or the historical input data of the user may be queried from historical data.
- For example, one or more types of data input by the user, such as image, voice, or text, may be received, and the input data may then be recognized to extract the input text. If the data input by the user is an image, the image can be recognized and the text extracted from it. If the data input by the user is voice, speech recognition can be performed to extract the text from the voice data. If the data input by the user is text, the text can be used directly as the input text, or it can be translated and the translated text used as the input text. The method provided by this application can therefore be applied to various input methods and more scenarios, and has high generalization ability.
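- The handling of the three input types can be sketched as a small dispatch function. The recognizers are placeholders to be supplied by the caller, not real library calls, and applying the optional translation step to every input type is an assumption of this sketch.

```python
def recognize_image_text(image_bytes):
    # Placeholder for an image text recognition step; hypothetical.
    raise NotImplementedError("plug in an OCR engine here")

def recognize_speech(audio_bytes):
    # Placeholder for a speech recognition step; hypothetical.
    raise NotImplementedError("plug in a speech recognizer here")

def to_input_text(data, kind, translate=None,
                  ocr=recognize_image_text, asr=recognize_speech):
    """Normalize image, voice, or text input into the input text."""
    if kind == "image":
        text = ocr(data)
    elif kind == "voice":
        text = asr(data)
    elif kind == "text":
        text = data
    else:
        raise ValueError(f"unsupported input kind: {kind}")
    # Optional translation step, applied only when a translator is supplied.
    return translate(text) if translate else text

text = to_input_text("Xiaohong is buying an apple", "text")
# A stub recognizer stands in for real speech recognition in this example.
from_voice = to_input_text(b"...", "voice", asr=lambda audio: "hello")
```

- Passing the recognizers as parameters keeps the later extraction steps independent of how the input text was produced.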
- the text processing model is used to extract information from the input text, and output the extracted information in the form of vector to obtain the initial sequence.
- the text processing model can be used to extract entities and classification labels corresponding to the entities from the input text to obtain an initial sequence.
- the initial sequence may include information about entities extracted from the input text, classification labels corresponding to entities, or associations between entities, and the like.
- The input text may contain one or more entities. When there are multiple entities, they may form one or more events. The text processing model can extract the vector representation of each entity in the input text, as well as the contextual meaning of each entity or the relationships between the entities.
- Specifically, the initial sequence may include an entity sequence and a label sequence. The steps performed by the text processing model may include: performing natural language processing on the input text to obtain a feature vector sequence and an entity sequence, where the entity sequence includes the vector representation corresponding to each of the at least one word and the feature vector sequence includes the feature vector corresponding to the input text; obtaining the position information corresponding to the vectors in the entity sequence; fusing the position information and the feature vector sequence to obtain a fusion sequence; and classifying the entities corresponding to the fusion sequence to obtain the label sequence. Therefore, in the embodiments of the present application, the entities in the input text and the meanings they represent can be extracted through the neural network, so that information can be extracted from the input text efficiently and quickly.
- the text processing model may include one or more models for extracting information from text.
- For example, the text processing model can include pre-trained language models, such as a pre-trained BERT or a self-attention model, which convert text into vector representations, and can also include models such as BiLSTM+CRF or Sigmoid models that further process the vector representations, so that usable information can be extracted from the text.
- The feature sequence can include the entities in the input text, the relationships between the entities, etc.
- the input text can be "Xiaohong is buying an apple"
- the entities "Xiaohong" and "apple" can be extracted from the input text through syntactic analysis, the relationship between the entities is "buy", and the time is "now"; the actual meaning (or category) represented by each entity can be further determined, for example "Xiaohong" denotes a person and "apple" denotes a fruit or a mobile phone.
- the information of the entities included in the input text can also be obtained by analyzing the syntax of the input text. In this way, the information obtained by the two methods can be combined to obtain more accurate information, so that more accurate information can be extracted from the input text.
- step 402 may be executed first, step 403 may be executed first, or step 402 and step 403 may be executed at the same time; the order may be adjusted according to the actual application scenario and is not limited in this application.
- each entity may correspond to one or more features, and additional information can be added, in a preset format, to the features corresponding to each word in order to identify the unique meaning represented by each word or entity, yielding the updated feature sequence. For example, if the entity is "apple", additional information can specify whether the entity is a fruit or a mobile phone, such as adding "mobile phone" to the feature sequence to indicate that "apple" is a kind of mobile phone, which allows the unique meaning represented by each entity to be determined more accurately.
- if the personal knowledge graph already exists, the preset format and the initial personal knowledge graph can be combined to query the qualifying features corresponding to the entity. For example, if the input text is "Xiaohong is eating an apple", the preset grammar format can be combined with a query of the personal knowledge graph to determine that "apple" denotes a fruit rather than a device, so that "fruit" is added to the entity "apple" as an additional feature.
- the initial knowledge graph can be updated or a personal knowledge graph can be generated according to the feature sequence and the output sequence.
- the personal knowledge map may include one or more nodes, and each node may include information extracted from the data input by the target user, such as each node may include information such as event types or event elements extracted from the input text, Nodes with associations are connected to each other.
- the personal knowledge graph can be used to represent the characteristics of the target user, or can be used to record information related to the target user, such as information about the target user or information input by the target user.
- the personal knowledge graph can include multiple nodes, which can be divided into type nodes and element nodes.
- Type nodes are used to represent the type of events
- element nodes are used to represent elements of events
- the types and elements of the events generated by the target user are accurately extracted in units of events and a knowledge graph is constructed, so that each event of the target user can be saved more conveniently and accurately and the knowledge relevant to the target user can be recorded more accurately. When recommendations are later made for the target user, accurate information can therefore be queried in units of events, and complete events can be queried through the association relationships between nodes, which improves the accuracy of data queries and the effectiveness of recommendations.
- the neural network and syntactic analysis are combined to extract more accurate information from the input text, and this information is then used to generate or update the personal knowledge graph of the target user, so that the personal knowledge graph reflects the characteristics of the user more accurately and can be used to make more accurate recommendations for the target user in the future.
- a personal knowledge graph is constructed for the user, and it can be constructed or updated based on entities extracted from the input text. Compared with a user portrait, the personal knowledge graph constructed in the embodiment of the present application has a finer granularity, which records the user's information more accurately and improves the accuracy of the description of the user. Because the information is organized as nodes, it can also be retrieved more efficiently, so that recommendations can be made for the user more efficiently.
- the specific manner of obtaining the personal knowledge map may include: correcting the initial sequence according to the feature sequence to obtain an output sequence; and obtaining the personal knowledge map according to the output sequence.
- the information included in the feature sequence and the initial sequence can be matched. If the feature sequence does not match the initial sequence, the unmatched part in the initial sequence can be corrected, for example by replacing it with the corresponding part in the feature sequence, or by fusing it with the corresponding part in the feature sequence and replacing it with the fused part, to obtain the output sequence.
- the initial sequence can thus be corrected by using the feature sequence, so that the information extracted from the input text in various ways is combined to obtain more accurate information, and this information is used to obtain a personal knowledge graph that describes the target user more accurately.
- the output sequence includes an association relationship between at least one word, the at least one word forms at least one event, and the at least one word includes elements in the at least one event.
- personal knowledge graphs can be constructed in units of events. Specifically, the type of at least one event can be obtained from the output sequence, such as the schedule event class and the attention event class; then the information of each event can be obtained from the corrected entity sequence according to the type of each event in the at least one event; Then use the information of each event to update the initial knowledge graph to obtain a personal knowledge graph.
- the personal knowledge graph can be generated or updated in units of events, so when querying information in the personal knowledge graph later, the required information can be quickly queried in units of events, improving query efficiency.
- the specific way to obtain the personal knowledge graph may include, taking the first event as an example: if the initial knowledge graph includes the information of the first event, the output sequence is used to update the information of the first event in the initial knowledge graph, such as adding element nodes to the first event and connecting element nodes that have an association relationship, to obtain the personal knowledge graph; if the initial knowledge graph does not include the information of the first event, the information of the first event included in the output sequence is added to the initial knowledge graph, such as adding the type node and element nodes of the first event, connecting the element nodes to the type node, and connecting element nodes that have an association relationship, to obtain the personal knowledge graph.
- the elements of each event, and the association relationships between those elements, can be obtained from the entity sequence, and the element nodes can then be connected according to the association relationships; alternatively, the features of each event and the corresponding emotion categories can be obtained from the entity sequence. It can be understood that if the output sequence includes the relationships between the elements of each event, the element nodes corresponding to the elements of the same event in the personal knowledge graph are associated; if the output sequence also includes emotion categories, the element nodes corresponding to the same event in the personal knowledge graph are associated through the emotion categories.
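The event-level update described above can be sketched as follows. This is only a minimal illustration, not the implementation of the application: the class name `PersonalKnowledgeGraph`, the method `upsert_event`, and the representation of nodes as `(event_id, role)` tuples are all hypothetical choices made for the example.

```python
class PersonalKnowledgeGraph:
    """Toy event-organised graph: one type node per event plus element nodes."""

    def __init__(self):
        self.event_types = {}  # event_id -> label of the type node
        self.elements = {}     # event_id -> {role: value} element nodes
        self.edges = set()     # undirected edges stored as frozensets of node keys

    def upsert_event(self, event_id, event_type, elements, relations=()):
        """Add a new event or update an existing one; return True if it was new."""
        is_new = event_id not in self.event_types
        if is_new:
            # Event not yet in the graph: create its type node.
            self.event_types[event_id] = event_type
            self.elements[event_id] = {}
        for role, value in elements.items():
            # Add (or overwrite) the element node and connect it to the type node.
            self.elements[event_id][role] = value
            self.edges.add(frozenset({(event_id, "type"), (event_id, role)}))
        for role_a, role_b in relations:
            # Connect element nodes that have an association relationship.
            self.edges.add(frozenset({(event_id, role_a), (event_id, role_b)}))
        return is_new
```

Calling `upsert_event` twice with the same `event_id` adds the new element nodes to the existing event instead of creating a second event, which mirrors the two branches described in the text.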
- the first knowledge graph may also be used to expand the target user's personal knowledge graph.
- the first knowledge graph is obtained, where the first knowledge graph includes a plurality of nodes, each node has at least one associated node, and a node in the first knowledge graph may represent an entity or may represent an event
- the associated information associated with the nodes in the personal knowledge graph can be obtained from the first knowledge graph;
- the personal knowledge graph is expanded by using the associated information, and the expanded personal knowledge graph is obtained.
- for example, a node identical to an entity of the personal knowledge graph can be queried in the first knowledge graph, the information of the nodes associated with that node can then be found in the first knowledge graph, and this information can be used to expand the personal knowledge graph.
- the first knowledge graph may be a general knowledge graph or a knowledge graph of other users, so that the content included in the personal knowledge graph of the target user may be expanded through various graphs.
- each node in the general knowledge graph can represent an entity; when the first knowledge graph includes other users' personal knowledge graphs, each node in the first knowledge graph can represent an element or type of an event, etc.
- the first knowledge graph can be used to expand the personal knowledge graph, so that more information can be included in the personal knowledge graph, so that more information can be queried in the personal knowledge graph later.
- information of at least one node matching the output sequence may be queried from the personal knowledge graph, recommendation information is then generated for the target user according to the information of the at least one node, and recommendations are made based on the recommendation information.
- the information of at least one first node corresponding to the output sequence can be screened out from the personal knowledge graph, and the information of at least one second node associated with the at least one first node can be found in the personal knowledge graph; the information of the at least one node includes the information of the at least one first node and the information of the at least one second node.
- the information of the third node associated with the second node can also be found, or the information of the fourth node associated with the third node can also be found, and the specific query input can be adjusted according to the actual application scenario. This is not limited.
- the first node and the second node may include information in different fields, meaning that the entities included in the first node and the second node belong to different fields; for example, the first node includes music-related information and the second node includes information about a TV series related to that music.
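The lookup of first nodes, their associated second nodes, and optionally third or fourth nodes amounts to a bounded breadth-first traversal. The sketch below assumes the graph is available as an adjacency dictionary; the function name `query_related_nodes`, the `max_hops` parameter, and the sample node names are hypothetical.

```python
from collections import deque


def query_related_nodes(graph, matched, max_hops=1):
    """Return {node: hop distance} for matched nodes and neighbours within max_hops.

    graph   -- dict mapping a node to an iterable of associated nodes
    matched -- "first nodes" that matched the output sequence (distance 0)
    """
    distance = {node: 0 for node in matched if node in graph}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the requested number of hops
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical graph: a song linked to a TV series, which is linked to an actor.
sample_graph = {
    "song_a": ["tv_series_b"],
    "tv_series_b": ["song_a", "actor_c"],
    "actor_c": ["tv_series_b"],
}
```

With `max_hops=1` only the second nodes are returned alongside the first nodes; raising `max_hops` to 2 or 3 also reaches the third and fourth nodes mentioned in the text.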
- the user can be represented by a graph, so that the information of the nodes related to the user's input text can be queried efficiently through the association relationships between nodes.
- each node in the personal knowledge graph has a corresponding weight, where the weight of any node (called the fifth node for ease of distinction) is negatively correlated with its storage duration or update duration. The storage duration is how long the information of the fifth node has been saved, and the update duration is the time elapsed since the information of the fifth node was last updated; that is, the longer the storage duration or update duration of the fifth node, the smaller its weight. Therefore, in the embodiment of the present application, the user's information can be recorded with decaying weights, so as to realize the memory of the user's knowledge.
- the recommendation information may be generated with reference to the weight of each node. Specifically, the at least one node may be sorted according to the weight corresponding to the at least one node, and the recommendation information may be generated according to the information of the at least one node and the ranking of the at least one node.
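Sorting the queried nodes by weight before generating recommendation information could look like the following sketch. The dictionary keys `info` and `weight`, the function name `rank_nodes`, and the `top_k` cut-off are assumptions introduced for the example.

```python
def rank_nodes(nodes, top_k=3):
    """Sort node records by descending weight and return the info of the top_k nodes."""
    ranked = sorted(nodes, key=lambda node: node["weight"], reverse=True)
    return [node["info"] for node in ranked[:top_k]]


# Hypothetical query result: each record carries the node's info and its weight.
candidates = [
    {"info": "song_a", "weight": 0.42},
    {"info": "tv_series_b", "weight": 0.87},
    {"info": "actor_c", "weight": 0.10},
]
```

The returned list can then be passed to whatever component turns node information into the recommendation shown to the user.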
- the structured data of the target user can also be obtained, where the structured data is data in a preset format; the information of at least one event is extracted from the structured data according to preset rules; and the personal knowledge graph is updated according to the information of the at least one event to obtain the updated personal knowledge graph.
- the method provided in this application may be deployed in a terminal or a cloud server.
- when deployed in a cloud server, services can be provided to users through the cloud platform. Therefore, in the embodiment of this application, events are used as the organizational structure, and different types of entities are used to represent and store the user's different behaviors and information, so as to construct a personal knowledge graph that conforms to the user's own usage characteristics. Recommendations are then made by combining the obtained recommendation type, intent type, and node weight.
- the personal knowledge graph (PKG) uses events as a bridge to connect different types of entities, which allows paths to be designed more flexibly. Recommendation is not affected even without a large amount of user behavior data or user logs, which addresses the cold start problem encountered when using user portraits.
- the information acquisition method provided by this application can be divided into multiple parts, specifically including: information extraction 501, PKG construction 502, outputting PKG 503, and recommendation 504 based on the PKG.
- in the step of information extraction 501, accurate information can be extracted from the user's input data; the information can then be used to construct a PKG, and based on the PKG an appropriate entity can be recommended to the user.
- graphical user interface (GUI)
- the graphical user interface is stored in an electronic device, and the electronic device includes a display screen, a memory, one or a plurality of processors for executing one or more computer programs stored in the memory, the graphical user interface may include:
- the personal knowledge graph includes a plurality of nodes, the plurality of nodes include type nodes and element nodes, a type node is used to represent the type of the at least one event, an element node is used to represent an element of the at least one event, the type node and the element nodes corresponding to the same event are associated, and the personal knowledge graph is used for making recommendations for the target user.
- the GUI may further include: displaying a permission request, where the permission request is used to indicate whether to use the target user's input text to acquire the personal knowledge graph.
- the user's input information can be collected through the applications (application, APP) installed in the user's smart terminal, and the display interface can then ask whether the input data in each APP is allowed to be collected as a knowledge source for the personal knowledge graph, thereby improving the privacy and security of user data.
- the GUI may further include: displaying a first knowledge graph, where the first knowledge graph includes multiple nodes, the multiple nodes include information about at least one entity, and a node in the first knowledge graph may represent an entity or may represent an element or type of an event; and, in response to associated information associated with a node in the personal knowledge graph being obtained from the first knowledge graph and used to expand the personal knowledge graph, displaying the expanded personal knowledge graph.
- the GUI may further include: in response to recommendation information being generated for the target user according to the information of at least one node acquired from the personal knowledge graph, displaying the recommendation information, where the recommendation information is used to make recommendations for the target user.
- each node in the personal knowledge graph includes a corresponding weight
- the at least one node is sorted according to the corresponding weight
- the GUI may further include: in response to the recommendation information being generated according to the information of the at least one node and the sorting of the at least one node, displaying the recommendation information.
- the GUI may further include: displaying input text in response to the target user's input operation on the first input interface, where the input text is extracted from the input data of the target user, and the input The data includes at least one of image, text or voice data.
- the GUI may further include: in response to the user's input operation on the second input interface, updating the personal knowledge graph according to the acquired structured data and displaying the updated personal knowledge graph, where the structured data is data in a preset format.
- the flow of information extraction may be as shown in FIG. 6 .
- the process of information extraction may include various methods, such as extracting information through a neural network and extracting information through syntax analysis as shown in FIG. 6 .
- the input text is obtained.
- the input text may include data entered by the user in chat, search, and comment, and may also be text recognized from data such as images, voices, or videos.
- the information can be extracted from the input text through the neural network and through syntactic analysis, which are introduced separately below.
- the neural network can be trained, so that the neural network can extract entity information and the relationship between entities from the input text.
- the neural network can be trained using priors such as the user's daily chat or labeled data, and then the neural network can be used to identify the sentence category of the input text or the context information of each word in the text.
- the pre-trained language model BERT can be used to perform feature extraction on the input text. The output of BERT is divided into tokens (the feature vector sequence extracted from the text word by word) and CLS (a vector containing the features of the entire sentence of the input text). The tokens sequence is then fed into the BiLSTM+CRF model for the sequence labeling task; the entity position information extracted by the sequence labeling task is converted into a feature vector and added to the CLS feature vector, and the result is input into the sigmoid model for multi-label classification. The final output sequence includes an entity sequence and a classification label sequence corresponding to the entities.
- the entity sequence includes entity location information, and the label sequence can include the category corresponding to each entity.
- the input text is grammatically analyzed, that is, the grammatical function of each word in the input text is analyzed to obtain the feature sequence corresponding to the input text. For example, in the input text "I like you", "I" is the subject, "like" is the verb, and "you" is the object.
- a preset schema, that is, a preset syntax format
- its part of speech may be a person's name, weather or item name, etc.
- by combining the content included in the PKG with the preset schema, "light rain" can be determined to be a type of weather, so the part-of-speech feature "weather type" is added to the field, giving the field a unique part-of-speech feature.
- the feature sequence obtained by syntactic analysis can be used to match the output sequence of the neural network. If the output sequence matches the feature sequence, the output sequence can be used as the final information extraction result.
- otherwise, the feature sequence can be used to correct the output sequence, and the corrected output sequence can be used as the final information extraction result.
- the information of each entity in the output sequence can be matched with the information of each field in the feature sequence, such as matching the part of speech, semantics, and the relationship between entities or fields. If part of the information in the output sequence does not match the information corresponding to the feature sequence, the unmatched information in the output sequence can be replaced with the corresponding information in the feature sequence. For example, if the input text contains the word "apple”, if the category of the apple in the output sequence is fruit, and the part of speech assigned to the field "apple" in the feature sequence is equipment, then the category label fruit in the output sequence can be replaced by equipment, so as to realize the correction of the output sequence.
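The correction step in the "apple" example can be sketched as below. It assumes, purely for illustration, that the neural network output is a list of `(entity, label)` pairs and that the syntactic-analysis result is a dictionary from field to category; the function name `correct_output_sequence` is hypothetical, and only the simple replacement strategy from the text is shown, not fusion.

```python
def correct_output_sequence(output_seq, feature_seq):
    """Replace labels in the neural-network output that disagree with syntactic analysis.

    output_seq  -- list of (entity, label) pairs produced by the neural network
    feature_seq -- dict mapping a field to the category assigned by syntactic analysis
    """
    corrected = []
    for entity, label in output_seq:
        parsed_label = feature_seq.get(entity)
        if parsed_label is not None and parsed_label != label:
            label = parsed_label  # unmatched information is replaced
        corrected.append((entity, label))
    return corrected
```

Entities that the syntactic analysis says nothing about keep the label assigned by the neural network, so the two sources complement each other.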
- the neural network and syntactic analysis can be combined to extract information from the input text, and the information extracted by the two methods can be combined to obtain the final and more accurate information, and the accuracy of information extraction can be improved.
- this addresses the problem of long-tail distribution. For example, the top 20% of frequently used entities account for 80% of the entities in users' daily chats, and most of these entities can be identified by the trained neural network.
- adding "symbols" (syntactic analysis) to the above method can be understood as completing and correcting the infrequently used long-tail entities through syntactic analysis, so as to improve the extraction accuracy for long-tail entities.
- this application uses a combination of syntactic analysis and neural networks to construct the user's personal knowledge graph.
- user data is stored in the form of a knowledge graph: the operation information of all APPs and user behaviors is integrated into one personal knowledge graph, and the graph structure is organized in units of event, attention, and communication nodes. This organizational structure facilitates efficient extraction of user information.
- the neural network is used to analyze the text content, and technologies such as Bi-LSTM, CRF, and multi-label classification are combined to extract the multi-angle content required by the graph, which provides a more efficient path for acquiring the knowledge in the graph.
- structured data, that is, data in a preset format
- information can be extracted from input text according to preset rules.
- the structured data may be the data entered by the user in applications such as calendar, address book, and photo album with preset data formats.
- the extraction process can be as shown in Figure 8. Taking a new contact as an example: the source of the information is known to be the application "Contacts", so its intent (that is, the type of the event) can be understood as "communication", and the corresponding information follows a specific template, such as name, contact information, and position. Entity recognition and relationship extraction are performed under the template, that is, the entities in the structured data and the relationships between them are identified, and finally a list of entities and a list of relationships are obtained.
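Template-based extraction from a contact record might be sketched like this. The field names (`name`, `phone`, `position`), the relation labels of the form `has_<field>`, and the function name `extract_from_contact` are hypothetical; the source only states that a template such as name, contact information, and position is used.

```python
# Hypothetical template: record field -> entity category.
CONTACT_TEMPLATE = {
    "name": "person",
    "phone": "contact information",
    "position": "position",
}


def extract_from_contact(record):
    """Turn a structured contact record into an event with entity and relation lists."""
    entities = [
        (record[field], category)
        for field, category in CONTACT_TEMPLATE.items()
        if record.get(field)
    ]
    owner = record.get("name")
    relations = []
    if owner:
        # Link every other filled-in field to the contact's name.
        relations = [
            (owner, f"has_{field}", record[field])
            for field in CONTACT_TEMPLATE
            if field != "name" and record.get(field)
        ]
    return {"event_type": "communication", "entities": entities, "relations": relations}
```

Because the data source fixes both the intent and the template, no language model is needed here; the rules alone yield the entity list and the relation list.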
- application scenarios such as creating new contacts, building calendar events, and browsing information streams have been realized.
- information extraction can also be performed according to the corresponding format, so as to extract the information of entities and the relationship between entities.
- the output sequence can also be extracted by combining the neural network and syntax analysis, which can be adjusted according to the actual application scenario, which is not limited in this application.
- after obtaining the extraction result of information extraction, that is, the output sequence, the PKG can be constructed based on the output sequence, such as adding the content included in the output sequence to the PKG, or updating the part of the PKG corresponding to the output sequence.
- constructing the PKG can be divided into multiple parts, including knowledge analysis, knowledge generation, graph construction, and graph expansion, which are introduced separately below.
- the entities included in the input text and the relationship types between entities can be obtained. That is, according to the relationship type between the entities, the connection relationship between the entities can be constructed in the PKG, so as to realize the connection between the nodes.
- the relation can be converted into a triplet of <entity field 1, relation class, entity field 2> by combining the preset class definition rules with the entities and relation classes in the output sequence.
- the corresponding entity 1 and entity 2 should be personal names or personal pronouns
- the corresponding entity 1 of "relationship category: director/author/screenwriter/producer/composer/lyricist" is a person's name or a personal pronoun
- entity 2 should be film and television works/books/songs, etc.
- <Xiaohong, family member, Xiaoming>, <Xiaoming, director, Red Sorghum> and so on.
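Checking a candidate relation against class definition rules before emitting a triplet can be sketched as follows. The rule table `RELATION_RULES`, the category names, and the function `to_triple` are illustrative assumptions; the source only describes which entity categories each relation class should connect.

```python
# Hypothetical class definition rules: relation -> (allowed head categories, allowed tail categories).
RELATION_RULES = {
    "family member": ({"person"}, {"person"}),
    "director": ({"person"}, {"film", "book", "song"}),
}


def to_triple(entity_1, relation, entity_2, categories):
    """Return (entity_1, relation, entity_2) if it satisfies the rules, else None.

    categories -- dict mapping an entity field to its category label
    """
    rule = RELATION_RULES.get(relation)
    if rule is None:
        return None  # unknown relation class
    head_ok = categories.get(entity_1) in rule[0]
    tail_ok = categories.get(entity_2) in rule[1]
    return (entity_1, relation, entity_2) if head_ok and tail_ok else None


sample_categories = {"Xiaohong": "person", "Xiaoming": "person", "Red Sorghum": "film"}
```

A triplet whose head and tail are swapped, such as a film "directing" a person, is rejected, which is the kind of consistency the class definition rules are meant to enforce.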
- the event type can be identified. If the event is a schedule event, event element analysis can be performed; if the event is an event of interest, sentiment analysis can be performed.
- the role of each entity as an element of the event in the input text can be determined, and the entities of the event can be stored in the form of tuples according to the rules corresponding to the different event types. For example, [(entity field 1, companion), (entity field 2, destination), (entity field 3, watching)] can be output to represent the event elements.
- the entity categories corresponding to a catering event should include companions, destinations, start time, end time, food, etc. (the above categories do not require all to appear in one event at the same time).
- each event element can be stored in the PKG as a node, and if the event obtained by analysis already exists in the PKG, the information of that event in the PKG can be updated, thereby updating the PKG in real time, so that the PKG can save knowledge about the user in real time and support lifelong learning about the user.
- sentiment analysis can be performed to determine whether the emotion category in the input text is a positive, negative, or neutral emotion.
- this part can be processed by combining regular-expression (regex) discrimination with a naive Bayes classifier. For simple texts with obvious emotional tendencies, such as "I like XXX", "I don't like XXX", or "I hate XXX", regular expressions can be used for discrimination. For texts with more complex descriptions, the naive Bayes classifier can be used: a classification model is obtained by training on a data set for the classification task. During training, the data corresponding to each classification category must first be collected and divided, ensuring that the individual data items are of similar length.
- because the text appears in the form of sentences with relatively rich and diverse content, it must be segmented into finer-grained lexical units and subjected to some feature processing (such as removing punctuation marks and stop words, selecting keywords, and applying smoothing techniques). The frequency of occurrence of each word in the different emotional categories is then counted to calculate its conditional probability; combined with the conditional independence assumption, this yields a bag-of-words model, namely the naive Bayes model, from which the emotional category is obtained.
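The two-stage scheme (regular expressions first, naive Bayes as fallback) can be illustrated with the following self-contained sketch. The regex patterns, the class `NaiveBayes`, the function `classify_sentiment`, and the use of Laplace (add-one) smoothing are assumptions made for the example; stop-word removal and keyword selection are omitted to keep it short.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical patterns for texts with an obvious emotional tendency.
NEGATIVE = re.compile(r"\bI (?:don't like|do not like|hate)\b", re.IGNORECASE)
POSITIVE = re.compile(r"\bI (?:like|love)\b", re.IGNORECASE)


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Bag-of-words naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of log conditional probabilities (independence assumption)
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_sentiment(text, model):
    """Use the regex rules when they fire; otherwise fall back to the classifier."""
    if NEGATIVE.search(text):
        return "negative"
    if POSITIVE.search(text):
        return "positive"
    return model.predict(text)
```

The negative pattern is checked first so that "I don't like XXX" is never mistaken for a positive statement; only texts that match neither pattern reach the classifier.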
- time is important information to measure the moment when user behavior or attention arises and disappears, and is one of the event elements. It is very helpful to record the time when the entity is generated, the time when the event occurs and ends, and further provide suggestions for users.
- the function of this module is to standardize the time of natural language expressions involved in the processing process and store them in the same format for subsequent use.
- time entity expressions such as "next Monday”, “tomorrow”, and “yesterday afternoon” can be standardized into the time form of "xxxx-xx-xx xx:xx:xx”.
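Normalizing relative time expressions into the "xxxx-xx-xx xx:xx:xx" form can be sketched as below. The function name `normalize_time` is hypothetical, and two interpretive choices are assumptions of this example rather than statements of the source: "afternoon" is mapped to 12:00:00, and "next Monday" is taken to mean the Monday of the following calendar week.

```python
from datetime import datetime, timedelta


def normalize_time(expression, now):
    """Convert a few relative time expressions into 'YYYY-MM-DD HH:MM:SS'.

    Returns None for expressions this small sketch does not cover.
    """
    expr = expression.strip().lower()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if expr == "tomorrow":
        result = midnight + timedelta(days=1)
    elif expr == "yesterday afternoon":
        # Assumption: "afternoon" is represented by 12:00:00.
        result = midnight - timedelta(days=1) + timedelta(hours=12)
    elif expr == "next monday":
        # Assumption: Monday of the following calendar week (weekday() is 0 for Monday).
        result = midnight + timedelta(days=7 - now.weekday())
    else:
        return None
    return result.strftime("%Y-%m-%d %H:%M:%S")
```

Passing the reference time `now` explicitly, for instance the time at which the user submitted the request, keeps the function deterministic and easy to test.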
- time information of the user's submission request is also obtained, which will be stored as user knowledge.
- the analyzed knowledge can be integrated so as to be stored in the PKG.
- the obtained time, relational links, or event elements can be integrated, that is, the integration is performed in units of events.
- the input text of the user can be obtained, such as "Next Tuesday, I will go to see the creed with Mengfan", and the event type, entities, entity classes, and so on can then be obtained through the aforementioned information extraction and knowledge analysis steps.
- for search event 2, the text "a little sweet" entered by the user is obtained, and the event type, entity, and entity type of the event are determined through the aforementioned information extraction and knowledge analysis.
- the knowledge in the PKG can be updated, or knowledge that does not yet exist in it can be added: the extracted entity list is linked, using the inverted index in Elasticsearch for search and matching.
- the inverted index in Elasticsearch tokenizes all the fields to be matched and stores the information table in inverted form.
- the matching of entity fields can provide a list sorted in descending order of the matching score between the field and each entity, and the entity with the highest score can be used as the entity corresponding to the field for linking.
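The idea of matching a field against an inverted index and ranking candidates by score can be illustrated in plain Python. This is not the Elasticsearch API; the functions `build_inverted_index` and `match_entity`, whitespace tokenization, and the score (number of shared tokens) are simplifying assumptions for the example.

```python
from collections import defaultdict


def build_inverted_index(entities):
    """Map each token to the set of entity names that contain it."""
    index = defaultdict(set)
    for entity in entities:
        for token in entity.lower().split():
            index[token].add(entity)
    return index


def match_entity(field, index):
    """Return (entity, score) pairs sorted by descending score, ties broken by name."""
    scores = defaultdict(int)
    for token in field.lower().split():
        for entity in index.get(token, ()):
            scores[entity] += 1  # one point per shared token
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical entity names already stored in the PKG.
sample_index = build_inverted_index(["a little sweet", "sweet dreams", "red sorghum"])
```

The first element of the returned list is the highest-scoring candidate and would be used for linking; an empty list means no entity matched and the field can be treated as new knowledge.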
- the specific processing is shown in Figure 10. After the entity is obtained through information extraction and knowledge analysis, entity search and matching is performed in the PKG.
- entity linking is performed, that is, entities with associated relationships are associated. If no entity is matched, other entity fields mentioned in the text can be considered to intelligently distinguish the entity mentioned by the user, perform reasoning and disambiguation, and improve the correct rate of the link.
- fuzzy matching can be performed against the user's personal knowledge, such as matching fields in the PKG and in the extracted knowledge that have similar meanings or similar forms, and disambiguating through reasoning, that is, inferring whether such fields actually refer to the same entity. If so, knowledge linking can continue, that is, the relationships between entities are linked. If no entity in the PKG is the same as or similar to the extracted information, the information can be added as new knowledge.
- the PKG is constructed according to a predefined schema.
- the entity list contains the relevant information of "Zhou Xiaoyu” and "Young You”
- the entity [actor, movie name]]
- the personal knowledge map is constructed in units of events.
- the method provided by this application can describe the user and save the user's knowledge at a finer granularity through the personal knowledge graph, so that the user is described and the user's knowledge is saved more accurately, and knowledge can subsequently be traced and user information queried more accurately.
- the personal knowledge map provided by this application records and stores the user's operation behavior in units of user-operated events. For each operation behavior of the user, it is divided into different intent types and the information under the corresponding intent is analyzed to obtain the event elements of the operation behavior and add them to the map.
- the content related to the user's operating habits and the relevant content of the elements themselves can be quickly obtained, which is more suitable for the user's usage habits.
- the occurrence time of the behavior is stored, which provides a way for subsequent iterative update or sequential search.
- the event-based knowledge map provides a new way of organizing information and provides a new channel for searching and analyzing different needs.
- This application sets a weight for each node and updates the weight regularly or in real time to realize the memory of user knowledge.
- the corresponding weight of the node can be updated through memory attenuation.
- The weight can be calculated from three weighted terms.
- The three coefficients are weighting coefficients after normalization, and the count is the number of occurrences of the event determined from the input text.
- The association relationships with other nodes reflect the importance of the node, where d represents the in-degree of the node and D_max represents the maximum in-degree.
- The second term is the time factor, which decreases as the time elapsed since the node was created or updated increases; that is, it is negatively correlated with the node's creation or update duration.
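The exact expression is not recoverable from the text, so the following is only a sketch under stated assumptions: the three terms are combined as a weighted sum, the count is normalized by a maximum count, the time factor is an exponential decay with a half-life, and the connectivity term is d / D_max. The coefficient values, `half_life_days`, and both function names are hypothetical.

```python
import math


def time_factor(age_days, half_life_days=30.0):
    """Assumed memory decay: halves every half_life_days since creation/update."""
    return math.exp(-math.log(2) * age_days / half_life_days)


def node_weight(count, max_count, age_days, in_degree, max_in_degree,
                alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted sum of frequency, time and in-degree terms (alpha+beta+gamma = 1)."""
    frequency = count / max_count if max_count else 0.0
    connectivity = in_degree / max_in_degree if max_in_degree else 0.0
    return alpha * frequency + beta * time_factor(age_days) + gamma * connectivity
```

With this form, a node that is mentioned often, was updated recently, and has many incoming edges approaches a weight of 1, and the weight falls as the node ages without updates.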
- the first knowledge graph can be used to expand the target personal knowledge graph.
- Taking the case where the first knowledge graph is a common knowledge graph (CKG) as an example, the CKG may include various entities and the associations between them.
- the entities included in the general knowledge graph can be entities of the same domain or different domains.
- the entities in the PKG can be searched in the CKG, and after matching the same entity as the entity in the PKG, continue to search for the information of the associated node in the CKG, and expand the information of the node in the CKG and the information of the associated node into the PKG, thereby augmenting the PKG with the richer information contained in the CKG.
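The expansion step can be sketched as follows, assuming both graphs are stored as plain adjacency maps from a node name to the set of its neighbours. `expand_pkg` is a hypothetical helper, and the sample data in the usage note is illustrative only.

```python
def expand_pkg(pkg, ckg):
    """Return a copy of pkg in which every entity also present in ckg
    gains the CKG neighbours of that entity (one hop)."""
    expanded = {node: set(neighbors) for node, neighbors in pkg.items()}
    for node in pkg:
        for neighbor in ckg.get(node, ()):
            expanded[node].add(neighbor)
            # Keep the association symmetric in the expanded graph.
            expanded.setdefault(neighbor, set()).add(node)
    return expanded
```

For example, if the PKG holds the entity "Red Bean" and the CKG associates "Red Bean" with "Faye Wong", the expanded PKG gains a "Faye Wong" node linked to "Red Bean", while CKG nodes unrelated to the PKG are left out.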
- the information of the node corresponding to the entity in the output sequence can be queried from the PKG, and the information of the node can be used to generate recommendation information for the user.
- PKG can be applied to various user-specific recommendation scenarios, such as input method recommendation, search recommendation, itinerary reminder, or commodity recommendation.
- The associated nodes can be queried from the PKG based on the entity, the entity that the user is about to input can be predicted from the associated nodes, and recommendations can be made in the user's display interface.
- Entity prediction based on the PKG alleviates the information clutter, and the resulting drop in prediction accuracy, that occurs when many types of information are recommended for the user's current input. For example, when the user enters "I'm going tomorrow", the recommendation type should mainly be place names (other types are of course possible in other scenarios). Based on the PKG, entities of the relevant place-name type can be recommended first, which improves prediction accuracy and user experience.
- a method combining regex with a Bayesian probability model can be used.
- For simple expressions, the recommended types can be obtained directly through regular-expression matching, with the types ordered by probability of occurrence.
- Complicated expressions are handled with neural networks, and the Bayesian probability model is used to calculate the probability that different texts are followed by different entity types; the recommended type is given on this basis.
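A minimal sketch of combining regular expressions with a Bayesian model to predict the recommendation type follows. The rules, the training pairs, and the class name `TypePredictor` are all hypothetical; the Bayesian part is an ordinary naive Bayes classifier with add-one smoothing, chosen here as one plausible reading of the text, and the neural-network path is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical rules: a pattern at the end of the text maps directly to a type.
RULES = [
    (re.compile(r"\bgoing to$"), "place"),
    (re.compile(r"\bcall$"), "person"),
]


class TypePredictor:
    def __init__(self, examples):
        """examples: (text, entity_type) pairs observed for this user."""
        self.type_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, entity_type in examples:
            self.type_counts[entity_type] += 1
            for word in text.lower().split():
                self.word_counts[entity_type][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return entity types, most probable first."""
        text = text.lower().strip()
        for pattern, entity_type in RULES:
            if pattern.search(text):
                return [entity_type]  # rule hit: no model needed
        total = sum(self.type_counts.values())
        scores = {}
        for entity_type, count in self.type_counts.items():
            log_p = math.log(count / total)  # prior P(type)
            denom = sum(self.word_counts[entity_type].values()) + len(self.vocab)
            for word in text.split():
                # Add-one smoothed likelihood P(word | type)
                log_p += math.log((self.word_counts[entity_type][word] + 1) / denom)
            scores[entity_type] = log_p
        return sorted(scores, key=scores.get, reverse=True)
```

The returned ordering can then be used to decide which entity types to look up in the PKG first.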
- Recommendation methods based on knowledge graphs can include embedding-based methods, path-based methods, and methods that combine embeddings and paths.
- the recommendation sorting can be performed based on the PKG path. Specifically, the recommendation can be made in combination with the obtained recommendation type, intent type, and node weight.
- events are used as the organizational structure, and events are used as a bridge to connect different types of entities, so as to design paths more flexibly. This method solves the recommendation problem in scenarios where entities do not belong to the same domain.
- As shown in Figure 14, the recommendation rules may include the following: first, according to the recommendation type and event type, search the PKG for the intent node related to the entity list; then obtain the other nodes connected to the intent node and sort them by weight; finally, select the entry with the highest weight as the recommended word.
- The recommendation can be made based on the existing nodes of the PKG. Taking movies as an example: if a user mentions science fiction movies such as "Creed" many times, the terms "Creed" and "science fiction movie" will be recommended in combination with the user's intent during the recommendation process.
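The path-based rule (find the intent node, collect connected nodes, sort by weight, take the top entry) can be sketched as below. The `Node` structure, the map from intent node to connected nodes, and the sample weights are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    node_type: str  # e.g. "movie", "person", "place"
    weight: float


def recommend(pkg, event_type, recommendation_type, top_k=1):
    """pkg maps an intent (event type) node name to the nodes connected to it."""
    candidates = [n for n in pkg.get(event_type, [])
                  if n.node_type == recommendation_type]
    ranked = sorted(candidates, key=lambda n: n.weight, reverse=True)
    return [n.name for n in ranked[:top_k]]
```

Because the intent node connects entities of different types, changing `recommendation_type` is enough to move between domains without redesigning the path.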
- PKG and CKG can be combined, and the extended entries of the inferred nodes can be added to the recommendation system as the user's feature vector.
- Not only can a personal knowledge graph reflecting user characteristics be constructed based on user information, but knowledge about undiscovered relationships can also be supplemented and expanded through reasoning, so as to complete and extend the user's existing knowledge. The inferred content is included in the scope of recommendation, so that more information can be recommended to the user and cross-domain recommendation can be realized to improve user experience.
- For example, a user follows multiple sci-fi movies, such as "The Avengers", "Creed", and "Steel". Reasoning uncovers the user's potential point of interest, sci-fi movies; "sci-fi movies" is added to the personal knowledge graph, and "science fiction movies" is recommended to the user in the corresponding context.
- Cross-domain reasoning can combine the content of different vertical domains for reasoning. For example, users pay attention to content such as the song “Red Bean” and the movie “The World”, and cross-domain reasoning can infer that the user's potential attention includes Wang Xiaofei.
- The personal knowledge graph provided by this application can include finer-grained information, enabling finer-grained personalized recommendation that combines user data with recommendation types, intent types, and weights.
- The recommendation type is, for example, a person's name; the intent type is, for example, entertainment; and the weight reflects the user's different degrees of attention to things.
- the knowledge map is used to store the user's behavioral operation data, and a unique personal knowledge map exclusive to the user is constructed.
- recommendation systems organized user data in the form of tables, which differed from graph storage in terms of storage clarity and search efficiency.
- Graph storage can quickly retrieve content directly related to the current content, or content related within n hops, whereas table-based storage takes longer to query and access.
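As an illustration of an n-hop lookup on graph-shaped data, the sketch below runs a breadth-first search over an adjacency map. The function name `within_n_hops` and the graph layout are assumptions, not part of the described system.

```python
from collections import deque


def within_n_hops(graph, start, n):
    """Return {node: hop distance} for nodes reachable from start in <= n hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == n:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances
```

Each node is visited at most once, so the cost depends on the size of the neighbourhood rather than on the total amount of stored user data.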
- the method provided by this application can be deployed on the terminal, and the user can receive or send information in the communication software.
- a GUI is shown in Figure 15.
- the user can send a message in the communication software.
- The message sent by the user can be used as the input text; the entities and the relationships between entities are extracted from the input text, and the PKG is constructed.
- GUIs are shown in Figure 16.
- The matching text can be selected from the PKG, and the text that the user is about to input can be predicted and displayed on the display interface, so that the user can complete the input quickly.
- a GUI is shown in Figure 17.
- The method provided by this application can be deployed on the terminal. The terminal can obtain the text entered by the user in the search program and extract entity information from it, such as "Faye Wong" and "Red Bean", together with the emotion category corresponding to each entity, and add this information to the PKG.
- a GUI is shown in Figure 18.
- The matching text can be selected from the PKG, and the text that the user is about to input can be predicted and displayed on the display interface, so that the user can complete the input quickly.
- a GUI is shown in Figure 19.
- The method provided by this application can be deployed on the terminal. The user can enter a schedule event in the calendar APP, and the terminal can obtain the structured data of the event, extract the entities, time, and other information corresponding to the event elements from the structured data, and add the extracted information to the PKG, so as to record the user's schedule and remind the user in time.
- The node corresponding to the text input by the user can be located in the PKG, and its associated nodes can then be selected and their information displayed in the display interface of the voice assistant. When the information of the associated nodes is displayed, it can also be sorted by node weight from largest to smallest, for example by placing the information of nodes with larger weights where it is more convenient for the user to input.
- a GUI is shown in Figure 20.
- The user can ask the voice assistant for the contact information of "Wang Meng". The terminal can query the PKG for information related to the entity "Wang Meng", select the information in the contact-information category, and display it on the display interface of the terminal.
- a GUI is shown in Figure 21.
- The user can ask the voice assistant to play music, and the terminal can search the PKG for music-related information. If the music "Red Bean" is found, it can be played.
- As shown in the GUI in FIG. 22, the user's preference information can be learned from the user's daily input data and used as recommendation information for the search recommendation engine.
- the information acquisition method provided in this application may be deployed on a terminal, and the architecture deployed on the terminal may be as shown in FIG. 23 .
- the application scenario layer can include the business application program (APP) installed in the terminal.
- the application scenario layer and the algorithm layer are connected through an algorithm interface.
- the business APP can receive user data sets and receive feedback from the algorithm layer.
- The search/recommendation engine can then apply the data.
- The core part of the architecture is the algorithm layer, which can include multiple modules: 1) PKG (knowledge system) construction, which mainly involves: a. learning from user behavior data (text data), that is, knowledge extraction; b. user knowledge generation and storage; c. construction of the user knowledge graph; 2) PKG expansion: knowledge reasoning and knowledge update and completion; 3) PKG use, such as intent prediction and knowledge ranking.
- the data management layer is used to store or manage user data, such as PKG, CKG or schema, etc. can be stored in the data management layer, providing data storage and management functions for the algorithm layer, and serving as the basic platform for the query engine or reasoning process.
- user data may be obtained, such as user input data, which may include structured data and unstructured data.
- Information extraction is performed on the input data, entities and the relationship between entities are extracted, and the types of events (that is, intent categories) or event elements formed by entities are also analyzed. Sentiment analysis is also performed on entities to analyze the emotional categories.
- Event information can also be extracted from input data and stored in a preset format.
- user knowledge such as event information, emotion category, intent category (event type), entity event relationship and event elements can be extracted and stored.
- Information related to the user knowledge is also queried from the general knowledge graph (that is, the CKG), and the user knowledge is updated or completed based on this information, so as to obtain more complete user knowledge.
- the constructed PKG may be as shown in Figure 25, wherein, with the target user "I" as the center, various types of events related to the target user are stored, and nodes with associated relationships are connected.
- the weight is set for each node through the memory decay mechanism, so as to realize the memory of user knowledge through the weight method, so that the target user can be recommended more effectively.
- Event type prediction (that is, intent prediction) can be performed based on the information extracted from the input data, the prediction information can be queried from the PKG, and the prediction information can be sorted according to the weight of each node and recommended to the user, improving user experience.
- the information acquisition device may include:
- the input module 2601 is configured to acquire the input text of the target user, the input text includes at least one word, and at least one word forms at least one event;
- a text processing module 2602 configured to obtain an output sequence based on the input text, and the output sequence includes at least one type and element of an event;
- the obtaining module 2603 is used to obtain the personal knowledge map according to the output sequence.
- The personal knowledge graph includes multiple nodes, and the multiple nodes include type nodes and element nodes.
- The type nodes are used to indicate the type of the at least one event, and the element nodes are used to indicate the elements of the at least one event.
- The type node and the element nodes corresponding to the same event are associated with each other, and the personal knowledge graph is used to make recommendations for the target user.
- If the output sequence includes the association relationships between the elements of each event, the element nodes corresponding to the associated elements of the same event are associated in the personal knowledge graph; if the output sequence also includes emotion categories, the element nodes corresponding to the same event in the personal knowledge graph are associated through the emotion categories.
- The obtaining module 2603 is specifically configured to: if the initial knowledge graph includes the information of a first event, update the element nodes corresponding to the first event in the initial knowledge graph and the association relationships between those element nodes to obtain the personal knowledge graph, where the first event is any one of the at least one event; if the initial knowledge graph does not include the information of the first event, add the type node and the corresponding element nodes of the first event to the initial knowledge graph, and associate the type node of the first event with its element nodes to obtain the personal knowledge graph.
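The update-or-add behaviour can be sketched as follows, assuming a deliberately simple graph layout in which each type node maps to its element nodes keyed by role. `upsert_event`, the role names, and the sample values are hypothetical.

```python
def upsert_event(graph, event_type, elements):
    """Merge one extracted event into the graph.

    graph: dict mapping a type node name to {element role: element value}.
    Returns "updated" if the event type was already present, else "added".
    """
    if event_type in graph:
        # Existing event: refresh matching element nodes and attach new ones.
        graph[event_type].update(elements)
        return "updated"
    # New event: create the type node and associate its element nodes with it.
    graph[event_type] = dict(elements)
    return "added"
```

A fuller version would also store the emotion category on the association and refresh the node weights on every update.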
- The obtaining module 2603 is specifically configured to: obtain the initial sequence corresponding to the input text through the text processing model, where the initial sequence includes a vector representation of the at least one word in the input text and a first category label corresponding to the at least one word; perform syntactic analysis on the input text to obtain a feature sequence, where the feature sequence includes a second category label corresponding to the at least one word; and combine the initial sequence and the feature sequence to obtain the output sequence, where the output sequence includes the elements and types of the at least one event.
- the text processing module 2602 is specifically configured to: correct the part of the initial sequence that does not match the feature sequence to obtain an output sequence.
- the text processing module 2602 is further configured to: if each word in the feature sequence corresponds to multiple second category labels, determine a unique second category label for each word, and obtain the updated feature sequence.
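The correction of the initial sequence against the feature sequence might look like the sketch below. The text does not say how a unique second category label is chosen or which label wins on a mismatch, so two assumptions are made: the first candidate label is taken as the unique one, and the syntactic label overrides the model label. The label names are illustrative.

```python
def combine_sequences(words, initial_labels, feature_labels):
    """Combine model labels with syntactic labels into one output sequence.

    initial_labels: one first-category label per word (from the model).
    feature_labels: a list of candidate second-category labels per word
                    (possibly empty) from syntactic analysis.
    """
    output = []
    for word, model_label, candidates in zip(words, initial_labels, feature_labels):
        # Assumption: reduce multiple candidates to the first one.
        unique = candidates[0] if candidates else None
        # Assumption: on a mismatch, the syntactic label corrects the model label.
        label = unique if unique is not None else model_label
        output.append((word, label))
    return output
```

Words without any syntactic label keep the label predicted by the model.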
- The text processing module 2602 is specifically configured to obtain the initial sequence from the input text through a text processing model, where the text processing model is used to perform the following steps: perform natural language processing on the input text to obtain a feature sequence and an entity sequence, where the entity sequence includes a vector representation corresponding to each of the at least one word and the feature sequence includes the feature vector corresponding to the input text; obtain the position information corresponding to the vectors in the entity sequence; fuse the position information and the feature sequence to obtain a fusion sequence; and classify the entities corresponding to the fusion sequence to obtain a label sequence. The initial sequence includes the vector representation corresponding to each word and the label sequence.
- The device further includes an expansion module 2604, configured to: acquire a first knowledge graph, where the first knowledge graph includes multiple nodes, the multiple nodes include information about at least one type of entity, and a node in the first knowledge graph can represent an entity or can represent the elements or type of an event; obtain, from the first knowledge graph, the associated information associated with the nodes in the personal knowledge graph; and use the associated information to expand the personal knowledge graph to obtain the expanded personal knowledge graph.
- The device further includes a recommendation module 2605, configured to: obtain information of at least one node matching the output sequence from the personal knowledge graph; and generate recommendation information for the target user according to the information of the at least one node, where the recommendation information is used to make recommendations for the target user.
- The recommendation module 2605 is specifically configured to: select the information of at least one first node corresponding to the output sequence from the personal knowledge graph; and find, from the personal knowledge graph, the information of at least one second node associated with the at least one first node, where the information of the at least one node includes the information of the at least one first node and the information of the at least one second node.
- the information of the first node and the information of the second node are information of different domains.
- Each node in the personal knowledge graph has a corresponding weight, and the weight of each node is negatively correlated with its storage duration or update duration.
- The storage duration is how long the information of the node has been saved, and the update duration is the time elapsed since the information included in the node was last updated.
- the recommendation module is specifically configured to: rank the at least one node according to the weight corresponding to the at least one node; generate recommendation information according to the information of the at least one node and the ranking of the at least one node.
- the input module 2601 is specifically configured to: acquire user input data, where the input data includes at least one of image, text, or voice data; and extract input text from the input data.
- the input module 2601 is also used to acquire the structured data of the target user, and the structured data is data in a preset format;
- the obtaining module 2603 is further configured to extract information of at least one event from the structured data according to preset rules;
- the acquiring module 2603 is further configured to update the personal knowledge map according to the information of at least one event, to obtain an updated personal knowledge map.
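Extraction from structured data according to preset rules can be sketched as a fixed field-to-role mapping. The field names, role names, event type, and the sample record used in the usage note are all assumptions; the actual preset format is not given in the text.

```python
# Hypothetical preset rule: which record field fills which event element role.
FIELD_RULES = {"title": "topic", "start": "time", "location": "place"}


def extract_event(record, event_type="schedule"):
    """Turn one structured record (a dict) into an event with typed elements.

    Fields that are missing or empty are skipped rather than guessed.
    """
    elements = {role: record[field]
                for field, role in FIELD_RULES.items()
                if record.get(field)}
    return {"type": event_type, "elements": elements}
```

For a calendar record such as `{"title": "Team meeting", "start": "09:00", "location": ""}`, the result carries a topic and a time element and no place element; the event can then be merged into the personal knowledge graph.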
- FIG. 27 is a schematic structural diagram of another information acquisition device provided by the present application, as described below.
- the information acquiring device may include a processor 2701 and a memory 2702 .
- the processor 2701 and memory 2702 are interconnected by wires. Wherein, program instructions and data are stored in the memory 2702 .
- the memory 2702 stores program instructions and data corresponding to the steps in the above-mentioned Fig. 4-Fig. 25 .
- the processor 2701 is configured to execute the method steps performed by the information acquisition device shown in any one of the embodiments in FIG. 4 to FIG. 25 .
- the information acquiring device may also include a transceiver 2703, configured to receive or send data.
- The embodiment of the present application also provides a computer-readable storage medium that stores a program which, when run on a computer, causes the computer to execute the steps of the method described in the embodiments shown in Figures 6-25.
- the aforementioned information acquisition device shown in FIG. 27 is a chip.
- FIG. 28 is a schematic structural diagram of another electronic device provided by the present application, as described below.
- the electronic device may include a processor 2801 and a memory 2802 .
- the processor 2801 and memory 2802 are interconnected by wires. Wherein, program instructions and data are stored in the memory 2802 .
- the memory 2802 stores program instructions and data corresponding to the steps in the above-mentioned Fig. 4-Fig. 25 .
- the processor 2801 is configured to execute the method steps executed by the aforementioned electronic device shown in FIGS. 4-25 .
- the electronic device may further include a transceiver 2803, configured to receive or send data.
- The embodiment of the present application also provides a computer-readable storage medium that stores a program which, when run on a computer, causes the computer to execute the steps of the method described in the embodiments shown in Figure 4 to Figure 25.
- the aforementioned electronic device shown in FIG. 28 is a chip.
- the embodiment of the present application also provides an information acquisition device.
- the information acquisition device can also be called a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
- the processing unit is configured to execute the aforementioned method steps in FIGS. 4-25 .
- the embodiment of the present application also provides a digital processing chip.
- The digital processing chip integrates a circuit and one or more interfaces for implementing the functions of the above-mentioned processor 2701, of the processor 2801, or of both.
- When a memory is integrated in the digital processing chip, the digital processing chip can complete the method steps in any one or more of the foregoing embodiments.
- When no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface, and it implements the actions performed by the information acquisition device or the electronic device in the above-mentioned embodiments according to the program code stored in the external memory.
- the embodiment of the present application also provides a computer program product, which, when running on a computer, causes the computer to execute the steps of the method described in the embodiments shown in FIGS. 4-25 .
- the information acquisition device provided in the embodiment of the present application may be a chip, and the chip includes: a processing unit and a communication unit, the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
- the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the information acquisition method described in the embodiments shown in FIGS. 6-25 above.
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- The storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
- The aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
- a general-purpose processor may be a microprocessor or any conventional processor or the like.
- FIG. 29 is a schematic structural diagram of a chip provided by an embodiment of the present application.
- The chip may be embodied as a neural network processor NPU 290. The NPU 290 is mounted on the main CPU (host CPU), and the host CPU assigns tasks to it.
- the core part of the NPU is the operation circuit 2903, and the operation circuit 2903 is controlled by the controller 2904 to extract matrix data in the memory and perform multiplication operations.
- The operation circuit 2903 includes multiple processing engines (PEs).
- In some implementations, the operation circuit 2903 is a two-dimensional systolic array. It may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
- In other implementations, the operation circuit 2903 is a general-purpose matrix processor.
- the operation circuit fetches the data corresponding to the matrix B from the weight memory 2902, and caches it in each PE in the operation circuit.
- the operation circuit takes the data of matrix A from the input memory 2901 and performs matrix operation with matrix B, and the obtained partial results or final results of the matrix are stored in the accumulator (accumulator) 2908 .
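As a software analogy only, the sketch below multiplies matrix A by matrix B in slices along the inner dimension and adds each partial result into an accumulator, mirroring the description of partial results being collected in the accumulator. It does not model the PEs, the systolic data flow, or any timing; the function name and the `tile` parameter are hypothetical.

```python
def matmul_accumulate(a, b, tile=2):
    """Multiply a (rows x inner) by b (inner x cols) using partial sums.

    Each pass handles `tile` entries of the inner dimension and adds the
    partial products to the accumulator until the final result is complete.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    accumulator = [[0] * cols for _ in range(rows)]
    for start in range(0, inner, tile):
        stop = min(start + tile, inner)
        for i in range(rows):
            for j in range(cols):
                partial = sum(a[i][k] * b[k][j] for k in range(start, stop))
                accumulator[i][j] += partial
    return accumulator
```

The result is independent of the tile size; only the number of accumulation passes changes.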
- the unified memory 2906 is used to store input data and output data.
- The weight data is transferred directly to the weight memory 2902 through the direct memory access controller (DMAC) 2905.
- the input data is also transferred to the unified memory 2906 through the DMAC.
- The bus interface unit (BIU) 2910 is used for the interaction between the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 2909.
- The BIU 2910 is used by the instruction fetch buffer 2909 to obtain instructions from the external memory, and by the DMAC 2905 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- the DMAC is mainly used to move the input data in the external memory DDR to the unified memory 2906, to move the weight data to the weight memory 2902, or to move the input data to the input memory 2901.
- the vector computing unit 2907 includes a plurality of computing processing units, and if necessary, further processes the output of the computing circuit, such as vector multiplication, vector addition, exponent operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
- vector computation unit 2907 can store the vector of the processed output to unified memory 2906 .
- The vector calculation unit 2907 can apply a linear function and/or a nonlinear function to the output of the operation circuit 2903, for example performing linear interpolation on the feature plane extracted by a convolutional layer, or applying a function to a vector of accumulated values to generate activation values.
- the vector computation unit 2907 generates normalized values, pixel-level summed values, or both.
- The vector of processed outputs can be used as an activation input to the operation circuit 2903, for example for use in subsequent layers of a neural network.
- The instruction fetch buffer 2909, connected to the controller 2904, is used to store instructions used by the controller 2904.
- The unified memory 2906, the input memory 2901, the weight memory 2902, and the instruction fetch buffer 2909 are all on-chip memories. The external memory is private to the NPU hardware architecture.
- The operations of each layer in the recurrent neural network can be performed by the operation circuit 2903 or the vector calculation unit 2907.
- the processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the program execution of the above-mentioned methods in FIGS. 4-25 .
- The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
- the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, server, or network device) to execute the methods described in the embodiments of this application.
- all or part of them may be implemented by software, hardware, firmware or any combination thereof.
- when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave).
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
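This application provides an information acquisition method and apparatus in the field of artificial intelligence, which store a user's related information in a personal knowledge graph and thereby enable more efficient data retrieval. The method may include: first obtaining an input text of a target user, where at least one word included in the input text forms at least one event; then obtaining an output sequence based on the input text, where the output sequence includes the type and elements of the at least one event; and obtaining a personal knowledge graph according to the output sequence. The personal knowledge graph includes a plurality of nodes, which may specifically include type nodes and element nodes. A type node represents the type of the at least one event, and an element node represents an element of the at least one event. The type node and element nodes of the same event are associated with each other, and the personal knowledge graph is used to make recommendations for the target user.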
Description
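This application relates to the field of artificial intelligence, and in particular to an information acquisition method and apparatus.
With the rapid development and wide application of technologies such as big data, enterprises increasingly focus on how to use big data for services such as precision marketing, and the concept of the "user profile" has emerged: with the help of big data, valuable information is mined from user behavior and applied to each stage of the user to improve the user experience. However, a user profile usually builds its labels in units of information about the items a user has used, which may make the information subsequently recommended to the user inaccurate. How to obtain information that characterizes a user more accurately has therefore become an urgent problem.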
发明内容
本申请实施例提供了一种信息获取方法以及装置,用于结合神经网络以及句法分析,从用户输入的文本中提取到更准确的信息,并通过个人知识图谱来保存用户的相关信息,可以实现更高效的数据检索。
有鉴于此,第一方面,本申请提供一种信息获取方法,包括:获取目标用户的输入文本,输入文本中包括至少一个词,至少一个词形成至少一个事件;基于输入文本获取输出序列,输出序列中包括至少一个事件的类型和要素;根据输出序列获取个人知识图谱,个人知识图谱中包括多个节点,多个节点包括类型节点和要素节点,类型节点用于表示至少一个事件的类型,要素节点用于表示至少一个事件的要素,同一个事件中的类型对应的类型节点和要素对应的要素节点相关联,即同一个事件中的类型节点和要素节点相关联,个人知识图谱用于为目标用户进行推荐。
在本申请实施方式中,以事件为单位对目标用户产生的事件的类型和要素进行准确地抽取,并构建了知识图谱,从而可以更方便准确地保存目标用户的各个事件,对目标用户的相关知识进行更准确地记录。因此在后续针对目标用户进行推荐时,可以以事件为单位准确查询到准确的信息,且通过节点之间的关联关系,准确查询到完整的事件,提高数据查询的准确性,提高推荐有效性。并且,本申请实施方式中,针对用户构建了个人知识图谱,该个人知识图谱可以基于从输入文本中提取到的实体来构建或者更新得到,相对于用户画像,本申请实施方式中构建得到的个人知识图谱的粒度更小,可以对用户的信息进行更精确的记录,提高对用户的描述的精确程度。且通过节点的方式可以更高效地进行检索,从而可以更高效地为用户进行推荐。
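In a possible implementation, if the output sequence further includes association relationships between the elements of the at least one event, the element nodes in the personal knowledge graph that correspond to associated elements of the same event are associated with each other. For example, the type, elements, and association relationships of an event can be extracted from the input text, where the association relationships include relationships between types and/or elements. After the type node and element nodes are constructed, they can be connected according to these association relationships, so that a complete event is identified in the personal knowledge graph and recorded more completely. Alternatively, if the output sequence further includes an emotion category of the at least one event, the element nodes corresponding to the same event in the personal knowledge graph are associated through the emotion category. For example, the emotion category of an event can be extracted from the input text, and the nodes of the same event can be connected according to it, completing the record of an emotion-type event.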
因此,本申请实施方式中,可以根据不同类型的事件进行完整记录,如关注类事件即可根据要素之间的关联关系来连接要素节点,情感类事件可以根据情感类别来连接要素节点,具有较强的泛化能力,对更多类型的事件通过相应的连接方式进行记录,可以适应更多应用场景。
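In a possible implementation, the output sequence may include the type and elements of a first event, where the first event is any one of the foregoing at least one event. Obtaining the personal knowledge graph according to the output sequence may include: if an initial knowledge graph includes information about the first event, updating the element nodes corresponding to the first event in the initial knowledge graph, or the association relationships between those element nodes, to obtain the personal knowledge graph; if the initial knowledge graph does not include information about the first event, adding a type node and element nodes of the first event to the initial knowledge graph and associating the type node with the element nodes, to obtain the personal knowledge graph.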
因此,本申请实施方式中,可以对个人知识图谱中的事件进行更新或者新增,从而丰富个人知识图谱中所包括的信息。
在一种可能的实施方式中,通过文本处理模型得到所述输入文本对应的初始序列,初始序列中包括所述输入文本中的至少一个词的向量表示以及至少一个词对应的第一类别标签;对输入文本进行句法分析,得到特征序列,特征序列包括至少一个词对应的第二类别标签;结合初始序列和特征序列得到输出序列,输出序列中包括所述至少一个事件的要素和类型。
因此,本申请实施方式中,结合了神经网络和句法分析来从输入文本中提取到更准确的信息,然后使用从输入文本中提取到的更准确的信息来生成或者更新得到目标用户的个人知识图谱,从而使个人知识图谱可以更准确地体现用户的特征,从而可以使后续可以使用该个人知识图谱为目标用户进行更准确的推荐。
在一种可能的实施方式中,前述的结合初始序列和特征序列得到输出序列,获取个人知识图谱,可以包括:根据特征序列对初始序列进行修正,得到输出序列;根据输出序列,获取个人知识图谱。
因此,本申请实施方式中,可以使用特征序列对通过神经网络提取到的初始序列进行修正,从而可以结合多种方式从输入文本中抽取到的信息,来得到更准确的信息,并使用更准确的信息获取个人知识图谱,从而得到能更准确描述目标用户的个人知识图谱。
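In a possible implementation, the foregoing method may further include: obtaining a first knowledge graph, where the first knowledge graph includes a plurality of nodes, the plurality of nodes include information about at least one kind of entity, and a node in the first knowledge graph may represent an entity or may represent an element or type of an event; obtaining, from the first knowledge graph, associated information related to the nodes in the personal knowledge graph; and expanding the personal knowledge graph with the associated information to obtain an expanded personal knowledge graph.
Therefore, in the implementations of this application, the first knowledge graph can be used to expand the personal knowledge graph. The data in the first knowledge graph does not depend on the user's input data, so the personal knowledge graph includes more information and more information can subsequently be queried from it.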
在一种可能的实施方式中,前述的通过文本处理模型输出输入文本对应的输出序列,可以包括:将输入文本作为文本处理模型的输入,输出初始序列,其中,文本处理模型用于执行以下步骤:对输入文本进行自然语言处理,得到特征向量序列和实体序列,实体序列包括至少一个词中每个词对应的向量表示,特征向量序列中包括输入文本对应的特征向量;获取实体序列中的向量对应的位置信息;融合位置信息和特征向量序列,得到融合序列;对融合序列对应的实体进行分类,得到标签序列,初始序列中包括每个词对应的向量表示以及标签序列。
因此,本申请实施方式中,可以由神经网络来将文本转换为向量表示,提取出输入文本中各个词的上下文信息以及词之间的关联关系,从而可以从输入文本中提取到准确的信息。
在一种可能的实施方式中,前述的方法还可以包括:从个人知识图谱中获取与输出序列匹配的至少一个节点的信息;根据至少一个节点的信息为目标用户生成推荐信息,推荐信息用于针对目标用户进行推荐。
本申请实施方式可以应用于推荐场景中,从而结合更细粒度的个人知识图谱,可以高效地检索到与用户输入文本相关的更精确的信息,从而可以针对用户实现更高效准确的推荐,提高用户体验。
在一种可能的实施方式中,前述的从个人知识图谱中获取与输出序列匹配的至少一个节点的信息,可以包括:从个人知识图谱中筛选出输出序列对应的至少一个第一节点的信息;从个人知识图谱中查找与至少一个第一节点关联的至少一个第二节点的信息,至少一个节点的信息包括至少一个第一节点的信息和至少一个第二节点的信息。本申请实施方式提供了从个人知识图谱中查询数据的具体方式。
在一种可能的实施方式中,第一节点的信息和第二节点的信息为不同域的信息。因此,本申请实施方式可以实现针对用户的跨域推荐,提高用户体验。
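In a possible implementation, each node in the personal knowledge graph has a corresponding weight, and the weight of each node is negatively correlated with its storage duration or update duration. Here "each node" is any node in the personal knowledge graph, the storage duration is how long the node's information has been stored, and the update duration is the time elapsed since the information in the node was last updated. Therefore, in the implementations of this application, user information can be recorded with decaying weights, which implements a memory of the user's knowledge.
In a possible implementation, generating the recommendation information for the target user according to the information about the at least one node includes: sorting the at least one node according to its corresponding weight; and generating the recommendation information according to the information about the at least one node and the order of the at least one node.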
因此,本申请实施方式中,可以基于权重来对推荐顺序进行排列,从而为用户推荐更有效的信息,提高用户体验。
在一种可能的实施方式中,前述的获取目标用户的输入文本,可以包括:获取用户输入数据,输入数据包括图像、文本或者语音中的至少一种数据;从输入数据中提取输入文本。
因此,本申请实施方式中,可以适应多种输入场景,泛化能力强,提高用户体验。
在一种可能的实施方式中,前述的方法还可以包括:获取目标用户的结构化数据,结构化数据为预设格式的数据;按照预设规则从结构化数据中提取至少一个事件的信息;根据至少一个事件的信息对个人知识图谱进行更新,得到更新后的个人知识图谱。
因此,本申请实施方式中,除了通过神经网络和句法分析的方式从输入文本中提取到信息,还可以从目标用户的结构化数据中提取到信息并更新个人知识图谱,从而可以通过更多的方式更新个人知识图谱,使个人知识图谱中可以包括更多信息。
第二方面,本申请还提供一种图形用户界面GUI,其特征在于,图形用户界面存储在电子设备中,电子设备包括显示屏、存储器、一个或多个处理器,一个或多个处理器用于执行存储在该存储器中的一个或多个计算机程序,图形用户界面包括:
响应于目标用户的输入操作生成个人知识图谱,显示该个人知识图谱,其中,该目标用户的输入文本中包括至少一个词,该至少一个词形成至少一个事件,该个人知识图谱中包括多个节点,该多个节点包括类型节点和要素节点,该类型节点用于表示该至少一个事件的类型,该要素节点用于表示该至少一个事件的要素,同一个事件中的类型节点和要素节点相关联,该个人知识图谱用于为该目标用户进行推荐。
在一种可能的实施方式中,该GUI还可以包括:显示权限请求,该权限请求用于指示是否使用该目标用户的输入文本获取该个人知识图谱。例如,可以通过用户的智能终端中安装的应用程序(application,APP)采集用户的输入信息,则可以在显示界面中显示是否允许采集各个APP中的输入数据,作为个人知识图谱的知识来源,从而提高用户的数据隐私安全性。
在一种可能的实施方式中,该GUI还可以包括:响应于从第一知识图谱中获取与该个人知识图谱中的节点相关联的关联信息,并使用该关联信息对该个人知识图谱进行扩充后得到扩充后的个人知识图谱,显示该扩充后的个人知识图谱,该第一知识图谱中包括多个节点,该多个节点包括至少一种实体的信息,该第一个人知识图谱中的节点可以表示一种实体,或者,可以表示事件的要素或者类型。
在一种可能的实施方式中,该GUI还可以包括:显示第一知识图谱。
在一种可能的实施方式中,该GUI还可以包括:响应于根据该个人知识图谱中获取到的至少一个节点的信息为该目标用户生成推荐信息,显示该推荐信息,该推荐信息用于针对该目标用户进行推荐。
在一种可能的实施方式中,该个人知识图谱中每个节点包括对应的权重,该至少一个节点按照对应的权重进行排序,该GUI还可以包括:响应于根据该至少一个节点的信息以及该至少一个节点的排序生成该推荐信息,显示该推荐信息。
在一种可能的实施方式中,该GUI还可以包括:响应于该目标用户针对第一输入界面的输入操作,显示输入文本,该输入文本为从该目标用户的输入数据中提取得到,该输入数据包括图像、文本或者语音中的至少一种数据。
在一种可能的实施方式中,该GUI还可以包括:响应于用户针对第二输入界面的输入操作,并根据获取到的结构化数据更新该个人知识图谱,显示更新后的该个人知识图谱, 该结构化数据为预设格式的数据。
第三方面,本申请提供一种信息获取装置,包括:
输入模块,用于获取目标用户的输入文本,输入文本中包括至少一个词,至少一个词形成至少一个事件;
文本处理模块,用于基于输入文本获取输出序列,输出序列中包括至少一个事件的类型和要素;
获取模块,用于根据输出序列获取个人知识图谱,个人知识图谱中包括多个节点,多个节点包括类型节点和要素节点,类型节点用于表示至少一个事件的类型,要素节点用于表示至少一个事件的要素,同一个事件中的类型节点和要素节点相关联,个人知识图谱用于为目标用户进行推荐。
在一种可能的实施方式中,若输出序列中还包括至少一个事件的要素之间的关联关系,则个人知识图谱中同一事件具有关联关系的要素对应的要素节点之间相关联;若输出序列中还包括情感类别,则个人知识图谱中同一事件对应的要素节点之间通过情感类别相关联。
在一种可能的实施方式中,输出序列中可以包括第一事件的类型要要素,第一事件即前述的至少一个事件中的任意一个事件,获取模块,具体用于:若初始知识图谱中包括第一事件的信息,则更新初始知识图谱中包括的第一事件对应的要素节点以及要素节点之间的关联关系,得到个人知识图谱;若初始知识图谱中不包括第一事件的信息,则在初始知识图谱中增加第一事件的类型节点和要素节点,并将第一事件的类型节点和要素节点进行关联,得到个人知识图谱。
在一种可能的实施方式中,文本处理模块,具体用于:通过文本处理模型得到输入文本对应的初始序列,初始序列中包括输入文本中的至少一个词的向量表示以及至少一个词对应的第一类别标签;对输入文本进行句法分析,得到特征序列,特征序列包括至少一个词对应的第二类别标签;结合初始序列和特征序列得到输出序列,输出序列中包括至少一个事件的要素和类型。
在一种可能的实施方式中,文本处理模块,具体用于:对初始序列中与特征序列不匹配的部分进行修正,得到输出序列。
在一种可能的实施方式中,文本处理模块,还用于:若特征序列中每个词对应多种第二类别标签,则为每个词确定唯一的第二类别标签,得到更新后的特征序列。
在一种可能的实施方式中,文本处理模块,具体用于:根据输入文本,通过文本处理模型,得到初始序列,其中,文本处理模型用于执行以下步骤:对输入文本进行自然语言处理,得到特征向量序列和实体序列,实体序列包括至少一个词中每个词对应的向量表示,特征向量序列中包括输入文本对应的特征向量;获取实体序列中的向量对应的位置信息;融合位置信息和特征向量序列,得到融合序列;对融合序列对应的实体进行分类,得到标签序列,初始序列中包括每个词对应的向量表示以及标签序列。
在一种可能的实施方式中,该装置还包括,扩充模块,用于:获取第一知识图谱,第一知识图谱中包括多个节点,该多个节点包括至少一种实体的信息,该第一个人知识图谱中的节点可以表示一种实体,或者,可以表示事件的要素或者类型;从第一知识图谱中获 取与个人知识图谱中的节点相关联的关联信息;使用关联信息对个人知识图谱进行扩充,得到扩充后的个人知识图谱。
在一种可能的实施方式中,装置还包括,推荐模块,用于:从个人知识图谱中获取与输出序列匹配的至少一个节点的信息;根据至少一个节点的信息为目标用户生成推荐信息,推荐信息用于针对目标用户进行推荐。
在一种可能的实施方式中,推荐模块,具体用于:从个人知识图谱中筛选出输出序列对应的至少一个第一节点的信息;从个人知识图谱中查找与至少一个第一节点关联的至少一个第二节点的信息,至少一个节点的信息包括至少一个第一节点的信息和至少一个第二节点的信息。
在一种可能的实施方式中,第一节点的信息和第二节点的信息为不同域的信息。
在一种可能的实施方式中,个人知识图谱中每个节点包括对应的权重,每个节点的权重与保存时长或者更新时长呈负相关关系,保存时长为保存每个节点的信息的时长,更新时长为距离上一次更新每个节点中包括的信息的时长。
在一种可能的实施方式中,推荐模块,具体用于:根据至少一个节点对应的权重,对至少一个节点进行排序;根据至少一个节点的信息以及至少一个节点的排序生成推荐信息。
在一种可能的实施方式中,输入模块,具体用于:获取用户输入数据,输入数据包括图像、文本或者语音中的至少一种数据;从输入数据中提取输入文本。
在一种可能的实施方式中,
输入模块,还用于获取目标用户的结构化数据,结构化数据为预设格式的数据;
获取模块,还用于按照预设规则从结构化数据中提取至少一个事件的信息;
获取模块,还用于根据至少一个事件的信息对个人知识图谱进行更新,得到更新后的个人知识图谱。
第四方面,本申请实施例提供一种信息获取装置,包括:处理器和存储器,其中,处理器和存储器通过线路互联,处理器调用存储器中的程序代码用于执行上述第一方面任一项所示的信息获取方法中与处理相关的功能。
第五方面,本申请实施例提供一种电子设备,包括:处理器和存储器,其中,处理器和存储器通过线路互联,处理器调用存储器中的程序代码用于执行上述第一方面任一项所示的信息获取方法中与处理相关的功能。
第六方面,本申请实施例提供了一种信息获取装置,该信息获取装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行如上述第一方面或第一方面任一可选实施方式中与处理相关的功能。
第七方面,本申请实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面以及第一方面任一可选实施方式中的方法。
第八方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面以及第一方面任一可选实施方式中的方法。
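FIG. 1 is a schematic diagram of a main framework of artificial intelligence to which this application applies;
FIG. 2 is a schematic diagram of a system architecture provided in this application;
FIG. 3 is a schematic structural diagram of a convolutional neural network provided in an embodiment of this application;
FIG. 4 is a schematic flowchart of an information acquisition method provided in this application;
FIG. 5 is a schematic flowchart of another information acquisition method provided in this application;
FIG. 6 is a schematic flowchart of another information acquisition method provided in this application;
FIG. 7 is a schematic flowchart of execution by a neural network provided in this application;
FIG. 8 is a schematic flowchart of another information acquisition method provided in this application;
FIG. 9 is a schematic diagram of event records provided in this application;
FIG. 10 is a schematic flowchart of updating a PKG provided in this application;
FIG. 11 is a schematic flowchart of setting weights for nodes provided in this application;
FIG. 12 is a schematic flowchart of PKG expansion provided in this application;
FIG. 13 is a schematic diagram of an application scenario of the information acquisition method provided in this application;
FIG. 14 is a schematic flowchart of a recommendation rule of the information acquisition method provided in this application;
FIG. 15 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 16 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 17 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 18 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 19 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 20 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 21 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 22 is a schematic diagram of another application scenario of the information acquisition method provided in this application;
FIG. 23 is a schematic diagram of an architecture for deploying the information acquisition method in a terminal provided in this application;
FIG. 24 is a schematic flowchart of another information acquisition method provided in this application;
FIG. 25 is a schematic structural diagram of a PKG provided in this application;
FIG. 26 is a schematic structural diagram of an information acquisition apparatus provided in this application;
FIG. 27 is a schematic structural diagram of another information acquisition apparatus provided in this application;
FIG. 28 is a schematic structural diagram of an electronic device provided in this application;
FIG. 29 is a schematic structural diagram of a chip provided in this application.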
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请提供的信息获取方法可以应用于人工智能(artificial intelligence,AI)场 景中。AI是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。
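The overall workflow of an artificial intelligence system is described first. Referring to FIG. 1, which shows a schematic structural diagram of the main framework of artificial intelligence, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data, information, knowledge, wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementation) to the industrial ecology of the system.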
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、 智能交通、智能医疗、自动驾驶、智慧城市等。
本申请实施例涉及了神经网络和自然语言处理(natural language processing,NLP)的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
语料(Corpus):也称为自由文本,其可以是字、词语、句子、片段、文章及其任意组合。例如,“今天天气真好”即为一段语料。
实体:语料中存在的对象,如一段语料“小明出去遛狗了”,则其中可以包括实体:“小明”、“狗”。且每种实体具有相应的一种或者多种类别,如“小明”的类别标签为“人”,“狗”的类别标签为“动物”。
自注意力模型(self-attention model),是指将一个序列数据(如自然语料“你的手机很不错。”)有效编码成为若干多维的向量,方便进行数值运算,该多维向量融合了序列中每个元素的相互之间的相似度信息,该相似度被称为自注意力。
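Loss function: also called a cost function, a measure of the difference between a machine learning model's predicted output for a sample and the sample's true value (also called the supervised value). Common loss functions include mean squared error, cross-entropy, logarithmic, and exponential losses. For example, the mean squared error can be used as the loss function, defined as $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $y_i$ is the true value of the $i$-th sample, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of samples.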
具体可以根据实际应用场景选择具体的损失函数。
梯度:损失函数关于参数的导数向量。
随机梯度:机器学习中样本数量很大,所以每次计算的损失函数都由随机采样得到的数据计算,相应的梯度称作随机梯度。
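Back propagation (BP): an algorithm that computes the gradients of the model parameters from the loss function and updates the model parameters. A neural network can use the error back propagation algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, propagating the input signal forward to the output produces an error loss, and the parameters of the initial model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.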
神经机器翻译(neural machine translation):神经机器翻译是自然语言处理的一个典型任务。该任务是给定一个源语言的句子,输出其对应的目标语言句子的技术。在常用的神经机器翻译模型中,源语言和目标语言的句子中的词均会编码成为向量表示,在向量空间进行计算词与词以及句子与句子之间的关联,从而进行翻译任务。
预训练语言模型(pre-trained language model,PLM):是一种自然语言序列编码器,将自然语言序列中的每个词进行编码为一个向量表示,从而进行预测任务。PLM的训练包含两个阶段,即预训练(pre-training)阶段和微调(finetuning)阶段。在预训练阶段,该模型在大规模无监督文本上进行语言模型任务的训练,从而学习到词表示方式。在微调阶段,该模型利用预训练阶段学到的参数做初始化,在文本分类(text classification)或序列标注(sequence labeling)等下游任务(Downstream Task)上进行较少步骤的训练,就可以成功把预训练得到的语义信息成功迁移到下游任务上来。
Embedding:指样本的特征表示。
BiLSTM+CRF:是基于神经网络的命名实体识别模型,它是一种基于词嵌入和字嵌入的模型。BiLSTM和CRF是命名实体识别模型中两个不同的层。
Sigmoid多标签分类模型:一个样本的标签不仅仅局限于一个类别,可以具有多个类别,不同类之间是有关联的。比如一件衣服,其具有的特征类别有长袖、蕾丝等属性等,这两个属性标签不是互斥的,而是有关联的。
Schemas:一种数据格式,用于限定待加入知识图谱的数据的格式;相当于某个领域内的数据模型,包含了该领域内有意义的概念类型以及这些类型的属性。其作用主要是用来规范结构化数据的表达,一条数据必须满足Schema预先定义好的实体对象及其类型,才被允许更新到知识图谱中。
Elasticsearch:是一种分布式、高扩展、高实时的搜索与数据分析引擎。可以很方便的使大量数据具有搜索、分析和探索的能力。充分利用Elasticsearch的水平伸缩性,能使数据在生产环境变得更有价值。Elasticsearch的实现原理主要分为以下几个步骤,首先用户将数据提交到Elasticsearch数据库中,再通过分词控制器去将对应的语句分词,将其权重和分词结果一并存入数据,当用户搜索数据时候,再根据权重将结果排名,打分,再将返回结果呈现给用户。
Transformers库:提供用于自然语言理解(natural language understanding,NLU)或自然语言生成(natural language generation,NLG)等的模型,如BERT(bidirectional encoder representations from transformers),GPT-2,RoBERTa,XLM,DistilBert,XLNet,CTRL等,拥有多预训练模型,支持多种语言。
本申请实施例提供的自然语言处理方法可以在服务器上被执行,还可以在终端设备上被执行。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。
参见附图2,本申请实施例提供了一种系统架构200。该系统架构中包括数据库230、客户设备240。数据采集设备260用于采集数据并存入数据库230,训练模块202基于数据库230中维护的数据生成目标模型/规则201。下面将更详细地描述训练模块202如何基于数据得到目标模型/规则201,目标模型/规则201即本申请以下实施方式中所提及的神经网络,具体参阅以下图4A-图12中的相关描述。
计算模块可以包括训练模块202,训练模块202得到的目标模型/规则可以应用不同的系统或设备中。在附图2中,执行设备210配置收发器212,该收发器212可以是无线收发器、光收发器或有线接口(如I/O接口)等,与外部设备进行数据交互,“用户”可以通过客户设备240向收发器212输入数据,例如,客户设备240可以向执行设备210发送目标任务,请求执行设备训练神经网络,并向执行设备210发送用于训练的数据库。
执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等 存入数据存储系统250中。
计算模块211使用目标模型/规则201对输入的数据进行处理。具体地,计算模块211用于:获取目标用户的输入文本,输入文本中包括至少一个词,至少一个词形成至少一个事件;基于输入文本获取输出序列,输出序列中包括至少一个事件的类型和要素;根据输出序列获取个人知识图谱,个人知识图谱中包括多个节点,多个节点包括类型节点和要素节点,类型节点用于表示至少一个事件的类型,要素节点用于表示至少一个事件的要素,同一个事件中的类型对应的类型节点和要素对应的要素节点相关联,即表示同一个事件中的类型节点和要素节点相关联,个人知识图谱用于为目标用户进行推荐。
最后,收发器212将构建得到的神经网络返回给客户设备240,以在客户设备240或者其他设备中部署该神经网络。
更深层地,训练模块202可以针对不同的任务,基于不同的数据得到相应的目标模型/规则201,以给用户提供更佳的结果。
在附图2中所示情况下,可以根据用户的输入数据确定输入执行设备210中的数据,例如,用户可以在收发器212提供的界面中操作。另一种情况下,客户设备240可以自动地向收发器212输入数据并获得结果,若客户设备240自动输入数据需要获得用户的授权,用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备210输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端将采集到与目标任务关联的数据存入数据库230。
在本申请所提及的训练或者更新过程可以由训练模块202来执行。可以理解的是,神经网络的训练过程即学习控制空间变换的方式,更具体即学习权重矩阵。训练神经网络的目的是使神经网络的输出尽可能接近期望值,因此可以通过比较当前网络的预测值和期望值,再根据两者之间的差异情况来更新神经网络中的每一层神经网络的权重向量(当然,在第一次更新之前通常可以先对权重向量进行初始化,即为深度神经网络中的各层预先配置参数)。例如,如果网络的预测值过高,则调整权重矩阵中的权重的值从而降低预测值,经过不断的调整,直到神经网络输出的值接近期望值或者等于期望值。具体地,可以通过损失函数(loss function)或目标函数(objective function)来衡量神经网络的预测值和期望值之间的差异。以损失函数举例,损失函数的输出值(loss)越高表示差异越大,神经网络的训练可以理解为尽可能缩小loss的过程。
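As shown in FIG. 2, the target model/rule 201 is obtained through training by the training module 202. In the embodiments of this application, the target model/rule 201 may be the self-attention model of this application, which may include networks such as deep convolutional neural networks (DCNN) and recurrent neural networks (RNN). The neural networks mentioned in this application may be of multiple types, such as a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a residual network, or other neural networks.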
其中,在训练阶段,数据库230可以用于存储有用于训练的样本集。执行设备210生成用于处理样本的目标模型/规则201,并利用数据库中的样本集合对目标模型/规则201进行迭代训练,得到成熟的目标模型/规则201,该目标模型/规则201具体表现为神经网 络。执行设备210得到的神经网络可以应用不同的系统或设备中。
在推理阶段,执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等存入数据存储系统250中。数据存储系统250可以置于执行设备210中,也可以为数据存储系统250相对执行设备210是外部存储器。计算模块211可以通过神经网络对执行设备210获取到的样本进行处理,得到预测结果,预测结果的具体表现形式与神经网络的功能相关。
需要说明的是,附图2仅是本申请实施例提供的一种系统架构的示例性的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如,在附图2中,数据存储系统250相对执行设备210是外部存储器,在其它场景中,也可以将数据存储系统250置于执行设备210中。
根据训练模块202训练得到的目标模型/规则201可以应用于不同的系统或设备中,如应用于手机,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端设备等。
该目标模型/规则201在本申请实施例中可以是本申请中的自注意力模型,具体的,本申请实施例提供的自注意力模型可以包括CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)等等网络。
参见附图3,本申请实施例还提供了一种系统架构300。执行设备210由一个或多个服务器实现,可选的,与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备;执行设备210可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备210可以使用数据存储系统250中的数据,或者调用数据存储系统250中的程序代码实现本申请以下图4-图25对应的信息获取方法的步骤。
用户可以操作各自的用户设备(例如本地设备301和本地设备302)与执行设备210进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备210进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。具体地,该通信网络可以包括无线网络、有线网络或者无线网络与有线网络的组合等。该无线网络包括但不限于:第五代移动通信技术(5th-Generation,5G)系统,长期演进(long term evolution,LTE)系统、全球移动通信系统(global system for mobile communication,GSM)或码分多址(code division multiple access,CDMA)网络、宽带码分多址(wideband code division multiple access,WCDMA)网络、无线保真(wireless fidelity,WiFi)、蓝牙(bluetooth)、紫蜂协议(zigbee)、射频识别技术(radio frequency identification,RFID)、远程(long range,Lora)无线通信、近距离无线通信(near field communication,NFC)中的任意一种或多种的组合。该有线网络可以包括光纤通信网络或同轴电缆组成的网络等。
在另一种实现中,执行设备210的一个方面或多个方面可以由每个本地设备实现,例 如,本地设备301可以为执行设备210提供本地数据或反馈计算结果。
需要注意的,执行设备210的所有功能也可以由本地设备实现。例如,本地设备301实现执行设备210的功能并为自己的用户提供服务,或者为本地设备302的用户提供服务。
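Usually, a user's characteristics can be represented by a user profile, which can be divided into a basic profile and a preference profile. The basic profile can produce labels from basic facts, for example by simple extraction of the registration time, channel source, and user region, or from labels predicted for user attributes by a machine learning model, such as gender, age, or car ownership (a reasonably accurate model is trained on a labeled data set of user features and labels, and the trained model is then used to score users whose gender and age are unknown). The preference profile depends on item labels. A user's degree of preference for an item is usually calculated from behaviors such as exposure to, clicks on, and purchases of items on the platform. However, building a user profile takes into account the profiles of the items related to the user, that is, item labels obtained manually or through machine learning. By analyzing the user's behavior data, such as browsing, bookmarking, and sharing, the user is tagged with the labels of the corresponding items. The quality and granularity of the user profile therefore depend to some extent on the item profiles. When the item profiles are biased or uneven in granularity, the quality of the user profile declines.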
因此,本申请提供一种信息获取方法,结合了神经网络和符号分析来抽取用户的信息,并构建用户的个人知识图谱,通过更细粒度的个人知识图谱来保存更准确更详细的用户信息。本申请提供的方法具体可以包括:获取目标用户的输入文本,输入文本中包括至少一个词,该至少一个词形成至少一个事件;随后基于输入文本获取输出序列,该输出序列中包括至少一个事件中每个事件的类型和要素,获取输出序列的方式可以包括多种,可以通过句法分析的方式分析输入文本中所包括的事件的类型和要素,也可以通过神经网络来输出输入文本中包括的事件的类型和要素等;根据输出序列获取个人知识图谱,个人知识图谱中包括多个节点,该多个节点包括类型节点和要素节点,类型节点用于表示至少一个事件的类型,要素节点用于表示至少一个事件的要素,同一个事件中的类型对应的类型节点和要素对应的要素节点相关联,即同一个事件中的类型节点和要素节点相关联,该个人知识图谱用于为目标用户进行推荐。
因此,本申请实施方式中,以事件为单位对目标用户产生的事件的类型和要素进行准确地抽取,并构建了知识图谱,从而可以更方便准确地保存目标用户的各个事件,对目标用户的相关知识进行更准确地记录。因此在后续针对目标用户进行推荐时,可以以事件为单位准确查询到准确的信息,且通过节点之间的关联关系,准确查询到完整的事件,提高数据查询的准确性,提高推荐有效性。
下面对本申请提供的信息获取方法进行详细介绍。
参阅图4,本申请提供的一种信息获取方法的流程示意图,如下所述。
401、获取目标用户的输入文本。
其中,该输入文本可以是根据目标用户输入的数据得到。
具体地,可以获取目标用户的输入的数据,然后从输入数据中提取到输入文本。获取目标用户的输入数据的方式可以包括多种,具体可以获取用户通过终端界面输入的数据,可以从其他设备接收用户输入的数据,或者从历史数据中查询用户的历史输入数据等。
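For example, one or more of image, speech, or text data input by the user can be received, and the input data is then recognized to extract the input text. If the data input by the user is an image, the image can be recognized and text extracted from it. If the data is speech, speech recognition can be performed to extract text from the speech data. If the data is text, the text can be used directly as the input text, or the input text can be translated and the translation used as the input text. The method provided in this application therefore suits a variety of input modes, can be applied in more scenarios, and generalizes well.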
402、通过文本处理模型得到输入文本的初始序列。
其中,文本处理模型用于从输入文本中提取信息,并通过向量的形式输出提取到的信息,得到初始序列。
具体地,该文本处理模型可以用于从输入文本中提取实体以及实体对应的分类标签等,得到初始序列。即初始序列中可以包括从输入文本中提取到的实体的信息、实体对应的分类标签或者实体之间的关联关系等。如输入文本中可以包括一个或者多个实体,当存在多个实体时,该多个实体可以形成一个或者多个事件,可以通过文本处理模型提取输入文本中的各个实体的向量表示,各个实体的上下文含义或者各个实体之间的关联关系等。
在一种可能的实施方式中,该初始序列中可以包括实体序列和标签序列,文本处理模型具体执行的步骤可以包括:对输入文本进行自然语言处理,得到特征向量序列和实体序列,实体序列包括至少一个词中每个词对应的向量表示,特征向量序列中包括输入文本对应的特征向量;获取实体序列中的向量对应的位置信息;融合位置信息和特征向量序列,得到融合序列;对融合序列对应的实体进行分类,得到标签序列。因此,本申请实施方式中,可以通过神经网络抽取输入文本中的实体以及实体所表示的含义,从而可以高效快速地从输入文本中抽取到信息。
该文本处理模型可以包括一种或者多种用于从文本中提取信息的模型。例如,该文本处理模型可以包括预训练语言模型,如pretrain bert、自注意力模型等用于将文本转换为向量表示的模型,还可以包括BiLSTM+CRF模型、Sigmoid模型等对向量表示进行进一步处理的模型等,从而可以从文本中提取到可用的信息。
403、对输入文本进行句法分析,得到特征序列。
其中,除了通过神经网络来抽取输入文本所包括的信息,还可以通过对输入文本进行句法分析,从而从输入文本中提取到特征序列,该特征序列可以包括输入文本中所包括的实体以及实体之间的关联关系等。
例如,输入文本可以是“小红在买苹果”,可以通过句法分析,从输入文本中提取到实体为“小红”、“苹果”,实体之间的关系为“买”,时间为“现在”,并可以进一步确定每个实体所表示的实际含义(或者称为类别),如“小红”表示人物,“苹果”表示水果或者手机等。
可以理解为,除了通过神经网络来提取输入文本所包括的信息之外,还可以通过对输入文本进行句法分析,来分析得到输入文本中所包括的实体的信息。从而使后续可以结合两种方式得到的信息来最终得到更准确的信息,从而可以实现从输入文本中抽取得到更准确的信息。
需要说明的是,本申请对步骤402和步骤403的执行顺序不作限定,可以先执行步骤402,也可以先执行步骤403,还可以同时执行步骤402和步骤403,具体可以根据实际应用场景进行调整,本申请对此不作限定。
此外,在通过句法分析得到输入文本中每个词对应的特征之后,每个实体可能对应一种或者多种特征,可以根据预先设定的格式为每个词对应的特征添加附加信息,用于标识出每个词或者每个实体所表示的唯一含义,得到更新后的特征序列。例如,若实体包括“苹果”,则可以通过增加附加信息的方式,限定该实体的具体类型是水果还是手机,如在特征序列中增加“手机”,用于表示“苹果”是一种“手机”,从而可以更准确地确定每个实体所表示的唯一含义。
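In addition, if a personal knowledge graph already exists, the qualifying feature of an entity can be looked up by combining the preset format with the initial personal knowledge graph. For example, if the input text is "Xiaohong is eating an apple", the preset grammar format can be combined with a lookup in the personal knowledge graph to determine that "apple" here denotes a fruit rather than a device, and the entity "apple" is given the additional feature of category "fruit".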
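404. Obtain a personal knowledge graph according to the feature sequence and the initial sequence.
After the feature sequence and the initial sequence are obtained, an initial knowledge graph can be updated, or a personal knowledge graph generated, according to them. The personal knowledge graph may include one or more nodes. Each node may contain information extracted from the data input by the target user, such as an event type or an event element extracted from the input text, and nodes that are associated are connected to each other. The personal knowledge graph can represent the characteristics of the target user or record information related to the target user, such as the target user's own information or the information the target user has input.
Specifically, the personal knowledge graph may include a plurality of nodes, divided into type nodes and element nodes. A type node represents the type of an event, an element node represents an element of an event, and the type node and element nodes of the same event are associated. For example, if the input text is "Xiaohong plans to watch a movie tomorrow", the entities "Xiaohong" and "movie" and the time "tomorrow" can be extracted. The time and the entities are the event elements, and the event type is "entertainment". A type node "entertainment" and element nodes "Xiaohong", "movie", and "tomorrow" can therefore be created, with the type node and element nodes of the same event associated with each other.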
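The node layout described above can be illustrated with a minimal sketch. The class name `PersonalKnowledgeGraph`, the `(kind, label)` node keys, and the undirected edge set are assumptions made for illustration; the application does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalKnowledgeGraph:
    """Toy PKG (hypothetical layout): nodes are (kind, label) pairs."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_event(self, event_type, elements):
        """Add one type node, one element node per element, and link them."""
        type_node = ("type", event_type)
        self.nodes.add(type_node)
        for label in elements:
            element_node = ("element", label)
            self.nodes.add(element_node)
            # Type node and element nodes of the same event are associated.
            self.edges.add(frozenset((type_node, element_node)))
        return type_node


# "Xiaohong plans to watch a movie tomorrow"
pkg = PersonalKnowledgeGraph()
pkg.add_event("entertainment", ["Xiaohong", "movie", "tomorrow"])
```

In this simplified layout two events of the same type would share one type node; a fuller design would probably key events individually.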
因此,在本申请实施方式中,以事件为单位对目标用户产生的事件的类型和要素进行准确地抽取,并构建了知识图谱,从而可以更方便准确地保存目标用户的各个事件,对目标用户的相关知识进行更准确地记录。因此在后续针对目标用户进行推荐时,可以以事件为单位准确查询到准确的信息,且通过节点之间的关联关系,准确查询到完整的事件,提高数据查询的准确性,提高推荐有效性。并且,结合了神经网络和句法分析来从输入文本中提取到更准确的信息,然后使用从输入文本中提取到的更准确的信息来生成或者更新得到目标用户的个人知识图谱,从而使个人知识图谱可以更准确地体现用户的特征,从而可以使后续可以使用该个人知识图谱为目标用户进行更准确的推荐。并且,本申请实施方式中,针对用户构建了个人知识图谱,该个人知识图谱可以基于从输入文本中提取到的实体来构建或者更新得到,相对于用户画像,本申请实施方式中构建得到的个人知识图谱的粒度更小,可以对用户的信息进行更精确的记录,提高对用户的描述的精确程度。且通过节 点的方式可以更高效地进行检索,从而可以更高效地为用户进行推荐。
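In a possible implementation, obtaining the personal knowledge graph may specifically include: correcting the initial sequence according to the feature sequence to obtain the output sequence; and obtaining the personal knowledge graph according to the output sequence. Specifically, the information in the feature sequence can be matched against the information in the initial sequence. If the two do not match, the mismatched part of the initial sequence can be corrected, for example by replacing it with the corresponding part of the feature sequence, or by fusing it with the corresponding part of the feature sequence and replacing it with the fused result, to obtain the output sequence.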
因此,本申请实施方式中,可以使用特征序列对输出序列进行修正,从而可以结合多种方式从输入文本中抽取到的信息,来得到更准确的信息,并使用更准确的信息获取个人知识图谱,从而得到能更准确描述目标用户的个人知识图谱。
在一种可能的实施方式中,输出序列包括至少一个词之间的关联关系,该至少一个词形成至少一个事件,该至少一个词中包括至少一个事件中的元素。进一步地,可以以事件为单位来构建个人知识图谱。具体地,可以从输出序列中获取至少一个事件的类型,如日程事件类、关注事件类;随后可以根据至少一个事件中每个事件的类型从修正后的实体序列中获取每个事件的信息;然后使用每个事件的信息对初始知识图谱进行更新,得到个人知识图谱。
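In the implementations of this application, the personal knowledge graph can be generated or updated in units of events, so when information is later queried in the personal knowledge graph, the required information can be found quickly event by event, which improves query efficiency.
Obtaining the personal knowledge graph may specifically include the following, taking a first event as an example. If the initial knowledge graph includes information about the first event, the output sequence is used to update that information, for example by adding element nodes for the first event and connecting associated element nodes, to obtain the personal knowledge graph. If the initial knowledge graph does not include information about the first event, the information about the first event in the output sequence is added to the initial knowledge graph, for example by adding the type node and element nodes of the first event, connecting the element nodes to the type node, and optionally connecting associated element nodes, to obtain the personal knowledge graph.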
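The update-or-add branch can be sketched as follows. How an existing event is recognized is not specified in the text, so the `event_id` key, the dictionary layout, and the function name `upsert_event` are all hypothetical.

```python
def upsert_event(graph, event_id, event_type, elements, relations=()):
    """Update the event if the graph already holds it, otherwise add it.

    graph: dict mapping a (hypothetical) event_id to
           {"type": str, "elements": set, "relations": set}.
    """
    event = graph.get(event_id)
    if event is None:
        # New event: create its type node and element nodes together.
        graph[event_id] = {
            "type": event_type,
            "elements": set(elements),
            "relations": set(relations),
        }
        return "added"
    # Known event: add missing element nodes and element-to-element links.
    event["elements"].update(elements)
    event["relations"].update(relations)
    return "updated"


initial_graph = {}
first = upsert_event(initial_graph, "e1", "entertainment", ["Xiaohong", "movie"])
second = upsert_event(
    initial_graph, "e1", "entertainment", ["tomorrow"],
    relations=[("Xiaohong", "watch", "movie")],
)
```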
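Specifically, the elements of each event and the association relationships between them can be obtained from the entity sequence, and the element nodes are then connected according to those relationships; alternatively, the features of each event and the corresponding emotion category are obtained from the entity sequence. In other words, if the output sequence also includes association relationships between the elements of each event, the element nodes corresponding to associated elements of the same event are associated in the personal knowledge graph; if the output sequence also includes an emotion category, the element nodes corresponding to the same event are associated through the emotion category.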
因此,可以根据不同的事件类型,获取不同的事件相关信息,适应更多的场景,泛化能力强。
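In addition, in a possible implementation, a first knowledge graph can be used to expand the target user's personal knowledge graph. Specifically, a first knowledge graph is obtained, which includes a plurality of nodes, each associated with at least one other node. A node in the first knowledge graph may represent an entity, or may represent an element or type of an event, and associated entities are connected. Associated information related to the nodes in the personal knowledge graph can be obtained from the first knowledge graph and used to expand the personal knowledge graph, yielding an expanded personal knowledge graph. For example, the first knowledge graph can be searched for a node that is the same as an entity in the personal knowledge graph, the information of the nodes associated with that node can be retrieved from the first knowledge graph, and that information can be used to expand the personal knowledge graph.
Optionally, the first knowledge graph may be a common knowledge graph or the knowledge graph of another user, so that several kinds of graph can be used to expand the content of the target user's personal knowledge graph. For example, when the first knowledge graph is a common knowledge graph, each of its nodes may represent an entity; when the first knowledge graph includes another user's personal knowledge graph, each of its nodes may represent an element or type of an event.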
因此,本申请实施方式中,可以使用第一知识图谱来扩充个人知识图谱,使个人知识图谱所包括的信息更多,以便于后续可以在个人知识图谱中查询到更多信息。
在一种可能的实施方式中,在得到输出序列之后,还可以从个人知识图谱中查询与该输出序列匹配的至少一个节点的信息,然后根据该至少一个节点的信息为目标用户生成推荐信息,然后基于该推荐信息进行推荐。
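Specifically, information about at least one first node corresponding to the output sequence can be selected from the personal knowledge graph, and information about at least one second node associated with the at least one first node can then be looked up in the personal knowledge graph. The information about the at least one node includes the information about the at least one first node and the information about the at least one second node. In addition, information about a third node associated with a second node, or about a fourth node associated with a third node, and so on, can also be looked up. The query depth can be adjusted according to the actual application scenario and is not limited in this application.
The first node and the second node may contain information from different domains, meaning that the entities they contain belong to different fields. For example, the first node may contain information about a piece of music, and the second node may contain information about a TV series related to that music.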
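The first-node and second-node lookup can be read as a breadth-first traversal with an adjustable depth. The sketch below assumes an adjacency-list representation and a hypothetical function name `query_nodes`; the node labels are placeholders.

```python
def query_nodes(adjacency, matched, depth=1):
    """Return matched (first) nodes plus nodes reachable within `depth` hops."""
    first_nodes = [n for n in matched if n in adjacency]
    seen = set(first_nodes)
    frontier = list(first_nodes)
    result = list(first_nodes)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        result.extend(next_frontier)
        frontier = next_frontier
    return result


# A music node linked to a TV-series node from a different domain.
adjacency = {
    "song A": ["TV series B"],
    "TV series B": ["song A", "actor C"],
    "actor C": ["TV series B"],
}
one_hop = query_nodes(adjacency, ["song A", "unknown"], depth=1)
two_hop = query_nodes(adjacency, ["song A"], depth=2)
```

With `depth=1` only second nodes are returned; `depth=2` also reaches third nodes, which corresponds to the adjustable query depth mentioned above.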
因此,本申请实施方式中,可以通过图谱的方式来表征用户,从而在查询与用户的输入文本相关的节点时,可以通过节点之间的关联关系,高效地查询出与用户的输入文本相关的信息。
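In a possible implementation, each node in the personal knowledge graph has a corresponding weight. The weight of any node (called the fifth node for ease of distinction) is negatively correlated with its storage duration or update duration. The storage duration is how long the information of the fifth node has been stored, and the update duration is the time elapsed since the information in the fifth node was last updated; that is, the longer the storage duration or update duration of the fifth node, the smaller its weight. In the implementations of this application, user information can therefore be recorded with decaying weights, which implements a memory of the user's knowledge.
When the recommendation information is generated, the weight of each node can be taken into account. Specifically, the at least one node can be sorted according to its corresponding weight, and the recommendation information is generated according to the information about the at least one node and the order of the at least one node.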
此外,还可以获取目标用户的结构化数据,该结构化数据为预设格式的数据;按照预设规则从结构化数据中提取至少一个事件的信息;根据至少一个事件的信息对个人知识图谱进行更新,得到更新后的个人知识图谱。
因此,本申请实施方式中,除了通过神经网络和句法分析的方式从输入文本中提取到信息,还可以从目标用户的结构化数据中提取到信息并更新个人知识图谱,从而可以通过 更多的方式更新个人知识图谱,使个人知识图谱中可以包括更多信息。
此外,在一种可能的实施方式中,本申请提供的方法可以部署于终端或者云服务器中。当部署于云服务器中时,可以通过云平台为用户提供服务。因此,本申请实施方式中,以事件为组织结构,以不同类型的实体来表示、存储用户的不同行为和信息,从而构建出符合用户本人使用特征的个人知识图谱。结合获得的推荐类型、意图类型及节点权重进行推荐。个人知识图谱(Personal Knowledge graph,PKG)以事件为桥梁连接不同类型的实体,可以更灵活地设计路径,即使没有大量的用户行为数据或用户日志,也不影响推荐。这种方式很好地解决了使用用户画像时的冷启动问题。
前述对本申请提供的信息获取方法的流程进行了介绍,下面结合具体的应用场景,对本申请提供的信息获取方法进行进一步介绍。
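First, as shown in FIG. 5, the information acquisition method provided in this application can be divided into several parts, which may specifically include: information extraction 501, PKG construction 502, PKG output 503, and PKG-based recommendation 504.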
可以理解为,在信息抽取501步骤中,可以从用户的输入数据中抽取到准确的信息,随后可以使用该信息来构建PKG,并基于PKG来为用户推荐合适的实体。
此外,因以下详细实施例中涉及到界面显示,因此首先对本申请还提供的一种图形用户界面GUI进行介绍,该图形用户界面存储在电子设备中,该电子设备包括显示屏、存储器、一个或多个处理器,该一个或多个处理器用于执行存储在该存储器中的一个或多个计算机程序,该图形用户界面可以包括:
响应于目标用户的输入操作生成个人知识图谱,显示该个人知识图谱,其中,该目标用户的输入文本中包括至少一个词,该至少一个词形成至少一个事件,该个人知识图谱中包括多个节点,该多个节点包括类型节点和要素节点,该类型节点用于表示该至少一个事件的类型,该要素节点用于表示该至少一个事件的要素,同一个事件中的类型对应的类型节点和要素对应的要素节点相关联该个人知识图谱用于为该目标用户进行推荐。
在一种可能的实施方式中,该GUI还可以包括:显示权限请求,该权限请求用于指示是否使用该目标用户的输入文本获取该个人知识图谱。例如,可以通过用户的智能终端中安装的应用程序(application,APP)采集用户的输入信息,则可以在显示界面中显示是否允许采集各个APP中的输入数据,作为个人知识图谱的知识来源,从而提高用户的数据隐私安全性。
在一种可能的实施方式中,该GUI还可以包括:显示第一知识图谱,该第一知识图谱中包括多个节点,该多个节点包括至少一种实体的信息,该第一个人知识图谱中的节点可以表示一种实体,或者,可以表示事件的要素或者类型;响应于从该第一知识图谱中获取与该个人知识图谱中的节点相关联的关联信息,并使用该关联信息对该个人知识图谱进行扩充后得到扩充后的个人知识图谱,显示该扩充后的个人知识图谱。
在一种可能的实施方式中,该GUI还可以包括:响应于根据该个人知识图谱中获取到的至少一个节点的信息为该目标用户生成推荐信息,显示该推荐信息,该推荐信息用于针对该目标用户进行推荐。
在一种可能的实施方式中,该个人知识图谱中每个节点包括对应的权重,该至少一个 节点按照对应的权重进行排序,该GUI还可以包括:响应于根据该至少一个节点的信息以及该至少一个节点的排序生成该推荐信息,显示该推荐信息。
在一种可能的实施方式中,该GUI还可以包括:响应于该目标用户针对第一输入界面的输入操作,显示输入文本,该输入文本为从该目标用户的输入数据中提取得到,该输入数据包括图像、文本或者语音中的至少一种数据。
在一种可能的实施方式中,该GUI还可以包括:响应于用户针对第二输入界面的输入操作,并根据获取到的结构化数据更新该个人知识图谱,显示更新后的该个人知识图谱,该结构化数据为预设格式的数据。
下面结合本申请提供的GUI,分别对前述图5中所示出多个步骤进行介绍。
一、信息抽取
示例性地,信息抽取的流程可以如图6所示。
其中,信息抽取的流程可以包括多种方式,如图6中所示出的通过神经网络来抽取信息以及通过句法分析来抽取信息。
首先获取输入文本,该输入文本可以包括用户聊天输入、搜索输入、评论输入的数据等,还可以是从图像、语音或者视频等数据中识别出来的文本。
在得到输入文本之后,可以分别通过神经网络和句法分析来从输入文本中提取到信息,下面分别进行示例性介绍。
1、神经网络
其中,可以对神经网络进行训练,使神经网络来从输入的文本中提取到实体信息以及实体之间的关联关系等。如可以使用用户的日常聊天或者标注数据等先验来训练神经网络,然后通过神经网络来识别输入的文本的句子类别或者文本中各个词的上下文信息等。
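For example, as shown in FIG. 7, the pre-trained language model BERT is first used to extract features from the input text. The BERT output is divided into tokens (the sequence of feature vectors obtained by extracting features from the text word by word) and CLS (a vector containing the features of the whole input sentence). The token sequence is then fed into a BiLSTM+CRF model for a sequence labeling task. The entity position information extracted by sequence labeling is converted into a feature vector, added to the CLS feature vector, and input into a sigmoid model for multi-label classification. This finally yields the output sequence, which includes an entity sequence and the corresponding classification label sequence: the entity sequence carries the entity position information, and the label sequence may include the category of each entity.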
2、句法分析
如图6中所示,首先对输入文本进行语法分析,即对输入文本中的各个词的语法功能进行分析,从而得到输入文本对应的特征序列。例如,输入文本“我喜欢你”,其中“我”是主语,“喜欢”是谓语,“你”是宾语。
可以理解为,通过对输入文本进行句法分析,可以识别出输入文本中的各个词的语义特征以及词性特征。
通常,不同类型的语料包括的实体以及对应的词性可能不相同,因此可以通过句法分析,得到输入文本的词性标注(pos tag)、语义特征、不同字段的实体类别等符号特征。
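In addition, the PKG and a preset schema, that is, a preset grammar format, can be combined to determine the qualifying feature of each field and to add additional information to the corresponding field, yielding the feature sequence. For example, for the entity field "小雨" (literally "light rain", also usable as a personal name) in the input text "今天在下小雨" ("It is raining lightly today"), the part of speech could be a person's name, a weather type, or an item name. The content of the PKG and the preset schema can then be combined to determine that "小雨" is a weather type, and the part-of-speech feature "weather type" is added to the field so that the field has a unique part-of-speech feature.
The feature sequence obtained by syntactic analysis can then be matched against the output sequence of the neural network. If the two match, the output sequence can be used as the final information extraction result.
If the output sequence does not match the feature sequence, the feature sequence can be used to correct the output sequence, and the corrected output sequence is used as the final information extraction result.
Specifically, the information about each entity in the output sequence can be matched against the information about each field in the feature sequence, for example by matching part of speech, semantics, or the relationships between entities or fields. If part of the information in the output sequence does not match the corresponding information in the feature sequence, the mismatched information in the output sequence can be replaced with the corresponding information from the feature sequence. For example, if the input text includes the word "apple", and the output sequence classifies it as a fruit while the feature sequence assigns the field "apple" the part of speech "device", the classification label "fruit" in the output sequence can be replaced with "device", thereby correcting the output sequence.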
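The replacement-style correction can be sketched in a few lines. This is only an illustration under the assumption that both sequences are reduced to word-to-label mappings; `correct_sequence` is a hypothetical name, and the fusion variant mentioned in the text is not shown.

```python
def correct_sequence(model_labels, feature_labels):
    """Replace model labels that disagree with the syntactic-analysis labels.

    Both arguments map a word to its category label. Words that the
    syntactic analysis did not label are left unchanged.
    """
    corrected = dict(model_labels)  # do not mutate the model output
    changed = []
    for word, parsed_label in feature_labels.items():
        if word in corrected and corrected[word] != parsed_label:
            corrected[word] = parsed_label
            changed.append(word)
    return corrected, changed


model_output = {"Xiaohong": "person", "apple": "fruit"}
feature_sequence = {"apple": "device"}
output_sequence, changed = correct_sequence(model_output, feature_sequence)
```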
因此,本申请实施方式中,可以结合神经网络和句法分析来分别从输入文本中提取信息,并结合了两种方式提取到的信息来得到最终更准确的信息,提高信息抽取的准确率,可以解决长尾分布的问题。例如,使用频率前20%的实体占据了用户日常聊天中80%的实体,可以通过训练后的神经网络识别出大部分实体,对于使用频率较低的长尾实体则在“神经”的方法基础上加入了“符号”的方法进行修正,可以理解为通过句法分析对使用频率较低的长尾实体进行补全和修正,从而提升长尾实体的抽取准确率。
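In other words, this application builds the user's personal knowledge graph by combining syntactic analysis with a neural network. Following the idea of storing user data in the organizational form of a knowledge graph, the information from all apps and user operations is integrated into a single personal knowledge graph, whose structure is organized in units of event, attention, and communication nodes, which makes it easy to extract user information efficiently. At the same time, a neural network analyzes the text content, combining several techniques such as Bi-LSTM, CRF, and multi-label classification to extract the multi-faceted content the graph requires, providing a more efficient way to acquire the knowledge in the graph.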
此外,针对结构化数据,即预设格式的数据,可以按照预先设定的规则来从输入文本中抽取信息。结构化数据可以是用户在日历、通讯录、相册等预先设定了数据格式的应用中输入的数据。
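For example, the extraction flow may be as shown in FIG. 8. Taking the creation of a new contact as an example: the information source is first known to be the application "Contacts", so its intent (that is, the event type) can be understood as "communication", and the corresponding information has a specific template, such as name, contact details, and position. Entity recognition and relation extraction are performed under this template, that is, the entities in the structured data and the association relationships between them are identified, finally yielding an entity list and a relation list. Application scenarios such as creating a contact, building a calendar activity, and browsing an information feed have currently been implemented. For other structured scenarios, information can likewise be extracted according to the corresponding format, to obtain the entity information and the association relationships between entities.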
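As a rough sketch of template-driven extraction from structured data, the following maps a contact record to an entity list and a relation list. The `TEMPLATES` table, the field names, and the relation names are hypothetical and stand in for whatever preset rules an implementation defines.

```python
# Hypothetical templates: source app -> (event type, relation per field).
TEMPLATES = {
    "contacts": ("communication", {"phone": "contact_info", "title": "position"}),
}


def extract_structured(source_app, record):
    """Turn one structured record into an entity list and a relation list."""
    event_type, field_relations = TEMPLATES[source_app]
    subject = record["name"]
    entities = [(subject, "person")]
    relations = []
    for field_name, relation in field_relations.items():
        value = record.get(field_name)
        if value:  # skip fields the user left empty
            entities.append((value, relation))
            relations.append((subject, relation, value))
    return event_type, entities, relations


event_type, entities, relations = extract_structured(
    "contacts", {"name": "Zhang San", "phone": "555-0100", "title": ""}
)
```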
当然,针对结构化数据,也可以结合神经网络以及句法分析的方式来提取到输出序列,具体可以根据实际应用场景调整,本申请对此并不作限定。
二、PKG构建
在得到信息抽取的抽取结果,即输出序列之后,可以基于输出序列进行PKG构建,如在PKG中增加输出序列中包括的内容,或者更新PKG中与该输出序列对应的部分。
其中,在进行PKG构建时,可以分为多个部分,包括知识分析、知识生成、图谱构建或者图谱扩充等,下面分别进行介绍。
(一)知识分析
在进行知识分析的过程中,可以针对实体之间的关系连接、事件要素分析、情感分析或者时间处理等,下面分别进行介绍。
1、关系链接
通过前述的信息抽取步骤,可以得到输入文本中所包括的实体以及实体之间的关系类型。即可根据实体之间的关系类型,在PKG中构建实体之间的连接关系,从而实现节点之间的连接。
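For example, a preset category constraint rule can be combined with the entities and relation categories in the output sequence to convert a relation into a triple of the form <entity field 1, relation category, entity field 2>. In "relation: family", entity 1 and entity 2 should both be personal names or personal pronouns, whereas in "relation: director/author/screenwriter/producer/composer/lyricist", entity 1 is a personal name or personal pronoun and entity 2 should be a film or television work, book, song, or the like. Concrete examples are <Xiaohong, family, Xiaoming> and <Xiaoming, director, Red Sorghum>.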
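The type constraints on triples can be sketched as a lookup table that is checked before a triple is emitted. The table `RELATION_RULES` and the type names are illustrative assumptions, covering only two of the relation categories named above.

```python
PERSON = {"person", "pronoun"}
WORK = {"film", "book", "song"}
# Hypothetical constraint table: relation -> (allowed head types, allowed tail types).
RELATION_RULES = {
    "family": (PERSON, PERSON),
    "director": (PERSON, WORK),
}


def to_triple(head, relation, tail, entity_types):
    """Return (head, relation, tail) if the entity types satisfy the rule, else None."""
    rule = RELATION_RULES.get(relation)
    if rule is None:
        return None
    head_types, tail_types = rule
    if entity_types.get(head) in head_types and entity_types.get(tail) in tail_types:
        return (head, relation, tail)
    return None


types = {"Xiaohong": "person", "Xiaoming": "person", "Red Sorghum": "film"}
accepted = to_triple("Xiaoming", "director", "Red Sorghum", types)
rejected = to_triple("Red Sorghum", "family", "Xiaoming", types)
```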
此外,在通过的信息抽取步骤,得到输出序列之后,可以对事件类型进行识别。若事件的类型为日程事件类,则可以进行事件要素分析,若识别出事件类型包括关注事件类,则可以进行情感分析。
2、事件要素分析
可以判断输入文本中的各个实体在事件中的要素,可以按不同事件类型的相应规则,把事件类型的实体按类别对应,以元组的形式进行存储。例如,输出[(实体字段1,同伴),(实体字段2,目的地),(实体字段3,观看)]等形式来表示事件要素。例如,一个餐饮类的事件对应的实体类别应该有同伴、目的地、起始时间、结束时间、食物等(以上类别并不要求全部在一个事件中同时出现)。
随后,若PKG中不存在分析得到的事件,则可以将每个事件要素作为一个节点保存在PKG中,若PKG中存在分析得到的事件,则可以更新PKG中所包括的该事件的信息,从而实现对PKG的实时更新,使PKG可以用于实时保存关于用户的知识,实现针对用户的终身学习。
3、情感分析
若识别出输入文本对应的事件类型为关注事件类,则可以进行情感分析,从而判断输入文本中的感情类别是正向、负向还是中性的情绪。
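For example, this part can be handled by combining regular-expression (regex) rules with a naive Bayes classifier. For simple text whose emotional tendency is fairly obvious, such as "I like XXX", "I don't like XXX", or "I hate XXX", regex rules can be used. For text with more complex descriptions, a naive Bayes classifier can be used to classify the text; the classification model is obtained by learning and training on a data set for the classification task. During training, data for each classification category first needs to be collected and divided, with each data item of similar length. Because text appears in the form of sentences, whose content is relatively rich and varied, it needs to be segmented into words, dividing sentence information into finer-grained lexical information, and some feature processing is performed at the same time (for example removing punctuation and stop words, selecting keywords, and applying smoothing). The frequency of each word in each emotion category is then counted and its conditional probability calculated. Combined with the conditional independence assumption, this gives a bag-of-words model, namely the naive Bayes model, from which the emotion category is obtained.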
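A compact sketch of the regex-plus-naive-Bayes combination follows. The rule patterns, the four training sentences, and the class name `NaiveBayes` are made up for illustration, and English text is used instead of segmented Chinese; stop-word removal and keyword selection are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical rules for short, explicit statements (checked in order).
RULES = [
    (re.compile(r"\bI (?:do not|don't) like\b|\bI hate\b", re.I), "negative"),
    (re.compile(r"\bI (?:like|love)\b", re.I), "positive"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Bag-of-words naive Bayes with Laplace smoothing."""

    def fit(self, samples):
        self.word_counts = {}
        self.doc_counts = Counter()
        self.vocab = set()
        for text, label in samples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, model):
    """Try the regex rules first; fall back to the classifier."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return model.predict(text)


model = NaiveBayes().fit([
    ("the plot was wonderful and moving", "positive"),
    ("great acting wonderful music", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("terrible pacing boring ending", "negative"),
])
```

Calling `classify("I don't like this film", model)` is resolved by the first rule, while a sentence without an explicit pattern, such as "boring and terrible", falls through to the classifier.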
4、时间处理
通常,时间是衡量用户行为或关注产生和消失时刻的重要信息,是事件要素的其中一项。记录实体产生的时间,事件的发生和结束时间,对进一步为用户提供建议是非常有帮助的。这个模块的作用是将处理过程中涉及到的自然语言表述的时间统一进行标准化,以同一种格式进行存储,方便后续的使用。
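For example, common expressions can be unified: time expressions such as "next Monday", "tomorrow", and "yesterday afternoon" are normalized to the time format "xxxx-xx-xx xx:xx:xx". The time at which the user submitted the request is also obtained, and all of this information is stored as user knowledge.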
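Time normalization can be sketched with the standard `datetime` module. The three supported expressions, the choice of 15:00 for "afternoon", and the reading of "next Monday" as the Monday of the following calendar week are assumptions made for this example.

```python
from datetime import datetime, timedelta


def normalize_time(expression, now):
    """Map a relative time expression to 'YYYY-MM-DD HH:MM:SS', or None."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if expression == "tomorrow":
        moment = midnight + timedelta(days=1)
    elif expression == "yesterday afternoon":
        # Assumption: "afternoon" is stored as 15:00.
        moment = midnight - timedelta(days=1) + timedelta(hours=15)
    elif expression == "next Monday":
        # Assumption: the Monday of the following calendar week.
        days_ahead = 7 - now.weekday()
        moment = midnight + timedelta(days=days_ahead)
    else:
        return None  # expression not covered by these rules
    return moment.strftime("%Y-%m-%d %H:%M:%S")


request_time = datetime(2021, 10, 21, 9, 30)  # a Thursday
```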
(二)知识生成
在通过上述方式得到了实体之间的关系类别、事件要素、情感类别或者时间等信息后,即可对分析得到的知识进行整合,以便于保存于PKG中。
具体地,可以对得到时间、关系链接或者事件要素等进行整合,即以事件为单位进行整合。例如,如图9所示,在日程事件1中,可以得到用户的输入文本,如“下周二和梦凡去看信条”,然后通过前述的信息抽取以及知识分析步骤,得到事件类型、实体以及实体类别等。在搜索事件2中,得到了用户输入的文本“有点甜”,通过前述的信息抽取以及知识分析等,从而确定该事件的事件类型、实体以及实体类型等。
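Further, the knowledge in the PKG can be updated, or knowledge that did not previously exist can be added. The extracted entity list is relation-linked, and the inverted index in Elasticsearch is used for search and matching. The inverted index in Elasticsearch splits and processes all matched fields again and stores the information table in inverted form. Combined with constraints such as entity type and information source, the candidates for an entity field can be listed in descending order of their matching score against the entity, and the entity with the highest score can be taken as the entity linked to that field. The specific processing is shown in FIG. 10. After entities are obtained through information extraction and knowledge analysis, entity search and matching is performed in the PKG. If existing knowledge corresponding to the extracted entity is matched exactly in the PKG, entity linking is performed, that is, the associated entities are linked together. If no entity is matched, the other entity fields mentioned in the text can be considered in order to distinguish intelligently which entity the user means, and inference-based disambiguation is performed to improve linking accuracy. In other words, the user's personal knowledge can be matched fuzzily, for example by matching fields in the PKG and in the extracted knowledge that are similar in meaning or in form, and disambiguation is done by inferring whether such fields actually refer to the same entity. If they do, knowledge linking can continue, that is, entities and the relationships between entities are linked. If no entity identical or similar to the extracted information is matched in the PKG, it can be added as new knowledge.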
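The matching step can be illustrated without Elasticsearch by a small in-memory inverted index. This sketch substitutes character bigrams and Jaccard similarity for Elasticsearch's analyzers and scoring, so it only mirrors the idea (index, score, sort, take the best candidate above a threshold); the function names, the threshold, and the sample entities are assumptions.

```python
from collections import defaultdict


def bigrams(text):
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def build_index(entities):
    """Inverted index: bigram -> set of entity names containing it."""
    index = defaultdict(set)
    for name in entities:
        for gram in bigrams(name):
            index[gram].add(name)
    return index


def link_entity(mention, index, entities, entity_type=None, threshold=0.5):
    """Return (best matching entity, score), or (None, 0.0) for a new entity."""
    grams = bigrams(mention)
    candidates = set()
    for gram in grams:
        candidates |= index.get(gram, set())
    scored = []
    for name in candidates:
        if entity_type and entities[name] != entity_type:
            continue  # entity-type constraint
        other = bigrams(name)
        score = len(grams & other) / len(grams | other)  # Jaccard similarity
        scored.append((score, name))
    scored.sort(reverse=True)  # descending by score
    if scored and scored[0][0] >= threshold:
        return scored[0][1], scored[0][0]
    return None, 0.0


known = {"Tenet": "film", "Tenerife": "place", "Mengfan": "person"}
index = build_index(known)
```

A result of `(None, 0.0)` corresponds to the branch in which nothing identical or similar is found and the mention is added as new knowledge.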
(三)图谱构建
具体地,在进行知识生成之后,按照预先定义的schema对PKG进行构建。
具体地,PKG的构建以当前用户为中心,以多种不同的分支类别进行延伸:“日程事件”、“关注事件”、“联系事件”,每次延伸都会记录当前的系统时间,以标记数据生成的时间顺序。“日程事件”表示当前构建的内容是一个日程,构建文本中涉及到的事件时间、人物、地点等信息;“关注事件”表示当前构建的内容为用户关注的信息,可分为喜欢(正向)、不喜欢(负向)和关注3种兴趣倾向。
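Take the user telling Zhang San "On Friday I am going to see Better Days (少年的你) with Li Si" as an example. The construction module obtains time = xxxx-xx-xx xx:xx:xx (Friday after time normalization), the event type "schedule event: entertainment", an entity list containing information about "Li Si" and "Better Days" with categories [companion, film title], and user = "Zhang San".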
以用户与张三提及“周小雨演的少年的你很好看”为例。构建模块将获得事件类型=“关注类:娱乐”,实体列表包含“周小雨”和“少年的你”相关信息,实体=[演员,电影名],关联关系为<周小雨,关系类:演员,少年的你>三元组,用户=“张三”。
因此,本申请实施方式中,以事件为单位构建个人知识图谱,相对于用户画像,本申请提供的方式可以通过个人知识图谱以更细的粒度来描述用户以及对用户的知识进行保存,从而可以更精确地描述用户或者保存用户的知识,以便于后续可以更精准地进行知识回溯,查询到更精确的用户信息。可以理解为,本申请提供的个人知识图谱以用户操作的事件为单位,对用户的操作行为进行记录和存储。对于用户每次的操作行为,划分为不同的意图类型并进行对应意图下的信息分析,得到操作行为的事件元素,加入到图谱中。在使用过程中,能够根据操作行为的某些要素迅速获取到与其在用户操作习惯上相关的内容以及要素本身的相关内容,更贴切用户的使用习惯。同时存储了行为的发生时间,为后续的迭代更新或按序查找提供了途径。以事件为单位的知识图谱,提供了一种新的信息的组织结构方式,为不同需求的查找和分析提供了一种新的渠道。
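In addition, user knowledge usually reflects preferences and is forgotten over time. This application sets a weight for each node and updates the weight periodically or in real time, which implements a memory of user knowledge.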
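For example, as shown in FIG. 11, when the information of a node in the PKG is updated and the extracted entity already exists in the PKG, the weight of that node can be updated by memory decay. The weight may be calculated as:
where α, β, and γ are normalized weighting coefficients; the count is the number of occurrences of the event determined from the input text, n denotes the current count, and $N_{\max}$ denotes the user's maximum count; the in-degree describes the node's associations with other nodes in the PKG and reflects the importance of the node, d denotes the node's in-degree, and $D_{\max}$ denotes the maximum in-degree. The second term is the time factor, which decreases as the node's creation duration or update duration increases, that is, it is negatively correlated with the creation duration or update duration of the node.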
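The weighting formula itself is not legible in this text, so the following is only one form consistent with the description: a weighted sum of a count term $n / N_{\max}$, a time term that decreases with age, and an in-degree term $d / D_{\max}$. The exponential half-life decay, the default coefficients, and the name `node_weight` are assumptions, not the formula of the application.

```python
import math


def node_weight(n, n_max, d, d_max, age_days,
                alpha=0.4, beta=0.3, gamma=0.3, half_life_days=30.0):
    """Weighted sum of a count term, a time term, and an in-degree term."""
    count_term = n / n_max if n_max else 0.0
    # Assumed decay: halves every `half_life_days`. Any function that
    # decreases with age would satisfy the negative correlation in the text.
    time_term = math.exp(-math.log(2) * age_days / half_life_days)
    degree_term = d / d_max if d_max else 0.0
    return alpha * count_term + beta * time_term + gamma * degree_term


fresh = node_weight(n=5, n_max=10, d=2, d_max=4, age_days=0)
stale = node_weight(n=5, n_max=10, d=2, d_max=4, age_days=30)
```

With these assumed coefficients, a node that was just updated scores higher than the same node left untouched for 30 days, which is the memory-decay behavior described above.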
(四)图谱扩充
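Specifically, a first knowledge graph can be used to expand the target personal knowledge graph. Taking a common knowledge graph (CKG) as the first knowledge graph, the CKG may include various entities and the association relationships between them. The entities in the common knowledge graph may belong to the same domain or to different domains.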
可以理解为,本申请可以通过垂域知识图谱的信息对PKG进行知识的更新和补全,来挖掘用户的隐性意图。
具体地,可以在CKG中搜索PKG中的实体,在匹配到与PKG中的实体相同的实体之后,在CKG中继续搜寻关联节点的信息,并将CKG中该节点的信息以及关联节点的信息扩充至PKG中,从而通过CKG中包括的更丰富的信息来对PKG进行扩充。
示例性地,如图12所示,在进行信息抽取得到实体列表并进行图谱构建之后,针对PKG中的各个节点(为便于区分称为PKG节点),在CKG中进行垂域知识匹配,若存在与PKG节点匹配的CKG节点,则从CKG中查询与该CKG节点关联的关联节点,然后从该CKG节点和关联节点中提取到信息作为垂域知识,对PKG进行知识更新或者补充,得到信息更丰富的PKG。
以一个场景为例,如图13所示,首先获取PKG中各个节点的信息,如电影领域的《少年的你》、《送你一朵小红花》以及音乐域的《念想》、《陷落美好》等。在CKG中进行搜索匹配,推理出用户关注明星领域的易烊小玺。如把电影域的《钢铁侠》《信条》《流浪地球》在CKG中进行搜索匹配,学习到PKG中没有的新概念——“科幻电影”。然后把这些在垂域知识图谱获取到的信息送入PKG。从而可以实现对PKG中不存在的实体进行知识补全,对PKG中已存在实体但未发现关系的知识进行关系补充,进而实现对PKG的扩充。
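The matching-and-supplementing procedure described above can be sketched as below. The representation is an assumption made for illustration: the PKG is reduced to a set of entity names, the CKG to a list of (head, relation, tail) triples, and `expand_pkg` is a hypothetical helper name.

```python
def expand_pkg(pkg_nodes, ckg_edges):
    """Copy CKG edges that touch a PKG node, plus their neighbours, into the PKG.

    pkg_nodes: set of entity names already in the personal graph.
    ckg_edges: iterable of (head, relation, tail) triples from the common graph.
    Returns the enlarged node set and the list of edges that were imported.
    """
    added_nodes = set()
    added_edges = []
    for head, relation, tail in ckg_edges:
        if head in pkg_nodes or tail in pkg_nodes:   # CKG node matches a PKG node
            added_edges.append((head, relation, tail))
            for name in (head, tail):
                if name not in pkg_nodes:            # associated node new to the PKG
                    added_nodes.add(name)
    return pkg_nodes | added_nodes, added_edges
```

In the movie scenario, several titles that share the same imported neighbour give the PKG a concept it did not previously contain.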
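Therefore, in the embodiments of this application, by combining the personal knowledge graph with the common knowledge graph and applying functions such as relation completion and inference, deeper relations between the nodes of the personal knowledge graph are mined. For example, if the user follows Wang Xiaofei and the song 《红豆》 (Red Bean), the common knowledge graph can be used to discover the performer relation between Wang Xiaofei and Red Bean, yielding deeper information.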
三、基于PKG进行推荐
在从输入文本中抽取信息,得到输出序列之后,可以从PKG中查询与该输出序列中的实体对应的节点的信息,并使用该节点的信息生成针对用户的推荐信息。
具体地,PKG可以应用于各种针对用户的推荐场景,如输入法推荐、搜索推荐、行程提醒或者商品推荐等。
一种可能的实施方式中,可以应用于实体预测,如在从输入文本中抽取到的实体之后,可以基于该实体从PKG中查询关联节点,并从关联节点中预测用户即将输入的实体,并在用户的显示界面中进行推荐。
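Entity prediction mitigates the clutter, and even the loss of prediction accuracy, that arises when PKG-based recommendation offers every kind of information related to the user's current input. For example, when the user types “我明天要去” ("Tomorrow I am going to"), the recommendation type should be mainly place names (although other types are possible in some scenarios); with the PKG, related place-name entities can be recommended first, which improves prediction accuracy and user experience.
Specifically, regular expressions (regex) can be combined with a Bayesian probability model. For common, simple expressions, the recommendation types can be obtained directly from regex matching, with the types ordered by likelihood. Complex expressions are instead handled by a neural network computation, in which a Bayesian probability model yields the probability of each entity type following a given text, and the recommendation types are given accordingly. Once the list of predicted recommendation types is obtained, it is combined with factors such as the entities the user is currently dealing with and the weights of the different entities to produce better recommendations.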
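A minimal sketch of the rule-then-probability idea is shown below. It is not the model used in the application: the regex rules are invented examples, a plain naive Bayes classifier with add-one smoothing stands in for the neural and Bayesian computation, and `RULES` and `TypePredictor` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical rules for simple expressions: pattern -> ordered recommendation types.
RULES = [
    (re.compile(r"going to$"), ["place"]),
    (re.compile(r"with$"), ["person"]),
]


class TypePredictor:
    """Predict which entity type is likely to follow a text prefix."""

    def __init__(self):
        self.type_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, samples):
        """samples: iterable of (prefix text, entity type that followed it)."""
        for text, entity_type in samples:
            self.type_counts[entity_type] += 1
            for word in text.lower().split():
                self.word_counts[entity_type][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return entity types ordered from most to least likely."""
        text = text.lower().strip()
        for pattern, types in RULES:          # simple expressions: rules first
            if pattern.search(text):
                return types
        total = sum(self.type_counts.values())
        scores = {}
        for entity_type, count in self.type_counts.items():
            log_p = math.log(count / total)   # prior P(type)
            denom = sum(self.word_counts[entity_type].values()) + len(self.vocab)
            for word in text.split():         # add-one smoothed P(word | type)
                log_p += math.log((self.word_counts[entity_type][word] + 1) / denom)
            scores[entity_type] = log_p
        return sorted(scores, key=scores.get, reverse=True)
```

The ordered type list returned by `predict` would then be combined with the entities in the current context and their node weights to choose the actual suggestions.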
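In general, knowledge-graph-based recommendation can be embedding-based, path-based, or a combination of embeddings and paths. By way of example, in the embodiments of this application, recommendations can be ranked based on paths in the PKG. Specifically, recommendation can combine the obtained recommendation type, the intent type, and the node weights. The PKG is organized around events, and events act as bridges that connect entities of different types, so paths can be designed more flexibly; this handles well the recommendation scenarios in which the entities do not belong to the same domain. The recommendation rule can be as shown in Figure 14: first, according to the recommendation type and the event type, the intent nodes related to the entity list are searched for in the PKG; next, the other nodes connected to each intent node are retrieved and sorted by weight, and the entries with the highest weight are selected as the recommended words. Recommendations can be made from nodes that already exist in the PKG. Taking movies as an example: if the user has repeatedly mentioned science fiction movies such as Tenet, then during recommendation, combined with the user's intent and other factors, the entries "Tenet" and "science fiction movie" are recommended. In addition, the PKG and the CKG can be combined, and the extended entries of inferred nodes can be added to the recommendation system as feature vectors of the user.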
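The rule of Figure 14 as described above (find matching intent nodes, collect their connected nodes, sort by weight) can be sketched as follows. The graph layout and the names `recommend`, `graph`, and `weights` are assumptions made for illustration only.

```python
def recommend(graph, weights, entities, event_type, rec_type, top_k=3):
    """Rank entities reachable through events that share the input entities.

    graph: event id -> {"type": str, "elements": [(entity, entity_type), ...]}
    weights: entity -> node weight
    entities: entities extracted from the current input
    """
    wanted = set(entities)
    scores = {}
    for event in graph.values():
        names = {name for name, _ in event["elements"]}
        # Keep only intent nodes of the right event type that touch the input.
        if event["type"] != event_type or not (names & wanted):
            continue
        for name, entity_type in event["elements"]:
            if entity_type == rec_type and name not in wanted:
                scores[name] = max(scores.get(name, 0.0), weights.get(name, 0.0))
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]
```

Because the path runs entity, event, entity, the suggested item may belong to a different domain from the input entity, as long as the two met in some event.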
因此,本申请实施方式中,不仅能根据用户信息构建体现用户特征的个人知识图谱,而且可以通过推理对未发现关系的知识进行补充和扩展,实现对用户现有知识的补全和扩展,并将推理内容列入推荐的范围,从而可以为用户推荐更多信息,且可以实现跨域推荐,提高用户体验。例如用户关注多个科幻电影,如《复仇者》、《信条》、《钢铁》等,根据推理挖掘出用户潜在的兴趣关注点——科幻电影,将科幻电影加入到个人知识图谱中,并在相应语境为用户推荐“科幻电影”。跨域推理可以结合不同垂域的内容进行推理,例如,用户关注歌曲《红豆》、电影《天下》等内容,跨域推理出用户的潜在关注包括王小菲。
并且,本申请提供的个人知识图谱中可以包括更细粒度的信息,从而具有更细的个性化推荐粒度,结合用户数据、推荐、意图类型以及权重对用户进行细粒度推荐。其中由模型获得的推荐类型(例如人名)和意图类型(例如娱乐)为推荐提供重要信息,此外权重体现出用户对事物的关注程度不同。
此外,采用了知识图谱对用户的行为操作数据进行了存储,构建了专属于用户本人的、独一无二的个人知识图谱。以往的推荐系统采用表的形式进行用户数据的组织,在存储的清晰度和查找的效率上相较于图的存储存在一定差异。并且图谱的存储方式能够迅速获取到与当前内容有直接相关或n跳相关的内容,表的数据存储则需要较长的时间进行查询和访问。
下面示例性地,对一些可能的应用场景进行介绍。
场景一
本申请提供的方法可以部署于终端,用户可以在通讯软件中接收或者发送信息,一种GUI如图15所示,用户可以在通讯软件中发送消息,此时可以将用户发送的消息作为输入文本,从输入文本中抽取出其中的实体以及实体之间的关系,并构建PKG。
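Subsequently, in one GUI shown in Figure 16, when the user enters text in the input interface, matching text can be selected from the PKG, the text the user is about to enter can be predicted and shown in the display interface, and the user can thus complete the input quickly.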
场景二
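In one GUI shown in Figure 17, the method provided in this application can be deployed on a terminal. When the user types in a search application, the terminal can obtain the text entered by the user, extract entity information from it, such as "Wang Fei" and "Red Bean", together with the emotion category associated with each entity, and add this information to the PKG.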
随后,一种GUI如图18所示,当用户在输入界面中输入文本时,可以从PKG中筛选出匹配的文本,并预测用户即将输入的文本,并在显示界面中显示,从而使用户可以快速实现输入。
场景三
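In one GUI shown in Figure 19, the method provided in this application can be deployed on a terminal. The user can make entries in a calendar APP; the terminal can obtain the user's structured schedule-event text as structured data, extract from it the entities, times, and other information corresponding to the event elements, and add the extracted information to the PKG. The user's schedule is thereby recorded so that the user can be reminded in time.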
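Extraction from a structured calendar record by preset rules might look like the sketch below. The record fields (`title`, `start`, `location`, `attendees`), the role names, and the function `extract_event` are hypothetical; the application does not specify the preset format.

```python
# Hypothetical preset rule: calendar field -> event element role.
FIELD_RULES = {
    "title": "topic",
    "start": "time",
    "location": "place",
    "attendees": "participants",
}


def extract_event(record):
    """Turn one structured calendar record into a schedule event for the PKG."""
    elements = {
        role: record[key]
        for key, role in FIELD_RULES.items()
        if record.get(key)                 # skip missing or empty fields
    }
    return {"type": "schedule", "elements": elements}
```

No model inference is needed for such records, because the field that holds each element is known in advance.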
场景四
针对终端中部署的语音助手的场景,当用户在语音助手界面中输入文本时,可以在PKG中筛选出用户输入的文本对应的节点,并进一步筛选出关联节点。并在语音助手的显示界面中显示关联节点的信息。并且,在对关联节点的信息进行显示时,还可以对关联节点的信息进行排序,结合每个节点的权重值,按照权重值从大到小分别进行排序,如将权重值较大的节点的信息排列至用户更方便输入的位置。
例如,一种GUI如图20所示,用户可以在语音助手中询问“王萌”的联系方式,终端可以在PKG中查询与实体“王萌”相关的信息,然后筛选出类别为联系方式的信息,并在终端的显示界面中显示。
又例如,一种GUI如图21所示,用户可以请求语音助手播放音乐,终端可以在PKG中查找与音乐相关的信息,如查询到音乐“红豆”,则可以播放音乐“红豆”。
还例如,一种GUI如图22所示,可以通过用户的日常输入数据,学习到用户的偏好信息,并作为搜索推荐引擎的推荐信息使用。
上述结合应用场景对本申请提供的信息获取方法的各个步骤进行了详细介绍,下面对本申请提供的信息获取方法所部署的架构以及完整的应用场景进行示例性介绍。需要说明的是,以下仅概括性介绍架构,其架构下的各个模块的具体执行步骤可以参阅前述图4-图22,以下不再赘述。
示例性地,本申请提供的信息获取方法可以部署于终端,其在终端所部署的架构可以如图23所示。
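The application scenario layer can include the service applications (APPs) installed on the terminal. The application scenario layer and the algorithm layer are connected through an algorithm interface; a service APP can receive the user data set and receive from the algorithm layer data suitable for a search/recommendation engine. The core of the architecture is the algorithm layer, which can include several modules: 1) construction of the PKG (knowledge system), mainly involving: a. learning from user behavior data (text data), that is, knowledge extraction; b. user knowledge generation and storage; c. user knowledge graph construction; 2) expansion of the PKG: knowledge inference and knowledge update and completion; 3) use of the PKG, such as intent prediction and knowledge ranking.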
数据管理层用于保存或者管理用户数据,如PKG、CKG或schema等可以保存于数据管理层,为算法层提供数据存储、管理等功能,为查询引擎或者推理过程的基础平台。
本申请提供的信息获取方法的完整流程可以如图24所示。
其中,可以获取用户数据,如用户的输入数据,可以包括结构化数据和非结构化数据。
对输入数据进行信息抽取,抽取出实体以及实体之间的关联关系,还对实体形成的事件的类型(即意图类别)或者事件要素等进行分析,还对实体进行情感分析,分析出情感类别。
还可以从输入数据中提取到事件信息,并按照预先设定的格式来进行存储。
因此,在用户知识抽取部分,可以抽取到事件信息、情感类别、意图类别(即事件类型)、实体事件的关联关系以及事件要素等用户知识,存储用户知识。
此外,还从通用知识图谱(即CKG)中查询与用户知识相关的信息,并基于这些信息对用户知识进行更新或者补全等,从而得到更完整的用户知识。
将用户知识基于预先定义的schema更新至PKG中,即图谱构建。例如,构建得到的PKG可以如图25所示,其中,以目标用户“我”为中心,保存与目标用户相关的各种类型的事件,其中具有关联关系的节点相连接。
同时还通过记忆衰减机制为每个节点设置权重,从而通过权重的方式来实现对用户知识的记忆,从而可以更有效地针对目标用户进行推荐。
在应用场景中,可以基于从输入数据中抽取到的信息来进行事件类型预测(即意图预测),并从PKG中查询预测信息,按照每个节点的权重对预测信息进行排序并为用户推荐,提高用户体验。
前述对本申请提供的方法的流程进行了详细介绍,下面对执行上述方法的装置进行介绍。
参阅图26,本申请提供的一种信息获取装置的结构示意图。该信息获取装置可以包括:
输入模块2601,用于获取目标用户的输入文本,输入文本中包括至少一个词,至少一个词形成至少一个事件;
文本处理模块2602,用于基于输入文本获取输出序列,输出序列中包括至少一个事件的类型和要素;
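an acquisition module 2603, configured to obtain a personal knowledge graph according to the output sequence, where the personal knowledge graph includes a plurality of nodes, the plurality of nodes include type nodes and element nodes, a type node represents the type of the at least one event, an element node represents an element of the at least one event, the type node and the element nodes corresponding to the same event are associated with each other, and the personal knowledge graph is used for making recommendations for the target user.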
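In a possible implementation, if the output sequence further includes associations between the elements of each event, the element nodes that correspond to associated elements of the same event are associated with each other in the personal knowledge graph; if the output sequence further includes an emotion category, the element nodes corresponding to the same event are associated through the emotion category in the personal knowledge graph.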
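In a possible implementation, the acquisition module 2603 is specifically configured to: if an initial knowledge graph includes information about a first event, update the element nodes corresponding to the first event in the initial knowledge graph and the associations between those element nodes to obtain the personal knowledge graph, where the first event is any one of the at least one event; and if the initial knowledge graph does not include information about the first event, add nodes corresponding to the type and elements of the first event to the initial knowledge graph and associate the type node of the first event with its element nodes to obtain the personal knowledge graph.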
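The update-or-add branch above can be sketched as a small upsert. How an event is recognized as already present is not stated here, so the sketch assumes a caller-supplied `event_id`; the function name `upsert_event` and the dictionary layout are likewise hypothetical.

```python
def upsert_event(graph, event_id, event_type, elements):
    """Update an existing event's element nodes, or add a new event.

    graph: event id -> {"type": str, "elements": {role: entity}}
    Returns "updated" or "added" so callers can tell which branch ran.
    """
    if event_id in graph:
        # Event already known: refresh or extend its element nodes.
        graph[event_id]["elements"].update(elements)
        return "updated"
    # New event: create the type node and attach its element nodes.
    graph[event_id] = {"type": event_type, "elements": dict(elements)}
    return "added"
```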
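In a possible implementation, the text processing module 2602 is specifically configured to: obtain, through a text processing model, an initial sequence corresponding to the input text, where the initial sequence includes a vector representation of at least one word in the input text and a first category label corresponding to the at least one word; perform syntactic analysis on the input text to obtain a feature sequence, where the feature sequence includes a second category label corresponding to the at least one word; and combine the initial sequence and the feature sequence to obtain the output sequence, where the output sequence includes the elements and type of the at least one event.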
在一种可能的实施方式中,文本处理模块2602,具体用于:对初始序列中与特征序列不匹配的部分进行修正,得到输出序列。
在一种可能的实施方式中,文本处理模块2602,还用于:若特征序列中每个词对应多种第二类别标签,则为每个词确定唯一的第二类别标签,得到更新后的特征序列。
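One way to combine the model's labels with the labels from syntactic analysis is sketched below. The resolution policy is an assumption, not the application's rule: a fixed priority order picks the unique second label, and a second label other than "O" overrides a disagreeing first label. The label set and the names `PRIORITY`, `pick_unique`, and `correct_sequence` are hypothetical.

```python
# Assumed priority used to pick one label when the parser offers several.
PRIORITY = ["TIME", "PERSON", "PLACE", "WORK", "O"]


def pick_unique(labels):
    """Choose the highest-priority label from a set of candidate labels."""
    return min(labels, key=PRIORITY.index)


def correct_sequence(initial, feature):
    """Merge model labels with parser labels, word by word.

    initial: list of (word, first_label) from the text processing model.
    feature: list of candidate second-label sets from syntactic analysis.
    """
    output = []
    for (word, first), candidates in zip(initial, feature):
        second = pick_unique(candidates)
        if second != "O" and second != first:
            output.append((word, second))   # mismatch: take the parser's label
        else:
            output.append((word, first))    # agreement or no parser evidence
    return output
```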
在一种可能的实施方式中,文本处理模块2602,具体用于:根据输入文本,通过文本处理模型,得到初始序列,其中,文本处理模型用于执行以下步骤:对输入文本进行自然语言处理,得到特征序列和实体序列,实体序列包括至少一个词中每个词对应的向量表示,特征序列中包括输入文本对应的特征向量;获取实体序列中的向量对应的位置信息;融合位置信息和特征序列,得到融合序列;对融合序列对应的实体进行分类,得到标签序列,初始序列中包括每个词对应的向量表示以及标签序列。
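In a possible implementation, the apparatus further includes an expansion module 2604, configured to: obtain a first knowledge graph, where the first knowledge graph includes a plurality of nodes, the plurality of nodes include information about at least one kind of entity, and a node in the first knowledge graph can represent an entity or can represent an element or type of an event; obtain, from the first knowledge graph, associated information that is associated with nodes in the personal knowledge graph; and expand the personal knowledge graph with the associated information to obtain an expanded personal knowledge graph.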
在一种可能的实施方式中,装置还包括,推荐模块2605,用于:从个人知识图谱中获取与输出序列匹配的至少一个节点的信息;根据至少一个节点的信息为目标用户生成推荐信息,推荐信息用于针对目标用户进行推荐。
在一种可能的实施方式中,推荐模块2605,具体用于:从个人知识图谱中筛选出输出序列对应的至少一个第一节点的信息;从个人知识图谱中查找与至少一个第一节点关联的至少一个第二节点的信息,至少一个节点的信息包括至少一个第一节点的信息和至少一个第二节点的信息。
在一种可能的实施方式中,第一节点的信息和第二节点的信息为不同域的信息。
在一种可能的实施方式中,个人知识图谱中每个节点包括对应的权重,每个节点的权重与保存时长或者更新时长呈负相关关系,保存时长为保存每个节点的信息的时长,更新时长为距离上一次更新每个节点中包括的信息的时长。
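In a possible implementation, the recommendation module 2605 is specifically configured to: sort the at least one node according to the weight corresponding to the at least one node; and generate the recommendation information according to the information of the at least one node and the ordering of the at least one node.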
在一种可能的实施方式中,输入模块2601,具体用于:获取用户输入数据,输入数据包括图像、文本或者语音中的至少一种数据;从输入数据中提取输入文本。
在一种可能的实施方式中,
输入模块2601,还用于获取目标用户的结构化数据,结构化数据为预设格式的数据;
获取模块2603,还用于按照预设规则从结构化数据中提取至少一个事件的信息;
获取模块2603,还用于根据至少一个事件的信息对个人知识图谱进行更新,得到更新后的个人知识图谱。
请参阅图27,本申请提供的另一种信息获取装置的结构示意图,如下所述。
该信息获取装置可以包括处理器2701和存储器2702。该处理器2701和存储器2702通过线路互联。其中,存储器2702中存储有程序指令和数据。
存储器2702中存储了前述图4-图25中的步骤对应的程序指令以及数据。
处理器2701用于执行前述图4-图25中任一实施例所示的信息获取装置执行的方法步骤。
可选地,该信息获取装置还可以包括收发器2703,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图6-图25所示实施例描述的方法中的步骤。
可选地,前述的图27中所示的信息获取装置为芯片。
请参阅图28,本申请提供的另一种电子设备的结构示意图,如下所述。
该电子设备可以包括处理器2801和存储器2802。该处理器2801和存储器2802通过线路互联。其中,存储器2802中存储有程序指令和数据。
存储器2802中存储了前述图4-图25中的步骤对应的程序指令以及数据。
处理器2801用于执行前述图4-图25所示的电子设备执行的方法步骤。
可选地,该电子设备还可以包括收发器2803,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图4-图25所示实施例描述的方法中的步骤。
可选地,前述的图28中所示的电子设备为芯片。
本申请实施例还提供了一种信息获取装置,该信息获取装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图4-图25的方法步骤。
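An embodiment of this application further provides a digital processing chip. The chip integrates one or more interfaces and circuitry for implementing the processor 2701 or the processor 2801, or the functions of the processor 2701 or the processor 2801. When a memory is integrated in the digital processing chip, the chip can carry out the method steps of any one or more of the foregoing embodiments. When no memory is integrated, the chip can connect to an external memory through a communication interface, and it performs the actions of the information acquisition apparatus or the electronic device in the foregoing embodiments according to the program code stored in the external memory.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in Figures 4 to 25.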
本申请实施例提供的信息获取装置可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使服务器内的芯片执行上述图6-图25所示实施例描述的信息获取方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
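Specifically, the processing unit or processor described above can be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor can be a microprocessor or any conventional processor.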
示例性地,请参阅图29,图29为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 290,NPU 290作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2903,通过控制器2904控制运算电路2903提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2903内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路2903是二维脉动阵列。运算电路2903还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2903是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2902中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2901中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2908中。
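The accumulation of partial results described above can be illustrated in software as follows. This is a plain Python sketch of tiled matrix multiplication with an accumulator, not a model of the actual circuit; the tile size and the function name `matmul_accumulate` are assumptions.

```python
def matmul_accumulate(a, b, tile=2):
    """Compute C = A x B by summing partial products tile by tile.

    a: rows x inner matrix (input matrix A), b: inner x cols matrix (weights B).
    After each tile of the inner dimension, acc holds a partial result; after
    the last tile it holds the final result.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]          # plays the accumulator's role
    for k0 in range(0, inner, tile):                 # one slice of B at a time
        for i in range(rows):
            for j in range(cols):
                for k in range(k0, min(k0 + tile, inner)):
                    acc[i][j] += a[i][k] * b[k][j]
    return acc
```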
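The unified memory 2906 stores input data and output data. Weight data is transferred directly to the weight memory 2902 through the direct memory access controller (DMAC) 2905. Input data is likewise transferred to the unified memory 2906 through the DMAC.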
总线接口单元(bus interface unit,BIU)2910,用于AXI总线与DMAC和取指存储器(instruction fetch buffer,IFB)2909的交互。
总线接口单元2910(bus interface unit,BIU),用于取指存储器2909从外部存储器获取指令,还用于存储单元访问控制器2905从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
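The DMAC is mainly used to transfer input data from the external DDR memory to the unified memory 2906, to transfer weight data to the weight memory 2902, or to transfer input data to the input memory 2901.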
向量计算单元2907包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如批归一化(batch normalization),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2907能将经处理的输出的向量存储到统一存储器2906。例如,向量计算单元2907可以将线性函数和/或非线性函数应用到运算电路2903的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2907生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2903的激活输入,例如用于在神经网络中的后续层中的使用。
控制器2904连接的取指存储器(instruction fetch buffer)2909,用于存储控制器2904使用的指令;
统一存储器2906,输入存储器2901,权重存储器2902以及取指存储器2909均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,循环神经网络中各层的运算可以由运算电路2903或向量计算单元2907执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述图4-图25的方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
Claims (26)
- 一种信息获取方法,其特征在于,包括:获取目标用户的输入文本,所述输入文本中包括至少一个词,所述至少一个词形成至少一个事件;基于所述输入文本获取输出序列,所述输出序列中包括所述至少一个事件的类型和要素;根据所述输出序列获取个人知识图谱,所述个人知识图谱中包括多个节点,所述多个节点包括类型节点和要素节点,所述类型节点用于表示所述至少一个事件的类型,所述要素节点用于表示所述至少一个事件的要素,同一个事件中的类型节点和要素节点相关联,所述个人知识图谱用于为所述目标用户进行推荐。
- 根据权利要求1所述的方法,其特征在于,若所述输出序列中还包括所述至少一个事件的要素之间的关联关系,则所述个人知识图谱中同一事件具有关联关系的要素对应的要素节点之间相关联;或者,若所述输出序列中还包括所述至少一个事件的情感类别,则所述个人知识图谱中同一事件对应的要素节点之间通过所述情感类别相关联。
- 根据权利要求1或2所述的方法,其特征在于,所述输出序列中包括第一事件的类型和要素,所述第一事件是所述至少一个事件中的任意一个事件;所述根据所述输出序列获取个人知识图谱,包括:若初始知识图谱中包括所述第一事件的信息,则更新所述初始知识图谱中所述第一事件对应的要素节点或要素节点之间的关联关系,得到所述个人知识图谱;若所述初始知识图谱中不包括所述第一事件的信息,则在所述初始知识图谱中增加所述第一事件的类型节点和要素节点,并将所述第一事件的类型节点和要素节点进行关联,得到所述个人知识图谱。
- 根据权利要求1-3中任一项所述的方法,其特征在于,所述基于所述输入文本获取输出序列,包括:通过文本处理模型得到所述输入文本对应的初始序列,所述初始序列中包括所述输入文本中的至少一个词的向量表示以及所述至少一个词对应的第一类别标签;对所述输入文本进行句法分析,得到特征序列,所述特征序列包括所述至少一个词对应的第二类别标签;结合所述初始序列和所述特征序列得到所述输出序列,所述输出序列中包括所述至少一个事件的要素和类型。
- 根据权利要求4所述的方法,其特征在于,所述结合所述初始序列和所述特征序列得到所述输出序列,包括:对所述初始序列中与所述特征序列不匹配的部分进行修正,得到所述输出序列。
- 根据权利要求4或5所述的方法,其特征在于,所述方法还包括:若所述特征序列中每个词对应多种第二类别标签,则为所述每个词确定唯一的第二类别标签,得到更新后的特征序列。
- 根据权利要求1-6中任一项所述的方法,其特征在于,所述方法还包括:获取第一知识图谱,所述第一知识图谱中包括多个节点,所述多个节点中包括至少一种实体的信息;从所述第一知识图谱中获取与所述个人知识图谱中的节点相关联的关联信息;使用所述关联信息对所述个人知识图谱进行扩充,得到扩充后的个人知识图谱。
- 根据权利要求1-7中任一项所述的方法,其特征在于,所述方法还包括:从所述个人知识图谱中获取与所述输出序列匹配的至少一个节点的信息;根据所述至少一个节点的信息为所述目标用户生成推荐信息,所述推荐信息用于针对所述目标用户进行推荐。
- 根据权利要求8所述的方法,其特征在于,所述个人知识图谱中每个节点包括对应的权重,每个节点的权重与保存时长或者更新时长呈负相关关系,所述保存时长为保存所述每个节点的信息的时长,所述更新时长为距离上一次更新所述每个节点中包括的信息的时长。
- 根据权利要求9所述的方法,其特征在于,所述根据所述至少一个节点的信息为所述目标用户生成推荐信息,包括:根据所述至少一个节点对应的权重,对所述至少一个节点进行排序;根据所述至少一个节点的信息以及所述至少一个节点的排序生成所述推荐信息。
- 根据权利要求1-10中任一项所述的方法,其特征在于,所述方法还包括:获取所述目标用户的结构化数据,所述结构化数据为预设格式的数据;按照预设规则从所述结构化数据中提取至少一个事件的信息;根据所述至少一个事件的信息对所述个人知识图谱进行更新,得到更新后的个人知识图谱。
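- A graphical user interface (GUI), characterized in that the graphical user interface is stored in an electronic device, the electronic device includes a display screen, a memory, and one or more processors, the one or more processors are configured to execute one or more computer programs stored in the memory, and the graphical user interface includes: generating a personal knowledge graph in response to an input operation of a target user, and displaying the personal knowledge graph, where input text of the target user includes at least one word, the at least one word forms at least one event, the personal knowledge graph includes a plurality of nodes, the plurality of nodes include type nodes and element nodes, a type node represents the type of the at least one event, an element node represents an element of the at least one event, the type node and the element nodes of the same event are associated with each other, and the personal knowledge graph is used for making recommendations for the target user.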
- 根据权利要求12所述的GUI,其特征在于,所述GUI还包括:响应于从所述第一知识图谱中获取与所述个人知识图谱中的节点相关联的关联信息,使用所述关联信息对所述个人知识图谱进行扩充后得到扩充后的个人知识图谱,显示所述扩充后的个人知识图谱,所述第一知识图谱中包括多个节点,每个节点包括至少一种实体的信息。
- 根据权利要求12或13所述的GUI,其特征在于,所述GUI还包括:响应于根据所述个人知识图谱中获取到的至少一个节点的信息为所述目标用户生成推荐信息,显示所述推荐信息,所述推荐信息用于针对所述目标用户进行推荐。
- 一种信息获取装置,其特征在于,包括:输入模块,用于获取目标用户的输入文本,所述输入文本中包括至少一个词,所述至少一个词形成至少一个事件;文本处理模块,用于基于所述输入文本获取输出序列,所述输出序列中包括所述至少一个事件的类型和要素;获取模块,用于根据所述输出序列获取个人知识图谱,所述个人知识图谱中包括多个节点,所述多个节点包括类型节点和要素节点,所述类型节点用于表示所述至少一个事件的类型,所述要素节点用于表示所述至少一个事件的要素,同一个事件中的类型节点和要素节点相关联,所述个人知识图谱用于为所述目标用户进行推荐。
- 根据权利要求15所述的装置,其特征在于,若所述输出序列中还包括所述至少一个事件的要素之间的关联关系,则所述个人知识图谱中同一事件具有关联关系的要素对应的要素节点之间相关联;或者,若所述输出序列中还包括所述至少一个事件的情感类别,则所述个人知识图谱中同一事件对应的要素节点之间通过所述情感类别相关联。
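- The apparatus according to claim 15 or 16, characterized in that the output sequence includes the type and elements of a first event, and the first event is any one of the at least one event; the acquisition module is specifically configured to: if an initial knowledge graph includes information about the first event, update the element nodes corresponding to the first event in the initial knowledge graph, or the associations between the element nodes, to obtain the personal knowledge graph; and if the initial knowledge graph does not include information about the first event, add the type node and element nodes of the first event to the initial knowledge graph and associate the type node of the first event with its element nodes to obtain the personal knowledge graph.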
- 根据权利要求15-17中任一项所述的装置,其特征在于,所述文本处理模块,具体用于:通过文本处理模型得到所述输入文本对应的初始序列,所述初始序列中包括所述输入文本中的至少一个词的向量表示以及所述至少一个词对应的第一类别标签;对所述输入文本进行句法分析,得到特征序列,所述特征序列包括所述至少一个词对应的第二类别标签;结合所述初始序列和所述特征序列得到所述输出序列,所述输出序列中包括所述至少一个事件的要素和类型。
- 根据权利要求15-18中任一项所述的装置,其特征在于,所述装置还包括,扩充模块,用于:获取第一知识图谱,所述第一知识图谱中包括多个节点,每个节点包括至少一种实体的信息;从所述第一知识图谱中获取与所述个人知识图谱中的节点相关联的关联信息;使用所述关联信息对所述个人知识图谱进行扩充,得到扩充后的个人知识图谱。
- 根据权利要求15-19中任一项所述的装置,其特征在于,所述装置还包括,推荐模块,用于:从所述个人知识图谱中获取与所述输出序列匹配的至少一个节点的信息;根据所述至少一个节点的信息为所述目标用户生成推荐信息,所述推荐信息用于针对所述目标用户进行推荐。
- 根据权利要求20所述的装置,其特征在于,所述个人知识图谱中每个节点包括对应的权重,每个节点的权重与保存时长或者更新时长呈负相关关系,所述保存时长为保存所述每个节点的信息的时长,所述更新时长为距离上一次更新所述每个节点中包括的信息的时长。
- 根据权利要求21所述的装置,其特征在于,所述推荐模块,具体用于:根据所述至少一个节点对应的权重,对所述至少一个节点进行排序;根据所述至少一个节点的信息以及所述至少一个节点的排序生成所述推荐信息。
- 一种信息获取装置,其特征在于,包括至少一个处理器和存储器,所述至少一个处理器与所述存储器耦合,用于读取并执行所述存储器中的指令,以执行如权利要求1-11任一项所述的方法。
- 一种电子设备,其特征在于,包括:处理器;存储器;所述存储器存储一个或多个计算机程序,所述一个或多个计算机程序包括指令,当所述指令被所述一个或多个处理器执行时,使得所述电子设备执行如权利要求1-11任一项所述的方法。
- 一种计算机可读存储介质,包括程序,当其被处理单元所执行时,执行如权利要求1至11任一项所述的方法。
- 一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至11任一项所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180103399.4A CN118103834A (zh) | 2021-10-21 | 2021-10-21 | 一种信息获取方法以及装置 |
PCT/CN2021/125260 WO2023065211A1 (zh) | 2021-10-21 | 2021-10-21 | 一种信息获取方法以及装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/125260 WO2023065211A1 (zh) | 2021-10-21 | 2021-10-21 | 一种信息获取方法以及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023065211A1 true WO2023065211A1 (zh) | 2023-04-27 |
Family
ID=86058655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/125260 WO2023065211A1 (zh) | 2021-10-21 | 2021-10-21 | 一种信息获取方法以及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118103834A (zh) |
WO (1) | WO2023065211A1 (zh) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116523268A (zh) * | 2023-06-30 | 2023-08-01 | 广东中大管理咨询集团股份有限公司 | 一种基于大数据画像的人岗匹配分析方法及装置 |
CN116821712A (zh) * | 2023-08-25 | 2023-09-29 | 中电科大数据研究院有限公司 | 非结构化文本与知识图谱的语义匹配方法及装置 |
CN116932780A (zh) * | 2023-09-13 | 2023-10-24 | 之江实验室 | 天文知识图谱构建方法、资源查找方法、设备和介质 |
CN116955836A (zh) * | 2023-09-21 | 2023-10-27 | 腾讯科技(深圳)有限公司 | 推荐方法、装置、设备、计算机可读存储介质及程序产品 |
CN117076660A (zh) * | 2023-10-16 | 2023-11-17 | 浙江同花顺智能科技有限公司 | 一种信息推荐方法、装置、设备及存储介质 |
CN117436457A (zh) * | 2023-11-01 | 2024-01-23 | 人民网股份有限公司 | 反讽识别方法、装置、计算设备及存储介质 |
CN117540062A (zh) * | 2024-01-10 | 2024-02-09 | 广东省电信规划设计院有限公司 | 基于知识图谱的检索模型推荐方法及装置 |
CN117633254A (zh) * | 2024-01-26 | 2024-03-01 | 武汉大学 | 一种基于知识图谱的地图检索用户画像构建方法和系统 |
CN118051969A (zh) * | 2024-04-16 | 2024-05-17 | 卡奥斯工业智能研究院(青岛)有限公司 | 基于智能交互引擎的服装设计方法、装置、设备和介质 |
CN118193725A (zh) * | 2024-03-20 | 2024-06-14 | 广州市阿尔法软件信息技术有限公司 | 基于知识图谱的场景界面主动识别与智能化展示方法 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118333670A (zh) * | 2024-06-14 | 2024-07-12 | 智者四海(北京)技术有限公司 | 营销数据处理系统、方法、装置、设备及介质 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170039233A1 (en) * | 2015-08-07 | 2017-02-09 | Sap Se | Sankey diagram graphical user interface customization |
CN106503172A (zh) * | 2016-10-25 | 2017-03-15 | 天闻数媒科技(湖南)有限公司 | 基于知识图谱推荐学习路径的方法和装置 |
CN109800300A (zh) * | 2019-01-08 | 2019-05-24 | 广东小天才科技有限公司 | 一种学习内容推荐方法及系统 |
CN110008349A (zh) * | 2019-02-01 | 2019-07-12 | 阿里巴巴集团控股有限公司 | 计算机执行的事件风险评估的方法及装置 |
CN110334159A (zh) * | 2019-05-29 | 2019-10-15 | 苏宁金融服务(上海)有限公司 | 基于关系图谱的信息查询方法和装置 |
CN111191046A (zh) * | 2019-12-31 | 2020-05-22 | 北京明略软件系统有限公司 | 一种实现信息搜索的方法、装置、计算机存储介质及终端 |
WO2021139191A1 (zh) * | 2020-01-08 | 2021-07-15 | 华为技术有限公司 | 数据标注的方法以及数据标注的装置 |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116523268B (zh) * | 2023-06-30 | 2023-09-26 | 广东中大管理咨询集团股份有限公司 | 一种基于大数据画像的人岗匹配分析方法及装置 |
CN116523268A (zh) * | 2023-06-30 | 2023-08-01 | 广东中大管理咨询集团股份有限公司 | 一种基于大数据画像的人岗匹配分析方法及装置 |
CN116821712B (zh) * | 2023-08-25 | 2023-12-19 | 中电科大数据研究院有限公司 | 非结构化文本与知识图谱的语义匹配方法及装置 |
CN116821712A (zh) * | 2023-08-25 | 2023-09-29 | 中电科大数据研究院有限公司 | 非结构化文本与知识图谱的语义匹配方法及装置 |
CN116932780A (zh) * | 2023-09-13 | 2023-10-24 | 之江实验室 | 天文知识图谱构建方法、资源查找方法、设备和介质 |
CN116932780B (zh) * | 2023-09-13 | 2024-01-09 | 之江实验室 | 天文知识图谱构建方法、资源查找方法、设备和介质 |
CN116955836A (zh) * | 2023-09-21 | 2023-10-27 | 腾讯科技(深圳)有限公司 | 推荐方法、装置、设备、计算机可读存储介质及程序产品 |
CN116955836B (zh) * | 2023-09-21 | 2024-01-02 | 腾讯科技(深圳)有限公司 | 推荐方法、装置、设备、计算机可读存储介质及程序产品 |
CN117076660A (zh) * | 2023-10-16 | 2023-11-17 | 浙江同花顺智能科技有限公司 | 一种信息推荐方法、装置、设备及存储介质 |
CN117076660B (zh) * | 2023-10-16 | 2024-01-26 | 浙江同花顺智能科技有限公司 | 一种信息推荐方法、装置、设备及存储介质 |
CN117436457A (zh) * | 2023-11-01 | 2024-01-23 | 人民网股份有限公司 | 反讽识别方法、装置、计算设备及存储介质 |
CN117436457B (zh) * | 2023-11-01 | 2024-05-03 | 人民网股份有限公司 | 反讽识别方法、装置、计算设备及存储介质 |
CN117540062A (zh) * | 2024-01-10 | 2024-02-09 | 广东省电信规划设计院有限公司 | 基于知识图谱的检索模型推荐方法及装置 |
CN117540062B (zh) * | 2024-01-10 | 2024-04-12 | 广东省电信规划设计院有限公司 | 基于知识图谱的检索模型推荐方法及装置 |
CN117633254A (zh) * | 2024-01-26 | 2024-03-01 | 武汉大学 | 一种基于知识图谱的地图检索用户画像构建方法和系统 |
CN117633254B (zh) * | 2024-01-26 | 2024-04-05 | 武汉大学 | 一种基于知识图谱的地图检索用户画像构建方法和系统 |
CN118193725A (zh) * | 2024-03-20 | 2024-06-14 | 广州市阿尔法软件信息技术有限公司 | 基于知识图谱的场景界面主动识别与智能化展示方法 |
CN118051969A (zh) * | 2024-04-16 | 2024-05-17 | 卡奥斯工业智能研究院(青岛)有限公司 | 基于智能交互引擎的服装设计方法、装置、设备和介质 |
Also Published As
Publication number | Publication date |
---|---|
CN118103834A (zh) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21960961 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180103399.4 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |