CN114817697A - Method and device for determining label information, electronic equipment and storage medium - Google Patents

Method and device for determining label information, electronic equipment and storage medium

Info

Publication number
CN114817697A
Authority
CN
China
Prior art keywords
text
features
word
keyword
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110126378.6A
Other languages
Chinese (zh)
Inventor
李天时
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110126378.6A
Publication of CN114817697A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a method and a device for determining label information, an electronic device, and a storage medium, which can be applied to fields such as artificial intelligence, cloud technology, and big data. The method comprises: acquiring a to-be-processed text corresponding to interest information of an object, and extracting in-text keywords from the to-be-processed text; extracting text features of the to-be-processed text; acquiring keyword features of each candidate keyword, wherein the candidate keywords are keywords in a keyword lexicon; determining out-of-text keywords corresponding to the to-be-processed text from the candidate keywords based on the semantic matching degree between the text features of the to-be-processed text and each of the keyword features; and determining label information of the object based on text keywords, wherein the text keywords comprise the in-text keywords and the out-of-text keywords. In this way, the label information of a user can be expanded and its richness improved.

Description

Method and device for determining label information, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for determining tag information, an electronic device, and a storage medium.
Background
Currently, to mine interest tags for a user, a to-be-processed text corresponding to the user's interest information is processed and its keywords are extracted, and the user's interest tags are then determined from the extracted keywords. Although this approach can determine a user's interest tags to some extent, the resulting tags often carry insufficient information; that is, the information richness of existing methods for determining user interest tags still needs to be improved.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining label information, electronic equipment and a storage medium, which expand the label information of a user and improve the richness and comprehensiveness of the label information of the user.
In one aspect, an embodiment of the present application provides a method for determining tag information, where the method includes:
acquiring a text to be processed corresponding to interest information of an object, and extracting in-text keywords from the text to be processed;
extracting text features of the text to be processed;
acquiring keyword features of each candidate keyword, wherein the candidate keywords are keywords in a keyword lexicon;
determining out-of-text keywords corresponding to the text to be processed from the candidate keywords based on the semantic matching degree between the text features of the text to be processed and the keyword features;
and determining label information of the object based on text keywords, wherein the text keywords comprise the in-text keywords and the out-of-text keywords.
In one aspect, an embodiment of the present application provides a device for determining tag information, where the device includes:
the keyword processing module is used for acquiring a to-be-processed text corresponding to the interest information of the object and extracting in-text keywords from the to-be-processed text;
the text feature processing module is used for extracting text features of the text to be processed;
the keyword processing module is further configured to obtain keyword features of each candidate keyword, and determine, based on a semantic matching degree between the text features of the to-be-processed text and each keyword feature, an out-of-text keyword corresponding to the to-be-processed text from the candidate keywords, where the candidate keywords are keywords in a keyword lexicon;
and the label information determining module is used for determining the label information of the object based on text keywords, wherein the text keywords comprise the in-text keywords and the out-of-text keywords.
In an optional embodiment, the keyword processing module is further configured to:
extracting the keyword features of each candidate keyword, and storing the keyword features of each candidate keyword in the keyword lexicon;
and, when the keyword features of each candidate keyword are to be obtained, obtaining the pre-stored keyword features of each candidate keyword from the keyword lexicon.
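By way of non-limiting illustration, the pre-storage described above can be sketched as follows. The class `KeywordLexicon`, the function `toy_extractor`, and the sample keywords are hypothetical names and data introduced only for this sketch; a real embodiment would use a trained feature extraction model in place of the toy extractor.

```python
from typing import Callable, Dict, List

Feature = List[float]


class KeywordLexicon:
    """Stores candidate keywords together with their precomputed features."""

    def __init__(self, extractor: Callable[[str], Feature]) -> None:
        self._extractor = extractor
        self._features: Dict[str, Feature] = {}

    def add(self, keyword: str) -> None:
        # Extract the feature once, when the keyword enters the lexicon.
        if keyword not in self._features:
            self._features[keyword] = self._extractor(keyword)

    def features(self) -> Dict[str, Feature]:
        # Later lookups read the stored features; no extraction happens here.
        return dict(self._features)


calls: List[str] = []


def toy_extractor(keyword: str) -> Feature:
    # Hypothetical stand-in for a real feature extraction model.
    calls.append(keyword)
    return [float(len(keyword)), float(keyword.count(" ") + 1)]


lexicon = KeywordLexicon(toy_extractor)
for kw in ["arcade fishing", "stock trading", "arcade fishing"]:
    lexicon.add(kw)
stored = lexicon.features()
```

In this sketch the extractor runs once per distinct keyword, so matching a new text against the lexicon only requires extracting the features of that text.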
In an optional embodiment, the text features of the text to be processed include first word features of words included in the text to be processed, and the keyword features include second word features of words included in the candidate keywords; the keyword processing module is configured to:
for any one of the keyword features, determining the semantic matching degree between the text features of the text to be processed and the keyword feature in the following manner:
respectively determining the similarity between each pair of features formed from the first word features and the second word features, wherein each pair comprises one first word feature and one second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword feature based on the similarity of each pair of features.
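As a minimal sketch of the pairwise similarity described above, the following computes a similarity for every pair made of one first word feature and one second word feature. Cosine similarity and the aggregation used in `matching_degree` (the mean, over the words of the text, of the best similarity to any keyword word) are assumptions made for illustration; the embodiment does not prescribe either choice.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(first: List[Vector], second: List[Vector]) -> List[List[float]]:
    # Entry [i][j] pairs first word feature i with second word feature j.
    return [[cosine(f, s) for s in second] for f in first]


def matching_degree(first: List[Vector], second: List[Vector]) -> float:
    # Assumed aggregation: average of each text word's best pairwise similarity.
    sim = similarity_matrix(first, second)
    return sum(max(row) for row in sim) / len(sim)


# Toy features: two words in the text, one word in the candidate keyword.
first_word_features = [[1.0, 0.0], [0.0, 1.0]]
second_word_features = [[1.0, 0.0]]
sim = similarity_matrix(first_word_features, second_word_features)
degree = matching_degree(first_word_features, second_word_features)
```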
In an optional embodiment, the keyword processing module is configured to:
for any first word feature, determining a weight corresponding to each second word feature based on each first similarity corresponding to the first word feature (that is, the similarity between the first word feature and each second word feature), and performing weighted summation on the second word features based on the weight corresponding to each second word feature to obtain an updated first word feature;
for any second word feature, determining a weight corresponding to each first word feature based on each second similarity corresponding to the second word feature (that is, the similarity between the second word feature and each first word feature), and performing weighted summation on the first word features based on the weight corresponding to each first word feature to obtain an updated second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the updated first word features and the updated second word features.
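The weighting and weighted summation above can be sketched as a soft alignment between the two sets of word features. The use of dot-product similarity and of a softmax to turn similarities into weights are assumptions for this sketch, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def soft_align(first: List[Vector], second: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    sim = [[dot(f, s) for s in second] for f in first]
    # Each first word feature is replaced by a weighted sum of second word features.
    updated_first = [weighted_sum(softmax(row), second) for row in sim]
    # Each second word feature is replaced by a weighted sum of first word features.
    columns = [[sim[i][j] for i in range(len(first))] for j in range(len(second))]
    updated_second = [weighted_sum(softmax(col), first) for col in columns]
    return updated_first, updated_second


first_word_features = [[1.0, 0.0], [0.0, 1.0]]
second_word_features = [[2.0, 3.0]]
updated_first, updated_second = soft_align(first_word_features, second_word_features)
```

With a single second word feature, every weight row collapses to one weight of 1.0, so each updated first word feature equals that second word feature.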
In an optional embodiment, the keyword processing module is configured to:
for any one of the first word features, obtaining a local word feature corresponding to the first word feature based on the correlation between the first word feature and the updated first word feature, and obtaining a global word feature corresponding to the first word feature based on the first word feature, the updated first word feature and the local word feature corresponding to the first word feature;
for any one of the second word features, obtaining a local word feature corresponding to the second word feature based on the correlation between the second word feature and the updated second word feature, and obtaining a global word feature corresponding to the second word feature based on the second word feature, the updated second word feature and the local word feature corresponding to the second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the global word features corresponding to the first word features and the global word features corresponding to the second word features.
In an optional embodiment, the keyword processing module is configured to:
determining a first difference feature between the first word feature and the updated first word feature;
determining a first similarity feature between the first word feature and the updated first word feature;
wherein the local word features corresponding to the first word feature comprise the first difference feature and the first similarity feature;
determining a second difference feature between the second word feature and the updated second word feature;
determining a second similarity feature between the second word feature and the updated second word feature;
wherein the local word features corresponding to the second word feature comprise the second difference feature and the second similarity feature.
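One possible way to realize the local and global word features is sketched below. The element-wise difference as the difference feature, the element-wise product as the similarity feature, and concatenation as the way of forming the global word feature are all assumptions chosen for illustration; the embodiment only states which inputs each feature is based on.

```python
from typing import List, Tuple

Vector = List[float]


def local_features(word: Vector, updated: Vector) -> Tuple[Vector, Vector]:
    # Difference feature (assumed): element-wise subtraction.
    difference = [a - b for a, b in zip(word, updated)]
    # Similarity feature (assumed): element-wise product.
    similarity = [a * b for a, b in zip(word, updated)]
    return difference, similarity


def global_feature(word: Vector, updated: Vector) -> Vector:
    # Global word feature (assumed): concatenation of the word feature,
    # the updated word feature, and the two local features.
    difference, similarity = local_features(word, updated)
    return word + updated + difference + similarity


word_feature = [1.0, 2.0]
updated_word_feature = [0.5, 4.0]
result = global_feature(word_feature, updated_word_feature)
```

The same two functions apply unchanged to a second word feature and its updated counterpart.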
In an optional embodiment, the text feature processing module is configured to:
extracting text features of the text to be processed through a feature extraction model based on the text to be processed;
the determining, from the candidate keywords, the out-of-text keywords corresponding to the text to be processed based on the text features of the text to be processed and the semantic matching degrees between the keyword features includes:
and determining the out-of-text keywords corresponding to the text to be processed from the candidate keywords through a semantic matching model based on the text features of the text to be processed and the keyword features.
In one aspect, an embodiment of the present application provides an information recommendation method, including:
acquiring information to be recommended and a text of the information to be recommended;
acquiring label information of each candidate recommendation object, wherein the label information of the candidate recommendation object is determined by adopting a mode in any optional embodiment in the method for determining the label information;
determining a target recommendation object from the candidate recommendation objects based on the matching degree between the text and the label information;
and recommending the information to be recommended to the target recommendation object.
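The selection of target recommendation objects can be sketched as follows, assuming for illustration that the matching degree is the Jaccard overlap between the keywords of the text and the tag information of each candidate object, and that an object is selected when its matching degree reaches a threshold. The function names, the threshold, and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def matching_degree(text_keywords: Set[str], tags: Set[str]) -> float:
    # Assumed measure: Jaccard overlap between the two keyword sets.
    union = text_keywords | tags
    if not union:
        return 0.0
    return len(text_keywords & tags) / len(union)


def select_targets(
    text_keywords: Set[str],
    candidate_tags: Dict[str, Set[str]],
    threshold: float,
) -> List[str]:
    # Keep every candidate object whose tags match the text well enough.
    return [
        obj
        for obj, tags in candidate_tags.items()
        if matching_degree(text_keywords, tags) >= threshold
    ]


text_keywords = {"arcade", "fishing"}
candidate_tags = {
    "object_1": {"arcade", "fishing", "arcade fishing"},
    "object_2": {"finance"},
}
targets = select_targets(text_keywords, candidate_tags, threshold=0.5)
```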
In one aspect, an embodiment of the present application provides an information recommendation device, where the information recommendation device includes:
the information processing module to be recommended is used for acquiring information to be recommended and a text of the information to be recommended;
the tag information acquisition module is used for acquiring tag information of each candidate recommendation object, wherein the tag information is determined in the manner of the tag information determination method provided in any optional embodiment of the present application;
a target recommendation object determining module, configured to determine a target recommendation object from the candidate recommendation objects based on a matching degree between the text and the tag information;
and the information recommending module is used for recommending the information to be recommended to the target recommending object.
In an optional embodiment, the target recommendation object determining module is further configured to perform at least one of the following:
for any candidate recommendation object, acquiring text features of the text and object tag features of tag information of the candidate recommendation object, and determining matching degree between the text and the tag information based on the text features of the text and the object tag features of the tag information;
extracting text keywords of the text, and determining a matching degree between the text and the tag information based on the text keywords of the text and the tag information, wherein the text keywords include in-text keywords and out-of-text keywords, and the text keywords are determined in the manner of the tag information determination method provided in any optional embodiment of the present application.
In one aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the processor and the memory are connected to each other; the memory is used for storing a computer program; the processor is configured to execute the method provided by any possible implementation manner of the determination method of the tag information and/or the method provided by any possible implementation manner of the information recommendation method when the computer program is called.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, where the computer program is executed by a processor to implement the method provided in any possible implementation manner of the method for determining the tag information and/or the method provided in any possible implementation manner of the method for recommending information.
In one aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the method provided by any possible implementation of the tag information determination method and/or the information recommendation method.
The beneficial effects of the embodiment of the application are that:
in the embodiments of the present application, when a to-be-processed text corresponding to the information of interest of any object is processed, in addition to obtaining the in-text keywords of the to-be-processed text, the text features of the to-be-processed text can be semantically matched against the keyword features of each candidate keyword, the out-of-text keywords corresponding to the to-be-processed text are obtained according to the semantic matching degree, and the label information of the object is determined based on the in-text keywords and the out-of-text keywords. In this way, both the key information inside the to-be-processed text and the keywords outside it are taken into account, so that the keyword information corresponding to the to-be-processed text is more complete and comprehensive. That is, the label information of the user can be determined from the in-text keywords alone, or from the in-text keywords together with the out-of-text keywords, which expands the label information of the user and improves its richness and comprehensiveness.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an optional tag information determination system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of an optional method for determining tag information according to an embodiment of the present application;
Fig. 3a is a schematic diagram illustrating an optional method for determining out-of-text keywords according to an embodiment of the present application;
Fig. 3b is a schematic diagram illustrating an optional method for processing a to-be-processed text or a candidate keyword through a BERT model according to an embodiment of the present application;
Fig. 3c is a schematic diagram of an optional keyword lexicon established according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an optional semantic matching principle provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of an optional information recommendation method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an optional tag information determination apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an optional information recommendation device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The method for determining tag information provided by the embodiments of the present application can be applied to fields such as natural language processing and machine learning within artificial intelligence, to various fields of cloud technology, such as cloud computing and cloud services, and to data computing and processing in the field of big data.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will involve natural language, i.e. the language that people use everyday, so it is closely related to the research of linguistics.
Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Cloud technology is a hosting technology that unifies a series of resources, such as hardware, software, and networks, in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. The method for determining tag information provided by the embodiments of the present application can be implemented based on cloud computing in cloud technology.
Cloud Computing refers to obtaining required resources in an on-demand and easily-extensible manner through a Network, and is a product of development and fusion of traditional computers and Network Technologies, such as Grid Computing (Grid Computing), Distributed Computing (Distributed Computing), Parallel Computing (Parallel Computing), Utility Computing (Utility Computing), Network Storage (Network Storage Technologies), Virtualization (Virtualization), Load balancing (Load Balance), and the like.
An artificial intelligence cloud Service is also generally called AIaaS (AI as a Service). The method is a service mode of an artificial intelligence platform, and specifically, the AIaaS platform splits several types of common artificial intelligence services, and provides independent or packaged services, such as processing resource conversion requests, at a cloud.
Big data (Big data) refers to a data set which cannot be captured, managed and processed by a conventional software tool within a certain time range, and is a massive, high-growth-rate and diversified information asset which can have stronger decision-making power, insight discovery power and flow optimization capability only by a new processing mode. With the advent of the cloud era, big data has attracted more and more attention. The method for determining tag information provided in this embodiment needs a special technology based on big data, wherein the technology suitable for big data includes massively parallel processing of a database, data mining, a distributed file system, a distributed database, and the cloud computing.
As an example, fig. 1 shows a schematic structural diagram of a tag information determination system applied in the embodiment of the present application, and it can be understood that the tag information determination method provided in the embodiment of the present application can be applied to, but is not limited to, the application scenario shown in fig. 1.
In this example, as shown in fig. 1, the tag information determination system may include, but is not limited to, a server 101 of an application program, a network 102, and a user terminal 103 on which a client of the application program is installed. The user terminal 103 may communicate with the server 101 through the network 102, and the server 101 may determine tag information of a user (i.e., an object). The server 101 includes a database 1011 and a processing engine 1012. The user terminal 103 includes a man-machine interaction screen 1031 (the user interface of the application program), a processor 1032, and a memory 1033. The man-machine interaction screen 1031 is used by the user to browse information of interest. The processor 1032 is configured to process the relevant operations of the user. The memory 1033 is used to store the information of interest.
As shown in fig. 1, a specific implementation procedure of the method for determining tag information in the present application may include steps S1-S4:
Step S1, any object (i.e., user) may interact with information of interest, for example by browsing, clicking, copying, or collecting it, through the man-machine interaction screen 1031 of the user terminal, and the memory 1033 stores the information of interest.
Step S2, for any object, the processing engine 1012 in the server 101 acquires a to-be-processed text corresponding to the information of interest of the object, extracts in-text keywords from the to-be-processed text, and extracts text features of the to-be-processed text; the database 1011 in the server 101 may be used to store the object's information of interest, the to-be-processed text, the in-text keywords, and the text features.
In step S3, the processing engine 1012 in the server 101 obtains keyword features of each candidate keyword, where the candidate keywords are keywords in a keyword lexicon, and determines an out-of-text keyword corresponding to the text to be processed from each candidate keyword based on semantic matching degrees between the text features and the keyword features, where the database 1011 in the server 101 may also be used to store the out-of-text keyword.
The keyword features of each candidate keyword may be pre-stored in the local database 1011, and when in use, the processing engine 1012 may directly obtain the keyword features from the local database 1011.
In step S4, the processing engine 1012 in the server 101 determines the tag information of the object based on the text keywords, where the text keywords include the in-text keywords and the out-of-text keywords.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server or a server cluster providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform. The network may include, but is not limited to, a wired network and a wireless network, wherein the wired network comprises a local area network, a metropolitan area network, and a wide area network, and the wireless network comprises Bluetooth, Wi-Fi, and other networks that enable wireless communication. The user terminal may be a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a notebook computer, a digital broadcast receiver, an MID (Mobile Internet Device), a PDA (personal digital assistant), a desktop computer, a vehicle-mounted terminal (e.g., a vehicle-mounted navigation terminal), a smart speaker, a smart watch, etc. The user terminal and the server may be directly or indirectly connected through wired or wireless communication, but are not limited thereto. The specific choice may be determined based on the requirements of the actual application scenario and is not limited herein.
Referring to fig. 2, fig. 2 is a schematic flowchart of a method for determining tag information, which may be executed by any electronic device, such as a server, or alternatively, may be completed by interaction between a user terminal and the server, and as shown in fig. 2, the method for determining tag information provided by the embodiment of the present application includes the following steps:
Step S201, obtaining a to-be-processed text corresponding to the interest information of the object, and extracting in-text keywords from the to-be-processed text.
Step S202, extracting text features of the text to be processed.
Step S203, obtaining the keyword characteristics of each candidate keyword, wherein the candidate keywords are keywords in a keyword lexicon.
Step S204, determining the out-of-text keywords corresponding to the text to be processed from the candidate keywords based on the semantic matching degree between the text features of the text to be processed and the keyword features.
Step S205, determining the tag information of the object based on text keywords, where the text keywords include the in-text keywords and the out-of-text keywords.
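The flow of steps S201 to S205 can be sketched end to end as follows. Everything inside the sketch is a toy stand-in chosen so that the example runs without a trained model: the in-text vocabulary, the token-set "features", the token-overlap matching degree, and the threshold are all hypothetical, whereas an actual embodiment would use a keyword extraction technique, a feature extraction model, and a semantic matching model.

```python
from typing import Callable, List, Set

Feature = Set[str]


def determine_tags(
    text: str,
    lexicon: List[str],
    extract_in_text: Callable[[str], List[str]],
    featurize: Callable[[str], Feature],
    match: Callable[[Feature, Feature], float],
    threshold: float,
) -> List[str]:
    in_text = extract_in_text(text)                       # S201
    text_feature = featurize(text)                        # S202
    keyword_features = {kw: featurize(kw) for kw in lexicon}  # S203
    out_of_text = [                                       # S204
        kw
        for kw, feature in keyword_features.items()
        if kw not in in_text and match(text_feature, feature) >= threshold
    ]
    return in_text + out_of_text                          # S205


# Toy stand-ins (assumptions for illustration only).
IN_TEXT_VOCABULARY = {"arcade", "fishing"}


def toy_extract(text: str) -> List[str]:
    return [t for t in text.lower().split() if t in IN_TEXT_VOCABULARY]


def toy_featurize(text: str) -> Feature:
    return set(text.lower().split())


def toy_match(text_feature: Feature, keyword_feature: Feature) -> float:
    # Fraction of the keyword's tokens that also occur in the text.
    return len(text_feature & keyword_feature) / len(keyword_feature)


tags = determine_tags(
    "Arcade fishing game",
    ["arcade fishing", "stock trading"],
    toy_extract,
    toy_featurize,
    toy_match,
    threshold=0.5,
)
```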
Alternatively, the above interest information may be understood as history information browsed by the user. For example, the information of interest may be advertisement information, APP description information, information article information, game promotion information, and the like. The form of the information of interest may include at least one of video, voice, picture, text, and the like, which is not limited herein.
The information of interest is processed to obtain a text to be processed corresponding to the information of interest, for example, advertisement information, APP description information, information article information, game promotion information are processed to obtain data such as corresponding advertisement title, APP description text information, text information of information article, and the like, and the data are used as the text to be processed. And then extracting words or phrases which have topic correlation and commercial property with the text to be processed from the text to be processed by a keyword extraction technology to obtain key words in the text corresponding to the text to be processed. The keywords in the text are original keywords obtained based on original text information of the text to be processed.
Specifically, entity tags can be obtained by performing multi-industry named entity recognition on texts from a plurality of scenes, such as an advertisement clicked by a user, an article read by the user, the title of a commodity purchased by the user, and the description of an APP installed or downloaded by the user, and the entity tags can be used as the in-text keywords of the object.
Then, the text features of the text to be processed are obtained through feature extraction, and the keyword features of each candidate keyword are obtained. Each candidate keyword may be a keyword in a pre-established keyword lexicon, and the keyword lexicon may include professional vocabularies of multiple industries, such as the game industry, the financial industry, the advertisement industry, the movie industry, and the like, which is not limited herein.
Semantic matching is performed between the text features and the candidate keyword features to obtain a semantic matching degree, and at least one out-of-text keyword with a high semantic matching degree with the text to be processed is determined from the candidate keywords based on the semantic matching degree. An out-of-text keyword is a keyword that does not appear among the original keywords of the original text information of the text to be processed but is related to those original keywords.
Then, the tag information of the object (which may also be referred to as a keyword tag) is determined based on the in-text keywords and the out-of-text keywords together. For example, the in-text keywords and the out-of-text keywords may be directly determined as the tag information of the object; alternatively, feature extraction may be performed on the in-text keywords and the out-of-text keywords, and the extracted feature information may be used as the tag information of the object, which is not limited herein.
In an example from the game industry, suppose that for object 1 the tags obtained from the in-text keywords of the information that object 1 is interested in are "real person", "arcade", and "fishing". The out-of-text keywords corresponding to these in-text keywords can be determined in the above manner, giving the tags "arcade fishing" and "fishing master". Then "real person", "arcade", "fishing", "arcade fishing", and "fishing master" can all be determined as the tag information of the user. The tag information of the user is thereby expanded, which improves its richness.
As an example, when a user interacts with information of interest such as advertisement information (e.g., browses the advertisement information), a text to be processed may be extracted from the information of interest, and in-text keywords are then extracted from the text to be processed. The out-of-text keywords corresponding to the advertisement information serve as an extension of the in-text keyword extraction result, and are used as tag information of the user to supplement it, so that the mining of user tag information under the advertisement service is supplemented.
Specifically, a text to be processed is obtained from an advertisement title, and the out-of-text keywords of the text to be processed are obtained according to the above method. The in-text keywords and the out-of-text keywords are combined, the tag information of the user is determined based on the combined text keywords, and an association is established between the tag information and the user. When advertisement information is delivered, users can be recalled according to their tag information. For example, if users are associated with a certain region and certain advertisement information is targeted at that region, the users in that region who are related to the advertisement information can be recalled.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
With this embodiment, when a text to be processed corresponding to the information of interest of any object is processed, in addition to obtaining the in-text keywords of the text to be processed, the text features of the text to be processed can be semantically matched against the keyword features of each candidate keyword, the out-of-text keywords corresponding to the text to be processed are obtained from the semantic matching degree, and the tag information of the object is determined based on the in-text keywords and the out-of-text keywords. This method considers not only the key information within the text to be processed but also the out-of-text keywords corresponding to it, so the keyword information corresponding to the text to be processed is more complete and comprehensive. That is, the tag information of the user can be determined not only from the in-text keywords alone but from the in-text keywords together with the out-of-text keywords, which expands the tag information of the user and improves its richness and comprehensiveness.
In the related art, keyword extraction from an input text considers only the keywords contained in the input text itself, so the amount of information obtained is small.
To solve this problem, the out-of-text keywords of the text to be processed may be determined in the following manner so as to expand the information of the text to be processed. In an optional embodiment, the extracting text features of the text to be processed includes:
extracting text features of the text to be processed through a feature extraction model based on the text to be processed;
the determining, from the candidate keywords, the out-of-text keywords corresponding to the text to be processed based on the text features of the text to be processed and the semantic matching degrees between the keyword features includes:
and determining the out-of-text keywords corresponding to the text to be processed from the candidate keywords through a semantic matching model based on the text features of the text to be processed and the keyword features.
Optionally, feature extraction may be performed on the text to be processed through the feature extraction model to obtain the text features of the text to be processed. The feature extraction model may be a Bert model (Bidirectional Encoder Representations from Transformers).
Semantic matching is performed on the text features of the text to be processed and the keyword features through a semantic matching model, and the out-of-text keywords corresponding to the text to be processed are determined from the candidate keywords. The semantic matching model may be an ESIM model (Enhanced LSTM for Natural Language Inference).
Referring to fig. 3a, fig. 3a is a schematic diagram of an optional principle for determining out-of-text keywords. As shown in fig. 3a, word segmentation is performed on the text to be processed to obtain its segmented words (i.e., the circles shown in the left part of fig. 3a). The segmented words are input to the representation layer, i.e., the Bert model shown in the figure, which performs feature extraction on the text to be processed to obtain the corresponding text features.
For any one of the candidate keywords, the corresponding keyword feature may be obtained in either of two modes, as follows:
Mode 1: as shown in fig. 3a, the pre-stored keyword feature of the candidate keyword is obtained directly from the keyword lexicon.
Mode 2: as shown in the dotted-line portion of fig. 3a, word segmentation is performed on the candidate keyword to obtain its segmented words (i.e., the circles shown in the right part of fig. 3a). The segmented words are input to the representation layer, i.e., the Bert model shown in the figure, which performs feature extraction on the candidate keyword to obtain the corresponding keyword feature.
The keyword features of all candidate keywords are obtained according to mode 1 or mode 2. The text features are then semantically matched against the keyword feature of each candidate keyword through the ESIM model shown in the figure, yielding a semantic matching degree between the text features and each keyword feature. A score for each candidate keyword is obtained based on its semantic matching degree, at least one keyword feature is selected from the keyword features of all candidate keywords based on the scores, and the candidate keywords corresponding to the selected keyword features are taken as the out-of-text keywords of the text to be processed.
When the semantic matching degree is calculated, the semantic matching degree between the text features and any one of the keyword features can be calculated by cosine similarity, so as to obtain the semantic matching degree between the text features and all the keyword features. Cosine similarity evaluates the semantic matching degree of two feature vectors by calculating the cosine of the angle between them.
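As a minimal sketch of the cosine similarity just described, assuming the text feature and a keyword feature are plain lists of floats of equal length (the function name `cosine_similarity` is illustrative, not taken from the source):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between vectors u and v: (u . v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, so a higher value indicates a closer semantic match.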
Alternatively, the semantic matching degree between the text features and all the keyword features may be computed by means of Nearest Neighbor search (NN), K-Nearest Neighbor search (K-NN), or Approximate Nearest Neighbor search (ANN), so as to select at least one keyword feature from all the keyword features; the candidate keywords corresponding to the selected keyword features are taken as out-of-text keywords.
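The K-NN variant can be illustrated with an exact brute-force search. This is a sketch under assumptions: features are lists of floats, similarity is the dot product of length-normalized vectors, and `top_k_keywords` is a hypothetical helper name. An approximate (ANN) index would replace the exhaustive scan for very large lexicons; that part is not shown.

```python
import heapq
import math

def normalize(v):
    # Scale a vector to unit length so that a dot product equals cosine similarity.
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else list(v)

def top_k_keywords(text_feature, keyword_features, k):
    # keyword_features maps each candidate keyword to its feature vector.
    q = normalize(text_feature)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(feat))), kw)
        for kw, feat in keyword_features.items()
    )
    # Keep the k candidates with the highest similarity to the text feature.
    return [kw for _, kw in heapq.nlargest(k, scored)]
```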
It should be noted that, in the example shown in fig. 3a of the present application, the obtained out-of-text keywords expand the keyword information of the text to be processed. Compared with extracting keywords by considering only the information of the text to be processed, this enriches the amount of keyword information associated with the text to be processed and improves the comprehensiveness of its keywords.
In actual business, the full-lexicon matching technique (i.e., the technique of semantically matching the text features against the candidate keywords in the keyword lexicon) can be used to address the problem that in-text keyword extraction yields too few keywords. Further, for crowd tag mining in each industry, a word list suited to that industry can be used, and keyword mining (also called tag information mining) for a specified industry can be performed on the text in combination with this technique.
In the embodiment of the present application, a semantic matching model such as the Bert-ESIM in fig. 3a is used to extract the keywords. Compared with using a Recurrent Neural Network (RNN) model to extract the keywords, experimental verification on test data gives the results shown in table 1. As can be seen from table 1, when the Bert-ESIM model in this example is used, the precision, recall, and F-Measure (i.e., F value) of keyword extraction are significantly improved.
Precision is the proportion of extracted keywords that are correct. Recall is the proportion of correct keywords that are extracted. F-Measure is the weighted harmonic mean of precision (P) and recall (R).
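For reference, the three metrics can be computed from a predicted keyword set and a reference keyword set as sketched below. The function name and the `beta` parameter are illustrative; `beta = 1` gives the common F1 value, and the source does not state which weighting was used in table 1.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted keywords that are correct.
    p = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference keywords that were predicted.
    r = true_positives / len(gold) if gold else 0.0
    if p == 0.0 and r == 0.0:
        return p, r, 0.0
    # F-Measure: weighted harmonic mean of precision and recall.
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r)
    return p, r, f
```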
TABLE 1
[Table 1 appears only as an image in the original publication: Figure RE-GDA0003044631310000141]
Moreover, with the keyword caching technique, prediction efficiency is also significantly improved. Table 2 compares the time taken to predict the keywords of one text to be processed against an online lexicon of 2.58 million entries, before and after caching:
TABLE 2
                  Time to predict over 2.58 million candidate keywords
Before caching    120 minutes
After caching     1 minute
Further, the in-text keywords and the out-of-text keywords may be determined as the tag information of the object. When information to be recommended needs to be recommended, a target recommendation object can be determined from the candidate recommendation objects according to the matching degree between the text of the information to be recommended and the tag information of the candidate recommendation objects, and the information to be recommended is recommended to the target recommendation object.
For example, the information to be recommended may be advertisement information that an advertiser wants to recommend, and a text to be processed is extracted from the advertisement information, for example by taking the advertisement title as the text to be processed. Then, the in-text keywords of the text to be processed (e.g., "leisure" and "fishing master") are extracted, and the tag information of each candidate recommendation object is obtained. If the tags of object 1 are "real person", "arcade", "fishing", "arcade fishing", and "fishing master", then, since the in-text keywords of the text to be processed and the tags of object 1 share a matching keyword (i.e., "fishing master"), the information to be recommended can be recommended to object 1.
As can be seen from the foregoing description, before expansion the tag information of object 1 is "real person", "arcade", and "fishing"; after expansion it is "real person", "arcade", "fishing", "arcade fishing", and "fishing master". If the tag information of object 1 were not expanded, it would include neither "leisure" nor "fishing master", so the information to be recommended might not be recommended to object 1. After expansion, the tag information of object 1 includes "fishing master", so the information to be recommended can be recommended to object 1, which widens the reach of the information to be recommended.
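The effect of tag expansion on recall can be sketched with a simple overlap test. This assumes the matching degree is reduced to "at least one shared keyword", which is a simplification made for illustration; the names `select_targets`, `before`, and `after` are hypothetical.

```python
def select_targets(ad_keywords, object_tags):
    # Return the objects whose tag information shares at least one keyword
    # with the keywords of the information to be recommended.
    ad = set(ad_keywords)
    return [obj for obj, tags in object_tags.items() if ad & set(tags)]

ad_keywords = ["leisure", "fishing master"]
# Tag information of object 1 before and after out-of-text expansion.
before = {"object 1": ["real person", "arcade", "fishing"]}
after = {"object 1": ["real person", "arcade", "fishing",
                      "arcade fishing", "fishing master"]}
```

Calling `select_targets(ad_keywords, before)` finds no target, whereas `select_targets(ad_keywords, after)` recalls object 1 through the shared tag "fishing master".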
In an example, the tag information of the object may also be updated according to the keyword information corresponding to the text of the information to be recommended, for example, "leisure" in the information to be recommended may be added to the tag information of the object 1.
It is understood that the above is only an example, and the present embodiment is not limited thereto.
The following describes, with an example, how to obtain the text feature of the text to be processed and the keyword features of the candidate keywords through the Bert model. Referring to fig. 3b, fig. 3b is a schematic diagram of an optional principle for processing the text to be processed or a candidate keyword through the Bert model according to the embodiment of the present application.
As shown in fig. 3b, taking the text to be processed as an example, the text to be processed is input into the Bert model, where [CLS], Tok 1, Tok 2, ..., Tok N is the input representation of the text to be processed. The Bert model explicitly represents the text to be processed as a token sequence. [CLS] is a special symbol of the Bert model, usually inserted before the text. $E_{[CLS]}, E_1, E_2, \ldots, E_N$ shown in the figure respectively represent [CLS], Tok 1, Tok 2, ..., Tok N. The circles shown in the figure are the core of the Bert model, namely the bidirectional Transformer encoder. Only two layers are shown in the figure; in practical applications the number of layers can be set as needed, which is not limited herein. Processing the text to be processed through the bidirectional Transformer encoder yields the corresponding output representation, i.e., $C, H_1, H_2, \ldots, H_N$ shown in the figure, where C is the output corresponding to [CLS], and C is taken as the output result corresponding to the text to be processed, i.e., the text feature.
For the implementation manner of obtaining the keyword features corresponding to the candidate keywords through the Bert model, the above process may be referred to, and details are not described herein again.
In an optional embodiment, the method further includes:
extracting the keyword characteristics of each candidate keyword, and storing the keyword characteristics of each candidate keyword into a keyword word bank;
the above obtaining the keyword features of each candidate keyword includes:
and acquiring the keyword characteristics of each pre-stored candidate keyword from the keyword word bank.
Optionally, a keyword lexicon may be constructed in advance, and the keyword features of each candidate keyword are stored in the keyword lexicon in advance. When the keyword features are needed, they can be obtained directly from the keyword lexicon, which avoids extracting features from each candidate keyword at prediction time, greatly reduces the amount of computation, and improves efficiency.
Optionally, the keyword lexicon needs to be established in advance; fig. 3c is a schematic diagram of an optional way of establishing the keyword lexicon provided in the embodiment of the present application. In actual services, training data corresponding to the service type can be determined according to the service type. For example, training data can be constructed from scenes such as advertisement services, APP descriptions, e-commerce titles, and news articles, and the candidate keywords appearing in the texts of these services are used to train the Bert model and the ESIM model. Training can be supervised, with positive and negative examples obtained through manual labeling. After the Bert model is trained, the keyword features of the candidate keywords can be extracted through the Bert model and stored in the keyword lexicon in advance, so that at prediction time they can be obtained directly and the amount of computation is reduced. Specifically, the keyword lexicon may be, for example, an online lexicon of 2.58 million entries or a commercial lexicon of 20 million entries, which is not limited herein.
With this embodiment, pre-storing the keyword features in the keyword lexicon avoids extracting features from the candidate keywords at prediction time, which greatly reduces the amount of computation and improves efficiency.
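The pre-storage idea can be sketched as a small feature store that runs the encoder once per candidate keyword and serves later lookups from memory. The class name `KeywordFeatureStore` and the injected `encoder` callable are assumptions for illustration; in the embodiment the encoder would be the trained Bert model and the store would be the keyword lexicon.

```python
class KeywordFeatureStore:
    def __init__(self, encoder):
        self._encoder = encoder      # hypothetical callable: keyword -> feature vector
        self._features = {}          # keyword -> cached feature vector
        self.encoder_calls = 0       # counts how often the encoder actually runs

    def _encode(self, keyword):
        self.encoder_calls += 1
        return self._encoder(keyword)

    def build(self, candidate_keywords):
        # Offline step: extract and store the feature of every candidate keyword.
        for kw in candidate_keywords:
            self._features[kw] = self._encode(kw)

    def get(self, keyword):
        # Online step: return the stored feature; encode only on a cache miss.
        if keyword not in self._features:
            self._features[keyword] = self._encode(keyword)
        return self._features[keyword]
```

After `build` has run, repeated `get` calls for stored keywords do not invoke the encoder again, which is where the saving at prediction time comes from.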
The process of semantically matching text features and keyword features is detailed below. Referring to fig. 4, fig. 4 is a schematic diagram of an optional semantic matching principle provided by an embodiment of the present application. Semantic matching may be performed using the BiLSTM (Bi-directional Long Short-Term Memory) network shown in fig. 4, which combines a forward LSTM and a backward LSTM, or using a tree-structured long short-term memory network (Tree-LSTM). When the Premise in fig. 4 represents the text feature, the Hypothesis represents the keyword feature; when the Premise represents the keyword feature, the Hypothesis represents the text feature.
The Input Encoding (Input Encoding) shown in fig. 4 is used to convert the text features and keyword features obtained from the Bert model into text features and keyword features adapted to the BiLSTM model, and performs related processing based on the converted text features and keyword features, and the specific process is as follows:
in an optional embodiment, the text features of the text to be processed include first word features of words included in the text to be processed, and the keyword features include second word features of words included in the candidate keywords;
for any one of the keyword features, determining the semantic matching degree between the text feature of the text to be processed and the keyword feature in the following mode:
respectively determining the similarity between every two characteristics in each first word characteristic and each second word characteristic, wherein every two characteristics comprise a first word characteristic and a second word characteristic;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the similarity between every two features.
Optionally, the above process is explained with a specific example. For the word sequence of the text to be processed $T = [t_1, t_2, \ldots, t_n]$ (e.g., [real person, arcade, fishing]) and the word sequence $P = [p_1, p_2, \ldots, p_m]$ of a candidate keyword outside the text to be processed (e.g., "arcade fishing", segmented as [arcade, fishing]), the Bert model is used to model and represent each of them, yielding the representation sequences X and K:
$$X = x_1, x_2, \ldots, x_n, \qquad X = f_{\mathrm{Bert}}([t_1, t_2, \ldots, t_n]) \qquad (1)$$
$$K = k_1, k_2, \ldots, k_m, \qquad K = f_{\mathrm{Bert}}([p_1, p_2, \ldots, p_m]) \qquad (2)$$
where X is the text feature, n is the number of words contained in the text to be processed, and $x_i$ ($1 \le i \le n$) corresponds to the i-th word in the text to be processed; K is the keyword feature, m is the number of words contained in the candidate keyword, and $k_j$ ($1 \le j \le m$) corresponds to the j-th word in the candidate keyword.
Further, semantic matching is performed on the text feature and the keyword feature using the ESIM model. Each first word feature in the text feature is paired with each second word feature in the keyword feature, and the similarity between every two features is determined, where every two features comprise one first word feature and one second word feature. The semantic matching degree between the text feature X and the keyword feature K is then determined based on these pairwise similarities. Following the ESIM model, the similarity between the i-th first word feature $x_i$ and the j-th second word feature $k_j$ is taken as their inner product:
$$e_{ij} = x_i^{\top} k_j$$
The Local Inference Modeling shown in fig. 4 is used to determine the global text feature and the global keyword feature corresponding to the text feature and the keyword feature. The "-" and "⊙" shown in the figure correspond one-to-one to the "-" and "⊙" in equations (5) and (6). The specific process is as follows:
in an optional embodiment, for any of the keyword features, the determining the semantic matching degree between the text feature of the text to be processed and the keyword feature based on the similarity between each two features includes:
for any first word feature, determining a weight corresponding to each second word feature based on each first similarity corresponding to the first word feature, and performing weighted summation on each second word feature based on the weight corresponding to each second word feature to obtain an updated first word feature;
for any second word feature, determining a weight corresponding to each first word feature based on each second similarity corresponding to the second word feature, and performing weighted summation on each first word feature based on the weight corresponding to each first word feature to obtain an updated second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the updated first word features and the updated second word features.
Optionally, after the similarity between every two features is obtained in the above manner, the text feature X and the keyword feature K may be weighted based on these similarities to obtain the updated text feature $\tilde{X}$ and the updated keyword feature $\tilde{K}$. Writing $e_{ij}$ for the similarity between the first word feature $x_i$ and the second word feature $k_j$, and normalizing the similarities into weights with softmax as in the ESIM model, $\tilde{X}$ and $\tilde{K}$ are determined according to the following formulas:
$$\tilde{x}_i = \sum_{j=1}^{m} \frac{\exp(e_{ij})}{\sum_{l=1}^{m} \exp(e_{il})}\, k_j, \quad 1 \le i \le n \qquad (3)$$
$$\tilde{k}_j = \sum_{i=1}^{n} \frac{\exp(e_{ij})}{\sum_{l=1}^{n} \exp(e_{lj})}\, x_i, \quad 1 \le j \le m \qquad (4)$$
where $\tilde{x}_i$ is the updated first word feature of the i-th word and $\tilde{k}_j$ is the updated second word feature of the j-th word. $\tilde{X}$ is composed of the $\tilde{x}_i$, and $\tilde{K}$ is composed of the $\tilde{k}_j$.
The semantic matching degree between the text feature and the keyword feature is then determined from the updated text feature $\tilde{X}$ and the updated keyword feature $\tilde{K}$.
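The pairwise similarity and the weighted update described above can be sketched in a few lines. The sketch assumes the standard ESIM choices, namely inner-product similarity and softmax-normalized weights; the source describes the weighting only in general terms, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Numerically stable softmax: turns similarities into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def soft_align(X, K):
    # e[i][j]: similarity between first word feature i and second word feature j.
    e = [[dot(x, k) for k in K] for x in X]
    # Each updated first word feature is a weighted sum of the second word features.
    X_updated = [weighted_sum(softmax(row), K) for row in e]
    # Each updated second word feature is a weighted sum of the first word features.
    K_updated = [
        weighted_sum(softmax([e[i][j] for i in range(len(X))]), X)
        for j in range(len(K))
    ]
    return e, X_updated, K_updated
```

Each updated feature therefore summarizes the part of the other sequence that is most relevant to the word in question.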
In an optional embodiment, the determining the semantic matching degree between the text feature of the text to be processed and the keyword feature based on the updated first word features and the updated second word features includes:
for any one of the first word features, obtaining a local word feature corresponding to the first word feature based on the correlation between the first word feature and the updated first word feature, and obtaining a global word feature corresponding to the first word feature based on the first word feature, the updated first word feature and the local word feature corresponding to the first word feature;
for any one of the second word features, obtaining a local word feature corresponding to the second word feature based on the correlation between the second word feature and the updated second word feature, and obtaining a global word feature corresponding to the second word feature based on the second word feature, the updated second word feature and the local word feature corresponding to the second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the global word features corresponding to the first word features and the global word features corresponding to the second word features.
In an optional embodiment, for any of the first word features, the obtaining, based on the association between the first word feature and the updated first word feature, a local word feature corresponding to the first word feature includes:
determining a first difference characteristic between the first word characteristic and the updated first word characteristic;
determining a first similar characteristic between the first word characteristic and the updated first word characteristic;
the local word features corresponding to the first word features comprise first difference features and first similar features;
for any of the second word features, the obtaining, based on the association between the second word feature and the updated second word feature, a local word feature corresponding to the second word feature includes:
determining a second difference characteristic between the second word characteristic and the updated second word characteristic;
determining a second similarity feature between the second word feature and the updated second word feature;
the local word features corresponding to the second word features comprise second difference features and second similar features.
Optionally, after the updated text feature and the updated keyword feature are obtained in the above manner, the following operations are performed on them to further enhance the local information. Here $x_i$ and $k_j$ denote a first word feature and a second word feature, and $\tilde{x}_i$ and $\tilde{k}_j$ denote the corresponding updated first word feature and updated second word feature:
$$m_{x,i} = [x_i;\ \tilde{x}_i;\ x_i - \tilde{x}_i;\ x_i \odot \tilde{x}_i] \qquad (5)$$
$$m_{k,j} = [k_j;\ \tilde{k}_j;\ k_j - \tilde{k}_j;\ k_j \odot \tilde{k}_j] \qquad (6)$$
where "-" denotes element-wise subtraction and "⊙" denotes the element-wise product. $x_i - \tilde{x}_i$ and $x_i \odot \tilde{x}_i$ are the local word features corresponding to the first word feature, and $k_j - \tilde{k}_j$ and $k_j \odot \tilde{k}_j$ are the local word features corresponding to the second word feature. $m_{x,i}$ is the global word feature corresponding to the first word feature, and $m_{k,j}$ is the global word feature corresponding to the second word feature. $x_i - \tilde{x}_i$ is the first difference feature and $x_i \odot \tilde{x}_i$ is the first similar feature corresponding to the first word feature; $k_j - \tilde{k}_j$ is the second difference feature and $k_j \odot \tilde{k}_j$ is the second similar feature corresponding to the second word feature. The $m_{x,i}$ together form the global text feature $M_X$ corresponding to the text feature, and the $m_{k,j}$ together form the global keyword feature $M_K$ corresponding to the keyword feature.
The semantic matching degree between the text feature and the keyword feature is determined based on the global text feature $M_X$ and the global keyword feature $M_K$.
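The local enhancement step, which combines a word feature, its updated counterpart, their element-wise difference, and their element-wise product into one global word feature, can be sketched as follows. The function name `enhance` and the concatenation order are assumptions for illustration.

```python
def enhance(word_feature, updated_feature):
    # Difference feature: element-wise subtraction.
    diff = [a - b for a, b in zip(word_feature, updated_feature)]
    # Similar feature: element-wise product.
    prod = [a * b for a, b in zip(word_feature, updated_feature)]
    # Global word feature: concatenation of the four parts.
    return list(word_feature) + list(updated_feature) + diff + prod
```

For a word feature of dimension d, the resulting global word feature has dimension 4d.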
The Inference Composition shown in fig. 4 is used to determine the final text feature representation $y_{doc}$ and the final keyword feature representation $y_{keyword}$ corresponding to the text feature and the keyword feature of the text to be processed. The specific process is as follows:
Further, the global text feature and the global keyword feature may each be further processed by a Long Short-Term Memory (LSTM) network and then average-pooled or max-pooled to obtain the final text feature representation $y_{doc}$ and the final keyword feature representation $y_{keyword}$.
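The pooling step reduces a variable-length sequence of word-level vectors to one fixed-length vector. A minimal sketch, assuming the sequence is a non-empty list of equal-length float lists (the LSTM processing that precedes pooling is not shown):

```python
def average_pool(sequence):
    # Mean over the sequence positions, computed per dimension.
    dim = len(sequence[0])
    return [sum(v[d] for v in sequence) / len(sequence) for d in range(dim)]

def max_pool(sequence):
    # Maximum over the sequence positions, computed per dimension.
    dim = len(sequence[0])
    return [max(v[d] for v in sequence) for d in range(dim)]
```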
Further, in the prediction stage shown in fig. 4, a similarity score between the text features and the keyword features is determined based on the text feature representation y_doc and the keyword feature representation y_keyword, using the following formulas:
$y = W[y_{doc}; y_{keyword}] + b$ (7)
$label = \arg\max_{l \in y} \mathrm{softmax}(y)$ (8)
where y_doc is obtained by average pooling or max pooling over the LSTM-processed global text feature, and y_keyword is obtained by average pooling or max pooling over the LSTM-processed global keyword feature; y represents the similarity score between the text features and the keyword features; W and b are network parameters, namely a weight matrix and a bias; and label represents the result after y is normalized, with a value greater than or equal to 0 and less than or equal to 1. argmax is a function: for y = f(x), x0 = argmax(f(x)) means that the argument x0 satisfies that f(x0) is the maximum value of f(x); in other words, argmax(f(x)) is the value of x at which f(x) attains its maximum. softmax(y) is a normalization function.
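The prediction step of formulas (7) and (8) can be illustrated with a minimal sketch. It assumes that the LSTM outputs are already available as lists of vectors; the function names, the toy values of W and b, and the two-class output are illustrative assumptions rather than details taken from the embodiment.

```python
import math

def mean_pool(rows):
    """Average a sequence of equal-length vectors into a single vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def max_pool(rows):
    """Take the element-wise maximum over a sequence of vectors."""
    return [max(col) for col in zip(*rows)]

def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(doc_states, kw_states, W, b, pool=mean_pool):
    """Pool both sides, apply y = W[y_doc; y_keyword] + b, then softmax and argmax."""
    y_doc = pool(doc_states)
    y_keyword = pool(kw_states)
    joint = y_doc + y_keyword  # list concatenation, i.e. [y_doc; y_keyword]
    y = [sum(w * x for w, x in zip(row, joint)) + bias for row, bias in zip(W, b)]
    probs = softmax(y)
    label = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return y, probs, label

# Toy inputs (hypothetical): two text-side states and one keyword-side state.
doc_states = [[1.0, 0.0], [0.0, 1.0]]
kw_states = [[1.0, 1.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
b = [0.0, 0.0]
y, probs, label = predict(doc_states, kw_states, W, b)
```

Passing `pool=max_pool` switches the sketch to max pooling without changing the rest of the computation.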
In an example, in the semantic matching mode, the text features and the keyword features can be modeled and represented separately. When keyword prediction is performed on a text to be processed, the keyword features of each candidate keyword in the keyword lexicon can be obtained through the model in advance, and each candidate keyword is then semantically matched against each text to be processed, which improves overall prediction efficiency. Specifically, the keyword features produced by the BERT model for each candidate keyword are pre-stored in a local keyword lexicon and are simply loaded for use during actual prediction. In practical applications, semantic matching is applied to lexicons such as an online lexicon on the order of 2.58 million entries and a commercial lexicon on the order of 20 million entries; caching the candidate keyword features in advance speeds up prediction. During actual matching, vector matching efficiency can be further improved by means of cosine similarity or approximate nearest neighbor (ANN) retrieval. Taking cosine similarity as an example, in the prediction stage, the text features X output by the BERT model and the keyword features K read from the local cache are each average-pooled directly to obtain the corresponding text feature representation y_doc and keyword feature representation y_keyword, and whether the candidate keyword is a keyword of the text is then judged according to the cosine similarity between the two.
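The cached cosine-similarity matching described above can be sketched as follows. The sketch assumes the per-token features have already been produced by some encoder (standing in for the BERT model); the function names, the toy vectors, and the 0.8 decision threshold are hypothetical and are not specified by the embodiment.

```python
import math

def mean_pool(rows):
    """Average a sequence of equal-length vectors into a single vector."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_keyword_cache(keyword_token_features):
    """Pool each candidate keyword's features once, ahead of prediction time."""
    return {kw: mean_pool(rows) for kw, rows in keyword_token_features.items()}

def match_keywords(text_token_features, cache, threshold=0.8):
    """Return cached candidates whose cosine similarity to the text meets the threshold."""
    y_doc = mean_pool(text_token_features)
    scored = [(kw, cosine(y_doc, y_kw)) for kw, y_kw in cache.items()]
    kept = [(kw, s) for kw, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])

# Hypothetical encoder outputs for two candidate keywords and one text.
cache = build_keyword_cache({
    "basketball": [[1.0, 0.0], [0.8, 0.2]],
    "cooking": [[0.0, 1.0]],
})
X = [[0.9, 0.1], [1.0, 0.0]]
hits = match_keywords(X, cache)
```

Because the cache is built once, only the text side needs to be encoded and pooled at prediction time; for lexicons with millions of entries, the linear scan in `match_keywords` would typically be replaced by an ANN index.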
Referring to fig. 5, fig. 5 is a schematic flowchart of an optional information recommendation method provided in an embodiment of the present application. The method may be executed by any electronic device, such as a server or a user terminal, or may be completed interactively by the user terminal and the server; optionally, it may be executed by the user terminal. As shown in fig. 5, the information recommendation method provided in the embodiment of the present application includes the following steps:
Step S501: obtain information to be recommended and the text of the information to be recommended.
Step S502: obtain tag information of each candidate recommendation object.
Step S503: determine a target recommendation object from the candidate recommendation objects based on the matching degree between the text and the tag information.
Step S504: recommend the information to be recommended to the target recommendation object.
Optionally, the information to be recommended may be understood as information to be promoted, and the information to be recommended may be determined according to an actual application scenario. For example, the information to be recommended may be advertisement information, APP description information, information article information, game promotion information, and the like. The form of the information of interest may include at least one of video, voice, picture, text, and the like, which is not limited herein.
The tag information of each candidate recommended object (such as the above object) can be obtained by referring to the foregoing description, and is not described herein again.
According to the matching degree between the text of the information to be recommended and the tag information of each candidate recommendation object, a target recommendation object can be determined from the candidate recommendation objects, and the information to be recommended is then recommended to the target recommendation object.
In an optional embodiment, the method further includes at least one of the following:
for any candidate recommendation object, acquiring text features of the text and object tag features of tag information of the candidate recommendation object, and determining matching degree between the text and the tag information based on the text features of the text and the object tag features of the tag information;
extracting text keywords of the text, and determining the matching degree between the text and the tag information based on the text keywords of the text and the tag information, wherein the text keywords comprise text inner keywords and text outer keywords, and the text keywords are determined by any possible embodiment of a determination method of the tag information.
Optionally, in an example, a text feature of the text of the information to be recommended and an object tag feature of the tag information of each candidate recommendation object may be extracted by feature extraction. Then, based on the matching degree between the text feature and each object tag feature, a candidate recommendation object whose matching degree exceeds a certain threshold (for example, 90%) is taken as a target recommendation object, and the information to be recommended is recommended to that target recommendation object.
In an example, text keywords of the text may also be obtained, and the matching degree between the text keywords and the tag information of each candidate recommendation object is determined. A candidate recommendation object whose matching degree exceeds a certain threshold (e.g., 90%) is taken as a target recommendation object, and the information to be recommended is then recommended to that target recommendation object.
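The threshold-based selection of target recommendation objects can be sketched as below. The embodiment does not fix how the matching degree between text keywords and tag information is computed, so the Jaccard overlap used here is purely an assumption for illustration, as are the function names and sample data; only the 90% threshold comes from the example above.

```python
def jaccard(a, b):
    """Assumed matching degree: overlap of two keyword sets (intersection over union)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_targets(text_keywords, candidate_tags, threshold=0.9, match=jaccard):
    """Keep candidate objects whose tag information matches the text above the threshold."""
    return [
        obj for obj, tags in candidate_tags.items()
        if match(text_keywords, tags) > threshold
    ]

# Hypothetical tag information for three candidate recommendation objects.
candidates = {
    "user_a": ["basketball", "sneakers"],
    "user_b": ["cooking", "baking"],
    "user_c": ["basketball", "sneakers", "travel"],
}
targets = select_targets(["sneakers", "basketball"], candidates)
```

The `match` parameter lets a feature-based score (for example, cosine similarity between text features and object tag features) be substituted for the keyword overlap.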
For the process of recommending the information to be recommended, the foregoing description may also be referred to.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a tag information determination apparatus according to an embodiment of the present application. The tag information determination apparatus 1 provided in the embodiment of the present application includes:
a keyword processing module 11, configured to acquire a to-be-processed text corresponding to the information of interest of an object, and extract text inner keywords of the to-be-processed text;
a text feature processing module 12, configured to extract text features of the to-be-processed text;
the keyword processing module 11 is further configured to obtain keyword features of each candidate keyword, and determine, based on the semantic matching degree between the text features of the to-be-processed text and each keyword feature, a text outer keyword corresponding to the to-be-processed text from the candidate keywords, where the candidate keywords are keywords in a keyword lexicon;
and a tag information determining module 13, configured to determine tag information of the object based on a text keyword, where the text keyword includes the text inner keyword and the text outer keyword.
In an optional embodiment, the keyword processing module is further configured to:
extracting the keyword features of each candidate keyword, and storing the keyword features of each candidate keyword in the keyword lexicon;
and when the keyword features of each candidate keyword are to be obtained, obtaining the pre-stored keyword features of each candidate keyword from the keyword lexicon.
In an optional embodiment, the text features of the text to be processed include first word features of words included in the text to be processed, and the keyword features include second word features of words included in the candidate keywords; the keyword processing module is configured to:
for any one of the keyword features, determining the semantic matching degree between the text feature of the text to be processed and the keyword feature in the following mode:
determining the similarity between each pair of features formed from the first word features and the second word features, wherein each pair comprises one first word feature and one second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the similarity between every two features.
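The pairwise similarity computation can be sketched as a similarity matrix. The dot product is used here as the similarity measure only as an assumption, since this embodiment does not name the measure; the function names and toy vectors are likewise illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))

def similarity_matrix(first_word_features, second_word_features):
    """E[i][j] is the similarity between first word feature i and second word feature j."""
    return [[dot(a, b) for b in second_word_features] for a in first_word_features]

# Toy features: three words in the text, two words in the candidate keyword.
text_words = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
keyword_words = [[1.0, 0.0], [0.0, 1.0]]
E = similarity_matrix(text_words, keyword_words)
```

Each row of `E` holds the similarities of one first word feature to every second word feature, and each column holds the similarities of one second word feature to every first word feature.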
In an optional embodiment, the keyword processing module is configured to:
for any first word feature, determining a weight corresponding to each second word feature based on each first similarity corresponding to the first word feature, and performing weighted summation on each second word feature based on the weight corresponding to each second word feature to obtain an updated first word feature;
for any second word feature, determining a weight corresponding to each first word feature based on each second similarity corresponding to the second word feature, and performing weighted summation on each first word feature based on the weight corresponding to each first word feature to obtain an updated second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the updated first word features and the updated second word features.
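The weighted-summation update of the word features can be sketched as a soft alignment in both directions. Normalizing the similarities with softmax to obtain the weights, and using the dot product as the similarity, are assumptions made for this sketch; the embodiment only states that the weights are determined from the similarities.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Normalize similarities into weights that sum to 1 (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """For each query feature, return the similarity-weighted sum of the key features."""
    dim = len(keys[0])
    updated = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        updated.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return updated

# Toy first word features (text) and second word features (candidate keyword).
text_words = [[1.0, 0.0], [0.0, 1.0]]
keyword_words = [[2.0, 0.0], [0.0, 2.0]]
updated_text = attend(text_words, keyword_words)     # updated first word features
updated_keyword = attend(keyword_words, text_words)  # updated second word features
```

Each updated first word feature is thus a mixture of the second word features, dominated by the second word features most similar to it, and vice versa.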
In an optional embodiment, the keyword processing module is configured to:
for any one of the first word features, obtaining a local word feature corresponding to the first word feature based on the correlation between the first word feature and the updated first word feature, and obtaining a global word feature corresponding to the first word feature based on the first word feature, the updated first word feature and the local word feature corresponding to the first word feature;
for any one of the second word features, obtaining a local word feature corresponding to the second word feature based on the correlation between the second word feature and the updated second word feature, and obtaining a global word feature corresponding to the second word feature based on the second word feature, the updated second word feature and the local word feature corresponding to the second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the global word features corresponding to the first word features and the global word features corresponding to the second word features.
In an optional embodiment, the keyword processing module is configured to:
determining a first difference feature between the first word feature and the updated first word feature;
determining a first similar feature between the first word feature and the updated first word feature;
wherein the local word features corresponding to the first word feature comprise the first difference feature and the first similar feature;
determining a second difference feature between the second word feature and the updated second word feature;
determining a second similar feature between the second word feature and the updated second word feature;
wherein the local word features corresponding to the second word feature comprise the second difference feature and the second similar feature.
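The local and global word features can be illustrated with a short sketch. It assumes that the difference feature is an element-wise subtraction, that the similar feature is an element-wise product, and that the global word feature is the concatenation of the word feature, the updated word feature and the two local features; these are common choices in attention-based matching models and are assumptions here, with the formulas actually used being those given earlier in the description.

```python
def local_features(word, updated):
    """Return (difference feature, similar feature) for one word feature."""
    diff = [w - u for w, u in zip(word, updated)]  # assumed: element-wise difference
    prod = [w * u for w, u in zip(word, updated)]  # assumed: element-wise product
    return diff, prod

def global_feature(word, updated):
    """Concatenate the word feature, updated feature and local features (assumed form)."""
    diff, prod = local_features(word, updated)
    return word + updated + diff + prod

# Toy word feature and its updated (aligned) counterpart.
a = [1.0, 2.0]
a_updated = [0.5, 4.0]
m = global_feature(a, a_updated)
```

The same two functions apply unchanged to the second word features and their updated counterparts.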
In an optional embodiment, the text feature processing module is configured to:
extracting text features of the text to be processed through a feature extraction model based on the text to be processed;
the determining, from the candidate keywords, the text outer keywords corresponding to the text to be processed based on the semantic matching degree between the text features of the text to be processed and each keyword feature includes:
determining the text outer keywords corresponding to the text to be processed from the candidate keywords through a semantic matching model based on the text features of the text to be processed and the keyword features.
In the embodiment of the application, when a to-be-processed text corresponding to the information of interest of any object is processed, in addition to obtaining the text inner keywords of the to-be-processed text, the text features of the to-be-processed text can be semantically matched with the keyword features of each candidate keyword, the text outer keywords corresponding to the to-be-processed text are obtained according to the semantic matching degree, and the tag information of the object is determined based on the text inner keywords and the text outer keywords. With this method, both the key information within the to-be-processed text of the object's information of interest and the text outer keywords corresponding to the to-be-processed text are taken into account, so that the keyword information corresponding to the to-be-processed text is more complete and comprehensive. That is, the tag information of the user can be determined not only from the text inner keywords alone but also from the text inner keywords together with the text outer keywords, which expands the tag information of the user and improves its richness and comprehensiveness.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an information recommendation device according to an embodiment of the present application. The information recommendation device 2 provided by the embodiment of the application comprises:
the information to be recommended processing module 21 is configured to obtain information to be recommended and a text of the information to be recommended;
a tag information obtaining module 22, configured to obtain tag information of each candidate recommendation object, where the tag information is determined in a manner in the method for determining tag information provided in any optional embodiment of the present application;
a target recommended object determining module 23, configured to determine a target recommended object from the candidate recommended objects based on a matching degree between the text and the tag information;
and an information recommendation module 24, configured to recommend the information to be recommended to the target recommendation object.
In an optional embodiment, the target recommended object determining module is further configured to perform at least one of the following:
for any candidate recommendation object, acquiring text features of the text and object tag features of tag information of the candidate recommendation object, and determining matching degree between the text and the tag information based on the text features of the text and the object tag features of the tag information;
extracting text keywords of the text, and determining a matching degree between the text and the tag information based on the text keywords of the text and the tag information, wherein the text keywords include text inner keywords and text outer keywords, and the text keywords are determined in a manner of the tag information determination method provided in any optional embodiment of the present application.
In specific implementation, the apparatus 1 for determining tag information may execute the implementation manners provided in the foregoing steps in fig. 2 through each built-in functional module thereof, which may specifically refer to the implementation manners provided in the foregoing steps, and details are not described herein again.
The information recommendation device 2 can execute the implementation manners provided by the steps in fig. 5 through the built-in function modules, which may specifically refer to the implementation manners provided by the steps, and will not be described herein again.
The foregoing mainly describes implementing the tag information determining method and/or the information recommendation method of the present application with hardware as the execution subject. However, the execution subject is not limited to hardware and may also be software: the tag information determining device and/or the information recommendation device may be a computer program (including program code) running in a computer device, for example, application software, and the device may be used to perform the corresponding steps in the methods provided by the embodiments of the present application.
In some embodiments, the tag information determining apparatus and/or the information recommending apparatus provided by the embodiments of the present invention may be implemented by a combination of hardware and software, and by way of example, the tag information determining apparatus and/or the information recommending apparatus provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to perform the tag information determination method and/or the information recommendation method provided by the embodiment of the present invention, for example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In other embodiments, the tag information determining apparatus and/or the information recommending apparatus provided in the embodiments of the present invention may be implemented in software, and the tag information determining apparatus 1 shown in fig. 6 may be software in the form of a program, a plug-in, and the like, and includes a series of modules, including a keyword processing module 11, a text feature processing module 12, and a tag information determining module 13, for implementing the tag information determining method provided in the embodiments of the present invention. And/or the information recommendation apparatus 2 shown in fig. 7 may be software in the form of a program, a plug-in, and the like, and includes a series of modules, including an information to be recommended processing module 21, a tag information obtaining module 22, a target recommendation object determining module 23, and an information recommendation module 24, for implementing the information recommendation method provided by the embodiment of the present invention.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 1000 in the present embodiment may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and optionally may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 8, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1000 shown in fig. 8, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be used to invoke computer programs stored in the memory 1005.
It should be understood that, in some possible embodiments, the processor 1001 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In a specific implementation, the electronic device 1000 may execute the implementation manners provided in the steps in fig. 2 and fig. 5 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps, and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and is executed by a processor to implement the methods provided in each step in fig. 2 and fig. 5, which may specifically refer to implementation manners provided in each step, and are not described herein again.
The computer-readable storage medium may be an internal storage unit of the electronic device provided in any of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), and the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method provided by any one of the possible embodiments of fig. 2 and 5.
The terms "first", "second", and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting the scope of its claims; equivalent variations and modifications made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims (13)

1. A method for determining tag information, comprising:
acquiring a to-be-processed text corresponding to interest information of an object, and extracting text inner keywords of the to-be-processed text;
extracting text features of the text to be processed;
acquiring keyword characteristics of each candidate keyword, wherein the candidate keywords are keywords in a keyword lexicon;
determining a text outer keyword corresponding to the text to be processed from the candidate keywords based on the semantic matching degree between the text features of the text to be processed and each keyword feature;
determining label information of the object based on text keywords, wherein the text keywords comprise the text inner keywords and the text outer keywords.
2. The method of claim 1, further comprising:
extracting the keyword features of each candidate keyword, and storing the keyword features of each candidate keyword in the keyword lexicon;
the acquiring keyword features of each candidate keyword comprises:
acquiring the pre-stored keyword features of each candidate keyword from the keyword lexicon.
3. The method according to claim 1, wherein the text features of the text to be processed comprise first word features of words included in the text to be processed, and the keyword features comprise second word features of words included in the candidate keywords;
for any keyword feature, determining the semantic matching degree between the text feature of the text to be processed and the keyword feature by adopting the following mode:
determining the similarity between each pair of features formed from the first word features and the second word features, wherein each pair comprises one first word feature and one second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the similarity between every two features.
4. The method according to claim 3, wherein for any of the keyword features, the determining the semantic matching degree between the text feature of the text to be processed and the keyword feature based on the similarity between each two features comprises:
for any first word feature, determining a weight corresponding to each second word feature based on each first similarity corresponding to the first word feature, and performing weighted summation on each second word feature based on the weight corresponding to each second word feature to obtain an updated first word feature;
for any second word feature, determining a weight corresponding to each first word feature based on each second similarity corresponding to the second word feature, and performing weighted summation on each first word feature based on the weight corresponding to each first word feature to obtain an updated second word feature;
and determining semantic matching degree between the text features of the text to be processed and the keyword features based on the updated first word features and the updated second word features.
5. The method according to claim 4, wherein the determining the semantic matching degree between the text feature of the text to be processed and the keyword feature based on the updated first word features and the updated second word features comprises:
for any first word feature, obtaining a local word feature corresponding to the first word feature based on the correlation between the first word feature and the updated first word feature, and obtaining a global word feature corresponding to the first word feature based on the first word feature, the updated first word feature and the local word feature corresponding to the first word feature;
for any second word feature, obtaining a local word feature corresponding to the second word feature based on the correlation between the second word feature and the updated second word feature, and obtaining a global word feature corresponding to the second word feature based on the second word feature, the updated second word feature and the local word feature corresponding to the second word feature;
and determining the semantic matching degree between the text features of the text to be processed and the keyword features based on the global word features corresponding to the first word features and the global word features corresponding to the second word features.
6. The method according to claim 5, wherein for any of the first word features, the obtaining a local word feature corresponding to the first word feature based on the correlation between the first word feature and the updated first word feature comprises:
determining a first difference feature between the first word feature and the updated first word feature;
determining a first similar feature between the first word feature and the updated first word feature;
the local word features corresponding to the first word features comprise first difference features and first similar features;
for any one of the second word features, the obtaining a local word feature corresponding to the second word feature based on the correlation between the second word feature and the updated second word feature comprises:
determining a second difference feature between the second word feature and the updated second word feature;
determining a second similar feature between the second word feature and the updated second word feature;
wherein the local word features corresponding to the second word features comprise second difference features and second similar features.
7. The method according to any one of claims 1 to 6, wherein the extracting text features of the text to be processed comprises:
extracting text features of the text to be processed through a feature extraction model based on the text to be processed;
and the determining, from the candidate keywords, of the out-of-text keywords corresponding to the text to be processed based on the semantic matching degree between the text features of the text to be processed and the keyword features comprises:
determining the out-of-text keywords corresponding to the text to be processed from the candidate keywords through a semantic matching model, based on the text features of the text to be processed and the keyword features.
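As an illustration of selecting out-of-text keywords from a keyword lexicon, the sketch below scores each candidate keyword feature against the text feature and keeps the best matches. It is not the semantic matching model of the claims: cosine similarity stands in for that model, and the `threshold` and `top_k` parameters, the function names and the toy vectors are all assumptions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here as a stand-in matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_out_of_text_keywords(
    text_feature: List[float],
    keyword_features: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical cut-off
    top_k: int = 2,          # hypothetical limit
) -> List[str]:
    """Return up to top_k lexicon keywords whose score reaches the threshold."""
    scored = [
        (cosine_similarity(text_feature, feature), keyword)
        for keyword, feature in keyword_features.items()
    ]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [keyword for _, keyword in kept[:top_k]]


# Toy lexicon: none of these words has to appear in the text itself.
lexicon = {
    "basketball": [0.9, 0.1],
    "cooking": [0.0, 1.0],
    "sports": [1.0, 0.0],
}
selected = select_out_of_text_keywords([1.0, 0.0], lexicon)
```

In a real system the text feature and keyword features would come from the feature extraction model, and the score from the trained matching model, rather than from hand-written vectors.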
8. An information recommendation method, comprising:
acquiring information to be recommended and a text of the information to be recommended;
acquiring tag information of each candidate recommendation object, wherein the tag information of the candidate recommendation object is determined by the method of any one of claims 1 to 7;
determining a target recommendation object from the candidate recommendation objects based on the matching degree between the text and the label information;
and recommending the information to be recommended to the target recommendation object.
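The recommendation flow of claim 8 can be sketched as follows. The claim leaves the matching degree open, so the code takes it as a pluggable function; the toy `tag_hit_ratio` measure, the `min_score` cut-off and the sample users are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Set


def recommend(
    text: str,
    candidate_tags: Dict[str, Set[str]],
    match: Callable[[str, Set[str]], float],
    min_score: float = 0.5,  # hypothetical cut-off
) -> List[str]:
    """Return candidate objects whose tag information matches the text,
    ordered from the highest matching degree to the lowest."""
    scores = {obj: match(text, tags) for obj, tags in candidate_tags.items()}
    targets = [obj for obj, score in scores.items() if score >= min_score]
    return sorted(targets, key=lambda obj: scores[obj], reverse=True)


def tag_hit_ratio(text: str, tags: Set[str]) -> float:
    """Toy matching degree: fraction of tags that occur in the text."""
    if not tags:
        return 0.0
    lowered = text.lower()
    return sum(tag.lower() in lowered for tag in tags) / len(tags)


candidates = {
    "user_a": {"basketball", "sneakers"},
    "user_b": {"cooking", "baking"},
    "user_c": {"basketball", "cooking"},
}
targets = recommend(
    "New basketball sneakers released this week", candidates, tag_hit_ratio
)
```

Passing the matching function as an argument keeps the selection step independent of how the matching degree is computed, which mirrors the way claim 9 offers more than one option for it.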
9. The method of claim 8, further comprising at least one of:
for any candidate recommended object, acquiring text features of the text and object tag features of tag information of the candidate recommended object, and determining matching degree between the text and the tag information based on the text features of the text and the object tag features of the tag information;
extracting text keywords of the text, and determining the matching degree between the text and the tag information based on the text keywords of the text and the tag information, wherein the text keywords comprise in-text keywords and out-of-text keywords, and the text keywords are determined by the method of any one of claims 1 to 7.
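For the keyword-based option of claim 9, one possible matching degree is the overlap between the text keywords and the tag information. The sketch below assumes Jaccard overlap, which the claim does not prescribe; the function name and sample keywords are hypothetical.

```python
from typing import Iterable


def keyword_matching_degree(
    in_text: Iterable[str],
    out_of_text: Iterable[str],
    tags: Iterable[str],
) -> float:
    """Jaccard overlap between text keywords and tag information.

    Text keywords are the union of in-text and out-of-text keywords.
    """
    text_keywords = set(in_text) | set(out_of_text)
    tag_set = set(tags)
    union = text_keywords | tag_set
    if not union:
        return 0.0
    return len(text_keywords & tag_set) / len(union)


tags = ["sports", "basketball", "music"]
# With the out-of-text keyword "sports" the overlap is 2 of 4 keywords.
score = keyword_matching_degree(["basketball", "sneakers"], ["sports"], tags)
# Without it, only "basketball" is shared.
score_in_text_only = keyword_matching_degree(["basketball", "sneakers"], [], tags)
```

The comparison between the two scores shows why out-of-text keywords matter: a tag such as "sports" can match a text that never contains that word.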
10. An apparatus for determining tag information, the apparatus comprising:
a keyword processing module, configured to acquire a text to be processed corresponding to interest information of an object, and to extract in-text keywords of the text to be processed;
a text feature processing module, configured to extract text features of the text to be processed;
the keyword processing module being further configured to acquire keyword features of each candidate keyword, and to determine, from the candidate keywords, out-of-text keywords corresponding to the text to be processed based on the semantic matching degree between the text features of the text to be processed and the keyword features, wherein the candidate keywords are keywords in a keyword lexicon;
and a tag information determining module, configured to determine tag information of the object based on text keywords, wherein the text keywords comprise the in-text keywords and the out-of-text keywords.
11. An information recommendation apparatus, comprising:
a to-be-recommended information processing module, configured to acquire information to be recommended and a text corresponding to the information to be recommended;
a tag information obtaining module, configured to obtain tag information of each candidate recommendation object, wherein the tag information is determined by the method of any one of claims 1 to 7;
a target recommendation object determining module, configured to determine a target recommendation object from the candidate recommendation objects based on the matching degree between the text and the tag information;
and an information recommending module, configured to recommend the information to be recommended to the target recommendation object.
12. An electronic device comprising a processor and a memory, the processor and the memory being interconnected;
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1 to 7 or any of claims 8 to 9 when the computer program is invoked.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method as claimed in any one of claims 1 to 7 or any one of claims 8 to 9.
CN202110126378.6A 2021-01-29 2021-01-29 Method and device for determining label information, electronic equipment and storage medium Pending CN114817697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126378.6A CN114817697A (en) 2021-01-29 2021-01-29 Method and device for determining label information, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126378.6A CN114817697A (en) 2021-01-29 2021-01-29 Method and device for determining label information, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114817697A true CN114817697A (en) 2022-07-29

Family

ID=82526174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126378.6A Pending CN114817697A (en) 2021-01-29 2021-01-29 Method and device for determining label information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114817697A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649305A (en) * 2024-01-12 2024-03-05 广州小锤科技服务有限公司 Personalized claim micro-service management method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40070422; Country of ref document: HK)
SE01 Entry into force of request for substantive examination