CN113704623A - Data recommendation method, device, equipment and storage medium

Publication number
CN113704623A
Authority
CN
China
Prior art keywords
content
user
information
features
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111017643.3A
Other languages
Chinese (zh)
Other versions
CN113704623B (en)
Inventor
龚静
张恒
吕有才
詹乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202111017643.3A
Publication of CN113704623A
Application granted
Publication of CN113704623B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to the field of artificial intelligence and discloses a data recommendation method, device, equipment and storage medium. The method comprises the following steps: acquiring portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait; extracting feature information from the portrait information, wherein the feature information comprises user features and content features; classifying the user features and the content features, and expanding the classified user features and content features to obtain an extended feature set; inputting the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set, wherein the content candidate set comprises a plurality of candidate text contents; and sorting the plurality of candidate text contents by using a specified sorting algorithm and sending them to the user terminal in the sorted order, thereby improving the efficiency and accuracy of data recommendation. The invention also relates to blockchain technology; for example, the portrait information can be written into a blockchain for use in scenarios such as data forensics.

Description

Data recommendation method, device, equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a data recommendation method, device, equipment and storage medium.
Background
A recommendation system recommends information of interest to a user according to the user's information needs, interests and the like. At present, a mainstream recommendation system generally comprises an indexing stage, a recall stage and a sorting stage, wherein the recall stage selects contents, within a limited response time, directly from the content candidate set obtained in the indexing stage and sends the selected contents to the sorting stage. However, this approach is constrained by the very large candidate set and by real-time and complexity requirements, so the recommendation processing logic has difficulty achieving a good recommendation effect.
Disclosure of Invention
Embodiments of the present invention provide a data recommendation method, apparatus, device, and storage medium, which are helpful for improving efficiency and accuracy of obtaining a content candidate set corresponding to a user characteristic and a content characteristic, thereby improving efficiency and accuracy of data recommendation.
In a first aspect, an embodiment of the present invention provides a data recommendation method, including:
acquiring portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait;
extracting feature information from the portrait information, wherein the feature information comprises user features and content features;
classifying the user characteristics and the content characteristics, and expanding the classified user characteristics and the content characteristics to obtain an expanded characteristic set;
inputting the user characteristics and the content characteristics in the extended characteristic set into a trained recall model to obtain a content candidate set corresponding to the user characteristics and the content characteristics, wherein the content candidate set comprises a plurality of candidate text contents;
and sorting the plurality of candidate text contents by using a specified sorting algorithm, and sending the candidate text contents to the user terminal in the sorted order.
Further, the extracting feature information from the portrait information includes:
performing word segmentation processing on the portrait information to obtain a word sequence corresponding to the portrait information;
calculating the TF-IDF value of each word in the word sequence, and selecting the word with the largest TF-IDF value as the initial feature information of the portrait information;
inputting the word sequence into an LSTM-CRF model, and performing feature extraction on the word sequence through a segmentation mapping external feature layer of the LSTM-CRF model to obtain token representation information;
merging the initial feature information and the token representation information through a fully connected layer of the LSTM-CRF model to obtain merged feature information;
and inputting the merged feature information into a CRF layer of the LSTM-CRF model to obtain the feature information of the portrait information.
Further, the classifying the user features and the content features, and expanding the classified user features and the content features to obtain an expanded feature set includes:
inputting the user characteristics and the content characteristics into a pre-trained BERT model to obtain category information of the user characteristics and the content characteristics;
according to the category information of the user features and the content features, determining synonymous user features and synonymous content features corresponding to the category information of the user features and the content features;
determining that the user characteristic, the content characteristic, the synonymous user characteristic, and the synonymous content characteristic constitute the extended feature set.
Further, before the inputting the user features and the content features in the extended feature set into the trained recall model, the method further includes:
inputting each user characteristic and content characteristic in the extended characteristic set into a pre-trained BERT model to obtain a ranking score evaluation index of each user characteristic and content characteristic;
determining the sequence of the ranking score evaluation indexes from big to small as the arrangement sequence of the user characteristics and the content characteristics according to the ranking score evaluation indexes of the user characteristics and the content characteristics;
and sorting the user features and the content features in the extended feature set according to the arrangement order.
Further, the inputting the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features includes:
extracting corresponding user feature vectors and content feature vectors from the user features and the content features in the sorted extended feature set;
inputting the user characteristic vector and the content characteristic vector into a trained recall model to obtain a plurality of candidate text contents, and determining the content candidate set according to the candidate text contents.
Further, before the inputting the user features and the content features in the extended feature set into the trained recall model and obtaining the content candidate set corresponding to the user features and the content features, the method further includes:
obtaining a sample set comprising a plurality of sample portrait information, the sample portrait information comprising a sample user portrait and a sample content portrait;
extracting, from the sample set, sample user features corresponding to the sample user portraits and sample content features corresponding to the sample content portraits;
and inputting the sample user characteristics and the sample content characteristics into a preset neural network model for training to obtain the recall model.
Further, the sorting the plurality of candidate text contents by using a specified sorting algorithm includes:
scoring each candidate text content in the content candidate set by using the specified sorting algorithm to obtain a score of each candidate text content;
and sorting the candidate text contents in descending order of their scores.
In a second aspect, an embodiment of the present invention provides a data recommendation apparatus, including:
an acquisition unit, configured to acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait;
an extraction unit, configured to extract feature information from the portrait information, wherein the feature information comprises user features and content features;
an extension unit, configured to classify the user features and the content features, and to expand the classified user features and content features to obtain an extended feature set;
a recall unit, configured to input the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents;
and a pushing unit, configured to sort the plurality of candidate text contents by using a specified sorting algorithm and to send the candidate text contents to the user terminal in the sorted order.
In a third aspect, an embodiment of the present invention provides a computer device, including a processor and a memory, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the method of the first aspect.
According to the embodiments of the invention, portrait information of a user in a target service scene can be acquired, wherein the portrait information comprises a user portrait and a content portrait; feature information is extracted from the portrait information, wherein the feature information comprises user features and content features; the user features and the content features are classified, and the classified user features and content features are expanded to obtain an extended feature set; the user features and the content features in the extended feature set are input into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents; and the plurality of candidate text contents are sorted by using a specified sorting algorithm and sent to the user terminal in the sorted order. In this way, the efficiency and accuracy of obtaining the content candidate set corresponding to the user features and the content features are improved, and the efficiency and accuracy of data recommendation are improved accordingly.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram of a data recommendation method provided by an embodiment of the invention;
FIG. 2 is a schematic block diagram of a data recommendation apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The data recommendation method provided by the embodiment of the invention can be applied to a data recommendation device, and in some embodiments, the data recommendation device is arranged in computer equipment. In certain embodiments, the computer device includes, but is not limited to, one or more of a smartphone, tablet, laptop, and the like.
In the embodiments of the invention, portrait information of a user in a target service scene can be acquired, wherein the portrait information comprises a user portrait and a content portrait; feature information is extracted from the portrait information, wherein the feature information comprises user features and content features; the user features and the content features are classified, and the classified user features and content features are expanded to obtain an extended feature set; the user features and the content features in the extended feature set are input into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents; and the plurality of candidate text contents are sorted by using a specified sorting algorithm and sent to the user terminal in the sorted order. In this way, the embodiments of the invention help improve the efficiency and accuracy of obtaining the content candidate set corresponding to the user features and the content features, thereby improving the efficiency and accuracy of data recommendation.
The embodiment of the application can acquire and process related data (such as portrait information of a user) based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The following schematically describes a data recommendation method provided by an embodiment of the present invention with reference to FIG. 1.
Referring to FIG. 1, which is a schematic flowchart of a data recommendation method according to an embodiment of the present invention, the method may be executed by a data recommendation device disposed in a computer device. Specifically, the method of the embodiment of the present invention includes the following steps.
S101: acquiring portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait.
In the embodiment of the invention, the data recommendation device can acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait.
In some embodiments, the user portrait includes, but is not limited to, a customer manager portrait, a customer portrait, and the like. In some embodiments, the customer manager portrait includes: basic information of the customer manager, such as personal basic information, post information and main responsible business; management and performance information, such as managed customer group information, product preferences, performance ranking and business development information; and behavior information of the customer manager, namely explicit positive feedback on cases or products, such as likes, favorites, downloads, follows, dwell time, search popularity and positive comments, and explicit negative feedback, such as marking content as not interesting, negative comments and reports. In some embodiments, the customer portrait includes basic information of the customer, such as age, native place, education level, occupation, products held and risk level, and behavior information of the customer, such as explicit feedback on products and positive or negative evaluations of the customer manager.
In some embodiments, the portrait information includes, but is not limited to, content in the form of text, pictures, video, and the like.
S102: extracting feature information from the portrait information, wherein the feature information comprises user features and content features.
In the embodiment of the present invention, the data recommendation device may extract feature information from the portrait information, where the feature information includes a user feature and a content feature.
In one embodiment, when extracting feature information from the portrait information, the data recommendation device may do so according to a specified algorithm. In some embodiments, the specified algorithm includes, but is not limited to, natural language processing (NLP) algorithms, optical character recognition (OCR) algorithms, and the like.
In one embodiment, if the portrait information includes a picture or video, the picture or video may be first converted into text, and then feature information may be extracted from the converted text.
In one embodiment, when extracting feature information from the portrait information, the data recommendation device may perform word segmentation on the portrait information to obtain a word sequence corresponding to the portrait information; calculate the TF-IDF value of each word in the word sequence, and select the word with the largest TF-IDF value as the initial feature information of the portrait information; input the word sequence into an LSTM-CRF model, and perform feature extraction on the word sequence through a segmentation mapping external feature layer of the LSTM-CRF model to obtain token representation information; merge the initial feature information and the token representation information through a fully connected layer of the LSTM-CRF model to obtain merged feature information; and input the merged feature information into a CRF layer of the LSTM-CRF model to obtain the feature information of the portrait information.
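The TF-IDF step of this embodiment can be illustrated with a minimal sketch. It assumes that the portrait information has already been segmented into word sequences, that the reference corpus is a small list of such sequences, and that a smoothed IDF variant is used; the description does not specify the exact TF-IDF formula, and the function names `tf_idf` and `initial_feature` as well as the sample corpus are hypothetical. The LSTM-CRF stages are not covered here.

```python
import math
from collections import Counter

def tf_idf(word_seq, corpus):
    """Return {word: TF-IDF value} for one word sequence against a corpus."""
    n_docs = len(corpus)
    counts = Counter(word_seq)
    scores = {}
    for word, count in counts.items():
        tf = count / len(word_seq)                       # term frequency
        df = sum(1 for doc in corpus if word in doc)     # document frequency
        # Smoothed IDF; one common variant, chosen here as an assumption.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = tf * idf
    return scores

def initial_feature(word_seq, corpus):
    """Pick the word with the largest TF-IDF value as the initial feature."""
    scores = tf_idf(word_seq, corpus)
    return max(scores, key=scores.get)

# Hypothetical, already-segmented portrait texts.
corpus = [
    ["wealth", "management", "product", "for", "retired", "customer"],
    ["credit", "card", "product", "for", "young", "customer", "credit"],
    ["wealth", "management", "case", "for", "customer", "manager"],
]
print(initial_feature(corpus[1], corpus))
```

In this toy corpus the word "credit" occurs twice in the second sequence and in no other sequence, so it receives the largest TF-IDF value there.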
S103: classifying the user characteristics and the content characteristics, and expanding the classified user characteristics and the content characteristics to obtain an expanded characteristic set.
In the embodiment of the present invention, the data recommendation device may classify the user features and the content features, and expand the classified user features and the classified content features to obtain an expanded feature set.
In one embodiment, when the data recommendation device classifies the user features and the content features and expands the classified user features and the classified content features to obtain an expanded feature set, the data recommendation device may input the user features and the content features into a pre-trained BERT model to obtain category information of the user features and the content features; according to the category information of the user features and the content features, determining synonymous user features and synonymous content features corresponding to the category information of the user features and the content features, and determining that the user features, the content features, the synonymous user features and the synonymous content features form the extended feature set. By expanding the user characteristics and the content characteristics, the accuracy of content recall is improved.
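The classify-then-expand step can be sketched as follows. This is only an illustration under stated assumptions: `toy_classifier` is a keyword rule standing in for the pre-trained BERT classifier, and the `SYNONYMS_BY_CATEGORY` table and its entries are hypothetical; the description does not say how synonymous features are stored or looked up.

```python
from typing import Callable, Dict, List

# Hypothetical table mapping a category to its synonymous features.
SYNONYMS_BY_CATEGORY: Dict[str, List[str]] = {
    "age_group": ["retiree", "senior", "elderly customer"],
    "product_type": ["wealth management", "financial product", "investment product"],
}

def toy_classifier(feature: str) -> str:
    """Stand-in for the pre-trained BERT classifier (illustration only)."""
    return "age_group" if feature in {"retiree", "senior"} else "product_type"

def expand_features(features: List[str],
                    classify: Callable[[str], str] = toy_classifier) -> List[str]:
    """Return the original features plus the synonyms of each feature's category."""
    expanded: List[str] = []
    for feature in features:
        category = classify(feature)
        for candidate in [feature] + SYNONYMS_BY_CATEGORY.get(category, []):
            if candidate not in expanded:   # keep first occurrence, drop duplicates
                expanded.append(candidate)
    return expanded

print(expand_features(["retiree", "wealth management"]))
```

The original features are kept first, so the extended feature set always contains the user features and content features it was built from.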
In an embodiment, when expanding the classified user features and content features to obtain the extended feature set, the data recommendation device may also expand them by manual labeling, for example by manually supplementing detailed feature information of the service scene to which a case belongs, the customer group features corresponding to a product, and the like. Manually expanding the user features and the content features further improves the accuracy of content recall.
S104: inputting the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents.
In this embodiment of the present invention, the data recommendation device may input the user characteristics and the content characteristics in the extended characteristic set into a trained recall model, so as to obtain a content candidate set corresponding to the user characteristics and the content characteristics, where the content candidate set includes a plurality of candidate text contents.
In one embodiment, before inputting the user features and the content features in the extended feature set into the trained recall model, the data recommendation device may input each user feature and content feature in the extended feature set into a pre-trained BERT model to obtain a ranking score evaluation index for each user feature and content feature; determine the descending order of the ranking score evaluation indexes as the arrangement order of the user features and the content features; and sort the user features and the content features in the extended feature set according to that arrangement order.
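The ordering of the extended feature set by ranking score can be sketched as below. The scoring function is passed in as a parameter; `toy_scores` is a hypothetical lookup table standing in for the scores that the pre-trained BERT model would produce.

```python
def order_features(features, score_fn):
    """Sort features by their ranking score, largest first."""
    scored = [(score_fn(feature), feature) for feature in features]
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [feature for _, feature in scored]

# Hypothetical scores; a real system would obtain them from the model.
toy_scores = {"retiree": 0.9, "wealth management": 0.7, "senior": 0.4}
ordered = order_features(
    ["senior", "retiree", "wealth management"],
    lambda feature: toy_scores.get(feature, 0.0),
)
print(ordered)
```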
In one embodiment, when the user features and the content features in the extended feature set are input into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, the data recommendation device may extract corresponding user feature vectors and content feature vectors from the user features and the content features in the ordered extended feature set; inputting the user characteristic vector and the content characteristic vector into a trained recall model to obtain a plurality of candidate text contents, and determining the content candidate set according to the candidate text contents.
In an embodiment, after determining the content candidate set according to the plurality of candidate text contents, the data recommendation device may calculate the distance between the user feature vector and the content feature vector, add an index identifier to each candidate text content in ascending order of distance, and store the index identifiers in a Redis cache. When a recommendation request sent by the user terminal is subsequently received, the device can then quickly query, according to the target index identifier carried in the request, whether target recommended content corresponding to that identifier exists in the Redis cache, which further improves recommendation efficiency. In some embodiments, the distance between the user feature vector and the content feature vector may be calculated by a similarity calculation or the like.
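A minimal sketch of the distance-based indexing and cache lookup follows. Several points are assumptions: cosine distance is used as the similarity-based distance, the index identifiers take the hypothetical form `idx:<rank>`, and a plain dictionary stands in for the Redis cache so that the example runs without a server.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm

def build_index(user_vec, candidates):
    """Map index identifiers to candidate contents, nearest candidate first.

    candidates: {candidate_text: content_feature_vector}
    """
    ranked = sorted(candidates, key=lambda text: cosine_distance(user_vec, candidates[text]))
    return {f"idx:{rank}": text for rank, text in enumerate(ranked)}

def lookup(cache, index_id):
    """Return the cached content for an index identifier, or None if absent."""
    return cache.get(index_id)

cache = {}  # stand-in for a Redis cache (assumption)
cache.update(build_index(
    [1.0, 0.0],
    {"case A": [0.0, 1.0], "case B": [1.0, 0.1], "case C": [0.5, 0.5]},
))
print(lookup(cache, "idx:0"))
```

With a real Redis deployment the dictionary operations would be replaced by the corresponding key-value writes and reads; the ordering logic would stay the same.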
In one embodiment, before inputting the user features and the content features in the extended feature set into the trained recall model to obtain the content candidate set corresponding to the user features and the content features, the data recommendation device may obtain a sample set comprising a plurality of pieces of sample portrait information, each comprising a sample user portrait and a sample content portrait; extract, from the sample set, sample user features corresponding to the sample user portraits and sample content features corresponding to the sample content portraits; and input the sample user features and the sample content features into a preset neural network model for training to obtain the recall model.
In an embodiment, when the sample user features and the sample content features are input into the preset neural network model for training to obtain the recall model, they may be input into the preset neural network model to obtain a loss function value, and the loss function value is compared with a preset threshold. If the comparison result does not satisfy a preset condition, the model parameters of the neural network model are adjusted, and the sample user features and sample content features are input into the adjusted model for retraining. When the comparison result between the obtained loss function value and the preset threshold satisfies the preset condition, the recall model is determined to have been obtained.
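The train-until-threshold loop can be sketched with a deliberately small model. Everything specific here is an assumption: a single-layer logistic model replaces the neural network, the preset condition is taken to be "loss less than or equal to the threshold", each sample is a numeric vector of a user feature and a content feature with a hypothetical 0/1 relevance label, and `max_rounds` is added as a safety stop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def mean_log_loss(weights, samples):
    """Average binary cross-entropy over (features, label) samples."""
    total = 0.0
    for x, y in samples:
        p = min(max(predict(weights, x), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(samples)

def train_until_threshold(samples, threshold=0.2, lr=0.5, max_rounds=10000):
    """Adjust parameters until the loss meets the threshold or rounds run out."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(max_rounds):
        if mean_log_loss(weights, samples) <= threshold:  # assumed preset condition
            break
        grads = [0.0] * len(weights)
        for x, y in samples:
            error = predict(weights, x) - y
            for j, xi in enumerate(x):
                grads[j] += error * xi / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grads)]  # gradient step
    return weights, mean_log_loss(weights, samples)

# Hypothetical encoded samples: ([user feature, content feature], label).
samples = [([1.0, 1.0], 1), ([-1.0, -1.0], 0)]
weights, final_loss = train_until_threshold(samples, threshold=0.2)
print(final_loss <= 0.2)
```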
S105: sorting the plurality of candidate text contents by using a specified sorting algorithm, and sending the candidate text contents to the user terminal in the sorted order.
In the embodiment of the present invention, the data recommendation device may sort the candidate text contents by using a designated sorting algorithm, and send the candidate text contents to the user terminal according to the sorted order.
In one embodiment, when sorting the plurality of candidate text contents by using the specified sorting algorithm, the data recommendation device may score each candidate text content in the content candidate set by using the specified sorting algorithm to obtain a score of each candidate text content, and sort the candidate text contents in descending order of their scores. In some embodiments, the specified sorting algorithm includes, but is not limited to, a multi-objective ranking algorithm.
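The score-then-sort step can be sketched as follows. The description names multi-objective ranking only as one option and gives no formula, so the weighted sum used here is a simple stand-in chosen for illustration; the objective names `click` and `favorite`, their predicted values, and the weights are all hypothetical.

```python
def rank_candidates(candidates, weights):
    """Score each candidate as a weighted sum of its objectives, then sort.

    candidates: {candidate_text: {objective_name: predicted_value}}
    weights:    {objective_name: weight}
    """
    def score(objectives):
        return sum(weights[name] * value for name, value in objectives.items())

    scored = [(score(objectives), text) for text, objectives in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [text for _, text in scored]

# Hypothetical per-objective predictions for three candidate text contents.
candidates = {
    "case A": {"click": 0.2, "favorite": 0.9},
    "case B": {"click": 0.8, "favorite": 0.1},
    "case C": {"click": 0.5, "favorite": 0.5},
}
ranked = rank_candidates(candidates, {"click": 0.7, "favorite": 0.3})
print(ranked)
```

Changing the weights changes the trade-off between objectives; for example, weighting `favorite` more heavily would move "case A" up the list.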
In the embodiment of the present invention, the data recommendation device can acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait; extract feature information from the portrait information, wherein the feature information comprises user features and content features; classify the user features and the content features, and expand the classified user features and content features to obtain an extended feature set; input the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents; and sort the plurality of candidate text contents by using a specified ranking algorithm, and send the candidate text contents to the user terminal in the sorted order. In this way, the efficiency and accuracy of obtaining the content candidate set corresponding to the user features and the content features are improved, and the efficiency and accuracy of data recommendation are improved.
The embodiment of the present invention also provides a data recommendation device, which comprises units for executing the method described in any of the foregoing embodiments. Specifically, referring to fig. 2, fig. 2 is a schematic block diagram of a data recommendation device according to an embodiment of the present invention. The data recommendation device of this embodiment includes: an acquisition unit 201, an extraction unit 202, an extension unit 203, a recall unit 204, and a push unit 205.
an acquisition unit 201, configured to acquire portrait information of a user in a target service scene, where the portrait information includes a user portrait and a content portrait;
an extraction unit 202, configured to extract feature information from the portrait information, where the feature information includes user features and content features;
an extension unit 203, configured to classify the user features and the content features, and expand the classified user features and content features to obtain an extended feature set;
a recall unit 204, configured to input the user features and the content features in the extended feature set into a trained recall model, so as to obtain a content candidate set corresponding to the user features and the content features, where the content candidate set includes a plurality of candidate text contents;
a push unit 205, configured to sort the candidate text contents by using a specified ranking algorithm, and send the candidate text contents to the user terminal in the sorted order.
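One way to picture how the five units cooperate is as a chain of callables, each consuming the output of the previous one. The class DataRecommender and its field names below are hypothetical and only mirror the unit names; each field is a placeholder for whatever implementation the corresponding unit uses.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataRecommender:
    acquire: Callable[[str], Any]  # acquisition unit 201: user id -> portrait information
    extract: Callable[[Any], Any]  # extraction unit 202: portraits -> features
    extend: Callable[[Any], Any]   # extension unit 203: features -> extended feature set
    recall: Callable[[Any], Any]   # recall unit 204: extended features -> candidate set
    push: Callable[[Any], Any]     # push unit 205: candidate set -> ordered contents

    def recommend(self, user_id: str) -> Any:
        portraits = self.acquire(user_id)
        features = self.extract(portraits)
        extended = self.extend(features)
        candidates = self.recall(extended)
        return self.push(candidates)
```

Keeping each unit behind a plain callable makes it possible to replace, for example, the recall step without touching the others.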
Further, when the extraction unit 202 extracts feature information from the portrait information, it is specifically configured to:
performing word segmentation processing on the portrait information to obtain a word sequence corresponding to the portrait information;
calculating the TF-IDF value of each word in the word sequence, and selecting the largest TF-IDF value as the initial feature information of the portrait information;
inputting the word sequence into an LSTM-CRF model, and performing feature extraction on the word sequence through a segmentation mapping external feature layer of the LSTM-CRF model to obtain token representation information;
merging the initial feature information and the token representation information through a fully connected layer of the LSTM-CRF model to obtain merged feature information;
and inputting the merged feature information into a CRF layer of the LSTM-CRF model to obtain the feature information of the portrait information.
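The TF-IDF step in the list above can be sketched in a few lines. The example covers only that step, not the LSTM-CRF model. It assumes a reference corpus of already segmented word sequences for the document frequencies and uses one common smoothed IDF variant; the function name top_tfidf_word and the choice of smoothing are assumptions, since the source does not state which TF-IDF formulation is used.

```python
import math
from collections import Counter


def top_tfidf_word(word_sequence, corpus):
    """Return the word with the largest TF-IDF value, plus all scores.

    word_sequence: list of words from one segmented portrait text.
    corpus: list of word sequences used to count document frequencies.
    """
    term_counts = Counter(word_sequence)
    n_docs = len(corpus)
    scores = {}
    for word, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / len(word_sequence)) * idf
    best = max(scores, key=scores.get)
    return best, scores
```

A word that is frequent in the portrait text but rare across the corpus receives the highest value and is taken as the initial feature information.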
Further, when the extension unit 203 classifies the user features and the content features, and expands the classified user features and content features to obtain an extended feature set, the extension unit is specifically configured to:
inputting the user characteristics and the content characteristics into a pre-trained BERT model to obtain category information of the user characteristics and the content characteristics;
according to the category information of the user features and the content features, determining synonymous user features and synonymous content features corresponding to the category information of the user features and the content features;
determining that the user characteristic, the content characteristic, the synonymous user characteristic, and the synonymous content characteristic constitute the extended feature set.
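The expansion step can be illustrated with a small sketch in which a lookup function stands in for the pre-trained BERT classifier and a dictionary maps each category to its synonymous features. Both stand-ins, and the names expand_features, classify and synonyms_by_category, are hypothetical; the source does not describe where the synonymous features come from.

```python
def expand_features(features, classify, synonyms_by_category):
    """Build an extended feature set from features and their category synonyms.

    features: list of feature strings (user or content features).
    classify: callable mapping a feature to a category label
              (placeholder for the pre-trained classification model).
    synonyms_by_category: dict mapping category label -> list of synonymous features.
    """
    extended = []
    for feature in features:
        category = classify(feature)
        # Keep the original feature, then add the synonyms for its category.
        for item in [feature, *synonyms_by_category.get(category, [])]:
            if item not in extended:
                extended.append(item)
    return extended
```

Duplicates are dropped while the original order is preserved, so the original features always appear before their synonyms.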
Further, before the recall unit 204 inputs the user features and the content features in the extended feature set into the trained recall model, it is further configured to:
inputting each user feature and content feature in the extended feature set into a pre-trained BERT model to obtain a ranking score evaluation index for each user feature and content feature;
determining, according to the ranking score evaluation indexes of the user features and the content features, the descending order of the ranking score evaluation indexes as the arrangement order of the user features and the content features;
and sorting the user features and the content features in the extended feature set according to the arrangement order.
Further, when the recall unit 204 inputs the user features and the content features in the extended feature set into the trained recall model to obtain a content candidate set corresponding to the user features and the content features, the recall unit is specifically configured to:
extracting corresponding user feature vectors and content feature vectors from the user features and the content features in the sorted extended feature set;
inputting the user characteristic vector and the content characteristic vector into a trained recall model to obtain a plurality of candidate text contents, and determining the content candidate set according to the candidate text contents.
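The source does not disclose the internals of the recall model. Purely as an illustration of how feature vectors can yield a candidate set, the sketch below uses nearest-neighbor retrieval by cosine similarity as a stand-in; the functions cosine and recall_top_k and the parameter k are assumptions of this example, not the patented model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_top_k(query_vector, content_vectors, k=2):
    """Return the ids of the k contents most similar to the query vector.

    content_vectors: dict mapping content id -> feature vector.
    """
    ranked = sorted(
        content_vectors,
        key=lambda content_id: cosine(query_vector, content_vectors[content_id]),
        reverse=True,
    )
    return ranked[:k]
```

The returned ids play the role of the content candidate set that is later scored and sorted by the ranking algorithm.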
Further, before the recall unit 204 inputs the user features and the content features in the extended feature set into the trained recall model and obtains the content candidate set corresponding to the user features and the content features, the recall unit is further configured to:
obtaining a sample set comprising a plurality of sample portrait information, the sample portrait information comprising a sample user portrait and a sample content portrait;
extracting sample user features corresponding to the sample user portraits and sample content features corresponding to the sample content portraits in the sample set;
and inputting the sample user characteristics and the sample content characteristics into a preset neural network model for training to obtain the recall model.
Further, when the pushing unit 205 ranks the candidate text contents by using a specified ranking algorithm, it is specifically configured to:
scoring each candidate text content in the content candidate set by using the specified sorting algorithm to obtain a score of each candidate text content;
and sequencing the candidate text contents according to the order of the scores of the candidate text contents from high to low.
In the embodiment of the present invention, the data recommendation device can acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait; extract feature information from the portrait information, wherein the feature information comprises user features and content features; classify the user features and the content features, and expand the classified user features and content features to obtain an extended feature set; input the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents; and sort the plurality of candidate text contents by using a specified ranking algorithm, and send the candidate text contents to the user terminal in the sorted order. In this way, the efficiency and accuracy of obtaining the content candidate set corresponding to the user features and the content features are improved, and the efficiency and accuracy of data recommendation are improved.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device provided in an embodiment of the present invention. In some embodiments, the computer device shown in fig. 3 may include: one or more processors 301, one or more input devices 302, one or more output devices 303, and a memory 304. The processor 301, the input device 302, the output device 303, and the memory 304 are connected by a bus 305. The memory 304 is used for storing a computer program, the computer program including program instructions, and the processor 301 is used for executing the program stored in the memory 304. The processor 301 is configured to invoke the program to perform:
acquiring portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait;
extracting feature information from the portrait information, wherein the feature information comprises user features and content features;
classifying the user characteristics and the content characteristics, and expanding the classified user characteristics and the content characteristics to obtain an expanded characteristic set;
inputting the user characteristics and the content characteristics in the extended characteristic set into a trained recall model to obtain a content candidate set corresponding to the user characteristics and the content characteristics, wherein the content candidate set comprises a plurality of candidate text contents;
and sequencing the candidate text contents by using a specified sequencing algorithm, and sending the candidate text contents to the user terminal according to the sequenced sequence.
Further, when the processor 301 extracts feature information from the portrait information, it is specifically configured to:
performing word segmentation processing on the portrait information to obtain a word sequence corresponding to the portrait information;
calculating the TF-IDF value of each word in the word sequence, and selecting the largest TF-IDF value as the initial feature information of the portrait information;
inputting the word sequence into an LSTM-CRF model, and performing feature extraction on the word sequence through a segmentation mapping external feature layer of the LSTM-CRF model to obtain token representation information;
merging the initial feature information and the token representation information through a fully connected layer of the LSTM-CRF model to obtain merged feature information;
and inputting the merged feature information into a CRF layer of the LSTM-CRF model to obtain the feature information of the portrait information.
Further, when the processor 301 classifies the user features and the content features, and expands the classified user features and content features to obtain an extended feature set, the processor is specifically configured to:
inputting the user characteristics and the content characteristics into a pre-trained BERT model to obtain category information of the user characteristics and the content characteristics;
according to the category information of the user features and the content features, determining synonymous user features and synonymous content features corresponding to the category information of the user features and the content features;
determining that the user feature, the content feature, the synonymous user feature, and the synonymous content feature constitute the extended feature set.
Further, before the processor 301 inputs the user features and the content features in the extended feature set into the trained recall model, it is further configured to:
inputting each user feature and content feature in the extended feature set into a pre-trained BERT model to obtain a ranking score evaluation index for each user feature and content feature;
determining, according to the ranking score evaluation indexes of the user features and the content features, the descending order of the ranking score evaluation indexes as the arrangement order of the user features and the content features;
and sorting the user features and the content features in the extended feature set according to the arrangement order.
Further, when the processor 301 inputs the user features and the content features in the extended feature set into the trained recall model to obtain a content candidate set corresponding to the user features and the content features, the processor is specifically configured to:
extracting corresponding user feature vectors and content feature vectors from the user features and the content features in the sorted extended feature set;
inputting the user characteristic vector and the content characteristic vector into a trained recall model to obtain a plurality of candidate text contents, and determining the content candidate set according to the candidate text contents.
Further, before the processor 301 inputs the user features and the content features in the extended feature set into the trained recall model and obtains the content candidate set corresponding to the user features and the content features, the processor is further configured to:
obtaining a sample set comprising a plurality of sample portrait information, the sample portrait information comprising a sample user portrait and a sample content portrait;
extracting sample user features corresponding to the sample user portraits and sample content features corresponding to the sample content portraits in the sample set;
and inputting the sample user characteristics and the sample content characteristics into a preset neural network model for training to obtain the recall model.
Further, when the processor 301 ranks the candidate text contents by using a specified ranking algorithm, the processor is specifically configured to:
scoring each candidate text content in the content candidate set by using the specified sorting algorithm to obtain a score of each candidate text content;
and sequencing the candidate text contents according to the order of the scores of the candidate text contents from high to low.
In the embodiment of the present invention, the computer device can acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait; extract feature information from the portrait information, wherein the feature information comprises user features and content features; classify the user features and the content features, and expand the classified user features and content features to obtain an extended feature set; input the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features, wherein the content candidate set comprises a plurality of candidate text contents; and sort the plurality of candidate text contents by using a specified ranking algorithm, and send the candidate text contents to the user terminal in the sorted order. In this way, the efficiency and accuracy of obtaining the content candidate set corresponding to the user features and the content features are improved, and the efficiency and accuracy of data recommendation are improved.
It should be understood that, in the embodiment of the present invention, the processor 301 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The input device 302 may include a touch pad, a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store device type information.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in this embodiment of the present invention may execute the implementation described in the method embodiment shown in fig. 1 provided in this embodiment of the present invention, and may also execute the implementation of the data recommendation apparatus described in fig. 2 in this embodiment of the present invention, which is not described herein again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the data recommendation method described in the embodiment corresponding to fig. 1 may be implemented, or the data recommendation apparatus according to the embodiment corresponding to fig. 2 may also be implemented, which is not described herein again.
The computer-readable storage medium may be an internal storage unit of the data recommendation device according to any of the foregoing embodiments, for example, a hard disk or a memory of the data recommendation device. The computer-readable storage medium may also be an external storage device of the data recommendation device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like provided on the data recommendation device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the data recommendation device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the data recommendation device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a computer-readable storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned computer-readable storage media include: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code. The computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the blockchain node, and the like.
It is emphasized that the data may also be stored in a node of a blockchain in order to further ensure the privacy and security of the data. The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
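The idea of data blocks linked by a cryptographic method can be shown with a minimal hash chain. The sketch below uses SHA-256 from the Python standard library and covers only the linking and validity check; it is not a blockchain platform and has no consensus or networking. The functions make_block and chain_is_valid and the block layout are hypothetical.

```python
import hashlib
import json


def make_block(records, prev_hash):
    """Create a block whose hash covers its records and the previous block's hash."""
    payload = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"records": records, "prev_hash": prev_hash, "hash": digest}


def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        expected = make_block(block["records"], block["prev_hash"])["hash"]
        if block["hash"] != expected:
            return False  # block contents were altered after hashing
        if index > 0 and block["prev_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Because each hash depends on the previous one, changing a stored record invalidates that block and every block after it, which is what makes tampering detectable.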
The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A method for recommending data, comprising:
acquiring portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait;
extracting feature information from the portrait information, wherein the feature information comprises user features and content features;
classifying the user characteristics and the content characteristics, and expanding the classified user characteristics and the content characteristics to obtain an expanded characteristic set;
inputting the user characteristics and the content characteristics in the extended characteristic set into a trained recall model to obtain a content candidate set corresponding to the user characteristics and the content characteristics, wherein the content candidate set comprises a plurality of candidate text contents;
and sequencing the candidate text contents by using a specified sequencing algorithm, and sending the candidate text contents to the user terminal according to the sequenced sequence.
2. The method of claim 1, wherein the extracting feature information from the portrait information comprises:
performing word segmentation processing on the portrait information to obtain a word sequence corresponding to the portrait information;
calculating the TF-IDF value of each word in the word sequence, and selecting the largest TF-IDF value as the initial feature information of the portrait information;
inputting the word sequence into an LSTM-CRF model, and performing feature extraction on the word sequence through a segmentation mapping external feature layer of the LSTM-CRF model to obtain token representation information;
merging the initial feature information and the token representation information through a fully connected layer of the LSTM-CRF model to obtain merged feature information;
and inputting the merged feature information into a CRF layer of the LSTM-CRF model to obtain the feature information of the portrait information.
3. The method of claim 2, wherein the classifying the user features and the content features and expanding the classified user features and the content features to obtain an expanded feature set comprises:
inputting the user characteristics and the content characteristics into a pre-trained BERT model to obtain category information of the user characteristics and the content characteristics;
according to the category information of the user features and the content features, determining synonymous user features and synonymous content features corresponding to the category information of the user features and the content features;
determining that the user characteristic, the content characteristic, the synonymous user characteristic, and the synonymous content characteristic constitute the extended feature set.
4. The method of claim 3, wherein before entering the user features and content features in the extended feature set into the trained recall model, further comprising:
inputting each user characteristic and content characteristic in the extended characteristic set into a pre-trained BERT model to obtain a ranking score evaluation index of each user characteristic and content characteristic;
determining, according to the ranking score evaluation indexes of the user features and the content features, the descending order of the ranking score evaluation indexes as the arrangement order of the user features and the content features;
and sorting the user features and the content features in the extended feature set according to the arrangement order.
5. The method of claim 4, wherein the inputting the user features and the content features in the extended feature set into a trained recall model to obtain a content candidate set corresponding to the user features and the content features comprises:
extracting corresponding user feature vectors and content feature vectors from the user features and the content features in the sorted extended feature set;
inputting the user characteristic vector and the content characteristic vector into a trained recall model to obtain a plurality of candidate text contents, and determining the content candidate set according to the candidate text contents.
6. The method of claim 5, wherein before entering the user features and the content features in the extended feature set into the trained recall model and obtaining the content candidate set corresponding to the user features and the content features, the method further comprises:
obtaining a sample set comprising a plurality of sample portrait information, the sample portrait information comprising a sample user portrait and a sample content portrait;
extracting sample user features corresponding to the sample user portraits and sample content features corresponding to the sample content portraits in the sample set;
and inputting the sample user characteristics and the sample content characteristics into a preset neural network model for training to obtain the recall model.
7. The method of claim 6, wherein said ranking the plurality of candidate textual content using a specified ranking algorithm comprises:
scoring each candidate text content in the content candidate set by using the specified sorting algorithm to obtain a score of each candidate text content;
and sequencing the candidate text contents according to the order of the scores of the candidate text contents from high to low.
8. A data recommendation device, comprising:
an acquisition unit, configured to acquire portrait information of a user in a target service scene, wherein the portrait information comprises a user portrait and a content portrait;
an extraction unit configured to extract feature information from the portrait information, the feature information including a user feature and a content feature;
an extension unit, configured to classify the user features and the content features, and expand the classified user features and content features to obtain an extended feature set;
a recall unit, configured to input a user feature and a content feature in the extended feature set into a trained recall model, so as to obtain a content candidate set corresponding to the user feature and the content feature, where the content candidate set includes a plurality of candidate text contents;
and a push unit, configured to sort the candidate text contents by using a specified ranking algorithm, and send the candidate text contents to the user terminal in the sorted order.
9. A computer device comprising a processor and a memory, wherein the memory is configured to store a computer program and the processor is configured to invoke the computer program to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1-7.
CN202111017643.3A 2021-08-31 2021-08-31 Data recommendation method, device, equipment and storage medium Active CN113704623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111017643.3A CN113704623B (en) 2021-08-31 2021-08-31 Data recommendation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113704623A true CN113704623A (en) 2021-11-26
CN113704623B CN113704623B (en) 2024-04-16

Family

ID=78658478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111017643.3A Active CN113704623B (en) 2021-08-31 2021-08-31 Data recommendation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113704623B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398557A (en) * 2022-01-18 2022-04-26 平安国际智慧城市科技股份有限公司 Information recommendation method and device based on double portraits, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209552A (en) * 2014-09-02 2017-09-26 托比股份公司 Based on the text input system and method stared
US20190205761A1 (en) * 2017-12-28 2019-07-04 Adeptmind Inc. System and method for dynamic online search result generation
CN110717816A (en) * 2019-07-15 2020-01-21 上海氪信信息技术有限公司 Artificial intelligence technology-based global financial risk knowledge graph construction method
CN111078994A (en) * 2019-11-06 2020-04-28 珠海健康云科技有限公司 Portrait-based medical science popularization article recommendation method and system
CN111144098A (en) * 2019-12-26 2020-05-12 支付宝(杭州)信息技术有限公司 Recall method and device for expanded question sentence
CN111949890A (en) * 2020-09-27 2020-11-17 平安科技(深圳)有限公司 Data recommendation method, equipment, server and storage medium based on medical field
CN112215629A (en) * 2019-07-09 2021-01-12 百度在线网络技术(北京)有限公司 Multi-target advertisement generation system and method based on construction countermeasure sample
CN112559895A (en) * 2021-02-19 2021-03-26 深圳平安智汇企业信息管理有限公司 Data processing method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398557A (en) * 2022-01-18 2022-04-26 平安国际智慧城市科技股份有限公司 Information recommendation method and device based on double portraits, electronic equipment and storage medium
CN114398557B (en) * 2022-01-18 2024-04-30 平安国际智慧城市科技股份有限公司 Information recommendation method and device based on double portraits, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113704623B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN107169049B (en) Application tag information generation method and device
CN105426356B (en) Target information recognition method and device
CN111539197B (en) Text matching method and device, computer system and readable storage medium
CN110321537B (en) Method and device for generating file
CN111190997A (en) Question-answering system implementation method using neural network and machine learning ranking algorithm
KR20200087977A (en) Multimodal document summary system and method
CN113722438A (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN116881429B (en) Multi-tenant-based dialogue model interaction method, device and storage medium
US20230410220A1 (en) Information processing apparatus, control method, and program
CN114387061A (en) Product pushing method and device, electronic equipment and readable storage medium
CN113486664A (en) Text data visualization analysis method, device, equipment and storage medium
CN114090792A (en) Document relation extraction method based on contrastive learning and related equipment thereof
CN113704623B (en) Data recommendation method, device, equipment and storage medium
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN113327132A (en) Multimedia recommendation method, device, equipment and storage medium
CN113569018A (en) Question and answer pair mining method and device
CN112948526A (en) User portrait generation method and device, electronic equipment and storage medium
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
JP2020502710A (en) Web page main image recognition method and apparatus
CN111382254A (en) Electronic business card recommendation method, device, equipment and computer readable storage medium
CN114528851B (en) Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN113408282B (en) Method, device, equipment and storage medium for topic model training and topic prediction
CN116127066A (en) Text clustering method, text clustering device, electronic equipment and storage medium
CN113010664A (en) Data processing method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant