CN113822065A - Keyword recall method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113822065A
Authority
CN
China
Prior art keywords
recalled, word, sentence, keyword, similarity
Prior art date
Legal status
Pending
Application number
CN202110867106.1A
Other languages
Chinese (zh)
Inventor
石磊
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110867106.1A
Publication of CN113822065A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures


Abstract

An embodiment of the present application discloses a keyword recall method and apparatus, an electronic device, and a storage medium. The method includes: calculating a first similarity between each to-be-recalled word and a seed keyword, determining to-be-recalled words whose first similarity is greater than a first threshold as target keywords, and determining to-be-recalled words whose first similarity is less than the first threshold but greater than a second threshold as candidate keywords, where the first threshold is greater than the second threshold; calculating a second similarity between a to-be-recalled sentence and a seed key sentence, where the seed key sentence contains at least one seed keyword and the to-be-recalled sentence contains at least one to-be-recalled word; taking to-be-recalled sentences whose second similarity is greater than a third threshold as candidate to-be-recalled sentences, and determining the candidate keywords contained in these candidate sentences as target keywords; and recalling the target keywords. The technical solution of this embodiment can improve the recall rate of keywords.

Description

Keyword recall method and device, electronic equipment and storage medium
Technical Field
The present application relates to the technical field of computer information processing, and in particular to a keyword recall method and apparatus, an electronic device, and a storage medium.
Background
Keyword-based information recommendation determines the information pushed to a user according to the keywords the user enters in a search engine. When users place information inside or outside a website, they must place it in units of keywords, so obtaining keywords suitable for a given user becomes one of the key operations. Existing keyword recall methods generally suffer from a low recall rate.
Disclosure of Invention
To solve the above technical problem, embodiments of the present application provide a keyword recall method and apparatus, an electronic device, and a storage medium, which can improve the recall rate of keywords.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a keyword recall method, including: calculating a first similarity between each to-be-recalled word and a seed keyword, determining to-be-recalled words whose first similarity is greater than a first threshold as target keywords, and determining to-be-recalled words whose first similarity is less than the first threshold but greater than a second threshold as candidate keywords, where the first threshold is greater than the second threshold; calculating a second similarity between a to-be-recalled sentence and a seed key sentence, where the seed key sentence contains at least one seed keyword and the to-be-recalled sentence contains at least one to-be-recalled word; taking to-be-recalled sentences whose second similarity is greater than a third threshold as candidate to-be-recalled sentences, and determining the candidate keywords contained in these candidate sentences as target keywords; and recalling all target keywords.
According to an aspect of an embodiment of the present application, there is provided a keyword recall apparatus, including: a first calculation module, configured to calculate a first similarity between each to-be-recalled word and a seed keyword, determine to-be-recalled words whose first similarity is greater than a first threshold as target keywords, and determine to-be-recalled words whose first similarity is less than the first threshold but greater than a second threshold as candidate keywords, where the first threshold is greater than the second threshold; a second calculation module, configured to calculate a second similarity between a to-be-recalled sentence and a seed key sentence, where the seed key sentence contains at least one seed keyword and the to-be-recalled sentence contains at least one to-be-recalled word; an analysis module, configured to take to-be-recalled sentences whose second similarity is greater than a third threshold as candidate to-be-recalled sentences and determine the candidate keywords contained in these candidate sentences as target keywords; and a recall module, configured to recall all target keywords.
According to an aspect of the embodiments of the present application, there is provided an electronic device including a processor and a memory, the memory having stored thereon computer-readable instructions, which when executed by the processor, implement the keyword recall method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the keyword recall method as previously provided.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the keyword recall method provided in the various alternative embodiments described above.
In the technical solution provided by the embodiments of the present application, the first similarity between the seed keywords and the to-be-recalled words is calculated, and the to-be-recalled words with higher first similarity are determined as target keywords. The second similarity between the to-be-recalled sentences and the seed key sentences is then calculated, and the to-be-recalled words in the sentences with higher second similarity are taken as candidate keywords; if the first similarity between a candidate keyword and a seed keyword is greater than the second threshold and less than the first threshold, the candidate keyword is determined as a target keyword. Finally, all target keywords are recalled. By using seed key sentences to recall to-be-recalled words, the embodiments compensate for the insufficient recall rate that results from recalling keywords with seed keywords alone, thereby improving the keyword recall rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a flow diagram illustrating a keyword recall method in accordance with an exemplary embodiment of the present application;
FIG. 2 is a flow chart of step S100 in an exemplary embodiment of the embodiment shown in FIG. 1;
FIG. 3 is a flow chart of step S200 in an exemplary embodiment of the embodiment shown in FIG. 1;
FIG. 4 is a flowchart of step S210 in an exemplary embodiment of the embodiment shown in FIG. 3;
FIG. 5 is a flowchart of step S210 in another exemplary embodiment of the embodiment shown in FIG. 3;
FIG. 6 is a flow diagram illustrating a keyword recall method in accordance with another exemplary embodiment of the present application;
FIG. 7 is a flow diagram illustrating a keyword recall method in accordance with another exemplary embodiment of the present application;
FIG. 8 is a flowchart of step S700 in an exemplary embodiment of the embodiment shown in FIG. 7;
FIG. 9 is a block diagram of a keyword recall apparatus shown in an exemplary embodiment of the present application;
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should also be noted that "a plurality" in this application means two or more. "And/or" describes an association relationship between the associated objects and indicates three possible relationships; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Artificial Intelligence (AI) comprises the theories, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Its infrastructure generally includes sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in every field of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The keyword recall method and apparatus, the electronic device, and the computer-readable storage medium according to the embodiments of the present application relate to artificial intelligence technology and machine learning technology, and the embodiments will be described in detail below.
A keyword is a word used to recall, from a massive information stream, content strongly correlated with it, and the keyword recall method provided by this embodiment can expand the keyword set. For example, when personalized information streams are recommended to users, a keyword may be a word the user enters in a search box: based on that keyword, the backend extracts a series of related information from a massive information database and recommends it to the user. For instance, if the user enters the keyword "hot" in the search box, the backend extracts currently trending information from the database and recommends it to the user.
Keywords are therefore crucial to the whole information recommendation process, and because keywords change or grow as information is replaced, it is necessary to recall more keywords from massive information data.
One keyword recall method calculates the similarity between seed keywords and to-be-recalled words, and recalls as keywords the to-be-recalled words whose similarity exceeds a set threshold. Through long-term research, the inventor of the present application found that this recall approach easily suffers from insufficient recall due to the out-of-vocabulary (OOV) problem. For example, if a word A did not appear during word2vec model training, or appeared only infrequently, it will have a poor representation when the word2vec model is used for feature extraction, and will not be effectively recalled when matched against the seed keywords.
Based on this, the keyword recall method provided by the present application exploits the grammatical characteristics of word context: it introduces a sentence-level similarity recall strategy that cooperates with the word-level similarity recall strategy, alleviating the insufficient recall of some keywords caused by the OOV problem and thereby improving the effect on the recommendation side.
Referring to fig. 1, fig. 1 is a flowchart illustrating a keyword recall method according to an exemplary embodiment of the present application, where the keyword recall method illustrated in fig. 1 includes the following steps:
Step S100: calculate a first similarity between each to-be-recalled word and the seed keywords; determine to-be-recalled words whose first similarity is greater than a first threshold as target keywords, and determine to-be-recalled words whose first similarity is less than the first threshold but greater than a second threshold as candidate keywords, where the first threshold is greater than the second threshold.
In this embodiment, the to-be-recalled words and the seed keywords are acquired first.
The computer backend stores a library of texts to be recommended, uploaded by a large number of users. The library contains a large number of texts to be recommended, such as product information uploaded by merchants on an e-commerce platform, advertisement information uploaded by bloggers, and news texts uploaded by users to the backend of a browsing webpage.
In this embodiment, the to-be-recalled words may be obtained from the library of texts to be recommended; specifically, they are segmented out of that library, and there may be multiple to-be-recalled words.
The seed keywords are used to recall the to-be-recalled words as target keywords. There may be multiple seed keywords; they can be obtained in advance through manual screening and only need to be imported into the computer when used, for example the seed keyword set {popular, recent, entertainment, ..., sports}.
In this embodiment, the greater the first similarity between a seed keyword and a to-be-recalled word, the more similar they are; conversely, the smaller it is, the less similar they are. Therefore, some to-be-recalled words can be directly determined as target keywords by the magnitude of the first similarity (those whose first similarity is greater than the first threshold), and some can be directly excluded as not being target keywords (those whose first similarity is less than the second threshold). For to-be-recalled words whose first similarity is greater than the second threshold but less than the first threshold, this embodiment determines in the following steps whether they are target keywords.
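As a rough illustration (the function name and threshold values are ours, not from the patent), the two-threshold partition of step S100 might be sketched as:

```python
def partition_words(similarities, first_threshold, second_threshold):
    """Split to-be-recalled words by their first similarity to the seeds.

    similarities: dict mapping word -> first similarity score.
    Words above first_threshold become target keywords outright; words
    between the two thresholds are kept as candidates for the later
    sentence-level check; the rest are discarded.
    """
    assert first_threshold > second_threshold
    targets, candidates = [], []
    for word, sim in similarities.items():
        if sim > first_threshold:
            targets.append(word)
        elif sim > second_threshold:
            candidates.append(word)
        # words with sim <= second_threshold are excluded
    return targets, candidates
```

With thresholds 0.8 and 0.5, for example, a word scoring 0.9 is accepted outright, a word scoring 0.6 becomes a candidate for the sentence-level check of steps S200 and S300, and a word scoring 0.1 is discarded.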
Step S200: and calculating a second similarity between the sentence to be recalled and the seed key sentence.
In this embodiment, the seed key sentence includes at least one seed key word, the to-be-recalled sentence includes at least one to-be-recalled word, and the to-be-recalled sentence is derived from the to-be-recommended corpus.
The seed key sentences can be recalled from the library of texts to be recommended based on the seed keywords, for example by listing or crawling sentences in web pages that contain a seed keyword and using them as seed key sentences.
Because the meaning of a word can be inferred from the contextual semantics of a sentence containing it, the word's meaning is expressed more clearly within the sentence.
Step S300: and taking the to-be-recalled sentence with the second similarity larger than the third threshold value as a candidate to-be-recalled sentence, and determining candidate keywords contained in the candidate to-be-recalled sentence as target keywords.
In this embodiment, the similarity between a candidate keyword and the seed keyword is greater than the second threshold and less than the first threshold, i.e., it lies in a fuzzy interval; within this interval, whether the candidate keyword is a target keyword cannot be determined from that similarity alone. Because the second similarity between a to-be-recalled sentence containing the candidate keyword and a seed key sentence containing the seed keyword is greater than the third threshold, and sentence-level similarity reflects to some extent the similarity of the words the sentences contain, the candidate keywords contained in a candidate to-be-recalled sentence can be determined as target keywords.
In this embodiment, a candidate to-be-recalled sentence does not necessarily contain a candidate keyword; if it does contain one, that candidate keyword is determined as a target keyword.
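A minimal sketch of step S300 under these assumptions (tokenized sentences and precomputed second similarities; all names are illustrative, not from the patent):

```python
def promote_candidates(sentences, sentence_sims, candidates, third_threshold):
    """Promote candidate keywords found in high-similarity sentences.

    sentences: list of token lists (segmented to-be-recalled sentences).
    sentence_sims: parallel list of second similarities to the seed key sentences.
    candidates: set of candidate keywords from step S100.
    Sentences whose similarity exceeds third_threshold are candidate
    sentences; every candidate keyword they contain becomes a target keyword.
    """
    promoted = set()
    for tokens, sim in zip(sentences, sentence_sims):
        if sim > third_threshold:
            promoted.update(w for w in tokens if w in candidates)
    return promoted
```

Note that a candidate sentence containing no candidate keyword simply contributes nothing, matching the remark above.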
Step S400: the target keyword is recalled.
In this embodiment, the recalled keywords may in turn be used as seed keywords to recall more keywords, further expanding the keyword set.
To sum up, in this embodiment the first similarity between the seed keywords and the to-be-recalled words is calculated, and the to-be-recalled words with higher first similarity are determined as target keywords. The second similarity between the to-be-recalled sentences and the seed key sentences is then calculated, and the to-be-recalled words in the sentences with higher second similarity are taken as candidate keywords; if the first similarity between a candidate keyword and a seed keyword is greater than the second threshold and less than the first threshold, the candidate keyword is determined as a target keyword. Finally, all target keywords are recalled. By using seed key sentences to recall to-be-recalled words, this embodiment compensates for the insufficient recall rate of recalling keywords with seed keywords alone, thereby improving the keyword recall rate.
Referring to fig. 2, fig. 2 is a flowchart of an exemplary embodiment of step S100 in the embodiment shown in fig. 1, and as shown in fig. 2, step S100 includes the following steps:
step S110: and respectively carrying out vectorization treatment on the word to be recalled and the seed keyword to obtain a word vector to be recalled and a seed keyword vector.
In this embodiment, features of the to-be-recalled word and the seed keyword may be extracted separately to obtain the to-be-recalled word vector corresponding to the to-be-recalled word and the seed keyword vector corresponding to the seed keyword. The Word2vec algorithm may be used for this feature extraction. Specifically, Chinese Wikipedia (zhwiki) can be used as the training corpus to pre-train a word2vec model; the to-be-recalled word and the seed keyword are then input into the word2vec model, which outputs the to-be-recalled word vector and the seed keyword vector.
Step S120: and calculating the similarity between the word vector to be recalled and the seed keyword vector, and taking the similarity as a first similarity.
Optionally, the cosine similarity or the Euclidean distance between the seed keyword vector and the to-be-recalled word vector is calculated and used as the first similarity.
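For instance, the first similarity could be computed as follows (a sketch; converting the Euclidean distance to a bounded similarity score via 1 / (1 + distance) is our assumption, since the text only names the two measures):

```python
import numpy as np

def first_similarity(word_vec, seed_vec, metric="cosine"):
    """First similarity between a to-be-recalled word vector and a seed
    keyword vector.

    Cosine similarity lies in [-1, 1]. For Euclidean distance, smaller
    means more similar, so it is mapped to 1 / (1 + distance) so that a
    "greater than threshold" test still works.
    """
    word_vec = np.asarray(word_vec, dtype=float)
    seed_vec = np.asarray(seed_vec, dtype=float)
    if metric == "cosine":
        return float(word_vec @ seed_vec /
                     (np.linalg.norm(word_vec) * np.linalg.norm(seed_vec)))
    dist = float(np.linalg.norm(word_vec - seed_vec))
    return 1.0 / (1.0 + dist)
```

Identical vectors score 1.0 under both metrics, and orthogonal vectors score 0.0 under cosine similarity.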
Referring to fig. 3, fig. 3 is a flowchart of an exemplary embodiment of step S200 in the embodiment shown in fig. 1, and as shown in fig. 3, step S200 includes the following steps:
step S210: and acquiring a first feature vector of the sentence to be recalled and a second feature vector of the seed key sentence.
In this step, the feature vectors of the to-be-recalled sentence and the seed key sentence are extracted separately in order to facilitate calculating the similarity between the two.
Referring to fig. 4, fig. 4 is a flowchart of an exemplary embodiment of step S210 in the embodiment shown in fig. 3, and as shown in fig. 4, the process of step S210 obtaining the first feature vector of the sentence to be recalled includes the following steps:
step S211: and performing word segmentation processing on the sentence to be recalled to obtain a plurality of words corresponding to the sentence to be recalled.
Chinese word segmentation refers to splitting a sequence of Chinese characters into individual words: recombining a continuous character sequence into a word sequence according to a given specification. In Chinese, only characters, sentences, and paragraphs have obvious delimiters; words do not. English has an analogous phrase-segmentation problem, but at the word level Chinese is considerably more complicated and difficult than English. Word segmentation is the basis of natural language processing and is particularly important for Chinese NLP. Current segmentation algorithms fall mainly into two classes: dictionary-based rule matching and statistics-based machine learning. The statistical methods mainly include Hidden Markov Models (HMM), Conditional Random Fields (CRF), Support Vector Machines (SVM), deep learning, and the like. This embodiment can directly use an existing Chinese word segmentation algorithm to segment the text, so the specific segmentation process is not described in detail.
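As a toy illustration of the dictionary-based rule-matching family mentioned above, forward maximum matching can be sketched in a few lines (the dictionary and maximum word length are illustrative; a production system would use a mature segmenter):

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Dictionary-based forward maximum matching segmentation.

    Greedily takes the longest dictionary word starting at each position;
    characters not covered by any dictionary word fall back to
    single-character tokens, so the loop always advances.
    """
    tokens, i = [], 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + length]
            if length == 1 or piece in dictionary:
                tokens.append(piece)
                i += length
                break
    return tokens
```

This greedy rule matcher shows why the dictionary-based class is simple but brittle: any word missing from the dictionary degrades into single characters, which is one motivation for the statistical methods listed above.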
Step S212: and acquiring the feature vectors corresponding to the multiple word segments.
Converting a segmented word into its corresponding feature vector is in fact a word embedding process. Word embedding converts the words in a text into numeric vectors, which standard machine learning algorithms require as input. It embeds a high-dimensional space, whose dimension equals the number of all words, into a continuous vector space of much lower dimension; each word or phrase is mapped to a vector over the real numbers.
In this embodiment, the feature vectors corresponding to the segmented words can be obtained from a pre-trained Word2vec model; Word2vec includes the skip-gram and Continuous Bag-of-Words (CBOW) models. It should be noted that any existing word embedding method may be used to obtain these feature vectors, for example Global Vectors for Word Representation (GloVe), one-hot encoding, information retrieval techniques, distributed representations, and the like, without specific limitation here.
Step S213: and adding the feature vectors corresponding to the multiple participles to obtain a first result vector, and performing normalization processing on the first result vector to obtain a first feature vector.
Normalization is a simplification that converts a dimensional expression into a dimensionless scalar form; it does not change or affect the degree of correlation between the seed key sentence and the to-be-recalled sentence. Normalizing the first result vector maps its components into the range -1 to 1, so that similarity can later be computed between the first feature vector and a second feature vector that has been normalized in the same way.
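A sketch of steps S212 and S213, assuming the word vectors are already available (L2 normalization is used here, which maps every component into [-1, 1] as described; the patent does not fix a particular normalization):

```python
import numpy as np

def sentence_vector(word_vectors):
    """First feature vector of a sentence: sum the feature vectors of its
    segmented words, then L2-normalize the sum so every component lies in
    [-1, 1] and cosine similarity reduces to a dot product."""
    summed = np.sum(np.asarray(word_vectors, dtype=float), axis=0)
    norm = np.linalg.norm(summed)
    return summed / norm if norm > 0 else summed
```

Applying the same function to the seed key sentence's word vectors yields the second feature vector, normalized in the same way.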
Optionally, the second feature vector of the seed key sentence may be obtained in the manner described in steps S211 to S213, which is not repeated here.
Referring to fig. 5, fig. 5 is a flowchart of step S210 in another exemplary embodiment of the embodiment shown in fig. 3; in this embodiment, the process of obtaining the first feature vector of the to-be-recalled sentence in step S210 includes the following steps:
step S214: and acquiring a word vector corresponding to each word in the sentence to be recalled, and splicing the word vectors of all the words to obtain an initial feature vector corresponding to the sentence to be recalled.
Because the characters in a sentence are explicitly delimited, the to-be-recalled sentence does not need to be segmented into words before the character vector corresponding to each character is obtained.
In this embodiment, each character in the to-be-recalled sentence can be represented as a character vector using an embedding technique.
Vector concatenation is a feature fusion method: given two feature vectors v1 ∈ R^n and v2 ∈ R^m, concatenating them yields the fused feature vector v = [v1, v2] ∈ R^(m+n). Concatenating the character vectors of every character in the to-be-recalled sentence therefore yields an initial feature vector whose dimension is the sum of the dimensions of the individual character vectors.
For example, for the to-be-recalled sentence "I want to eat", suppose the character vector corresponding to "I" is [1, 4, 2, 6], the vector corresponding to "want" is [3, 6, 8, 1], the vector corresponding to "eat" is [8, 4, 5, 3], and the vector corresponding to "meal" is [9, 7, 2, 6]. Concatenating the character vectors of each character yields the initial feature vector corresponding to the to-be-recalled sentence:
[1,4,2,6,3,6,8,1,8,4,5,3,9,7,2,6]。
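The splicing step above can be sketched directly (the 4-dimensional word vectors are the hypothetical values from the example in the text):

```python
import numpy as np

# Hypothetical 4-dimensional word vectors for the four words of the example
# sentence "i will eat [a] meal"; the values are taken from the text above.
word_vectors = {
    "i":    [1, 4, 2, 6],
    "will": [3, 6, 8, 1],
    "eat":  [8, 4, 5, 3],
    "meal": [9, 7, 2, 6],
}

sentence = ["i", "will", "eat", "meal"]

# Splicing: concatenate the word vectors in order; the result is 4 x 4 = 16-dimensional.
initial_feature = np.concatenate([np.asarray(word_vectors[w]) for w in sentence])
print(initial_feature.tolist())
# [1, 4, 2, 6, 3, 6, 8, 1, 8, 4, 5, 3, 9, 7, 2, 6]
```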
step S215: and performing dimensionality reduction on the initial feature vector to obtain a low-dimensional feature vector.
Dimension reduction is in effect a form of feature extraction. By extracting features from the initial feature vector, this embodiment not only mines deeper semantic features of the sentence to be recalled, but also reduces the amount of computation in the subsequent similarity calculation between the first feature vector and the second feature vector.
In the scheme of this embodiment, the dimension reduction processing may be performed on the initial feature vector in a variety of ways. For example, a neural network model may be built and trained using deep learning, and the initial feature vector may then be reduced in dimension based on that model, such as a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), or a Recurrent Neural Network (RNN).
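A minimal numerical sketch of CNN-style dimension reduction, assuming a single 1-D convolution layer with global max pooling (the filter weights are random here, standing in for weights that would be learned by training):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_valid(x, kernel):
    """'Valid' 1-D cross-correlation of x with a single filter."""
    k = len(kernel)
    return np.array([float(np.dot(x[i:i + k], kernel)) for i in range(len(x) - k + 1)])

x = rng.standard_normal(16)            # e.g. a 16-dimensional initial feature vector
filters = rng.standard_normal((4, 4))  # four width-4 filters (random, i.e. untrained)

# One feature per filter via global max pooling: 16 dimensions -> 4 dimensions.
low_dim = np.array([conv1d_valid(x, f).max() for f in filters])
print(low_dim.shape)  # (4,)
```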
Step S216: and carrying out normalization processing on the low-dimensional feature vector to obtain a first feature vector.
In this embodiment, the first feature vector is obtained from the word vectors of the sentence to be recalled, so no word segmentation of the sentence is required and the acquisition of the first feature vector is simplified. In addition, the dimension reduction performed on the initial feature vector reduces the amount of calculation and improves the efficiency of obtaining the first feature vector.
The second feature vector corresponding to the seed key sentence can also be obtained by the method described in step S214, step S215, and step S216 of this embodiment, and it should be noted that the second feature vector corresponding to the seed key sentence and the first feature vector corresponding to the sentence to be recalled must have the same dimension.
Step S220: and performing similarity operation on the first feature vector and the second feature vector to obtain a second similarity.
A cosine value or a Euclidean distance between the first feature vector and the second feature vector is calculated and used as the second similarity between the seed key sentence and the sentence to be recalled.

In the technical solution provided by this embodiment, the first similarity between the seed keyword and each word to be recalled is calculated first; the words to be recalled whose first similarity exceeds the first threshold are determined as target keywords, and those whose first similarity lies between the second threshold and the first threshold are taken as candidate keywords. The second similarity between each sentence to be recalled and the seed key sentence is then calculated, the sentences whose second similarity exceeds the third threshold are taken as candidate sentences to be recalled, and the candidate keywords contained in those candidate sentences are also determined as target keywords. Finally, all target keywords are recalled. Using the seed key sentence to recall words makes up for the insufficient recall rate that results from recalling keywords with the seed keywords alone, thereby improving the keyword recall rate.

The flow of the keyword recall method of this embodiment is shown in fig. 6. On the one hand, feature extraction is performed on the seed keyword and the word to be recalled respectively, the similarity between them is calculated, the words to be recalled with a similarity score higher than the first threshold are taken as target keywords, and those with a similarity higher than the second threshold but lower than the first threshold are taken as candidate keywords. On the other hand, the seed key sentence and the sentence to be recalled are reduced in dimension by a first CNN dimension reduction layer and a second CNN dimension reduction layer respectively, the similarity between the two sentences is calculated, the sentences to be recalled with a score higher than the third threshold are taken as candidate sentences to be recalled, and the candidate keywords contained in those candidate sentences are recalled as target keywords. That is, the keyword recall method of this embodiment combines word-level keyword recall with sentence-level keyword recall to improve the recall rate.
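The two-path recall logic described above can be sketched as follows (the concrete threshold values are illustrative assumptions; the patent leaves them open):

```python
import numpy as np

# Illustrative thresholds: first > second for words, third for sentences.
FIRST, SECOND, THIRD = 0.9, 0.7, 0.8

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recall(word_sims, sentence_sim, words_in_sentence):
    """word_sims: {word: first similarity to the seed keyword};
    sentence_sim: second similarity of one sentence to the seed key sentence;
    words_in_sentence: the words to be recalled contained in that sentence."""
    targets = {w for w, s in word_sims.items() if s > FIRST}
    candidates = {w for w, s in word_sims.items() if SECOND < s <= FIRST}
    if sentence_sim > THIRD:  # the sentence is a candidate sentence to be recalled
        targets |= candidates & set(words_in_sentence)
    return targets

sims = {"alpha": 0.95, "beta": 0.75, "gamma": 0.4}
print(sorted(recall(sims, sentence_sim=0.85, words_in_sentence=["beta", "gamma"])))
# ['alpha', 'beta']
```

Here "alpha" is recalled directly by the word-level path, while the candidate "beta" is promoted to a target keyword only because its containing sentence scores above the third threshold.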
Referring to fig. 7, fig. 7 is a flowchart illustrating a keyword recall method according to an exemplary embodiment of the present application, where, as shown in fig. 7, the keyword recall method includes the following steps:
step S500: and acquiring other participles in the text library except the words to be recalled, wherein the other participles are obtained by carrying out participle processing on all sentences in the text library.
In this embodiment, the text base is a to-be-recommended text base, the to-be-recommended text base includes a plurality of to-be-recalled sentences, the to-be-recalled sentences are segmented, and the to-be-recalled words can be obtained through screening.
In fact, the texts to be recommended in the text library are updated in real time as users upload content, and the participles obtained by segmenting those texts are updated in real time as well. Obviously, the words to be recalled contained in the pre-specified set of words to be recalled are all based on the text library before updating; if the text library changes significantly after updating and the set of words to be recalled is not updated accordingly, some potential keywords may be missed and the recall efficiency reduced. In this respect, the keyword recall method provided in this embodiment supplements the keyword recall method described above and focuses on the participles in the text library other than the words to be recalled.
For example, after the text library in the background of a certain browser is updated, a word frequency statistical method may find that the frequency of the participle "true incense" in the text library has suddenly increased beyond a set value, which indicates that this participle is important for the updated text library.
Step S600: and counting word frequencies corresponding to other participles.
Term Frequency (TF) is the frequency with which a word occurs; counting the term frequency of the other participles means counting the number of times each of them appears in the text library.
Step S700: and recalling other segmented words with the word frequency larger than the fourth threshold value as target keywords.
In this embodiment, the word frequencies of other participles except the word to be recalled are counted, and when the word frequency of a certain participle is greater than the fourth threshold, the word is recalled as the target keyword. By the method, the possibility that other words which are not specified as the words to be recalled in the text base are the target keywords is fully considered, the method for recalling the keywords by combining the seed keywords and the seed key sentences can be further optimized, and the recall rate of the keywords can be further improved. Referring to fig. 8, fig. 8 is a flowchart of an exemplary embodiment of step S700 in the embodiment shown in fig. 7, and as shown in fig. 8, step S700 includes the following steps:
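Steps S500 to S700 can be sketched with a simple frequency count (the value of the fourth threshold is an illustrative assumption):

```python
from collections import Counter

FOURTH_THRESHOLD = 2  # illustrative value for the fourth threshold

def recall_by_frequency(segmented_sentences, words_to_recall):
    """Count the participles outside the pre-specified to-be-recalled set and
    recall those whose frequency in the text library exceeds the threshold."""
    counts = Counter(
        w for sentence in segmented_sentences for w in sentence
        if w not in words_to_recall
    )
    return sorted(w for w, c in counts.items() if c > FOURTH_THRESHOLD)

# A toy text library after segmentation; "news" is already a word to be recalled.
library = [["true", "incense", "news"], ["true", "video"], ["true", "news"]]
print(recall_by_frequency(library, words_to_recall={"news"}))
# ['true']
```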
step S710: and pushing other participles with the word frequency larger than the fourth threshold value to the user terminal.
For example, when the processor determines, using the keyword recall method provided in this embodiment, that the word frequency of one or more participles is greater than the fourth threshold, those participles are displayed on the display screen of the user terminal. The user may then interact with the terminal through a preset terminal interaction interface to decide whether to use them as target keywords. The interaction interface may, for example, consist of buttons labeled "determine as target keyword" and "not a target keyword" displayed on the screen: if the user selects some of the participles and clicks or touches "determine as target keyword", the selected participles are determined as target keywords. It should be noted that this terminal interaction interface is only an example; a user may interact with the terminal through an interface defined by any manufacturer or user to decide whether a participle that meets the condition is used as a keyword.
For a given user, the participles with higher word frequency are not necessarily target keywords. Therefore, to achieve personalized recommendation, this embodiment gives the user the right to decide whether participles whose word frequency exceeds the fourth threshold are recalled as target keywords, which improves both the recall efficiency and the accuracy of the recalled keywords.
Step S720: and if receiving notification information returned by the user terminal, recalling other participles with the word frequency greater than the fourth threshold as target keywords, wherein the notification information is used for indicating the user to determine other participles with the word frequency greater than the fourth threshold as the target keywords.
The keyword recall method provided by this embodiment can determine, according to the user's indication, whether to recall participles whose word frequency is greater than the fourth threshold as target keywords, recalling keywords on a per-user basis and thereby improving recall efficiency.

The keyword recall method described above may be executed by a computer device (or text processing device). Computer devices here may include, but are not limited to, terminal devices such as smartphones, tablet computers, laptop computers, desktop computers, smart speakers, and smart watches, or service devices such as data processing servers, Web servers, and application servers. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the server may also be a node server on a blockchain. The terminal device and the service device may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
Referring to fig. 9, fig. 9 is a block diagram of a keyword recall apparatus according to an exemplary embodiment of the present application, and as shown in fig. 9, a keyword recall apparatus 500 according to the present embodiment includes a first calculating module 510, a second calculating module 520, an analyzing module 530, and a recall module 540.
The first calculating module 510 is configured to calculate a first similarity between the to-be-recalled word and the seed keyword, determine the to-be-recalled word with the first similarity greater than a first threshold as a target keyword, and determine the to-be-recalled word with the first similarity less than the first threshold and greater than a second threshold as a candidate keyword, where the first threshold is greater than the second threshold; the second calculating module 520 is configured to calculate a second similarity between the to-be-recalled sentence and the seed key sentence, where the seed key sentence includes at least one seed keyword, and the to-be-recalled sentence includes at least one to-be-recalled word; the analysis module 530 is configured to use the to-be-recalled sentence with the second similarity greater than a third threshold as a candidate to-be-recalled sentence, and determine the candidate keywords included in the candidate to-be-recalled sentence as target keywords; the recall module 540 is used to recall the target keywords.
In an exemplary embodiment, the second calculation module 520 includes an acquisition unit and a calculation unit.
The obtaining unit is used for obtaining a first feature vector of the sentence to be recalled and a second feature vector of the seed key sentence; the calculation unit is used for carrying out similarity calculation on the first feature vector and the second feature vector to obtain a second similarity.
In an exemplary embodiment, the obtaining unit includes a word segmentation subunit, an acquisition subunit, and a processing subunit.
The word segmentation subunit is used for performing word segmentation processing on the sentence to be recalled to obtain a plurality of words corresponding to the sentence to be recalled; the obtaining subunit is used for obtaining feature vectors corresponding to the multiple word segmentations; the processing subunit is configured to add feature vectors corresponding to the multiple word segmentations to obtain a first result vector, and perform normalization processing on the first result vector to obtain a first feature vector.
In an exemplary embodiment, the obtaining unit includes an initial feature vector acquisition subunit, a dimensionality reduction subunit, and a normalization subunit.
The initial feature vector acquisition subunit is used for acquiring word vectors corresponding to all the words in the sentence to be recalled and splicing the word vectors of all the words to obtain an initial feature vector corresponding to the sentence to be recalled; the dimensionality reduction subunit is used for performing dimensionality reduction processing on the initial feature vector to obtain a low-dimensional feature vector; the normalization subunit is configured to perform normalization processing on the low-dimensional feature vector to obtain a first feature vector.
In an exemplary embodiment, the keyword recall apparatus 500 provided by the embodiment further includes other segmentation acquiring modules, a statistic module and a keyword determining module.
The other participle acquisition module is used for acquiring other participles in the text library except the word to be recalled, and the other participles are obtained by carrying out participle processing on all sentences in the text library; the statistic module is used for counting word frequencies corresponding to other participles; and the keyword determining module is used for recalling other participles with the word frequency larger than the fourth threshold value as target keywords.
In an exemplary embodiment, the keyword determination module includes a push unit and a keyword determination unit.
The pushing unit is used for pushing other participles with the word frequency larger than a fourth threshold value to the user terminal; the keyword determining unit is used for recalling other participles with the word frequency larger than a fourth threshold value as target keywords if notification information returned by the user terminal is received, and the notification information is used for indicating the user to determine other participles with the word frequency larger than the fourth threshold value as the target keywords.
It should be noted that the apparatus provided in the foregoing embodiment and the method provided in the foregoing embodiment belong to the same concept, and specific ways for each module, unit or sub-unit to perform operations have been described in detail in the method embodiment, and are not described herein again.
In another exemplary embodiment, the present application provides an electronic device comprising a processor and a memory, wherein the memory has stored thereon computer readable instructions which, when executed by the processor, implement the foregoing keyword recall method.
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes, such as performing the keyword recall method in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage portion 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for system operation. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An Input/Output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. When the computer program is executed by a Central Processing Unit (CPU)1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Yet another aspect of the present application provides a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor implement the keyword recall method of any one of the preceding embodiments.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the keyword recall method provided in the above-described embodiments.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A keyword recall method, comprising:
calculating a first similarity between the to-be-recalled word and the seed keyword, determining the to-be-recalled word with the first similarity larger than a first threshold as a target keyword, determining the to-be-recalled word with the first similarity smaller than the first threshold and larger than a second threshold as a candidate keyword, wherein the first threshold is larger than the second threshold;
calculating a second similarity between the to-be-recalled sentence and a seed key sentence, wherein the seed key sentence comprises at least one seed key word, and the to-be-recalled sentence comprises at least one to-be-recalled word;
taking the to-be-recalled sentence with the second similarity larger than a third threshold value as a candidate to-be-recalled sentence, and determining candidate keywords contained in the candidate to-be-recalled sentence as target keywords;
and recalling the target keyword.
2. The method of claim 1, wherein the calculating a second similarity between the sentence to be recalled and the seed key sentence comprises:
acquiring a first feature vector of a sentence to be recalled and a second feature vector of a seed key sentence;
and performing similarity operation on the first feature vector and the second feature vector to obtain the second similarity.
3. The method of claim 2, wherein the obtaining the first feature vector of the sentence to be recalled comprises:
performing word segmentation processing on the sentence to be recalled to obtain a plurality of words corresponding to the sentence to be recalled;
obtaining feature vectors corresponding to the multiple word segments;
adding the feature vectors corresponding to the multiple word segmentations to obtain a first result vector, and performing normalization processing on the first result vector to obtain the first feature vector.
4. The method of claim 2, wherein the obtaining the first feature vector of the sentence to be recalled comprises:
acquiring a word vector corresponding to each word in a sentence to be recalled, and splicing the word vectors of all the words to obtain an initial feature vector corresponding to the sentence to be recalled;
performing dimensionality reduction on the initial feature vector to obtain a low-dimensional feature vector;
and carrying out normalization processing on the low-dimensional feature vector to obtain the first feature vector.
5. The method of claim 1, wherein the sentence to be recalled is derived from a text library; the method further comprises the following steps:
acquiring other participles in a text library except the word to be recalled, wherein the other participles are obtained by carrying out participle processing on all sentences in the text library;
counting word frequencies corresponding to the other participles;
and recalling other segmented words with the word frequency larger than the fourth threshold value as target keywords.
6. The method of claim 5, wherein recalling other segmented words with a word frequency greater than a fourth threshold as target keywords comprises:
pushing other participles with the word frequency larger than a fourth threshold value to the user terminal;
and if receiving notification information returned by the user terminal, recalling other participles with the word frequency larger than a fourth threshold value as target keywords, wherein the notification information is used for indicating a user to determine the other participles with the word frequency larger than the fourth threshold value as the target keywords.
7. The method of claim 1, wherein the calculating a first similarity between the to-be-recalled word and the seed keyword comprises:
vectorizing the words to be recalled and the seed keywords respectively to obtain a word vector to be recalled and a seed keyword vector;
and calculating the similarity between the word vector to be recalled and the seed keyword vector, and taking the similarity as a first similarity.
8. A keyword recall apparatus, comprising:
the first calculation module is used for calculating a first similarity between the to-be-recalled word and the seed keyword, determining the to-be-recalled word with the first similarity larger than a first threshold as a target keyword, and determining the to-be-recalled word with the first similarity smaller than the first threshold and larger than a second threshold as a candidate keyword, wherein the first threshold is larger than the second threshold;
the second calculation module is used for calculating a second similarity between the to-be-recalled sentence and a seed key sentence, wherein the seed key sentence comprises at least one seed key word, and the to-be-recalled sentence comprises at least one to-be-recalled word;
the analysis module is used for taking the to-be-recalled sentence with the second similarity larger than a third threshold value as a candidate to-be-recalled sentence and determining candidate keywords contained in the candidate to-be-recalled sentence as target keywords;
and the recall module is used for recalling all the target keywords.
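The first calculation module's two-threshold split can be sketched as follows. The word lists, similarity scores, and threshold values are illustrative only:

```python
def classify(words, similarities, first_threshold, second_threshold):
    """Split words into target and candidate keywords by two thresholds.

    The first threshold must be greater than the second, as claim 8
    requires.
    """
    targets, candidates = [], []
    for word, sim in zip(words, similarities):
        if sim > first_threshold:
            targets.append(word)        # high similarity: recall directly
        elif sim > second_threshold:
            candidates.append(word)     # middle band: awaits sentence-level check
    return targets, candidates

targets, candidates = classify(
    ["alpha", "beta", "gamma"], [0.95, 0.7, 0.3],
    first_threshold=0.9, second_threshold=0.5)
# targets == ["alpha"], candidates == ["beta"]; "gamma" falls below both
```

Candidates then pass through the second calculation and analysis modules: only those appearing in a sentence whose similarity to a seed key sentence exceeds the third threshold are promoted to target keywords.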
9. An electronic device, comprising:
a memory storing computer-readable instructions;
and a processor configured to read the computer-readable instructions stored in the memory to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN202110867106.1A 2021-07-29 2021-07-29 Keyword recall method and device, electronic equipment and storage medium Pending CN113822065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110867106.1A CN113822065A (en) 2021-07-29 2021-07-29 Keyword recall method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113822065A true CN113822065A (en) 2021-12-21

Family

ID=78924071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110867106.1A Pending CN113822065A (en) 2021-07-29 2021-07-29 Keyword recall method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113822065A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146596A (en) * 2022-07-26 2022-10-04 平安科技(深圳)有限公司 Method and device for generating recall text, electronic equipment and storage medium
CN115146596B (en) * 2022-07-26 2023-05-02 平安科技(深圳)有限公司 Recall text generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN109241524B (en) Semantic analysis method and device, computer-readable storage medium and electronic equipment
CN111444340B (en) Text classification method, device, equipment and storage medium
CN110162593B (en) Search result processing and similarity model training method and device
CN107679039B (en) Method and device for determining statement intention
CN107491534B (en) Information processing method and device
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111708950B (en) Content recommendation method and device and electronic equipment
CN110489582B (en) Method and device for generating personalized display image and electronic equipment
CN110705301B (en) Entity relationship extraction method and device, storage medium and electronic equipment
KR101754473B1 (en) Method and system for automatically summarizing documents to images and providing the image-based contents
CN112163165A (en) Information recommendation method, device, equipment and computer readable storage medium
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN109697239B (en) Method for generating teletext information
EP4310695A1 (en) Data processing method and apparatus, computer device, and storage medium
Suman et al. Why pay more? A simple and efficient named entity recognition system for tweets
CN114240552A (en) Product recommendation method, device, equipment and medium based on deep clustering algorithm
CN112805715A (en) Identifying entity attribute relationships
CN111881292A (en) Text classification method and device
CN113688310A (en) Content recommendation method, device, equipment and storage medium
Wang et al. Sex trafficking detection with ordinal regression neural networks
CN111813993A (en) Video content expanding method and device, terminal equipment and storage medium
CN115438674A (en) Entity data processing method, entity linking method, entity data processing device, entity linking device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination